Add exported onnx model 'model_qint8_avx512_vnni.onnx'
Hello!
This pull request has been automatically generated by the `export_dynamic_quantized_onnx_model` function from the Sentence Transformers library.
Config:
```
QuantizationConfig(
    is_static=False,
    format=<QuantFormat.QOperator: 0>,
    mode=<QuantizationMode.IntegerOps: 0>,
    activations_dtype=<QuantType.QUInt8: 1>,
    activations_symmetric=False,
    weights_dtype=<QuantType.QInt8: 0>,
    weights_symmetric=True,
    per_channel=True,
    reduce_range=False,
    nodes_to_quantize=[],
    nodes_to_exclude=[],
    operators_to_quantize=['Conv', 'MatMul', 'Attention', 'LSTM', 'Gather', 'Transpose', 'EmbedLayerNormalization'],
    qdq_add_pair_to_weight=False,
    qdq_dedicated_pair=False,
    qdq_op_type_per_channel_support_to_axis={'MatMul': 1},
)
```
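For reference, an export like the one in this PR can be reproduced along these lines. This is a minimal sketch: the string shortcut `"avx512_vnni"` selects the config shown above, and passing `push_to_hub=True` with `create_pr=True` is my assumption about how this PR was opened:

```python
from sentence_transformers import SentenceTransformer, export_dynamic_quantized_onnx_model

# Load the base model with the ONNX backend so an exported model.onnx is available
model = SentenceTransformer("shibing624/text2vec-base-chinese", backend="onnx")

# "avx512_vnni" expands to the QuantizationConfig shown above and saves
# model_qint8_avx512_vnni.onnx; create_pr=True opens a pull request like this one
export_dynamic_quantized_onnx_model(
    model,
    "avx512_vnni",
    "shibing624/text2vec-base-chinese",
    push_to_hub=True,
    create_pr=True,
)
```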
Tip: Consider testing this pull request before merging by loading the model from this PR with the `revision` argument:
```python
from sentence_transformers import SentenceTransformer

# TODO: Fill in the PR number
pr_number = 2
model = SentenceTransformer(
    "shibing624/text2vec-base-chinese",
    revision=f"refs/pr/{pr_number}",
    backend="onnx",
    model_kwargs={"file_name": "model_qint8_avx512_vnni.onnx"},
)

# Verify that everything works as expected
embeddings = model.encode(["The weather is lovely today.", "It's so sunny outside!", "He drove to the stadium."])
print(embeddings.shape)

similarities = model.similarity(embeddings, embeddings)
print(similarities)
```
The performance loss is too large; quantization is not recommended for embedding models.
Hello @shibing624!
I ran some tests to verify this, and you're right that it results in performance degradation. Here are my findings from all of the PRs that I opened:
| ID | Model | ATEC | BQ | LCQMC | PAWSX | STSB |
|---|---|---|---|---|---|---|
| 0 | shibing624/text2vec-base-chinese (fp32, baseline) | 0.31928 | 0.42672 | 0.70157 | 0.17214 | 0.79296 |
| 1 | shibing624/text2vec-base-chinese (onnx-O4, #29) | 0.31928 | 0.42672 | 0.70157 | 0.17214 | 0.79296 |
| 2 | shibing624/text2vec-base-chinese (onnx-qint8, #30) | 0.29752 (-6.81%) | 0.42642 (-0.070%) | 0.68601 (-2.21%) | 0.16428 (-4.57%) | 0.79153 (-0.18%) |
| 3 | shibing624/text2vec-base-chinese (ov, #27) | 0.31928 | 0.42672 | 0.70157 | 0.17214 | 0.79296 |
| 4 | shibing624/text2vec-base-chinese (ov-qint8, #28) | 0.28376 (-11.13%) | 0.40071 (-6.10%) | 0.68363 (-2.55%) | 0.16537 (-3.93%) | 0.78810 (-0.61%) |
| 5 | shibing624/text2vec-base-chinese (ov-qint8-zh, new) | 0.30778 (-3.60%) | 0.43474 (+1.88%) | 0.69620 (-0.77%) | 0.16662 (-3.20%) | 0.79396 (+0.13%) |
In short:
- ✅ ONNX optimized to O4 does not reduce performance, and it gives a ~2x speedup on GPU.
- 🟡 int8 quantization with ONNX incurs a ~4% performance hit, but it gives a ~2.3x speedup on CPU.
- ✅ OpenVINO does not reduce performance, and it gives a 1.12x speedup on CPU.
- ❌ int8 quantization with OpenVINO incurs a sizable performance hit because I did the quantization incorrectly, i.e. with English calibration texts. Apologies for that.
- 🟡 int8 quantization with OpenVINO, when calibrated with Chinese STSB, incurs a small performance hit on some tasks and a tiny performance gain on others, while giving a 4.78x speedup on CPU. (A rough timing sketch for these CPU speedups follows this list.)
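Here is a rough sketch of how these CPU speedups can be measured; the sentences, workload size, and run counts are illustrative assumptions, not the exact benchmark behind the numbers above:

```python
import time

from sentence_transformers import SentenceTransformer

# Hypothetical workload: 512 short Chinese sentences
sentences = ["今天天气很好。", "他开车去了体育场。"] * 256

def mean_encode_time(model: SentenceTransformer, runs: int = 3) -> float:
    model.encode(sentences)  # warm-up pass, excluded from timing
    start = time.perf_counter()
    for _ in range(runs):
        model.encode(sentences)
    return (time.perf_counter() - start) / runs

fp32 = SentenceTransformer(
    "shibing624/text2vec-base-chinese", backend="onnx", device="cpu"
)
qint8 = SentenceTransformer(
    "shibing624/text2vec-base-chinese",
    backend="onnx",
    device="cpu",
    model_kwargs={"file_name": "model_qint8_avx512_vnni.onnx"},
)

t_fp32, t_qint8 = mean_encode_time(fp32), mean_encode_time(qint8)
print(f"fp32: {t_fp32:.2f}s, qint8: {t_qint8:.2f}s, speedup: {t_fp32 / t_qint8:.2f}x")
```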
If you're interested, perhaps you can consider reopening/merging #29 (ONNX-O4) and #27 (OpenVINO).
Additionally, the int8 quantization options are interesting in my opinion, especially static int8 quantization using OpenVINO with Chinese texts (sketched below), but it's up to you whether you want to include those.
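A minimal sketch of such a static OpenVINO export calibrated on Chinese texts, assuming sentence-transformers v3.3+ and optimum-intel are installed; the calibration dataset id and column name here are placeholders, not necessarily what was used for the ov-qint8-zh variant:

```python
from optimum.intel import OVQuantizationConfig
from sentence_transformers import SentenceTransformer, export_static_quantized_openvino_model

# The model must be loaded with the OpenVINO backend for a static export
model = SentenceTransformer("shibing624/text2vec-base-chinese", backend="openvino")

export_static_quantized_openvino_model(
    model,
    OVQuantizationConfig(),              # default int8 post-training quantization
    "shibing624/text2vec-base-chinese",
    dataset_name="C-MTEB/STSB",          # placeholder: a Chinese STS-B dataset
    dataset_split="train",
    column_name="sentence1",             # placeholder: column with calibration texts
    push_to_hub=True,
    create_pr=True,
)
```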
- Tom Aarsen
Merged: shibing624/text2vec-base-chinese (ov-qint8-zh, new)