# gte-multilingual-reranker-base-onnx-op14-opt-gpu-int8-quantized

This is an INT8-quantized ONNX version of [Alibaba-NLP/gte-multilingual-reranker-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-reranker-base), exported with ONNX opset 14.
## Model Details
- Quantization Type: INT8
- ONNX Opset: 14
- Task: text-classification
- Target Device: GPU
- Optimized: Yes
- Framework: ONNX Runtime
- Original Model: Alibaba-NLP/gte-multilingual-reranker-base
- Quantized On: 2025-03-27
## Environment and Package Versions

| Package | Version |
|---|---|
| transformers | 4.48.3 |
| optimum | 1.24.0 |
| onnx | 1.17.0 |
| onnxruntime | 1.21.0 |
| torch | 2.5.1 |
| numpy | 1.26.4 |
| huggingface_hub | 0.28.1 |
| python | 3.12.9 |
| system | Darwin 24.3.0 |
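To match this environment, the pins above can be checked at runtime. A minimal sketch using only the standard library (the dictionary keys are the PyPI package names from the table):

```python
from importlib.metadata import version

# Pinned versions from the table above
expected = {
    "transformers": "4.48.3",
    "optimum": "1.24.0",
    "onnx": "1.17.0",
    "onnxruntime": "1.21.0",
    "torch": "2.5.1",
    "numpy": "1.26.4",
    "huggingface_hub": "0.28.1",
}
for pkg, want in expected.items():
    have = version(pkg)
    print(f"{pkg}: {have}" + ("" if have == want else f" (card pins {want})"))
```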
## Applied Optimizations

| Optimization | Setting |
|---|---|
| Graph Optimization Level | Extended |
| Optimize for GPU | Yes |
| Use FP16 | No |
| Transformers Specific Optimizations Enabled | Yes |
| Gelu Fusion Enabled | Yes |
| Layer Norm Fusion Enabled | Yes |
| Attention Fusion Enabled | Yes |
| Skip Layer Norm Fusion Enabled | Yes |
| Gelu Approximation Enabled | Yes |
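These settings map onto Optimum's `OptimizationConfig`. Below is a minimal sketch of how a comparable optimization pass could be reproduced; the exact settings behind this artifact are not published, and the save path is illustrative:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig

# Export the original model to ONNX (trust_remote_code is needed because the
# GTE architecture ships custom modeling code)
model = ORTModelForSequenceClassification.from_pretrained(
    "Alibaba-NLP/gte-multilingual-reranker-base", export=True, trust_remote_code=True
)
optimizer = ORTOptimizer.from_pretrained(model)

config = OptimizationConfig(
    optimization_level=2,   # "extended" graph optimizations
    optimize_for_gpu=True,
    fp16=False,
    enable_transformers_specific_optimizations=True,  # GELU/LayerNorm/attention fusions
    enable_gelu_approximation=True,
)
optimizer.optimize(save_dir="optimized_model", optimization_config=config)
```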
## Usage

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

# Load model and tokenizer ("quantized_model" is the local directory or repo id
# holding this artifact). For GPU inference, add provider="CUDAExecutionProvider"
# (requires the onnxruntime-gpu package).
model = ORTModelForSequenceClassification.from_pretrained("quantized_model")
tokenizer = AutoTokenizer.from_pretrained("quantized_model")

# A reranker scores a query-document pair, so tokenize the two texts together
inputs = tokenizer(
    "what is panda?",
    "The giant panda is a bear species endemic to China.",
    return_tensors="pt",
    truncation=True,
)

# Run inference; the logit is the pair's relevance score
outputs = model(**inputs)
print(outputs.logits)
```
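To rerank a list of candidates, the same model can score a batch of query-document pairs. A short sketch reusing `model` and `tokenizer` from above (the query and documents are illustrative):

```python
query = "what is panda?"
docs = [
    "The giant panda is a bear species endemic to China.",
    "Pandas eat bamboo almost exclusively.",
    "Paris is the capital of France.",
]

# Tokenize all (query, document) pairs as one padded batch
batch = tokenizer([query] * len(docs), docs, padding=True, truncation=True, return_tensors="pt")
scores = model(**batch).logits.view(-1).tolist()

# Higher score = more relevant
for score, doc in sorted(zip(scores, docs), reverse=True):
    print(f"{score:.3f}  {doc}")
```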
## Quantization Process

This model was quantized to INT8 with ONNX Runtime, using the Hugging Face Optimum library and opset 14. Graph optimizations were applied during export, targeting GPU execution.
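As an illustration, a comparable dynamic INT8 pass can be run with Optimum's `ORTQuantizer`. This is a sketch, not the exact recipe behind this artifact; the preset and paths are assumptions:

```python
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Load the optimized ONNX model produced in the previous step
quantizer = ORTQuantizer.from_pretrained("optimized_model")

# Dynamic INT8 quantization. avx512_vnni is a CPU-oriented preset used here
# purely for illustration; Optimum also offers AutoQuantizationConfig.tensorrt
# for static GPU INT8, which additionally needs a calibration dataset.
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="quantized_model", quantization_config=qconfig)
```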
## Performance Comparison

INT8 quantization stores weights as 8-bit integers, cutting model size roughly 4x versus FP32 and typically speeding up inference at the cost of a small accuracy drop. This model should therefore run noticeably faster than the original; benchmark on your own hardware and data to confirm the trade-off, e.g. with the sketch below.
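A quick way to sanity-check the speed claim is to time repeated forward passes. A minimal latency sketch reusing `model` and `tokenizer` from the Usage section (the pair text and iteration counts are arbitrary):

```python
import time

inputs = tokenizer(
    "How many people live in Berlin?",
    "Berlin is known for its 3.7 million inhabitants.",
    return_tensors="pt",
)

# Warm up so one-time costs don't skew the measurement
for _ in range(5):
    model(**inputs)

runs = 50
start = time.perf_counter()
for _ in range(runs):
    model(**inputs)
print(f"avg latency: {(time.perf_counter() - start) / runs * 1e3:.1f} ms")
```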