# Vietnamese Embedding ONNX
This repository contains the ONNX version of the dangvantuan/vietnamese-embedding model, optimized for production deployment and inference.
## Model Description

`laituanmanh32/vietnamese-embedding-onnx` is an ONNX-converted version of the original Vietnamese embedding model created by dangvantuan. The original model is a sentence-embedding model trained specifically for Vietnamese, built on PhoBERT (a pre-trained language model based on the RoBERTa architecture).
The model encodes Vietnamese sentences into a 768-dimensional vector space, supporting a wide range of applications (a short similarity example follows this list):
- Semantic search
- Text clustering
- Document similarity
- Question answering
- Information retrieval
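Once sentences are embedded (see the usage example below), tasks such as semantic search and document similarity reduce to comparing vectors, most commonly with cosine similarity. The snippet below is a minimal sketch assuming `embeddings` is the `[n_sentences, 768]` array produced by this model.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# With `embeddings` of shape [n_sentences, 768] from the usage example below,
# higher scores mean more semantically similar sentences:
# score = cosine_similarity(embeddings[0], embeddings[1])
```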
## Why ONNX?
The Open Neural Network Exchange (ONNX) format provides several advantages:
- Improved inference speed: Optimized for production environments
- Cross-platform compatibility: Run the model on various hardware and software platforms
- Reduced dependencies: No need for the full PyTorch ecosystem
- Smaller deployment size: More efficient for production systems
- Hardware acceleration: Better utilization of CPU/GPU resources (see the provider example below)
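As an illustration of the hardware-acceleration point, ONNX Runtime selects an execution provider when the session is created. Whether `CUDAExecutionProvider` is actually available depends on having the `onnxruntime-gpu` package and a compatible GPU, so the snippet below is only a sketch.

```python
import onnxruntime as ort

# Prefer the GPU provider when available and fall back to CPU otherwise.
session = ort.InferenceSession(
    "path/to/model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # providers actually selected for this session
```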
## Usage

### Installation

```bash
pip install onnxruntime
pip install pyvi
pip install transformers
```
### Basic Usage

```python
from transformers import AutoTokenizer
import onnxruntime as ort
import numpy as np
from pyvi.ViTokenizer import tokenize

# Load tokenizer and ONNX model
tokenizer = AutoTokenizer.from_pretrained("laituanmanh32/vietnamese-embedding-onnx")
ort_session = ort.InferenceSession("path/to/model.onnx")

# Prepare input sentences (pyvi word segmentation matches PhoBERT's expected input)
sentences = ["Hà Nội là thủ đô của Việt Nam", "Đà Nẵng là thành phố du lịch"]
tokenized_sentences = [tokenize(sent) for sent in sentences]

# Tokenize into numpy arrays
encoded_input = tokenizer(tokenized_sentences, padding=True, truncation=True, return_tensors="np")

# Keep only the inputs the ONNX graph actually declares (drops e.g. token_type_ids if unused)
model_input_names = {inp.name for inp in ort_session.get_inputs()}
inputs = {k: v for k, v in encoded_input.items() if k in model_input_names}

# Run inference
outputs = ort_session.run(None, inputs)
embeddings = outputs[0]

# Use embeddings for your downstream tasks
print(embeddings.shape)  # Should be [2, 768] for our example
```
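Depending on how the graph was exported, the first output may be token-level hidden states of shape `[batch, seq_len, 768]` rather than pooled sentence vectors. In that case, attention-mask-weighted mean pooling (the pooling used by the original sentence-transformers model) produces one 768-dimensional embedding per sentence. The helper below is a sketch that reuses `encoded_input` from the example above.

```python
import numpy as np

def mean_pooling(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over the sequence, ignoring padding positions."""
    mask = attention_mask[..., np.newaxis].astype(np.float32)   # [batch, seq_len, 1]
    summed = (token_embeddings * mask).sum(axis=1)               # [batch, hidden]
    counts = np.clip(mask.sum(axis=1), 1e-9, None)               # avoid division by zero
    return summed / counts

# Only needed if the model returns token-level outputs; skip if embeddings is already [batch, 768].
if embeddings.ndim == 3:
    embeddings = mean_pooling(embeddings, encoded_input["attention_mask"])
```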
## Performance

The ONNX version maintains the same accuracy as the original model while providing improved inference speed:

| Model | Inference Time (ms/sentence) | Memory Usage |
|---|---|---|
| Original PyTorch | 15-20 ms | ~500 MB |
| ONNX | 5-10 ms | ~200 MB |
Note: Performance may vary depending on hardware and batch size.
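To reproduce a rough latency figure on your own hardware, you can time repeated runs of the session. The sketch below reuses `ort_session`, `inputs`, and `sentences` from the usage example above.

```python
import time

# Warm-up run so one-time graph optimizations don't skew the measurement
ort_session.run(None, inputs)

n_runs = 100
start = time.perf_counter()
for _ in range(n_runs):
    ort_session.run(None, inputs)
elapsed = time.perf_counter() - start

print(f"{1000 * elapsed / (n_runs * len(sentences)):.2f} ms per sentence")
```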
### Original Model Performance

The original model achieves state-of-the-art performance on Vietnamese semantic textual similarity tasks (Pearson scores):

| Model | STSB | STS12 | STS13 | STS14 | STS15 | STS16 | SICK | Mean |
|---|---|---|---|---|---|---|---|---|
| dangvantuan/vietnamese-embedding | 84.87 | 87.23 | 85.39 | 82.94 | 86.91 | 79.39 | 82.77 | 84.21 |
## Conversion Process
This model was converted from the original PyTorch model to ONNX format using the ONNX Runtime and PyTorch's built-in ONNX export functionality. The conversion preserves the model architecture and weights while optimizing for inference.
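A minimal export along those lines could look like the sketch below. The exact arguments used for this repository are not documented here, so the opset version, input/output names, and dynamic axes are assumptions rather than the actual conversion script.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("dangvantuan/vietnamese-embedding")
tokenizer = AutoTokenizer.from_pretrained("dangvantuan/vietnamese-embedding")
model.eval()

# Dummy input used to trace the graph
dummy = tokenizer("Hà Nội là thủ đô của Việt Nam", return_tensors="pt")

torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "last_hidden_state": {0: "batch", 1: "sequence"},
    },
    opset_version=14,
)
```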
## Citation
If you use this model, please cite the original work:
```bibtex
@article{reimers2019sentence,
  title   = {Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks},
  author  = {Reimers, Nils and Gurevych, Iryna},
  journal = {arXiv preprint arXiv:1908.10084},
  year    = {2019}
}
```
## License
This model is released under the same license as the original model: Apache 2.0.
## Acknowledgements
Special thanks to dangvantuan for creating and sharing the original Vietnamese embedding model that this work is based on.