Vietnamese Embedding ONNX

This repository contains the ONNX version of the dangvantuan/vietnamese-embedding model, optimized for production deployment and inference.

Model Description

laituanmanh32/vietnamese-embedding-onnx is an ONNX conversion of the original Vietnamese embedding model created by dangvantuan. The original is a sentence-embedding model trained for the Vietnamese language and built on PhoBERT, a pre-trained language model based on the RoBERTa architecture.

The model encodes Vietnamese sentences into a 768-dimensional vector space, facilitating a wide range of applications:

  • Semantic search
  • Text clustering
  • Document similarity
  • Question answering
  • Information retrieval
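
As a rough illustration of the semantic search use case, the sketch below ranks candidate vectors by cosine similarity to a query vector. The function name `rank_by_cosine` and the toy 3-dimensional vectors are assumptions made for illustration; real embeddings from this model would be 768-dimensional.

```python
import numpy as np

def rank_by_cosine(query, candidates):
    """Return candidate indices sorted from most to least similar, plus the scores."""
    query = np.asarray(query, dtype=np.float32)
    candidates = np.asarray(candidates, dtype=np.float32)
    # L2-normalize so that the dot product equals cosine similarity
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    scores = c @ q
    order = np.argsort(-scores)  # descending similarity
    return order, scores

# Toy stand-ins for sentence embeddings (hypothetical values)
query = [1.0, 0.0, 0.0]
candidates = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
order, scores = rank_by_cosine(query, candidates)
print(order.tolist())
```

The same ranking step applies to clustering or document similarity once each text has been encoded to a vector.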

Why ONNX?

The Open Neural Network Exchange (ONNX) format provides several advantages:

  • Improved inference speed: Optimized for production environments
  • Cross-platform compatibility: Run the model on various hardware and software platforms
  • Reduced dependencies: No need for the full PyTorch ecosystem
  • Smaller deployment size: More efficient for production systems
  • Hardware acceleration: Better utilization of CPU/GPU resources

Usage

Installation

pip install onnxruntime
pip install pyvi
pip install transformers

Basic Usage

from transformers import AutoTokenizer
import onnxruntime as ort
import numpy as np
from pyvi.ViTokenizer import tokenize

# Load tokenizer and ONNX model
tokenizer = AutoTokenizer.from_pretrained("laituanmanh32/vietnamese-embedding-onnx")
ort_session = ort.InferenceSession("path/to/model.onnx")

# Prepare input sentences
sentences = ["Hà Nội là thủ đô của Việt Nam", "Đà Nẵng là thành phố du lịch"]
tokenized_sentences = [tokenize(sent) for sent in sentences]

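# Tokenize the word-segmented sentences into NumPy arrays
encoded_input = tokenizer(tokenized_sentences, padding=True, truncation=True, return_tensors="np")
# Pass only the inputs the ONNX graph declares (e.g. drop token_type_ids if it is not one), as int64
input_names = {i.name for i in ort_session.get_inputs()}
inputs = {k: v.astype(np.int64) for k, v in encoded_input.items() if k in input_names}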

# Run inference
outputs = ort_session.run(None, inputs)
embeddings = outputs[0]

# Use the embeddings for your downstream tasks
print(embeddings.shape)  # Expected: (2, 768) for the two example sentences
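
If the exported graph returns token-level hidden states (shape [batch, sequence, 768]) rather than pooled sentence vectors, a pooling step is needed before the embeddings can be compared. The sketch below shows attention-mask-weighted mean pooling, a strategy commonly used by sentence-embedding models. Whether this particular ONNX file needs it is an assumption to check against `embeddings.shape`; the helper name `mean_pool` and the toy arrays are illustrative only.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sentence, ignoring padded positions."""
    # Expand the mask to [batch, sequence, 1] so it broadcasts over the hidden size
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    # Guard against division by zero for an all-padding row
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts

# Toy batch: 2 sentences, 3 token positions, hidden size 2 (hypothetical values)
tokens = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],
                   [[2.0, 2.0], [4.0, 4.0], [6.0, 6.0]]])
mask = np.array([[1, 1, 0],   # last position of the first sentence is padding
                 [1, 1, 1]])
pooled = mean_pool(tokens, mask)
print(pooled.shape)
```

With the variables from the example above, the call would be `mean_pool(outputs[0], encoded_input["attention_mask"])`.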

Performance

The ONNX version maintains the same accuracy as the original model while providing improved inference speed:

Model             Inference Time (ms/sentence)   Memory Usage
Original PyTorch  15-20                          ~500 MB
ONNX              5-10                           ~200 MB

Note: Performance may vary depending on hardware and batch size.
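
Because the figures depend on hardware and batch size, it can be worth measuring latency locally. The helper below is a minimal sketch of one way to do that; `ms_per_sentence` and `fake_encode` are hypothetical names, and the dummy encoder only stands in for a real call such as `ort_session.run`.

```python
import time

def ms_per_sentence(encode_fn, sentences, warmup=1, repeats=5):
    """Average wall-clock milliseconds per sentence for encode_fn(sentences)."""
    # Warm-up runs are excluded so one-time initialization does not skew the result
    for _ in range(warmup):
        encode_fn(sentences)
    start = time.perf_counter()
    for _ in range(repeats):
        encode_fn(sentences)
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / (repeats * len(sentences))

# Dummy encoder standing in for a real model call (assumption for illustration)
def fake_encode(batch):
    return [len(s) for s in batch]

latency = ms_per_sentence(fake_encode, ["a", "bb", "ccc"])
print(f"{latency:.4f} ms/sentence")
```

Running it with several batch sizes shows how per-sentence latency changes with batching on a given machine.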

Original Model Performance

The original model achieves state-of-the-art performance on Vietnamese semantic textual similarity tasks:

Pearson score

Model                             STSB   STS12  STS13  STS14  STS15  STS16  SICK   Mean
dangvantuan/vietnamese-embedding  84.87  87.23  85.39  82.94  86.91  79.39  82.77  84.21

Conversion Process

This model was converted from the original PyTorch model to ONNX format using the ONNX Runtime and PyTorch's built-in ONNX export functionality. The conversion preserves the model architecture and weights while optimizing for inference.
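
For readers who want to reproduce a similar conversion, the sketch below shows how a Hugging Face transformer is typically exported with `torch.onnx.export`. It is not the exact script used for this repository: the function name `export_to_onnx`, the opset version, the output names, and the choice to export only the transformer backbone (without a pooling step) are all assumptions. The imports sit inside the function so that defining it does not require PyTorch to be installed.

```python
def export_to_onnx(model_id="dangvantuan/vietnamese-embedding", out_path="model.onnx"):
    """Export the transformer backbone to ONNX (illustrative sketch, not the author's script)."""
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # return_dict=False makes the model return a plain tuple, which is simpler to trace
    model = AutoModel.from_pretrained(model_id, return_dict=False)
    model.eval()

    # A sample input is needed so the exporter can trace the graph
    sample = tokenizer("xin chào", return_tensors="pt")

    torch.onnx.export(
        model,
        (sample["input_ids"], sample["attention_mask"]),
        out_path,
        input_names=["input_ids", "attention_mask"],
        output_names=["last_hidden_state", "pooler_output"],  # assumed output names
        # Mark batch and sequence length as dynamic so any input size is accepted
        dynamic_axes={
            "input_ids": {0: "batch", 1: "sequence"},
            "attention_mask": {0: "batch", 1: "sequence"},
            "last_hidden_state": {0: "batch", 1: "sequence"},
            "pooler_output": {0: "batch"},
        },
        opset_version=14,  # assumed; choose one supported by your runtime
    )
    return out_path
```

Calling `export_to_onnx()` downloads the original weights and writes `model.onnx` to the working directory. A graph exported this way returns token-level states, so sentence vectors would still require pooling.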

Citation

If you use this model, please cite the original work:

@article{reimers2019sentence,
   title={Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks},
   author={Reimers, Nils and Gurevych, Iryna},
   journal={arXiv preprint arXiv:1908.10084},
   url={https://arxiv.org/abs/1908.10084},
   year={2019}
}

License

This model is released under the same license as the original model: Apache 2.0.

Acknowledgements

Special thanks to dangvantuan for creating and sharing the original Vietnamese embedding model that this work is based on.
