This model is a Tigre–English translation quality checker built on a fine-tuned SONAR encoder. It produces embeddings for both Tigre and English text and scores each pair by the cosine similarity of the two embeddings. The result is a fast, lightweight tool for filtering parallel data, validating translations, and supporting Tigre–English NLP workflows.
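The scoring idea can be illustrated without the model. The sketch below computes the cosine of the angle between two vectors in plain Python; the helper name `cosine_similarity` and the three-dimensional toy vectors are illustrative assumptions standing in for real sentence embeddings, not values produced by this encoder.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch; zero vectors have no direction
    return dot / (norm_u * norm_v)

# Toy vectors standing in for sentence embeddings (made-up values).
tigre_vec = [0.2, 0.7, 0.1]
english_vec = [0.25, 0.65, 0.05]
unrelated_vec = [0.9, -0.3, 0.4]

print(cosine_similarity(tigre_vec, english_vec))    # close to 1: similar direction
print(cosine_similarity(tigre_vec, unrelated_vec))  # close to 0: nearly orthogonal
```

When both vectors are already scaled to unit length, the denominator is 1 and the cosine similarity reduces to a plain dot product.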
Install the dependencies with `pip install transformers torch`.
```python
from transformers import AutoTokenizer, M2M100ForConditionalGeneration
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the Tigre-trained encoder
model_id = "BeitTigreAI/tigre-sonar-encoder"
seq2seq = M2M100ForConditionalGeneration.from_pretrained(model_id)
encoder = seq2seq.get_encoder().to(device).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id)

@torch.inference_mode()
def embed(texts, lang):
    # Set the source-language code before tokenizing
    tokenizer.src_lang = lang
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=512).to(device)
    out = encoder(**batch, return_dict=True)
    # Mean-pool the token states over non-padding positions
    mask = batch["attention_mask"].unsqueeze(-1).float()
    pooled = (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp_min(1.0)
    # L2-normalize so that a dot product equals cosine similarity
    return torch.nn.functional.normalize(pooled, p=2, dim=1)

def score_pair(tig, eng):
    t = embed([tig], "tig_Ethi")
    e = embed([eng], "eng_Latn")
    sim = float((t * e).sum())  # cosine similarity of the two unit vectors
    return round(sim * 100, 1)  # scaled by 100 and rounded to one decimal

print(score_pair("እት እድንየ እግል ትርኤ ተሐዜዮ ተቅዪር ግበእ", "Be the change that you wish to see in the world"))
print(score_pair("ክል ዶል ኢገብእ መስል እስከ ይከለስ", "It always seems impossible until it's done"))
```
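For filtering parallel data, one possible pattern is to score every candidate pair and keep those at or above a threshold. The sketch below assumes a scoring callable with the same shape as `score_pair` above (two strings in, a score scaled by 100 out). The names `filter_pairs` and `fake_score`, the placeholder sentences, and the threshold of 60.0 are all hypothetical; a real cutoff would need tuning on held-out data.

```python
def filter_pairs(pairs, score_fn, threshold=60.0):
    """Return (tigre, english, score) for pairs scoring at or above threshold."""
    kept = []
    for tig, eng in pairs:
        score = score_fn(tig, eng)
        if score >= threshold:
            kept.append((tig, eng, score))
    return kept

# Stand-in scorer with made-up scores so the sketch runs without the model.
# In practice, pass score_pair instead of fake_score.
_FAKE_SCORES = {
    ("tigre sentence A", "good translation"): 82.5,
    ("tigre sentence B", "unrelated text"): 31.0,
}

def fake_score(tig, eng):
    return _FAKE_SCORES.get((tig, eng), 0.0)

candidates = list(_FAKE_SCORES)
kept = filter_pairs(candidates, fake_score, threshold=60.0)
print(kept)
```

Passing the scorer as an argument keeps the filtering logic independent of how scores are produced, so the same function works with the real model or with cached scores.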
Base model: facebook/SONAR