BLASER QE (Ported)
This is a port of the BLASER quality estimation (QE) model, originally introduced in BLASER: Bilingual Language-Agnostic Sentence Representations.
- Ported to Hugging Face Transformers: no dependency on Fairseq.
- Uses embeddings from the ported SONAR 200 multilingual text encoder (cointegrated/SONAR_200_text_encoder).
- Supports the same 202 languages as SONAR / NLLB-200.
- Outputs BLASER scores on a 1–5 scale for a source–MT sentence pair.
⚠️ This is not the original implementation. Attribution goes to the original BLASER authors.
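The usage code in this card turns the encoder's per-token states into one sentence embedding by averaging over non-padding positions only. The sketch below illustrates that masked mean on toy numbers in plain Python; `masked_mean` is a hypothetical helper written for illustration, not part of SONAR, BLASER, or Transformers.

```python
def masked_mean(token_vectors, attention_mask):
    """Average token vectors over positions where attention_mask is 1."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:  # skip padding positions entirely
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    return [t / count for t in total]


# Toy "sentence": two real tokens followed by one padding position.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = masked_mean(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

The large values at the padded position have no effect on the result, which is why the attention mask is applied before summing and also used as the divisor.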
How to compute QE scores
# !pip install transformers sentencepiece torch -q
import torch
from transformers import AutoTokenizer, AutoModel
from transformers.models.m2m_100.modeling_m2m_100 import M2M100Encoder
# 1. Load SONAR encoder
sonar_model_name = "cointegrated/SONAR_200_text_encoder"
encoder = M2M100Encoder.from_pretrained(sonar_model_name)
tokenizer = AutoTokenizer.from_pretrained(sonar_model_name)
def encode_mean_pool(texts, tokenizer, encoder, lang='eng_Latn', norm=False):
    tokenizer.src_lang = lang
    with torch.inference_mode():
        batch = tokenizer(texts, return_tensors='pt', padding=True)
        seq_embs = encoder(**batch).last_hidden_state
        mask = batch.attention_mask
        mean_emb = (seq_embs * mask.unsqueeze(-1)).sum(1) / mask.unsqueeze(-1).sum(1)
        if norm:
            mean_emb = torch.nn.functional.normalize(mean_emb)
    return mean_emb
# Example sentences
src_sentences = ["Le chat s'assit sur le tapis."]
mt_sentences = ["The cat sat down on the carpet."] # Example MT output
# Encode source and MT sentences
src_embs = encode_mean_pool(src_sentences, tokenizer, encoder, lang="fra_Latn")
mt_embs = encode_mean_pool(mt_sentences, tokenizer, encoder, lang="eng_Latn")
# 2. Load BLASER QE model (ported)
qe_model_name = "oist/blaser-2.0-qe-ported"
qe_model = AutoModel.from_pretrained(qe_model_name, trust_remote_code=True)
qe_model.eval() # set to evaluation mode
# 3. Compute QE scores
with torch.inference_mode():
    qe_scores = qe_model(src_embs, mt_embs)  # expects source and MT embeddings
print("BLASER score shape:", qe_scores.shape)
print("BLASER scores:", qe_scores[0])
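The example above scores a single sentence pair in one call. For longer lists it may be preferable to process the pairs in fixed-size chunks so that memory use stays bounded. The following is a minimal sketch of that pattern; `iter_batches`, `score_in_batches`, the `batch_size` default, and the stand-in `dummy_score_fn` are all hypothetical names introduced here for illustration.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def score_in_batches(
    src: Sequence[str],
    mt: Sequence[str],
    score_fn: Callable[[Sequence[str], Sequence[str]], List[float]],
    batch_size: int = 32,
) -> List[float]:
    """Apply `score_fn` to aligned chunks of source and MT sentences."""
    if len(src) != len(mt):
        raise ValueError("src and mt must contain the same number of sentences")
    scores: List[float] = []
    for src_batch, mt_batch in zip(iter_batches(src, batch_size),
                                   iter_batches(mt, batch_size)):
        scores.extend(score_fn(src_batch, mt_batch))
    return scores


# Stand-in scorer so the sketch runs without downloading any model:
# returns 1.0 when source and MT have equal length, else 0.0.
def dummy_score_fn(src_batch, mt_batch):
    return [float(len(s) == len(m)) for s, m in zip(src_batch, mt_batch)]


scores = score_in_batches(["a", "bb", "ccc"], ["x", "yy", "z"],
                          dummy_score_fn, batch_size=2)
print(scores)  # [1.0, 1.0, 0.0]
```

In practice, `score_fn` would be replaced by a function that encodes both chunks with `encode_mean_pool`, passes the embeddings to `qe_model`, and returns the resulting scores as a list of floats.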