# Model Card for mbart-french-fon-opus
This model is an mBART model fine-tuned for French-to-Fon translation, trained on parallel data drawn from the OPUS corpus collection. It is the first large-scale neural machine translation model for the French-Fon language pair.
## Model Details

### Model Description
This model translates from French to Fon, a Gbe language spoken primarily in Benin. It was fine-tuned from the multilingual mBART-50 model on over 586,000 parallel sentence pairs aggregated from multiple OPUS corpora, the largest dataset used for French-Fon translation to date.
- Developed by: Nazif Toure
- Model type: Sequence-to-sequence transformer (mBART)
- Language(s) (NLP): French (fr), Fon (fon)
- License: MIT
- Finetuned from model: facebook/mbart-large-50-many-to-many-mmt
## Uses

### Direct Use
This model is intended for direct French-to-Fon translation tasks, including:
- Document translation
- Educational materials localization
- Digital content accessibility for Fon speakers
- Research in African language NLP
### Downstream Use
The model can be integrated into:
- Translation services and applications
- Multilingual chatbots and virtual assistants
- Language learning platforms
- Cross-cultural communication tools
### Out-of-Scope Use
This model is not suitable for:
- Fon-to-French translation (the model is unidirectional)
- Real-time simultaneous interpretation
- Translation of highly specialized technical domains not represented in training data
- Generation of creative content in Fon
## Bias, Risks, and Limitations
The model may exhibit biases present in the training data, which primarily consists of religious texts (JW materials) and web-scraped content. Performance may be limited on:
- Contemporary slang or very recent terminology
- Highly specialized technical vocabulary
- Regional dialects of Fon not well-represented in training data
### Recommendations
Translation quality varies by text domain. Users should validate outputs before relying on them, especially for official or sensitive communications.
## How to Get Started with the Model
```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Load the fine-tuned model and tokenizer
model = MBartForConditionalGeneration.from_pretrained("NazifToure/mbart-french-fon-opus")
tokenizer = MBart50TokenizerFast.from_pretrained("NazifToure/mbart-french-fon-opus")

# Set the source language (mBART-50 code for French)
tokenizer.src_lang = "fr_XX"

def translate_fr_to_fon(text):
    # Tokenize the input (truncated to 128 tokens)
    inputs = tokenizer(text, return_tensors="pt", max_length=128, truncation=True, padding=True)
    # Look up the Fon language code added during fine-tuning (None if absent)
    forced_bos = tokenizer.lang_code_to_id.get("fon_XX")
    # Generate the translation with beam search
    outputs = model.generate(
        **inputs,
        forced_bos_token_id=forced_bos,
        max_length=128,
        num_beams=5,
        early_stopping=True,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage
french_text = "Bonjour, comment allez-vous ?"
fon_translation = translate_fr_to_fon(french_text)
print(f"French: {french_text}")
print(f"Fon: {fon_translation}")
```
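Because inputs are truncated at 128 tokens, longer documents should be split into pieces before translation. A minimal sketch of sentence-based chunking, assuming each chunk is then passed to the translation function above (the `chunk_text` helper and its character budget are illustrative, not part of the model's API):

```python
import re

def chunk_text(text, max_chars=400):
    """Split text into sentence-based chunks of at most max_chars characters.

    A sentence longer than max_chars is kept whole rather than split mid-sentence.
    """
    # Split on whitespace that follows sentence-ending punctuation
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = (current + " " + sentence).strip()
        if current and len(candidate) > max_chars:
            # Current chunk is full; start a new one with this sentence
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be translated independently and the results re-joined, which keeps every model call within the 128-token limit at the cost of losing cross-sentence context.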