+---
+language:
+- fr
+- fon
+license: mit
+tags:
+- translation
+- mbart
+- french-fon
+- opus
+- african-languages
+pipeline_tag: translation
+---
+# Model Card for mbart-french-fon-opus
+This model is a fine-tuned mBart model for French-to-Fon translation, trained on parallel data drawn from multiple OPUS corpora. To the author's knowledge, it is the first large-scale neural machine translation model for the French-Fon language pair.
+## Model Details
+### Model Description
+This model translates from French to Fon (a language spoken primarily in Benin). It was fine-tuned from the multilingual mBart-50 model on more than 586,000 parallel sentence pairs drawn from multiple OPUS corpora, which the author describes as the largest dataset used for French-Fon translation to date.
+- **Developed by:** Nazif Toure
+- **Model type:** Sequence-to-sequence transformer (mBart)
+- **Language(s) (NLP):** French (fr), Fon (fon)
+- **License:** MIT
+- **Finetuned from model:** facebook/mbart-large-50-many-to-many-mmt
+### Model Sources
+- **Repository:** https://huggingface.co/NazifToure/mbart-french-fon-opus
+## Uses
+### Direct Use
+This model is intended for direct French-to-Fon translation tasks, including:
+- Document translation
+- Educational materials localization
+- Digital content accessibility for Fon speakers
+- Research in African language NLP
+### Downstream Use
+The model can be integrated into:
+- Translation services and applications
+- Multilingual chatbots and virtual assistants
+- Language learning platforms
+- Cross-cultural communication tools
+### Out-of-Scope Use
+This model is not suitable for:
+- Fon-to-French translation (unidirectional model)
+- Real-time simultaneous interpretation
+- Translation of highly specialized technical domains not represented in training data
+- Generation of creative content in Fon
+## Bias, Risks, and Limitations
+The model may reproduce biases present in its training data, which consists primarily of religious texts (Jehovah's Witnesses, or JW, materials) and web-scraped content. Performance may be limited on:
+- Contemporary slang or very recent terminology
+- Highly specialized technical vocabulary
+- Regional dialects of Fon not well-represented in training data
+### Recommendations
+Translation quality may vary with the text domain. Users should validate outputs, especially for official or sensitive communications.
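As a rough illustration of what automated output validation could look like, the sketch below flags translations that are empty, identical to the source, unusually short or long, or stuck in a repetition loop. The function name `flag_suspect_translation` and the thresholds are assumptions chosen for illustration, not part of this model or of any library, and checks like these only catch obvious failures. Review by a Fon speaker is still needed for official or sensitive text.

```python
from __future__ import annotations


def flag_suspect_translation(
    source: str,
    translation: str,
    min_ratio: float = 0.3,  # assumed lower bound on len(translation) / len(source)
    max_ratio: float = 3.0,  # assumed upper bound on the same ratio
) -> list[str]:
    """Return heuristic warnings for one source/translation pair (hypothetical helper)."""
    src = source.strip()
    hyp = translation.strip()

    # An empty output needs no further checks.
    if not hyp:
        return ["empty output"]

    warnings = []

    # The model may have copied the input instead of translating it.
    if hyp.lower() == src.lower():
        warnings.append("output identical to source (possibly untranslated)")

    # Character-length ratio far outside the assumed bounds.
    ratio = len(hyp) / max(len(src), 1)
    if ratio < min_ratio or ratio > max_ratio:
        warnings.append(f"unusual length ratio ({ratio:.2f})")

    # The same whitespace-separated token four or more times in a row
    # is treated here as a sign of a decoding loop.
    tokens = hyp.split()
    run = 1
    for prev, cur in zip(tokens, tokens[1:]):
        run = run + 1 if cur == prev else 1
        if run >= 4:
            warnings.append("repeated-token loop")
            break

    return warnings


# Placeholder strings only; "aaaa bbbb ..." is not real Fon text.
for src, hyp in [("Bonjour tout le monde", "aaaa bbbb cccc dddd"), ("Bonjour", "")]:
    print(src, "->", flag_suspect_translation(src, hyp))
```

An empty list means none of these heuristics fired, which is not evidence that the translation is correct.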
+## How to Get Started with the Model
+```python
+from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
+# Load model and tokenizer
+model = MBartForConditionalGeneration.from_pretrained("NazifToure/mbart-french-fon-opus")
+tokenizer = MBart50TokenizerFast.from_pretrained("NazifToure/mbart-french-fon-opus")
+def translate_fr_to_fon(text):
+    # Tokenize input
+    inputs = tokenizer(text, return_tensors="pt", max_length=128, truncation=True, padding=True)
+    # Look up the token ID for the Fon target-language code.
+    # .get() returns None if the tokenizer has no "fon_XX" entry.
+    forced_bos = tokenizer.lang_code_to_id.get("fon_XX", None)
+    # Generate translation
+    outputs = model.generate(
+        **inputs,
+        forced_bos_token_id=forced_bos,
+        max_length=128,
+        num_beams=5,
+        early_stopping=True
+    )
+    return tokenizer.decode(outputs[0], skip_special_tokens=True)
+# Example usage
+french_text = "Bonjour, comment allez-vous ?"
+fon_translation = translate_fr_to_fon(french_text)
+print(f"French: {french_text}")
+print(f"Fon: {fon_translation}")