Model Card for mbart-french-fon-opus

This model is a fine-tuned mBart model for French-to-Fon translation, trained on parallel data drawn from the OPUS collection. It represents the first large-scale neural machine translation model for the French-Fon language pair.

Model Details

Model Description

This model enables translation from French to Fon (a language primarily spoken in Benin). It was fine-tuned from the multilingual mBart-50 model using over 586,000 parallel sentence pairs from multiple OPUS corpora, making it the largest dataset used for French-Fon translation to date.

  • Developed by: Nazif Toure
  • Model type: Sequence-to-sequence transformer (mBart)
  • Language(s) (NLP): French (fr), Fon (fon)
  • License: MIT
  • Finetuned from model: facebook/mbart-large-50-many-to-many-mmt

Uses

Direct Use

This model is intended for direct French-to-Fon translation tasks, including:

  • Document translation
  • Educational materials localization
  • Digital content accessibility for Fon speakers
  • Research in African language NLP

Downstream Use

The model can be integrated into:

  • Translation services and applications
  • Multilingual chatbots and virtual assistants
  • Language learning platforms
  • Cross-cultural communication tools

Out-of-Scope Use

This model is not suitable for:

  • Fon-to-French translation (unidirectional model)
  • Real-time simultaneous interpretation
  • Translation of highly specialized technical domains not represented in training data
  • Generation of creative content in Fon

Bias, Risks, and Limitations

The model may exhibit biases present in the training data, which primarily consists of religious texts (Jehovah's Witnesses publications) and web-scraped content. Performance may be limited on:

  • Contemporary slang or very recent terminology
  • Highly specialized technical vocabulary
  • Regional dialects of Fon not well-represented in training data

Recommendations

Users should be aware that translation quality may vary depending on text domain and should validate outputs, especially for official or sensitive communications.
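As a lightweight first pass before human review, one simple heuristic is to flag translations whose length is wildly out of proportion to the source text, which often signals truncation or degenerate repetition. The sketch below illustrates this idea; the helper name and the ratio thresholds are illustrative assumptions, not calibrated values for this model.

```python
def flag_suspicious_translation(source, translation, min_ratio=0.3, max_ratio=3.0):
    """Return True if the translation's character length relative to the
    source falls outside a plausible band (possible truncation or
    runaway repetition). Thresholds are illustrative, not calibrated."""
    if not translation.strip():
        return True  # empty output is always suspicious
    ratio = len(translation) / max(len(source), 1)
    return ratio < min_ratio or ratio > max_ratio
```

Flagged outputs are not necessarily wrong; they are simply candidates for manual validation, which remains essential for official or sensitive communications.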

How to Get Started with the Model

from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Load model and tokenizer
model = MBartForConditionalGeneration.from_pretrained("NazifToure/mbart-french-fon-opus")
tokenizer = MBart50TokenizerFast.from_pretrained("NazifToure/mbart-french-fon-opus")

def translate_fr_to_fon(text):
    # Set the source language so the tokenizer emits the French language token
    tokenizer.src_lang = "fr_XX"

    # Tokenize input
    inputs = tokenizer(text, return_tensors="pt", max_length=128, truncation=True, padding=True)
    
    # Look up the Fon language code; fall back to None if the
    # fine-tuned tokenizer does not define it
    forced_bos = tokenizer.lang_code_to_id.get("fon_XX", None)
    
    # Generate translation
    outputs = model.generate(
        **inputs, 
        forced_bos_token_id=forced_bos, 
        max_length=128, 
        num_beams=5,
        early_stopping=True
    )
    
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage
french_text = "Bonjour, comment allez-vous ?"
fon_translation = translate_fr_to_fon(french_text)
print(f"French: {french_text}")
print(f"Fon: {fon_translation}")
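The helper above truncates inputs at 128 tokens, so longer documents should be split into shorter pieces before translation. Below is a minimal sketch of a sentence-based chunker; the sentence-splitting regex and the 400-character budget are illustrative assumptions, not part of the model. Each chunk would then be passed to translate_fr_to_fon and the results rejoined.

```python
import re

def chunk_text(text, max_chars=400):
    """Split text into chunks of at most max_chars characters,
    breaking on sentence boundaries where possible (a sentence
    longer than max_chars is kept as a single oversized chunk)."""
    # Naive sentence split on ., !, ? followed by whitespace (illustrative only)
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

# Translate each chunk separately and rejoin:
# translated = " ".join(translate_fr_to_fon(c) for c in chunk_text(long_document))
```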
Model size: 611M parameters (F32, Safetensors)