Model Card for mbart-french-fon-opus

This model is a fine-tuned mBart model for French-to-Fon translation, trained on parallel data drawn from the OPUS collection. It represents the first large-scale neural machine translation model for the French-Fon language pair.

Model Details

Model Description

This model enables translation from French to Fon (a language primarily spoken in Benin). It was fine-tuned from the multilingual mBart-50 model using over 586,000 parallel sentence pairs from multiple OPUS corpora, making it the largest dataset used for French-Fon translation to date.

  • Developed by: Nazif Toure
  • Model type: Sequence-to-sequence transformer (mBart)
  • Language(s) (NLP): French (fr), Fon (fon)
  • License: MIT
  • Finetuned from model: facebook/mbart-large-50-many-to-many-mmt

Uses

Direct Use

This model is intended for direct French-to-Fon translation tasks, including:

  • Document translation
  • Educational materials localization
  • Digital content accessibility for Fon speakers
  • Research in African language NLP

Downstream Use

The model can be integrated into:

  • Translation services and applications
  • Multilingual chatbots and virtual assistants
  • Language learning platforms
  • Cross-cultural communication tools

Out-of-Scope Use

This model is not suitable for:

  • Fon-to-French translation (unidirectional model)
  • Real-time simultaneous interpretation
  • Translation of highly specialized technical domains not represented in training data
  • Generation of creative content in Fon

Bias, Risks, and Limitations

The model may exhibit biases present in the training data, which primarily consists of religious texts (Jehovah's Witnesses publications) and web-scraped content. Performance may be limited on:

  • Contemporary slang or very recent terminology
  • Highly specialized technical vocabulary
  • Regional dialects of Fon not well-represented in training data

Recommendations

Users should be aware that translation quality may vary depending on text domain and should validate outputs, especially for official or sensitive communications.
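As a lightweight first pass before human review, one simple heuristic is to flag translations whose length is wildly out of proportion to the source text, which often signals truncation or degenerate repetition. The sketch below illustrates this idea; the helper name and the ratio thresholds are illustrative assumptions, not calibrated values for this model.

```python
def flag_suspicious_translation(source, translation, min_ratio=0.3, max_ratio=3.0):
    """Return True if the translation's character length relative to the
    source falls outside a plausible band (possible truncation or
    runaway repetition). Thresholds are illustrative, not calibrated."""
    if not translation.strip():
        return True  # empty output is always suspicious
    ratio = len(translation) / max(len(source), 1)
    return ratio < min_ratio or ratio > max_ratio
```

Flagged outputs are not necessarily wrong; they are simply candidates for manual validation, which remains essential for official or sensitive communications.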

How to Get Started with the Model

from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Load model and tokenizer
model = MBartForConditionalGeneration.from_pretrained("NazifToure/mbart-french-fon-opus")
tokenizer = MBart50TokenizerFast.from_pretrained("NazifToure/mbart-french-fon-opus")

def translate_fr_to_fon(text):
    # Set the source language so the tokenizer emits the French language token
    tokenizer.src_lang = "fr_XX"

    # Tokenize input
    inputs = tokenizer(text, return_tensors="pt", max_length=128, truncation=True, padding=True)
    
    # Look up the Fon language code; fall back to None if the
    # fine-tuned tokenizer does not define it
    forced_bos = tokenizer.lang_code_to_id.get("fon_XX", None)
    
    # Generate translation
    outputs = model.generate(
        **inputs, 
        forced_bos_token_id=forced_bos, 
        max_length=128, 
        num_beams=5,
        early_stopping=True
    )
    
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage
french_text = "Bonjour, comment allez-vous ?"
fon_translation = translate_fr_to_fon(french_text)
print(f"French: {french_text}")
print(f"Fon: {fon_translation}")
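The helper above truncates inputs at 128 tokens, so longer documents should be split into shorter pieces before translation. Below is a minimal sketch of a sentence-based chunker; the sentence-splitting regex and the 400-character budget are illustrative assumptions, not part of the model. Each chunk would then be passed to translate_fr_to_fon and the results rejoined.

```python
import re

def chunk_text(text, max_chars=400):
    """Split text into chunks of at most max_chars characters,
    breaking on sentence boundaries where possible (a sentence
    longer than max_chars is kept as a single oversized chunk)."""
    # Naive sentence split on ., !, ? followed by whitespace (illustrative only)
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

# Translate each chunk separately and rejoin:
# translated = " ".join(translate_fr_to_fon(c) for c in chunk_text(long_document))
```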
Model size: 611M parameters (F32, Safetensors)