NazifToure committed on
Commit 9f742b8 · verified · 1 Parent(s): 2d6a305

Update README.md

Files changed (1)
  1. README.md +102 -3
README.md CHANGED
@@ -1,3 +1,102 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ language:
3
+ - fr
4
+ - fon
5
+ license: mit
6
+ tags:
7
+ - translation
8
+ - mbart
9
+ - french-fon
10
+ - opus
11
+ - african-languages
12
+ pipeline_tag: translation
13
+ ---
14
+
15
+ # Model Card for mbart-french-fon-opus
16
+
17
+ This is a fine-tuned mBART-50 model for French-to-Fon translation, trained on parallel corpora from OPUS. To the author's knowledge, it is the first large-scale neural machine translation model for the French-Fon language pair.
18
+
19
+ ## Model Details
20
+
21
+ ### Model Description
22
+
23
+ This model translates from French to Fon (a language primarily spoken in Benin). It was fine-tuned from the multilingual mBART-50 model on over 586,000 parallel sentence pairs drawn from multiple OPUS corpora, which the author reports as the largest dataset used for French-Fon translation to date.
24
+
25
+ - **Developed by:** Nazif Toure
26
+ - **Model type:** Sequence-to-sequence transformer (mBart)
27
+ - **Language(s) (NLP):** French (fr), Fon (fon)
28
+ - **License:** MIT
29
+ - **Finetuned from model:** facebook/mbart-large-50-many-to-many-mmt
30
+
31
+ ### Model Sources
32
+
33
+ - **Repository:** https://huggingface.co/NazifToure/mbart-french-fon-opus
34
+
35
+ ## Uses
36
+
37
+ ### Direct Use
38
+
39
+ This model is intended for direct French-to-Fon translation tasks, including:
40
+ - Document translation
41
+ - Educational materials localization
42
+ - Digital content accessibility for Fon speakers
43
+ - Research in African language NLP
44
+
45
+ ### Downstream Use
46
+
47
+ The model can be integrated into:
48
+ - Translation services and applications
49
+ - Multilingual chatbots and virtual assistants
50
+ - Language learning platforms
51
+ - Cross-cultural communication tools
52
+
53
+ ### Out-of-Scope Use
54
+
55
+ This model is not suitable for:
56
+ - Fon-to-French translation (unidirectional model)
57
+ - Real-time simultaneous interpretation
58
+ - Translation of highly specialized technical domains not represented in training data
59
+ - Generation of creative content in Fon
60
+
61
+ ## Bias, Risks, and Limitations
62
+
63
+ The model may exhibit biases present in the training data, which consists primarily of religious texts (Jehovah's Witnesses, or JW, materials) and web-scraped content. Performance may be limited on:
64
+ - Contemporary slang or very recent terminology
65
+ - Highly specialized technical vocabulary
66
+ - Regional dialects of Fon not well-represented in training data
67
+
68
+ ### Recommendations
69
+
70
+ Users should be aware that translation quality may vary depending on text domain and should validate outputs, especially for official or sensitive communications.
71
+
72
+ ## How to Get Started with the Model
73
+ ```python
74
+ from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
75
+
76
+ # Load model and tokenizer
77
+ model = MBartForConditionalGeneration.from_pretrained("NazifToure/mbart-french-fon-opus")
78
+ tokenizer = MBart50TokenizerFast.from_pretrained("NazifToure/mbart-french-fon-opus")
79
+
80
+ def translate_fr_to_fon(text):
81
+     # Tokenize input (inputs longer than 128 tokens are truncated)
82
+     inputs = tokenizer(text, return_tensors="pt", max_length=128, truncation=True, padding=True)
83
+
84
+     # Look up the Fon language code (None if the tokenizer has no "fon_XX" entry)
85
+     forced_bos = tokenizer.lang_code_to_id.get("fon_XX", None)
86
+
87
+     # Generate translation with beam search
88
+     outputs = model.generate(
89
+         **inputs,
90
+         forced_bos_token_id=forced_bos,
91
+         max_length=128,
92
+         num_beams=5,
93
+         early_stopping=True
94
+     )
95
+
96
+     return tokenizer.decode(outputs[0], skip_special_tokens=True)
97
+
98
+ # Example usage
99
+ french_text = "Bonjour, comment allez-vous ?"
100
+ fon_translation = translate_fr_to_fon(french_text)
101
+ print(f"French: {french_text}")
102
+ print(f"Fon: {fon_translation}")
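
Note on the usage example in this diff: it tokenizes with `max_length=128, truncation=True`, so anything past 128 tokens is silently dropped. For the "Document translation" use case listed in the card, longer text would likely need to be split first. The following is a minimal sketch, not part of the repository. The helper names `split_into_sentences` and `translate_document` are hypothetical, and the punctuation-based splitter is an assumption rather than the author's method.

```python
import re
from typing import Callable, List

# Hypothetical helpers, not part of the repository.
# Split after sentence-ending punctuation followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?…])\s+")


def split_into_sentences(text: str) -> List[str]:
    """Split on whitespace that follows ., !, ? or … and drop empty pieces."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def translate_document(text: str, translate: Callable[[str], str]) -> str:
    """Translate sentence by sentence and join the results with spaces."""
    return " ".join(translate(s) for s in split_into_sentences(text))
```

With the function from the diff, usage would look like `translate_document(long_french_text, translate_fr_to_fon)`. A single sentence can still exceed 128 tokens, so this only reduces truncation and does not rule it out.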