🕌 English → Moroccan Darija Translator

This repository provides a machine translation model for translating English into Moroccan Darija (الدارجة المغربية).
The model is fine-tuned to handle conversational, cultural, and everyday expressions, producing natural Moroccan Darija output.

🚀 Model Details

Model ID: oddadmix/English-Moroccan-Darija-v1
Framework: Hugging Face transformers
Task: English → Moroccan Darija translation
Language Pair: English → Moroccan Darija
Context Window: 32K tokens

📖 Usage

Install the required libraries:

pip install transformers torch

Run the translation:

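from transformers import pipeline

# Load the model as a chat-style text-generation pipeline.
model_id = "oddadmix/English-Moroccan-Darija-v1"
translate = pipeline("text-generation", model=model_id)

# The system prompt requests the translation; the user message is the English source.
messages = [
    {"role": "system", "content": "Translate to Moroccan Darija"},
    {"role": "user", "content": "How are you today?"}
]

translation = translate(
    messages,
    max_new_tokens=8000,  # upper bound on the length of the generated output
    do_sample=True,
    temperature=0.3,
    min_p=0.15,
    repetition_penalty=1.05
)

# With chat input, generated_text holds the full conversation;
# the last message is the model's reply.
print(translation[0]["generated_text"][-1]["content"])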

Example Output:

كيف داير اليوم؟

⚠️ Important Note: The system prompt ({"role": "system", "content": "Translate to Moroccan Darija"}) is crucial.
Without it, translation quality is likely to drop.
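
Because the system prompt is easy to forget, one option is to build the message list through a small helper. The sketch below is only an illustration: build_messages and SYSTEM_PROMPT are hypothetical names introduced here, not part of the model or of transformers.

```python
# Hypothetical helper (not shipped with the model): always prepends
# the system prompt given in this README.
SYSTEM_PROMPT = "Translate to Moroccan Darija"


def build_messages(text: str) -> list[dict[str, str]]:
    """Wrap English text in the chat format shown in the usage example."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": text},
    ]
```

The result can be passed to the pipeline in place of a hand-written messages list, for example translate(build_messages("Where is the train station?"), max_new_tokens=256).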

📊 Benchmark Results

The model was evaluated against several strong general-purpose LLMs on English → Moroccan Darija translation.
The test set consists of 300 sentences translated manually by a Moroccan translator.

🧾 Evaluation Dataset Coverage

The dataset spans diverse domains, ensuring wide coverage:

Daily Life & Family: greetings, weather, school, transportation, family meals.
Food & Cooking: couscous, tagines, vegetables, desserts, cooking instructions.
Travel & Geography: Marrakech, Tangier, Casablanca, Rabat, Agadir, public transport.
Work & Business: meetings, HR, finance, reports, management.
Politics & Government: parliament debates, policies, laws, elections.
Arts & Culture: music, painting, poetry, theater, sculpture.
Education & Health: doctors, hospitals, lessons, assignments, public health.

This breadth is intended to make the benchmark representative of real-world translation scenarios.

Model	BLEU	METEOR	chrF	Task
Claude-Sonnet-4	0.312	0.566	62.09	English → Moroccan Darija
GPT-5-mini	0.381	0.637	66.58	English → Moroccan Darija
GPT-5	0.284	0.551	61.73	English → Moroccan Darija
GPT-4.1	0.306	0.575	61.87	English → Moroccan Darija
oddadmix/English-Moroccan-Darija-v1	0.423	0.644	67.31	English → Moroccan Darija

➡️ On this benchmark, our model scores highest on all three metrics (BLEU 0.423, METEOR 0.644, chrF 67.31) while delivering specialized Moroccan Darija output across a wide variety of contexts.
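
This README does not say which tool produced the scores above. To give a feel for what chrF measures, here is a simplified character n-gram F-score in plain Python. It is a sketch under stated assumptions (whitespace is dropped, n-gram orders 1 to 6, beta = 2, per-order precision and recall are averaged) and will not reproduce the numbers in the table or match a reference implementation exactly; simple_chrf and char_ngrams are names made up for this example.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring spaces (a simplifying assumption)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF-style score on a 0-100 scale (illustrative only)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta = 2 weights recall more heavily than precision.
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)
```

A hypothesis identical to its reference scores 100.0, and one that shares no characters with it scores 0.0. Because the score works on characters rather than whole words, it is more forgiving of small spelling differences than word-level metrics such as BLEU.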

🌍 Applications

Translating English educational material into Moroccan Darija.
Supporting Moroccan dialect localization for chatbots, apps, and websites.
Preserving cultural nuances in translations rather than producing literal Modern Standard Arabic (MSA) renderings.

⚠️ Notes

Output is optimized for natural conversational Moroccan Darija, not Modern Standard Arabic (MSA).
Since Darija is a primarily spoken dialect, spelling conventions may vary slightly.
The 32K-token context window leaves room for long documents and multi-turn conversations.
Always include the system prompt to unlock the model’s best performance.

🔎 Limitations & Future Work

Spelling Variations: Moroccan Darija lacks standardized spelling. The model may generate slight differences (e.g., "بزاف" vs "بزّاف").
Code-Switching: Common Darija usage mixes in French and occasionally Spanish. The model currently prioritizes pure Darija but may benefit from code-switching support.
Niche Domains: Performance may vary for highly technical or domain-specific text. Future fine-tuning on specialized datasets could improve this.
Evaluation Scope: Current evaluation is based on 300 manually translated sentences across diverse fields. Expanding the dataset will strengthen benchmarks further.
Future Improvements:
- Add multilingual code-switching training.
- Expand context-specific datasets (education, health, e-commerce).
- Release an interactive demo/Colab notebook for easier testing.
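
The spelling variation noted above ("بزاف" vs "بزّاف") matters when comparing model output against a reference string. One possible workaround, sketched here as an assumption and not as part of the model or its evaluation, is to strip Arabic diacritics (U+064B to U+0652, which includes the shadda) from both strings before comparing. strip_tashkeel is a hypothetical helper name.

```python
# Arabic diacritics (tashkeel) from fathatan (U+064B) to sukun (U+0652);
# the shadda in "بزّاف" is U+0651.
TASHKEEL = {chr(code) for code in range(0x064B, 0x0652 + 1)}


def strip_tashkeel(text: str) -> str:
    """Remove Arabic diacritic marks so spelling variants compare equal."""
    return "".join(ch for ch in text if ch not in TASHKEEL)


with_shadda = "\u0628\u0632\u0651\u0627\u0641"   # بزّاف
without_shadda = "\u0628\u0632\u0627\u0641"      # بزاف
print(strip_tashkeel(with_shadda) == without_shadda)
```

This only handles diacritics. Other kinds of variation, such as alternative letter choices for the same sound, would need separate handling.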

📬 Contact

For feedback, contributions, or collaborations, please open an issue or reach out.
