English → Moroccan Darija Translator
This repository provides a machine translation model for translating English into Moroccan Darija (الدارجة المغربية).
The model is fine-tuned to handle conversational, cultural, and everyday expressions, producing natural Moroccan Darija output.
Model Details
- Model ID: oddadmix/English-Moroccan-Darija-v1
- Framework: Hugging Face transformers
- Task: English → Moroccan Darija translation
- Language Pair: English → Moroccan Darija
- Context Window: 32K tokens
Usage
Install the required libraries:
pip install transformers torch
Run the translation:
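from transformers import pipeline

model_id = "oddadmix/English-Moroccan-Darija-v1"
translate = pipeline("text-generation", model=model_id)

# The system prompt is required for good results (see the note below).
messages = [
    {"role": "system", "content": "Translate to Moroccan Darija"},
    {"role": "user", "content": "How are you today?"},
]

translation = translate(
    messages,
    max_new_tokens=8000,
    do_sample=True,
    temperature=0.3,
    min_p=0.15,
    repetition_penalty=1.05,
)

# For chat-style input the pipeline returns the whole conversation;
# the last message is the assistant's translation.
print(translation[0]["generated_text"][-1]["content"])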
Example Output:
كيف داير اليوم؟
Important Note: The system prompt ({"role": "system", "content": "Translate to Moroccan Darija"}) is crucial.
Without it, the model will not perform at its best.
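Because the system prompt has to accompany every request, one option is to build the message list in a small helper so it cannot be left out by accident. The sketch below is only illustrative: `build_messages` and `SYSTEM_PROMPT` are hypothetical names, not part of the model or of `transformers`.

```python
# Hypothetical helper (not part of transformers or this repository).
SYSTEM_PROMPT = "Translate to Moroccan Darija"


def build_messages(text: str) -> list[dict[str, str]]:
    """Return a chat-style message list that always starts with the system prompt."""
    if not text.strip():
        raise ValueError("text to translate must not be empty")
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": text},
    ]


messages = build_messages("How are you today?")
```

The resulting list can be passed to the `translate` pipeline from the usage example in place of the hand-written `messages`.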
Benchmark Results
The model has been evaluated against other strong LLMs on the English → Moroccan Darija task as a proxy benchmark.
For Moroccan Darija evaluation, a dataset of 300 sentences manually translated by a Moroccan translator was used.
Evaluation Dataset Coverage
The dataset spans diverse domains, ensuring wide coverage:
- Daily Life & Family: greetings, weather, school, transportation, family meals.
- Food & Cooking: couscous, tagines, vegetables, desserts, cooking instructions.
- Travel & Geography: Marrakech, Tangier, Casablanca, Rabat, Agadir, public transport.
- Work & Business: meetings, HR, finance, reports, management.
- Politics & Government: parliament debates, policies, laws, elections.
- Arts & Culture: music, painting, poetry, theater, sculpture.
- Education & Health: doctors, hospitals, lessons, assignments, public health.
This diversity is meant to make the benchmark representative of real-world translation scenarios.
Model | BLEU | METEOR | chrF | Task |
---|---|---|---|---|
Claude-Sonnet-4 | 0.312 | 0.566 | 62.09 | English → Moroccan Darija |
GPT-5-mini | 0.381 | 0.637 | 66.58 | English → Moroccan Darija |
GPT-5 | 0.284 | 0.551 | 61.73 | English → Moroccan Darija |
GPT-4.1 | 0.306 | 0.575 | 61.87 | English → Moroccan Darija |
oddadmix/English-Moroccan-Darija-v1 | 0.423 | 0.644 | 67.31 | English → Moroccan Darija |
Our model scores highest on all three metrics in this comparison (BLEU 0.423, METEOR 0.644, chrF 67.31) while delivering specialized Moroccan Darija output across a wide variety of contexts.
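For readers unfamiliar with chrF, the following is a simplified sketch of a character n-gram F-score in the same spirit. It is not the evaluation script behind the table, and this README does not say which toolkit produced those numbers. The function names are hypothetical, and the choices of n-grams up to length 6 and beta = 2 are assumptions based on common toolkit defaults, so scores from a real toolkit will differ.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring whitespace (a simplification)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF-style score in [0, 100]; not identical to toolkit output."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip n-gram orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta with beta > 1 weights recall more heavily than precision.
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)
```

Because it works on characters rather than whole words, a metric of this kind gives partial credit to near-miss spellings, which is useful for a language without standardized orthography.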
Applications
- Translating English educational material into Moroccan Darija.
- Supporting Moroccan dialect localization for chatbots, apps, and websites.
- Preserving cultural nuances in translations (not just MSA literalism).
Notes
- Output is optimized for natural conversational Moroccan Darija, not Modern Standard Arabic (MSA).
- Since Darija is a primarily spoken dialect, spelling conventions may vary slightly.
- The 32K-token context window leaves room for long documents and multi-turn conversations.
- Always include the system prompt to unlock the model's best performance.
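As a rough illustration of the context-window note, the sketch below estimates how many tokens remain for the prompt once a generation budget is reserved. It assumes that "32K" means 32 × 1024 tokens and that prompt and generated tokens share one window, as is usual for causal language models; the README states neither explicitly, and `max_prompt_tokens` is a hypothetical helper.

```python
# Assumption: "32K" is read as 32 * 1024 tokens; the exact figure is not given here.
CONTEXT_WINDOW = 32 * 1024


def max_prompt_tokens(max_new_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for the prompt once the generation budget is reserved."""
    if not 0 <= max_new_tokens <= context_window:
        raise ValueError("max_new_tokens must be between 0 and the context window")
    return context_window - max_new_tokens


# 8000 is the max_new_tokens value used in the usage example.
budget = max_prompt_tokens(8000)
```

For short inputs a much smaller `max_new_tokens` is usually enough, which leaves more of the window for the source text.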
Limitations & Future Work
- Spelling Variations: Moroccan Darija lacks standardized spelling. The model may generate slight differences (e.g., "بزاف" vs "بزّاف").
- Code-Switching: Common Darija usage mixes in French and occasionally Spanish. The model currently prioritizes pure Darija but may benefit from code-switching support.
- Niche Domains: Performance may vary for highly technical or domain-specific text. Future fine-tuning on specialized datasets could improve this.
- Evaluation Scope: Current evaluation is based on 300 manually translated sentences across diverse fields. Expanding the dataset will strengthen benchmarks further.
- Future Improvements:
  - Add multilingual code-switching training.
  - Expand context-specific datasets (education, health, e-commerce).
  - Release an interactive demo/Colab notebook for easier testing.
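Regarding the spelling-variation limitation, one way to compare outputs that differ only in diacritics is to normalize them first. The sketch below is a minimal example under that assumption: `normalize_darija` is a hypothetical helper that drops Unicode combining marks (such as shadda and short vowels) and the tatweel elongation character. It does not handle other kinds of variation, such as different letter choices for the same sound.

```python
import unicodedata

TATWEEL = "\u0640"  # Arabic elongation character


def normalize_darija(text: str) -> str:
    """Drop combining marks (e.g. shadda, short vowels) and tatweel."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch) != "Mn" and ch != TATWEEL
    )


# "bzzaf" written with a shadda (U+0651) and without it.
with_shadda = "\u0628\u0632\u0651\u0627\u0641"
plain = "\u0628\u0632\u0627\u0641"
same_after_normalizing = normalize_darija(with_shadda) == normalize_darija(plain)
```

Applying the same normalization to both the model output and the reference keeps string comparisons from penalizing purely orthographic differences.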
Contact
For feedback, contributions, or collaborations, please open an issue or reach out.