---
library_name: transformers
license: cc-by-nc-4.0
base_model: facebook/nllb-200-3.3B
metrics:
- bleu
- chrf
- ter
model-index:
- name: Terjman-Supreme-v2.0
  results: []
datasets:
- BounharAbdelaziz/Terjman-v2-English-Darija-Dataset-350K
language:
- ary
- en
pipeline_tag: translation
---

# 🇲🇦 Terjman-Supreme-v2.0 (3.3B) 🚀  

**Terjman-Supreme-v2.0** is an improved version of [atlasia/Terjman-Ultra-v1](https://huggingface.co/atlasia/Terjman-Ultra-v1), built on the Transformer architecture and fine-tuned for **high-quality, accurate translations**.  

This version is still based on [facebook/nllb-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B) but has been trained on a **larger and more refined dataset**, leading to improved translation performance. On [TerjamaBench](https://huggingface.co/datasets/atlasia/TerjamaBench), an evaluation benchmark for English-Moroccan Darija translation that puts particular emphasis on cultural aspects, the model's results **approach those of gpt-4o-2024-08-06**.  


## 🚀 Features  

✅ **Fine-tuned for English->Moroccan Darija translation**.  
✅ **Strong performance among open-source models**, second only to Terjman-Ultra-v2.0 on TerjamaBench.  
✅ **Compatible with 🤗 Transformers** and easily deployable on various hardware setups.  


## 🔥 Performance Comparison  

The following table compares **Terjman-Supreme-v2.0** against proprietary and open-source models using BLEU, chrF, and TER scores. Higher **BLEU/chrF** and lower **TER** indicate better translation quality.  

| **Model** | **Size** | **BLEU↑** | **chrF↑** | **TER↓** |  
|------------|------|-------|-------|------|  
| **Proprietary Models** |  |  |  |  |  
| gemini-exp-1206 | * | **30.69** | **54.16** | 67.62 |  
| claude-3-5-sonnet-20241022 | * | 30.51 | 51.80 | **67.42** |  
| gpt-4o-2024-08-06 | * | 28.30 | 50.13 | 71.77 |  
| **Open-Source Models** |  |  |  |  |  
| Terjman-Ultra-v2.0 | 1.3B | **25.00** | **44.70** | **77.20** |  
| **Terjman-Supreme-v2.0 (This model)**  | 3.3B | 23.43 | 44.57 | 78.17 |  
| Terjman-Large-v2.0 | 240M | 22.67 | 42.57 | 83.00 |  
| Terjman-Nano-v2.0 | 77M | 18.84 | 38.41 | 94.73 |  
| atlasia/Terjman-Large-v1.2 | 240M | 16.33 | 37.10 | 89.13 |  
| MBZUAI-Paris/Atlas-Chat-9B | 9B | 14.80 | 35.26 | 93.95 |  
| facebook/nllb-200-3.3B | 3.3B | 14.76 | 34.17 | 94.33 |  
| atlasia/Terjman-Nano | 77M | 9.98 | 26.55 | 106.49 |  
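
TER values above 100, as in the last row of the table, are possible because TER divides the number of word-level edits by the length of the reference rather than of the system output. The sketch below illustrates this with a simplified variant: it counts only insertions, deletions, and substitutions and omits the block-shift operation of full TER, so it is not the scorer used for the table. The function names are illustrative.

```python
def word_edit_distance(hyp_tokens, ref_tokens):
    """Word-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(ref_tokens) + 1))
    for i, hyp_word in enumerate(hyp_tokens, start=1):
        curr = [i]
        for j, ref_word in enumerate(ref_tokens, start=1):
            cost = 0 if hyp_word == ref_word else 1
            curr.append(min(prev[j] + 1,          # drop a hypothesis word
                            curr[j - 1] + 1,      # add a missing reference word
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def simplified_ter(hypothesis, reference):
    """Edits per reference word, in percent. Simplification: no shift edits."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return 100.0 * word_edit_distance(hyp, ref) / len(ref)


# An exact match costs nothing.
print(simplified_ter("the cat sat on the mat", "the cat sat on the mat"))  # 0.0
# Three extra words against a two-word reference: 3 edits / 2 words.
print(simplified_ter("a b c d e", "a b"))  # 150.0
```

A system that over-generates relative to short references is therefore penalized heavily, which is one way a score such as 106.49 can arise.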


## 🔬 Model Details  

- **Base Model**: [facebook/nllb-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B)  
- **Architecture**: Transformer-based sequence-to-sequence model  
- **Training Data**: English-Darija parallel corpus with high-quality translations (`BounharAbdelaziz/Terjman-v2-English-Darija-Dataset-350K`)  
- **Training Precision**: Mixed FP16 for efficient training  

## 🚀 How to Use  

You can use the model with the **Hugging Face Transformers** library:  

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "BounharAbdelaziz/Terjman-Supreme-v2.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def translate(text, src_lang="eng_Latn", tgt_lang="ary_Arab"):
    # The NLLB tokenizer takes the source language from its `src_lang` attribute
    tokenizer.src_lang = src_lang
    inputs = tokenizer(text, return_tensors="pt")
    # Force the decoder to start with the target-language token
    output = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example translation
text = "Hello there! Today the weather is so nice in Geneva, couldn't ask for more to enjoy the holidays :)"
translation = translate(text)
print("Translation:", translation)
# Example output: صباح الخير! اليوم الطقس زوين بزاف فجنيف، ما قدرتش نطلب أكثر باش نتمتع بالعطلة :)
```
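
When translating many sentences, padding them into batches is usually faster than calling `generate` once per sentence. The following is a minimal sketch that expects the `model` and `tokenizer` objects loaded above to be passed in. The helper names `batched` and `translate_batch`, the batch size of 8, and `max_new_tokens=256` are illustrative assumptions, not settings taken from this model's training or evaluation.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def translate_batch(model, tokenizer, texts, src_lang="eng_Latn",
                    tgt_lang="ary_Arab", batch_size=8, max_new_tokens=256):
    """Translate a list of strings batch by batch (hypothetical helper)."""
    tokenizer.src_lang = src_lang
    tgt_id = tokenizer.convert_tokens_to_ids(tgt_lang)
    translations = []
    for chunk in batched(texts, batch_size):
        # Pad to the longest sentence in the batch so tensors are rectangular
        inputs = tokenizer(chunk, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**inputs, forced_bos_token_id=tgt_id,
                                 max_new_tokens=max_new_tokens)
        translations.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return translations

# Usage, with `model` and `tokenizer` loaded as in the snippet above:
# translate_batch(model, tokenizer, ["Hello there!", "How are you?"])
```

A smaller batch size may be needed on memory-constrained hardware, since the 3.3B checkpoint already takes up a large share of memory before any inputs are processed.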


## 🖥️ Deployment  

### Run in a Hugging Face Space
Try the model interactively in the [Terjman-Ultra Space](https://huggingface.co/spaces/BounharAbdelaziz/Terjman-Ultra-v2.0) 🤗  

### Use with Text Generation Inference (TGI) 
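For fast inference, you can try **Hugging Face TGI**. The `text-generation-launcher` binary ships with the TGI server itself (for example, in its official Docker image); `pip install text-generation` installs only the Python client.  

```bash
# Python client for querying a running TGI server
pip install text-generation
# Requires a TGI server installation that supports this model's architecture
text-generation-launcher --model-id BounharAbdelaziz/Terjman-Supreme-v2.0
```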

### Run Locally with Transformers & PyTorch
```bash
pip install transformers torch
python -c "from transformers import pipeline; print(pipeline('translation', model='BounharAbdelaziz/Terjman-Supreme-v2.0', src_lang='eng_Latn', tgt_lang='ary_Arab')('Hello there!'))"
```

### Deploy on an API Server
Use **FastAPI** to serve translations as an API:  

```python
from fastapi import FastAPI
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

app = FastAPI()
model_name = "BounharAbdelaziz/Terjman-Supreme-v2.0"
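tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
# Token ID that makes the decoder start in Moroccan Darija (Arabic script)
tgt_lang_id = tokenizer.convert_tokens_to_ids("ary_Arab")

@app.get("/translate/")
def translate(text: str):
    # Tokenize the English input and generate the Darija translation
    inputs = tokenizer(text, return_tensors="pt")
    output = model.generate(**inputs, forced_bos_token_id=tgt_lang_id)
    return {"translation": tokenizer.decode(output[0], skip_special_tokens=True)}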
```
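
Once the API server is running, a client can query the `/translate/` route with the text as a URL-encoded query parameter. The sketch below uses only the Python standard library and assumes the app is reachable at `http://localhost:8000` (a hypothetical address; adjust it to wherever you serve the app). The helper names are illustrative.

```python
import json
import urllib.parse
import urllib.request


def build_translate_url(base_url, text):
    """Build the GET URL for the /translate/ route, URL-encoding the text."""
    query = urllib.parse.urlencode({"text": text})
    return f"{base_url.rstrip('/')}/translate/?{query}"


def request_translation(base_url, text, timeout=60):
    """Call the running server and return the 'translation' field of its JSON reply."""
    url = build_translate_url(base_url, text)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["translation"]


# Only builds the URL; no request is sent here.
print(build_translate_url("http://localhost:8000", "Hello there!"))
# http://localhost:8000/translate/?text=Hello+there%21
```

With the server up, `request_translation("http://localhost:8000", "Hello there!")` would return the translated string.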


## 🛠️ Training Details & Hyperparameters  

The model was fine-tuned using the following training settings:  

- **Learning Rate**: `0.0005`  
- **Training Batch Size**: `1`  
- **Evaluation Batch Size**: `1`  
- **Seed**: `42`  
- **Gradient Accumulation Steps**: `64`  
- **Total Effective Batch Size**: `64`  
- **Optimizer**: `AdamW (Torch)` with `betas=(0.9,0.999)`, `epsilon=1e-08`  
- **Learning Rate Scheduler**: `Linear`  
- **Warmup Ratio**: `0.1`  
- **Epochs**: `3`  
- **Precision**: `Mixed FP16` for efficient training
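
To make the settings above concrete, the sketch below reproduces the effective batch size arithmetic and evaluates a linear schedule with warmup at a few steps. It is an illustration, not the training code: the single-device setup, the total of 1000 optimizer steps, and the rounding of the warmup length are assumptions, since the card does not state them.

```python
import math

PER_DEVICE_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 64
NUM_DEVICES = 1  # assumption: the number of devices is not stated in the card

# Samples contributing to each optimizer update
effective_batch_size = PER_DEVICE_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES


def linear_warmup_decay_lr(step, total_steps, peak_lr=5e-4, warmup_ratio=0.1):
    """Learning rate at `step`: linear ramp to `peak_lr`, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # rounding is an assumption
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 1000  # hypothetical; depends on dataset size and number of epochs

print("effective batch size:", effective_batch_size)
for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_decay_lr(step, TOTAL_STEPS))
```

Under these assumptions the learning rate peaks at `0.0005` once the first 10% of steps have passed and falls back to zero at the final step.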

## 📜 License  

This model is released under the **CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial)** license: it can be used for research and personal projects, but not for commercial purposes. For commercial use, please get in touch :)

### Framework versions

- Transformers 4.47.1
- Pytorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.21.0

## 📚 Citation  

```bibtex
@misc{terjman-v2,
  title = {Terjman-v2: High-Quality English-Moroccan Darija Translation Model},
  author={Abdelaziz Bounhar},
  year={2025},
  howpublished = {\url{https://huggingface.co/BounharAbdelaziz/Terjman-Supreme-v2.0}},
  license = {CC BY-NC}
}
```