---
datasets:
  - atlasia/darija_english
---

# Darija-English Translator

This repository provides a LoRA adapter for Qwen/Qwen2.5-1.5B-Instruct, fine-tuned on the darija_finetune_train dataset. It is designed to translate text from Moroccan Darija (a dialect of Arabic) to English.

## Model Details

- Library: PEFT (the checkpoint is a LoRA adapter; see the loading sketch after this list)
- License: Apache 2.0
- Base Model: Qwen/Qwen2.5-1.5B-Instruct
- Tags: llama-factory, lora, generated_from_trainer
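
Because the checkpoint is a PEFT LoRA adapter rather than a full model, it can also be loaded in a single call with peft's AutoPeftModelForCausalLM, which reads the base model id from the adapter's config. This is a minimal sketch; the section below shows the load_adapter route used by this card.

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Sketch: peft resolves the base model from the adapter config stored in the repo,
# loads it, and attaches the LoRA weights in one call.
model = AutoPeftModelForCausalLM.from_pretrained(
    "ELhadratiOth/darija-english-translater",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
```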

## How to Use

You can load and use the model with the transformers library; the peft package must also be installed, since the fine-tuned weights are a LoRA adapter:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Define model and tokenizer identifiers
base_model_id = "Qwen/Qwen2.5-1.5B-Instruct"
finetuned_model_id = "ELhadratiOth/darija-english-translater"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the base model
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype="auto",
)

# Load the fine-tuned LoRA adapter (requires the peft package)
model.load_adapter(finetuned_model_id)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

def translate_darija(text):
    messages = [
        {"role": "system", "content": "You are a professional NLP data parser. Follow the provided task and output scheme for consistency."},
        {"role": "user", "content": f"## Task:\n{text}\n\n## English Translation:"},
    ]

    # Build the chat prompt and tokenize it
    text_input = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    model_inputs = tokenizer([text_input], return_tensors="pt").to(device)

    # Greedy decoding; with do_sample=False no sampling temperature is used
    generated_ids = model.generate(**model_inputs, max_new_tokens=1024, do_sample=False)

    # Keep only the newly generated tokens (drop the prompt) before decoding
    generated_ids = [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    translation = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

    return translation

# Example usage
query = "Your Darija text here"
response = translate_darija(query)
print(response)
```
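
If you want to serve the model without a peft dependency at inference time, the LoRA weights can be merged into the base model and saved as a standalone checkpoint. This is a hedged sketch rather than part of the original card; the output directory name is illustrative.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, attach the adapter, and fold the LoRA weights into the base.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct", torch_dtype="auto")
merged = PeftModel.from_pretrained(base, "ELhadratiOth/darija-english-translater").merge_and_unload()

# Save the merged model and tokenizer to an illustrative local directory.
merged.save_pretrained("darija-english-translater-merged")
AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct").save_pretrained("darija-english-translater-merged")
```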

## Training Details

### Hyperparameters

The settings below correspond to the transformers TrainingArguments sketch shown after this list.

- Learning Rate: 0.0001
- Batch Size (per device):
  - Train: 1
  - Eval: 1
- Seed: 42
- Distributed Training: multi-GPU
- Number of Devices: 2
- Gradient Accumulation Steps: 4
- Total Train Batch Size: 8
- Total Eval Batch Size: 2
- Optimizer: AdamW (betas=(0.9, 0.999), epsilon=1e-08)
- LR Scheduler: cosine
- Warmup Ratio: 0.1
- Epochs: 10
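
For reference, the hyperparameters above map roughly onto the following transformers TrainingArguments. This is an illustrative sketch only: the run itself was configured through LLaMA-Factory, and the output directory is a placeholder. The effective train batch size works out as 1 per device × 2 GPUs × 4 gradient-accumulation steps = 8.

```python
from transformers import TrainingArguments

# Approximate reconstruction of the hyperparameters listed above (illustrative only).
training_args = TrainingArguments(
    output_dir="darija-english-translater-lora",  # placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,  # 1 per device x 2 GPUs x 4 steps = 8 effective
    num_train_epochs=10,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```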

### Framework Versions

- PEFT: 0.12.0
- Transformers: 4.49.0
- PyTorch: 2.5.1+cu121
- Datasets: 3.2.0
- Tokenizers: 0.21.0