This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct on the News dataset. It achieves the following results on the evaluation set:
- Loss: 0.2032
## Model description
This fine-tuned Qwen2.5-1.5B-Instruct model automatically extracts and summarizes key information from Arabic text inputs, such as news articles, and generates structured JSON-like outputs.
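The card does not specify the exact prompt format used for extraction. As a rough illustration, a request for structured output might be built as follows; the field names mirror the outputs described under "Training procedure", but the instruction wording itself is an assumption, not the card's actual prompt:

```python
# Hypothetical prompt builder. The field names follow the pseudo-label
# schema described in this card; the instruction text is illustrative only.
FIELDS = ["News_title", "Keywords", "Summary", "Category", "Entities"]

def build_extraction_prompt(arabic_text: str) -> str:
    """Ask the model for a single JSON object with the expected fields."""
    schema = ", ".join(f'"{f}"' for f in FIELDS)
    return (
        "Extract the following fields from the Arabic news article below "
        f"and return them as a single JSON object with keys {schema}.\n\n"
        f"Article:\n{arabic_text}"
    )

prompt = build_extraction_prompt("نص خبري عربي ...")
```

The prompt string would then be passed to the fine-tuned model through the usual chat template for Qwen2.5-Instruct.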
## Training and evaluation data
- Fine-tuning dataset: 2001 samples of Arabic technology-related text, used to adapt the model for structured extraction tasks.
- Evaluation dataset: 100 samples of Arabic sports-related text, used to assess performance on a different domain.
### Average similarity scores on the evaluation data
- This similarity measure is applied to a single field of the output JSON, the "News_title", and the evaluation data comes from a different domain than the one the model was fine-tuned on.
- Mean similarity: 0.6872
- Note: this is a solid score considering that the model was fine-tuned on technology data while all of the test data here is sports-related; evaluating on data closer to the technology domain would likely yield higher accuracy.
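The card does not name the similarity metric used on "News_title". A minimal sketch of the averaging step, using `difflib`'s character-level ratio purely as a stand-in for whatever per-title metric was actually applied (which may well be embedding-based), with hypothetical example titles:

```python
from difflib import SequenceMatcher

def mean_title_similarity(predicted, reference):
    """Average a per-pair similarity score over the evaluation set.

    SequenceMatcher.ratio() is a stand-in metric; this card does not
    state which similarity function was actually used.
    """
    scores = [
        SequenceMatcher(None, p, r).ratio()
        for p, r in zip(predicted, reference)
    ]
    return sum(scores) / len(scores)

# Toy example with hypothetical predicted/reference titles:
preds = ["الفريق يفوز بالمباراة", "اللاعب ينتقل إلى نادٍ جديد"]
refs = ["الفريق يفوز في المباراة", "اللاعب ينتقل لنادٍ جديد"]
score = mean_title_similarity(preds, refs)  # a value between 0.0 and 1.0
```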
## Training procedure
Since the dataset was unlabeled, Llama 4 Scout was employed as the teacher model in a knowledge distillation framework, generating pseudo-labels that guided the training of Qwen2.5-1.5B-Instruct (the student model). Knowledge distillation transfers knowledge from a larger, more capable model (Llama 4 Scout) to a smaller, more efficient one (Qwen2.5-1.5B-Instruct).
Role of Llama 4 Scout:
- Teacher model: Llama 4 Scout, a powerful language model, processed the 2001 unlabeled technology samples and generated high-quality structured outputs (e.g., pseudo-labels for story titles, keywords, summaries, categories, and entities).
- Output generation: for each input text, Llama 4 Scout produced:
  - Story title: a concise headline summarizing the main event.
  - Keywords: relevant terms extracted based on contextual understanding.
  - Summary: a set of key sentences or abstractive summary points.
  - Category: a predicted category (e.g., "technology" for the training data).
  - Entities: identified entities with their types (e.g., person, organization), using the teacher's advanced NER capabilities.
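Concretely, each pseudo-labeled training record could look like the following sketch. The JSON key names are an assumption based on the fields listed above; only "News_title" is explicitly confirmed elsewhere in this card:

```python
import json

# Hypothetical pseudo-label record produced by the teacher model.
# Key names are illustrative; the card only confirms "News_title".
record = {
    "input": "نص المقال التقني الأصلي ...",
    "output": {
        "News_title": "عنوان موجز للحدث الرئيسي",
        "Keywords": ["تقنية", "ذكاء اصطناعي"],
        "Summary": ["نقطة تلخيصية أولى.", "نقطة تلخيصية ثانية."],
        "Category": "technology",
        "Entities": [{"text": "OpenAI", "type": "organization"}],
    },
}

# One line of a JSONL training file for the student model:
line = json.dumps(record, ensure_ascii=False)
```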
Tool: LLaMA-Factory was used for streamlined fine-tuning, with support for LoRA (Low-Rank Adaptation).
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3.0
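As a sanity check, the effective batch size and approximate schedule follow directly from the hyperparameters above and the 2001-sample training set. Step counts are approximate, since the exact values depend on dataloader details not given in this card:

```python
import math

# Values taken from the hyperparameter list above.
train_batch_size = 1
gradient_accumulation_steps = 4
num_epochs = 3
warmup_ratio = 0.1
dataset_size = 2001  # fine-tuning samples

# Effective batch size per optimizer step:
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 4

# Approximate optimizer steps; actual counts depend on dataloader details.
steps_per_epoch = math.ceil(dataset_size / total_train_batch_size)
total_steps = steps_per_epoch * num_epochs
warmup_steps = int(warmup_ratio * total_steps)
```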
### Training results
| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.2435        | 0.2061 | 100  | 0.2322          |
| 0.2341        | 0.4122 | 200  | 0.2187          |
| 0.2136        | 0.6182 | 300  | 0.2057          |
| 0.2021        | 0.8243 | 400  | 0.1994          |
| 0.1384        | 1.0309 | 500  | 0.1992          |
| 0.1487        | 1.2370 | 600  | 0.1972          |
| 0.1437        | 1.4431 | 700  | 0.1935          |
| 0.1371        | 1.6491 | 800  | 0.1927          |
| 0.147         | 1.8552 | 900  | 0.1883          |
| 0.0668        | 2.0618 | 1000 | 0.1961          |
| 0.077         | 2.2679 | 1100 | 0.2072          |
| 0.0707        | 2.4740 | 1200 | 0.2032          |
| 0.059         | 2.6801 | 1300 | 0.2037          |
| 0.0657        | 2.8861 | 1400 | 0.2032          |
### Framework versions
- PEFT 0.15.1
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1