This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct on the News dataset. It achieves the following results on the evaluation set:
- Loss: 0.2032
## Model description
This fine-tuned Qwen2.5-1.5B-Instruct model automatically extracts and summarizes key information from Arabic text inputs, such as news articles, and generates structured JSON-like outputs.
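The card does not specify the exact prompt format used for extraction. As a rough illustration, a request for structured output might be built as follows; the field names mirror the outputs described under "Training procedure", but the instruction wording itself is an assumption, not the card's actual prompt:

```python
# Hypothetical prompt builder. The field names follow the pseudo-label
# schema described in this card; the instruction text is illustrative only.
FIELDS = ["News_title", "Keywords", "Summary", "Category", "Entities"]

def build_extraction_prompt(arabic_text: str) -> str:
    """Ask the model for a single JSON object with the expected fields."""
    schema = ", ".join(f'"{f}"' for f in FIELDS)
    return (
        "Extract the following fields from the Arabic news article below "
        f"and return them as a single JSON object with keys {schema}.\n\n"
        f"Article:\n{arabic_text}"
    )

prompt = build_extraction_prompt("نص خبري عربي ...")
```

The prompt string would then be passed to the fine-tuned model through the usual chat template for Qwen2.5-Instruct.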
## Training and evaluation data
- Fine-tuning dataset: 2001 samples of Arabic technology-related text, used to adapt the model for structured extraction tasks.
- Evaluation dataset: 100 samples of Arabic sports-related text, used to assess performance on a different domain.
### Average similarity scores on the evaluation data
- This similarity measure is applied to a single field of the output JSON, the "News_title", and the evaluation data comes from a different domain than the one the model was fine-tuned on.
- Mean similarity: 0.6872
- Note: this is a solid score considering that the model was fine-tuned on technology data while all of the test data here is sports-related; evaluating on data closer to the technology domain would likely yield higher accuracy.
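The card does not name the similarity metric used on "News_title". A minimal sketch of the averaging step, using `difflib`'s character-level ratio purely as a stand-in for whatever per-title metric was actually applied (which may well be embedding-based), with hypothetical example titles:

```python
from difflib import SequenceMatcher

def mean_title_similarity(predicted, reference):
    """Average a per-pair similarity score over the evaluation set.

    SequenceMatcher.ratio() is a stand-in metric; this card does not
    state which similarity function was actually used.
    """
    scores = [
        SequenceMatcher(None, p, r).ratio()
        for p, r in zip(predicted, reference)
    ]
    return sum(scores) / len(scores)

# Toy example with hypothetical predicted/reference titles:
preds = ["الفريق يفوز بالمباراة", "اللاعب ينتقل إلى نادٍ جديد"]
refs = ["الفريق يفوز في المباراة", "اللاعب ينتقل لنادٍ جديد"]
score = mean_title_similarity(preds, refs)  # a value between 0.0 and 1.0
```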
## Training procedure
Since the dataset was unlabeled, Llama 4 Scout was employed as the teacher model in a knowledge distillation framework, generating pseudo-labels that guided the training of Qwen2.5-1.5B-Instruct (the student model). Knowledge distillation transfers knowledge from a larger, more capable model (Llama 4 Scout) to a smaller, more efficient one (Qwen2.5-1.5B-Instruct).
Role of Llama 4 Scout:
- Teacher model: Llama 4 Scout, a powerful language model, processed the 2001 unlabeled technology samples and generated high-quality structured outputs (e.g., pseudo-labels for story titles, keywords, summaries, categories, and entities).
- Output generation: for each input text, Llama 4 Scout produced:
  - Story title: a concise headline summarizing the main event.
  - Keywords: relevant terms extracted based on contextual understanding.
  - Summary: a set of key sentences or abstractive summary points.
  - Category: a predicted category (e.g., "technology" for the training data).
  - Entities: identified entities with their types (e.g., person, organization), using the teacher's advanced NER capabilities.
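Concretely, each pseudo-labeled training record could look like the following sketch. The JSON key names are an assumption based on the fields listed above; only "News_title" is explicitly confirmed elsewhere in this card:

```python
import json

# Hypothetical pseudo-label record produced by the teacher model.
# Key names are illustrative; the card only confirms "News_title".
record = {
    "input": "نص المقال التقني الأصلي ...",
    "output": {
        "News_title": "عنوان موجز للحدث الرئيسي",
        "Keywords": ["تقنية", "ذكاء اصطناعي"],
        "Summary": ["نقطة تلخيصية أولى.", "نقطة تلخيصية ثانية."],
        "Category": "technology",
        "Entities": [{"text": "OpenAI", "type": "organization"}],
    },
}

# One line of a JSONL training file for the student model:
line = json.dumps(record, ensure_ascii=False)
```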
Tool: LLaMA-Factory was used for streamlined fine-tuning, with support for LoRA (Low-Rank Adaptation).
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3.0
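As a sanity check, the effective batch size and approximate schedule follow directly from the hyperparameters above and the 2001-sample training set. Step counts are approximate, since the exact values depend on dataloader details not given in this card:

```python
import math

# Values taken from the hyperparameter list above.
train_batch_size = 1
gradient_accumulation_steps = 4
num_epochs = 3
warmup_ratio = 0.1
dataset_size = 2001  # fine-tuning samples

# Effective batch size per optimizer step:
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 4

# Approximate optimizer steps; actual counts depend on dataloader details.
steps_per_epoch = math.ceil(dataset_size / total_train_batch_size)
total_steps = steps_per_epoch * num_epochs
warmup_steps = int(warmup_ratio * total_steps)
```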
### Training results
| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.2435        | 0.2061 | 100  | 0.2322          |
| 0.2341        | 0.4122 | 200  | 0.2187          |
| 0.2136        | 0.6182 | 300  | 0.2057          |
| 0.2021        | 0.8243 | 400  | 0.1994          |
| 0.1384        | 1.0309 | 500  | 0.1992          |
| 0.1487        | 1.2370 | 600  | 0.1972          |
| 0.1437        | 1.4431 | 700  | 0.1935          |
| 0.1371        | 1.6491 | 800  | 0.1927          |
| 0.147         | 1.8552 | 900  | 0.1883          |
| 0.0668        | 2.0618 | 1000 | 0.1961          |
| 0.077         | 2.2679 | 1100 | 0.2072          |
| 0.0707        | 2.4740 | 1200 | 0.2032          |
| 0.059         | 2.6801 | 1300 | 0.2037          |
| 0.0657        | 2.8861 | 1400 | 0.2032          |
### Framework versions
- PEFT 0.15.1
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1