Update README.md
The model distinguishes between **positive** and **negative** sentiments in real…

- **Base model**: `bert-base-multilingual-cased`
- **Fine-tuned on**:
  - Arabic tweets from [UCI Sentiment Dataset 2024](https://data.mendeley.com/datasets/m88gg52wp7/1)
  - English tweets from [Sentiment140 (Stanford)](http://help.sentiment140.com/for-students)
- **Task**: Binary sentiment classification (0 = Negative, 1 = Positive)
- **Languages**: Arabic, English
- **Tokenizer**: `bert-base-multilingual-cased` tokenizer
- **Accuracy**: Evaluated on a 10% holdout from the training set
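As a quick illustration of the 0 = Negative / 1 = Positive convention, here is a minimal sketch that bypasses `pipeline` and reads the class index straight from the logits (the example sentence is arbitrary, not from the original README):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "HatemMoushir/ArEn-TweetSentiment-BERT-Hatem"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

inputs = tokenizer("I love this product!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# The class index follows the README's convention: 0 = Negative, 1 = Positive.
pred = logits.argmax(dim=-1).item()
print("Positive" if pred == 1 else "Negative")
```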
## 📊 Training Details

- **Framework**: 🤗 Transformers + PyTorch
- **Training**: 2 epochs (~1h30min on a Colab GPU)
- **Optimizer**: AdamW (the `Trainer` default)
- **Batch Size**: 16
- **Evaluation Metrics**: Accuracy, F1, Precision, Recall (computed via a `compute_metrics` callback; see the sketch below)
- **Environment**: Google Colab
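A possible `compute_metrics` callback for `Trainer` producing these four metrics — a minimal sketch assuming scikit-learn for the metric functions; the original training script's exact implementation is not shown in this README:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    """Return the four metrics reported in the evaluation tables below."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }
```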
---
## 📊 Evaluation Results

### ✅ Experiment 1 – Initial Run (2K Samples)

| Epoch | Train Loss | Val Loss | Accuracy | F1 Score | Precision | Recall |
|-------|------------|----------|----------|----------|-----------|--------|
| 1     | 0.6266     | 0.7536   | 59.00%   | 0.1800   | 0.6429    | 0.1047 |
| 2     | 0.5127     | 0.5944   | 72.00%   | 0.6667   | 0.6829    | 0.6512 |

---

### ✅ Experiment 2 – Refined Arabic Dataset (20K Samples)

| Epoch | Train Loss | Val Loss | Accuracy | F1 Score | Precision | Recall |
|-------|------------|----------|----------|----------|-----------|--------|
| 1     | 0.5851     | 0.5879   | 70.85%   | 0.6674   | 0.6139    | 0.7312 |
| 2     | 0.4792     | 0.5007   | 78.65%   | 0.7105   | 0.7763    | 0.6550 |

---

### ✅ Experiment 3 – Large-Scale Ar+En Dataset (100K Samples)

| Epoch | Train Loss | Val Loss | Accuracy | F1 Score | Precision | Recall |
|-------|------------|----------|----------|----------|-----------|--------|
| 1     | 0.5231     | 0.5846   | 72.35%   | 0.7127   | 0.6171    | 0.8434 |
| 2     | 0.4404     | 0.4496   | 79.98%   | 0.7502   | 0.7615    | 0.7394 |

**📊 Summary:** Larger datasets led to higher recall and more robust generalization across languages. The model surpassed 79% accuracy and a 0.75 F1 score in the final training run.
---

## 🧪 How to Reproduce

The model was fine-tuned with `Trainer` from the Hugging Face `transformers` library on a multilingual sentiment dataset (based on Sentiment140 plus additional Arabic tweets); a sketch of a matching setup follows the list below.

- **Training Time**: ~1h30min on a Colab GPU
- **Model**: `bert-base-multilingual-cased`
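A minimal sketch under stated assumptions: the epoch count (2) and batch size (16) come from the Training Details above, while `train_ds`/`eval_ds` stand in for tokenized dataset splits the original notebook would have built — the exact preprocessing is not shown in this README:

```python
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2
)

args = TrainingArguments(
    output_dir="aren-tweet-sentiment",   # hypothetical output path
    num_train_epochs=2,                  # per the Training Details
    per_device_train_batch_size=16,      # per the Training Details
    evaluation_strategy="epoch",         # evaluate after each epoch, as in the tables
)

# train_ds / eval_ds: assumed pre-tokenized datasets with a `labels` column
# (0 = Negative, 1 = Positive) and the 90/10 train/holdout split noted above.
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    compute_metrics=compute_metrics,     # the callback sketched earlier
)
trainer.train()
```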
## 📦 How to Use

```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis", model="HatemMoushir/ArEn-TweetSentiment-BERT-Hatem")
print(classifier("الخدمة كانت ممتازة"))  # Arabic: "The service was excellent"
print(classifier("I hate this product."))
```
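`pipeline` returns a list with one dictionary per input, e.g. `[{'label': ..., 'score': ...}]`; unless the checkpoint's config sets a custom `id2label` mapping, the labels appear under the 🤗 Transformers defaults `LABEL_0` (Negative) and `LABEL_1` (Positive).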
## Testing

```python
# ... (full evaluation script elided in this diff; it runs the classifier
# over a list of labelled samples and counts correct predictions) ...
accuracy = correct / len(samples)
print(f"✅ Accuracy: {accuracy * 100:.2f}%")
```
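Since the script body is elided above, here is a minimal sketch of an evaluation loop consistent with its final lines; the `samples` format and the `LABEL_n`-to-index parsing are illustrative assumptions, not the original code:

```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis", model="HatemMoushir/ArEn-TweetSentiment-BERT-Hatem")

# Assumed sample format: (text, gold label) with 0 = Negative, 1 = Positive.
samples = [
    ("I love this product.", 1),
    ("الخدمة كانت سيئة", 0),  # Arabic: "The service was bad"
]

correct = 0
for text, gold in samples:
    result = classifier(text)[0]
    # Default label names are LABEL_0 / LABEL_1; parse the trailing index.
    pred = int(result["label"].split("_")[-1])
    correct += int(pred == gold)

accuracy = correct / len(samples)
print(f"✅ Accuracy: {accuracy * 100:.2f}%")
```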
|
366 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Development and Assistance

This model was developed and trained using **Google Colab**, with guidance and technical assistance from **ChatGPT**, which was used for idea generation, code authoring, and troubleshooting throughout the development process.