# ArEn-TweetSentiment-BERT-Hatem

ArEn-TweetSentiment-BERT-Hatem is a bilingual sentiment analysis model trained on both Arabic and English tweets. It is based on the bert-base-multilingual-cased model from Hugging Face Transformers.
The model distinguishes between positive and negative sentiments in real-world social media content, specifically Twitter data.
## 🧠 Model Details

- Base model: bert-base-multilingual-cased (see the loading sketch below)
- Fine-tuned on:
  - Arabic tweets from the UCI Sentiment Dataset 2024
  - English tweets from Sentiment140 (Stanford)
- Task: Binary sentiment classification (0 = Negative, 1 = Positive)
- Languages: Arabic, English
- Tokenizer: the bert-base-multilingual-cased tokenizer
- Accuracy: Evaluated on a 10% holdout from the training set
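
For direct access to the classification head, the checkpoint can also be loaded without `pipeline`. This is a minimal sketch, assuming the Hub repo ID from this card and the default label order (0 = Negative, 1 = Positive):

```python
# Minimal loading sketch; assumes the default 0 = Negative / 1 = Positive label order.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo_id = "HatemMoushir/ArEn-TweetSentiment-BERT-Hatem"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

inputs = tokenizer("What a great day!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits       # shape (1, 2): one logit per class
predicted = logits.argmax(dim=-1).item()  # 0 = Negative, 1 = Positive
print(predicted)
```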
## Training Details

- Framework: 🤗 Transformers + PyTorch
- Epochs: 2
- Optimizer: AdamW (the Trainer default)
- Batch size: 16
- Evaluation metrics: Accuracy, F1, Precision, Recall (computed as in the sketch below)
- Environment: Google Colab
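
The card does not show the metric code itself; a typical `compute_metrics` function for the four metrics above, using scikit-learn, would look like the following sketch (the original implementation may differ):

```python
# Hypothetical compute_metrics for the Hugging Face Trainer; treat this as an
# illustration, not the original training code.
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary")
    return {"accuracy": accuracy_score(labels, preds),
            "precision": precision, "recall": recall, "f1": f1}
```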
## Evaluation Results

### Experiment 1: Initial Run (2K Samples)

| Epoch | Train Loss | Val Loss | Accuracy | F1 Score | Precision | Recall |
|---|---|---|---|---|---|---|
| 1 | 0.6266 | 0.7536 | 59.00% | 0.1800 | 0.6429 | 0.1047 |
| 2 | 0.5127 | 0.5944 | 72.00% | 0.6667 | 0.6829 | 0.6512 |

### Experiment 2: Refined Arabic Dataset (20K Samples)

| Epoch | Train Loss | Val Loss | Accuracy | F1 Score | Precision | Recall |
|---|---|---|---|---|---|---|
| 1 | 0.5851 | 0.5879 | 70.85% | 0.6674 | 0.6139 | 0.7312 |
| 2 | 0.4792 | 0.5007 | 78.65% | 0.7105 | 0.7763 | 0.6550 |

### Experiment 3: Large-Scale Ar+En Dataset (100K Samples)

| Epoch | Train Loss | Val Loss | Accuracy | F1 Score | Precision | Recall |
|---|---|---|---|---|---|---|
| 1 | 0.5231 | 0.5846 | 72.35% | 0.7127 | 0.6171 | 0.8434 |
| 2 | 0.4404 | 0.4496 | 79.98% | 0.7502 | 0.7615 | 0.7394 |
**Summary:** Larger datasets led to higher recall and more robust generalization across languages. The model surpassed 79% accuracy and a 0.75 F1 score in the final training run.
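
As a quick consistency check, the final-epoch scores satisfy F1 = 2PR / (P + R): 2 × 0.7615 × 0.7394 / (0.7615 + 0.7394) ≈ 0.750, matching the reported F1 of 0.7502.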
## 🧪 How to Reproduce

The model was fine-tuned using Trainer from the Hugging Face transformers library on a multilingual sentiment dataset (based on Sentiment140 and additional Arabic tweets), roughly as sketched below.

- Training time: ~1h30min on a Colab GPU
- Model: bert-base-multilingual-cased
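
The full script is linked under Source Code below; this sketch only restates the setup described on this card (Trainer, default AdamW, batch size 16, 2 epochs). `train_ds` and `val_ds` are placeholders for the tokenized train split and 10% holdout, `compute_metrics` is the function sketched earlier, and on recent transformers versions the `evaluation_strategy` argument is named `eval_strategy`:

```python
# Rough reproduction sketch; dataset preparation is elided, and train_ds / val_ds
# are placeholder names, not names from the original script.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2)

args = TrainingArguments(
    output_dir="out",
    num_train_epochs=2,              # 2 epochs, as reported above
    per_device_train_batch_size=16,  # batch size 16
    per_device_eval_batch_size=16,
    evaluation_strategy="epoch",     # produces the per-epoch metrics in the tables
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,          # placeholder: tokenized training split
    eval_dataset=val_ds,             # placeholder: tokenized 10% holdout
    compute_metrics=compute_metrics, # metric function sketched earlier
)
trainer.train()
```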
## 📦 How to Use

```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis", model="HatemMoushir/ArEn-TweetSentiment-BERT-Hatem")

print(classifier("الخدمة كانت ممتازة"))  # "The service was excellent"
print(classifier("I hate this product."))
```
## Testing

### Example 1 (Arabic)
```python
from transformers import pipeline

# Load the model
classifier = pipeline("sentiment-analysis", model="HatemMoushir/ArEn-TweetSentiment-BERT-Hatem")

# Test sentences with their true labels (1 = positive, 0 = negative)
samples = [
("ุฃูุง ุณุนูุฏ ุฌุฏูุง ุงูููู
", 1),
("ุงูุฌู ู
ู
ุทุฑ ููุฐุง ูุฌุนููู ุญุฒูููุง", 0),
("ูุฌุญุช ูู ุงูุงู
ุชุญุงู!", 1),
("ุฃุดุนุฑ ุจุงูุฅุญุจุงุท ู
ู ุงูุฃุฎุจุงุฑ", 0),
("ุฃุญุจ ุฃุตุฏูุงุฆู ูุซูุฑูุง", 1),
("ูุฐุง ุฃุณูุฃ ููู
ูู ุญูุงุชู", 0),
("ุฃุดุนุฑ ุจุงูุฑุงุญุฉ ูุงูุทู
ุฃูููุฉ", 1),
("ูู
ุฃุชู
ูู ู
ู ุงูููู
ุฌูุฏูุง ุงููููุฉ", 0),
("ุงูููู
ุฌู
ูู ูู
ุดู
ุณ", 1),
("ูู ุดูุก ูุณูุฑ ุจุดูู ุฎุงุทุฆ", 0),
("ุฃุญุจ ู
ุดุงูุฏุฉ ุงูุฃููุงู
ู
ุน ุนุงุฆูุชู", 1),
("ุชุฃุฎุฑุช ุนู ุงูุนู
ู ูููุฏุช ู
ุฒุงุฌู", 0),
("ุฃุดุนุฑ ุจุงููุดุงุท ูุงูุญูููุฉ", 1),
("ุงูู
ูุงู ู
ุฒุฏุญู
ููุง ุฃุณุชุทูุน ุงูุชุญู
ู", 0),
("ูุถูุช ุนุทูุฉ ุฑุงุฆุนุฉ ุนูู ุงูุดุงุทุฆ", 1),
("ุงูุชูู ุงูููู
ุจุดูู ุณูุก", 0),
("ุฃุดุนุฑ ุจุงูุชูุงุคู ุจุดุฃู ุงูู
ุณุชูุจู", 1),
("ูู
ูุนุฌุจูู ุงูุทุนุงู
ุงูููู
", 0),
("ุฃุดุนุฑ ุจุงูุญุจ ู
ู ุงูุฌู
ูุน", 1),
("ุฎุณุฑุช ูู ุดูุก ูู ูุญุธุฉ", 0),
("ุงูู
ูุณููู ุชุฌุนููู ุณุนูุฏูุง", 1),
("ุงูุทุฑูู ู
ุฒุฏุญู
ูุฃูุง ุบุงุถุจ", 0),
("ุฃูุง ู
ู
ุชู ููู ุดูุก ูุฏู", 1),
("ูุงู ููู
ูุง ู
ุฑูููุง ุฌุฏูุง", 0),
("ุฃุดุนุฑ ุจุงูุฃู
ู ุฑุบู
ุงูุตุนูุจุงุช", 1),
("ูุง ุฃุทูู ุงูุงูุชุธุงุฑ ูุฒูุงุฑุฉ ุฃุตุฏูุงุฆู", 1),
("ุชุฌุงูููู ูู ุงูุงุฌุชู
ุงุน ูุดุนุฑุช ุจุงูุฅูุงูุฉ", 0),
("ูุฒุช ูู ุงูู
ุณุงุจูุฉ!", 1),
("ุงูุฌู ุฎุงูู ููุง ููุญุชู
ู", 0),
("ุชูููุช ุฑุณุงูุฉ ุฌู
ููุฉ ู
ู ุตุฏููู", 1),
("ุงููุทุนุช ุงูููุฑุจุงุก ููุงุชูู ุงููููู
", 0),
("ุฃูุง ู
ุญุธูุธ ุจุนุงุฆูุชู", 1),
("ูุง ุฃุญุฏ ููุชู
ุจู", 0),
("ุงููุฏูุก ูู ูุฐุง ุงูู
ูุงู ูุฑูุญูู", 1),
("ุฎุณุฑุช ูุฑุตุชู ุงูุฃุฎูุฑุฉ", 0),
("ุฃุดุนุฑ ุฃููู ู
ุญุจูุจ", 1),
("ุถุงุนุช ุฃู
ุชุนุชู ูู ุงูู
ุทุงุฑ", 0),
("ูู
ุช ุจุนู
ู ุฌูุฏ ุงูููู
", 1),
("ูุง ุฃุฑูุฏ ุงูุชุญุฏุซ ู
ุน ุฃุญุฏ", 0),
("ุฃูุง ู
ู
ุชู ููุญูุงุฉ", 1),
("ููู
ู
ู
ู ูุจูุง ูุงุฆุฏุฉ", 0),
("ุชูููุช ุชุฑููุฉ ูู ุงูุนู
ู", 1),
("ุฃุดุนุฑ ุจุงูุฅุฌูุงุฏ ูุงูุชุนุจ", 0),
("ุงููุฏูุฉ ุฃุณุนุฏุชูู ูุซูุฑูุง", 1),
("ุงููุฑุช ู
ู ุงูุถุบุท", 0),
("ุชูุงููุช ูุฌุจุฉ ูุฐูุฐุฉ", 1),
("ุชุฃุฎุฑุช ุงูุฑุญูุฉ ูุฃุดุนุฑ ุจุงูุถูู", 0),
("ุญููุช ูุฏููุง ููุช ุฃุณุนู ูู", 1),
("ุงูุฎุณุงุฑุฉ ูุงูุช ูุงุณูุฉ", 0),
("ุฃูุง ูุฎูุฑ ุจููุณู", 1),
("ููุฏุช ุงูุซูุฉ ูู ู
ู ุญููู", 0),
("ุนุทูุฉ ููุงูุฉ ุงูุฃุณุจูุน ูุงูุช ุฑุงุฆุนุฉ", 1),
("ูุง ุฃุฌุฏ ุฃู ุฏุงูุน ููุงุณุชู
ุฑุงุฑ", 0),
("ุงุจูู ูุฌุญ ูู ุฏุฑุงุณุชู", 1),
("ูู ู
ู ุญููู ุฎุฐููู", 0),
("ู
ุดูุช ุนูู ุงูุจุญุฑ ููุงู ุงูุฌู ุฌู
ูููุง", 1),
("ุชุนุฑุถุช ูู
ููู ู
ุญุฑุฌ ุฃู
ุงู
ุงูุฌู
ูุน", 0),
("ุฃุดุนุฑ ุจุงูุณุนุงุฏุฉ ูุฃูู ุณุงุนุฏุช ุดุฎุตูุง", 1),
("ุชู
ุชุฌุงููู ุจุงููุงู
ู", 0),
("ูู
ุช ุฌูุฏูุง ูุงุณุชููุธุช ุจูุดุงุท", 1),
("ูุง ุฃุดุนุฑ ุจุฃู ุชูุฏู
", 0),
("ููู
ุฑุงุฆุน ู
ุน ุฃุตุฏูุงุฆู", 1),
("ูุดูุช ู
ุฑุฉ ุฃุฎุฑู", 0),
("ุชูููุช ู
ูุงูู
ุฉ ุฃุณุนุฏุชูู", 1),
("ูู ุดูุก ูููุงุฑ ู
ู ุญููู", 0),
("ุงุณุชู
ุชุนุช ุจุงูุฃุฌูุงุก ุงูููู
", 1),
("ุฃุดุนุฑ ุจุงูููู ุงูู
ุณุชู
ุฑ", 0),
("ูุงู ุงูููุงุก ุฏุงูุฆูุง ูู
ููุฆูุง ุจุงูุญุจ", 1),
("ูุง ุฃุชุญู
ู ุงูุถุบุท ุฃูุซุฑ", 0),
("ูุฌุญ ู
ุดุฑูุนู ุฃุฎูุฑูุง", 1),
("ููุฏุช ุนู
ูู ุงูููู
", 0),
("ูุถูุช ููุชูุง ู
ู
ุชุนูุง ูู ุงูุญุฏููุฉ", 1),
("ุฃูุง ุฎุงุฆู ู
ู
ุง ุณูุฃุชู", 0),
("ุชูููุช ุฏุนู
ูุง ูุจูุฑูุง ู
ู ุฃุตุฏูุงุฆู", 1),
("ุงููุฃุณ ูุณูุทุฑ ุนูู", 0),
("ุฑุญูุชู ูุงูุช ู
ููุฆุฉ ุจุงููุฑุญ", 1),
("ูุง ุดูุก ูุณุนุฏูู ู
ุคุฎุฑูุง", 0),
("ุฃุญุจุจุช ุงููููู
ูุซูุฑูุง", 1),
("ููู
ุงุชูู
ุฌุฑุญุชูู", 0),
("ุชุฐููุช ุทุนุงู
ูุง ุฑุงุฆุนูุง", 1),
("ูุง ุฃุฑู ูุงุฆุฏุฉ ู
ู ุงูู
ุญุงููุฉ", 0),
("ุถุญููุง ูุซูุฑูุง ุงูููู
", 1),
("ุญูู
ู ุชุจุฎุฑ", 0),
("ูุญุธุฉ ุงูููุงุก ูุงูุช ุณุงุญุฑุฉ", 1),
("ุฎุณุฑุช ุฃูุฑุจ ุงููุงุณ ุฅูู", 0),
("ุงูู
ุดู ูู ุงูุทุจูุนุฉ ูุฑูุญ ุฃุนุตุงุจู", 1),
("ูู
ูุตุฏููู ุฃุญุฏ", 0),
("ุงุจุชุณุงู
ุฉ ุทูู ุฌุนูุช ููู
ู ุฃูุถู", 1),
("ูู ุดูุก ุฃุตุจุญ ุตุนุจูุง", 0),
("ุงูููู
ุงุญุชููุช ุจูุฌุงุญู", 1),
("ุงููุงุฑ ูู ุดูุก ูู ูุญุธุฉ", 0),
("ุฃู
ุถูุช ููุชูุง ู
ู
ุชุนูุง ู
ุน ุงูุนุงุฆูุฉ", 1),
("ููุฏุช ุงูุฃู
ู ุชู
ุงู
ูุง", 0),
("ูุถูุช ููู
ูุง ุฑุงุฆุนูุง ูู ุงูุฑูู", 1),
("ุงููุงุณ ูุง ูููู
ูููู", 0),
("ุงุณุชู
ุชุนุช ุจุงูู
ูุณููู ูุงููุฏูุก", 1),
("ูุง ุฃุดุนุฑ ุจุงูุณุนุงุฏุฉ ุฃุจุฏูุง", 0),
("ุงูุฃุตุฏูุงุก ุฌูุจูุง ูู ุงูุณุนุงุฏุฉ", 1),
("ุชุนุจุช ู
ู ุงูู
ุญุงููุฉ", 0),
("ูู ูุญุธุฉ ูุงูุช ุฑุงุฆุนุฉ", 1),
("ูู ุดูุก ูุดู", 0),
("ุงููุฌุงุญ ูุงู ุซู
ุฑุฉ ุฌูุฏู", 1),
("ูุง ุฃู
ูู ุดูุฆูุง ุฃูุฑุญ ุจู", 0)
]
# Run the model and compare predictions against the true labels
correct = 0
for i, (text, true_label) in enumerate(samples):
    result = classifier(text)[0]
    predicted_label = 1 if result["label"] == "LABEL_1" else 0
    is_correct = predicted_label == true_label
    correct += is_correct
    print(f"{i+1}. \"{text}\"")
    print(f"   Model → {predicted_label} | 🎯 True → {true_label} | {'✔️ Correct' if is_correct else '❌ Wrong'}\n")
# Compute the overall accuracy
accuracy = correct / len(samples)
print(f"✅ Accuracy: {accuracy * 100:.2f}%")
```
### Example 2 (English)
```python
from transformers import pipeline

# Load the model (Sentiment140 was used for the English training data)
classifier = pipeline("sentiment-analysis", model="HatemMoushir/ArEn-TweetSentiment-BERT-Hatem")

# English test sentences with their true labels: 1 = Positive, 0 = Negative
samples = [
("I love this place!", 1),
("I hate waiting in traffic.", 0),
("Today is a beautiful day", 1),
("I am really disappointed", 0),
("Feeling great about this opportunity", 1),
("This movie was terrible", 0),
("Absolutely loved the dinner", 1),
("Iโm sad and frustrated", 0),
("My friends make me happy", 1),
("Everything went wrong today", 0),
("What a fantastic game!", 1),
("Worst experience ever", 0),
("The weather is amazing", 1),
("I canโt stand this anymore", 0),
("So proud of my achievements", 1),
("Feeling down", 0),
("Just got a promotion!", 1),
("Why does everything suck?", 0),
("Best vacation ever", 1),
("Iโm tired of this nonsense", 0),
("Such a lovely gesture", 1),
("That was rude and uncalled for", 0),
("Finally some good news!", 1),
("I'm so lonely", 0),
("My cat is the cutest", 1),
("This food tastes awful", 0),
("Celebrating small wins today", 1),
("Not in the mood", 0),
("Grateful for everything", 1),
("I feel useless", 0),
("Such a peaceful morning", 1),
("Another failure, just great", 0),
("Got accepted into college!", 1),
("I hate being ignored", 0),
("The sunset was breathtaking", 1),
("You ruined my day", 0),
("He makes me feel special", 1),
("Everything is falling apart", 0),
("Can't wait for the weekend", 1),
("So much stress right now", 0),
("Iโm in love", 1),
("I donโt care anymore", 0),
("Won first place!", 1),
("This is so frustrating", 0),
("He always cheers me up", 1),
("Feeling stuck", 0),
("Had a wonderful time", 1),
("Nothing matters", 0),
("Looking forward to tomorrow", 1),
("Just leave me alone", 0),
("We made it!", 1),
("Horrible customer service", 0),
("The music lifts my spirits", 1),
("I'm drowning in problems", 0),
("My team won the match", 1),
("I wish I never came", 0),
("Sunshine and good vibes", 1),
("Everything is a mess", 0),
("Love the energy here", 1),
("Feeling hopeless", 0),
("She always makes me smile", 1),
("So many regrets", 0),
("Today was a success", 1),
("Bad day again", 0),
("Iโm truly blessed", 1),
("This is depressing", 0),
("Can't stop smiling", 1),
("Everything hurts", 0),
("So excited for this!", 1),
("I hate myself", 0),
("Best concert ever", 1),
("Life is unfair", 0),
("Happy and content", 1),
("Crying inside", 0),
("Feeling inspired", 1),
("The service was awful", 0),
("Joy all around", 1),
("I feel dead inside", 0),
("Itโs a dream come true", 1),
("Nothing good ever happens", 0),
("Feeling positive", 1),
("That hurt my feelings", 0),
("Success tastes sweet", 1),
("I can't handle this", 0),
("We had a blast", 1),
("Itโs not worth it", 0),
("Heโs such a kind soul", 1),
("I'm broken", 0),
("Everything is perfect", 1),
("So tired of pretending", 0),
("What a nice surprise!", 1),
("I feel empty", 0),
("Canโt wait to start!", 1),
("It's always my fault", 0),
("A new beginning", 1),
("So much pain", 0),
("My heart is full", 1),
("This sucks", 0),
("I feel accomplished", 1),
("Why bother", 0),
("Living my best life", 1),
("I just want to disappear", 0)
]
# Run the model and compare predictions against the true labels
correct = 0
for i, (text, true_label) in enumerate(samples):
    result = classifier(text)[0]
    predicted_label = 1 if result["label"] == "LABEL_1" else 0
    is_correct = predicted_label == true_label
    correct += is_correct
    print(f"{i+1}. \"{text}\"")
    print(f"   Model → {predicted_label} | 🎯 True → {true_label} | {'✔️ Correct' if is_correct else '❌ Wrong'}\n")
# Model accuracy
accuracy = correct / len(samples)
print(f"✅ Accuracy: {accuracy * 100:.2f}%")
```
## Development and Assistance

This model was developed and trained on Google Colab, with guidance and technical assistance from ChatGPT, which was used for idea generation, code authoring, and troubleshooting throughout the development process.
## Source Code

The full code used to prepare and train the model is available on GitHub: GitHub file source.
## License

MIT License. Free to use, modify, and share with attribution.
## Author

Developed by Hatem Moushir. Contact: [email protected]