
Burmese Sentiment Analysis with XLM-RoBERTa

Model Details

Model Description

This model is a fine-tuned version of FacebookAI/xlm-roberta-base for Burmese sentiment analysis.
It classifies Burmese text into one of three sentiment categories:

  • Positive
  • Negative
  • Neutral

The model was trained using publicly available Burmese sentiment datasets and additional manually curated data, with careful preprocessing to normalize encoding (Zawgyi → Unicode conversion).
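
The card does not name the normalization tooling. As one hedged illustration, Google's myanmar-tools library can detect Zawgyi-encoded text, and ICU ships a Zawgyi-my transliterator for conversion. A minimal sketch, assuming the myanmartools and PyICU packages (the function name and 0.95 threshold are illustrative choices, not from the card):

from myanmartools import ZawgyiDetector
from icu import Transliterator  # PyICU; ICU provides a Zawgyi-my transform

detector = ZawgyiDetector()
converter = Transliterator.createInstance("Zawgyi-my")

def normalize_burmese(text, threshold=0.95):
    # Convert only when the detector is reasonably confident the text is Zawgyi.
    if detector.get_zawgyi_probability(text) > threshold:
        return converter.transliterate(text)
    return text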


Uses

Direct Use

  • Sentiment classification of Burmese text from social media, reviews, comments, and other user-generated content.
  • Building sentiment-aware Burmese NLP applications such as chatbots, analytics dashboards, and content moderation tools.
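
For quick experimentation, the high-level transformers pipeline API wraps tokenization and inference in one call. A sketch (the returned label strings depend on the id2label mapping stored in the model config):

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="emilyyy04/burmese-sentiment-xlm-roberta",
)
result = classifier("ဒီဇာတ်လမ်းက တကယ်ကောင်းတယ်။")  # "This movie is really good."
print(result)  # e.g. [{'label': '...', 'score': 0.98}]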

Limitations

  • May not generalize well to domains significantly different from the training data.
  • May misclassify sentences with mixed sentiments or sarcasm.
  • Performance may drop for code-mixed Burmese-English text with heavy slang or informal spelling.

Training Details

Training Data

The model was fine-tuned on publicly available Burmese sentiment datasets together with additional manually curated examples; all text was normalized to Unicode (converting Zawgyi-encoded input) during preprocessing.

Training Procedure

  • Optimizer: AdamW (default in Hugging Face Trainer)
  • Learning rate: 2e-5
  • Batch size: 8 (train & eval)
  • Epochs: 3
  • Weight decay: 0.01
  • Mixed precision (fp16): Enabled when training on GPU
  • Metric for best model: F1 score (weighted average)
  • Evaluation strategy: Per epoch
  • Model selection: Best F1 score checkpoint
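
The exact training script is not included in this card. A minimal Hugging Face Trainer configuration matching the settings above might look like the following sketch (output path, dataset variables, and compute_metrics are assumptions; the eval_strategy argument name follows recent transformers releases, where older versions call it evaluation_strategy):

from transformers import TrainingArguments, Trainer
import torch

training_args = TrainingArguments(
    output_dir="burmese-sentiment-xlm-roberta",  # hypothetical output path
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    eval_strategy="epoch",            # evaluate once per epoch
    save_strategy="epoch",
    load_best_model_at_end=True,      # restore the best checkpoint at the end
    metric_for_best_model="f1",       # weighted F1 from compute_metrics
    fp16=torch.cuda.is_available(),   # mixed precision only on GPU
)

trainer = Trainer(
    model=model,                      # the XLM-RoBERTa classifier being fine-tuned
    args=training_args,
    train_dataset=train_dataset,      # hypothetical pre-tokenized datasets
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,  # sketched under Evaluation below
)
trainer.train()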

Evaluation

Metrics

The model was evaluated on a held-out validation set using accuracy, precision, recall, and F1 score.

Epoch   Val Loss   Accuracy   Precision   Recall   F1
1       0.6171     0.7859     0.7994      0.7859   0.7875
2       0.4268     0.8470     0.8465      0.8470   0.8464
3       0.4115     0.8451     0.8447      0.8451   0.8448

The released model is the checkpoint with the highest weighted F1 score (epoch 2, F1 = 0.8464).
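
The card reports weighted-average precision, recall, and F1. A compute_metrics function producing these values with scikit-learn might look like this sketch (the function name and use of sklearn are assumptions, not confirmed by the card):

import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }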


How to Use

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "emilyyy04/burmese-sentiment-xlm-roberta"  # Replace with actual repo name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # switch to inference mode

text = "ဒီဇာတ်လမ်းက တကယ်ကောင်းတယ်။"  # "This movie is really good."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():  # no gradients needed for inference
    outputs = model(**inputs)
predicted_class = torch.argmax(outputs.logits, dim=1).item()

label_map = {0: "positive", 1: "negative", 2: "neutral"}
print("Predicted Sentiment:", label_map[predicted_class])