---
language: en
tags:
- text-classification
- sentiment-analysis
- transformers
- pytorch
- multilingual
- xlm-roberta
- Siyovush Mirzoev
- Tajik
- Tajikistan
license: mit
---

# advexon/multilingual-sentiment-classifier

A multilingual text classification model fine-tuned from XLM-RoBERTa base for sentiment analysis across English, Russian, Tajik, and other languages.

## Model Description

This is a multilingual text classification model based on XLM-RoBERTa. It has been fine-tuned for sentiment analysis across multiple languages and can classify text into positive, negative, and neutral categories.

## Model Details

- **Base Model**: XLM-RoBERTa Base
- **Model Type**: XLMRobertaForSequenceClassification
- **Number of Labels**: 3 (Negative, Neutral, Positive)
- **Languages**: Multilingual (English, Russian, Tajik, and others)
- **Max Sequence Length**: 512 tokens
- **Hidden Size**: 768
- **Attention Heads**: 12
- **Layers**: 12
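
These figures can be checked against the configuration shipped with the checkpoint. A quick sketch using the standard `transformers` config fields (the commented values are what the table above implies):

```python
from transformers import AutoConfig

# Inspect the configuration shipped with the checkpoint
config = AutoConfig.from_pretrained("advexon/multilingual-sentiment-classifier")

print(config.model_type)               # expected: xlm-roberta
print(config.num_labels)               # expected: 3
print(config.hidden_size)              # expected: 768
print(config.num_attention_heads)      # expected: 12
print(config.num_hidden_layers)        # expected: 12
print(config.max_position_embeddings)  # 514 for XLM-R (512 usable tokens plus special offsets)
```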

## Performance

Metrics reported from the training run:
- **Training Accuracy**: 58.33%
- **Validation Accuracy**: 100%
- **Training Loss**: 0.94
- **Validation Loss**: 0.79

## Usage

### Using the Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("advexon/multilingual-sentiment-classifier")
model = AutoModelForSequenceClassification.from_pretrained("advexon/multilingual-sentiment-classifier")
model.eval()  # inference mode (disables dropout)

# Example usage
text = "This product is amazing!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():  # no gradient tracking needed for inference
    outputs = model(**inputs)
predictions = torch.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(predictions, dim=-1).item()

# Class mapping: 0=Negative, 1=Neutral, 2=Positive
sentiment_labels = ["Negative", "Neutral", "Positive"]
predicted_sentiment = sentiment_labels[predicted_class]
print(f"Predicted sentiment: {predicted_sentiment}")
```
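
Alternatively, the high-level `pipeline` API wraps the same steps. A minimal sketch; note that the returned label strings come from the `id2label` mapping stored in the checkpoint, so they may appear as generic `LABEL_0`/`LABEL_1`/`LABEL_2` if no custom mapping was saved:

```python
from transformers import pipeline

# The text-classification pipeline handles tokenization, inference, and softmax
classifier = pipeline(
    "text-classification",
    model="advexon/multilingual-sentiment-classifier",
)

# top_k=None returns scores for all three classes, not just the best one
print(classifier("Отличный сервис!", top_k=None))
```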

### Example Predictions

- "I absolutely love this product!" → Positive
- "This is terrible quality." → Negative  
- "It's okay, nothing special." → Neutral
- "Отличный сервис!" → Positive (Russian)
- "Хунуки хуб нест" → Negative (Tajik)

## Training

This model was trained using:
- **Base Model**: XLM-RoBERTa Base
- **Optimizer**: AdamW
- **Learning Rate**: 2e-5
- **Batch Size**: 16
- **Training Epochs**: 2
- **Languages**: English, Russian, Tajik
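
For reference, here is a minimal fine-tuning sketch with these hyperparameters using the `Trainer` API. The CSV files and column names are illustrative assumptions, not the original training data:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Assumed data layout: CSV files with "text" and integer "label" (0/1/2) columns
dataset = load_dataset("csv", data_files={"train": "train.csv", "validation": "val.csv"})

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("xlm-roberta-base", num_labels=3)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="multilingual-sentiment-classifier",
    learning_rate=2e-5,               # hyperparameters from the list above
    per_device_train_batch_size=16,
    num_train_epochs=2,               # Trainer uses AdamW by default
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
)
trainer.train()
```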

## Model Architecture

The model uses the XLM-RoBERTa architecture with:
- 12 transformer layers
- 768 hidden dimensions
- 12 attention heads
- A single classification head with 3 output labels for sentiment analysis

## Limitations

- The model's performance may vary across different languages
- It is recommended to fine-tune on domain-specific data for optimal performance
- Maximum input length is 512 tokens; longer documents must be truncated or chunked (see the sketch below)
- Performance may be lower on languages not well-represented in the training data
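
One way to handle documents beyond the 512-token limit is to split them into overlapping windows and average the class probabilities. This is a hypothetical workaround sketched here, not part of the released model:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("advexon/multilingual-sentiment-classifier")
model = AutoModelForSequenceClassification.from_pretrained("advexon/multilingual-sentiment-classifier")
model.eval()

def classify_long_text(text: str, stride: int = 128) -> int:
    # return_overflowing_tokens splits the input into 512-token windows
    # that overlap by `stride` tokens
    enc = tokenizer(
        text,
        truncation=True,
        max_length=512,
        stride=stride,
        return_overflowing_tokens=True,
        padding=True,
        return_tensors="pt",
    )
    enc.pop("overflow_to_sample_mapping", None)  # bookkeeping key, not a model input
    with torch.no_grad():
        logits = model(**enc).logits
    # Average the per-window probabilities, then pick the top class
    probs = torch.softmax(logits, dim=-1).mean(dim=0)
    return probs.argmax().item()  # 0=Negative, 1=Neutral, 2=Positive
```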

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{multilingual-sentiment-classifier,
  title={Multilingual Text Classification Model},
  author={Advexon},
  year={2024},
  publisher={Siyovush Mirzoev},
  howpublished={\url{https://huggingface.co/advexon/multilingual-sentiment-classifier}},
}
```
## Contact

[email protected] / WhatsApp: +992 710707777