metadata
license: cc-by-sa-4.0
datasets:
- Kostya165/ru_emotion_dvach
language:
- ru
metrics:
- accuracy
base_model:
- cointegrated/rubert-tiny2
pipeline_tag: text-classification
tags:
- russian
- emotion
- sentiment
- sentiment-analisys
- emotion-analisys
- emotion-classification
- emotion-detection
- rubert
- rubert-tiny
rubert_tiny2_russian_emotion_sentiment
Description
The rubert_tiny2_russian_emotion_sentiment model is a fine-tuned version of the lightweight cointegrated/rubert-tiny2 model that classifies Russian text into five emotion classes:
- 0: aggression
- 1: anxiety
- 2: neutral
- 3: positive
- 4: sarcasm
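The class indices above can be mirrored in a small lookup table, which is handy when post-processing predictions outside of `transformers`. This is a minimal sketch: `ID2LABEL`, `LABEL2ID`, and `decode` are illustrative names, and the mapping stored in the model's own `config.id2label` should be treated as the source of truth.

```python
# Illustrative mapping copied from the list above; the model's
# config.id2label should be treated as authoritative.
ID2LABEL = {
    0: "aggression",
    1: "anxiety",
    2: "neutral",
    3: "positive",
    4: "sarcasm",
}

# Reverse lookup, e.g. for encoding gold labels before evaluation.
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def decode(pred_ids):
    """Turn a list of predicted class indices into label strings."""
    return [ID2LABEL[i] for i in pred_ids]


print(decode([3, 0]))  # ['positive', 'aggression']
```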
Validation Results
| Metric | Value |
|---|---|
| Accuracy | 0.8911 |
| F1 macro | 0.8910 |
| F1 micro | 0.8911 |
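For single-label multiclass classification, micro-averaged F1 is mathematically equal to accuracy, which is consistent with both being reported as 0.8911. The sketch below computes all three metrics in plain Python on made-up labels (not taken from the dataset); the function names are illustrative, and the card does not say which library produced the reported numbers.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_macro_micro(y_true, y_pred, labels):
    """Return (macro F1, micro F1) for single-label multiclass predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    # Macro: average the per-class F1 scores with equal weight.
    per_class = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class.append(2 * tp[c] / denom if denom else 0.0)
    macro = sum(per_class) / len(labels)

    # Micro: pool the counts over all classes, then compute one F1.
    tp_all, fp_all, fn_all = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro = 2 * tp_all / (2 * tp_all + fp_all + fn_all)
    return macro, micro


# Toy data only; class IDs follow the 0-4 scheme used in this card.
y_true = [0, 0, 1, 1, 2, 2, 3, 4]
y_pred = [0, 1, 1, 1, 2, 0, 3, 4]

acc = accuracy(y_true, y_pred)
macro, micro = f1_macro_micro(y_true, y_pred, labels=range(5))
print(acc, micro)  # both 0.75
```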
Per-class accuracy:
- aggression: 0.9120
- anxiety: 0.9462
- neutral: 0.8663
- positive: 0.8884
- sarcasm: 0.8426
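The card does not define how per-class accuracy was computed. One common reading, assumed here, is the fraction of examples of each true class that were predicted correctly (equivalent to per-class recall). A minimal sketch on made-up labels, with `per_class_accuracy` as an illustrative name:

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """Share of correctly predicted examples within each true class.

    Assumption: "per-class accuracy" means per-class recall; the model
    card itself does not specify the definition.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return {c: correct[c] / total[c] for c in sorted(total)}


# Toy data only; class IDs follow the 0-4 scheme used in this card.
y_true = [0, 0, 1, 1, 2, 2, 3, 4]
y_pred = [0, 1, 1, 1, 2, 0, 3, 4]

print(per_class_accuracy(y_true, y_pred))
# {0: 0.5, 1: 1.0, 2: 0.5, 3: 1.0, 4: 1.0}
```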
Usage
pip install transformers torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the fine-tuned model and its tokenizer from the Hub
MODEL_ID = "Kostya165/rubert_tiny2_russian_emotion_sentiment"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()  # inference mode: disables dropout

texts = ["Сегодня отличный день!", "Меня это всё бесит и раздражает."]

# Tokenize the batch, truncating inputs longer than 128 tokens
enc = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits
preds = logits.argmax(dim=-1).tolist()

# Map class IDs back to label names
labels = [model.config.id2label[p] for p in preds]
print(labels)  # e.g. ['positive', 'aggression']
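The usage snippet returns only the top label. When a confidence score is also needed, the logits can be converted to probabilities with a softmax. The sketch below does this in plain Python on a hypothetical row of logits (for example, one row of `logits.tolist()`); the numbers, the `LABELS` list, and the helper names are illustrative and are not real model output. With tensors, `torch.softmax(logits, dim=-1)` computes the same probabilities.

```python
import math

# Label order assumed to match the class IDs listed in this card.
LABELS = ["aggression", "anxiety", "neutral", "positive", "sarcasm"]


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(row, labels=LABELS):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(row)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


example_logits = [-1.2, -0.7, 0.3, 2.9, -0.5]  # made-up values
label, confidence = top_label(example_logits)
print(label, round(confidence, 3))
```

Softmax scores from a fine-tuned classifier are not necessarily well calibrated, so treating them as a ranking signal rather than a true probability is the safer default.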
Training Details
- Base: cointegrated/rubert-tiny2
- Dataset: Kostya165/ru_emotion_dvach (train/validation)
- Epochs: 2
- Batch size: 32
- Learning rate: 1e-5
- Mixed precision: FP16
- Regularization: Dropout 0.1, weight_decay 0.01, warmup_ratio 0.1
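The hyperparameters above can be collected under the keyword names that `transformers.TrainingArguments` uses, and the warmup ratio translated into a step count. This is a sketch under several assumptions: the original training script is not published in this card, a single device without gradient accumulation is assumed, the training-set size of 10,000 is a placeholder, and `HPARAMS` and `schedule_summary` are illustrative names. Dropout is a model-config setting rather than a training argument, so it is left out.

```python
import math

# Keys mirror transformers.TrainingArguments keyword names; values are
# taken from the list above.
HPARAMS = {
    "num_train_epochs": 2,
    "per_device_train_batch_size": 32,
    "learning_rate": 1e-5,
    "fp16": True,
    "weight_decay": 0.01,
    "warmup_ratio": 0.1,
}


def schedule_summary(num_train_examples, hp=HPARAMS):
    """Return (steps per epoch, total steps, warmup steps).

    Assumes one device and no gradient accumulation; warmup steps are
    rounded up from total_steps * warmup_ratio.
    """
    steps_per_epoch = math.ceil(num_train_examples / hp["per_device_train_batch_size"])
    total_steps = steps_per_epoch * hp["num_train_epochs"]
    warmup_steps = math.ceil(total_steps * hp["warmup_ratio"])
    return steps_per_epoch, total_steps, warmup_steps


# 10_000 is a placeholder size, not the real size of the dataset.
print(schedule_summary(10_000))  # (313, 626, 63)
```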
Requirements
transformers>=4.30.0
torch>=1.10.0
datasets
evaluate
License
CC-BY-SA 4.0.
Citation
@misc{rubert_tiny2_russian_emotion_sentiment,
  title = {Russian Emotion Sentiment Classification with RuBERT-tiny2},
  author = {Kostya165},
  year = {2024},
  howpublished = {\url{https://huggingface.co/Kostya165/rubert_tiny2_russian_emotion_sentiment}}
}