Turkish Sentiment Analysis (3-class) — Fine-tuned TurkishBERTweet
Overview
This model is a fine-tuned version of VRLLab_TurkishBERTweet_finetuned_optuna_turkish_sentiment
for 3-class Turkish sentiment analysis. It was trained on an imbalanced dataset of e-commerce product reviews, and hyperparameters were optimized with Optuna to obtain the most effective fine-tuning configuration.
Intended Use
- Product review classification
- Social media analysis
- Customer feedback analysis
- Brand monitoring
- Market research
- Customer service optimization
- Competitive intelligence
Model Details
| Field | Value |
|---|---|
| Model Name | msamilim/VRLLab_TurkishBERTweet_finetuned_optuna_turkish_sentiment_v02 |
| Base Model | VRLLab_TurkishBERTweet_finetuned_optuna_turkish_sentiment |
| Task | Sentiment Analysis |
| Language | Turkish |
| Fine-Tuning Dataset | Turkish E-Commerce Product Reviews Dataset |
| Number of Labels | 3 |
| Problem Type | Single-label classification |
| License | apache-2.0 |
| Fine-Tuning Framework | Hugging Face Transformers |
Dataset
The dataset is a Turkish three-class sentiment corpus (labels: negatif / notr / pozitif). The overall label distribution is shown below.
Dataset Distribution (Overall)
| LabelID | LabelName | Count | Ratio (%) |
|---|---|---|---|
| 0 | negatif | 9462 | 18.86 |
| 1 | notr | 746 | 1.49 |
| 2 | pozitif | 39952 | 79.65 |
| — | Total | 50160 | 100.00 |
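The table above can be reproduced from the raw labels. A minimal sketch, assuming the reviews are loaded into a pandas DataFrame with a `label` column holding the ids 0/1/2 (the raw file and its column names are not published with this card, so both are placeholders):

```python
import pandas as pd

# Hypothetical loading step: file name and column name are assumptions.
df = pd.read_csv("turkish_ecommerce_reviews.csv")

# Count each label id and express it as a percentage of the corpus.
counts = df["label"].value_counts().sort_index()
ratios = (counts / counts.sum() * 100).round(2)
print(pd.DataFrame({"Count": counts, "Ratio (%)": ratios}))
```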
Training Procedure
- Objective metric: eval_macro_f1 (a sketch of a matching metric function follows this list)
- Hyperparameter optimization: Optuna
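The metric function itself is not included in this card. A minimal sketch of how an eval_macro_f1 objective could be computed with the `evaluate` library listed under Framework versions; the Trainer prefixes the returned keys with `eval_`, so `macro_f1` is reported as `eval_macro_f1`:

```python
import evaluate
import numpy as np

accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")

def compute_metrics(eval_pred):
    # The Trainer passes an (logits, labels) tuple.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy.compute(predictions=preds, references=labels)["accuracy"],
        # Macro F1 averages per-class F1 equally, so the rare "notr"
        # class (~1.5% of the data) counts as much as the other two.
        "macro_f1": f1.compute(predictions=preds, references=labels, average="macro")["f1"],
    }
```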
HPO Parameter Ranges
```python
params = {
    "learning_rate": trial.suggest_float("learning_rate", 5e-6, 5e-5, log=True),
    "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [16, 32]),
    "per_device_eval_batch_size": trial.suggest_categorical("per_device_eval_batch_size", [32]),
    "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.1),
    "warmup_ratio": trial.suggest_float("warmup_ratio", 0.0, 0.2),
    "num_train_epochs": trial.suggest_int("num_train_epochs", 6, 8),
    "gradient_accumulation_steps": trial.suggest_categorical("gradient_accumulation_steps", [1, 2, 4]),
}
```
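The training script is not part of this card, but a search space like the one above plugs into the standard Transformers HPO entry point. A minimal sketch, assuming `Trainer.hyperparameter_search` with the Optuna backend; `train_ds`, `eval_ds`, and `n_trials` are placeholders, and `compute_metrics` is the metric function sketched earlier:

```python
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

base_model_name = "VRLLab_TurkishBERTweet_finetuned_optuna_turkish_sentiment"

def hp_space(trial):
    # The same ranges as shown above.
    return {
        "learning_rate": trial.suggest_float("learning_rate", 5e-6, 5e-5, log=True),
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [16, 32]),
        "per_device_eval_batch_size": trial.suggest_categorical("per_device_eval_batch_size", [32]),
        "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.1),
        "warmup_ratio": trial.suggest_float("warmup_ratio", 0.0, 0.2),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 6, 8),
        "gradient_accumulation_steps": trial.suggest_categorical("gradient_accumulation_steps", [1, 2, 4]),
    }

def model_init():
    # A fresh copy of the base model for every Optuna trial.
    return AutoModelForSequenceClassification.from_pretrained(base_model_name, num_labels=3)

trainer = Trainer(
    model_init=model_init,
    args=TrainingArguments(output_dir="hpo", eval_strategy="epoch"),
    train_dataset=train_ds,   # placeholder: prepared, tokenized splits
    eval_dataset=eval_ds,
    compute_metrics=compute_metrics,
)

best = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda m: m["eval_macro_f1"],
    n_trials=20,  # placeholder: the actual trial count is not stated in the card
)
```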
Best Trial Hyperparameters
```json
{
  "gradient_accumulation_steps": 1,
  "learning_rate": 9.52810812981246e-06,
  "num_train_epochs": 7,
  "per_device_eval_batch_size": 32,
  "per_device_train_batch_size": 16,
  "warmup_ratio": 0.0898355081163283,
  "weight_decay": 0.0718262528732965
}
```
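For a final training run, the best-trial values above map directly onto `TrainingArguments`. A minimal sketch (the output directory is a placeholder, and the integer-valued parameters are passed as ints):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="final_model",  # placeholder
    learning_rate=9.52810812981246e-06,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    num_train_epochs=7,
    gradient_accumulation_steps=1,
    warmup_ratio=0.0898355081163283,
    weight_decay=0.0718262528732965,
    eval_strategy="epoch",
)
```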
Evaluation Results
The values below are from the final training epoch (epoch 6); see the epoch-wise table for the full trajectory.

| Metric | Value |
|---|---|
| accuracy | 0.9103 |
| loss | 0.7386 |
| macro_f1 | 0.6014 |
| runtime (s) | 2.5064 |
| samples_per_second | 1396.44 |
| steps_per_second | 43.888 |
Epoch-wise Metrics
| epoch | train_loss | eval_accuracy | eval_loss | eval_macro_f1 | eval_runtime (s) | eval_samples_per_second | eval_steps_per_second |
|---|---|---|---|---|---|---|---|
| 1 | 0.1104 | 0.9134 | 0.4424 | 0.6352 | 2.9387 | 1190.984 | 37.431 |
| 2 | 0.1107 | 0.9194 | 0.4984 | 0.6302 | 2.5437 | 1375.933 | 43.244 |
| 3 | 0.0813 | 0.9094 | 0.5466 | 0.6285 | 2.5496 | 1372.759 | 43.144 |
| 4 | 0.0498 | 0.9129 | 0.6645 | 0.6118 | 2.4822 | 1410.034 | 44.315 |
| 5 | 0.0278 | 0.9129 | 0.6938 | 0.5979 | 2.4873 | 1407.144 | 44.225 |
| 6 | 0.0202 | 0.9103 | 0.7386 | 0.6014 | 2.5064 | 1396.440 | 43.888 |
How to use - Pipeline
```python
from transformers import pipeline

model_name = "msamilim/VRLLab_TurkishBERTweet_finetuned_optuna_turkish_sentiment_v02"
pipe = pipeline("text-classification", model=model_name)

sentence = "Güzel ürün, tavsiye ederim."
result = pipe(sentence)
print(result)
```
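By default the pipeline returns only the top label. Passing `top_k=None` to the text-classification pipeline returns the scores for all three classes:

```python
# Scores for every class, ordered from highest to lowest.
all_scores = pipe(sentence, top_k=None)
print(all_scores)
```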
How to use - Full Classification
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "msamilim/VRLLab_TurkishBERTweet_finetuned_optuna_turkish_sentiment_v02"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

id2label = {0: "Negatif", 1: "Nötr", 2: "Pozitif"}

def predict_sentiment(texts):
    # Tokenize the batch, run a forward pass without gradients, and map
    # the argmax of the class probabilities back to label names.
    inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
    return [id2label[p] for p in torch.argmax(probabilities, dim=-1).tolist()]

texts = [
    "Güzel ürün, tavsiye ederim kullanılır.",
    "Ürün çok güzel ve kaliteli. Maalesef yüzüme uymadığı için iade etmek zorunda kaldım.",
    "Keşke aldıktan sonra indirime girmeseydi.",
    "Daha soluk ve mat yapısı var beğenmedim.",
]

for text, sentiment in zip(texts, predict_sentiment(texts)):
    print(f"Text: {text}\nSentiment: {sentiment}\n")
```
Framework versions
- transformers==4.57.0
- torch==2.8.0+cu128
- datasets==4.2.0
- accelerate==1.10.1
- evaluate==0.4.6
- python==3.11.13