Turkish Sentiment Analysis (3-class) — Fine-tuned bert-base-turkish-128k

Overview

This model is a fine-tuned version of dbmdz/bert-base-turkish-128k-uncased for 3-class Turkish sentiment analysis. It was trained on an imbalanced dataset of e-commerce product reviews, and hyperparameters were optimized with Optuna to obtain the most effective fine-tuning configuration.

Intended Use

  • Product review classification
  • Social media analysis
  • Customer feedback analysis
  • Brand monitoring
  • Market research
  • Customer service optimization
  • Competitive intelligence

Model Details

| Field | Value |
|---|---|
| Model Name | msamilim/bert_base_128k_uncased_finetuned_optuna_turkish_sentiment |
| Base Model | dbmdz/bert-base-turkish-128k-uncased |
| Task | Sentiment Analysis |
| Language | Turkish |
| Fine-Tuning Dataset | Turkish E-Commerce Product Reviews Dataset |
| Number of Labels | 3 |
| Problem Type | Single-label classification |
| License | apache-2.0 |
| Fine-Tuning Framework | Hugging Face Transformers |

Dataset

The dataset is a Turkish three-class sentiment corpus (negatif / notr / pozitif). The overall label distribution is shown below.

Dataset Distribution (Overall)

| LabelID | LabelName | Count | Ratio (%) |
|---|---|---|---|
| 0 | negatif | 9462 | 18.86 |
| 1 | notr | 746 | 1.49 |
| 2 | pozitif | 39952 | 79.65 |
| Total | | 50160 | 100.00 |
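
For reference, the table above can be reproduced with a few lines of pandas. The CSV path and column name below are placeholders, since the dataset files themselves are not published in this card:

import pandas as pd

# Placeholder path and column name; the actual dataset files are not published here.
df = pd.read_csv("turkish_ecommerce_reviews.csv")
counts = df["label"].value_counts().sort_index()
ratios = (counts / counts.sum() * 100).round(2)
print(pd.DataFrame({"Count": counts, "Ratio (%)": ratios}))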

Training Procedure

  • Objective metric: eval_macro_f1 (a matching compute_metrics sketch follows this list)
  • Hyperparameter optimization: Optuna
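
The objective corresponds to the macro-averaged F1 score on the evaluation split. A minimal sketch of a compute_metrics function that would produce this metric (the exact training script is not published, so this is an assumption, not the original code):

import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    # The Trainer prefixes these keys with "eval_", yielding eval_macro_f1.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "macro_f1": f1_score(labels, preds, average="macro"),
    }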

HPO Parameter Ranges

# Search space sampled from an Optuna trial (trial: optuna.Trial)
params = {
    "learning_rate": trial.suggest_float("learning_rate", 5e-6, 5e-5, log=True),
    "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [16, 32]),
    "per_device_eval_batch_size": trial.suggest_categorical("per_device_eval_batch_size", [32]),
    "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.1),
    "warmup_ratio": trial.suggest_float("warmup_ratio", 0.0, 0.2),
    "num_train_epochs": trial.suggest_int("num_train_epochs", 6, 8),
    "gradient_accumulation_steps": trial.suggest_categorical("gradient_accumulation_steps", [1, 2, 4]),
}
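
One way a search space like the params dict above can be used is through Trainer.hyperparameter_search with the Optuna backend. The sketch below is an assumption about the setup; model_init, the tokenized train_ds/eval_ds splits, the compute_metrics function and the trial budget are placeholders, not taken from the original script:

from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

base_model = "dbmdz/bert-base-turkish-128k-uncased"

def model_init():
    # A fresh copy of the base model is created for every trial.
    return AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=3)

def hp_space(trial):
    # Returns the search space shown above for each trial
    # (remaining ranges omitted here for brevity).
    return {
        "learning_rate": trial.suggest_float("learning_rate", 5e-6, 5e-5, log=True),
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [16, 32]),
        # ... the other ranges from the params dict above ...
    }

trainer = Trainer(
    model_init=model_init,
    args=TrainingArguments(output_dir="hpo", eval_strategy="epoch"),
    train_dataset=train_ds,            # tokenized training split (placeholder)
    eval_dataset=eval_ds,              # tokenized validation split (placeholder)
    compute_metrics=compute_metrics,   # e.g. the macro-F1 function sketched earlier
)

best_trial = trainer.hyperparameter_search(
    direction="maximize",              # maximize eval_macro_f1
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_macro_f1"],
    n_trials=20,                       # assumed trial budget; not reported in this card
)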

Best Trial Hyperparameters

{
  "learning_rate": 4.834910037829986e-05,
  "per_device_train_batch_size": 32,
  "per_device_eval_batch_size": 32,
  "weight_decay": 0.07239811900518305,
  "warmup_ratio": 0.13792340720630256,
  "num_train_epochs": 8,
  "gradient_accumulation_steps": 2
}
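
For a final training run, these values map directly onto TrainingArguments. A minimal sketch; the output_dir and evaluation settings are assumptions, not taken from the original script:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bert_base_128k_uncased_finetuned_optuna_turkish_sentiment",  # placeholder
    learning_rate=4.834910037829986e-05,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    weight_decay=0.07239811900518305,
    warmup_ratio=0.13792340720630256,
    num_train_epochs=8,
    gradient_accumulation_steps=2,   # effective train batch size: 32 * 2 = 64
    eval_strategy="epoch",           # assumed; matches the per-epoch metrics below
)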

Evaluation Results

Metrics on the evaluation set at the end of training (epoch 8):

| Metric | Value |
|---|---|
| accuracy | 0.9231 |
| loss | 0.3944 |
| macro_f1 | 0.6578 |
| runtime (s) | 2.5646 |
| samples_per_second | 1364.7420 |
| steps_per_second | 42.8920 |

Epoch-wise Metrics

| epoch | train_loss | eval_accuracy | eval_loss | eval_macro_f1 | eval_runtime | eval_samples_per_second | eval_steps_per_second |
|---|---|---|---|---|---|---|---|
| 1 | 0.4133 | 0.9246 | 0.2150 | 0.6037 | 2.6201 | 1335.8380 | 41.9830 |
| 2 | 0.2032 | 0.9251 | 0.2049 | 0.5893 | 2.5654 | 1364.3220 | 42.8790 |
| 3 | 0.1631 | 0.9334 | 0.1925 | 0.6153 | 2.5461 | 1374.6530 | 43.2030 |
| 4 | 0.1231 | 0.9306 | 0.2484 | 0.6318 | 2.5557 | 1369.5000 | 43.0410 |
| 5 | 0.0899 | 0.9289 | 0.2577 | 0.6352 | 2.7704 | 1263.3530 | 39.7050 |
| 6 | 0.0604 | 0.9257 | 0.3342 | 0.6706 | 2.6759 | 1307.9580 | 41.1070 |
| 7 | 0.0383 | 0.9240 | 0.3696 | 0.6618 | 2.5877 | 1352.5710 | 42.5090 |
| 8 | 0.0232 | 0.9231 | 0.3944 | 0.6578 | 2.5646 | 1364.7420 | 42.8920 |

How to use - Pipeline

from transformers import pipeline

# Load the classification pipeline with the specified model
model_name = "msamilim/bert_base_128k_uncased_finetuned_optuna_turkish_sentiment"
pipe = pipeline("text-classification", model=model_name)

# Classify a new sentence
sentence = "Güzel ürün, tavsiye ederim."
result = pipe(sentence)

# Print the result
print(result)

# Example output : 
# [{'label': 'pozitif', 'score': 0.9998408555984497}]
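
To get a score for every class rather than only the top prediction, the pipeline call also accepts a top_k argument. A small sketch reusing the pipe object above:

# Return the score for all three labels instead of only the most likely one.
all_scores = pipe(sentence, top_k=None)
print(all_scores)
# One entry per label, each with its score.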

How to use - Full Classification

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "msamilim/bert_base_128k_uncased_finetuned_optuna_turkish_sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def predict_sentiment(texts):
    inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
    id2label = {0: "Negatif", 1: "Nötr", 2: "Pozitif"}
    return [id2label[p] for p in torch.argmax(probabilities, dim=-1).tolist()]

texts = [
     "Güzel ürün, tavsiye ederim kullanılır.", 
     "Ürün çok güzel ve kaliteli. Maalesef yüzüme uymadığı için iade etmek zorunda kaldım.", 
     "Keşke aldıktan sonra indirime girmeseydi.",
     "Daha soluk ve mat yapısı var beğenmedim .",
]

for text, sentiment in zip(texts, predict_sentiment(texts)):
    print(f"Text: {text}\nSentiment: {sentiment}\n")

# Example output : 
# Text: Güzel ürün, tavsiye ederim kullanılır.
# Sentiment: Pozitif
# Text: Ürün çok güzel ve kaliteli. Maalesef yüzüme uymadığı için iade etmek zorunda kaldım.
# Sentiment: Pozitif
# Text: Keşke aldıktan sonra indirime girmeseydi.
# Sentiment: Negatif
# Text: Daha soluk ve mat yapısı var beğenmedim .
# Sentiment: Negatif
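
For larger batches of reviews, inference can be moved to a GPU when one is available. A minimal optional sketch reusing the model, tokenizer, and texts defined above (not part of the original example):

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=512).to(device)
with torch.no_grad():
    logits = model(**inputs).logits
preds = torch.argmax(logits, dim=-1).tolist()
print(preds)  # label ids; map them with id2label as in predict_sentiment above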


Framework versions

  • transformers==4.56.2
  • torch==2.5.1+cu121
  • datasets==4.1.1
  • accelerate==1.10.1
  • evaluate==0.4.6
  • python==3.11.13