This model is a fine-tuned version of dbmdz/bert-base-turkish-cased specifically trained for aspect term extraction from Turkish e-commerce product reviews.
The model tags each token with one of three labels (BIO scheme):

- `B-ASPECT`: Beginning of an aspect term
- `I-ASPECT`: Inside/continuation of an aspect term
- `O`: Outside (not an aspect term)

The model showed consistent improvement across epochs:
| Epoch | Loss   |
|-------|--------|
| 1     | 0.1758 |
| 2     | 0.1749 |
| 3     | 0.1217 |
| 4     | 0.1079 |
| 5     | 0.0699 |
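The training data and exact hyperparameters are not published on this card. The block below is only a minimal sketch of how such a BIO token-classification fine-tune of dbmdz/bert-base-turkish-cased is commonly set up with the Hugging Face `Trainer`; the toy dataset, batch size, learning rate, and sub-token label-alignment choice are illustrative assumptions, and only the 3-label scheme and the 5-epoch schedule come from this card.

```python
from datasets import Dataset
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

id2label = {0: "O", 1: "B-ASPECT", 2: "I-ASPECT"}
label2id = {label: idx for idx, label in id2label.items()}

base_model = "dbmdz/bert-base-turkish-cased"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForTokenClassification.from_pretrained(
    base_model, num_labels=3, id2label=id2label, label2id=label2id
)

# Toy stand-in for the (unreleased) review dataset: pre-split words with word-level BIO tags
train_data = Dataset.from_dict({
    "tokens": [["Bu", "telefonun", "kamerası", "çok", "iyi", "."]],
    "tags": [[0, 0, 1, 0, 0, 0]],
})

def tokenize_and_align(batch):
    enc = tokenizer(batch["tokens"], is_split_into_words=True, truncation=True)
    all_labels = []
    for i, word_tags in enumerate(batch["tags"]):
        word_ids = enc.word_ids(batch_index=i)
        labels, prev = [], None
        for wid in word_ids:
            if wid is None:
                labels.append(-100)            # special tokens are ignored by the loss
            elif wid != prev:
                labels.append(word_tags[wid])  # first sub-token keeps the word label
            else:
                # Continuation sub-tokens: propagate I-ASPECT inside aspects, O elsewhere
                # (an assumption consistent with outputs like "##sı: I-ASPECT" below)
                labels.append(2 if word_tags[wid] != 0 else 0)
            prev = wid
        all_labels.append(labels)
    enc["labels"] = all_labels
    return enc

train_dataset = train_data.map(tokenize_and_align, batched=True, remove_columns=train_data.column_names)

args = TrainingArguments(
    output_dir="bert-turkish-ecomm-aspect-extraction",
    num_train_epochs=5,               # the only value taken from the loss table above
    per_device_train_batch_size=16,   # illustrative
    learning_rate=5e-5,               # illustrative
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```

Propagating `I-ASPECT` to continuation sub-tokens is assumed here because the sample outputs further down tag pieces such as `##sı` as `I-ASPECT`; the original training may have handled alignment differently (e.g. masking continuation pieces with -100).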
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")
model = AutoModelForTokenClassification.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")

# Create a token-classification pipeline that aggregates sub-tokens into aspect spans
aspect_extractor = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

# Example usage
text = "Bu telefonun kamerası çok iyi ama bataryası yetersiz."
results = aspect_extractor(text)
print(results)
```
Expected Output:

```
[{'entity_group': 'ASPECT', 'score': 0.99498886, 'word': 'kamerası', 'start': 13, 'end': 21},
 {'entity_group': 'ASPECT', 'score': 0.9970175, 'word': 'bataryası', 'start': 34, 'end': 43}]
```
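Because `aggregation_strategy="simple"` already merges sub-tokens into whole words, the aspect strings can be read straight from the result dictionaries. Continuing the example above:

```python
# Keep only the surface forms of the detected aspect terms
aspects = [r["word"] for r in results]
print(aspects)  # ['kamerası', 'bataryası']
```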
```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")
model = AutoModelForTokenClassification.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")

# Example text
text = "Bu telefonun kamerası çok iyi ama bataryası yetersiz."

# Tokenize input
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class_ids = predictions.argmax(dim=-1)

# Convert predictions to labels
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
predicted_labels = [model.config.id2label[class_id.item()] for class_id in predicted_class_ids[0]]

# Display results, skipping special tokens
for token, label in zip(tokens, predicted_labels):
    if token not in ["[CLS]", "[SEP]", "[PAD]"]:
        print(f"{token}: {label}")
```
Expected Output:

```
Bu: O
telefonun: O
kamerası: B-ASPECT
çok: O
iyi: O
ama: O
batarya: B-ASPECT
##sı: I-ASPECT
yetersiz: O
.: O
```
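Since WordPiece splits some aspect words (here "bataryası" into "batarya" + "##sı"), the raw token-level labels usually need a small merging step to recover whole aspect terms. Continuing with the `tokens` and `predicted_labels` variables from the snippet above, a minimal BIO-decoding sketch:

```python
# Merge WordPiece pieces and BIO tags back into full aspect terms
aspects, current = [], ""
for token, label in zip(tokens, predicted_labels):
    if token in ("[CLS]", "[SEP]", "[PAD]"):
        continue
    if label == "B-ASPECT":
        if current:
            aspects.append(current)
        current = token
    elif label == "I-ASPECT" and current:
        # "##" marks a continuation piece of the same word
        current += token[2:] if token.startswith("##") else " " + token
    else:
        if current:
            aspects.append(current)
        current = ""
if current:
    aspects.append(current)

print(aspects)  # ['kamerası', 'bataryası'] for the example sentence
```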
```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")
model = AutoModelForTokenClassification.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")

# Example texts for batch processing
texts = [
    "Bu telefonun kamerası çok iyi ama bataryası yetersiz.",
    "Ürünün fiyatı uygun ancak kalitesi düşük.",
    "Teslimat hızı mükemmel, ambalaj da gayet sağlam.",
]

# Tokenize all texts with padding so they form a single batch
inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True)

# Get predictions for all texts
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class_ids = predictions.argmax(dim=-1)

# Process results for each text
for i, text in enumerate(texts):
    print(f"\nText {i + 1}: {text}")
    print("-" * 50)

    # Get tokens and labels for this specific text
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][i])
    predicted_labels = [model.config.id2label[class_id.item()] for class_id in predicted_class_ids[i]]

    # Display results, skipping special tokens
    for token, label in zip(tokens, predicted_labels):
        if token not in ["[CLS]", "[SEP]", "[PAD]"]:
            print(f"{token}: {label}")
```
Expected Output:

```
Text 1: Bu telefonun kamerası çok iyi ama bataryası yetersiz.
Bu: O
telefonun: O
kamerası: B-ASPECT
çok: O
iyi: O
ama: O
batarya: B-ASPECT
##sı: I-ASPECT
yetersiz: O
.: O

Text 2: Ürünün fiyatı uygun ancak kalitesi düşük.
Ürünün: O
fiyatı: B-ASPECT
uygun: O
ancak: O
kalitesi: B-ASPECT
düşük: O
.: O

Text 3: Teslimat hızı mükemmel, ambalaj da gayet sağlam.
Teslim: B-ASPECT
##at: I-ASPECT
hızı: I-ASPECT
mükemmel: O
,: O
ambalaj: B-ASPECT
da: O
gayet: O
sağlam: O
.: O
```
The model uses the following label mapping:

```python
id2label = {
    0: "O",
    1: "B-ASPECT",
    2: "I-ASPECT"
}

label2id = {
    "O": 0,
    "B-ASPECT": 1,
    "I-ASPECT": 2
}
```
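These mappings are also stored in the checkpoint configuration, so downstream code can read them instead of hard-coding them; a quick check:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")
print(config.id2label)  # should match the mapping listed above
print(config.label2id)
```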
This model is designed for aspect term extraction from Turkish e-commerce product reviews, for example as the aspect-identification stage of an aspect-based sentiment analysis pipeline.
If you use this model, please cite:

```bibtex
@misc{turkish-bert-aspect-extraction,
  title={Turkish BERT for Aspect Term Extraction},
  author={Abdullah Koçak},
  year={2025},
  url={https://huggingface.co/opdullah/bert-turkish-ecomm-aspect-extraction}
}
```

Please also cite the base BERTurk model:

```bibtex
@misc{schweter2020bertbase,
  title={BERTurk - BERT models for Turkish},
  author={Stefan Schweter},
  year={2020},
  publisher={Hugging Face},
  url={https://huggingface.co/dbmdz/bert-base-turkish-cased}
}
```