---
language:
- en
- hu
- de
library_name: transformers
tags:
- text-classification
- multilingual
- distilbert
- fine-tuned
datasets:
- custom
model_name: EGD_distilbert-base-multilingual-cased
model_type: distilbert-base-multilingual-cased
license: apache-2.0
---
# EGD DistilBERT (Multilingual Cased)
## Model Overview
This model is based on **DistilBERT-base-multilingual-cased** and has been **fine-tuned on English, Hungarian, and German** data for text classification of **European Parliamentary speeches** into rhetorical categories.
The model classifies text into three categories:
- **0 - Other** (text that does not fit into moralist or realist categories)
- **1 - Moralist** (arguments emphasizing moral reasoning)
- **2 - Realist** (arguments applying pragmatic or realist reasoning)
This model is useful for **analyzing political discourse and rhetorical styles** in multiple languages.
---
## Evaluation Results
The model was evaluated on a **test set of 938 sentences**, with the following results:
| Label | Precision | Recall | F1-score | Support |
|--------|-----------|--------|----------|---------|
| **0 - Other** | 0.91 | 0.92 | 0.92 | 783 |
| **1 - Moralist** | 0.49 | 0.40 | 0.44 | 65 |
| **2 - Realist** | 0.43 | 0.44 | 0.44 | 90 |
- **Overall accuracy:** **0.84**
- **Macro average F1-score:** **0.60**
- **Weighted average F1-score:** **0.84**
The model reliably distinguishes the general (other) class from moralist and realist arguments, though performance on the minority classes (1 and 2) is lower.
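The macro and weighted averages follow directly from the per-class F1 scores and supports in the table above; a quick check:

```python
# Per-class F1 scores and supports, taken from the evaluation table above
f1 = [0.92, 0.44, 0.44]
support = [783, 65, 90]

# Macro average: unweighted mean over classes
macro_f1 = sum(f1) / len(f1)

# Weighted average: mean weighted by class support
weighted_f1 = sum(f * s for f, s in zip(f1, support)) / sum(support)

print(round(macro_f1, 2))     # 0.6
print(round(weighted_f1, 2))  # 0.84
```

The gap between the two averages reflects the class imbalance: the majority "Other" class (783 of 938 sentences) dominates the weighted score.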
---
## Usage
This model can be used with the **Hugging Face Transformers library**:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "uvegesistvan/EGD_distilbert-base-multilingual-cased"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Classify an example text
text = "The European Union has a responsibility towards future generations."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Get predicted class (0 = Other, 1 = Moralist, 2 = Realist)
predicted_class = logits.argmax(dim=-1).item()
print(f"Predicted class: {predicted_class}")
```
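To turn raw logits into human-readable labels with confidence scores, the post-processing can be kept separate from the model call. This is a minimal sketch; the `LABELS` mapping below is an assumption taken from the category list in this card, not read from the model's `id2label` config.

```python
import torch

# Assumed label mapping, from the category list above
# (not loaded from the model configuration)
LABELS = {0: "Other", 1: "Moralist", 2: "Realist"}

def decode_predictions(logits):
    """Map a batch of logits (shape [batch, 3]) to (label, probability) pairs."""
    probs = torch.softmax(logits, dim=-1)
    ids = probs.argmax(dim=-1)
    return [
        (LABELS[i.item()], probs[row, i.item()].item())
        for row, i in enumerate(ids)
    ]
```

With the model loaded as above, `decode_predictions(model(**inputs).logits)` returns one `(label, probability)` pair per input sentence.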