Update README.md
README.md
CHANGED
@@ -18,44 +18,56 @@ license: apache-2.0
 
 # EGD DistilBERT (Multilingual Cased)
 
-##
+## Model Overview
 
-
+This model is based on **DistilBERT-base-multilingual-cased** and has been **fine-tuned on English, Hungarian, and German** data for text classification of **European Parliamentary speeches** into rhetorical categories.
 
-
-- **0 - Other** (
-- **1 -
-- **2 -
+The model classifies text into three categories:
+- **0 - Other** (text that does not fit into moralist or realist categories)
+- **1 - Moralist** (arguments emphasizing moral reasoning)
+- **2 - Realist** (arguments applying pragmatic or realist reasoning)
 
-
+This model is useful for **analyzing political discourse and rhetorical styles** in multiple languages.
 
 ---
 
-##
+## Evaluation Results
 
-
+The model was evaluated on a **test set of 938 sentences**, with the following results:
 
 | Label | Precision | Recall | F1-score | Support |
 |--------|-----------|--------|----------|---------|
 | **0 - Other** | 0.91 | 0.92 | 0.92 | 783 |
-| **1 -
-| **2 -
+| **1 - Moralist** | 0.49 | 0.40 | 0.44 | 65 |
+| **2 - Realist** | 0.43 | 0.44 | 0.44 | 90 |
 
--
-- **
-- **
+- **Overall accuracy:** **0.84**
+- **Macro average F1-score:** **0.60**
+- **Weighted average F1-score:** **0.84**
 
-
+The model reliably distinguishes the general (other) class from moralist and realist arguments, though performance on the minority classes (1 and 2) is lower.
 
 ---
 
-##
+## Usage
 
-
+This model can be used with the **Hugging Face Transformers library**:
 
 ```python
 from transformers import AutoTokenizer, AutoModelForSequenceClassification
 
 model_name = "uvegesistvan/EGD_distilbert-base-multilingual-cased"
 
-#
+# Load tokenizer and model
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+# Classify an example text
+text = "The European Union has a responsibility towards future generations."
+inputs = tokenizer(text, return_tensors="pt")
+outputs = model(**inputs)
+logits = outputs.logits
+
+# Get predicted class
+predicted_class = logits.argmax().item()
+print(f"Predicted class: {predicted_class}")
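
The usage snippet in this diff prints only the class index. A minimal sketch of how that index could be turned into a label name and a confidence score follows; it is not part of the commit. `ID2LABEL`, `softmax`, and `describe_prediction` are hypothetical helpers, and the id-to-name mapping is assumed from the category list in the README rather than read from the model's config. The softmax is written in plain Python so the example runs without downloading the model.

```python
import math

# Assumed mapping, copied from the category list in the README
# (not read from the model config).
ID2LABEL = {0: "Other", 1: "Moralist", 2: "Realist"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe_prediction(logits):
    """Return (label name, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return ID2LABEL[best], probs[best]


# Made-up logits standing in for one row of `outputs.logits`.
label, prob = describe_prediction([0.3, 1.9, -0.7])
print(f"{label} ({prob:.2f})")
```

With the model loaded as in the README snippet, the input would be `outputs.logits[0].detach().tolist()`. Given the lower scores reported for classes 1 and 2, the probability is worth inspecting alongside the label.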