uvegesistvan/EGD_distilbert-base-multilingual-cased

@@ -18,44 +18,56 @@ license: apache-2.0
 # EGD DistilBERT (Multilingual Cased)
-## Modell áttekintés
-Ez a modell egy **DistilBERT-base-multilingual-cased** architektúrára épül, és **angol, magyar és német** nyelvű adatokon lett finomhangolva **Európai Parlamenti beszédek** moralista és realista jellegének osztályozására.
-A modell három kategóriába sorolja a szövegeket:
-- **0 - Other** (egyéb, nem besorolható szöveg)
-- **1 - Moralista** (morális érveket hangsúlyozó beszédek)
-- **2 - Realista** (realista vagy pragmatikus érvelést alkalmazó beszédek)
-A modell alkalmas **politikai diskurzusok**, illetve **retorikai stílusok elemzésére** a három nyelven.
 ---
-## Eredmények
-A modellt egy **938 mondatból álló teszthalmazon** értékeltük. Az eredmények az alábbiak:
 | Label  | Precision | Recall | F1-score | Support |
 |--------|-----------|--------|----------|---------|
 | **0 - Other** | 0.91 | 0.92 | 0.92 | 783 |
-| **1 - Moralista** | 0.49 | 0.40 | 0.44 | 65 |
-| **2 - Realista** | 0.43 | 0.44 | 0.44 | 90 |
-- **Összpontosság (accuracy):** **0.84**
-- **Makro átlag (macro avg) F1-score:** **0.60**
-- **Súlyozott átlag (weighted avg) F1-score:** **0.84**
-A modell megbízhatóan különbözteti meg az általános (other) osztályt a moralista és realista érvektől, ugyanakkor a kisebb osztályoknál (1 és 2) a pontosság alacsonyabb.
 ---
-## Használat
-A modellt a **Hugging Face Transformers könyvtárával** töltheted be és használhatod:
 ```python
 from transformers import AutoTokenizer, AutoModelForSequenceClassification
 model_name = "uvegesistvan/EGD_distilbert-base-multilingual-cased"
-#

 # EGD DistilBERT (Multilingual Cased)
+## Model Overview
+This model is based on **DistilBERT-base-multilingual-cased** and has been **fine-tuned on English, Hungarian, and German** data for text classification of **European Parliamentary speeches** into rhetorical categories.
+The model classifies text into three categories:
+- **0 - Other** (text that does not fit into moralist or realist categories)
+- **1 - Moralist** (arguments emphasizing moral reasoning)
+- **2 - Realist** (arguments applying pragmatic or realist reasoning)
+This model is useful for **analyzing political discourse and rhetorical styles** in multiple languages.
 ---
+## Evaluation Results
+The model was evaluated on a **test set of 938 sentences**, with the following results:
 | Label  | Precision | Recall | F1-score | Support |
 |--------|-----------|--------|----------|---------|
 | **0 - Other** | 0.91 | 0.92 | 0.92 | 783 |
+| **1 - Moralist** | 0.49 | 0.40 | 0.44 | 65 |
+| **2 - Realist** | 0.43 | 0.44 | 0.44 | 90 |
+- **Overall accuracy:** **0.84**
+- **Macro average F1-score:** **0.60**
+- **Weighted average F1-score:** **0.84**
+The model reliably distinguishes the general (other) class from moralist and realist arguments, though performance on the minority classes (1 and 2) is lower.
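
The summary figures above can be recomputed from the per-class rows of the table. The following is a minimal sketch, assuming only the rounded values shown in the table; the variable names are illustrative and do not come from the model repository. It relies on two standard identities: the macro average is the unweighted mean of the per-class F1 scores, and accuracy equals the support-weighted mean of per-class recall.

```python
# Per-class rows copied from the table above: (precision, recall, F1, support).
# The names below are illustrative, not part of the model repository.
rows = {
    "Other": (0.91, 0.92, 0.92, 783),
    "Moralist": (0.49, 0.40, 0.44, 65),
    "Realist": (0.43, 0.44, 0.44, 90),
}

# Total number of test sentences.
total = sum(support for _, _, _, support in rows.values())

# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1 for _, _, f1, _ in rows.values()) / len(rows)

# Weighted average: per-class F1 weighted by class support.
weighted_f1 = sum(f1 * support for _, _, f1, support in rows.values()) / total

# Accuracy is the support-weighted mean of per-class recall.
accuracy = sum(recall * support for _, recall, _, support in rows.values()) / total

print(f"support total: {total}")
print(f"macro F1:      {macro_f1:.2f}")
print(f"weighted F1:   {weighted_f1:.2f}")
print(f"accuracy:      {accuracy:.2f}")
```

Because the inputs are already rounded to two decimals, the recomputed values are approximate, but they agree with the reported 938 sentences, 0.60 macro F1, 0.84 weighted F1, and 0.84 accuracy. The gap between the macro and weighted averages reflects the class imbalance: 783 of the 938 sentences belong to class 0.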
 ---
+## Usage
+This model can be used with the **Hugging Face Transformers library**:
 ```python
 from transformers import AutoTokenizer, AutoModelForSequenceClassification
 model_name = "uvegesistvan/EGD_distilbert-base-multilingual-cased"
+# Load tokenizer and model
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForSequenceClassification.from_pretrained(model_name)
+# Classify an example text
+text = "The European Union has a responsibility towards future generations."
+inputs = tokenizer(text, return_tensors="pt")
+outputs = model(**inputs)
+logits = outputs.logits
+# Get predicted class
+predicted_class = logits.argmax().item()
+print(f"Predicted class: {predicted_class}")