---
|
language: |
|
- en |
|
- hu |
|
- de |
|
library_name: transformers |
|
tags: |
|
- text-classification |
|
- multilingual |
|
- distilbert |
|
- fine-tuned |
|
datasets: |
|
- custom |
|
model_name: EGD_distilbert-base-multilingual-cased |
|
model_type: distilbert-base-multilingual-cased |
|
license: apache-2.0 |
|
--- |
|
|
|
# EGD DistilBERT (Multilingual Cased) |
|
|
|
## Model Overview |
|
|
|
This model is based on **DistilBERT-base-multilingual-cased** and was **fine-tuned on English, Hungarian, and German** data to classify **European Parliament speeches** into rhetorical categories.
|
|
|
The model classifies text into three categories: |
|
- **0 - Other** (text that does not fit into moralist or realist categories) |
|
- **1 - Moralist** (arguments emphasizing moral reasoning) |
|
- **2 - Realist** (arguments emphasizing pragmatic or realist reasoning)
|
|
|
This model is useful for **analyzing political discourse and rhetorical styles** in multiple languages. |
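When working with predictions programmatically, it is convenient to keep the id-to-label mapping from the list above in a small dict. A minimal sketch (the label names come from this card; the `label_name` helper is purely illustrative):

```python
# Mapping from class id to rhetorical category, as described above
ID2LABEL = {0: "Other", 1: "Moralist", 2: "Realist"}

def label_name(class_id: int) -> str:
    """Return the human-readable label for a predicted class id."""
    return ID2LABEL[class_id]

print(label_name(1))  # Moralist
```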
|
|
|
--- |
|
|
|
## Evaluation Results |
|
|
|
The model was evaluated on a **test set of 938 sentences**, with the following results: |
|
|
|
| Label | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| **0 - Other** | 0.91 | 0.92 | 0.92 | 783 |
| **1 - Moralist** | 0.49 | 0.40 | 0.44 | 65 |
| **2 - Realist** | 0.43 | 0.44 | 0.44 | 90 |
|
|
|
- **Overall accuracy:** **0.84** |
|
- **Macro average F1-score:** **0.60** |
|
- **Weighted average F1-score:** **0.84** |
|
|
|
The model reliably separates the general (Other) class from moralist and realist arguments, but performance on the two minority classes is noticeably lower, which is consistent with their small support in the test set (65 and 90 sentences versus 783).
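The gap between the macro and weighted averages follows directly from this class imbalance. As a quick sanity check, both can be recomputed from the per-class F1 scores and supports reported in the table above:

```python
# Per-class F1 scores and supports, taken from the evaluation table above
f1 = {"Other": 0.92, "Moralist": 0.44, "Realist": 0.44}
support = {"Other": 783, "Moralist": 65, "Realist": 90}

# Macro F1: unweighted mean over the three classes
macro_f1 = sum(f1.values()) / len(f1)

# Weighted F1: mean weighted by each class's support
total = sum(support.values())
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total

print(round(macro_f1, 2))     # 0.6
print(round(weighted_f1, 2))  # 0.84
```

Because the Other class accounts for 783 of the 938 test sentences, the weighted average is dominated by its strong F1 score, while the macro average exposes the weaker minority-class performance.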
|
|
|
--- |
|
|
|
## Usage |
|
|
|
This model can be used with the **Hugging Face Transformers library**: |
|
|
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "uvegesistvan/EGD_distilbert-base-multilingual-cased"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Classify an example text
text = "The European Union has a responsibility towards future generations."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits

# Get predicted class (0 = Other, 1 = Moralist, 2 = Realist)
predicted_class = logits.argmax(dim=-1).item()
print(f"Predicted class: {predicted_class}")
```
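If you want class probabilities rather than a hard prediction, apply a softmax over the logits (with the model loaded as above, `torch.softmax(logits, dim=-1)` does this directly). A minimal pure-Python sketch of the operation, using made-up scores rather than real model output:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative raw scores for the three classes (not real model output)
probs = softmax([2.0, 0.5, -1.0])
print([round(p, 3) for p in probs])  # [0.786, 0.175, 0.039]
```

The resulting probabilities sum to 1 and preserve the ordering of the logits, so the argmax prediction is unchanged; the probabilities are useful when you want a confidence estimate alongside the predicted class.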