Improve model card: Add pipeline tag, update license, GitHub link, and detailed sections
This PR significantly enhances the model card for `mdeberta-v3-base-subjectivity-german` by:
* **Updating metadata**:
  * Adding `pipeline_tag: text-classification` for improved discoverability.
  * Correcting the `license` from `mit` to `cc-by-4.0`, as specified in the GitHub repository.
  * Adding relevant `tags` such as `subjectivity-detection`, `mdeberta-v3`, and `sentiment`.
* **Populating content sections**:
  * Adding a direct link to the **GitHub repository**.
  * Filling in comprehensive details for "Model description," "Intended uses & limitations," and "Training and evaluation data" using information from the paper abstract and GitHub README.
  * Providing a clear **sample usage** example with Python code.
  * Including a proper **citation** for the paper.
Together, these changes give users the context, usage instructions, and provenance they need, making the model more accessible and useful on the Hugging Face Hub.
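Because the card now declares `pipeline_tag: text-classification`, the model can also be loaded through the high-level `pipeline` API. A minimal sketch (the returned label names depend on the `id2label` mapping in the model config):

```python
from transformers import pipeline

# Load the fine-tuned checkpoint as a text-classification pipeline.
classifier = pipeline(
    "text-classification",
    model="MatteoFasulo/mdeberta-v3-base-subjectivity-german",
)

# Returns e.g. [{'label': ..., 'score': ...}] for the German input below.
print(classifier("Ich denke, dass diese Entscheidung eine Katastrophe ist."))  # "I think that this decision is a disaster."
```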
````diff
@@ -1,17 +1,21 @@
 ---
-library_name: transformers
-license: mit
 base_model: microsoft/mdeberta-v3-base
-tags:
-- generated_from_trainer
+language:
+- de
+library_name: transformers
+license: cc-by-4.0
 metrics:
 - accuracy
 - f1
+pipeline_tag: text-classification
+tags:
+- generated_from_trainer
+- subjectivity-detection
+- mdeberta-v3
+- sentiment
 model-index:
 - name: mdeberta-v3-base-subjectivity-german
   results: []
-language:
-- de
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -19,7 +23,12 @@ should probably proofread and complete it, then remove this comment. -->
 
 # mdeberta-v3-base-subjectivity-german
 
-This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base)
+This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) for **Subjectivity Detection in News Articles**, as presented in the paper [AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles](https://arxiv.org/abs/2507.11764).
+
+It was developed as part of AI Wizards' participation in the CLEF 2025 CheckThat! Lab Task 1.
+
+**Code:** [https://github.com/MatteoFasulo/clef2025-checkthat](https://github.com/MatteoFasulo/clef2025-checkthat)
+
 It achieves the following results on the evaluation set:
 - Loss: 0.5760
 - Macro F1: 0.7720
@@ -32,15 +41,50 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-More information needed
+This model classifies sentences in news articles as either **subjective** (e.g., opinion-laden) or **objective**, a key component in combating misinformation, improving fact-checking pipelines, and supporting journalists. It is based on the mDeBERTaV3-base architecture and enhances the transformer-based classifier by combining sentence representations with sentiment scores derived from an auxiliary model. This sentiment-augmented architecture, together with decision threshold calibration to address class imbalance, significantly boosts performance, especially the subjective F1 score.
 
 ## Intended uses & limitations
 
-More information needed
+This model is intended for classifying sentences in news articles as subjective or objective. It has been evaluated in monolingual (Arabic, German, English, Italian, Bulgarian), multilingual, and zero-shot transfer settings (Greek, Polish, Romanian, Ukrainian), and is particularly useful for applications requiring fine-grained text analysis in news contexts, such as misinformation detection, fact-checking, and journalistic tools.
+
+**Limitations:**
+Although the model is designed to handle class imbalance through decision threshold calibration, its performance may vary across languages and class distributions; as highlighted in the original work, initial submission errors showed how sensitive results are to proper calibration. The model also focuses on news-article text, so generalization to other domains or to highly nuanced subjective expressions may be limited.
 
 ## Training and evaluation data
 
-More information needed
+The model was trained and evaluated on the datasets provided for **CLEF 2025 CheckThat! Lab Task 1: Subjectivity Detection in News Articles**. Training and development sets were provided for Arabic, German, English, Italian, and Bulgarian; the final evaluation additionally covered unseen languages (e.g., Greek, Romanian, Polish, Ukrainian) to assess generalization. Training focused on enhancing transformer embeddings with sentiment signals and on decision threshold calibration to mitigate the class imbalance prevalent across languages.
+
+## How to use
+You can use this model directly with the Hugging Face `transformers` library for text classification:
+
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+import torch
+
+model_name = "MatteoFasulo/mdeberta-v3-base-subjectivity-german"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+# Example usage for an objective sentence:
+text_objective = "Der Bundeskanzler traf sich heute mit dem französischen Präsidenten."  # The Chancellor met the French President today.
+inputs_obj = tokenizer(text_objective, return_tensors="pt")
+
+with torch.no_grad():
+    logits_obj = model(**inputs_obj).logits
+
+predicted_class_id_obj = logits_obj.argmax().item()
+print(f"'{text_objective}' is classified as: {model.config.id2label[predicted_class_id_obj]}")
+
+# Example usage for a subjective sentence:
+text_subjective = "Ich denke, dass diese Entscheidung eine Katastrophe ist."  # I think that this decision is a disaster.
+inputs_subj = tokenizer(text_subjective, return_tensors="pt")
+
+with torch.no_grad():
+    logits_subj = model(**inputs_subj).logits
+
+predicted_class_id_subj = logits_subj.argmax().item()
+print(f"'{text_subjective}' is classified as: {model.config.id2label[predicted_class_id_subj]}")
+```
 
 ## Training procedure
 
@@ -72,4 +116,16 @@ The following hyperparameters were used during training:
 - Transformers 4.49.0
 - Pytorch 2.5.1+cu121
 - Datasets 3.3.1
-- Tokenizers 0.21.0
+- Tokenizers 0.21.0
+
+## Citation
+If you find our work helpful or inspiring, please feel free to cite it:
+```bibtex
+@article{aiwizards2025checkthat,
+  title={AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles},
+  author={Fasulo, Matteo and Babboni, Luca and Tedeschini, Luca},
+  journal={arXiv preprint arXiv:2507.11764},
+  year={2025},
+  url={https://arxiv.org/abs/2507.11764}
+}
+```
````