Improve model card: Add pipeline tag, update license, expand details and links
This PR significantly enhances the model card by:
* Updating the `license` from `mit` to `cc-by-4.0`, as indicated in the official GitHub repository.
* Adding the `pipeline_tag: text-classification` for improved discoverability and direct usage via `transformers` pipelines.
* Enriching `tags` with relevant keywords like `subjectivity-detection`, `multilingual`, `sentiment`, `news`, and `mdeberta-v3`.
* Adding `language` tags for all supported languages (Arabic, German, English, Italian, Bulgarian, Greek, Polish, Romanian, Ukrainian).
* Adding a `datasets` tag for `clef-2025-checkthat-lab-task-1-subjectivity`.
* Populating the "Model description", "Intended uses & limitations", and "Training and evaluation data" sections with comprehensive details extracted from the paper abstract and the GitHub README.
* Adding direct links to the official GitHub repository and the Hugging Face collection related to the project.
* Including a clear "How to use" code snippet for easy model inference.
* Adding a BibTeX "Citation" for the associated paper.
These updates provide a much richer and more user-friendly model card, improving clarity, discoverability, and adherence to Hugging Face Hub best practices.
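The metadata changes listed above can be sanity-checked before merging. The following is a minimal sketch, not part of the PR: `CARD` is an abridged excerpt of the updated front matter, and `parse_front_matter` is a hypothetical helper that handles only flat `key: value` pairs and simple `- item` lists. It is not a full YAML parser, so nested entries such as `model-index` are not interpreted correctly.

```python
import re

# Abridged excerpt of the card's front matter after this PR (not the full file).
CARD = """---
base_model:
- microsoft/mdeberta-v3-base
library_name: transformers
license: cc-by-4.0
language:
- ar
- de
- en
pipeline_tag: text-classification
---

# mdeberta-v3-base-subjectivity-sentiment-multilingual
"""


def parse_front_matter(card_text):
    """Parse flat `key: value` pairs and simple `- item` lists (hypothetical helper)."""
    match = re.match(r"^---\n(.*?)\n---\n", card_text, flags=re.DOTALL)
    if match is None:
        return {}
    metadata, current_key = {}, None
    for line in match.group(1).splitlines():
        if line.startswith("- "):
            # List item: attach it to the most recent key that opened a list.
            if current_key is not None:
                metadata[current_key].append(line[2:].strip())
        elif ":" in line and not line.startswith(" "):
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            if value:
                metadata[key] = value
                current_key = None
            else:
                # An empty value opens a list filled by the following items.
                metadata[key] = []
                current_key = key
    return metadata


metadata = parse_front_matter(CARD)
missing = [key for key in ("license", "pipeline_tag", "language") if key not in metadata]
print(metadata["license"], metadata["pipeline_tag"], missing)
```

Running the same check against the real `README.md` would need a proper YAML parser, since the full front matter contains the nested `model-index` entry.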
@@ -1,24 +1,43 @@
 ---
+base_model:
+- microsoft/mdeberta-v3-base
 library_name: transformers
-tags:
-- generated_from_trainer
+license: cc-by-4.0
 metrics:
 - accuracy
 - f1
+tags:
+- generated_from_trainer
+- subjectivity-detection
+- multilingual
+- sentiment
+- news
+- mdeberta-v3
+language:
+- ar
+- de
+- en
+- it
+- bg
+- el
+- pl
+- ro
+- uk
+datasets:
+- clef-2025-checkthat-lab-task-1-subjectivity
+pipeline_tag: text-classification
 model-index:
 - name: mdeberta-v3-base-subjectivity-sentiment-multilingual
   results: []
-license: mit
-base_model:
-- microsoft/mdeberta-v3-base
 ---

-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
 # mdeberta-v3-base-subjectivity-sentiment-multilingual

-This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) on the [CheckThat! Lab Task 1 Subjectivity Detection at CLEF 2025](arxiv.org/abs/2507.11764).
+This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) on the [CheckThat! Lab Task 1 Subjectivity Detection at CLEF 2025](https://arxiv.org/abs/2507.11764).
+
+The official code repository can be found here: [https://github.com/MatteoFasulo/clef2025-checkthat](https://github.com/MatteoFasulo/clef2025-checkthat)
+Explore related models and results on the Hugging Face Collection: [AI Wizards @ CLEF 2025 - CheckThat! Lab - Task 1 Subjectivity](https://huggingface.co/collections/MatteoFasulo/clef-2025-checkthat-lab-task-1-subjectivity-6878f0199d302acdfe2ceddb)
+
 It achieves the following results on the evaluation set:
 - Loss: 0.7762
 - Macro F1: 0.7580
@@ -31,15 +50,55 @@ It achieves the following results on the evaluation set:

 ## Model description

-
+This model, `mdeberta-v3-base-subjectivity-sentiment-multilingual`, is part of the AI Wizards' participation in the CLEF 2025 CheckThat! Lab Task 1: Subjectivity Detection in News Articles. Its primary goal is to classify sentences as subjective (opinion-laden) or objective across monolingual, multilingual, and zero-shot settings. The model was evaluated on various languages including Arabic, German, English, Italian, Bulgarian (training/development) and unseen languages like Greek, Romanian, Polish, and Ukrainian (zero-shot evaluation).
+
+The core innovation of this approach lies in enhancing transformer-based classifiers by integrating sentiment scores, derived from an auxiliary model, with sentence representations. This sentiment-augmented architecture aims to improve upon standard fine-tuning, particularly boosting the subjective F1 score. To address class imbalance, prevalent across languages, decision threshold calibration optimized on the development set was employed.
+
+Key contributions from the associated paper include:
+* **Sentiment-Augmented Fine-Tuning**: Enriching typical embedding-based models by integrating sentiment scores, significantly improving subjective sentence detection.
+* **Diverse Model Coverage**: Benchmarking `mDeBERTaV3-base` (multilingual), `ModernBERT-base` (English), and `Llama3.2-1B` (zero-shot LLM baseline).
+* **Threshold Calibration for Imbalance**: A simple yet effective method to tune decision thresholds on each language’s development data to enhance macro-F1 performance.
+
+The framework led to high rankings, notably 1st for Greek (Macro F1 = 0.51).

 ## Intended uses & limitations

-
+This model is intended for subjectivity detection in news articles, classifying sentences as subjective or objective. This task is crucial for combating misinformation, improving fact-checking pipelines, and supporting journalists. It is designed to be applicable in both monolingual and multilingual contexts, demonstrating robust generalization capabilities to unseen languages in zero-shot settings.
+
+**Intended uses:**
+* Classifying sentences in news articles as subjective or objective.
+* As a component in misinformation detection and fact-checking systems.
+* Assisting journalists in analyzing news content for bias or opinion.
+
+**Limitations:**
+* As noted by the authors, an initial mistake in the submission process led to some lower official multilingual Macro F1 scores (e.g., 0.24). Corrected results indicate significantly better performance (Macro F1 = 0.68), which would have placed the model higher (9th overall). Users should be aware of the corrected performance metrics.
+* Performance may vary across different languages and specific domains beyond news articles, although the model showed strong generalization in zero-shot settings.

 ## Training and evaluation data

-
+The model was fine-tuned on datasets provided for the CLEF 2025 CheckThat! Lab Task 1: Subjectivity Detection in News Articles.
+Training and development datasets were provided for Arabic, German, English, Italian, and Bulgarian. For final evaluation, additional unseen languages such as Greek, Romanian, Polish, and Ukrainian were included to assess generalization capabilities. The training procedure involved integrating sentiment features and applying decision threshold calibration, optimized on the development sets, to mitigate class imbalance.
+
+## How to use
+
+You can use this model directly with the Hugging Face `transformers` library to classify text:
+
+```python
+from transformers import pipeline
+
+classifier = pipeline(
+    "text-classification",
+    model="MatteoFasulo/mdeberta-v3-base-subjectivity-sentiment-multilingual"
+)
+
+text_objective = "The quick brown fox jumps over the lazy dog."
+text_subjective = "I strongly believe this is an amazing product and everyone should buy it!"
+text_german_subj = "Ich bin der Meinung, dass dies ein unglaubliches Produkt ist." # German: I am of the opinion that this is an incredible product.
+
+print(f"'{text_objective}' -> {classifier(text_objective)}")
+print(f"'{text_subjective}' -> {classifier(text_subjective)}")
+print(f"'{text_german_subj}' -> {classifier(text_german_subj)}")
+```

 ## Training procedure

@@ -65,10 +124,22 @@ The following hyperparameters were used during training:
 | 0.3579 | 5.0 | 2010 | 0.7443 | 0.7476 | 0.7485 | 0.7614 | 0.7154 | 0.6440 | 0.8045 | 0.7518 |
 | 0.3579 | 6.0 | 2412 | 0.7762 | 0.7580 | 0.7558 | 0.7614 | 0.7100 | 0.6878 | 0.7336 | 0.7676 |

-
 ### Framework versions

 - Transformers 4.49.0
 - Pytorch 2.5.1+cu121
 - Datasets 3.3.1
-- Tokenizers 0.21.0
+- Tokenizers 0.21.0
+
+## Citation
+
+If you find our work helpful or inspiring, please consider citing the associated paper:
+
+```bibtex
+@article{fasulo2025ai,
+  title={AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles},
+  author={Fasulo, Matteo and Bonal, Matteo and Hettich, Noah and Hettich, Elias and Jabbari, Mahdi},
+  journal={arXiv preprint arXiv:2507.11764},
+  year={2025}
+}
+```
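The card text added in this PR mentions decision threshold calibration, optimized on each language's development set, as the remedy for class imbalance. The sketch below illustrates that general idea only and is not the authors' implementation: the grid search, the function names `macro_f1` and `calibrate_threshold`, and the toy development data are all assumptions made for illustration.

```python
def macro_f1(labels, preds):
    """Unweighted mean of the per-class F1 scores for binary labels 0/1."""
    scores = []
    for cls in (0, 1):
        tp = sum(1 for y, p in zip(labels, preds) if y == cls and p == cls)
        fp = sum(1 for y, p in zip(labels, preds) if y != cls and p == cls)
        fn = sum(1 for y, p in zip(labels, preds) if y == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def calibrate_threshold(subj_probs, labels, steps=99):
    """Grid-search the threshold on P(subjective) that maximizes macro F1."""
    best_threshold, best_score = 0.5, -1.0
    for i in range(1, steps + 1):
        threshold = i / (steps + 1)
        preds = [1 if p >= threshold else 0 for p in subj_probs]
        score = macro_f1(labels, preds)
        # Strict comparison keeps the lowest threshold among ties.
        if score > best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Toy development set with made-up numbers: 1 = subjective, 0 = objective.
dev_probs = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60]
dev_labels = [0, 0, 0, 0, 0, 1, 1, 1]

threshold, score = calibrate_threshold(dev_probs, dev_labels)
default_score = macro_f1(dev_labels, [1 if p >= 0.5 else 0 for p in dev_probs])
print(f"calibrated threshold={threshold:.2f}, macro F1={score:.3f}")
print(f"default threshold=0.50, macro F1={default_score:.3f}")
```

On this toy data the minority class sits mostly below 0.5, so the default cutoff misses two of the three subjective examples, while a lower calibrated cutoff recovers them. In practice the threshold would be chosen on held-out development data and then frozen before scoring the test set.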