Improve model card: Add pipeline tag, update metadata, and expand content
This PR significantly improves the model card for `mdeberta-v3-base-subjectivity-multilingual-no-arabic` by:
- **Adding the `pipeline_tag: text-classification`** to ensure proper discoverability on the Hugging Face Hub.
- **Updating the `license`** to `cc-by-4.0` as stated in the official GitHub repository.
- **Refining `language` and `tags` metadata** to better describe the model's scope and task.
- **Expanding the \"Model description\", \"Intended uses & limitations\", and \"Training and evaluation data\" sections** with details extracted from the paper abstract and the GitHub README.
- **Adding a direct link to the official GitHub repository** for easy access to the source code.
- **Providing a clear Python usage example** using the `transformers` `pipeline` for text classification.
- **Adding a BibTeX citation** for the associated paper.
Please review and merge.
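As context for reviewers, the sentiment-augmented architecture described in the expanded "Model description" (auxiliary-model sentiment scores integrated with sentence representations) can be sketched minimally as below. The concatenation-based fusion, the names, and the toy dimensions are illustrative assumptions for exposition, not the repository's actual implementation:

```python
# Illustrative sketch only: fuse auxiliary sentiment scores with a pooled
# sentence embedding before a classification head. Names, dimensions, and
# fusion-by-concatenation are assumptions, not the authors' code.

def fuse_features(sentence_embedding, sentiment_scores):
    """Concatenate a pooled sentence embedding with auxiliary sentiment scores."""
    return list(sentence_embedding) + list(sentiment_scores)

# A 4-dim stand-in for a 768-dim pooled mDeBERTa embedding, plus neg/neu/pos scores:
embedding = [0.12, -0.48, 0.33, 0.05]
sentiment = [0.10, 0.20, 0.70]
features = fuse_features(embedding, sentiment)
print(len(features))  # 7: the classifier head would take this wider input
```

The classifier head then operates on the widened feature vector instead of the embedding alone.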
````diff
@@ -1,23 +1,30 @@
 ---
-library_name: transformers
-license: mit
 base_model: microsoft/mdeberta-v3-base
-tags:
-- generated_from_trainer
+library_name: transformers
+license: cc-by-4.0
 metrics:
 - accuracy
 - f1
+pipeline_tag: text-classification
+language: multilingual
+tags:
+- text-classification
+- subjectivity-detection
+- news-articles
+- multilingual
+- deberta-v3
+- generated_from_trainer
 model-index:
 - name: mdeberta-v3-base-subjectivity-multilingual-no-arabic
   results: []
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
 # mdeberta-v3-base-subjectivity-multilingual-no-arabic
 
-This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base)
+This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) for subjectivity detection in news articles. It was presented in the paper [AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles](https://arxiv.org/abs/2507.11764).
+
+**GitHub Repository**: For the official code and more details, please refer to the [GitHub repository](https://github.com/MatteoFasulo/clef2025-checkthat).
+
 It achieves the following results on the evaluation set:
 - Loss: 0.7196
 - Macro F1: 0.8071
@@ -30,15 +37,20 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-More information needed
+This model is a fine-tuned version of `microsoft/mdeberta-v3-base` for **Subjectivity Detection in News Articles**. It classifies sentences as subjective or objective across monolingual, multilingual, and zero-shot settings. The core innovation lies in enhancing transformer-based classifiers by integrating sentiment scores, derived from an auxiliary model, with sentence representations. This sentiment-augmented architecture, applied here with mDeBERTaV3-base, aims to improve upon standard fine-tuning, particularly boosting the subjective F1 score. Decision threshold calibration was also employed to address class imbalance.
 
 ## Intended uses & limitations
 
-More information needed
+This model is intended to identify whether a sentence is **subjective** (e.g., opinion-laden) or **objective**, making it a valuable tool for combating misinformation, improving fact-checking pipelines, and supporting journalists in content analysis.
+
+**Limitations:**
+* This specific model (`multilingual-no-arabic`) was fine-tuned on the multilingual dataset *excluding Arabic* data.
+* While designed for multilingual and zero-shot transfer, performance can vary significantly across languages and specific domains.
+* The original challenge submission inadvertently used a custom train/dev mix, leading to a skewed class distribution and an under-calibrated decision threshold; the official multilingual Macro F1 score was 0.24. A re-evaluation with the correct data split yielded a Macro F1 of 0.68, which would have placed the model 9th overall in the challenge.
 
 ## Training and evaluation data
 
-More information needed
+Training and development datasets were provided for Arabic, German, English, Italian, and Bulgarian as part of the CLEF 2025 CheckThat! Lab Task 1: Subjectivity Detection in News Articles. This specific model was trained on the multilingual dataset, *excluding Arabic* data. The final evaluation included additional unseen languages (e.g., Greek, Romanian, Polish, Ukrainian) to assess generalization capabilities.
 
 ## Training procedure
 
@@ -64,10 +76,49 @@ The following hyperparameters were used during training:
 | 0.2749 | 5.0 | 1245 | 0.7195 | 0.7996 | 0.8038 | 0.7963 | 0.7461 | 0.7689 | 0.7247 | 0.8139 |
 | 0.2749 | 6.0 | 1494 | 0.7196 | 0.8071 | 0.8037 | 0.8123 | 0.7658 | 0.7367 | 0.7973 | 0.8159 |
 
-
 ### Framework versions
 
 - Transformers 4.47.0
 - Pytorch 2.5.1+cu121
 - Datasets 3.3.1
-- Tokenizers 0.21.0
+- Tokenizers 0.21.0
+
+## How to use
+
+You can use the model with the `pipeline` API from the `transformers` library for text classification:
+
+```python
+from transformers import pipeline
+
+model_name = "MatteoFasulo/mdeberta-v3-base-subjectivity-multilingual-no-arabic"
+# The pipeline automatically infers the task and labels from the model config
+classifier = pipeline("text-classification", model=model_name)
+
+# Example usage:
+# A subjective sentence
+result_subj = classifier("This is a truly amazing and groundbreaking discovery!")
+print(f"Sentence: 'This is a truly amazing and groundbreaking discovery!' -> {result_subj}")
+
+# An objective sentence
+result_obj = classifier("The new policy will be implemented next quarter.")
+print(f"Sentence: 'The new policy will be implemented next quarter.' -> {result_obj}")
+```
+
+## Citation
+
+If you find our work helpful or inspiring, please feel free to cite the paper:
+
+```bibtex
+@article{fasulo2025ai,
+  title={AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles},
+  author={Fasulo, Matteo and Fabris, Alessandro and Caldararu, Silvia and Kiselov, Valerij and Stoica, George and Ilie, Andrei},
+  journal={arXiv preprint arXiv:2507.11764},
+  year={2025}
+}
+```
+
+You can find the official paper on Hugging Face Papers: [AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles](https://huggingface.co/papers/2507.11764).
+
+## License
+
+This work is licensed under the [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/deed.en) (CC BY 4.0).
````