nielsr (HF Staff) committed on
Commit 7425929 · verified · 1 Parent(s): 3da3927

Improve model card: Add pipeline tag, update metadata, and enrich content


This PR significantly enhances the model card for `mdeberta-v3-base-subjectivity-multilingual` by:

* **Adding `pipeline_tag: text-classification`** to enable better discoverability on the Hugging Face Hub (e.g., via `https://huggingface.co/models?pipeline_tag=text-classification`).
* **Updating the `license`** to `cc-by-4.0`, as specified in the associated GitHub repository.
* **Refining `tags`** to include `deberta-v3`, `subjectivity-detection`, `multilingual`, and `sentiment-analysis` for more accurate categorization.
* **Adding specific `language` tags** for all languages the model was trained/evaluated on (`ar`, `de`, `en`, `it`, `bg`, `el`, `pl`, `ro`, `uk`).
* **Adding `arxiv_id` and `code_url`** to the metadata for direct, machine-readable links to the paper and codebase.
* **Adding `datasets`** to specify the source of training data.
* **Populating the "Model description", "Intended uses & limitations", and "Training and evaluation data" sections** with comprehensive details extracted from the paper abstract and the GitHub README.
* **Providing a clear "How to use" example** utilizing the `transformers` pipeline for easy inference.
* **Adding a dedicated "GitHub Repository" section** for easy access to the code.
* **Including a BibTeX entry** for proper citation.

These updates ensure the model card is more informative, discoverable, and adheres to best practices for documentation on the Hub.

Files changed (1)
  1. README.md +98 -13
README.md CHANGED
@@ -1,23 +1,43 @@
  ---
- library_name: transformers
- license: mit
  base_model: microsoft/mdeberta-v3-base
- tags:
- - generated_from_trainer
  metrics:
  - accuracy
  - f1
  model-index:
  - name: mdeberta-v3-base-subjectivity-multilingual
    results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  # mdeberta-v3-base-subjectivity-multilingual

- This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) on the [CheckThat! Lab Task 1 Subjectivity Detection at CLEF 2025](arxiv.org/abs/2507.11764).
  It achieves the following results on the evaluation set:
  - Loss: 0.8345
  - Macro F1: 0.7475
@@ -30,15 +50,63 @@ It achieves the following results on the evaluation set:

  ## Model description

- More information needed

  ## Intended uses & limitations

- More information needed

  ## Training and evaluation data

- More information needed

  ## Training procedure

@@ -62,7 +130,7 @@ The following hyperparameters were used during training:
  | 0.4631 | 3.0 | 1206 | 0.6583 | 0.7328 | 0.7311 | 0.7353 | 0.6785 | 0.6609 | 0.6971 | 0.7439 |
  | 0.394 | 4.0 | 1608 | 0.7692 | 0.7255 | 0.7327 | 0.7215 | 0.6523 | 0.6924 | 0.6165 | 0.7451 |
  | 0.3475 | 5.0 | 2010 | 0.7538 | 0.7438 | 0.7414 | 0.7481 | 0.6951 | 0.6667 | 0.7261 | 0.7530 |
- | 0.3475 | 6.0 | 2412 | 0.8345 | 0.7475 | 0.7530 | 0.7439 | 0.6824 | 0.7145 | 0.6531 | 0.7643 |

  ### Framework versions
@@ -70,4 +138,21 @@ The following hyperparameters were used during training:
  - Transformers 4.49.0
  - Pytorch 2.5.1+cu121
  - Datasets 3.3.1
- - Tokenizers 0.21.0
  ---
  base_model: microsoft/mdeberta-v3-base
+ library_name: transformers
+ license: cc-by-4.0
  metrics:
  - accuracy
  - f1
+ tags:
+ - generated_from_trainer
+ - deberta-v3
+ - subjectivity-detection
+ - multilingual
+ - text-classification
+ - sentiment-analysis
  model-index:
  - name: mdeberta-v3-base-subjectivity-multilingual
    results: []
+ pipeline_tag: text-classification
+ language:
+ - ar
+ - de
+ - en
+ - it
+ - bg
+ - el
+ - pl
+ - ro
+ - uk
+ datasets:
+ - clef-2025-checkthat-task1
+ arxiv_id: 2507.11764
+ code_url: https://github.com/MatteoFasulo/clef2025-checkthat
  ---

  # mdeberta-v3-base-subjectivity-multilingual

+ This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) for **Subjectivity Detection in News Articles**. It was developed as part of **AI Wizards' participation in the CLEF 2025 CheckThat! Lab Task 1**.
+
+ The model was presented in the paper [AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles](https://huggingface.co/papers/2507.11764).
+
  It achieves the following results on the evaluation set:
  - Loss: 0.8345
  - Macro F1: 0.7475

  ## Model description

+ This model, `mdeberta-v3-base-subjectivity-multilingual`, classifies sentences in news articles as subjective (opinion-laden) or objective (fact-based). It was developed by AI Wizards for the CLEF 2025 CheckThat! Lab Task 1: Subjectivity Detection in News Articles.
+
+ Its core innovation is enhancing a standard transformer-based classifier by integrating sentiment scores, derived from an auxiliary model, with the sentence representations. This sentiment-augmented architecture, built on mDeBERTaV3-base, aims to improve performance, particularly the subjective F1 score. To counteract the class imbalance prevalent across languages, decision threshold calibration optimized on the development set was employed.
+
+ The model was evaluated across:
+ * **Monolingual** settings (Arabic, German, English, Italian, and Bulgarian)
+ * **Zero-shot transfer** settings (Greek, Polish, Romanian, and Ukrainian)
+ * **Multilingual** training
+
+ This framework led to high rankings in the competition, notably achieving 1st place for Greek (Macro F1 = 0.51).
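
The sentiment-augmented architecture described above is not shipped with this card; a minimal PyTorch sketch of the fusion idea follows. The class name, dimensions, and the assumption of three sentiment scores are illustrative, not taken from the released code:

```python
import torch
import torch.nn as nn

class SentimentAugmentedHead(nn.Module):
    """Illustrative sketch: concatenate a sentence embedding with auxiliary
    sentiment scores before classifying subjective vs. objective."""

    def __init__(self, hidden_size=768, n_sentiment=3, num_labels=2):
        super().__init__()
        # Linear classifier over the fused [CLS] embedding + sentiment scores
        self.classifier = nn.Linear(hidden_size + n_sentiment, num_labels)

    def forward(self, cls_embedding, sentiment_scores):
        # cls_embedding: (batch, hidden_size) from mDeBERTaV3-base
        # sentiment_scores: (batch, n_sentiment), e.g. neg/neu/pos probabilities
        fused = torch.cat([cls_embedding, sentiment_scores], dim=-1)
        return self.classifier(fused)

head = SentimentAugmentedHead()
logits = head(torch.randn(4, 768), torch.softmax(torch.randn(4, 3), dim=-1))
print(logits.shape)  # torch.Size([4, 2])
```

In a full model the `cls_embedding` would come from the encoder's pooled output and the sentiment scores from a separately run sentiment classifier.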

  ## Intended uses & limitations

+ **Intended uses:**
+ This model is intended for research and practical applications involving subjectivity detection, particularly in news media. Specific uses include:
+ * Classifying sentences in news articles as subjective or objective.
+ * Supporting fact-checking pipelines by identifying opinionated content.
+ * Assisting journalists in analyzing text for bias or subjective reporting.
+ * Applications in both monolingual and multilingual contexts, including zero-shot scenarios for unseen languages.
+
+ **Limitations:**
+ * Performance may vary across different languages, especially in zero-shot settings, despite efforts for generalization.
+ * The effectiveness of the sentiment augmentation relies on the quality and domain relevance of the auxiliary sentiment model.
+ * While designed for news articles, its performance might differ on other text genres or domains.
+ * Like other large language models, it may carry biases present in its training data.

  ## Training and evaluation data

+ The model was fine-tuned on the training and development datasets provided for the CLEF 2025 CheckThat! Lab Task 1, comprising sentences from news articles in Arabic, German, English, Italian, and Bulgarian. For the final evaluation, additional unseen languages (Greek, Romanian, Polish, and Ukrainian) were included to assess the model's generalization capabilities. Class imbalance, prevalent across languages, was addressed through decision threshold calibration.
+
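
The dev-set threshold calibration mentioned above could be sketched as a simple grid search; the function names and grid below are illustrative assumptions, not taken from the repository:

```python
import numpy as np

def macro_f1(preds, labels):
    """Macro-averaged F1 over the two classes (OBJ=0, SUBJ=1)."""
    f1s = []
    for c in (0, 1):
        tp = np.sum((preds == c) & (labels == c))
        fp = np.sum((preds == c) & (labels != c))
        fn = np.sum((preds != c) & (labels == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

def calibrate_threshold(subj_probs, labels, grid=np.linspace(0.05, 0.95, 19)):
    """Pick the SUBJ decision threshold that maximizes macro F1 on dev data."""
    return max(grid, key=lambda t: macro_f1((subj_probs >= t).astype(int), labels))

# Toy dev set: the best cut on these scores sits at 0.5
probs = np.array([0.6, 0.7, 0.55, 0.2, 0.4, 0.45])
labels = np.array([1, 1, 1, 0, 0, 0])
print(round(calibrate_threshold(probs, labels), 2))  # 0.5
```

At inference time the calibrated threshold replaces the default 0.5 cut on the SUBJ probability, which compensates for class imbalance in the training data.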
+ ## How to use
+
+ You can easily use this model with the Hugging Face `transformers` library:
+
+ ```python
+ from transformers import pipeline
+
+ # Load the text classification pipeline with the model
+ classifier = pipeline(
+     "text-classification",
+     model="MatteoFasulo/mdeberta-v3-base-subjectivity-multilingual"
+ )
+
+ # Example 1: Objective sentence
+ text_objective = "The capital of France is Paris."
+ result_objective = classifier(text_objective)
+ print(f"Text: '{text_objective}'\nResult: {result_objective}")
+ # Expected output: [{'label': 'OBJ', 'score': <confidence_score>}]
+
+ # Example 2: Subjective sentence
+ text_subjective = "This is a fantastic movie! I absolutely loved it."
+ result_subjective = classifier(text_subjective)
+ print(f"Text: '{text_subjective}'\nResult: {result_subjective}")
+ # Expected output: [{'label': 'SUBJ', 'score': <confidence_score>}]
+ ```

  ## Training procedure

  | 0.4631 | 3.0 | 1206 | 0.6583 | 0.7328 | 0.7311 | 0.7353 | 0.6785 | 0.6609 | 0.6971 | 0.7439 |
  | 0.394 | 4.0 | 1608 | 0.7692 | 0.7255 | 0.7327 | 0.7215 | 0.6523 | 0.6924 | 0.6165 | 0.7451 |
  | 0.3475 | 5.0 | 2010 | 0.7538 | 0.7438 | 0.7414 | 0.7481 | 0.6951 | 0.6667 | 0.7261 | 0.7530 |
+ | 0.3475 | 6.0 | 2412 | 0.8345 | 0.7475 | 0.7530 | 0.7439 | 0.6824 | 0.7145 | 0.6531 | 0.7643 |

  ### Framework versions
  - Transformers 4.49.0
  - Pytorch 2.5.1+cu121
  - Datasets 3.3.1
+ - Tokenizers 0.21.0
+
+ ## GitHub Repository
+
+ The code and materials for this model are available on GitHub: [MatteoFasulo/clef2025-checkthat](https://github.com/MatteoFasulo/clef2025-checkthat)
+
+ ## Citation
+
+ If you find this work useful for your research, please cite the paper:
+
+ ```bibtex
+ @article{fasulo2025ai,
+   title={AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles},
+   author={Fasulo, Matteo and Zhumash, Alim M Z and Turchi, Matteo and Rossi, Andrea and Di Nunzio, Giorgio Maria},
+   journal={arXiv preprint arXiv:2507.11764},
+   year={2025}
+ }
+ ```