nielsr HF Staff committed on
Commit 753b39d · verified · 1 Parent(s): 7506b75

Improve model card: Add pipeline tag, update metadata, and expand content


This PR significantly improves the model card for `mdeberta-v3-base-subjectivity-multilingual-no-arabic` by:

- **Adding the `pipeline_tag: text-classification`** to ensure proper discoverability on the Hugging Face Hub.
- **Updating the `license`** to `cc-by-4.0` as stated in the official GitHub repository.
- **Refining `language` and `tags` metadata** to better describe the model's scope and task.
- **Expanding the \"Model description\", \"Intended uses & limitations\", and \"Training and evaluation data\" sections** with details extracted from the paper abstract and the GitHub README.
- **Adding a direct link to the official GitHub repository** for easy access to the source code.
- **Providing a clear Python usage example** using the `transformers` `pipeline` for text classification.
- **Adding a BibTeX citation** for the associated paper.

Please review and merge.

Files changed (1)
  1. README.md +64 -13
README.md CHANGED
@@ -1,23 +1,30 @@
  ---
- library_name: transformers
- license: mit
  base_model: microsoft/mdeberta-v3-base
- tags:
- - generated_from_trainer
  metrics:
  - accuracy
  - f1
  model-index:
  - name: mdeberta-v3-base-subjectivity-multilingual-no-arabic
    results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  # mdeberta-v3-base-subjectivity-multilingual-no-arabic

- This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) on the [CheckThat! Lab Task 1 Subjectivity Detection at CLEF 2025](arxiv.org/abs/2507.11764).

  It achieves the following results on the evaluation set:
  - Loss: 0.7196
  - Macro F1: 0.8071
@@ -30,15 +37,20 @@ It achieves the following results on the evaluation set:

  ## Model description

- More information needed

  ## Intended uses & limitations

- More information needed

  ## Training and evaluation data

- More information needed

  ## Training procedure

@@ -64,10 +76,49 @@ The following hyperparameters were used during training:
  | 0.2749 | 5.0 | 1245 | 0.7195 | 0.7996 | 0.8038 | 0.7963 | 0.7461 | 0.7689 | 0.7247 | 0.8139 |
  | 0.2749 | 6.0 | 1494 | 0.7196 | 0.8071 | 0.8037 | 0.8123 | 0.7658 | 0.7367 | 0.7973 | 0.8159 |

-
  ### Framework versions

  - Transformers 4.47.0
  - Pytorch 2.5.1+cu121
  - Datasets 3.3.1
- - Tokenizers 0.21.0

  ---
  base_model: microsoft/mdeberta-v3-base
+ library_name: transformers
+ license: cc-by-4.0
  metrics:
  - accuracy
  - f1
+ pipeline_tag: text-classification
+ language: multilingual
+ tags:
+ - text-classification
+ - subjectivity-detection
+ - news-articles
+ - multilingual
+ - deberta-v3
+ - generated_from_trainer
  model-index:
  - name: mdeberta-v3-base-subjectivity-multilingual-no-arabic
    results: []
  ---

  # mdeberta-v3-base-subjectivity-multilingual-no-arabic

+ This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) for subjectivity detection in news articles. It was presented in the paper [AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles](https://arxiv.org/abs/2507.11764).
+
+ **GitHub Repository**: For the official code and more details, please refer to the [GitHub repository](https://github.com/MatteoFasulo/clef2025-checkthat).
+
  It achieves the following results on the evaluation set:
  - Loss: 0.7196
  - Macro F1: 0.8071
 

  ## Model description

+ This model is a fine-tuned version of `microsoft/mdeberta-v3-base` for **Subjectivity Detection in News Articles**. It classifies sentences as subjective or objective across monolingual, multilingual, and zero-shot settings. The core innovation is enhancing transformer-based classifiers by integrating sentiment scores, derived from an auxiliary model, with sentence representations. This sentiment-augmented architecture, applied here to mDeBERTaV3-base, aims to improve upon standard fine-tuning, particularly by boosting the subjective F1 score. Decision-threshold calibration was also employed to address class imbalance.
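The sentiment-augmentation idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' exact implementation: the fusion layer sizes and the pos/neu/neg score format are guesses, and random tensors stand in for real encoder and sentiment-model outputs.

```python
import torch
import torch.nn as nn

class SentimentAugmentedClassifier(nn.Module):
    """Sketch: concatenate a sentence embedding with auxiliary sentiment
    scores (e.g., pos/neu/neg probabilities) before the classification head.
    Dimensions and layer sizes are illustrative assumptions."""

    def __init__(self, hidden_size: int = 768, num_sentiment: int = 3, num_labels: int = 2):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(hidden_size + num_sentiment, 256),
            nn.ReLU(),
            nn.Linear(256, num_labels),
        )

    def forward(self, cls_embedding: torch.Tensor, sentiment_scores: torch.Tensor) -> torch.Tensor:
        # Fuse the encoder representation with the sentiment signal.
        fused = torch.cat([cls_embedding, sentiment_scores], dim=-1)
        return self.classifier(fused)

# Toy usage: random tensors stand in for mDeBERTa [CLS] embeddings
# and for the auxiliary sentiment model's output probabilities.
model = SentimentAugmentedClassifier()
cls_emb = torch.randn(4, 768)                     # batch of 4 sentence embeddings
sent = torch.softmax(torch.randn(4, 3), dim=-1)   # pos/neu/neg scores per sentence
logits = model(cls_emb, sent)
print(logits.shape)  # torch.Size([4, 2])
```

In this sketch the sentiment scores are simply concatenated to the sentence representation; the paper's actual fusion strategy may differ.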

  ## Intended uses & limitations

+ This model is intended to identify whether a sentence is **subjective** (e.g., opinion-laden) or **objective**, making it useful for combating misinformation, improving fact-checking pipelines, and supporting journalists in content analysis.
+
+ **Limitations:**
+ * This specific model (`multilingual-no-arabic`) was fine-tuned on the multilingual dataset *excluding Arabic* data.
+ * While designed for multilingual and zero-shot transfer, performance can vary significantly across languages and domains.
+ * The original challenge submission inadvertently used a custom train/dev mix, which skewed the class distribution and left the decision threshold under-calibrated, yielding an official multilingual Macro F1 of 0.24. A re-evaluation with the correct data split achieved a Macro F1 of 0.68, which would have placed the model 9th overall.
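The decision-threshold calibration mentioned above can be sketched as a simple sweep over candidate thresholds on a development set. This is a hedged illustration with toy data; the authors' exact procedure (candidate grid, tie-breaking) may differ.

```python
import numpy as np

def macro_f1(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # Macro F1 over the two classes (objective = 0, subjective = 1).
    scores = []
    for cls in (0, 1):
        tp = np.sum((y_pred == cls) & (y_true == cls))
        fp = np.sum((y_pred == cls) & (y_true != cls))
        fn = np.sum((y_pred != cls) & (y_true == cls))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))

def calibrate_threshold(dev_probs: np.ndarray, dev_labels: np.ndarray) -> float:
    # Sweep candidate thresholds; keep the one with the best dev macro F1.
    candidates = np.linspace(0.05, 0.95, 19)
    best = max(candidates, key=lambda t: macro_f1(dev_labels, (dev_probs >= t).astype(int)))
    return float(best)

# Toy dev set: predicted P(subjective) per sentence and gold labels.
probs = np.array([0.9, 0.8, 0.35, 0.3, 0.2, 0.6])
labels = np.array([1, 1, 1, 0, 0, 0])
t = calibrate_threshold(probs, labels)
print(f"calibrated threshold: {t:.2f}")
```

Calibrating on a development set with the same class distribution as the test data is exactly what the flawed train/dev mix described above undermined, hence the large gap between the 0.24 and 0.68 scores.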

  ## Training and evaluation data

+ Training and development datasets were provided for Arabic, German, English, Italian, and Bulgarian as part of CLEF 2025 CheckThat! Lab Task 1: Subjectivity Detection in News Articles. This specific model was trained on the multilingual dataset, *excluding Arabic* data. The final evaluation included additional unseen languages (e.g., Greek, Romanian, Polish, and Ukrainian) to assess generalization.

  ## Training procedure

  | 0.2749 | 5.0 | 1245 | 0.7195 | 0.7996 | 0.8038 | 0.7963 | 0.7461 | 0.7689 | 0.7247 | 0.8139 |
  | 0.2749 | 6.0 | 1494 | 0.7196 | 0.8071 | 0.8037 | 0.8123 | 0.7658 | 0.7367 | 0.7973 | 0.8159 |

  ### Framework versions

  - Transformers 4.47.0
  - Pytorch 2.5.1+cu121
  - Datasets 3.3.1
+ - Tokenizers 0.21.0
+
+ ## How to use
+
+ You can use the model with the `pipeline` API from the `transformers` library for text classification:
+
+ ```python
+ from transformers import pipeline
+
+ model_name = "MatteoFasulo/mdeberta-v3-base-subjectivity-multilingual-no-arabic"
+ # The pipeline automatically infers the task and labels from the model config
+ classifier = pipeline("text-classification", model=model_name)
+
+ # A subjective sentence
+ result_subj = classifier("This is a truly amazing and groundbreaking discovery!")
+ print(f"Sentence: 'This is a truly amazing and groundbreaking discovery!' -> {result_subj}")
+
+ # An objective sentence
+ result_obj = classifier("The new policy will be implemented next quarter.")
+ print(f"Sentence: 'The new policy will be implemented next quarter.' -> {result_obj}")
+ ```
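Depending on the model's `config.id2label`, the pipeline above may return generic `LABEL_0`/`LABEL_1` labels. A small hypothetical post-processing step can map them to readable names; the OBJ/SUBJ mapping below is an assumption, so verify it against the model config before relying on it.

```python
# ASSUMPTION: LABEL_0 = objective, LABEL_1 = subjective. Verify with
# AutoConfig.from_pretrained(model_name).id2label before trusting this map.
label_names = {"LABEL_0": "OBJ", "LABEL_1": "SUBJ"}

raw_outputs = [{"label": "LABEL_1", "score": 0.91}]  # example pipeline-style output
readable = [
    {"label": label_names.get(r["label"], r["label"]), "score": r["score"]}
    for r in raw_outputs
]
print(readable)  # [{'label': 'SUBJ', 'score': 0.91}]
```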
+
+ ## Citation
+
+ If you find our work helpful or inspiring, please feel free to cite the paper:
+
+ ```bibtex
+ @article{fasulo2025ai,
+   title={AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles},
+   author={Fasulo, Matteo and Fabris, Alessandro and Caldararu, Silvia and Kiselov, Valerij and Stoica, George and Ilie, Andrei},
+   journal={arXiv preprint arXiv:2507.11764},
+   year={2025}
+ }
+ ```
+
+ You can find the official paper on Hugging Face Papers: [AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles](https://huggingface.co/papers/2507.11764).
+
+ ## License
+
+ This work is licensed under the [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/deed.en) (CC BY 4.0).