Improve model card: Add text-classification pipeline tag, update license, expand sections, and add usage/code

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +66 -14
README.md CHANGED
@@ -1,25 +1,29 @@
  ---
- library_name: transformers
- license: mit
  base_model: microsoft/mdeberta-v3-base
- tags:
- - generated_from_trainer
  metrics:
  - accuracy
  - f1
  model-index:
  - name: mdeberta-v3-base-subjectivity-arabic
    results: []
- language:
- - ar
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  # mdeberta-v3-base-subjectivity-arabic

- This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) on the [CheckThat! Lab Task 1 Subjectivity Detection at CLEF 2025](arxiv.org/abs/2507.11764).
  It achieves the following results on the evaluation set:
  - Loss: 0.7419
  - Macro F1: 0.5291
@@ -32,15 +36,17 @@ It achieves the following results on the evaluation set:

  ## Model description

- More information needed

  ## Intended uses & limitations

- More information needed

  ## Training and evaluation data

- More information needed

  ## Training procedure

@@ -72,4 +78,50 @@ The following hyperparameters were used during training:
  - Transformers 4.49.0
  - Pytorch 2.5.1+cu121
  - Datasets 3.3.1
- - Tokenizers 0.21.0

  ---
  base_model: microsoft/mdeberta-v3-base
+ language:
+ - ar
+ library_name: transformers
+ license: cc-by-4.0
  metrics:
  - accuracy
  - f1
+ tags:
+ - generated_from_trainer
+ - text-classification
+ - subjectivity-detection
+ - news
+ - arabic
+ pipeline_tag: text-classification
  model-index:
  - name: mdeberta-v3-base-subjectivity-arabic
    results: []
+ datasets:
+ - MatteoFasulo/clef2025_checkthat_task1_subjectivity
  ---

  # mdeberta-v3-base-subjectivity-arabic

+ This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) on the [CheckThat! Lab Task 1 Subjectivity Detection at CLEF 2025](https://arxiv.org/abs/2507.11764).
  It achieves the following results on the evaluation set:
  - Loss: 0.7419
  - Macro F1: 0.5291

  ## Model description

+ This model is part of AI Wizards' participation in the CLEF 2025 CheckThat! Lab Task 1: Subjectivity Detection in News Articles. It aims to classify sentences as subjective or objective, a key component in combating misinformation, improving fact-checking pipelines, and supporting journalists. The model enhances transformer-based classifiers by integrating sentiment scores, derived from an auxiliary model, with sentence representations. This sentiment-augmented architecture, applied here with mDeBERTaV3-base, has shown consistent performance gains, particularly in subjective F1 score.
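+
+ As a rough illustration of this design, the sentiment features can be concatenated with the sentence representation before the classification head. The sketch below is hypothetical rather than the authors' exact implementation: the number of sentiment scores, the use of the first-token representation, and the single linear head are all assumptions.
+
+ ```python
+ import torch
+ import torch.nn as nn
+ from transformers import AutoModel
+
+ class SentimentAugmentedClassifier(nn.Module):
+     """Hypothetical sketch of a sentiment-augmented subjectivity classifier."""
+
+     def __init__(self, encoder_name="microsoft/mdeberta-v3-base",
+                  num_sentiment_scores=3, num_labels=2):
+         super().__init__()
+         self.encoder = AutoModel.from_pretrained(encoder_name)
+         hidden = self.encoder.config.hidden_size
+         # The head sees the sentence embedding plus externally computed sentiment
+         # scores (e.g. negative/neutral/positive probabilities from an auxiliary model).
+         self.classifier = nn.Linear(hidden + num_sentiment_scores, num_labels)
+
+     def forward(self, input_ids, attention_mask, sentiment_scores):
+         outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
+         cls = outputs.last_hidden_state[:, 0]                  # first-token representation
+         features = torch.cat([cls, sentiment_scores], dim=-1)  # augment with sentiment
+         return self.classifier(features)                       # logits over OBJ/SUBJ
+ ```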

  ## Intended uses & limitations

+ This model is intended for subjectivity detection in sentences from news articles, classifying them as either subjective (opinion-laden) or objective. This capability is valuable for applications such as combating misinformation, improving fact-checking pipelines, and supporting journalists. It has been evaluated across monolingual (Arabic, German, English, Italian, Bulgarian), multilingual, and zero-shot settings (Greek, Romanian, Polish, Ukrainian).
+
+ A key strategy employed is decision threshold calibration to address the class imbalance prevalent across languages. Users should be aware that the initial official multilingual Macro F1 score was lower due to a submission error (skewed class distribution), which was later corrected offline to Macro F1 = 0.68, placing the team 9th overall in the challenge.
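+
+ As an illustration, this kind of threshold calibration can be performed by sweeping candidate cutoffs on the development set and keeping the one that maximizes macro F1. The sketch below is hypothetical, not the authors' exact procedure, and assumes binary SUBJ/OBJ labels with SUBJ encoded as 1.
+
+ ```python
+ import numpy as np
+ from sklearn.metrics import f1_score
+
+ def calibrate_threshold(subj_probs, gold_labels):
+     """Pick the SUBJ-probability cutoff that maximizes macro F1 on the dev set.
+
+     subj_probs: array of predicted P(SUBJ); gold_labels: array with 1 = SUBJ, 0 = OBJ.
+     """
+     thresholds = np.linspace(0.05, 0.95, 91)
+     scores = [f1_score(gold_labels, (subj_probs >= t).astype(int), average="macro")
+               for t in thresholds]
+     return thresholds[int(np.argmax(scores))]
+ ```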

  ## Training and evaluation data

+ The model was trained and evaluated on datasets provided for the CLEF 2025 CheckThat! Lab Task 1: Subjectivity Detection in News Articles. Training and development datasets were available for Arabic, German, English, Italian, and Bulgarian. For final evaluation, additional unseen languages such as Greek, Romanian, Polish, and Ukrainian were used to assess generalization capabilities. The training incorporates sentiment scores from an auxiliary model and utilizes decision threshold calibration to mitigate class imbalance.
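+
+ As an example, the dataset referenced in the metadata can be loaded from the Hub roughly as follows; the exact configuration, split, and column names are assumptions and may differ from the actual dataset layout.
+
+ ```python
+ from datasets import load_dataset
+
+ # Hypothetical sketch: a language-specific configuration (e.g. "arabic") may need to
+ # be passed as a second argument, depending on how the dataset is organized.
+ dataset = load_dataset("MatteoFasulo/clef2025_checkthat_task1_subjectivity")
+ print(dataset)  # inspect available splits and columns (sentence text and SUBJ/OBJ label)
+ ```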

  ## Training procedure

  - Transformers 4.49.0
  - Pytorch 2.5.1+cu121
  - Datasets 3.3.1
+ - Tokenizers 0.21.0
+
+ ## How to use
+
+ You can use the model directly with the `transformers` library for text classification:
+
+ ```python
+ from transformers import pipeline
+
+ # Load the text classification pipeline
+ classifier = pipeline(
+     "text-classification",
+     model="MatteoFasulo/mdeberta-v3-base-subjectivity-arabic",
+     tokenizer="microsoft/mdeberta-v3-base",
+ )
+
+ # Example usage for an objective sentence
+ text1 = "وهكذا بدأت النساء يعين أهمية دورهن في عدم الصمت أمام هذه الاقتحامات ورفضها بإعلاء صيحات الله أكبر."
+ result1 = classifier(text1)
+ print(f"Text: '{text1}' Classification: {result1}")
+ # Expected output: [{'label': 'OBJ', 'score': ...}]
+
+ # Example usage for a subjective sentence
+ text2 = "ستشمل الشحنة الأولية نصف الجرعات، يليها النصف الثاني بعد ثلاثة أسابيع."
+ result2 = classifier(text2)
+ print(f"Text: '{text2}' Classification: {result2}")
+ # Expected output: [{'label': 'SUBJ', 'score': ...}]
+ ```
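+
+ The pipeline call above returns only the highest-scoring label. To apply a custom decision threshold, such as one calibrated as described under "Intended uses & limitations", you can request the scores of both classes instead (this assumes a recent `transformers` version where `top_k=None` returns all class scores).
+
+ ```python
+ # Get scores for both labels so a custom SUBJ threshold can be applied.
+ all_scores = classifier(text1, top_k=None)
+ print(all_scores)  # e.g. [{'label': 'OBJ', 'score': ...}, {'label': 'SUBJ', 'score': ...}]
+ ```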
+
+ ## Code
+
+ The official code and materials for this project are available on GitHub: [https://github.com/MatteoFasulo/clef2025-checkthat](https://github.com/MatteoFasulo/clef2025-checkthat).
+
+ ## Citation
+
+ If you find our work helpful or inspiring, please feel free to cite it:
+
+ ```bibtex
+ @misc{fasulo2025aiwizardscheckthat2025,
+       title={AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles},
+       author={Matteo Fasulo and Luca Babboni and Luca Tedeschini},
+       year={2025},
+       eprint={2507.11764},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL},
+       url={https://arxiv.org/abs/2507.11764},
+ }
+ ```