nielsr (HF Staff) committed · verified
Commit e8f4bf4 · 1 Parent(s): 949e665

Improve model card: Add pipeline tag, update license, expand description and usage

This PR significantly enhances the model card by:
- Updating the license in the metadata to `cc-by-4.0` as specified in the GitHub repository.
- Adding the `pipeline_tag: text-classification` for improved discoverability and inference widget functionality.
- Including relevant tags like `subjectivity-detection` and `deberta-v3`.
- Adding the Hugging Face `paper` ID and the `repo_url` to the metadata.
- Removing the automatically generated comment at the top of the content.
- Adding a direct link to the GitHub repository and reiterating the paper link in the introductory section.
- Populating the "Model description", "Intended uses & limitations", and "Training and evaluation data" sections with detailed information extracted from the paper abstract and the associated GitHub README.
- Adding a "How to use" section with a practical Python code snippet using the `transformers` library for inference.
- Including a "Citation" section with the BibTeX entry for the paper.

Files changed (1)

README.md (+109 -15)
README.md CHANGED
@@ -1,25 +1,34 @@
  ---
- library_name: transformers
- license: mit
  base_model: microsoft/mdeberta-v3-base
- tags:
- - generated_from_trainer
  metrics:
  - accuracy
  - f1
  model-index:
  - name: mdeberta-v3-base-subjectivity-italian
    results: []
- language:
- - it
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  # mdeberta-v3-base-subjectivity-italian

- This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) on the [CheckThat! Lab Task 1 Subjectivity Detection at CLEF 2025](arxiv.org/abs/2507.11764).
  It achieves the following results on the evaluation set:
  - Loss: 0.7922
  - Macro F1: 0.7490
@@ -32,15 +41,28 @@ It achieves the following results on the evaluation set:

  ## Model description

- More information needed

  ## Intended uses & limitations

- More information needed

  ## Training and evaluation data

- More information needed

  ## Training procedure

@@ -66,10 +88,82 @@ The following hyperparameters were used during training:
  | 0.4326 | 5.0 | 505 | 0.7883 | 0.7463 | 0.7413 | 0.7522 | 0.6322 | 0.6105 | 0.6554 | 0.7976 |
  | 0.4326 | 6.0 | 606 | 0.7922 | 0.7490 | 0.7409 | 0.7602 | 0.6402 | 0.6020 | 0.6836 | 0.7961 |

-
  ### Framework versions

  - Transformers 4.49.0
  - Pytorch 2.5.1+cu121
  - Datasets 3.3.1
- - Tokenizers 0.21.0
  ---
  base_model: microsoft/mdeberta-v3-base
+ language:
+ - it
+ library_name: transformers
+ license: cc-by-4.0
  metrics:
  - accuracy
  - f1
+ tags:
+ - generated_from_trainer
+ - text-classification
+ - subjectivity-detection
+ - deberta-v3
+ pipeline_tag: text-classification
+ paper: 2507.11764
+ repo_url: https://github.com/MatteoFasulo/clef2025-checkthat
  model-index:
  - name: mdeberta-v3-base-subjectivity-italian
    results: []
  ---

  # mdeberta-v3-base-subjectivity-italian

+ This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) for **Subjectivity Detection in News Articles**. It was developed by AI Wizards as part of their participation in **CLEF 2025 CheckThat! Lab Task 1**.
+
+ The model classifies sentences as subjective (opinion-laden) or objective. Its primary strategy is to enhance a transformer-based classifier by integrating sentiment scores, derived from an auxiliary model, with the sentence representation, an approach shown to significantly boost performance, especially the subjective F1 score.
+
+ For more details, refer to the paper [AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles](https://arxiv.org/abs/2507.11764) and the official code repository: [https://github.com/MatteoFasulo/clef2025-checkthat](https://github.com/MatteoFasulo/clef2025-checkthat).
+
  It achieves the following results on the evaluation set:
  - Loss: 0.7922
  - Macro F1: 0.7490

  ## Model description

+ This model is a transformer-based classifier designed for subjectivity detection in news articles: it distinguishes subjective (opinion-laden) from objective sentences. Its key innovation is augmenting transformer embeddings with sentiment signals from an auxiliary model (see the sketch below), which yields consistent performance gains, particularly in the subjective F1 score. It also incorporates decision threshold calibration to counter the class imbalance prevalent across languages.
+
+ The approach was evaluated across monolingual settings (Arabic, German, English, Italian, Bulgarian), zero-shot transfer (Greek, Polish, Romanian, Ukrainian), and multilingual training, demonstrating strong generalization.
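+
+ To make that concrete, here is a minimal sketch of the sentiment-augmentation idea. It is an illustration under stated assumptions, not the exact implementation from the paper or repository: the auxiliary sentiment checkpoint and the linear fusion head below are hypothetical choices.
+
+ ```python
+ # Illustrative sketch: fuse sentence embeddings with auxiliary sentiment scores.
+ import torch
+ import torch.nn as nn
+ from transformers import AutoModel, AutoModelForSequenceClassification, AutoTokenizer
+
+ encoder = AutoModel.from_pretrained("microsoft/mdeberta-v3-base")
+ enc_tok = AutoTokenizer.from_pretrained("microsoft/mdeberta-v3-base")
+
+ # Assumption: any multilingual sentiment classifier can serve as the auxiliary model.
+ sent_name = "cardiffnlp/twitter-xlm-roberta-base-sentiment"
+ sent_model = AutoModelForSequenceClassification.from_pretrained(sent_name)
+ sent_tok = AutoTokenizer.from_pretrained(sent_name)
+
+ class SentimentAugmentedClassifier(nn.Module):
+     """Classify on the [CLS] embedding concatenated with sentiment probabilities."""
+     def __init__(self, hidden_size=768, num_sentiment_scores=3, num_labels=2):
+         super().__init__()
+         self.head = nn.Linear(hidden_size + num_sentiment_scores, num_labels)
+
+     def forward(self, cls_embedding, sentiment_probs):
+         return self.head(torch.cat([cls_embedding, sentiment_probs], dim=-1))
+
+ text = "Questa riforma è un disastro annunciato."  # "This reform is a disaster foretold."
+ with torch.no_grad():
+     cls_emb = encoder(**enc_tok(text, return_tensors="pt")).last_hidden_state[:, 0]
+     sent_probs = sent_model(**sent_tok(text, return_tensors="pt")).logits.softmax(-1)
+
+ clf = SentimentAugmentedClassifier()
+ logits = clf(cls_emb, sent_probs)  # untrained head; shown only to illustrate the data flow
+ ```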

  ## Intended uses & limitations

+ **Intended uses:**
+ - **Subjectivity detection**: classifying sentences in news articles as subjective or objective.
+ - **Fact-checking pipelines**: identifying opinionated content that may require further scrutiny.
+ - **Journalism support**: helping journalists analyze content for bias or sentiment.
+ - **Combating misinformation**: flagging subjective claims in systems designed to detect misinformation.
+
+ **Limitations:**
+ - **Class imbalance sensitivity**: although decision threshold calibration was applied, performance can still be sensitive to the class distribution of the evaluation data; an initial submission error during the CLEF 2025 challenge illustrated this sensitivity.
+ - **Domain specificity**: the model is optimized for news articles, so performance may degrade on text from substantially different domains.
+ - **Sentiment model dependency**: the benefit of sentiment augmentation depends on the quality and relevance of the auxiliary sentiment model.

  ## Training and evaluation data

+ This model was fine-tuned on data from **CLEF 2025 CheckThat! Lab Task 1: Subjectivity Detection in News Articles**. Training and development datasets were provided for Arabic, German, English, Italian, and Bulgarian; the final evaluation added unseen languages (Greek, Romanian, Polish, and Ukrainian) to assess generalization.
+
+ Training specifically addressed class imbalance, a notable characteristic across these languages, by employing decision threshold calibration optimized on the development set, as sketched below.
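+
+ As a minimal sketch (assuming per-sentence subjective probabilities are already computed on the development set; the helper below is hypothetical, not the repository's exact code), calibration can be a simple grid search over candidate thresholds:
+
+ ```python
+ # Illustrative sketch: pick the decision threshold that maximizes macro F1 on the dev set.
+ import numpy as np
+ from sklearn.metrics import f1_score
+
+ def calibrate_threshold(dev_probs_subj, dev_labels):
+     """dev_probs_subj: P(SUBJ) per dev sentence; dev_labels: 1 = SUBJ, 0 = OBJ."""
+     grid = np.linspace(0.1, 0.9, 81)
+     scores = [f1_score(dev_labels, (dev_probs_subj >= t).astype(int), average="macro")
+               for t in grid]
+     return float(grid[int(np.argmax(scores))])
+
+ # Toy example: the calibrated threshold replaces the default 0.5 at inference time.
+ probs = np.array([0.2, 0.7, 0.55, 0.9, 0.35])
+ labels = np.array([0, 1, 1, 1, 0])
+ print(f"Calibrated threshold: {calibrate_threshold(probs, labels):.2f}")
+ ```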

  ## Training procedure

  | 0.4326 | 5.0 | 505 | 0.7883 | 0.7463 | 0.7413 | 0.7522 | 0.6322 | 0.6105 | 0.6554 | 0.7976 |
  | 0.4326 | 6.0 | 606 | 0.7922 | 0.7490 | 0.7409 | 0.7602 | 0.6402 | 0.6020 | 0.6836 | 0.7961 |

  ### Framework versions

  - Transformers 4.49.0
  - Pytorch 2.5.1+cu121
  - Datasets 3.3.1
+ - Tokenizers 0.21.0
+
+ ## How to use
+
+ You can use this model for text classification (subjectivity detection) with the `transformers` library. The examples below use Italian sentences, matching the model's target language, with English glosses in comments:
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ model_name = "MatteoFasulo/mdeberta-v3-base-subjectivity-italian"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+ # Example 1: subjective sentence ("This is a truly exceptional movie...")
+ text_1 = "Questo è un film davvero eccezionale, con immagini straordinarie e una trama avvincente."
+ inputs_1 = tokenizer(text_1, return_tensors="pt")
+
+ with torch.no_grad():
+     logits_1 = model(**inputs_1).logits
+
+ predicted_class_id_1 = logits_1.argmax().item()
+ predicted_label_1 = model.config.id2label[predicted_class_id_1]
+
+ print(f"Text: '{text_1}'")
+ print(f"Predicted label: {predicted_label_1}")  # Expected: SUBJ
+
+ # Example 2: objective sentence ("The capital of France is Paris.")
+ text_2 = "La capitale della Francia è Parigi."
+ inputs_2 = tokenizer(text_2, return_tensors="pt")
+
+ with torch.no_grad():
+     logits_2 = model(**inputs_2).logits
+
+ predicted_class_id_2 = logits_2.argmax().item()
+ predicted_label_2 = model.config.id2label[predicted_class_id_2]
+
+ print(f"Text: '{text_2}'")
+ print(f"Predicted label: {predicted_label_2}")  # Expected: OBJ
+
+ # Example 3: batch processing
+ texts_to_classify = [
+     "Credo che questa decisione sia un grave errore per il nostro futuro.",
+     "Il rapporto indica un calo significativo degli utili trimestrali.",
+     "Che interpretazione assolutamente brillante dell'attore protagonista!",
+     "La riunione è fissata per domani alle 10 nella sala conferenze B.",
+ ]
+ inputs_batch = tokenizer(texts_to_classify, padding=True, truncation=True, return_tensors="pt")
+
+ with torch.no_grad():
+     logits_batch = model(**inputs_batch).logits
+
+ predicted_class_ids_batch = logits_batch.argmax(dim=1).tolist()
+ predicted_labels_batch = [model.config.id2label[i] for i in predicted_class_ids_batch]
+
+ for text, label in zip(texts_to_classify, predicted_labels_batch):
+     print(f"Text: '{text}' -> Label: {label}")
+ ```
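+
+ For quick experiments, the same predictions can be obtained with the `pipeline` API (the printed output below is illustrative; the exact label names come from the model config):
+
+ ```python
+ from transformers import pipeline
+
+ classifier = pipeline("text-classification", model="MatteoFasulo/mdeberta-v3-base-subjectivity-italian")
+ print(classifier("La capitale della Francia è Parigi."))
+ # e.g. [{'label': 'OBJ', 'score': ...}]
+ ```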
+
+ ## Citation
+
+ If you find this model or the associated work useful, please cite the original paper:
+
+ ```bibtex
+ @misc{fasulo2025aiwizardscheckthat,
+       title={AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles},
+       author={Matteo Fasulo and Luca Babboni and Luca Tedeschini},
+       year={2025},
+       eprint={2507.11764},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL},
+       url={https://arxiv.org/abs/2507.11764},
+ }
+ ```