Improve model card: Add pipeline tag, update license, GitHub link, and detailed sections

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +68 -11
README.md CHANGED
@@ -1,17 +1,23 @@
  ---
- library_name: transformers
- license: mit
  base_model: microsoft/mdeberta-v3-base
- tags:
- - generated_from_trainer
  metrics:
  - accuracy
  - f1
  model-index:
  - name: mdeberta-v3-base-subjectivity-german
    results: []
- language:
- - de
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -19,7 +25,12 @@ should probably proofread and complete it, then remove this comment. -->

  # mdeberta-v3-base-subjectivity-german

- This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) on the [CheckThat! Lab Task 1 Subjectivity Detection at CLEF 2025](arxiv.org/abs/2507.11764).

  It achieves the following results on the evaluation set:
  - Loss: 0.5760
  - Macro F1: 0.7720
@@ -32,15 +43,45 @@ It achieves the following results on the evaluation set:

  ## Model description

- More information needed

  ## Intended uses & limitations

- More information needed

  ## Training and evaluation data

- More information needed

  ## Training procedure

@@ -72,4 +113,20 @@ The following hyperparameters were used during training:
  - Transformers 4.49.0
  - Pytorch 2.5.1+cu121
  - Datasets 3.3.1
- - Tokenizers 0.21.0

  ---
  base_model: microsoft/mdeberta-v3-base
+ language:
+ - de
+ library_name: transformers
+ license: cc-by-4.0
  metrics:
  - accuracy
  - f1
+ pipeline_tag: text-classification
+ tags:
+ - generated_from_trainer
+ - subjectivity-detection
+ - mdeberta-v3
+ - sentiment
  model-index:
  - name: mdeberta-v3-base-subjectivity-german
    results: []
+ datasets:
+ - MatteoFasulo/clef2025_checkthat_task1_subjectivity
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

  # mdeberta-v3-base-subjectivity-german

+ This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) for **Subjectivity Detection in News Articles**, as presented in the paper [AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles](https://arxiv.org/abs/2507.11764).
+
+ It was developed as part of AI Wizards' participation in the CLEF 2025 CheckThat! Lab Task 1.
+
+ **Code:** [https://github.com/MatteoFasulo/clef2025-checkthat](https://github.com/MatteoFasulo/clef2025-checkthat)
+
  It achieves the following results on the evaluation set:
  - Loss: 0.5760
  - Macro F1: 0.7720

  ## Model description

+ This model classifies sentences in news articles as either **subjective** (e.g., opinion-laden) or **objective**. This distinction is a key component in combating misinformation, improving fact-checking pipelines, and supporting journalists. The model is based on the `mDeBERTaV3-base` architecture and enhances the transformer classifier by integrating sentiment scores, derived from an auxiliary model, with the sentence representations. This sentiment-augmented architecture, combined with robust decision threshold calibration to address class imbalance, significantly boosts performance, especially the subjective-class F1 score.
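The sentiment-augmentation idea described above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' exact implementation: the class name, feature sizes, and fusion by simple concatenation are assumptions.

```python
import torch
import torch.nn as nn

class SentimentAugmentedHead(nn.Module):
    """Illustrative classification head: fuses a sentence embedding with
    auxiliary sentiment scores before the final linear layer.
    (Names and dimensions are assumptions for demonstration.)"""

    def __init__(self, hidden_size=768, num_sentiment_scores=3, num_labels=2):
        super().__init__()
        self.classifier = nn.Linear(hidden_size + num_sentiment_scores, num_labels)

    def forward(self, cls_embedding, sentiment_scores):
        # cls_embedding: (batch, hidden_size), e.g. the [CLS] vector from mDeBERTa
        # sentiment_scores: (batch, 3), e.g. neg/neu/pos probabilities
        fused = torch.cat([cls_embedding, sentiment_scores], dim=-1)
        return self.classifier(fused)

head = SentimentAugmentedHead()
logits = head(torch.randn(4, 768), torch.rand(4, 3))
print(logits.shape)  # torch.Size([4, 2])
```

The fused features then feed a standard two-way (OBJ/SUBJ) softmax classifier.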

  ## Intended uses & limitations

+ This model is intended for classifying sentences in news articles as subjective or objective across various languages. It has been evaluated in monolingual (Arabic, German, English, Italian, Bulgarian), multilingual, and zero-shot transfer settings (Greek, Polish, Romanian, Ukrainian). It is particularly useful for applications requiring fine-grained text analysis in news contexts, such as misinformation detection, fact-checking, and journalistic tools.
+
+ **Limitations:**
+ Although the model handles class imbalance through decision threshold calibration, its performance may vary across languages and class distributions. As the original work notes, initial submission errors revealed its sensitivity to proper calibration. The model is tuned for news-article text, so it may generalize less well to other domains or to highly nuanced subjective expressions.
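The decision threshold calibration mentioned above can be sketched as follows. This is a hedged illustration on toy data; the threshold grid and the use of macro F1 as the selection metric are assumptions, and the actual work tunes on the official development splits.

```python
import numpy as np

def macro_f1(y_true, y_pred):
    # Macro-averaged F1 over the two classes (0 = OBJ, 1 = SUBJ)
    scores = []
    for cls in (0, 1):
        tp = np.sum((y_pred == cls) & (y_true == cls))
        fp = np.sum((y_pred == cls) & (y_true != cls))
        fn = np.sum((y_pred != cls) & (y_true == cls))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(scores))

def calibrate_threshold(p_subj, y_true, grid=np.round(np.linspace(0.05, 0.95, 19), 2)):
    # Pick the threshold on P(SUBJ) that maximizes dev-set macro F1
    return max(grid, key=lambda t: macro_f1(y_true, (p_subj >= t).astype(int)))

# Toy dev set, imbalanced toward OBJ as in the task data
p_subj = np.array([0.90, 0.70, 0.55, 0.40, 0.35, 0.20, 0.10, 0.05])
labels = np.array([1, 1, 1, 0, 0, 0, 0, 0])
best_t = calibrate_threshold(p_subj, labels)
print(best_t, macro_f1(labels, (p_subj >= best_t).astype(int)))
```

At inference, the calibrated threshold replaces the default 0.5 argmax cutoff on the subjective-class probability.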

  ## Training and evaluation data

+ The model was trained and evaluated on the datasets provided for **CLEF 2025 CheckThat! Lab Task 1: Subjectivity Detection in News Articles**. Training and development sets were provided for Arabic, German, English, Italian, and Bulgarian; the final evaluation added unseen languages (e.g., Greek, Romanian, Polish, Ukrainian) to assess generalization. Training focused on enhancing transformer embeddings with sentiment signals and on decision threshold calibration to mitigate the class imbalance present across languages.
+
+ ## How to use
+
+ You can use this model directly with the Hugging Face `transformers` library for text classification:
+
+ ```python
+ from transformers import pipeline
+
+ # Load the text classification pipeline
+ classifier = pipeline(
+     "text-classification",
+     model="MatteoFasulo/mdeberta-v3-base-subjectivity-german",
+     tokenizer="microsoft/mdeberta-v3-base",
+ )
+
+ # Example usage for an objective sentence
+ text1 = "Das Unternehmen meldete im letzten Quartal einen Gewinnanstieg von 10 %."
+ result1 = classifier(text1)
+ print(f"Text: '{text1}' Classification: {result1}")
+ # Expected output: [{'label': 'OBJ', 'score': ...}]
+
+ # Example usage for a subjective sentence
+ text2 = "Dieses Produkt ist absolut erstaunlich und jeder sollte es ausprobieren!"
+ result2 = classifier(text2)
+ print(f"Text: '{text2}' Classification: {result2}")
+ # Expected output: [{'label': 'SUBJ', 'score': ...}]
+ ```

  ## Training procedure

  - Transformers 4.49.0
  - Pytorch 2.5.1+cu121
  - Datasets 3.3.1
+ - Tokenizers 0.21.0
+
+ ## Citation
+
+ If you find our work helpful or inspiring, please feel free to cite it:
+
+ ```bibtex
+ @misc{fasulo2025aiwizardscheckthat2025,
+   title={AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles},
+   author={Matteo Fasulo and Luca Babboni and Luca Tedeschini},
+   year={2025},
+   eprint={2507.11764},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL},
+   url={https://arxiv.org/abs/2507.11764},
+ }
+ ```