nielsr HF Staff commited on
Commit
29be8b9
·
verified ·
1 Parent(s): fef225e

Improve model card: Add text-classification pipeline tag, update license, expand sections, and add usage/code

Browse files

This PR improves the model card for `mdeberta-v3-base-subjectivity-arabic` by:

* Adding the `pipeline_tag: text-classification` for better discoverability.
* Updating the license from `mit` to `cc-by-4.0`, as specified in the original GitHub repository.
* Adding relevant `subjectivity-detection`, `news`, and `arabic` tags for improved searchability.
* Populating the "Model description", "Intended uses & limitations", and "Training and evaluation data" sections with details from the paper abstract and GitHub README.
* Adding a "How to use" section with a Python code example for inference.
* Including a "Code" section with a direct link to the GitHub repository.
* Adding a "Citation" section with the BibTeX entry from the project's GitHub README.

Files changed (1) hide show
  1. README.md +58 -14
README.md CHANGED
@@ -1,25 +1,27 @@
1
  ---
2
- library_name: transformers
3
- license: mit
4
  base_model: microsoft/mdeberta-v3-base
5
- tags:
6
- - generated_from_trainer
 
 
7
  metrics:
8
  - accuracy
9
  - f1
 
 
 
 
 
 
 
10
  model-index:
11
  - name: mdeberta-v3-base-subjectivity-arabic
12
  results: []
13
- language:
14
- - ar
15
  ---
16
 
17
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
18
- should probably proofread and complete it, then remove this comment. -->
19
-
20
  # mdeberta-v3-base-subjectivity-arabic
21
 
22
- This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) on the [CheckThat! Lab Task 1 Subjectivity Detection at CLEF 2025](arxiv.org/abs/2507.11764).
23
  It achieves the following results on the evaluation set:
24
  - Loss: 0.7419
25
  - Macro F1: 0.5291
@@ -32,15 +34,17 @@ It achieves the following results on the evaluation set:
32
 
33
  ## Model description
34
 
35
- More information needed
36
 
37
  ## Intended uses & limitations
38
 
39
- More information needed
 
 
40
 
41
  ## Training and evaluation data
42
 
43
- More information needed
44
 
45
  ## Training procedure
46
 
@@ -72,4 +76,44 @@ The following hyperparameters were used during training:
72
  - Transformers 4.49.0
73
  - Pytorch 2.5.1+cu121
74
  - Datasets 3.3.1
75
- - Tokenizers 0.21.0
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
 
 
2
  base_model: microsoft/mdeberta-v3-base
3
+ language:
4
+ - ar
5
+ library_name: transformers
6
+ license: cc-by-4.0
7
  metrics:
8
  - accuracy
9
  - f1
10
+ tags:
11
+ - generated_from_trainer
12
+ - text-classification
13
+ - subjectivity-detection
14
+ - news
15
+ - arabic
16
+ pipeline_tag: text-classification
17
  model-index:
18
  - name: mdeberta-v3-base-subjectivity-arabic
19
  results: []
 
 
20
  ---
21
 
 
 
 
22
  # mdeberta-v3-base-subjectivity-arabic
23
 
24
+ This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) on the [CheckThat! Lab Task 1 Subjectivity Detection at CLEF 2025](https://arxiv.org/abs/2507.11764).
25
  It achieves the following results on the evaluation set:
26
  - Loss: 0.7419
27
  - Macro F1: 0.5291
 
34
 
35
  ## Model description
36
 
37
+ This model is part of AI Wizards' participation in the CLEF 2025 CheckThat! Lab Task 1: Subjectivity Detection in News Articles. It aims to classify sentences as subjective or objective, a key component in combating misinformation, improving fact-checking pipelines, and supporting journalists. The model enhances transformer-based classifiers by integrating sentiment scores, derived from an auxiliary model, with sentence representations. This sentiment-augmented architecture, applied here with mDeBERTaV3-base, has shown consistent performance gains, particularly in subjective F1 score.
38
 
39
  ## Intended uses & limitations
40
 
41
+ This model is intended for subjectivity detection in sentences from news articles, classifying them as either subjective (opinion-laden) or objective. This capability is valuable for applications such as combating misinformation, improving fact-checking pipelines, and supporting journalists. It has been evaluated across monolingual (Arabic, German, English, Italian, Bulgarian), multilingual, and zero-shot settings (Greek, Romanian, Polish, Ukrainian).
42
+
43
+ A key strategy employed is decision threshold calibration to address class imbalance prevalent across languages. Users should be aware that the initial official multilingual Macro F1 score was lower due to a submission error (skewed class distribution), which was later corrected offline to Macro F1 = 0.68, placing the team 9th overall in the challenge.
44
 
45
  ## Training and evaluation data
46
 
47
+ The model was trained and evaluated on datasets provided for the CLEF 2025 CheckThat! Lab Task 1: Subjectivity Detection in News Articles. Training and development datasets were available for Arabic, German, English, Italian, and Bulgarian. For final evaluation, additional unseen languages such as Greek, Romanian, Polish, and Ukrainian were used to assess generalization capabilities. The training incorporates sentiment scores from an auxiliary model and utilizes decision threshold calibration to mitigate class imbalance.
48
 
49
  ## Training procedure
50
 
 
76
  - Transformers 4.49.0
77
  - Pytorch 2.5.1+cu121
78
  - Datasets 3.3.1
79
+ - Tokenizers 0.21.0
80
+
81
+ ## How to use
82
+
83
+ You can use the model directly with the `transformers` library for text classification:
84
+
85
+ ```python
86
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
87
+ import torch
88
+
89
+ model_name = "MatteoFasulo/mdeberta-v3-base-subjectivity-arabic"
90
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
91
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
92
+
93
+ text = "This is a very subjective opinion."
94
+ inputs = tokenizer(text, return_tensors="pt")
95
+
96
+ with torch.no_grad():
97
+ logits = model(**inputs).logits
98
+
99
+ predicted_class_id = logits.argmax().item()
100
+ print(model.config.id2label[predicted_class_id])
101
+ ```
102
+
103
+ ## Code
104
+ The official code and materials for this project are available on GitHub: [https://github.com/MatteoFasulo/clef2025-checkthat](https://github.com/MatteoFasulo/clef2025-checkthat).
105
+
106
+ ## Citation
107
+ If you find our work helpful or inspiring, please feel free to cite it.
108
+
109
+ ```bibtex
110
+ @misc{antoun2024camembert20smarterfrench,
111
+ title={CamemBERT 2.0: A Smarter French Language Model Aged to Perfection},
112
+ author={Wissam Antoun and Francis Kulumba and Rian Touchent and Éric de la Clergerie and Benoît Sagot and Djamé Seddah},
113
+ year={2024},
114
+ eprint={2411.08868},
115
+ archivePrefix={arXiv},
116
+ primaryClass={cs.CL},
117
+ url={https://arxiv.org/abs/2411.08868},
118
+ }
119
+ ```