Improve model card: Add text-classification pipeline tag, update license, expand sections, and add usage/code
This PR improves the model card for `mdeberta-v3-base-subjectivity-arabic` by:
* Adding the `pipeline_tag: text-classification` for better discoverability.
* Updating the license from `mit` to `cc-by-4.0`, as specified in the original GitHub repository.
* Adding relevant `subjectivity-detection`, `news`, and `arabic` tags for improved searchability.
* Populating the "Model description", "Intended uses & limitations", and "Training and evaluation data" sections with details from the paper abstract and GitHub README.
* Adding a "How to use" section with a Python code example for inference.
* Including a "Code" section with a direct link to the GitHub repository.
* Adding a "Citation" section with the BibTeX entry from the project's GitHub README.
````diff
@@ -1,25 +1,27 @@
 ---
-library_name: transformers
-license: mit
 base_model: microsoft/mdeberta-v3-base
-tags:
-- generated_from_trainer
+language:
+- ar
+library_name: transformers
+license: cc-by-4.0
 metrics:
 - accuracy
 - f1
+tags:
+- generated_from_trainer
+- text-classification
+- subjectivity-detection
+- news
+- arabic
+pipeline_tag: text-classification
 model-index:
 - name: mdeberta-v3-base-subjectivity-arabic
   results: []
-language:
-- ar
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
 # mdeberta-v3-base-subjectivity-arabic
 
-This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) on the [CheckThat! Lab Task 1 Subjectivity Detection at CLEF 2025](arxiv.org/abs/2507.11764).
+This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) on the [CheckThat! Lab Task 1 Subjectivity Detection at CLEF 2025](https://arxiv.org/abs/2507.11764).
 It achieves the following results on the evaluation set:
 - Loss: 0.7419
 - Macro F1: 0.5291
@@ -32,15 +34,17 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-More information needed
+This model is part of AI Wizards' participation in the CLEF 2025 CheckThat! Lab Task 1: Subjectivity Detection in News Articles. It aims to classify sentences as subjective or objective, a key component in combating misinformation, improving fact-checking pipelines, and supporting journalists. The model enhances transformer-based classifiers by integrating sentiment scores, derived from an auxiliary model, with sentence representations. This sentiment-augmented architecture, applied here with mDeBERTaV3-base, has shown consistent performance gains, particularly in subjective F1 score.
 
 ## Intended uses & limitations
 
-More information needed
+This model is intended for subjectivity detection in sentences from news articles, classifying them as either subjective (opinion-laden) or objective. This capability is valuable for applications such as combating misinformation, improving fact-checking pipelines, and supporting journalists. It has been evaluated across monolingual (Arabic, German, English, Italian, Bulgarian), multilingual, and zero-shot settings (Greek, Romanian, Polish, Ukrainian).
+
+A key strategy employed is decision threshold calibration to address the class imbalance prevalent across languages. Users should be aware that the initial official multilingual Macro F1 score was lower due to a submission error (skewed class distribution); it was later corrected offline to Macro F1 = 0.68, placing the team 9th overall in the challenge.
 
 ## Training and evaluation data
 
-More information needed
+The model was trained and evaluated on the datasets provided for the CLEF 2025 CheckThat! Lab Task 1: Subjectivity Detection in News Articles. Training and development datasets were available for Arabic, German, English, Italian, and Bulgarian. For the final evaluation, additional unseen languages such as Greek, Romanian, Polish, and Ukrainian were used to assess generalization capabilities. Training incorporates sentiment scores from an auxiliary model and uses decision threshold calibration to mitigate class imbalance.
 
 ## Training procedure
 
@@ -72,4 +76,44 @@ The following hyperparameters were used during training:
 - Transformers 4.49.0
 - Pytorch 2.5.1+cu121
 - Datasets 3.3.1
-- Tokenizers 0.21.0
+- Tokenizers 0.21.0
+
+## How to use
+
+You can use the model directly with the `transformers` library for text classification:
+
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+import torch
+
+model_name = "MatteoFasulo/mdeberta-v3-base-subjectivity-arabic"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+text = "This is a very subjective opinion."
+inputs = tokenizer(text, return_tensors="pt")
+
+with torch.no_grad():
+    logits = model(**inputs).logits
+
+predicted_class_id = logits.argmax().item()
+print(model.config.id2label[predicted_class_id])
+```
+
+## Code
+The official code and materials for this project are available on GitHub: [https://github.com/MatteoFasulo/clef2025-checkthat](https://github.com/MatteoFasulo/clef2025-checkthat).
+
+## Citation
+If you find our work helpful or inspiring, please feel free to cite it.
+
+```bibtex
+@misc{fasulo2025aiwizardscheckthat,
+      title={AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles},
+      author={Matteo Fasulo and Luca Babboni and Luca Tedeschini},
+      year={2025},
+      eprint={2507.11764},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2507.11764},
+}
+```
````
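
For context, the sentiment-augmented architecture the new "Model description" section refers to (sentiment scores from an auxiliary model fused with sentence representations) could look roughly like the sketch below. This is an illustration under assumptions, not the project's actual code: the module name, the 3-way sentiment distribution, and the dummy tensors standing in for the mDeBERTa pooled output are all hypothetical.

```python
import torch
import torch.nn as nn

class SentimentAugmentedClassifier(nn.Module):
    """Illustrative head: concatenate the encoder's pooled sentence
    embedding with auxiliary sentiment scores before classifying."""

    def __init__(self, hidden_size: int = 768, num_sentiments: int = 3, num_labels: int = 2):
        super().__init__()
        self.classifier = nn.Linear(hidden_size + num_sentiments, num_labels)

    def forward(self, pooled: torch.Tensor, sentiment_scores: torch.Tensor) -> torch.Tensor:
        # Fuse the [CLS]-style sentence embedding with the sentiment distribution.
        fused = torch.cat([pooled, sentiment_scores], dim=-1)
        return self.classifier(fused)

# Dummy stand-ins: a batch of 4 pooled mDeBERTa outputs and the auxiliary
# sentiment model's softmax scores (e.g. negative/neutral/positive).
pooled = torch.randn(4, 768)
sentiment_scores = torch.softmax(torch.randn(4, 3), dim=-1)
logits = SentimentAugmentedClassifier()(pooled, sentiment_scores)
print(logits.shape)  # torch.Size([4, 2])
```

In the actual system the pooled vector would come from the fine-tuned mDeBERTa encoder and the scores from a separate sentiment model; only the fusion-then-classify step is sketched here.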
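
The decision-threshold calibration the card mentions for handling class imbalance can be sketched in a few lines: sweep candidate cutoffs over dev-set subjective-class probabilities and keep the one that maximizes macro F1. The toy labels/probabilities and helper names below are illustrative, not the team's pipeline.

```python
def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred):
    # Average the per-class F1 over the two classes (0 = OBJ, 1 = SUBJ).
    scores = []
    for cls in (0, 1):
        tp = sum(yt == cls and yp == cls for yt, yp in zip(y_true, y_pred))
        fp = sum(yt != cls and yp == cls for yt, yp in zip(y_true, y_pred))
        fn = sum(yt == cls and yp != cls for yt, yp in zip(y_true, y_pred))
        scores.append(f1(tp, fp, fn))
    return sum(scores) / len(scores)

def calibrate_threshold(y_true, probs, steps=101):
    """Pick the SUBJ-probability cutoff that maximizes dev-set macro F1."""
    best_t, best_score = 0.5, -1.0
    for i in range(steps):
        t = i / (steps - 1)
        y_pred = [int(p >= t) for p in probs]
        score = macro_f1(y_true, y_pred)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Toy dev set, imbalanced toward the objective class (label 0).
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.1, 0.2, 0.3, 0.35, 0.4, 0.45, 0.42, 0.8]
t, score = calibrate_threshold(y_true, probs)
print(round(t, 2), round(score, 3))  # → 0.41 0.855
```

Note how the calibrated cutoff drops below the default 0.5, recovering a borderline subjective sentence (probability 0.42) that a fixed 0.5 threshold would miss.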