Improve model card: Add pipeline tag, update license and tags, expand content and usage
This PR significantly enhances the model card for `mdeberta-v3-base-subjectivity-bulgarian` by:
* **Updating Metadata**:
  * Adding `pipeline_tag: text-classification` for better discoverability on the Hugging Face Hub.
  * Changing the `license` from `mit` to `cc-by-4.0` to accurately reflect the license specified in the project's GitHub repository.
  * Adding descriptive `tags` such as `deberta`, `multilingual`, and `subjectivity-detection` to improve searchability and context.
* **Enriching Content**:
  * Populating the "Model description", "Intended uses & limitations", and "Training and evaluation data" sections with detailed information drawn from the paper abstract and the associated GitHub README.
  * Adding a clear, functional Python code snippet in a new "How to use" section, demonstrating practical inference with the `transformers` library on Bulgarian text.
  * Including a proper BibTeX citation for the paper to acknowledge the original work.
* **Improving Navigation**:
  * Ensuring direct links to the paper (keeping the existing arXiv link, as instructed) and the GitHub repository are prominently displayed.

These changes aim to provide a more informative, accessible, and compliant model card for the community.
@@ -1,25 +1,29 @@
 ---
-library_name: transformers
-license: mit
 base_model: microsoft/mdeberta-v3-base
-tags:
-- generated_from_trainer
+language:
+- bg
+library_name: transformers
+license: cc-by-4.0
 metrics:
 - accuracy
 - f1
+tags:
+- generated_from_trainer
+- deberta
+- multilingual
+- subjectivity-detection
+pipeline_tag: text-classification
 model-index:
 - name: mdeberta-v3-base-subjectivity-bulgarian
   results: []
-language:
-- bg
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
 # mdeberta-v3-base-subjectivity-bulgarian
 
-This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base)
+This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) for **Subjectivity Detection in News Articles**. It was presented by AI Wizards in the paper [AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles](https://arxiv.org/abs/2507.11764) as part of the [CLEF 2025 CheckThat! Lab Task 1](https://huggingface.co/papers/2507.11764).
+
+The official code and materials for this project can be found in the [GitHub repository](https://github.com/MatteoFasulo/clef2025-checkthat).
+
 It achieves the following results on the evaluation set:
 - Loss: 0.5111
 - Macro F1: 0.7869
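
A note on the metadata above: with `pipeline_tag: text-classification` set, the checkpoint should also be loadable through the high-level `pipeline` API, not only via the manual snippet added further down in this diff. A minimal sketch, assuming the checkpoint ships an `id2label` mapping with `OBJ`/`SUBJ` labels:

```python
from transformers import pipeline

# Load the fine-tuned checkpoint as a ready-made text-classification pipeline
classifier = pipeline(
    "text-classification",
    model="MatteoFasulo/mdeberta-v3-base-subjectivity-bulgarian",
)

# Returns a list of dicts such as [{"label": "SUBJ", "score": 0.97}];
# the exact label strings depend on the checkpoint's id2label mapping.
print(classifier("Според мен това е най-доброто решение."))  # "In my opinion, this is the best decision."
```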
@@ -32,15 +36,27 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-More information needed
+This model identifies whether a sentence is **subjective** (e.g., opinion-laden) or **objective**. Subjectivity detection is a key component in combating misinformation, improving fact-checking pipelines, and supporting journalists. This checkpoint is fine-tuned for Bulgarian.
+
+The primary strategy is to enhance a transformer-based classifier (specifically mDeBERTaV3-base) by combining sentence representations with sentiment scores derived from an auxiliary model, aiming to improve on standard fine-tuning, particularly on subjective F1. To address the class imbalance present across languages, the decision threshold was calibrated on the development set. The underlying system achieved high rankings in the CLEF 2025 CheckThat! Lab Task 1, including 1st place for Greek (zero-shot, Macro F1 = 0.51) and 1st–4th places in most monolingual settings.
 
 ## Intended uses & limitations
 
-More information needed
+**Intended uses:**
+This model is intended for research and practical applications in subjectivity detection for news articles, i.e., distinguishing subjective (opinion-laden) from objective content. It can be particularly useful for:
+* combating misinformation by identifying opinionated content,
+* improving fact-checking pipelines, and
+* supporting journalists in content analysis and bias assessment.
+
+**Limitations:**
+* While the overarching research explored multilingual and zero-shot settings, this checkpoint is fine-tuned for Bulgarian; performance may degrade on other languages or on domains not represented in the training data without further fine-tuning.
+* The paper notes that a quirk in the initial submission led to a skewed class distribution and under-calibrated thresholds; the reported results reflect the corrected evaluation. Keep this in mind when applying the model to data with a significantly different class distribution.
 
 ## Training and evaluation data
 
-More information needed
+This model was trained and evaluated as part of CLEF 2025 CheckThat! Lab Task 1: Subjectivity Detection in News Articles. Training and development datasets were provided for Arabic, German, English, Italian, and Bulgarian; the final evaluation added unseen languages (Greek, Romanian, Polish, and Ukrainian) to assess generalization.
+
+To address class imbalance, a common issue across these languages, the decision threshold was calibrated on the development set. More details on the datasets and experimental setup are available in the [paper](https://arxiv.org/abs/2507.11764) and the [GitHub repository](https://github.com/MatteoFasulo/clef2025-checkthat).
 
 ## Training procedure
 
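A note on the description above: the card says the decision threshold was calibrated on the development set but does not show how. Purely as an illustration of the general idea, a minimal sketch in which `dev_probs` (subjective-class probabilities) and `dev_labels` (gold labels) are hypothetical arrays:

```python
import numpy as np
from sklearn.metrics import f1_score

def calibrate_threshold(dev_probs: np.ndarray, dev_labels: np.ndarray) -> float:
    """Pick the decision threshold that maximizes macro F1 on the dev set."""
    candidates = np.linspace(0.1, 0.9, 81)
    scores = [
        f1_score(dev_labels, (dev_probs >= t).astype(int), average="macro")
        for t in candidates
    ]
    return float(candidates[int(np.argmax(scores))])

# At inference time, predict the subjective class whenever
# p(subjective) >= calibrated threshold, instead of a plain argmax.
```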
@@ -66,10 +82,63 @@ The following hyperparameters were used during training:
 | No log | 5.0 | 230 | 0.5065 | 0.7728 | 0.7835 | 0.7696 | 0.7315 | 0.7966 | 0.6763 | 0.7803 |
 | No log | 6.0 | 276 | 0.5111 | 0.7869 | 0.7949 | 0.7839 | 0.7510 | 0.8033 | 0.7050 | 0.7930 |
 
-
 ### Framework versions
 
 - Transformers 4.50.0
 - Pytorch 2.5.1+cu121
 - Datasets 3.3.1
-- Tokenizers 0.21.0
+- Tokenizers 0.21.0
+
+## How to use
+
+You can use this model directly with the Hugging Face `transformers` library for text classification:
+
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+import torch
+
+# Load tokenizer and model
+tokenizer = AutoTokenizer.from_pretrained("MatteoFasulo/mdeberta-v3-base-subjectivity-bulgarian")
+model = AutoModelForSequenceClassification.from_pretrained("MatteoFasulo/mdeberta-v3-base-subjectivity-bulgarian")
+
+# Example with an objective sentence (Bulgarian)
+text_objective = "Правителството обяви нови мерки за борба с инфлацията."  # "The government announced new measures to combat inflation."
+inputs_objective = tokenizer(text_objective, return_tensors="pt")
+
+with torch.no_grad():
+    logits_objective = model(**inputs_objective).logits
+
+predicted_class_id_objective = logits_objective.argmax().item()
+predicted_label_objective = model.config.id2label[predicted_class_id_objective]
+
+print(f"Text: '{text_objective}'")
+print(f"Predicted label: {predicted_label_objective}")
+# Expected output: Predicted label: OBJ
+
+# Example with a subjective sentence (Bulgarian)
+text_subjective = "Според мен това е най-доброто решение."  # "In my opinion, this is the best decision."
+inputs_subjective = tokenizer(text_subjective, return_tensors="pt")
+with torch.no_grad():
+    logits_subjective = model(**inputs_subjective).logits
+predicted_class_id_subjective = logits_subjective.argmax().item()
+predicted_label_subjective = model.config.id2label[predicted_class_id_subjective]
+print(f"Text: '{text_subjective}'")
+print(f"Predicted label: {predicted_label_subjective}")
+# Expected output: Predicted label: SUBJ
+```
+
+## Citation
+
+If you find this work helpful or inspiring, please consider citing the original paper:
+
+```bibtex
+@misc{fasulo2025aiwizardscheckthat,
+  title={AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles},
+  author={Matteo Fasulo and Stefan Petkov and Antonio Toral},
+  year={2025},
+  eprint={2507.11764},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL},
+  url={https://arxiv.org/abs/2507.11764},
+}
+```
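
One final note for reviewers: the new model description mentions fusing sentiment scores from an auxiliary model with the sentence representation, but the card never shows that architecture. Below is a minimal sketch of one plausible fusion-by-concatenation design; the class name, mean pooling, and three sentiment features are assumptions for illustration, not the repository's actual implementation:

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class SentimentFusionClassifier(nn.Module):
    """Illustrative sketch: concatenate pooled mDeBERTa embeddings with
    externally computed sentiment scores before classification."""

    def __init__(self, encoder_name: str = "microsoft/mdeberta-v3-base",
                 num_sentiment_features: int = 3, num_labels: int = 2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden_size = self.encoder.config.hidden_size
        self.classifier = nn.Linear(hidden_size + num_sentiment_features, num_labels)

    def forward(self, input_ids, attention_mask, sentiment_scores):
        # Mean-pool the token embeddings (one of several plausible pooling choices)
        output = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (output.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
        # Fuse the sentence representation with the auxiliary sentiment scores
        fused = torch.cat([pooled, sentiment_scores], dim=-1)
        return self.classifier(fused)
```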