Improve model card: Add pipeline tag, update metadata, and expand content
This PR significantly improves the model card for `mdeberta-v3-base-subjectivity-multilingual-no-arabic` by:
- **Adding the `pipeline_tag: text-classification`** to ensure proper discoverability on the Hugging Face Hub.
- **Updating the `license`** to `cc-by-4.0` as stated in the official GitHub repository.
- **Refining `language` and `tags` metadata** to better describe the model's scope and task.
- **Expanding the \"Model description\", \"Intended uses & limitations\", and \"Training and evaluation data\" sections** with details extracted from the paper abstract and the GitHub README.
- **Adding a direct link to the official GitHub repository** for easy access to the source code.
- **Providing a clear Python usage example** using the `transformers` `pipeline` for text classification.
- **Adding a BibTeX citation** for the associated paper.
Please review and merge.
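As context for reviewers, the sentiment-augmented architecture described in the expanded "Model description" (auxiliary-model sentiment scores integrated with sentence representations) can be sketched minimally as below. The concatenation-based fusion, the names, and the toy dimensions are illustrative assumptions for exposition, not the repository's actual implementation:

```python
# Illustrative sketch only: fuse auxiliary sentiment scores with a pooled
# sentence embedding before a classification head. Names, dimensions, and
# fusion-by-concatenation are assumptions, not the authors' code.

def fuse_features(sentence_embedding, sentiment_scores):
    """Concatenate a pooled sentence embedding with auxiliary sentiment scores."""
    return list(sentence_embedding) + list(sentiment_scores)

# A 4-dim stand-in for a 768-dim pooled mDeBERTa embedding, plus neg/neu/pos scores:
embedding = [0.12, -0.48, 0.33, 0.05]
sentiment = [0.10, 0.20, 0.70]
features = fuse_features(embedding, sentiment)
print(len(features))  # 7: the classifier head would take this wider input
```

The classifier head then operates on the widened feature vector instead of the embedding alone.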
````diff
@@ -1,23 +1,30 @@
 ---
-library_name: transformers
-license: mit
 base_model: microsoft/mdeberta-v3-base
-tags:
-- generated_from_trainer
+library_name: transformers
+license: cc-by-4.0
 metrics:
 - accuracy
 - f1
+pipeline_tag: text-classification
+language: multilingual
+tags:
+- text-classification
+- subjectivity-detection
+- news-articles
+- multilingual
+- deberta-v3
+- generated_from_trainer
 model-index:
 - name: mdeberta-v3-base-subjectivity-multilingual-no-arabic
   results: []
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
 # mdeberta-v3-base-subjectivity-multilingual-no-arabic
 
-This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base)
+This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) for subjectivity detection in news articles. It was presented in the paper [AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles](https://arxiv.org/abs/2507.11764).
+
+**GitHub Repository**: For the official code and more details, please refer to the [GitHub repository](https://github.com/MatteoFasulo/clef2025-checkthat).
+
 It achieves the following results on the evaluation set:
 - Loss: 0.7196
 - Macro F1: 0.8071
@@ -30,15 +37,20 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-More information needed
+This model is a fine-tuned version of `microsoft/mdeberta-v3-base` for **Subjectivity Detection in News Articles**. It classifies sentences as subjective or objective across monolingual, multilingual, and zero-shot settings. The core innovation lies in enhancing transformer-based classifiers by integrating sentiment scores, derived from an auxiliary model, with sentence representations. This sentiment-augmented architecture, applied here with mDeBERTaV3-base, aims to improve upon standard fine-tuning, particularly boosting the subjective F1 score. Decision threshold calibration was also employed to address class imbalance.
 
 ## Intended uses & limitations
 
-More information needed
+This model is intended to identify whether a sentence is **subjective** (e.g., opinion-laden) or **objective**, making it a valuable tool for combating misinformation, improving fact-checking pipelines, and supporting journalists in content analysis.
+
+**Limitations:**
+* This specific model (`multilingual-no-arabic`) was fine-tuned on the multilingual dataset *excluding Arabic* data.
+* While designed for multilingual and zero-shot transfer, performance can vary significantly across languages and specific domains.
+* The original challenge submission inadvertently used a custom train/dev mix, leading to a skewed class distribution and an under-calibrated decision threshold; the official multilingual Macro F1 score was 0.24. A re-evaluation with the correct data split yielded a Macro F1 of 0.68, which would have placed the model 9th overall in the challenge.
 
 ## Training and evaluation data
 
-More information needed
+Training and development datasets were provided for Arabic, German, English, Italian, and Bulgarian as part of the CLEF 2025 CheckThat! Lab Task 1: Subjectivity Detection in News Articles. This specific model was trained on the multilingual dataset, *excluding Arabic* data. The final evaluation included additional unseen languages (e.g., Greek, Romanian, Polish, Ukrainian) to assess generalization capabilities.
 
 ## Training procedure
 
@@ -64,10 +76,49 @@ The following hyperparameters were used during training:
 | 0.2749 | 5.0 | 1245 | 0.7195 | 0.7996 | 0.8038 | 0.7963 | 0.7461 | 0.7689 | 0.7247 | 0.8139 |
 | 0.2749 | 6.0 | 1494 | 0.7196 | 0.8071 | 0.8037 | 0.8123 | 0.7658 | 0.7367 | 0.7973 | 0.8159 |
 
-
 ### Framework versions
 
 - Transformers 4.47.0
 - Pytorch 2.5.1+cu121
 - Datasets 3.3.1
-- Tokenizers 0.21.0
+- Tokenizers 0.21.0
+
+## How to use
+
+You can use the model with the `pipeline` API from the `transformers` library for text classification:
+
+```python
+from transformers import pipeline
+
+model_name = "MatteoFasulo/mdeberta-v3-base-subjectivity-multilingual-no-arabic"
+# The pipeline automatically infers the task and labels from the model config
+classifier = pipeline("text-classification", model=model_name)
+
+# Example usage:
+# A subjective sentence
+result_subj = classifier("This is a truly amazing and groundbreaking discovery!")
+print(f"Sentence: 'This is a truly amazing and groundbreaking discovery!' -> {result_subj}")
+
+# An objective sentence
+result_obj = classifier("The new policy will be implemented next quarter.")
+print(f"Sentence: 'The new policy will be implemented next quarter.' -> {result_obj}")
+```
+
+## Citation
+
+If you find our work helpful or inspiring, please feel free to cite the paper:
+
+```bibtex
+@article{fasulo2025ai,
+  title={AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles},
+  author={Fasulo, Matteo and Fabris, Alessandro and Caldararu, Silvia and Kiselov, Valerij and Stoica, George and Ilie, Andrei},
+  journal={arXiv preprint arXiv:2507.11764},
+  year={2025}
+}
+```
+
+You can find the official paper on Hugging Face Papers: [AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles](https://huggingface.co/papers/2507.11764).
+
+## License
+
+This work is licensed under the [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/deed.en) (CC BY 4.0).
````