nielsr (HF Staff) committed
Commit bfc8af5 · verified · 1 Parent(s): 7494486

Improve model card: license, tags, paper & GitHub links, usage, and description


This PR significantly enhances the model card for `ModernBERT-base-subjectivity-english` by:

- Correcting the `license` metadata to `cc-by-4.0` as specified in the official GitHub repository.
- Adding `pipeline_tag: text-classification` for improved discoverability on the Hub.
- Including additional `tags` like `modernbert` and `subjectivity-detection`.
- Adding a direct link to the associated paper on Hugging Face Papers.
- Providing a link to the official GitHub repository.
- Populating the previously placeholder sections ("Model description", "Intended uses & limitations", "Training and evaluation data") with detailed information from the paper abstract and GitHub README.
- Adding a clear "How to use" section with a Python code snippet for inference.
- Removing the automatic generation boilerplate comment.

These updates provide a more comprehensive, accurate, and user-friendly model card.

Files changed (1):
1. README.md (+47 −13)
README.md CHANGED
@@ -1,25 +1,30 @@
  ---
- library_name: transformers
- license: apache-2.0
  base_model: answerdotai/ModernBERT-base
- tags:
- - generated_from_trainer
+ language:
+ - en
+ library_name: transformers
+ license: cc-by-4.0
  metrics:
  - accuracy
  - f1
+ pipeline_tag: text-classification
+ tags:
+ - generated_from_trainer
+ - modernbert
+ - subjectivity-detection
  model-index:
  - name: ModernBERT-base-subjectivity-english
    results: []
- language:
- - en
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  # ModernBERT-base-subjectivity-english

- This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on the [CheckThat! Lab Task 1 Subjectivity Detection at CLEF 2025](arxiv.org/abs/2507.11764).
+ This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on the [CheckThat! Lab Task 1 Subjectivity Detection at CLEF 2025](https://arxiv.org/abs/2507.11764).
+
+ The model was presented in the paper [AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles](https://huggingface.co/papers/2507.11764).
+
+ The official code repository can be found at: [https://github.com/MatteoFasulo/clef2025-checkthat](https://github.com/MatteoFasulo/clef2025-checkthat)
+
  It achieves the following results on the evaluation set:
  - Loss: 1.0478
  - Macro F1: 0.7034
@@ -32,15 +37,44 @@ It achieves the following results on the evaluation set:

  ## Model description

- More information needed
+ This model, `ModernBERT-base-subjectivity-english`, is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) designed for subjectivity detection in news articles. It was developed as part of AI Wizards' participation in the CLEF 2025 CheckThat! Lab Task 1, aiming to classify sentences as subjective or objective. The core innovation of this model lies in enhancing transformer-based embeddings by integrating sentiment scores, derived from an auxiliary model, with sentence representations. This approach has been shown to significantly boost performance, particularly the subjective F1 score, and aims to improve upon standard fine-tuning methods. To address prevalent class imbalance across languages, the model also employs decision threshold calibration optimized on the development set.

  ## Intended uses & limitations

- More information needed
+ This model is intended for classifying sentences in news articles as subjective (opinion-laden) or objective. This capability is crucial for applications such as combating misinformation, improving fact-checking pipelines, and supporting journalistic efforts. While this specific model is tailored for English, the broader research explored its effectiveness across monolingual (Arabic, German, Italian, Bulgarian) and zero-shot transfer settings (Greek, Polish, Romanian, Ukrainian). A key strength is its use of decision threshold calibration to mitigate class imbalance. However, users should note that the original submission had an issue with a skewed class distribution, which was later corrected, underscoring the importance of proper data splits and calibration for optimal performance.

  ## Training and evaluation data

- More information needed
+ The `ModernBERT-base-subjectivity-english` model was fine-tuned on the English portion of the CheckThat! Lab Task 1: Subjectivity Detection in News Articles dataset provided for CLEF 2025. The training and development datasets included sentences in English (among other languages like Arabic, German, Italian, and Bulgarian). For final evaluation, the broader project also assessed generalization on unseen languages like Greek, Romanian, Polish, and Ukrainian. The training strategy involved augmenting transformer embeddings with sentiment signals and employing decision threshold calibration to improve performance and handle class imbalance.
+
+ ## How to use
+
+ You can use this model directly with the `transformers` library for text classification:
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ model_name = "MatteoFasulo/ModernBERT-base-subjectivity-english"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+ # Example text
+ text = "The new policy is an absolute disaster for the economy."
+
+ # Tokenize and perform inference
+ inputs = tokenizer(text, return_tensors="pt")
+ with torch.no_grad():
+     logits = model(**inputs).logits
+
+ # Get predicted class (0 for OBJ, 1 for SUBJ as per model config)
+ predicted_class_id = logits.argmax().item()
+ labels = model.config.id2label  # Access the label mapping from model config
+ predicted_label = labels[predicted_class_id]
+
+ print(f"Text: '{text}'")
+ print(f"Predicted label: {predicted_label}")
+ ```

  ## Training procedure
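
The sentiment augmentation and decision-threshold calibration that the updated card describes can be sketched roughly as follows. This is a minimal illustration with toy data, not the authors' implementation: the feature dimensions, the synthetic dev set, and the helpers `macro_f1` and `calibrate_threshold` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sentiment augmentation (conceptual): concatenate the sentence embedding
# with sentiment scores from an auxiliary model before the classifier head.
cls_embedding = rng.normal(size=768)          # stand-in for a ModernBERT [CLS] vector
sentiment_scores = np.array([0.1, 0.2, 0.7])  # hypothetical neg/neu/pos scores
features = np.concatenate([cls_embedding, sentiment_scores])  # shape (771,)

def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over the two classes (0 = OBJ, 1 = SUBJ)."""
    f1s = []
    for cls in (0, 1):
        tp = np.sum((y_pred == cls) & (y_true == cls))
        fp = np.sum((y_pred == cls) & (y_true != cls))
        fn = np.sum((y_pred != cls) & (y_true == cls))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))

def calibrate_threshold(dev_probs, dev_labels):
    """Pick the SUBJ-probability cutoff that maximizes macro F1 on the dev set."""
    candidates = np.linspace(0.05, 0.95, 19)
    scores = [macro_f1(dev_labels, (dev_probs >= t).astype(int)) for t in candidates]
    return float(candidates[int(np.argmax(scores))])

# Toy, imbalanced dev set (~25% SUBJ) where the default 0.5 cutoff can be suboptimal.
dev_labels = (rng.random(200) < 0.25).astype(int)
dev_probs = np.clip(0.35 * dev_labels + rng.normal(0.3, 0.15, size=200), 0.0, 1.0)

threshold = calibrate_threshold(dev_probs, dev_labels)
print(f"calibrated threshold: {threshold:.2f}")
```

At inference time, a sentence would then be labeled SUBJ when its predicted probability exceeds the calibrated `threshold` rather than the default 0.5, which is how threshold calibration mitigates class imbalance.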