clapAI
/

modernBERT-base-multilingual-sentiment

@@ -3,7 +3,13 @@ library_name: transformers
 license: apache-2.0
 base_model: answerdotai/ModernBERT-base
 tags:
-- generated_from_trainer
 metrics:
 - f1
 - precision
@@ -11,6 +17,26 @@ metrics:
 model-index:
 - name: clapAI/modernBERT-base-multilingual-sentiment
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -18,59 +44,232 @@ should probably proofread and complete it, then remove this comment. -->
 # clapAI/modernBERT-base-multilingual-sentiment
-This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on an unknown dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.4517
-- F1: 0.8012
-- Precision: 0.8020
-- Recall: 0.8007
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
 ## Training procedure
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 5e-05
-- train_batch_size: 512
-- eval_batch_size: 512
-- seed: 42
-- distributed_type: multi-GPU
-- num_devices: 2
-- gradient_accumulation_steps: 2
-- total_train_batch_size: 2048
-- total_eval_batch_size: 1024
-- optimizer: Use adamw_torch_fused with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: cosine
-- lr_scheduler_warmup_ratio: 0.01
-- num_epochs: 5.0
-- mixed_precision_training: Native AMP
-### Training results
-| Training Loss | Epoch | Step | Validation Loss | F1     | Precision | Recall |
-|:-------------:|:-----:|:----:|:---------------:|:------:|:---------:|:------:|
-| 0.9287        | 1.0   | 1537 | 0.4626          | 0.7910 | 0.7940    | 0.7897 |
-| 0.8356        | 2.0   | 3074 | 0.4441          | 0.8011 | 0.8009    | 0.8015 |
-| 0.7488        | 3.0   | 4611 | 0.4517          | 0.8012 | 0.8020    | 0.8007 |
-| 0.6177        | 4.0   | 6148 | 0.4915          | 0.7990 | 0.7989    | 0.7991 |
-| 0.5174        | 5.0   | 7685 | 0.5464          | 0.7944 | 0.7945    | 0.7944 |
 ### Framework versions
-- Transformers 4.48.0.dev0
-- Pytorch 2.4.0+cu121
-- Datasets 3.2.0
-- Tokenizers 0.21.0

 license: apache-2.0
 base_model: answerdotai/ModernBERT-base
 tags:
+- sentiment
+- text-classification
+- multilingual
+- modernbert
+- sentiment-analysis
+- product-reviews
+- place-reviews
 metrics:
 - f1
 - precision
 model-index:
 - name: clapAI/modernBERT-base-multilingual-sentiment
   results: []
+datasets:
+- clapAI/MultiLingualSentiment
+language:
+- en
+- zh
+- vi
+- ko
+- ja
+- ar
+- de
+- es
+- fr
+- hi
+- id
+- it
+- ms
+- pt
+- ru
+- tr
+pipeline_tag: text-classification
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 # clapAI/modernBERT-base-multilingual-sentiment
+## Introduction
+**modernBERT-base-multilingual-sentiment** is a multilingual sentiment classification model, part of
+the [Multilingual-Sentiment](https://huggingface.co/collections/clapAI/multilingual-sentiment-677416a6b23e03f52cb6cc3f)
+collection.
+The model is fine-tuned from [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) using the
+multilingual sentiment
+dataset [clapAI/MultiLingualSentiment](https://huggingface.co/datasets/clapAI/MultiLingualSentiment).
+The model supports multilingual sentiment classification across 16+ languages, including English, Vietnamese, Chinese,
+French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Arabic, and more.
+## Evaluation & Performance
+After fine-tuning, the best model is loaded and evaluated on the `test` split
+of [clapAI/MultiLingualSentiment](https://huggingface.co/datasets/clapAI/MultiLingualSentiment):
+|                                                      Model                                                       | Pretrained Model  | Parameters  | Latency (ms) |    F1    | Precision |  Recall  |
+|:----------------------------------------------------------------------------------------------------------------:|:-----------------:|:-----------:|:------------:|:--------:|:---------:|:--------:|
+|  [modernBERT-base-multilingual-sentiment](https://huggingface.co/clapAI/modernBERT-base-multilingual-sentiment)  |  ModernBERT-base  | 149,607,171 |   Updating   |  80.16   |   80.26   |  80.09   |
+| [modernBERT-large-multilingual-sentiment](https://huggingface.co/clapAI/modernBERT-large-multilingual-sentiment) | ModernBERT-large  | 395,833,346 |   Updating   | Updating | Updating  | Updating |
+|     [roberta-base-multilingual-sentiment](https://huggingface.co/clapAI/roberta-base-multilingual-sentiment)     | XLM-roberta-base  | 278,045,186 |   Updating   | Updating | Updating  | Updating |
+|    [roberta-large-multilingual-sentiment](https://huggingface.co/clapAI/roberta-large-multilingual-sentiment)    | XLM-roberta-large | 559,892,482 |   Updating   | Updating | Updating  | Updating |
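The card does not say how F1, precision, and recall are averaged across classes. As a rough illustration only, the sketch below computes macro-averaged scores in plain Python. Both the choice of macro averaging and the helper name `macro_prf` are assumptions made for this example, not the evaluation code used for the table.

```python
def macro_prf(y_true, y_pred):
    """Return macro-averaged (precision, recall, f1) over all observed labels.

    Hypothetical helper for illustration; the averaging mode used for the
    reported scores is not stated in the model card.
    """
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Guard against empty denominators for labels never predicted or never seen
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy data, not taken from the dataset
gold = ["positive", "negative", "neutral", "positive"]
pred = ["positive", "negative", "positive", "positive"]
precision, recall, f1 = macro_prf(gold, pred)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Macro averaging weights every class equally. If the reported numbers use weighted or micro averaging, results on imbalanced data would differ.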
+## How to use
+### Requirements
+Since **transformers** only supports the **ModernBERT** architecture from version `4.48.0.dev0`, use the following
+command to get the required version:
+```bash
+pip install "git+https://github.com/huggingface/transformers.git@6e0515e99c39444caae39472ee1b2fd76ece32f1" --upgrade
+```
+Install **FlashAttention** to speed up inference:
+```bash
+pip install flash-attn==2.7.2.post1
+```
+### Quick start
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+# Use half precision only on GPU; fall back to float32 on CPU
+dtype = torch.float16 if device.type == "cuda" else torch.float32
+model_id = "clapAI/modernBERT-base-multilingual-sentiment"
+# Load the tokenizer and model
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForSequenceClassification.from_pretrained(model_id, torch_dtype=dtype)
+model.to(device)
+model.eval()
+# Retrieve labels from the model's configuration
+id2label = model.config.id2label
+texts = [
+    # English
+    {
+        "text": "I absolutely love the new design of this app!",
+        "label": "positive"
+    },
+    {
+        "text": "The customer service was disappointing.",
+        "label": "negative"
+    },
+    # Arabic
+    {
+        "text": "هذا المنتج رائع للغاية!",
+        "label": "positive"
+    },
+    {
+        "text": "الخدمة كانت سيئة للغاية.",
+        "label": "negative"
+    },
+    # German
+    {
+        "text": "Ich bin sehr zufrieden mit dem Kauf.",
+        "label": "positive"
+    },
+    {
+        "text": "Die Lieferung war eine Katastrophe.",
+        "label": "negative"
+    },
+    # Spanish
+    {
+        "text": "Este es el mejor libro que he leído.",
+        "label": "positive"
+    },
+    {
+        "text": "El producto llegó roto y no funciona.",
+        "label": "negative"
+    },
+    # French
+    {
+        "text": "J'adore ce restaurant, la nourriture est délicieuse!",
+        "label": "positive"
+    },
+    {
+        "text": "Le service était très lent et désagréable.",
+        "label": "negative"
+    },
+    # Indonesian
+    {
+        "text": "Saya sangat senang dengan pelayanan ini.",
+        "label": "positive"
+    },
+    {
+        "text": "Makanannya benar-benar tidak enak.",
+        "label": "negative"
+    },
+    # Japanese
+    {
+        "text": "この製品は本当に素晴らしいです！",
+        "label": "positive"
+    },
+    {
+        "text": "サービスがひどかったです。",
+        "label": "negative"
+    },
+    # Korean
+    {
+        "text": "이 제품을 정말 좋아해요!",
+        "label": "positive"
+    },
+    {
+        "text": "고객 서비스가 정말 실망스러웠어요.",
+        "label": "negative"
+    },
+    # Russian
+    {
+        "text": "Этот фильм просто потрясающий!",
+        "label": "positive"
+    },
+    {
+        "text": "Качество было ужасным.",
+        "label": "negative"
+    },
+    # Vietnamese
+    {
+        "text": "Tôi thực sự yêu thích sản phẩm này!",
+        "label": "positive"
+    },
+    {
+        "text": "Dịch vụ khách hàng thật tệ.",
+        "label": "negative"
+    },
+    # Chinese
+    {
+        "text": "我非常喜欢这款产品！",
+        "label": "positive"
+    },
+    {
+        "text": "质量真的很差。",
+        "label": "negative"
+    }
+]
+for item in texts:
+    text = item["text"]
+    label = item["label"]
+    inputs = tokenizer(text, return_tensors="pt").to(device)
+    # Run the forward pass without autograd tracking
+    with torch.inference_mode():
+        outputs = model(**inputs)
+        predictions = outputs.logits.argmax(dim=-1)
+    print(f"Text: {text} | Label: {label} | Prediction: {id2label[predictions.item()]}")
+```
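The quick start prints only the arg-max label. If a confidence score is also useful, the logits can be passed through a softmax. Below is a minimal, dependency-free sketch of that step. The helper names `softmax` and `top_label`, the example logits, and the three-entry `example_id2label` mapping are all hypothetical. In practice, read the mapping from `model.config.id2label` as shown above.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, id2label):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Assumed mapping for illustration only; the real one comes from the model config
example_id2label = {0: "negative", 1: "neutral", 2: "positive"}
# Made-up logits standing in for outputs.logits[0].tolist()
label, confidence = top_label([-1.2, 0.3, 2.5], example_id2label)
print(f"Prediction: {label} | Confidence: {confidence:.3f}")
```

With PyTorch tensors the same result should be obtainable via `outputs.logits.softmax(dim=-1)`. The pure-Python version is shown only to make the arithmetic explicit.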
 ## Training procedure
 ### Training hyperparameters
 The following hyperparameters were used during training:
+```yaml
+learning_rate: 5e-05
+train_batch_size: 512
+eval_batch_size: 512
+seed: 42
+distributed_type: multi-GPU
+num_devices: 2
+gradient_accumulation_steps: 2
+total_train_batch_size: 2048
+total_eval_batch_size: 1024
+optimizer:
+  type: adamw_torch_fused
+  betas: [ 0.9, 0.999 ]
+  epsilon: 1e-08
+  optimizer_args: "No additional optimizer arguments"
+lr_scheduler:
+  type: cosine
+  warmup_ratio: 0.01
+num_epochs: 5.0
+mixed_precision_training: Native AMP
+```
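The derived values in the configuration follow from the others: `total_train_batch_size` is `train_batch_size × num_devices × gradient_accumulation_steps`, and `total_eval_batch_size` is `eval_batch_size × num_devices`. The sketch below checks that arithmetic and approximates the cosine learning-rate schedule with linear warmup. The function `lr_at_step` is a hypothetical illustration rather than the Trainer's own implementation, and rounding the warmup steps up with `ceil` is an assumption. The total of 7685 steps is taken from the training results table of the earlier auto-generated card.

```python
import math

# Values copied from the hyperparameter list above
train_batch_size = 512
eval_batch_size = 512
num_devices = 2
gradient_accumulation_steps = 2

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)


def lr_at_step(step, total_steps, base_lr=5e-05, warmup_ratio=0.01):
    """Approximate learning rate for linear warmup followed by cosine decay.

    Hypothetical sketch; the warmup rounding (ceil) is an assumption.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Half-cosine decay from base_lr down to 0
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 7685  # 5 epochs x 1537 optimizer steps per epoch
for step in (0, 77, total_steps // 2, total_steps):
    print(step, lr_at_step(step, total_steps))
```

Under these assumptions the warmup covers roughly the first 77 optimizer steps, after which the rate decays smoothly to zero by the final step.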
 ### Framework versions
+```plaintext
+transformers==4.48.0.dev0
+torch==2.4.0+cu121
+datasets==3.2.0
+tokenizers==0.21.0
+flash-attn==2.7.2.post1
+```
+## Citation
+If you find our project helpful, please star our repo and cite our work. Thanks!
+```bibtex
+@misc{modernBERT-base-multilingual-sentiment,
+      title={modernBERT-base-multilingual-sentiment: A Multilingual Sentiment Classification Model},
+      author={clapAI},
+      howpublished={\url{https://huggingface.co/clapAI/modernBERT-base-multilingual-sentiment}},
+      year={2025},
+}