metadata
license: apache-2.0
datasets:
- arbml/SANAD
language:
- ar
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: text-classification
library_name: transformers
tags:
- modernbert
- arabic
ModernBERT Arabic Model Card
Overview
This model demonstrates how ModernBERT can be adapted to Arabic for tasks like topic classification.
This is an experimental Arabic version of ModernBERT-base, trained only on a topic classification task. It starts from the original ModernBERT base model and uses a custom tokenizer trained on Arabic text, with the following details:
- Dataset: Arabic Wikipedia
- Size: 1.8 GB
- Tokens: 228,788,529
Model Eval Details
- Epochs: 3
- Evaluation Metrics:
  - F1 Score: 0.95
  - Loss: 0.1998
- Training Step: 47,862
Dataset Used For Training:
- The SANAD dataset was used for training and testing. It contains 7 topics: Politics, Finance, Medical, Culture, Sport, Tech, and Religion.
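The card reports an F1 score of 0.95 but does not say how it was averaged across the 7 topics. As a rough illustration only, the sketch below computes a macro-averaged F1 in plain Python. The macro averaging, the `macro_f1` helper, and the label strings are assumptions made for this example, not details confirmed by the card.

```python
from collections import Counter

# Topic names as listed in this card. The exact label strings used in the
# model's config are an assumption.
TOPICS = ("Politics", "Finance", "Medical", "Culture", "Sport", "Tech", "Religion")


def macro_f1(y_true, y_pred, labels=TOPICS):
    """Return the unweighted mean of per-class F1 scores (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class gets a false positive
            fn[gold] += 1  # true class gets a false negative

    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # A class with no gold and no predicted examples contributes 0.0.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels made up for illustration; these are not SANAD examples.
gold = ["Sport", "Sport", "Tech", "Medical"]
pred = ["Sport", "Tech", "Tech", "Medical"]
print(round(macro_f1(gold, pred, labels=("Sport", "Tech", "Medical")), 3))  # 0.778
```

If the reported score was micro- or weighted-averaged instead, the per-class scores would need to be combined differently.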
How to Use
The model can be used for text classification using the transformers
library. Below is an example:
from transformers import pipeline
# Load model from huggingface.co/models using our repository ID
classifier = pipeline(
    task="text-classification",
    model="Omartificial-Intelligence-Space/AraModernBert-Topic-Classifier",
)
sample = '''
PUT SOME TEXT HERE TO CLASSIFY ITS TOPIC
'''
classifier(sample)
# [{'label': 'health', 'score': 0.6779336333274841}]
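The pipeline returns a list of dictionaries, each with a `label` and a `score`. One possible way to consume that output is sketched below: a small helper that returns the best label only when its score clears a threshold. The `top_prediction` name and the `min_score` default are hypothetical choices for this example and are not part of the model or the transformers library.

```python
def top_prediction(predictions, min_score=0.5):
    """Return (label, score) for the highest-scoring prediction,
    or None if the list is empty or the best score is below min_score.

    `min_score` is an illustrative threshold, not a recommended value.
    """
    if not predictions:
        return None
    best = max(predictions, key=lambda p: p["score"])
    if best["score"] < min_score:
        return None
    return best["label"], best["score"]


# Output copied from the example above.
output = [{"label": "health", "score": 0.6779336333274841}]

print(top_prediction(output))                  # ('health', 0.6779336333274841)
print(top_prediction(output, min_score=0.9))   # None
```

Returning `None` for low-confidence predictions lets the caller decide whether to fall back to a default topic or flag the text for review.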