metadata
license: apache-2.0
datasets:
  - arbml/SANAD
language:
  - ar
base_model:
  - answerdotai/ModernBERT-base
pipeline_tag: text-classification
library_name: transformers
tags:
  - modernbert
  - arabic

ModernBERT Arabic Model Card

Overview

This model demonstrates how ModernBERT can be adapted to Arabic for tasks like topic classification.

This is an experimental Arabic version of ModernBERT-base, trained only on a topic classification task. It starts from the original ModernBERT base model and uses a custom Arabic-trained tokenizer with the following details:

  • Dataset: Arabic Wikipedia
  • Size: 1.8 GB
  • Tokens: 228,788,529

Model Eval Details

  • Epochs: 3
  • Evaluation Metrics:
    • F1 Score: 0.95
    • Loss: 0.1998
  • Training Step: 47,862
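
The card does not say how the F1 score was averaged across the topics. As a rough illustration only, the sketch below computes a macro-averaged F1 (the unweighted mean of per-class F1) in plain Python. The macro averaging, the `macro_f1` helper, and the toy labels are all assumptions made for the example and are not taken from the actual evaluation.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy labels for illustration, not drawn from SANAD
y_true = ["Sport", "Sport", "Tech", "Finance"]
y_pred = ["Sport", "Tech", "Tech", "Finance"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7778
```

If the reported score used a different averaging (for example weighted or micro), the number would be computed differently.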

Dataset Used For Training:

  • The SANAD dataset was used for training and testing. It contains 7 topics: Politics, Finance, Medical, Culture, Sport, Tech, and Religion.

How to Use

The model can be used for text classification using the transformers library. Below is an example:

from transformers import pipeline

# Load model from huggingface.co/models using our repository ID
classifier = pipeline(
    task="text-classification",
    model="Omartificial-Intelligence-Space/AraModernBert-Topic-Classifier",
)

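# Replace the placeholder with the Arabic text you want to classify
sample = '''
PUT SOME TEXT HERE TO CLASSIFY ITS TOPIC
'''

# The pipeline returns a list with one dict holding the predicted label and its score
print(classifier(sample))

# Example output:
# [{'label': 'health', 'score': 0.6779336333274841}]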
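
When several texts need to be classified at once, a small wrapper along the following lines may be convenient. It is a minimal sketch, not part of the model's API: the `classify_texts` helper and the `min_score` threshold of 0.5 are hypothetical choices for illustration. It relies only on the fact that a text-classification pipeline called with a list of strings returns one `{'label', 'score'}` dict per string.

```python
def classify_texts(classifier, texts, min_score=0.5):
    """Classify several texts and flag low-confidence predictions.

    `classifier` is any callable that behaves like the pipeline above:
    given a list of strings, it returns one {'label', 'score'} dict per string.
    `min_score` is an arbitrary illustrative threshold, not a tuned value.
    """
    # Strip surrounding whitespace, such as the newlines of a triple-quoted string
    cleaned = [text.strip() for text in texts]
    predictions = classifier(cleaned)

    results = []
    for text, pred in zip(cleaned, predictions):
        results.append(
            {
                "text": text,
                "label": pred["label"],
                "score": pred["score"],
                "confident": pred["score"] >= min_score,
            }
        )
    return results
```

With the pipeline loaded as shown above, `classify_texts(classifier, [text_a, text_b])` would return one record per input text, where `text_a` and `text_b` stand for your own Arabic strings.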