# NaBI Model: Nepali Bias & Information Classifier
The NaBI Model is a text classifier for Nepali content that automatically detects bias, misinformation, and hate speech. It was trained on a balanced dataset created by oversampling the originally imbalanced real-world NaBI data, and achieves 99% accuracy on that balanced split.
## Overview

Task: Multi-Class Text Classification

Categories:
- Bias (editorial bias, user-comment bias, etc.)
- Normal
- Misinformation
- Hate Speech
Model Performance:
The model achieves 99% accuracy on a balanced dataset obtained via oversampling to mitigate class imbalance.

Dataset Details:
The dataset is derived from real-world Nepali content, which was originally imbalanced. Oversampling was applied during training to ensure sufficient representation of the underrepresented classes.

Real-World Implications and Future Work:
Although oversampling allowed the model to learn effectively from balanced data, the original dataset remains imbalanced. Running the model over unlabeled real-world data (biased news, misinformation, etc.) can surface additional labeled examples, paving the way for a larger, more diverse dataset over time.
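The model card does not specify which oversampling method was used, but the idea can be illustrated with naive random oversampling: duplicate minority-class examples until every class matches the majority class count. A minimal sketch with hypothetical labels mirroring the NaBI classes (the `oversample` helper and the toy data are illustrative assumptions, not part of the released training code):

```python
import random
from collections import Counter

def oversample(examples, labels, seed=0):
    """Naive random oversampling: duplicate minority-class examples
    until every class matches the majority class count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    # Group examples by class
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    # Pad each class up to the majority count by sampling with replacement
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = rng.choices(xs, k=target - len(xs))
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

# Toy imbalanced dataset (hypothetical, for illustration only)
texts = ["t1", "t2", "t3", "t4", "t5", "t6"]
labels = ["normal", "normal", "normal", "normal", "bias", "misinformation"]
bal_x, bal_y = oversample(texts, labels)
print(Counter(bal_y))  # each class now has 4 examples
```

In practice, a library such as imbalanced-learn offers more sophisticated strategies (e.g. SMOTE), but simple duplication is often sufficient for text classification when combined with a strong pretrained encoder.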
## Usage
Below is a simple example of how to use the NaBI Model with the Hugging Face Transformers library:

```python
from transformers import pipeline

# Load the model from the Hugging Face Hub
classifier = pipeline("text-classification", model="Utkarsha666/NaBI-Bert")

# Classify a sample Nepali text ("Place your text in Nepali here.")
sample_text = "यहाँ नेपालीमा तपाईंको पाठ राख्नुहोस्।"
result = classifier(sample_text)
print(result)
```
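The pipeline returns a list of dictionaries with `label` and `score` fields. For the dataset-expansion workflow described above, one reasonable approach is to keep only high-confidence predictions as candidate new labels. A minimal sketch, where the prediction values, the 0.9 cutoff, and the label names are assumptions for illustration (not taken from the model card):

```python
# Hypothetical pipeline outputs; each item follows the format returned
# by the transformers text-classification pipeline.
predictions = [
    {"text": "sample 1", "label": "misinformation", "score": 0.97},
    {"text": "sample 2", "label": "bias", "score": 0.62},
    {"text": "sample 3", "label": "hate_speech", "score": 0.91},
]

CONFIDENCE_THRESHOLD = 0.9  # assumed cutoff; tune for your use case

# Keep only high-confidence predictions as candidate new training examples
accepted = [p for p in predictions if p["score"] >= CONFIDENCE_THRESHOLD]
print([p["text"] for p in accepted])  # → ['sample 1', 'sample 3']
```

Low-confidence predictions can instead be routed to human annotators, which keeps label noise out of the expanded dataset.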
## Model Tree

Base model: google-bert/bert-base-multilingual-cased