---
license: mit
datasets:
- haipradana/indonesian-twitter-hate-speech-cleaned
language:
- id
tags:
- bert
- RoBERTa
- tweet
- hate
- twitter
base_model:
- cardiffnlp/twitter-roberta-base-sentiment-latest
---

# Fine-tuned RoBERTa model for classifying Indonesian hate tweets

See the GitHub repository for the full code and a Google Colab notebook: https://github.com/haipradana/RoBERTa-Indonesian-Hate-Tweet-Classification/tree/main

This project fine-tunes the RoBERTa model [cardiffnlp/twitter-roberta-base-sentiment-latest](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest) to classify Indonesian tweets as either **neutral** or **hate speech**.

## How to use this model?

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the fine-tuned model and tokenizer from a local directory
tokenizer = AutoTokenizer.from_pretrained('./model')
model = AutoModelForSequenceClassification.from_pretrained('./model')

# Classify a single tweet: label 1 = hate, label 0 = neutral
def predict(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=1).item()
    return 'hate' if prediction == 1 else 'neutral'

# Example ("Are your lungs made of stone? You're this sick and you still smoke!")
result = predict("Paru-parumu terbuat dari batu ya? udah sakit gini masih aja merokok!")
print(result)  # Output: hate
```

### Or run the prediction script from the GitHub repo

```bash
cd scripts
python predict.py
```

## Performance Metrics

```
Accuracy:  82.01%
Precision: 82.68%
Recall:    81.72%
F1-Score:  82.19%
```
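### Reproducing the metrics (sketch)

As a rough guide, a minimal sketch of how the metrics above could be recomputed on a held-out split of the dataset listed in the metadata. The split name `test` and the column names `text` and `label` (with 1 = hate) are assumptions, not taken from this card; check the dataset card before running it.

```python
# Minimal sketch, not from the original card: recompute the reported metrics.
# Assumptions: the dataset has a "test" split with "text" and "label" columns,
# where label 1 = hate and 0 = neutral.
import torch
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained('./model')
model = AutoModelForSequenceClassification.from_pretrained('./model')

ds = load_dataset("haipradana/indonesian-twitter-hate-speech-cleaned", split="test")

preds, labels = [], []
for start in range(0, len(ds), 32):
    batch = ds[start:start + 32]  # dict of lists: {"text": [...], "label": [...]}
    inputs = tokenizer(batch["text"], return_tensors="pt",
                       truncation=True, padding=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    preds.extend(logits.argmax(dim=1).tolist())
    labels.extend(batch["label"])

print(f"Accuracy:  {accuracy_score(labels, preds):.2%}")
print(f"Precision: {precision_score(labels, preds):.2%}")
print(f"Recall:    {recall_score(labels, preds):.2%}")
print(f"F1-Score:  {f1_score(labels, preds):.2%}")
```

The numbers will only match the table above if the same split and preprocessing as the original evaluation are used; see the GitHub repository for the exact setup.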