metadata
language: en
license: agpl-3.0
datasets:
- edqian/twitter-climate-change-sentiment-dataset
metrics:
- accuracy
- f1
- precision
- recall
base_model: bert-base-uncased
pipeline_tag: text-classification
tags:
- text-classification
- sentiment-analysis
- climate-change
- twitter
- bert
BERT Climate Sentiment Analysis Model
Model Description
This model fine-tunes BERT (bert-base-uncased) to perform sentiment analysis on climate change-related tweets. It classifies tweets into four sentiment categories: anti-climate (negative), neutral, pro-climate (positive), and news.
Model Details
- Model Type: Fine-tuned BERT (bert-base-uncased)
- Version: 1.0.0
- Framework: PyTorch & Transformers
- Language: English
- License: AGPL-3.0
Training Data
This model was trained on the Twitter Climate Change Sentiment Dataset, which contains tweets related to climate change labeled with sentiment categories:
- news: Factual news about climate change (2)
- pro: Supporting action on climate change (1)
- neutral: Neutral stance on climate change (0)
- anti: Skeptical about climate change claims (-1)
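Because the dataset labels run from -1 to 2, they generally need to be mapped to contiguous class indices before a classification head can be trained on them. The sketch below shows one way to do this. The `+1` offset and the helper names `encode_label` and `decode_index` are assumptions for illustration, not the confirmed encoding used for this model.

```python
# Dataset labels as listed above.
LABEL_NAMES = {-1: "anti", 0: "neutral", 1: "pro", 2: "news"}

# Assumption: labels are shifted by +1 so they become class indices 0..3.
LABEL_OFFSET = 1


def encode_label(dataset_label: int) -> int:
    """Convert a dataset label (-1..2) to a class index (0..3)."""
    if dataset_label not in LABEL_NAMES:
        raise ValueError(f"Unknown dataset label: {dataset_label}")
    return dataset_label + LABEL_OFFSET


def decode_index(class_index: int) -> str:
    """Convert a class index (0..3) back to its sentiment name."""
    return LABEL_NAMES[class_index - LABEL_OFFSET]


if __name__ == "__main__":
    for label in sorted(LABEL_NAMES):
        index = encode_label(label)
        print(label, "->", index, "->", decode_index(index))
```

If the model was trained with a different encoding, only `LABEL_OFFSET` (or the mapping itself) needs to change.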
The dataset was cleaned with the following steps:
Features | Strategy |
---|---|
Hashtag | Removed |
Mention | Removed |
RT Tag | Removed |
URL | Removed |
Stop Words | Removed |
Special Characters | Removed |
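
The cleaning steps in the table can be approximated with a few regular expressions. This is a minimal sketch, not the original preprocessing script: the `clean_tweet` name, the tiny `STOP_WORDS` set, the choice to drop the whole hashtag token (rather than only the `#` sign), and the decision to treat digits as special characters are all assumptions.

```python
import re

# Hypothetical, deliberately small stop-word list; the list actually used
# for this model is not stated in the card.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "to", "of", "in", "on", "we"}


def clean_tweet(text: str) -> str:
    """Apply the cleaning steps from the table to a single tweet."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"\bRT\b", " ", text)                  # retweet tag
    text = re.sub(r"[@#]\w+", " ", text)                 # mentions and hashtags
    text = re.sub(r"[^A-Za-z\s]", " ", text)             # special characters (and digits, by assumption)
    tokens = [tok for tok in text.lower().split() if tok not in STOP_WORDS]
    return " ".join(tokens)


if __name__ == "__main__":
    print(clean_tweet("RT @user: Climate change is real! #ActNow https://t.co/abc123"))
```

URLs are removed first so that the later special-character pass does not split them into stray word fragments.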
Training Procedure
- Training Framework: PyTorch with Transformers
- Training Approach: Fine-tuning the entire BERT model
- Optimizer: AdamW with learning rate 2e-5
- Batch Size: 64
- Early Stopping: Yes, with patience of 3 epochs
- Hardware: GPU acceleration (when available)
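The early-stopping rule with a patience of 3 can be sketched independently of the training loop. The code below assumes the monitored quantity is validation loss, which the card does not state explicitly. The `EarlyStopping` class, its `min_delta` parameter, and the `epochs_run` helper are hypothetical names used only for illustration.

```python
class EarlyStopping:
    """Stop training when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta  # hypothetical: minimum decrease that counts as improvement
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(val_losses, patience: int = 3) -> int:
    """Return the number of epochs completed before early stopping triggers."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(val_losses, start=1):
        # In a real loop, one training epoch would run here, e.g. with
        # torch.optim.AdamW(model.parameters(), lr=2e-5) and batches of 64.
        if stopper.step(loss):
            return epoch
    return len(val_losses)


if __name__ == "__main__":
    # Illustrative loss values only, not results from this model.
    print(epochs_run([0.9, 0.7, 0.71, 0.72, 0.73, 0.5]))
```

In practice the weights from the best epoch are usually saved and restored after stopping; that step is omitted here for brevity.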
Model Performance
- AUC-ROC
- Training and Validation Loss
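The card metadata lists accuracy, F1, precision, and recall as metrics. For a four-class problem these are often reported as macro averages, so the sketch below computes them that way in plain Python. Macro averaging is an assumption here (the card does not say which averaging was used), and `macro_scores` is a hypothetical helper name.

```python
def macro_scores(y_true, y_pred, num_classes: int = 4) -> dict:
    """Compute accuracy and macro-averaged precision, recall, and F1."""
    precisions, recalls, f1s = [], [], []
    pairs = list(zip(y_true, y_pred))
    for c in range(num_classes):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / num_classes,
        "recall": sum(recalls) / num_classes,
        "f1": sum(f1s) / num_classes,
    }


if __name__ == "__main__":
    # Toy class indices for illustration only, not results from this model.
    print(macro_scores([0, 1, 2, 3, 2, 1], [0, 1, 2, 3, 1, 1]))
```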
Limitations and Biases
- The model is trained on Twitter data, which may not generalize well to other text sources.
- Twitter data may contain inherent biases in how climate change is discussed.
- The model might struggle with complex or nuanced sentiment expressions.
- Sarcasm and figurative language may be misclassified.
- The model is trained only on English-language content.
Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("keanteng/bert-base-clean-climate-sentiment-wqf7007")
# Prepare text
text = "Climate change is real and we need to act now!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
# Make prediction
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=1)
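# Map the predicted class index to a sentiment label.
# argmax returns indices 0-3, never -1. Assumption: the dataset labels (-1..2)
# were shifted by +1 for training; verify against model.config.id2label.
sentiment_map = {0: "anti", 1: "neutral", 2: "pro", 3: "news"}
predicted_sentiment = sentiment_map[predictions.item()]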
print("Predicted sentiment: " + predicted_sentiment)
Ethical Considerations
This model should be used responsibly for analyzing climate sentiment and should not be deployed in ways that might:
- Amplify misinformation about climate change
- Target or discriminate against specific groups
- Make critical decisions without human oversight