---
language: en
license: agpl-3.0
datasets:
- edqian/twitter-climate-change-sentiment-dataset
metrics:
- accuracy
- f1
- precision
- recall
base_model: bert-base-uncased
pipeline_tag: text-classification
tags:
- text-classification
- sentiment-analysis
- climate-change
- twitter
- bert
---
# BERT Climate Sentiment Analysis Model
## Model Description
This model fine-tunes BERT (bert-base-uncased) to perform sentiment analysis on climate change-related tweets. It classifies tweets into four sentiment categories: anti-climate (negative), neutral, pro-climate (positive), and news.
## Model Details
- **Model Type:** Fine-tuned BERT (bert-base-uncased)
- **Version:** 1.0.0
- **Framework:** PyTorch & Transformers
- **Language:** English
- **License:** [AGPL-3.0](https://www.gnu.org/licenses/agpl-3.0.en.html)
## Training Data
This model was trained on the [Twitter Climate Change Sentiment Dataset](https://www.kaggle.com/datasets/edqian/twitter-climate-change-sentiment-dataset/data), which contains tweets related to climate change labeled with sentiment categories:
- **news**: Factual news about climate change (2)
- **pro**: Supporting action on climate change (1)
- **neutral**: Neutral stance on climate change (0)
- **anti**: Skeptical about climate change claims (-1)
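Cross-entropy training expects contiguous, non-negative class indices, so the raw labels above are presumably remapped before fine-tuning. A minimal sketch of that remapping (the exact mapping used during training is an assumption):

```python
# Assumed remapping: raw dataset labels (-1..2) -> contiguous class ids (0..3),
# preserving the dataset's ordering anti < neutral < pro < news
RAW_TO_CLASS = {-1: 0, 0: 1, 1: 2, 2: 3}
CLASS_TO_NAME = {0: "anti", 1: "neutral", 2: "pro", 3: "news"}

def decode(class_id: int) -> str:
    """Turn a predicted class index back into a sentiment name."""
    return CLASS_TO_NAME[class_id]

print(decode(RAW_TO_CLASS[-1]))  # → anti
```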
The dataset was cleaned with the following steps:
| Feature | Strategy |
|---------|----------|
| Hashtag | Removed |
| Mention | Removed |
| RT Tag | Removed |
| URL | Removed |
| Stop Words | Removed |
| Special Characters | Removed |
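The cleaning steps above can be sketched with regular expressions. This is an illustration rather than the exact pipeline used; in particular, the stop-word list here is a small hand-picked stand-in (the original list is not specified):

```python
import re

# Minimal stand-in stop-word list; the original pipeline's list is not specified
STOP_WORDS = {"the", "is", "a", "an", "and", "to", "of", "in"}

def clean_tweet(text: str) -> str:
    text = re.sub(r"http\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"\bRT\b", " ", text)            # RT tag
    text = re.sub(r"@\w+", " ", text)              # mentions
    text = re.sub(r"#\w+", " ", text)              # hashtags
    text = re.sub(r"[^A-Za-z\s]", " ", text)       # special characters
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return " ".join(tokens)

print(clean_tweet("RT @user: Climate action now! #climate https://t.co/x"))
# → climate action now
```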
## Training Procedure
- **Training Framework:** PyTorch with Transformers
- **Training Approach:** Fine-tuning the entire BERT model
- **Optimizer:** AdamW with learning rate 2e-5
- **Batch Size:** 64
- **Early Stopping:** Yes, with patience of 3 epochs
- **Hardware:** GPU acceleration (when available)
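The procedure above can be sketched as a plain PyTorch fine-tuning loop. This is illustrative rather than the exact training script: `max_epochs` and the loss-based early-stopping criterion are assumptions; only the optimizer, learning rate, batch size, and patience come from the list above.

```python
import torch
from torch.optim import AdamW

def fine_tune(model, train_loader, val_loader, device="cpu",
              max_epochs=20, patience=3, lr=2e-5):
    """Full fine-tuning with AdamW and validation-loss early stopping."""
    model.to(device)
    optimizer = AdamW(model.parameters(), lr=lr)
    best_val, stale_epochs = float("inf"), 0
    for epoch in range(max_epochs):
        model.train()
        for batch in train_loader:                 # batches of 64 tweets
            batch = {k: v.to(device) for k, v in batch.items()}
            loss = model(**batch).loss             # labels -> cross-entropy loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        model.eval()                               # validation pass
        with torch.no_grad():
            val_loss = sum(
                model(**{k: v.to(device) for k, v in b.items()}).loss.item()
                for b in val_loader
            ) / len(val_loader)
        if val_loss < best_val:
            best_val, stale_epochs = val_loss, 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:           # early stopping, patience of 3
                break
    return best_val
```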
## Model Performance
Evaluation was tracked with the following plots (figures not reproduced here):
- AUC-ROC
- Training and Validation Loss
## Limitations and Biases
- The model is trained on Twitter data, which may not generalize well to other text sources.
- Twitter data may contain inherent biases in how climate change is discussed.
- The model might struggle with complex or nuanced sentiment expressions.
- Sarcasm and figurative language may be misclassified.
- The model is only trained for English language content.
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer (the tokenizer is bundled with the fine-tuned checkpoint)
tokenizer = AutoTokenizer.from_pretrained("keanteng/bert-base-clean-climate-sentiment-wqf7007")
model = AutoModelForSequenceClassification.from_pretrained("keanteng/bert-base-clean-climate-sentiment-wqf7007")
model.eval()

# Prepare text
text = "Climate change is real and we need to act now!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

# Make prediction
with torch.no_grad():
    outputs = model(**inputs)
prediction = torch.argmax(outputs.logits, dim=1)

# Map the predicted class index (0-3) to a sentiment name
# (assumes the dataset's raw labels -1..2 were shifted to 0..3 during training)
sentiment_map = {0: "anti", 1: "neutral", 2: "pro", 3: "news"}
predicted_sentiment = sentiment_map[prediction.item()]
print(f"Predicted sentiment: {predicted_sentiment}")
```
## Ethical Considerations
This model should be used responsibly for analyzing climate sentiment and should not be deployed in ways that might:
- Amplify misinformation about climate change
- Target or discriminate against specific groups
- Make critical decisions without human oversight