Create README.md
README.md
ADDED
@@ -0,0 +1,104 @@
### **BERT-Base-Uncased Quantized Model for Disaster SOS Message Classification**

This repository hosts a Float16-quantized version of BERT Base Uncased, fine-tuned for **Disaster SOS Message Classification**. The model classifies emergency messages related to disasters to help prioritize urgent cases. It is intended for deployment in resource-constrained environments and reports an accuracy of 0.85 and an F1 score of 0.83 (see Performance Metrics).

## **Model Details**

- **Model Architecture:** BERT Base Uncased
- **Task:** Disaster SOS Message Classification
- **Dataset:** Disaster Response Messages Dataset
- **Quantization:** Float16
- **Fine-tuning Framework:** Hugging Face Transformers

## **Usage**

### **Installation**

```sh
pip install transformers torch
```

### **Loading the Model**

```python
from transformers import BertForSequenceClassification, BertTokenizer
import torch

# Load quantized model (path from the original Kaggle run; point this at your local copy)
quantized_model_path = "/kaggle/working/bert_finetuned_fp16"
quantized_model = BertForSequenceClassification.from_pretrained(quantized_model_path)
quantized_model.eval()  # Set to evaluation mode

# Run in FP16 on GPU; stay in FP32 on CPU, where some PyTorch builds lack half-precision kernels
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
quantized_model.to(device)
if device.type == "cuda":
    quantized_model.half()  # Convert model to FP16

# Load tokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Define a test SOS message
test_message = "There is a massive earthquake, and people need help immediately!"

# Tokenize input (token IDs and attention mask are already int64 tensors)
inputs = tokenizer(test_message, return_tensors="pt", padding=True, truncation=True, max_length=128)

# Move input tensors to the same device as the model
inputs = {name: tensor.to(device) for name, tensor in inputs.items()}

# Make prediction
with torch.no_grad():
    outputs = quantized_model(**inputs)

# Get predicted categories (multi-label: one independent sigmoid per category)
probs = torch.sigmoid(outputs.logits.float()).cpu().numpy().flatten()
predictions = (probs > 0.5).astype(int)

# Category mapping (example only; it must match the label order used during fine-tuning)
category_names = ["Earthquake", "Flood", "Medical Emergency", "Infrastructure Damage", "General Help"]
# zip stops at the shorter sequence, so a length mismatch cannot raise IndexError
predicted_labels = [name for name, flag in zip(category_names, predictions) if flag == 1]

print(f"Message: {test_message}")
print(f"Predicted Categories: {predicted_labels}")
print(f"Confidence Scores: {probs}")
```
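
The decoding step in the example treats the task as multi-label: each logit passes through its own sigmoid, and every category whose probability exceeds 0.5 is reported. The following is a minimal pure-Python sketch of that step alone. The logits are made up for illustration (not real model output), and `decode_multilabel` is a hypothetical helper name, not part of any library.

```python
import math


def decode_multilabel(logits, category_names, threshold=0.5):
    """Return (label, probability) pairs whose sigmoid probability exceeds threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [(name, p) for name, p in zip(category_names, probs) if p > threshold]


# Hypothetical logits, one per example category (not real model output).
example_logits = [2.0, -1.5, 0.3, -0.2, -3.0]
example_names = [
    "Earthquake",
    "Flood",
    "Medical Emergency",
    "Infrastructure Damage",
    "General Help",
]

for name, prob in decode_multilabel(example_logits, example_names):
    print(f"{name}: {prob:.2f}")
```

Because the sigmoids are independent, a message can receive several labels or none. Raising `threshold` trades recall for precision.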

## **Performance Metrics**

- **Accuracy:** 0.85
- **F1 Score:** 0.83
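
This README does not state how the two metrics were computed. As one possible reading for a multi-label task, the sketch below computes exact-match accuracy and micro-averaged F1 in plain Python. The averaging mode, the function names, and the toy labels are all assumptions made for illustration, not the evaluation actually used here.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over all (example, category) pairs of 0/1 labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def exact_match_accuracy(y_true, y_pred):
    """Fraction of examples whose full label vector is predicted exactly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


# Toy labels for three messages and three categories (illustrative only).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 1]]

print(f"micro F1: {micro_f1(y_true, y_pred):.2f}")
print(f"exact-match accuracy: {exact_match_accuracy(y_true, y_pred):.2f}")
```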

## **Fine-Tuning Details**

### **Dataset**

The dataset is the **Disaster Response Messages Dataset**, which contains real-life messages from various disaster scenarios.

### **Training**

- Number of epochs: 3
- Batch size: 8
- Evaluation strategy: epoch
- Learning rate: 2e-5
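
To give a feel for what these settings imply, the sketch below estimates the number of optimizer steps for a run with 3 epochs and a batch size of 8. It assumes a single device and no gradient accumulation, and the training-set size is a hypothetical placeholder because this README does not report it.

```python
import math

EPOCHS = 3
BATCH_SIZE = 8
LEARNING_RATE = 2e-5  # listed for completeness; not used in the step count


def total_optimizer_steps(num_train_examples, epochs=EPOCHS, batch_size=BATCH_SIZE):
    """Optimizer steps for a run, assuming one device and no gradient accumulation."""
    steps_per_epoch = math.ceil(num_train_examples / batch_size)
    return steps_per_epoch * epochs


# Hypothetical training-set size; the real size is not stated in this README.
hypothetical_train_size = 20_000
print(total_optimizer_steps(hypothetical_train_size))
```

With evaluation set to run once per epoch, such a run would be evaluated three times.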

### **Quantization**

Post-training quantization to Float16 was applied using PyTorch, reducing model size (roughly half the FP32 weight storage) and improving inference speed on hardware with native FP16 support.
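
The size reduction can be estimated with simple arithmetic. The sketch below assumes roughly 110 million parameters for BERT Base (an approximation that ignores the small classification head) and counts weight storage only, so the figures are indicative rather than measured from this repository.

```python
# Approximate parameter count for BERT Base (assumption; excludes the classifier head).
BERT_BASE_PARAMS = 110_000_000


def weight_size_mb(num_params, bytes_per_param):
    """Weight storage in megabytes (10^6 bytes) for a given numeric precision."""
    return num_params * bytes_per_param / 1e6


fp32_mb = weight_size_mb(BERT_BASE_PARAMS, 4)  # 32-bit floats: 4 bytes each
fp16_mb = weight_size_mb(BERT_BASE_PARAMS, 2)  # 16-bit floats: 2 bytes each

print(f"FP32: ~{fp32_mb:.0f} MB, FP16: ~{fp16_mb:.0f} MB")
```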

## **Repository Structure**

```
.
├── model/               # Contains the quantized model files
├── tokenizer_config/    # Tokenizer configuration and vocabulary files
├── model.safetensors    # Fine-tuned model weights
└── README.md            # Model documentation
```

## **Limitations**

- The model may not generalize well to disaster types that are absent from the training data.
- Quantization to Float16 may cause minor accuracy degradation relative to the FP32 model.
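
The precision loss behind the second point can be seen without any ML library. The sketch below round-trips a few values through IEEE 754 half precision using Python's `struct` module (format code `e`). The sample values are arbitrary, and the effect on model accuracy depends on the weights and activations, which this illustration does not measure.

```python
import struct


def to_float16_and_back(x):
    """Round a Python float to IEEE 754 half precision and convert it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Arbitrary sample values chosen for illustration.
for value in [0.1, 1.0, 3.14159, 1e-3]:
    rounded = to_float16_and_back(value)
    print(f"{value!r} -> {rounded!r} (abs error {abs(value - rounded):.2e})")
```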

## **Contributing**

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.

---