developerPushkal committed on
Commit
f5381b5
·
verified ·
1 Parent(s): 871a44f

Create README.md

  1. README.md +104 -0
README.md ADDED
@@ -0,0 +1,104 @@
### **BERT-Base-Uncased Quantized Model for Disaster SOS Message Classification**

This repository hosts a quantized (Float16) version of BERT Base Uncased, fine-tuned for **Disaster SOS Message Classification**. The model classifies emergency messages related to disasters so that urgent cases can be prioritized. It is intended for deployment in resource-constrained environments, with only minor accuracy degradation from quantization (see Limitations).

## **Model Details**

- **Model Architecture:** BERT Base Uncased
- **Task:** Disaster SOS Message Classification
- **Dataset:** Disaster Response Messages Dataset
- **Quantization:** Float16
- **Fine-tuning Framework:** Hugging Face Transformers

## **Usage**

### **Installation**

```sh
pip install transformers torch
```

### **Loading the Model**

```python
from transformers import BertForSequenceClassification, BertTokenizer
import torch

# Load quantized model (replace the path with the location of your copy of the model files)
quantized_model_path = "/kaggle/working/bert_finetuned_fp16"
quantized_model = BertForSequenceClassification.from_pretrained(quantized_model_path)
quantized_model.eval()  # Set to evaluation mode

# FP16 kernels are best supported on GPU; keep the weights in FP32 when running on CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
quantized_model.to(device)
if device.type == "cuda":
    quantized_model.half()  # Convert model to FP16

# Load tokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Define a test SOS message
test_message = "There is a massive earthquake, and people need help immediately!"

# Tokenize input and move the tensors to the same device as the model
inputs = tokenizer(test_message, return_tensors="pt", padding=True, truncation=True, max_length=128)
inputs = {name: tensor.to(device) for name, tensor in inputs.items()}

# Make prediction
with torch.no_grad():
    outputs = quantized_model(**inputs)

# Get predicted categories (multi-label: independent sigmoid per category, threshold 0.5)
probs = torch.sigmoid(outputs.logits).float().cpu().numpy().flatten()
predictions = (probs > 0.5).astype(int)

# Category mapping (example only; it must match the label order and count used in fine-tuning)
category_names = ["Earthquake", "Flood", "Medical Emergency", "Infrastructure Damage", "General Help"]
predicted_labels = [name for name, flag in zip(category_names, predictions) if flag == 1]

print(f"Message: {test_message}")
print(f"Predicted Categories: {predicted_labels}")
print(f"Confidence Scores: {probs}")
```
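
The decoding step in the snippet above (a sigmoid followed by a 0.5 threshold) treats each category independently, so one message can receive several labels or none. A minimal, dependency-free sketch of that arithmetic follows; `decode_multilabel` and the sample logits are hypothetical and serve only as illustration, they are not output from this model.

```python
import math


def decode_multilabel(logits, category_names, threshold=0.5):
    """Return (label, probability) pairs whose sigmoid score exceeds the threshold."""
    if len(logits) != len(category_names):
        raise ValueError("expected exactly one logit per category")
    # Sigmoid is applied per category, so the scores do not need to sum to 1.
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [(name, p) for name, p in zip(category_names, probs) if p > threshold]


# Example category list from the README snippet; the logits are made up for illustration.
category_names = ["Earthquake", "Flood", "Medical Emergency", "Infrastructure Damage", "General Help"]
sample_logits = [2.0, -3.0, 0.5, -0.2, 1.1]

selected = decode_multilabel(sample_logits, category_names)
for name, prob in selected:
    print(f"{name}: {prob:.2f}")
```

A sigmoid score above 0.5 is equivalent to a logit above 0, so lowering the threshold trades precision for recall. That trade-off may be worth considering when a missed urgent message costs more than a false alarm.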

## **Performance Metrics**

- **Accuracy:** 0.85
- **F1 Score:** 0.83
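
The README does not state how accuracy and F1 were computed for this multi-label task. As one plausible reading, the sketch below computes micro-averaged F1 and exact-match (subset) accuracy from binary label matrices. The averaging mode, the function names, and the toy data are all assumptions for illustration, not the author's evaluation code.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool true/false positives and false negatives over all labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def subset_accuracy(y_true, y_pred):
    """Fraction of messages whose full label vector is predicted exactly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


# Toy data: 4 messages, 3 hypothetical categories.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

print(f"micro F1: {micro_f1(y_true, y_pred):.3f}")
print(f"subset accuracy: {subset_accuracy(y_true, y_pred):.3f}")
```

Subset accuracy is strict, since a single wrong label makes the whole message count as incorrect, so it is usually lower than per-label accuracy on the same predictions.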

## **Fine-Tuning Details**

### **Dataset**

The dataset is the **Disaster Response Messages Dataset**, which contains real-life messages from various disaster scenarios.

### **Training**

- Number of epochs: 3
- Batch size: 8
- Evaluation strategy: epoch
- Learning rate: 2e-5
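
To make the scale of this run concrete, the hyperparameters above can be written as a small configuration and used to estimate the number of optimizer steps. This is a sketch under stated assumptions: the dictionary keys mirror Hugging Face `TrainingArguments` field names, which is a guess at how the run was configured, and the training-set size is a hypothetical placeholder because the README does not give one.

```python
import math

# Hyperparameters as listed in the README. The key names follow Hugging Face
# `TrainingArguments` fields, which is an assumption about the original setup.
training_config = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 8,
    "learning_rate": 2e-5,
    "eval_strategy": "epoch",  # called `evaluation_strategy` in older transformers releases
}


def total_optimizer_steps(num_examples, config):
    """Steps for single-device training without gradient accumulation (last partial batch kept)."""
    steps_per_epoch = math.ceil(num_examples / config["per_device_train_batch_size"])
    return steps_per_epoch * config["num_train_epochs"]


# Hypothetical training-set size, for illustration only.
hypothetical_train_size = 20_000
print(total_optimizer_steps(hypothetical_train_size, training_config))
```

With an epoch-level evaluation strategy, the model would be evaluated once after each of the three passes over the training data.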

### **Quantization**

Post-training quantization was applied in PyTorch by converting the fine-tuned weights to Float16 (half precision), with no retraining. This roughly halves the model size relative to FP32 and can improve inference speed on hardware with native FP16 support.
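
Assuming the quantization step is the Float16 conversion listed under Model Details, its effect on size and precision can be sketched with the standard library alone. The parameter count below is the commonly cited approximate figure for BERT base, not a number measured from this checkpoint, and the helper names are hypothetical.

```python
import struct

# Commonly cited approximate parameter count for BERT base (assumption, not measured here).
APPROX_BERT_BASE_PARAMS = 110_000_000


def weights_size_mb(num_params, bytes_per_param):
    """Raw weight storage in megabytes, ignoring file-format overhead."""
    return num_params * bytes_per_param / 1e6


def round_trip_fp16(value):
    """Round a Python float to IEEE 754 half precision and convert it back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


fp32_mb = weights_size_mb(APPROX_BERT_BASE_PARAMS, 4)  # 4 bytes per FP32 weight
fp16_mb = weights_size_mb(APPROX_BERT_BASE_PARAMS, 2)  # 2 bytes per FP16 weight
print(f"FP32 weights: ~{fp32_mb:.0f} MB, FP16 weights: ~{fp16_mb:.0f} MB")

weight = 0.1
print(f"{weight} stored as FP16 becomes {round_trip_fp16(weight)!r}")
```

Half precision keeps roughly three significant decimal digits, so each weight is stored with a small rounding error. Those accumulated errors are the source of the minor accuracy loss that FP16 conversion can introduce.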

## **Repository Structure**

```
.
├── model/               # Contains the quantized model files
├── tokenizer_config/    # Tokenizer configuration and vocabulary files
├── model.safetensors    # Fine-tuned model weights
└── README.md            # Model documentation
```

## **Limitations**

- The model may not generalize well to disaster types that are not represented in the training data.
- Quantization to Float16 may cause minor accuracy degradation.

## **Contributing**

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.

---