### **BERT-Base-Uncased Quantized Model for Disaster SOS Message Classification**

This repository hosts a quantized version of the BERT model, fine-tuned for **Disaster SOS Message Classification**. The model classifies emergency messages related to disasters, helping prioritize urgent cases. It has been optimized for deployment in resource-constrained environments while maintaining high accuracy.

## **Model Details**

- **Model Architecture:** BERT Base Uncased
- **Task:** Disaster SOS Message Classification
- **Dataset:** Disaster Response Messages Dataset
- **Quantization:** Float16
- **Fine-tuning Framework:** Hugging Face Transformers

## **Usage**

### **Installation**

```sh
pip install transformers torch
```

### **Loading the Model**

```python
from transformers import BertForSequenceClassification, BertTokenizer
import torch

# Use a GPU when available; some FP16 ops are not supported on CPU in every PyTorch version
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load quantized model
quantized_model_path = "/kaggle/working/bert_finetuned_fp16"
quantized_model = BertForSequenceClassification.from_pretrained(quantized_model_path)
quantized_model.eval()  # Set to evaluation mode
if device.type == "cuda":
    quantized_model.half()  # Convert model to FP16
quantized_model.to(device)

# Load tokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Define a test SOS message
test_message = "There is a massive earthquake, and people need help immediately!"

# Tokenize input (token IDs and attention mask are already int64 tensors)
inputs = tokenizer(test_message, return_tensors="pt", padding=True, truncation=True, max_length=128)
inputs = {name: tensor.to(device) for name, tensor in inputs.items()}

# Make prediction
with torch.no_grad():
    outputs = quantized_model(**inputs)

# Get predicted categories (multi-label: one sigmoid per category)
probs = torch.sigmoid(outputs.logits).float().cpu().numpy().flatten()
predictions = (probs > 0.5).astype(int)

# Category mapping (example; must match the label order used during fine-tuning)
category_names = ["Earthquake", "Flood", "Medical Emergency", "Infrastructure Damage", "General Help"]
predicted_labels = [name for name, flag in zip(category_names, predictions) if flag == 1]

print(f"Message: {test_message}")
print(f"Predicted Categories: {predicted_labels}")
print(f"Confidence Scores: {probs}")
```

## **Performance Metrics**

- **Accuracy:** 0.85
- **F1 Score:** 0.83

## **Fine-Tuning Details**

### **Dataset**

The dataset is the **Disaster Response Messages Dataset**, which contains real-life messages from various disaster scenarios.

### **Training**

- Number of epochs: 3
- Batch size: 8
- Evaluation strategy: epoch
- Learning rate: 2e-5

### **Quantization**

Post-training quantization to Float16 was applied in PyTorch, reducing model size and improving inference speed.

## **Repository Structure**

```
.
├── model/               # Contains the quantized model files
├── tokenizer_config/    # Tokenizer configuration and vocabulary files
├── model.safetensors    # Fine-tuned model
└── README.md            # Model documentation
```

## **Limitations**

- The model may not generalize well to disaster types that are absent from the training data.
- Quantization may cause minor accuracy degradation.

## **Contributing**

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.

---
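
## **Example: Decoding Multi-Label Predictions**

The usage snippet in this README applies a sigmoid to each logit and keeps every category whose probability exceeds 0.5. The following is a minimal, dependency-free sketch of that decoding step, useful for checking the logic without loading the model. The helper names `sigmoid` and `decode_predictions`, the example logits, and the sorting by confidence are illustrative assumptions, not part of this repository. The category list is the example mapping from the usage snippet and may not match the labels the model was actually trained on.

```python
import math

# Example label order copied from the usage snippet; replace it with the
# order used during fine-tuning (assumption: one logit per category).
CATEGORY_NAMES = ["Earthquake", "Flood", "Medical Emergency",
                  "Infrastructure Damage", "General Help"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_predictions(logits, category_names=CATEGORY_NAMES, threshold=0.5):
    """Return (label, probability) pairs above the threshold, most confident first."""
    if len(logits) != len(category_names):
        raise ValueError("Expected one logit per category name")
    scored = [(name, sigmoid(logit)) for name, logit in zip(category_names, logits)]
    selected = [(name, prob) for name, prob in scored if prob > threshold]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


# Made-up logits for illustration only; these are not real model outputs.
example_logits = [2.0, -3.0, 0.5, -0.2, 1.0]
decoded = decode_predictions(example_logits)
for name, prob in decoded:
    print(f"{name}: {prob:.2f}")
```

Sorting by probability is one simple way to surface the most confident categories first when triaging messages. The 0.5 threshold matches the usage snippet, but it could be tuned per category on a validation set.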