
BERT-Base-Uncased Quantized Model for Disaster SOS Message Classification

This repository hosts a quantized version of the BERT model, fine-tuned for Disaster SOS Message Classification. The model efficiently classifies emergency messages related to disasters, helping prioritize urgent cases. It has been optimized for deployment in resource-constrained environments while maintaining high accuracy.

Model Details

  • Model Architecture: BERT Base Uncased
  • Task: Disaster SOS Message Classification
  • Dataset: Disaster Response Messages Dataset
  • Quantization: Float16
  • Fine-tuning Framework: Hugging Face Transformers

Usage

Installation

pip install transformers torch

Loading the Model

from transformers import BertForSequenceClassification, BertTokenizer
import torch

# Load the quantized model (replace the path with your local copy of the
# model files or with this repository's ID on the Hugging Face Hub)
quantized_model_path = "/kaggle/working/bert_finetuned_fp16"
quantized_model = BertForSequenceClassification.from_pretrained(quantized_model_path)
quantized_model.eval()  # Set to evaluation mode
quantized_model.half()  # Ensure weights are in FP16

# Load tokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Define a test SOS message
test_message = "There is a massive earthquake, and people need help immediately!"

# Tokenize input
inputs = tokenizer(test_message, return_tensors="pt", padding=True, truncation=True, max_length=128)

# Ensure input tensors are in correct dtype
inputs["input_ids"] = inputs["input_ids"].long()
inputs["attention_mask"] = inputs["attention_mask"].long()

# Make prediction
with torch.no_grad():
    outputs = quantized_model(**inputs)

# Get predicted categories
probs = torch.sigmoid(outputs.logits).cpu().numpy().flatten()
predictions = (probs > 0.5).astype(int)

# Category mapping (Example)
category_names = ["Earthquake", "Flood", "Medical Emergency", "Infrastructure Damage", "General Help"]
predicted_labels = [category_names[i] for i in range(len(predictions)) if predictions[i] == 1]

print(f"Message: {test_message}")
print(f"Predicted Categories: {predicted_labels}")
print(f"Confidence Scores: {probs}")

Performance Metrics

  • Accuracy: 0.85
  • F1 Score: 0.83
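
For reference, here is a sketch of how such multi-label metrics can be computed with scikit-learn; the micro averaging and the placeholder arrays below are assumptions, not the exact evaluation script:

import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Placeholder (num_samples, num_labels) binary arrays; in practice these
# come from thresholding sigmoid outputs as in the usage example above
y_true = np.array([[1, 0, 1, 0, 0], [0, 1, 0, 0, 1]])
y_pred = np.array([[1, 0, 0, 0, 0], [0, 1, 0, 0, 1]])

print("Accuracy:", accuracy_score(y_true, y_pred))            # exact-match ratio
print("F1 Score:", f1_score(y_true, y_pred, average="micro")) # micro-averaged F1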

Fine-Tuning Details

Dataset

The dataset is the Disaster Response Messages Dataset, which contains real-life messages from various disaster scenarios.

Training

  • Number of epochs: 3
  • Batch size: 8
  • Evaluation strategy: epoch
  • Learning rate: 2e-5
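
A minimal Trainer sketch using these hyperparameters; dataset loading is omitted, and num_labels, the multi-label problem type, and the train_dataset/eval_dataset variables are assumptions rather than the exact training script:

from transformers import (BertForSequenceClassification, Trainer,
                          TrainingArguments)

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=5,                               # assumed: one per category
    problem_type="multi_label_classification",  # BCE loss over sigmoid outputs
)

training_args = TrainingArguments(
    output_dir="./bert_finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    evaluation_strategy="epoch",
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # assumed: pre-tokenized Dataset with multi-hot labels
    eval_dataset=eval_dataset,
)
trainer.train()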

Quantization

Post-training quantization to FP16 was applied by casting the fine-tuned weights to half precision in PyTorch, roughly halving the model size and improving inference speed.
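
A sketch of that conversion step, with illustrative paths: the fine-tuned FP32 checkpoint is loaded, cast to half precision, and saved back to disk.

from transformers import BertForSequenceClassification

# Load the fine-tuned FP32 checkpoint (path is illustrative)
model = BertForSequenceClassification.from_pretrained("./bert_finetuned")

# Cast all weights to float16 and save the smaller checkpoint
model.half()
model.save_pretrained("./bert_finetuned_fp16")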

Repository Structure

.
├── model/               # Contains the quantized model files
├── tokenizer_config/    # Tokenizer configuration and vocabulary files
├── model.safetensors    # Fine-tuned model weights
└── README.md            # Model documentation

Limitations

  • The model may not generalize well to disaster types outside the training data.
  • FP16 quantization introduces minor accuracy degradation relative to the full-precision model.

Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.

