
BERT-Base-Uncased Quantized Model for Disaster SOS Message Classification

This repository hosts a quantized version of the BERT model, fine-tuned for Disaster SOS Message Classification. The model efficiently classifies emergency messages related to disasters, helping prioritize urgent cases. It has been optimized for deployment in resource-constrained environments while maintaining high accuracy.

Model Details

  • Model Architecture: BERT Base Uncased
  • Task: Disaster SOS Message Classification
  • Dataset: Disaster Response Messages Dataset
  • Quantization: Float16
  • Fine-tuning Framework: Hugging Face Transformers

Usage

Installation

pip install transformers torch

Loading the Model

from transformers import BertForSequenceClassification, BertTokenizer
import torch

# Load quantized model
quantized_model_path = "bert_finetuned_fp16"  # replace with your local model directory or Hub repo ID
quantized_model = BertForSequenceClassification.from_pretrained(quantized_model_path)
quantized_model.eval()  # Set to evaluation mode
quantized_model.half()  # Convert model to FP16

# Load tokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Define a test SOS message
test_message = "There is a massive earthquake, and people need help immediately!"

# Tokenize input
inputs = tokenizer(test_message, return_tensors="pt", padding=True, truncation=True, max_length=128)

# Keep token IDs and attention mask as int64; only the model weights are FP16
inputs["input_ids"] = inputs["input_ids"].long()
inputs["attention_mask"] = inputs["attention_mask"].long()

# Make prediction
with torch.no_grad():
    outputs = quantized_model(**inputs)

# Get predicted categories
probs = torch.sigmoid(outputs.logits).cpu().numpy().flatten()
predictions = (probs > 0.5).astype(int)

# Category mapping (example; replace with the label set used during fine-tuning)
category_names = ["Earthquake", "Flood", "Medical Emergency", "Infrastructure Damage", "General Help"]
predicted_labels = [category_names[i] for i in range(len(predictions)) if predictions[i] == 1]

print(f"Message: {test_message}")
print(f"Predicted Categories: {predicted_labels}")
print(f"Confidence Scores: {probs}")

Performance Metrics

  • Accuracy: 0.85
  • F1 Score: 0.83
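
As a reference, a minimal sketch of how such metrics can be computed with scikit-learn; val_labels and val_preds are hypothetical multi-hot arrays of ground-truth and predicted categories:

from sklearn.metrics import accuracy_score, f1_score
import numpy as np

# Hypothetical multi-hot arrays of shape (num_samples, num_categories)
val_labels = np.array([[1, 0, 1], [0, 1, 0]])
val_preds = np.array([[1, 0, 0], [0, 1, 0]])

accuracy = accuracy_score(val_labels, val_preds)       # exact-match (subset) accuracy
f1 = f1_score(val_labels, val_preds, average="micro")  # micro-averaged F1
print(f"Accuracy: {accuracy:.2f}, F1 Score: {f1:.2f}")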

Fine-Tuning Details

Dataset

The model was fine-tuned on the Disaster Response Messages Dataset, which contains real messages sent during disaster events, annotated with multiple response categories.
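
A minimal sketch of loading the data with pandas; the file name and column names below are assumptions and should be adjusted to match your copy of the dataset:

import pandas as pd

# Hypothetical file and column names; adjust to your copy of the dataset
df = pd.read_csv("disaster_response_messages.csv")
texts = df["message"].tolist()                # raw SOS message text
labels = df.drop(columns=["message"]).values  # remaining columns as multi-hot labels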

Training

  • Number of epochs: 3
  • Batch size: 8
  • Evaluation strategy: epoch
  • Learning rate: 2e-5
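
A minimal fine-tuning sketch using the hyperparameters above with the Hugging Face Trainer; train_dataset and eval_dataset are assumed to be pre-tokenized datasets with multi-hot label vectors, and num_labels=5 is taken from the example category mapping shown earlier:

from transformers import (BertForSequenceClassification, Trainer,
                          TrainingArguments)

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=5,  # one logit per category (assumed from the example mapping)
    problem_type="multi_label_classification",  # sigmoid activation + BCE loss
)

training_args = TrainingArguments(
    output_dir="bert_finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=2e-5,
    evaluation_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # assumed pre-tokenized training split
    eval_dataset=eval_dataset,    # assumed pre-tokenized validation split
)
trainer.train()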

Quantization

Post-training quantization was applied by casting the fine-tuned model's weights to FP16 (half precision) in PyTorch, reducing model size and improving inference speed.
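
A minimal sketch of this conversion step, assuming the fine-tuned FP32 checkpoint was saved to a local directory such as bert_finetuned:

from transformers import BertForSequenceClassification

# Load the fine-tuned FP32 model, cast its weights to FP16, and save
model = BertForSequenceClassification.from_pretrained("bert_finetuned")
model.half()  # convert all floating-point parameters to half precision
model.save_pretrained("bert_finetuned_fp16")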

Repository Structure

.
├── model/               # Contains the quantized model files
├── tokenizer_config/    # Tokenizer configuration and vocabulary files
├── model.safetensors    # Fine-tuned model weights
└── README.md            # Model documentation

Limitations

  • The model may not generalize well to disaster types outside the training data.
  • FP16 quantization may introduce minor accuracy degradation relative to the full-precision model.

Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.

