Electric Vehicles Classifier (DistilBERT)

This model assigns topic labels to posts about electric vehicles from climate change subreddits.

Model Details

  • Model Type: DistilBERT
  • Task: Multilabel text classification
  • Sector: Electric Vehicles
  • Base Model: DistilBERT base uncased
  • Labels: 7
  • Training Data: Sample from 1,000 GPT-4o-mini-labeled Reddit posts from climate subreddits (2010-2023)

Labels

The model predicts 7 labels simultaneously:

  1. Alternative Modes: Advocates bikes, transit, e-scooters, trains instead of private EVs.
  2. Charging Infrastructure: Talks about availability, speed, or reliability of public or home chargers.
  3. Environmental Benefit: Claims EVs reduce emissions or pollution vs. gasoline cars.
  4. Grid Impact And Energy Mix: Links EV charging to grid capacity, blackout fears, renewable share of electricity.
  5. Mineral Supply Chain: Concerns over lithium, cobalt, nickel, rare-earth mining or shortages for batteries.
  6. Policy And Mandates: References government regulations, bans on ICE sales, fleet targets or central-planning critiques.
  7. Purchase Price: Discusses up-front sticker price, MSRP, subsidies, or tax credits for buying an EV.

Note: Label order in predictions matches the order above.

Usage

import torch, sys, os, tempfile
from transformers import DistilBertTokenizer
from huggingface_hub import snapshot_download

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def print_sorted_label_scores(label_scores):
    # Sort label_scores dict by score descending
    sorted_items = sorted(label_scores.items(), key=lambda x: x[1], reverse=True)
    for label, score in sorted_items:
        print(f"  {label}: {score:.6f}")

# Model link and examples for this specific model
model_link = 'sanchow/electric_vehicles-distilbert-classifier'
examples = [
    "Switching to electric cars can cut down on smog and carbon output."
]

print(f"\n{'='*60}")
print("MODEL: ELECTRIC VEHICLES SECTOR")
print(f"{'='*60}")

print(f"Downloading model: {model_link}")
with tempfile.TemporaryDirectory() as temp_dir:
    snapshot_download(
        repo_id=model_link,
        local_dir=temp_dir,
        local_dir_use_symlinks=False
    )
    model_class_path = os.path.join(temp_dir, 'model_class.py')
    if not os.path.exists(model_class_path):
        print("model_class.py not found in downloaded files")
        print(f"   Available files: {os.listdir(temp_dir)}")
    else:
        sys.path.insert(0, temp_dir)
        from model_class import MultilabelClassifier
        tokenizer = DistilBertTokenizer.from_pretrained(temp_dir)
        checkpoint = torch.load(os.path.join(temp_dir, 'model.pt'), map_location='cpu', weights_only=False)
        model = MultilabelClassifier(checkpoint['model_name'], len(checkpoint['label_names']))
        model.load_state_dict(checkpoint['model_state_dict'])
        model.to(device)
        model.eval()
        print("Model loaded successfully")
        print(f"   Labels: {checkpoint['label_names']}")
        print("\nElectric Vehicles classifier results:\n")
        for i, test_text in enumerate(examples):
            inputs = tokenizer(
                test_text, 
                return_tensors="pt", 
                truncation=True, 
                max_length=512,
                padding=True
            ).to(device)
            with torch.no_grad():
                outputs = model(**inputs)
                # If the model returns a tuple or list, take its first element before converting
                predictions = (outputs[0] if isinstance(outputs, (tuple, list)) else outputs).cpu().numpy()
            label_scores = {label: float(score) for label, score in zip(checkpoint['label_names'], predictions[0])}
            print(f"Example {i+1}: '{test_text}'")
            print("Predictions (all label scores, highest first):")
            print_sorted_label_scores(label_scores)
            print("-" * 40)

Performance

Best model performance:

  • Micro Jaccard: 0.4596
  • Macro Jaccard: 0.5701
  • F1 Score: 0.8772
  • Accuracy: 0.8772

Dataset: ~900 GPT-labeled samples per sector (600 train, 150 validation, 150 test)
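
The card reports both micro and macro Jaccard without saying how they were computed. The sketch below shows the usual definitions on a made-up three-label example: micro pools intersections and unions over all labels, while macro averages the per-label ratios. The function names, the toy data, and the convention of scoring an empty union as 0 are assumptions for illustration, not the author's evaluation code.

```python
def jaccard_micro(y_true, y_pred):
    """Pool intersections and unions over every (sample, label) cell, then divide."""
    inter = 0
    union = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            inter += int(t == 1 and p == 1)
            union += int(t == 1 or p == 1)
    # Assumption: an empty union is scored as 0.0
    return inter / union if union else 0.0


def jaccard_macro(y_true, y_pred):
    """Compute Jaccard separately for each label, then take the unweighted mean."""
    n_labels = len(y_true[0])
    per_label = []
    for j in range(n_labels):
        inter = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        union = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 or p[j] == 1)
        per_label.append(inter / union if union else 0.0)
    return sum(per_label) / n_labels


# Toy multi-hot data (3 samples x 3 labels), invented for illustration only
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]

micro = jaccard_micro(y_true, y_pred)
macro = jaccard_macro(y_true, y_pred)
print(f"micro={micro:.3f} macro={macro:.3f}")
```

The two averages can diverge in either direction: macro gives rare labels the same weight as frequent ones, whereas micro is dominated by the labels that occur most often.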

Optimal Thresholds

# Continues from the Usage snippet, where `checkpoint` and `predictions` are defined
label_names = checkpoint['label_names']
optimal_thresholds = {'Alternative Modes': 0.28427787391225384, 'Charging Infrastructure': 0.3619448731592626, 'Environmental Benefit': 0.4029443119613918, 'Grid Impact And Energy Mix': 0.29907076386497516, 'Mineral Supply Chain': 0.2987419331439881, 'Policy And Mandates': 0.36899998622725905, 'Purchase Price': 0.3463644004166977}
for label, score in zip(label_names, predictions[0]):
    # Fall back to 0.5 for any label without a tuned threshold
    threshold = optimal_thresholds.get(label, 0.5)
    if score > threshold:
        print(f"{label}: {score:.3f}")
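
The per-label comparison above can also be wrapped in a small helper that returns the selected labels instead of printing them. This is a minimal sketch: `select_labels` and the example scores are hypothetical, and it assumes the model's scores are on the same 0 to 1 scale as the thresholds, which the card does not state explicitly.

```python
# Thresholds copied from this model card
OPTIMAL_THRESHOLDS = {
    "Alternative Modes": 0.28427787391225384,
    "Charging Infrastructure": 0.3619448731592626,
    "Environmental Benefit": 0.4029443119613918,
    "Grid Impact And Energy Mix": 0.29907076386497516,
    "Mineral Supply Chain": 0.2987419331439881,
    "Policy And Mandates": 0.36899998622725905,
    "Purchase Price": 0.3463644004166977,
}


def select_labels(label_scores, thresholds, default=0.5):
    """Return labels whose score exceeds their threshold, highest score first."""
    chosen = [
        (label, score)
        for label, score in label_scores.items()
        if score > thresholds.get(label, default)
    ]
    chosen.sort(key=lambda item: item[1], reverse=True)
    return [label for label, _ in chosen]


# Made-up scores for illustration only, not real model output
example_scores = {
    "Alternative Modes": 0.10,
    "Charging Infrastructure": 0.37,
    "Environmental Benefit": 0.95,
    "Grid Impact And Energy Mix": 0.29,
    "Mineral Supply Chain": 0.05,
    "Policy And Mandates": 0.20,
    "Purchase Price": 0.40,
}

selected = select_labels(example_scores, OPTIMAL_THRESHOLDS)
print(selected)
```

With these scores, "Grid Impact And Energy Mix" at 0.29 falls just below its threshold and is dropped, while "Charging Infrastructure" at 0.37 clears its own threshold and is kept.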

Training

Trained on GPT-labeled Reddit data:

  1. Data collection from climate subreddits
  2. Keyword-based filtering for sector-specific content
  3. GPT labeling for multilabel classification
  4. 80/10/10 train/validation/test split
  5. Fine-tuning with threshold optimization
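
The card does not describe how the threshold optimization in step 5 was carried out. One common approach, sketched below for a single label, is to scan a grid of candidate thresholds and keep the one with the best F1 on validation data. The F1 criterion, the grid, the function names, and the validation scores are all assumptions for illustration.

```python
def f1_at_threshold(scores, targets, threshold):
    """F1 for one label when predicting positive wherever score > threshold."""
    tp = sum(1 for s, t in zip(scores, targets) if s > threshold and t == 1)
    fp = sum(1 for s, t in zip(scores, targets) if s > threshold and t == 0)
    fn = sum(1 for s, t in zip(scores, targets) if s <= threshold and t == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(scores, targets, candidates):
    """Return (threshold, f1) for the best candidate; ties keep the earliest one."""
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        f1 = f1_at_threshold(scores, targets, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical grid: 0.05, 0.10, ..., 0.95
candidates = [i / 100 for i in range(5, 96, 5)]

# Made-up validation scores and gold labels for a single label
val_scores = [0.12, 0.37, 0.41, 0.83, 0.22, 0.66]
val_targets = [0, 1, 1, 1, 0, 1]

threshold, f1 = best_threshold(val_scores, val_targets, candidates)
print(f"best threshold={threshold:.2f}, F1={f1:.3f}")
```

Running the same search once per label would yield a dictionary of per-label thresholds. The search belongs on the validation split, so that the test split remains an unbiased estimate.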

Citation

If you use this model in your research, please cite:

@misc{electric_vehicles_distilbert_classifier,
  title={Electric Vehicles Classifier for Climate Change Analysis},
  author={Sandeep Chowdhary},
  year={2025},
  publisher={Hugging Face},
  journal={Hugging Face Hub},
  howpublished={\url{https://huggingface.co/echoboi/electric_vehicles-distilbert-classifier}},
}

Limitations

  • Trained only on English-language content from specific climate change subreddits
  • Training labels were generated by GPT, so model quality is bounded by the quality of those labels