Vision Transformer (ViT) with LoRA for Spectrogram Regression

πŸ§‘β€πŸ’» Curated by

Nooshin Bahador

πŸ’° Funded by

Canadian Neuroanalytics Scholars Program

πŸ“œ License

MIT

Model Description

This is a Vision Transformer (ViT) model fine-tuned with Low-Rank Adaptation (LoRA) for regression on spectrogram data. Given the spectrogram of a chirp signal, the model predicts three parameters:

  1. Chirp start time (s)
  2. Start frequency (Hz)
  3. End frequency (Hz)
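For illustration, these three targets can be read off a synthetic example. The sketch below (a hypothetical stand-in, not the project's actual chirp generator linked under Resources) builds a delayed linear chirp with SciPy and the time-frequency representation a model like this would consume; all numeric values are arbitrary:

```python
import numpy as np
from scipy.signal import chirp, spectrogram

fs = 1000                         # sampling rate (Hz), arbitrary for this sketch
t = np.arange(0, 2.0, 1 / fs)     # 2-second signal

# Hypothetical ground-truth targets for one example:
start_time, f_start, f_end = 0.5, 50.0, 200.0

# Silence before start_time, linear chirp from f_start to f_end afterwards
sig = np.zeros_like(t)
mask = t >= start_time
sig[mask] = chirp(t[mask] - start_time, f0=f_start, f1=f_end,
                  t1=t[-1] - start_time, method="linear")

# Time-frequency representation (model input) and the 3 regression targets
f, tt, Sxx = spectrogram(sig, fs=fs, nperseg=128)
targets = np.array([start_time, f_start, f_end])
```

The regression head then maps each spectrogram `Sxx` to an estimate of `targets`.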

πŸ”§ Fine-Tuning Details

  • Framework: PyTorch
  • Architecture: Pre-trained Vision Transformer (ViT)
  • Adaptation Method: LoRA (Low-Rank Adaptation)
  • Task: Regression on time-frequency representations
  • Training Protocol: automatic mixed precision (AMP), early stopping, learning-rate scheduling
  • Output: quantitative predictions, with optional natural-language descriptions
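The full training code is in the linked repository; the sketch below only illustrates the ingredients listed above (frozen base weights with trainable low-rank adapters, AMP, LR scheduling) using plain PyTorch and a hand-rolled LoRA layer on a toy projection. Names, ranks, and dimensions are illustrative assumptions, not the model's actual configuration:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update (alpha/r) * B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # only the adapters are trained
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Toy stand-in for one adapted ViT projection, plus a 3-output regression head
backbone = LoRALinear(nn.Linear(768, 768))
head = nn.Linear(768, 3)                     # chirp start time, start freq, end freq

trainable = [p for p in list(backbone.parameters()) + list(head.parameters())
             if p.requires_grad]
opt = torch.optim.AdamW(trainable, lr=1e-4)
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, patience=3)
use_cuda = torch.cuda.is_available()
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)  # AMP is a no-op on CPU here

x = torch.randn(4, 768)                      # stand-in for pooled ViT features
y = torch.randn(4, 3)                        # stand-in regression targets
with torch.autocast("cuda" if use_cuda else "cpu", enabled=use_cuda):
    loss = nn.functional.mse_loss(head(backbone(x)), y)
scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
sched.step(loss.item())                      # plateau-based LR scheduling
```

Early stopping would wrap this step in an epoch loop that tracks validation loss and halts after a patience window, mirroring the scheduler's plateau logic.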

πŸ“¦ Resources

Trained Model

HuggingFace Model Hub

Spectrogram Dataset

HuggingFace Dataset Hub

PyTorch Implementation

GitHub Repository

Chirp Generator

GitHub Package

πŸ“„ Citation

If you use this model in your research, please cite:

Bahador, N., & Lankarany, M. (2025). Chirp localization via fine-tuned transformer model: A proof-of-concept study. arXiv preprint arXiv:2503.22713.
