Vision Transformer (ViT) with LoRA for Spectrogram Regression

πŸ§‘β€πŸ’» Curated by

Nooshin Bahador

πŸ’° Funded by

Canadian Neuroanalytics Scholars Program

πŸ“œ License

MIT

Model Description

This is a Vision Transformer (ViT) model fine-tuned with Low-Rank Adaptation (LoRA) for regression on spectrogram data. Given the spectrogram of a chirp signal, the model predicts three parameters:

  1. Chirp start time (s)
  2. Start frequency (Hz)
  3. End frequency (Hz)
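For illustration, these three targets can be read off a synthetic example. The sketch below (a hypothetical stand-in, not the project's actual chirp generator linked under Resources) builds a delayed linear chirp with SciPy and the time-frequency representation a model like this would consume; all numeric values are arbitrary:

```python
import numpy as np
from scipy.signal import chirp, spectrogram

fs = 1000                         # sampling rate (Hz), arbitrary for this sketch
t = np.arange(0, 2.0, 1 / fs)     # 2-second signal

# Hypothetical ground-truth targets for one example:
start_time, f_start, f_end = 0.5, 50.0, 200.0

# Silence before start_time, linear chirp from f_start to f_end afterwards
sig = np.zeros_like(t)
mask = t >= start_time
sig[mask] = chirp(t[mask] - start_time, f0=f_start, f1=f_end,
                  t1=t[-1] - start_time, method="linear")

# Time-frequency representation (model input) and the 3 regression targets
f, tt, Sxx = spectrogram(sig, fs=fs, nperseg=128)
targets = np.array([start_time, f_start, f_end])
```

The regression head then maps each spectrogram `Sxx` to an estimate of `targets`.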

πŸ”§ Fine-Tuning Details

  • Framework: PyTorch
  • Architecture: Pre-trained Vision Transformer (ViT)
  • Adaptation Method: LoRA (Low-Rank Adaptation)
  • Task: Regression on time-frequency representations
  • Training Protocol: automatic mixed precision (AMP), early stopping, learning-rate scheduling
  • Output: quantitative predictions, with optional natural-language descriptions
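The full training code is in the linked repository; the sketch below only illustrates the ingredients listed above (frozen base weights with trainable low-rank adapters, AMP, LR scheduling) using plain PyTorch and a hand-rolled LoRA layer on a toy projection. Names, ranks, and dimensions are illustrative assumptions, not the model's actual configuration:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update (alpha/r) * B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # only the adapters are trained
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Toy stand-in for one adapted ViT projection, plus a 3-output regression head
backbone = LoRALinear(nn.Linear(768, 768))
head = nn.Linear(768, 3)                     # chirp start time, start freq, end freq

trainable = [p for p in list(backbone.parameters()) + list(head.parameters())
             if p.requires_grad]
opt = torch.optim.AdamW(trainable, lr=1e-4)
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, patience=3)
use_cuda = torch.cuda.is_available()
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)  # AMP is a no-op on CPU here

x = torch.randn(4, 768)                      # stand-in for pooled ViT features
y = torch.randn(4, 3)                        # stand-in regression targets
with torch.autocast("cuda" if use_cuda else "cpu", enabled=use_cuda):
    loss = nn.functional.mse_loss(head(backbone(x)), y)
scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
sched.step(loss.item())                      # plateau-based LR scheduling
```

Early stopping would wrap this step in an epoch loop that tracks validation loss and halts after a patience window, mirroring the scheduler's plateau logic.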

πŸ“¦ Resources

Trained Model

HuggingFace Model Hub

Spectrogram Dataset

HuggingFace Dataset Hub

PyTorch Implementation

GitHub Repository

Chirp Generator

GitHub Package

πŸ“„ Citation

If you use this model in your research, please cite:

Bahador, N., & Lankarany, M. (2025). Chirp localization via fine-tuned transformer model: A proof-of-concept study. arXiv preprint arXiv:2503.22713.
