---
license: mit
tags:
- vision-transformer
- spectrogram-analysis
- lora
- pytorch
- regression
---
# Vision Transformer (ViT) with LoRA for Spectrogram Regression
**Curated by:** Nooshin Bahador

**Funded by:** Canadian Neuroanalytics Scholars Program

**License:** MIT
## Model Description
This is a Vision Transformer (ViT) model fine-tuned using Low-Rank Adaptation (LoRA) for regression tasks on spectrogram data. The model predicts three key parameters of chirp signals:
- Chirp start time (s)
- Start frequency (Hz)
- End frequency (Hz)
## Fine-Tuning Details
- Framework: PyTorch
- Architecture: Pre-trained Vision Transformer (ViT)
- Adaptation Method: LoRA (Low-Rank Adaptation)
- Task: Regression on time-frequency representations
- Training Protocol: Automatic Mixed Precision (AMP), early stopping, and learning-rate scheduling
- Output: Quantitative predictions + optional natural language descriptions
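The core idea of LoRA is to freeze the pre-trained weights and learn only a low-rank update to selected linear layers. A minimal, self-contained sketch of that mechanism (not the exact layer targeting or ranks used for this model, which are assumptions here):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B A x, with A (r x in) and B (out x r)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pre-trained weights stay frozen
        # A gets a small random init; B starts at zero so the update
        # is initially a no-op and training starts from the base model.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

Wrapping, e.g., the attention projection layers of a ViT with modules like this leaves only the small `A`/`B` matrices (and the regression head) trainable, which is what makes LoRA fine-tuning cheap. Libraries such as Hugging Face PEFT provide a production version of this pattern.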
## Resources
- Trained Model
- Spectrogram Dataset
- PyTorch Implementation
- Chirp Generator
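A generic way to synthesize a labeled training example of the kind described above, i.e. a spectrogram containing a chirp with a known start time and frequency sweep (the linked chirp generator's actual interface is not shown here; `scipy.signal.chirp` is used purely for illustration):

```python
import numpy as np
from scipy.signal import chirp, spectrogram

fs = 1000                                # sampling rate (Hz)
t = np.arange(0, 2.0, 1 / fs)            # 2-second signal
start_time, f0, f1 = 0.5, 50.0, 200.0    # ground-truth labels for regression

# Silence before start_time, then a linear chirp sweeping f0 -> f1.
sig = np.zeros_like(t)
mask = t >= start_time
sig[mask] = chirp(t[mask] - start_time, f0=f0,
                  f1=f1, t1=t[-1] - start_time, method="linear")

# Time-frequency representation fed to the model as an image.
f, tt, Sxx = spectrogram(sig, fs=fs, nperseg=128)
```

The `(start_time, f0, f1)` triple serves as the regression target, and `Sxx` (after log scaling and resizing) as the model input.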
## Citation
If you use this model in your research, please cite:
Bahador, N., & Lankarany, M. (2025). Chirp localization via fine-tuned transformer model: A proof-of-concept study. arXiv preprint arXiv:2503.22713. [PDF]