Vision Transformer (ViT) with LoRA for Spectrogram Regression
Curated by: Nooshin Bahador
Funded by: Canadian Neuroanalytics Scholars Program
License: MIT
Model Description
This is a Vision Transformer (ViT) model fine-tuned using Low-Rank Adaptation (LoRA) for regression tasks on spectrogram data. The model predicts three key parameters of chirp signals:
- Chirp start time (s)
- Start frequency (Hz)
- End frequency (Hz)
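Since the model card does not include code, here is a minimal sketch of how a three-parameter regression head can sit on top of a ViT backbone. The class name `ChirpRegressor`, the 768-dim embedding, and the identity backbone used in the usage note are illustrative assumptions, not the released implementation:

```python
import torch
import torch.nn as nn

class ChirpRegressor(nn.Module):
    """Hypothetical sketch: a pooled ViT embedding mapped to the three
    chirp parameters (start time, start frequency, end frequency)."""

    def __init__(self, backbone: nn.Module, embed_dim: int = 768):
        super().__init__()
        # backbone: any module that maps a spectrogram image to a
        # pooled embedding of size embed_dim (e.g. a ViT CLS token)
        self.backbone = backbone
        # three regression targets: [start_time_s, f_start_hz, f_end_hz]
        self.head = nn.Linear(embed_dim, 3)

    def forward(self, spectrogram: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(spectrogram))
```

With an identity backbone and pre-pooled 768-dim features, `ChirpRegressor(nn.Identity())` maps a `(batch, 768)` tensor to a `(batch, 3)` prediction.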
Fine-Tuning Details
- Framework: PyTorch
- Architecture: Pre-trained Vision Transformer (ViT)
- Adaptation Method: LoRA (Low-Rank Adaptation)
- Task: Regression on time-frequency representations
- Training Protocol: Automatic Mixed Precision (AMP), early stopping, and learning-rate scheduling
- Output: Quantitative predictions + optional natural language descriptions
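To make the adaptation method concrete, the following is a generic sketch of a LoRA-wrapped linear layer: the pretrained weight is frozen and only a low-rank update is trained. This is the standard LoRA formulation, not the repository's actual code; the rank and scaling values are illustrative:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Generic LoRA sketch: y = W x + (alpha / r) * B A x,
    where W is frozen and only A (r x in) and B (out x r) train."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        # A gets a small random init; B starts at zero so the wrapped
        # layer initially matches the frozen base layer exactly
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T
```

Because `B` is zero-initialized, training starts from the pretrained model's behavior, and only the small `A`/`B` matrices receive gradients.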
Resources
Trained Model
Spectrogram Dataset
PyTorch Implementation
Chirp Generator
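For readers without access to the linked Chirp Generator, a chirp spectrogram of the kind this model consumes can be produced with SciPy. The sample rate, sweep range, and window length below are illustrative assumptions, not the dataset's actual parameters:

```python
import numpy as np
from scipy.signal import chirp, spectrogram

fs = 1000  # sample rate in Hz (illustrative)
t = np.linspace(0, 2, 2 * fs, endpoint=False)

# linear chirp sweeping 50 Hz -> 200 Hz over 2 s
x = chirp(t, f0=50.0, t1=2.0, f1=200.0, method="linear")

# time-frequency representation: rows are frequencies, columns are time bins
f, seg_t, Sxx = spectrogram(x, fs=fs, nperseg=128)
```

`Sxx` is the input image for the regressor, while the known `f0`, `f1`, and sweep start time serve as regression labels.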
Citation
If you use this model in your research, please cite:
Bahador, N., & Lankarany, M. (2025). Chirp localization via fine-tuned transformer model: A proof-of-concept study. arXiv preprint arXiv:2503.22713. [PDF]