---
license: mit
tags:
  - vision-transformer
  - spectrogram-analysis
  - lora
  - pytorch
  - regression
---

# Vision Transformer (ViT) with LoRA for Spectrogram Regression

πŸ§‘β€πŸ’» Curated by

Nooshin Bahador

πŸ’° Funded by

Canadian Neuroanalytics Scholars Program

πŸ“œ License

MIT

## Model Description

This is a Vision Transformer (ViT) model fine-tuned using Low-Rank Adaptation (LoRA) for regression tasks on spectrogram data. The model predicts three key parameters of chirp signals:

  1. Chirp start time (s)
  2. Start frequency (Hz)
  3. End frequency (Hz)
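The regression setup can be illustrated with a minimal, self-contained sketch: a ViT-style backbone whose classification head is replaced by a 3-output linear layer, one output per chirp parameter. This is an illustrative stand-in, not the released model; all hyperparameters (patch size, depth, embedding dimension) are placeholders.

```python
import torch
import torch.nn as nn

class SpectrogramViTRegressor(nn.Module):
    """Toy ViT-style regressor (hypothetical stand-in for the fine-tuned model)."""
    def __init__(self, img_size=224, patch=16, dim=192, depth=2, heads=3, n_out=3):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        # Patch embedding as a strided convolution
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        # Regression head: start time (s), start frequency (Hz), end frequency (Hz)
        self.head = nn.Linear(dim, n_out)

    def forward(self, x):
        x = self.patch_embed(x).flatten(2).transpose(1, 2)        # (B, N, dim)
        x = torch.cat([self.cls.expand(x.size(0), -1, -1), x], 1) + self.pos
        x = self.encoder(x)
        return self.head(x[:, 0])                                 # regress from [CLS]

model = SpectrogramViTRegressor()
spec = torch.randn(2, 3, 224, 224)   # batch of spectrograms rendered as images
preds = model(spec)                  # shape (2, 3)
```

In practice the backbone would load pretrained ViT weights and only the LoRA adapters and head would be trained, as described below.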

## 🔧 Fine-Tuning Details

- **Framework:** PyTorch
- **Architecture:** Pre-trained Vision Transformer (ViT)
- **Adaptation Method:** LoRA (Low-Rank Adaptation)
- **Task:** Regression on time-frequency representations
- **Training Protocol:** Automatic Mixed Precision (AMP), early stopping, learning-rate scheduling
- **Output:** Quantitative predictions + optional natural language descriptions
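The core idea of LoRA is to freeze the pretrained weight matrix and learn only a low-rank additive update. A minimal sketch of such an adapter (illustrative only; the rank `r` and scaling `alpha` below are assumed values, not the model's actual configuration):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen Linear plus a trainable low-rank update: y = W x + (B A x) * scale."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                   # freeze pretrained weights
        # Low-rank factors: A is (r, in), B is (out, r); B starts at zero,
        # so the adapter initially leaves the pretrained output unchanged.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
```

With rank 8 on a 768x768 projection, only 2 x 8 x 768 = 12,288 parameters are trainable versus ~590k frozen ones, which is what makes LoRA fine-tuning cheap enough to combine with AMP and aggressive early stopping.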

## 📦 Resources

- **Trained Model:** HuggingFace Model Hub
- **Spectrogram Dataset:** HuggingFace Dataset Hub
- **PyTorch Implementation:** GitHub Repository
- **Chirp Generator:** GitHub Package

## 📄 Citation

If you use this model in your research, please cite:

Bahador, N., & Lankarany, M. (2025). Chirp localization via fine-tuned transformer model: A proof-of-concept study. arXiv preprint arXiv:2503.22713. [PDF]