---
license: mit
tags:
- vision-transformer
- spectrogram-analysis
- lora
- pytorch
- regression
---

# Vision Transformer (ViT) with LoRA for Spectrogram Regression

<div style="display: flex; flex-wrap: wrap; gap: 15px; margin-top: 15px;">
    <div style="flex: 1; min-width: 200px; background: white; border-radius: 8px; padding: 15px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);">
        <h4 style="margin-top: 0; color: #5f6368;">🧑‍💻 Curated by</h4>
        <p>Nooshin Bahador</p>
    </div>
    <div style="flex: 1; min-width: 200px; background: white; border-radius: 8px; padding: 15px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);">
        <h4 style="margin-top: 0; color: #5f6368;">💰 Funded by</h4>
        <p>Canadian Neuroanalytics Scholars Program</p>
    </div>
    <div style="flex: 1; min-width: 200px; background: white; border-radius: 8px; padding: 15px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);">
        <h4 style="margin-top: 0; color: #5f6368;">📜 License</h4>
        <p>MIT</p>
    </div>
</div>

## Model Description

This is a Vision Transformer (ViT) model fine-tuned using Low-Rank Adaptation (LoRA) for regression tasks on spectrogram data. The model predicts three key parameters of chirp signals:
1. Chirp start time (s)
2. Start frequency (Hz)
3. End frequency (Hz)
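
To make the three regression targets concrete, here is a minimal sketch of how they could describe a chirp. It assumes a linear frequency sweep and introduces a hypothetical `duration` parameter; neither is stated in this card, so treat the function as illustrative only.

```python
def linear_chirp_frequency(t, start_time, start_freq, end_freq, duration):
    """Instantaneous frequency (Hz) of a linear chirp at time t (s).

    Assumptions (not from the model card): the sweep is linear and lasts
    `duration` seconds. Returns None when the chirp is not active at t.
    """
    if t < start_time or t > start_time + duration:
        return None
    # Fraction of the sweep completed at time t, between 0 and 1.
    progress = (t - start_time) / duration
    return start_freq + (end_freq - start_freq) * progress


# Hypothetical targets: chirp starts at 1.0 s, sweeps from 10 Hz to 50 Hz.
mid_sweep = linear_chirp_frequency(2.0, start_time=1.0, start_freq=10.0,
                                   end_freq=50.0, duration=2.0)
print(mid_sweep)  # 30.0
```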

<div style="background: #f8f9fa; border-radius: 8px; padding: 20px; margin-bottom: 20px; border-left: 4px solid #4285f4;">
<h2 style="margin-top: 0;">🔧 Fine-Tuning Details</h2>
<div style="background: white; border-radius: 8px; padding: 15px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);">
    <ul>
        <li><strong>Framework:</strong> PyTorch</li>
        <li><strong>Architecture:</strong> Pre-trained Vision Transformer (ViT)</li>
        <li><strong>Adaptation Method:</strong> LoRA (Low-Rank Adaptation)</li>
        <li><strong>Task:</strong> Regression on time-frequency representations</li>
        <li><strong>Training Protocol:</strong> Automatic mixed precision (AMP), early stopping, and learning-rate scheduling</li>
        <li><strong>Output:</strong> Quantitative predictions, with optional natural-language descriptions</li>
    </ul>
</div>
</div>
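
The LoRA idea listed above can be sketched in plain PyTorch as follows. This is not the repository's implementation: the `LoRALinear` class, the rank and `alpha` values, and the 768-dimensional feature size are all assumptions chosen for illustration. The sketch freezes a pretrained linear layer, adds a trainable low-rank update, and attaches a three-output regression head.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Hypothetical wrapper: frozen linear layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for param in self.base.parameters():
            param.requires_grad = False  # pretrained weights stay fixed
        # Low-rank factors: B starts at zero, so the update is initially a no-op.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        update = (x @ self.lora_a.T) @ self.lora_b.T
        return self.base(x) + self.scale * update


# Assumed feature size of 768; the card does not state the ViT variant.
layer = LoRALinear(nn.Linear(768, 768), rank=8)
head = nn.Linear(768, 3)  # start time, start frequency, end frequency

features = torch.randn(4, 768)  # stand-in for a batch of ViT embeddings
preds = head(layer(features))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(preds.shape, trainable)
```

Only the two low-rank factors are trainable inside the wrapped layer, which is what keeps this kind of fine-tuning cheap compared with updating every weight.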

<div style="background: #f8f9fa; border-radius: 8px; padding: 20px; margin-bottom: 20px; border-left: 4px solid #34a853;">
<h2 style="margin-top: 0;">📦 Resources</h2>
<div style="display: flex; flex-wrap: wrap; gap: 15px;">
    <div style="flex: 1; min-width: 250px; background: white; border-radius: 8px; padding: 15px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);">
        <h4 style="margin-top: 0;">Trained Model</h4>
        <p><a href="https://huggingface.co/nubahador/Fine_Tuned_Transformer_Model_for_Chirp_Localization/tree/main">HuggingFace Model Hub</a></p>
    </div>
    <div style="flex: 1; min-width: 250px; background: white; border-radius: 8px; padding: 15px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);">
        <h4 style="margin-top: 0;">Spectrogram Dataset</h4>
        <p><a href="https://huggingface.co/datasets/nubahador/ChirpLoc100K___A_Synthetic_Spectrogram_Dataset_for_Chirp_Localization/tree/main">HuggingFace Dataset Hub</a></p>
    </div>
    <div style="flex: 1; min-width: 250px; background: white; border-radius: 8px; padding: 15px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);">
        <h4 style="margin-top: 0;">PyTorch Implementation</h4>
        <p><a href="https://github.com/nbahador/Train_Spectrogram_Transformer">GitHub Repository</a></p>
    </div>
    <div style="flex: 1; min-width: 250px; background: white; border-radius: 8px; padding: 15px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);">
        <h4 style="margin-top: 0;">Chirp Generator</h4>
        <p><a href="https://github.com/nbahador/chirp_spectrogram_generator">GitHub Package</a></p>
    </div>
</div>
</div>

<div style="background: #f8f9fa; border-radius: 8px; padding: 20px; border-left: 4px solid #ea4335;">
<h2 style="margin-top: 0;">📄 Citation</h2>
<div style="background: white; border-radius: 8px; padding: 15px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);">
    <p>If you use this model in your research, please cite:</p>
    <p>Bahador, N., & Lankarany, M. (2025). Chirp localization via fine-tuned transformer model: A proof-of-concept study. arXiv preprint arXiv:2503.22713. <a href="https://arxiv.org/pdf/2503.22713">[PDF]</a></p>
</div>
</div>