
Sagar pallai

sagar007

AI & ML interests

LLM AND STABLE DIFFUSION

Recent Activity

new activity 27 days ago

sagar007/multigemma: 🚨🚨🚨 License Violation Alert: Illegally Re-Licensing Google's Gemma Model as "Open Source"

replied to their post 27 days ago


replied to their post 28 days ago



Posts 7

Post

4146

🚀 I built a Multimodal Vision-Language Model using Gemma-270M + CLIP!

Just finished training my multimodal model on the full LLaVA-Instruct-150K dataset (157K samples) and wanted to share the results!

🔧 What I Built:
A vision-language model that can understand images and answer questions about them, combining:
- Google Gemma-3-270M (language)
- OpenAI CLIP ViT-Large/14 (vision)
- LoRA fine-tuning for efficiency
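
For readers new to LoRA, here is a minimal sketch of why it keeps the trainable parameter count small: a rank-r adapter on a weight of shape d_out x d_in trains r * (d_in + d_out) parameters instead of d_in * d_out. The layer size and rank below are hypothetical values picked for illustration, not the settings used for this model.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in the frozen dense weight the adapter is attached to."""
    return d_in * d_out


# Hypothetical layer size and rank, for illustration only.
D_IN, D_OUT, RANK = 1024, 1024, 8

lora = lora_param_count(D_IN, D_OUT, RANK)
full = full_param_count(D_IN, D_OUT)
ratio = lora / full

print(f"LoRA params per layer: {lora:,}")
print(f"Dense params per layer: {full:,}")
print(f"Adapter size relative to dense weight: {ratio:.2%}")
```

The ratio shrinks as the layer gets wider, which is why adapter-based fine-tuning is attractive when the base weights stay frozen.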

📊 Training Stats:
- 157,712 training samples (full LLaVA dataset)
- 3 epochs on A100 40GB
- ~9 hours training time
- Final loss: 1.333 training / 1.430 validation
- Only 18.6M trainable params (3.4% of 539M total)
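
As a quick sanity check on the stats above, the sketch below recomputes the trainable fraction and a rough throughput from the posted numbers only. It treats "~9 hours" as exactly 9 hours, which is an assumption, so the throughput is a ballpark figure.

```python
# Values taken from the training stats listed above.
TRAIN_SAMPLES = 157_712
EPOCHS = 3
TRAIN_HOURS = 9.0          # assumption: "~9 hours" treated as exactly 9
TRAINABLE_PARAMS_M = 18.6  # millions
TOTAL_PARAMS_M = 539.0     # millions


def trainable_fraction(trainable: float, total: float) -> float:
    """Share of parameters that receive gradient updates."""
    return trainable / total


def samples_per_second(samples: int, epochs: int, hours: float) -> float:
    """Average number of training samples processed per second."""
    return samples * epochs / (hours * 3600)


frac = trainable_fraction(TRAINABLE_PARAMS_M, TOTAL_PARAMS_M)
throughput = samples_per_second(TRAIN_SAMPLES, EPOCHS, TRAIN_HOURS)

print(f"Trainable fraction: {frac:.2%}")
print(f"Approx. throughput: {throughput:.1f} samples/s")
```

The fraction comes out at roughly 3.45%, consistent with the 3.4% quoted above, and the average throughput is about 14.6 samples per second under the 9-hour assumption.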

📈 Benchmark Results (sagar007/multigemma):
- VQA Accuracy: 53.8%
- Works great for: animal detection, room identification, scene understanding
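
The post does not say how the VQA accuracy was scored. One simple way to score short answers is normalized exact match, sketched below. This is an assumption for illustration, not necessarily the metric behind the 53.8% figure, and the toy predictions are made up.

```python
import string


def normalize(answer: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(answer.lower().translate(table).split())


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return correct / len(predictions)


# Made-up toy data, not taken from any benchmark.
preds = ["A dog.", "kitchen", "two"]
refs = ["a dog", "Kitchen", "three"]

acc = exact_match_accuracy(preds, refs)
print(f"Exact-match accuracy: {acc:.1%}")
```

Standard VQA benchmarks often use a softer score that gives partial credit based on agreement among several human answers, so numbers from the two approaches are not directly comparable.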

🔗 **Try it yourself:**
- 🤗 Model: sagar007/multigemma
- 🎮 Demo: https://huggingface.co/spaces/sagar007/Multimodal-Gemma
- 💻 GitHub: https://github.com/sagar431/multimodal-gemma-270m

Built with PyTorch Lightning + MLflow for experiment tracking. Full MLOps pipeline with CI/CD!

Would love to hear your feedback! 🙏

#multimodal #gemma #clip #llava #vision-language #pytorch

Post

3749

🚀 Just built a Perplexity-inspired AI search assistant using Gradio, DeepSeek, and DuckDuckGo!
Ask it anything, and it’ll:
- Scour the web for answers 📚
- Cite sources like a pro 🔗
- Even talk back with TTS (thanks, Kokoro!) 🎙️

Check it out → sagar007/DeepSeekR1_Search
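
To illustrate the "search, then answer with citations" idea, here is a minimal sketch of how retrieved results could be packed into a prompt with numbered sources. It is not the code behind the Space: the `SearchResult` type, the `build_cited_prompt` helper, and the sample data are all hypothetical, and the actual web search, LLM call, and TTS steps are left out.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    """Hypothetical container for one web search hit."""
    title: str
    url: str
    snippet: str


def build_cited_prompt(question: str, results: list[SearchResult], max_sources: int = 5) -> str:
    """Number the sources so the model can cite them inline as [n]."""
    sources = results[:max_sources]
    lines = [
        f"[{i}] {r.title} ({r.url}): {r.snippet}"
        for i, r in enumerate(sources, start=1)
    ]
    context = "\n".join(lines)
    return (
        "Answer the question using only the sources below. "
        "Cite sources inline as [n].\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )


# Placeholder results; a real app would fill these from a search backend.
results = [
    SearchResult("Example page", "https://example.com/a", "Placeholder snippet one."),
    SearchResult("Another page", "https://example.com/b", "Placeholder snippet two."),
]

prompt = build_cited_prompt("What is LoRA?", results)
print(prompt)
```

Numbering the sources up front makes it easy to map each [n] in the model's answer back to a URL when rendering the response.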
