AI & ML interests

Tools for creating and exploring datasets

Recent Activity

prithivMLmods 
posted an update about 10 hours ago
view post
Post
640
Dropped the HeadshotX : a super-realistic headshot adapter for Qwen/Qwen-Image, an image generation model by Qwen. It is an advanced LoRA adaptation of the Qwen-Image model and an upgraded version of prithivMLmods/Qwen-Image-Studio-Realism, offering more precise portrait rendering with a strong focus on realism. The model was trained on diverse face types from across the world, labeled with florence2-en and caption-optimized using prithivMLmods/DeepCaption-VLA-7B. 11(types) × 5 different face types: Asian, Hispanic, Caucasian, Latina, Middle Eastern, etc.

⮞ Model🤗: prithivMLmods/Qwen-Image-HeadshotX

⮞ The Previous Adapter (LoRA): prithivMLmods/Qwen-Image-Studio-Realism

⮞ Collection: prithivMLmods/qwen-image-exp-lora-68a978fe11400bc3165b0c4d

.
.
.
To know more about it, visit the app page or the respective model page!!
  • 2 replies
·
prithivMLmods 
posted an update 1 day ago
view post
Post
2199
Comparing: DeepCaption-VLA-7B, built on Qwen2.5-VL-7B-Instruct, is tailored for image captioning and vision-language attribution, focusing on precise, descriptive captions of visual properties, object attributes, and scene details. In contrast, Qwen2.5-VL-7B-Abliterated-Caption-it is fine-tuned for abliterated captioning, generating highly detailed descriptions across diverse visual categories.

Models🤗
✦ DeepCaption-VLA-7B : prithivMLmods/DeepCaption-VLA-7B
✦ Qwen2.5-VL-7B-Abliterated-Caption-it : prithivMLmods/Qwen2.5-VL-7B-Abliterated-Caption-it

Spaces⛵
➜ VisionScope-R2 : prithivMLmods/VisionScope-R2
➜ Qwen2.5-VL-Outpost : prithivMLmods/Qwen2.5-VL-Outpost

Collection🗞️
DeepCaption attr. : prithivMLmods/deepcaption-attr-68b041172ebcb867e45c556a
VL Abliterated-Caption : prithivMLmods/vl-abliterated-caption-68a0443b63182e97a15c47a3
Multimodal VLMs - Until July'25 : prithivMLmods/multimodal-vlms-until-july25-688312e6b840e1e156f13027
Multimodal VLMs - Aug'25 : prithivMLmods/multimodal-vlms-until-july25-688312e6b840e1e156f13027

GitHub↗️
> DeepCaption-VLA-7B [4bit-notebook demo] : https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/DeepCaption-VLA-7B%5B4bit%20-%20notebook%20demo%5D/DeepCaption-VLA-7B.ipynb
> Qwen2.5-VL-3B-Abliterated-Caption-it(caption) : https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/Qwen2.5-VL-3B-Abliterated-Caption-it(caption)/Qwen2_5_VL_3B_Abliterated_Caption_it.ipynb

The community GPU grant was given by Hugging Face — special thanks to them. 🤗🚀

To know more about it, visit the app page or the respective model page!!
Tonic 
posted an update 3 days ago
view post
Post
266
🙋🏻‍♂️ Hey there folks ,

Just wanted to annouce 🏭SmolFactory : it's the quickest and best way to finetune SmolLM3 and GPT-OSS-20B on huggingface !

Basicaly it's an app you can run on huggingface by duplicating the space and running your training directly on huggingface GPUs .

It will help you basically select datasets and models, fine tune your model , make an experiment tracker you can use on your mobile phone , push all your model card and even automatically make a demo for you on huggingface so you can directly test it out when it's done !

check out the blog to learn more : https://huggingface.co/blog/Tonic/smolfactory

or just try the app directly :
Tonic/SmolFactory

you can vibe check the cool models I made :
French SmolLM3 : Tonic/Petite-LLM-3
Medical GPT-OSS : Tonic/med-gpt-oss-20b-demo

check out the model cards :
multilingual reasoner (gpt-oss) - Tonic/gpt-oss-20b-multilingual-reasoner
med-gpt-oss : Tonic/med-gpt-oss-20b
petite-elle-l-aime : Tonic/petite-elle-L-aime-3-sft

github repo if you like command line more than gradio : https://github.com/josephrp/smolfactory

drop some likes on these links it's really much appreciated !

feedback and PRs are welcome !
davanstrien 
posted an update 4 days ago
view post
Post
301
I fine-tuned a smol VLM to generate specialized art history metadata!

davanstrien/iconclass-vlm: Qwen2.5-VL-3B trained using SFT to generate ICONCLASS codes (think Dewey Decimal for art!)

Trained with TRL + HF Jobs - single UV script, no GPU needed!

Space to explore predictions on a test set: davanstrien/iconclass-predictions

Blog soon!
prithivMLmods 
posted an update 5 days ago
view post
Post
5356
FastVLMs by Apple are the talk of the week for edge device VLMs and also for consumer-grade VLMs on the Hub. They have some impressive demos available on the Hub for live captioning and inference tasks. Meanwhile, I’m still exploring one of the coolest edge-device multimodal releases—Liquid AI’s LFM2-VL (450M and 1.6B). I’ve also made a live camera video inference demo, which is capable of running on Colab’s free-tier T4 GPU.

🤗Live Captioning Notebooks:
➠ LiquidAI LFM2 VL 1.6B Live Cam: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/LiquidAI-LFM2-VL-Live-Cam/LiquidAI_LFM2_VL_1_6B_Live_Cam.ipynb

➠ LiquidAI LFM2 VL 450M Live Cam: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/LiquidAI-LFM2-VL-Live-Cam/LiquidAI_LFM2_VL_450M_Live_Cam.ipynb

✨I also made a demo for the FastVLM Live Captioning Notebook.
➠ FastVLM 0.5B Live Cam: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/Apple-FastVLM-0.5B-Live-Cam/apple_FastVLM_0_5B_live_cam.ipynb

↗️For more notebooks, kindly visit the following repositories.
➠ Multimodal Outpost Notebooks: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks

Feel free to fork, modify, and explore!
louisbrulenaudet 
posted an update 6 days ago
view post
Post
5764
Supercharge Apple’s Shortcuts using Cloudflare Workers and Gemini within minutes (and for free, up to 1,500 requests per day) ☁️✨

Hello everyone, last week, while experimenting for fun, I created an API that allows you to easily access AI models (in this case, Google's) from the Shortcut app in order to analyze data from my apps and make the most of it thanks to the generative capabilities of advanced models.

It costs me nothing, and I think it might be good to share it so that others can build on it.

In README.md, you will find everything you need to get started and put your own microservice into production, which you can call from the app’s HTTP request features.

You will simply be asked to have a free Cloudflare account and an API key obtained from Google's AI Studio.

Feel free to take a look and get back to me if you encounter any problems during deployment.

Here is the GitHub repo where you can find all the source code and run it on your own: https://github.com/louisbrulenaudet/genai-api
louisbrulenaudet 
posted an update 7 days ago
view post
Post
357
Although more and more code editors are aligning themselves with the AGENTS.md file standard, some still use specific nomenclatures that can make it difficult to maintain different configuration files when several people are working on the same project with different agents.

Bodyboard addresses this by generating canonical instructions for code helpers from a single AGENTS.md file, thereby streamlining the production of adapter outputs for Gemini CLI, Copilot, Cline, Claude, Rules, Windsurf, and OpenAI Codex integrations.

You just have to:
npm install -g bodyboard

Then run, at the root of your project:
bodyboard all

Link to npm: https://www.npmjs.com/package/bodyboard
Link to the GitHub repo: https://github.com/louisbrulenaudet/bodyboard

It's a very simple project, but it addresses certain issues I've encountered, so why not make it available to everyone...

If you have other ideas for adapters to create, feel free to open a PR on the GitHub repo.
prithivMLmods 
posted an update 9 days ago
view post
Post
3392
Introducing prithivMLmods/DeepCaption-VLA-7B, a multimodal VLM designed for reasoning with long-shot captions (Captioning and Vision-Language Attribution). It focuses on defining visual properties, object attributes, and scene details across a wide spectrum of images and aspect ratios, generating attribute-rich image captions. The model supports creative, artistic, and technical applications that require detailed descriptions. 🤗🔥

✦︎ Models: prithivMLmods/DeepCaption-VLA-7B, also includes prithivMLmods/DeepAttriCap-VLA-3B, an experimental model for vision-language attribution.

✦︎ Try the demo here: prithivMLmods/VisionScope-R2

✦︎ Try it now on Google Colab, with support for T4 GPUs in 4-bit quant_type: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/DeepCaption-VLA-7B%5B4bit%20-%20notebook%20demo%5D/DeepCaption-VLA-7B.ipynb

✦︎ Collection: prithivMLmods/deepcaption-attr-68b041172ebcb867e45c556a

.
.
.

To know more about it, visit the model card of the respective model. !!
  • 4 replies
·
prithivMLmods 
posted an update 11 days ago
view post
Post
1224
OpenGVLab's InternVL3.5 is a new family of open-source multimodal models that have advanced versatility, reasoning, and efficiency. I have created 𝐝𝐞𝐦𝐨 𝐧𝐨𝐭𝐞𝐛𝐨𝐨𝐤𝐬 for models ranging from 1B to 4B parameters, available in multiple versions (MPO, Instruct, Pre-trained) and in both "thinking" and "non-thinking" settings, with experimental compatibility for 𝐓𝐞𝐬𝐥𝐚 𝐓𝟒 GPUs.

➠InternVL3_5_2B_MPO_Thinking: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/InternVL-3.5-Notebook/InternVL3.5-Thinking/1_InternVL3_5_2B_MPO_Thinking/1_InternVL3_5_2B_MPO_Thinking.ipynb
➠InternVL3_5_1B_Instruct_Thinking: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/InternVL-3.5-Notebook/InternVL3.5-Thinking/2_InternVL3_5_1B_Instruct_Thinking/2_InternVL3_5_1B_Instruct_Thinking.ipynb

➠InternVL3_5-1B-MPO: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/InternVL-3.5-Notebook/InternVL3_5-MPO/InternVL3_5-1B-MPO/InternVL3_5_1B_MPO.ipynb
➠InternVL3_5-2B-MPO: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/tree/main/InternVL-3.5-Notebook/InternVL3_5-MPO/InternVL3_5-2B-MPO

➠InternVL3_5-1B-Instruct: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/InternVL-3.5-Notebook/InternVL3_5-Instruct/InternVL3_5-1B-Instruct/InternVL3_5_1B_Instruct.ipynb
➠InternVL3_5-2B-Instruct: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/InternVL-3.5-Notebook/InternVL3_5-Instruct/InternVL3_5-2B-Instruct/InternVL3_5_2B_Instruct.ipynb

➠InternVL3_5-1B-Pretrained: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/InternVL-3.5-Notebook/InternVL3_5-Pretrained/InternVL3_5-1B-Pretrained/InternVL3_5_1B_Pretrained.ipynb
➠InternVL3_5-2B-Pretrained: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/InternVL-3.5-Notebook/InternVL3_5-Pretrained/InternVL3_5-2B-Pretrained/InternVL3_5_2B_Pretrained.ipynb

no flash_attention
prithivMLmods 
posted an update 12 days ago
view post
Post
5148
OpenGVLab's InternVL3_5-2B-MPO [Mixed Preference Optimization (MPO)] is a compact vision-language model in the InternVL3.5 series. You can now experience it in the Tiny VLMs Lab, an app featuring 15+ multimodal VLMs ranging from 250M to 4B parameters. These models support tasks such as OCR, reasoning, single-shot answering with small models, and captioning (including ablated variants), across a broad range of visual categories. They are also capable of handling images with complex, sensitive, or nuanced content, while adapting to varying aspect ratios and resolutions.

✨ Space/App : prithivMLmods/Tiny-VLMs-Lab
🫙 Model : OpenGVLab/InternVL3_5-2B-MPO
↗️ Collection: OpenGVLab/internvl35-68ac87bd52ebe953485927fb
🗞️ Paper : https://arxiv.org/pdf/2508.18265
↗️ Multimodal Space Collection : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

To learn more, visit the relevant spaces, collections, and model cards.
  • 2 replies
·
prithivMLmods 
posted an update 13 days ago
view post
Post
449
Dropping new adapters for Qwen-Image, including Qwen-Image-Studio-Realism, Qwen-Image-Anime-LoRA, Qwen-Image-Sketch-Smudge, Qwen-Image-Synthetic-Face, and Qwen-Image-Fragmented-Portraiture, with various style intermix compatibilities. For more details, visit the model card.

⤷ Studio Realism : prithivMLmods/Qwen-Image-Studio-Realism
⤷ Image Anime LoRA : prithivMLmods/Qwen-Image-Anime-LoRA
⤷ Sketch Smudge : prithivMLmods/Qwen-Image-Sketch-Smudge
⤷ Synthetic Face : prithivMLmods/Qwen-Image-Synthetic-Face
⤷ Fragmented Portraiture : prithivMLmods/Qwen-Image-Fragmented-Portraiture

Try it here at
✦︎ Qwen-Image-LoRA-DLC : prithivMLmods/Qwen-Image-LoRA-DLC
✦︎ Qwen-Image-Diffusion : prithivMLmods/Qwen-Image-Diffusion

Collection
✦︎ Qwen-Image-Exp-LoRA : prithivMLmods/qwen-image-exp-lora-68a978fe11400bc3165b0c4d
✦︎ Image Gen Apps (Diffusion) - LastUpdated 08/18 : prithivMLmods/image-gen-apps-diffusion-lastupdated-08-18-68a2f4c5ef3e5e394eacc20a

.
.
.

To know more, visit the following spaces, collections, and model cards.
prithivMLmods 
posted an update 20 days ago
prithivMLmods 
posted an update 22 days ago
view post
Post
4685
Excited to introduce the Tiny VLMs Lab App for experiencing 15+ multimodal VLMs, ranging from a 250M parameter model to a 4B parameter model, for tasks like OCR, reasoning, small models for single-shot answering, and captioning (abliterated), across a broad range of visual categories including images with complex, sensitive, or nuanced content, while handling varying aspect ratios and resolutions.🧪

🤗 Space/App: prithivMLmods/Tiny-VLMs-Lab

✦︎ Also introducing prithivMLmods/Qwen2.5-VL-3B-Abliterated-Caption-it, tailored for Abliterated Captioning / Uncensored Image Captioning. This release comes as a lighter alternative to the existing Qwen2.5-VL-7B-Abliterated-Caption-it prithivMLmods/Qwen2.5-VL-7B-Abliterated-Caption-it model, making it usable on mid-range GPUs and even experimental on T4 GPUs.

✦︎ Collection: prithivMLmods/vl-abliterated-caption-68a0443b63182e97a15c47a3
✦︎ GitHub: https://github.com/PRITHIVSAKTHIUR/Tiny-VLMs-Lab
.
.
.
To know more about it, visit the app page or the respective model page!!
fdaudens 
posted an update 24 days ago
view post
Post
5825
Want to learn to build an AI Agent? I put together a cookbook for creating your own news research agent with OpenAI GPT-OSS:

- Searches headlines & specific sites
- Pulls full articles when you need depth
- Summarizes with clickable sources
- Runs in a simple Gradio chat UI
- No GPU, no local setup — just open-weight GPT-OSS models via Hugging Face

If you’ve been wanting to try agents but weren’t sure where to start, this is an end-to-end example you can fork, run, and adapt.

Full guide + code https://huggingface.co/blog/fdaudens/openai-gpt-oss-agent-inference-providers
  • 2 replies
·
prithivMLmods 
posted an update 25 days ago
view post
Post
3189
Try Liquid AI's all-new multimodal models: LFM2-VL-1.6B & LFM2-VL-450M! Demo with the Gradio UI and ReportLab support and both models are runnable on T4 GPU!

↗ LFM2-VL-1.6B-LiquidAI : https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/LFM2-VL-1.6B-LiquidAI/LFM2-VL-1.6B_ReportLab.ipynb

↗ LFM2-VL-450M-LiquidAI : https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/LFM2-VL-450M-LiquidAI/LFM2-VL-450M_ReportLab.ipynb

.
.
.
To know more about it, visit the multimodal outpost notebooks !!
  • 1 reply
·
fdaudens 
posted an update 26 days ago
view post
Post
509
What can OpenAI’s new open models do with the news? I built a News Agent to find out.

It can answer questions about the news in real time, and every answer comes with original source links so you can dive deeper.

Ask it things like:
- "What are the top news stories today?"
- "What's the latest on artificial intelligence?"
- Follow-up questions on specific stories

Runs with Hugging Face inference providers, letting you compare results from the OpenAI 20B and 120B models

So far, I’m quite impressed by the capabilities of even the smaller 20B model. Definitely not a perfect project, but curious to hear your thoughts!

fdaudens/gpt-oss-news-agent
  • 2 replies
·
fdaudens 
posted an update 27 days ago
view post
Post
3373
OpenAI’s GPT-OSS has sparked ~400 new models on Hugging Face and racked up 5M downloads in less than a week, already outpacing DeepSeek R1’s first-week numbers.

For comparison: when R1 launched, I tracked 550 derivatives (across 8 base models) in a week, with ~3M downloads. GPT-OSS is ahead on adoption and engagement.

It’s also the most-liked release of any major LLM this summer. The 20B and 120B versions quickly shot past Kimi K2, GLM 4.5, and others in likes.

Most-downloaded GPT-OSS models include LM Studio and Unsloth AI versions:
1️⃣ openai/gpt-oss-20b - 2.0M
2️⃣ lmstudio-community/gpt-oss-20b-MLX-8bit - 750K
3️⃣ openai/gpt-oss-120b - 430K
4️⃣ unsloth/gpt-oss-20b-GGUF - 380K
5️⃣ lmstudio-community/gpt-oss-20b-GGUF - 330K

The 20B version is clearly finding its audience, showing the power of smaller, faster, more memory- and energy-efficient models. (These numbers don’t include calls to the models via inference providers, so the real usage is likely even bigger, especially for the 120B version)

Open-weight models let anyone build on top. Empower the builders, and innovation takes off. 🚀
  • 1 reply
·
prithivMLmods 
posted an update 29 days ago
view post
Post
4386
On the verge of releasing Poseidon-Reasoning-5M, a dataset built to excel in general thought processes, mathematics, and science across a diverse mixture of domains, I’m also dropping the Gargantua-R1-Compact dataset, a collection of over six million high-quality reasoning QA pair traces. 🤗🚀

✦ Gargantua-R1-Compact : prithivMLmods/Gargantua-R1-Compact

from datasets import load_dataset

dataset = load_dataset("prithivMLmods/Gargantua-R1-Compact", split="train")

Additionally, I’m adding the mini version of Gargantua — the Gargantua-R1-Wee : prithivMLmods/Gargantua-R1-Wee

from datasets import load_dataset

dataset = load_dataset("prithivMLmods/Gargantua-R1-Wee", split="train")

The composition spans 73.93% core mathematical reasoning involving problems, proofs, and computational challenges, 12.11% across diverse scientific domains such as physics, chemistry, biology, and interdisciplinary topics, 11.35% in competitive coding covering algorithms and data structures, 1.37% in academic science focusing on research-level methodology, 0.95% in creative and analytical reasoning through logic puzzles and problem-solving tasks, 0.25% in specialized technical areas like MLOps, LLMs, diffusion models, and CUDA, and 0.06% involving data from graphs and charts converted into structured JSON formats. Designed with both rich contextual depth and formal structural clarity, Gargantua-R1-Compact is an optimal resource for advancing research in symbolic reasoning, interpretability, and high-precision question answering in mathematical domains.

✦ Collection : prithivMLmods/gargantua-r1-mod-6896bfd7834e82b89ad2b38b


To know more about it, visit the dataset card of the respective dataset. !!