AI & ML interests

Small LMs for small computers

Sri-Vigneshwar-DJ
posted an update about 19 hours ago
Do you think domain-specific embedding fine-tuners are needed?
I've been working with embeddings for marketing use cases and noticed something: most embedding models don't capture marketing concepts well, because they're trained for general-purpose use.
The Issue I'm Seeing
When I search marketing content with general embeddings:

"organic growth" returns farming articles
"conversion funnel" matches industrial equipment
"brand lift" doesn't connect to campaign effectiveness
Marketing jargon like CAC, ROAS, and CTR isn't properly understood

My Question
Do you think domain-specific embeddings are needed for marketing?
Some thoughts:

Marketing has its own vocabulary and concept relationships
General models trained on Wikipedia/web crawl miss these nuances
But is fine-tuning worth the effort vs just using more retrieval tricks?

Quick Example
I fine-tuned all-mpnet-base-v2 on ~1000 marketing concept pairs and saw 15-20% better retrieval accuracy (rough sketch below). But I'm curious:

Has anyone else tried this for marketing or other domains?
When do you think domain-specific embeddings are actually necessary vs overkill?
Are there better approaches I'm missing?
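For anyone who wants to try it, here's a minimal sketch of that kind of fine-tune with sentence-transformers. The example pairs and the loss choice (MultipleNegativesRankingLoss) are illustrative, not my exact setup:

```python
# Minimal sketch: fine-tuning all-mpnet-base-v2 on marketing concept pairs.
# The pairs and the loss choice are illustrative assumptions, not the exact recipe.
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# Each pair maps a marketing phrase to an in-domain paraphrase/definition.
train_examples = [
    InputExample(texts=["organic growth", "audience growth without paid acquisition"]),
    InputExample(texts=["conversion funnel", "stages from awareness to purchase"]),
    InputExample(texts=["ROAS", "return on ad spend"]),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.MultipleNegativesRankingLoss(model)  # in-batch negatives

# ~1000 pairs go a long way with a retrieval-style objective.
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=100,
)
model.save("mpnet-marketing-embeddings")
```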

https://huggingface.co/blog/Sri-Vigneshwar-DJ/why-your-marketing-rag-system-needs-domain-specifi
Nymbo
posted an update 3 days ago
I have a few Sora-2 invites - 15509N
Sri-Vigneshwar-DJ
posted an update 3 days ago
🚀 Exciting News! We've released a Performance Marketing Expert Dataset from Hawky.ai [www.hawky.ai].


This dataset empowers AI models with cutting-edge strategies for Meta, Google Ads, and TikTok campaigns. It includes:
1. Multi-platform strategies for e-commerce, DTC, B2B, and more
2. Creative optimization and audience targeting insights
3. ROI-driven recommendations based on 2025 best practices

Sri-Vigneshwar-DJ/Performance-Marketing-Data
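If you want to explore it, a quick way to load it with the datasets library is below; the split name is an assumption, so check the dataset card for the actual configuration.

```python
# Minimal sketch: loading the dataset with the `datasets` library.
# The "train" split is an assumption; check the dataset card for the actual layout.
from datasets import load_dataset

ds = load_dataset("Sri-Vigneshwar-DJ/Performance-Marketing-Data", split="train")
print(ds)      # features and number of rows
print(ds[0])   # inspect one record
```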
prithivMLmods
posted an update 4 days ago
Try the Hugging Face Space demo for Logics-MLLM/Logics-Parsing, the latest multimodal VLM from the Logics Team at Alibaba Group. It enables end-to-end document parsing with precise content extraction in markdown format, and it also generates a clean HTML representation of the document while preserving its logical structure. 🤗🔥

Additionally, I've integrated one of my recent works, prithivMLmods/Gliese-OCR-7B-Post1.0, which also excels at document comprehension.

โญ Space / App : prithivMLmods/VLM-Parsing
๐Ÿ“„ Technical Report by the Logics Team, Alibaba Group : Logics-Parsing Technical Report (2509.19760)
๐Ÿ–– MM: VLM-Parsing: prithivMLmods/mm-vlm-parsing-68e33e52bfb9ae60b50602dc
โšก Collections : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0
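If you'd rather call the Space from code, a rough sketch with gradio_client is below. The endpoint name and argument shape are assumptions, so run client.view_api() first to see the Space's actual signature.

```python
# Minimal sketch: calling the VLM-Parsing Space programmatically with gradio_client.
# The api_name and argument below are assumptions; view_api() prints the real endpoints.
from gradio_client import Client, handle_file

client = Client("prithivMLmods/VLM-Parsing")
client.view_api()  # inspect the Space's actual endpoints and parameters first

# Hypothetical call shape: a document image in, parsed markdown/HTML out.
result = client.predict(
    handle_file("invoice_page.png"),
    api_name="/predict",
)
print(result)
```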

Other Pages:

➔ Multimodal VLMs - July'25 : prithivMLmods/multimodal-vlms-until-july25-688312e6b840e1e156f13027
➔ Multimodal VLMs - Aug'25 : prithivMLmods/multimodal-vlms-aug25-68a56aac39fe8084f3c168bd
➔ VL caption - < Sep 15 '25 : prithivMLmods/vl-caption-sep-15-25-68c7f6d737985c63c13e2391

.
.
.
To know more about it, visit the app page or the respective model page!!
Sri-Vigneshwar-DJ
posted an update 6 days ago
🚀 Qwen3-Omni for Marketing: A Game-Changer

Just wanted to share something exciting I've been exploring: Qwen3-Omni and how it's transforming marketing workflows.

What makes it special? At Hawky.ai we recently started experimenting with Qwen3-Omni for analysis and optimization.

Unlike traditional tools that look at text, images, or audio separately, Qwen3-Omni analyzes everything together. It handles 119 languages, processes 40-minute audio sequences, and understands both images and videos, all at once.

The cool part? It's 2-3x faster than similar models thanks to its MoE architecture.

Real applications I'm seeing:
Ad Analysis: It scores video ads by combining visual elements, audio tone, and text, giving 25% better CTR predictions than single-mode tools.
Campaign Localization: Drop in one ad, get 10 localized versions with native voiceovers in under a minute. Perfect for testing across markets.

Market Research: Feed it competitor content, podcasts, or UGC videos. It extracts actionable insights like "3-second hooks boost retention by 15%" and saves about 70% of analysis time.

Quality Checks: Automatically catches lip-sync errors and audio-visual mismatches.
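For a concrete feel, here's a rough sketch of querying a Qwen3-Omni deployment through an OpenAI-compatible endpoint (e.g. a local vLLM server). The base URL, checkpoint name, and the single-frame image input are assumptions for illustration, not our actual Hawky.ai pipeline:

```python
# Minimal sketch: scoring an ad frame with Qwen3-Omni via an OpenAI-compatible API.
# base_url, model name, and the image-only input are assumptions, not a production setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="Qwen/Qwen3-Omni-30B-A3B-Instruct",  # assumed checkpoint name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Score this ad frame for hook strength, clarity, and brand visibility (0-10 each)."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/ad_frame.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```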

Full technical breakdown: https://huggingface.co/blog/Sri-Vigneshwar-DJ/hawky-aiqwen3-omni-advanced-architecture-and-marke

Has anyone else been experimenting with multimodal models for marketing? Would love to hear what you're building!

#MultimodalAI #MarTech #OpenSource
prithivMLmods
posted an update 7 days ago
Try Banana Zoom, an advanced image-enhancement web app that lets users select regions of an image for AI-powered upscaling and detail refinement. Using Google's Nano Banana, it analyzes selections, generates context-aware enhancements, and produces high-resolution outputs. Simply drag-and-drop or upload images, make precise or fixed-size selections, and watch improvements in real time with smooth zoom and pixel-dissolve effects.

Space / App: prithivMLmods/Banana-Zoom
Collection: https://huggingface.co/collections/prithivMLmods/image-gen-apps-diffusion-lastupdated-09-23-68a2f4c5ef3e5e394eacc20a
GitHub: https://github.com/prithivsakthiur/banana-zoom

Your API key is automatically destroyed once you refresh the app or exit it, so each user's key is cycled in this way.
prithivMLmods
posted an update 13 days ago
Photo-Mate-i2i: a space for experimenting with adapters for image manipulation using Kontext adapters, including Photo-Restore-i2i, PhotoCleanser-i2i, Polaroid-Warm-i2i, Yarn-Photo-i2i, Monochrome-Pencil, and more. Try out the demo, and to learn more, visit the app page or the respective model pages!

⚡ Demo: prithivMLmods/Photo-Mate-i2i
⚙️ How to Use: prithivMLmods/Photo-Mate-i2i#2
👨‍🔧 i2i-Kontext (Experimental LoRAs): prithivMLmods/i2i-kontext-exp-68ce573b5c0623476b636ec7

KnutJaegersberg
posted an update 14 days ago
The Formative Mind: Theories of Consciousness as Practice

Instead of treating consciousness as a passive byproduct of a powerful unconscious engine, think of it as the engine itself: a process that builds rich representations (self-organizing), predicts and models its own processing (metarepresentation), and thereby brings an agent and its world into being (individuation). A brief synthesis.


https://huggingface.co/blog/KnutJaegersberg/formative-mind
prithivMLmods
posted an update 15 days ago
Dropping some experimental adapters for FLUX.1-Kontext-dev, including Photo-Restore-i2i, PhotoCleanser-i2i, Polaroid-Warm-i2i, Yarn-Photo-i2i, and Monochrome-Pencil. These were trained under various settings with minimal image pairs to achieve optimal results. The end-image pairs for the datasets were synthesized using Gemini-2.5-Flash-Image-Preview and other models. 🤗✨

prithivMLmods/PhotoCleanser-i2i: Remove objects while preserving the rest of the image.
prithivMLmods/Photo-Restore-i2i: Restore old photos into moderately colorized, detailed images.
prithivMLmods/Polaroid-Warm-i2i: Seamless vintage Polaroid-style images with warm, faded tones.
prithivMLmods/Yarn-Photo-i2i: Convert images into yarn-stitched artwork while retaining key details.
prithivMLmods/Monochrome-Pencil: Turn images into monochrome pencil sketches while keeping original features.

✨ Note: All the above models share the same auto-labeling multimodal VLM captioning model, prithivMLmods/DeepCaption-VLA-7B, which is used to refine edit instructions and accurately capture attributes for the generations.

✨ Collection: prithivMLmods/i2i-kontext-exp-68ce573b5c0623476b636ec7
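A minimal usage sketch with diffusers is below; it assumes a diffusers build with FluxKontextPipeline support and a standard LoRA layout in the adapter repos, and the prompt/settings are illustrative only.

```python
# Minimal sketch: applying one of these experimental Kontext adapters with diffusers.
# FluxKontextPipeline support and the LoRA layout are assumptions; prompt is illustrative.
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("prithivMLmods/Photo-Restore-i2i")

image = load_image("old_family_photo.jpg")
restored = pipe(
    image=image,
    prompt="restore this old photo into a moderately colorized, detailed image",
    guidance_scale=2.5,
).images[0]
restored.save("restored.png")
```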

.
.
.
To know more about it, visit the app page or the respective model page!!
Nymbo
posted an update 19 days ago
There's now a custom Deep_Research tool in my Nymbo/Tools MCP server! TL;DR: The agent using the tools writes a summary of your request and up to five DuckDuckGo searches (up to 50 results). Each webpage found in the searches is then fetched and given to our researcher (Qwen3-235B-A22B-Thinking-2507). The researcher sees the summary, the searched queries, and the fetched links, then writes a thorough research report. The agent using the tool provides the user with a summary of the report and a link to download research_report.txt. The researcher's instructions are similar to some leaked Perplexity sys prompts.

# Deep_Research Tool

It accomplishes everything in under a minute so it doesn't hit MCP's 60-second timeout, mostly thanks to Cerebras. The only thing required to make this work is an HF_READ_TOKEN for inference.

The Deep_Research tool could certainly be improved. It still needs some sort of mechanism for ranking URLs by importance (I've got some ideas, but I don't want it to be the responsibility of the agent using the tool). I'll probably add a second researcher to filter out bad sources before calling the big researcher. I'm hell-bent on keeping this all within the scope of one tool call.
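For a sense of the overall shape, here's a rough, simplified sketch of the search-fetch-report flow. The library choices (duckduckgo_search, requests, huggingface_hub) and the prompt are assumptions for illustration; the actual tool does more (ranking, truncation, the downloadable report).

```python
# Simplified sketch of a Deep_Research-style flow: a few DuckDuckGo searches,
# fetch the hits, then ask a "researcher" model for a report.
import requests
from duckduckgo_search import DDGS
from huggingface_hub import InferenceClient

queries = ["domain-specific embeddings", "fine-tuning sentence transformers"]
hits = []
with DDGS() as ddgs:
    for q in queries[:5]:                      # at most five searches
        hits += list(ddgs.text(q, max_results=10))

pages = []
for h in hits[:20]:                            # fetch a subset of the result URLs
    try:
        pages.append(requests.get(h["href"], timeout=10).text[:5000])
    except requests.RequestException:
        continue

client = InferenceClient("Qwen/Qwen3-235B-A22B-Thinking-2507")
report = client.chat_completion(
    messages=[{"role": "user",
               "content": "Write a thorough research report from these sources:\n\n" + "\n\n".join(pages)}],
    max_tokens=2000,
)
print(report.choices[0].message.content)
```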

# More Fetch/Web Search Improvements

The Search_DuckDuckGo tool has been further enhanced. It now allows the agent to browse through all pages of results, and the results now include the published date (if detected). It also now supports every DDG search type! The default DDG search is called text, but it can also now search by news, images, videos, and books.

The Fetch_Webpage tool now specifies how much of the page has been truncated, plus a cursor index, allowing it to pick up where it left off without re-consuming tokens. The model can now also choose to strip CSS selectors to remove excess noise, and there's a new URL Scraper mode that returns only the URLs found on the full page.

More to come soon ~
prithivMLmods
posted an update 19 days ago
Many of you pinged me asking to make nano-banana-aio available on hf.co/spaces, so I've ported the app's tech stack to make it compatible for deployment on Spaces. (It can be accessed with your own Gemini API key.) 🤗⭐️

✦ Yes, it is now available on Spaces: prithivMLmods/Nano-Banana-AIO

The Nano Banana AIO (All-in-One) app offers seamless image-manipulation features, including single/multiple image adaptation, a canvas for free-style drawing-to-image generation, and standard text-to-image generation.

All in One Banana for you! 😉
Tonic
posted an update 20 days ago
COMPUTER CONTROL IS ON-DEVICE!

🏡🤖 78% of EU smart-home owners DON'T trust cloud voice assistants.

So we killed the cloud.

Meet Exté: a palm-sized Android device that sees, hears & speaks your language - 100% offline, 0% data sent anywhere.

🔓 We submitted our technologies for consideration to the Liquid AI hackathon.

📊 Dataset: 79k UI-action pairs on Hugging Face (largest Android-control corpus ever): Tonic/android-operator-episodes

⚡ Model: 98% task accuracy, 678 MB compressed, fits on existing Android devices! Tonic/l-android-control

🛤️ Experiment Tracker: check out the training on our TrackioApp: Tonic/l-android-control

🎮 Live Model Demo: Upload an Android screenshot and instructions to see the model in action! Tonic/l-operator-demo



Built in a garage, funded by pre-orders, no VC. Now we're scaling to 1k installer units.

We're giving 50 limited-edition prototypes to investors, installers & researchers who want to co-design the sovereign smart home.

👇 Drop "EUSKERA" in the comments if you want an invite, tag a friend who still thinks Alexa is "convenient," and smash ♥️ if AI should belong to people - not servers.
prithivMLmods
posted an update 20 days ago
I'm a Hugging Face Fellow now, guys! 🤗❤️

With the same passion, trust, and momentum to contribute to the community, I'm excited to do some amazing things to wrap up Q3 and Q4 of 2025. Importantly, I've been lucky enough to receive knowledge and guidance from @merve on building open-source demos. Thank you for the belief.

Thank you - much love.
Long live open source!

- Prithiv
prithivMLmods
posted an update 23 days ago
Introducing Gliese-OCR-7B-Post1.0, a document content-structure retrieval VLM designed for content extraction (OCR) and summarization. This is the third model in the Camel Doc OCR VLM series, following Camel-Doc-OCR-062825. The new version fixes formal table-reconstruction issues in both English and Chinese, achieving strong performance for long-context inference. This model also shows significant improvements in LaTeX and Markdown rendering for OCR tasks.

🤗 Gliese-OCR-7B-Post1.0: prithivMLmods/Gliese-OCR-7B-Post1.0
📌 Gliese-Post1.0 Collection: prithivMLmods/gliese-post10-68c52c4a6ca4935f5259a6d7
⬅️ Previous Versions: prithivMLmods/Camel-Doc-OCR-062825
🧨 Gliese-OCR-7B-Post1.0 (4-bit) Notebook Demo on T4: prithivMLmods/Gliese-OCR-7B-Post1.0
📖 GitHub [Gliese-OCR-7B-Post1.0(4-bit)-reportlab]: https://tinyurl.com/ys7zuerc
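A rough inference sketch with transformers is below; it assumes the model exposes the standard image-text-to-text chat interface, so check the model card for the official loading snippet and recommended prompt.

```python
# Minimal sketch: document OCR with Gliese-OCR-7B-Post1.0 via transformers.
# The AutoModelForImageTextToText interface and the prompt are assumptions;
# see the model card for the official usage example.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "prithivMLmods/Gliese-OCR-7B-Post1.0"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("scanned_page.png")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Extract the document content as Markdown."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=1024)
print(processor.decode(output[0], skip_special_tokens=True))
```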

Other Collections:

➔ Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0
➔ Multimodal VLMs - Aug'25 : prithivMLmods/multimodal-vlms-aug25-68a56aac39fe8084f3c168bd
➔ Multimodal VLMs - July'25 : prithivMLmods/multimodal-vlms-until-july25-688312e6b840e1e156f13027

.
.
.
To know more about it, visit the app page or the respective model page!!
prithivMLmods
posted an update 26 days ago
POINTS-Reader is a powerful, distillation-free vision-language model for end-to-end document conversion that sets new SoTA benchmarks. The demo is now available on HF (Extraction, Preview, Documentation). The input consists of a fixed prompt and a document image, and the output is a single string: the text extracted from the document image. 🔥🤗

✦ Space/App: prithivMLmods/POINTS-Reader-OCR
✦ Model: tencent/POINTS-Reader
✦ Paper: https://arxiv.org/pdf/2509.01215

🤗 The app is done and ready to go brrrr with zero GPU. Thank you @merve

.
.
.
To know more about it, visit the app page or the respective model page!!
prithivMLmods
posted an update 27 days ago
Build something cool with Nano Banana, aka Gemini 2.5 Flash Image, AIO [All-in-One]. Draw and transform on canvas, edit images, and generate images, all in one place!

✦ Constructed with the Gemini API (GCP). Try it here: prithivMLmods/Nano-Banana-AIO (Added the Space recently! - Sep 18 '25)
Abhaykoul
posted an update 27 days ago
🚀 Ever dreamed of training your own Large Language Model from scratch? What if I told you it doesn't require a supercomputer or a PhD in ML? 🤯

Introducing LLM Trainer - the educational framework that makes LLM training accessible to EVERYONE! Whether you're on a CPU-only laptop or scaling to distributed GPUs, we've got you covered. 💻➡️🖥️

Why LLM Trainer? Because existing tools are either too simplistic (hiding the magic) or too complex (requiring expert knowledge). We bridge the gap with:

🎓 Educational transparency - every component built from scratch with clear code
💻 CPU-first approach - start training immediately, no GPU needed
🔧 Full customization - modify anything you want
📈 Seamless scaling - from laptop to cluster without code changes
🤝 HuggingFace integration - works with existing models & tokenizers

Key highlights:
✅ Built-in tokenizers (BPE, WordPiece, HF wrappers)
✅ Complete Transformer implementation from scratch
✅ Optimized for CPU training (toy illustration below)
✅ Advanced features: mixed precision, gradient checkpointing, multiple generation strategies
✅ Comprehensive monitoring & metrics
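As a generic illustration of the CPU-first, mixed-precision idea (plain PyTorch, not LLM Trainer's actual API), a single training step on CPU can look like this:

```python
# Toy CPU training step with bfloat16 autocast - plain PyTorch, not LLM Trainer's API.
import torch
import torch.nn as nn

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True), num_layers=2
)
head = nn.Linear(128, 1000)                      # toy vocabulary of 1000 tokens
optimizer = torch.optim.AdamW(list(encoder.parameters()) + list(head.parameters()), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 64, 128)                      # fake embedded batch: (batch, seq, dim)
targets = torch.randint(0, 1000, (8, 64))        # fake next-token targets

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):   # CPU mixed precision
    logits = head(encoder(x))                    # (batch, seq, vocab)
    loss = loss_fn(logits.reshape(-1, 1000), targets.reshape(-1))

loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"toy step done, loss={loss.item():.3f}")
```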

Perfect for:
- Students learning transformers
- Researchers prototyping new ideas
- Developers building domain-specific models

Ready to train your first LLM? It's easier than you think!

🔗 Check it out: https://github.com/HelpingAI/llm-trainer
📚 Docs: Getting Started Guide
💬 Join the community: GitHub Discussions

#AI #MachineLearning #LLM #DeepLearning #OpenSource #Python #HuggingFace #NLP

Special thanks to HuggingFace and PyTorch teams for the amazing ecosystem! 🙏
prithivMLmods
posted an update 30 days ago
Dropped HeadshotX: a super-realistic headshot adapter for Qwen/Qwen-Image, an image generation model by Qwen. It is an advanced LoRA adaptation of the Qwen-Image model and an upgraded version of prithivMLmods/Qwen-Image-Studio-Realism, offering more precise portrait rendering with a strong focus on realism. The model was trained on diverse face types from across the world, labeled with florence2-en and caption-optimized using prithivMLmods/DeepCaption-VLA-7B. 11 types × 5 different face types: Asian, Hispanic, Caucasian, Latina, Middle Eastern, etc.

⮞ Model 🤗: prithivMLmods/Qwen-Image-HeadshotX

⮞ The Previous Adapter (LoRA): prithivMLmods/Qwen-Image-Studio-Realism

⮞ Collection: prithivMLmods/qwen-image-exp-lora-68a978fe11400bc3165b0c4d
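A minimal loading sketch with diffusers is below; it assumes a diffusers build with Qwen-Image pipeline support and a standard LoRA layout in the adapter repo, and the prompt/settings are illustrative.

```python
# Minimal sketch: HeadshotX LoRA on top of Qwen-Image with diffusers.
# Qwen-Image pipeline support and the LoRA layout are assumptions; prompt is illustrative.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16).to("cuda")
pipe.load_lora_weights("prithivMLmods/Qwen-Image-HeadshotX")

image = pipe(
    prompt="studio headshot of a person, soft key light, shallow depth of field, photorealistic",
    num_inference_steps=40,
).images[0]
image.save("headshot.png")
```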

.
.
.
To know more about it, visit the app page or the respective model page!!
Nymbo
posted an update 30 days ago
I have a few updates to my MCP server I wanna share: a new Memory tool, plus improvements to web search & speech generation.

# Memory_Manager Tool

We now have a Memory_Manager tool. Ask ChatGPT to write all its memories verbatim, then tell gpt-oss-20b to save each one using the tool, then take them anywhere! It stores memories in a memories.json file in the repo, no external database required.

The Memory_Manager tool is currently hidden from the HF space because it's intended for local use. It's enabled by providing an HF_READ_TOKEN in the env secrets, although it doesn't actually use the key for anything. There's probably a cleaner way of ensuring memory is only used locally; I'll come back to this.
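For the curious, the flat-file idea looks roughly like this; it's an illustrative sketch of a memories.json store, not the actual Nymbo/Tools implementation.

```python
# Illustrative sketch of a memories.json flat-file store (not the actual tool's code).
import json
from datetime import datetime, timezone
from pathlib import Path

MEMORY_FILE = Path("memories.json")

def save_memory(text: str) -> None:
    """Append one memory entry to the local JSON file."""
    memories = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    memories.append({"text": text, "saved_at": datetime.now(timezone.utc).isoformat()})
    MEMORY_FILE.write_text(json.dumps(memories, indent=2))

def load_memories() -> list:
    """Return all stored memories (empty list if none saved yet)."""
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []

save_memory("User prefers concise answers.")
print(load_memories())
```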

# Fetch & Websearch

The Fetch_Webpage tool has been simplified a lot. It now converts the page to Markdown and returns the page with three length settings (Brief, Standard, Full). This is a lot more reliable than the old custom extraction method.

The Search_DuckDuckGo tool has a few small improvements. The input is easier for small models to get right, and the output is more readable.

# Speech Generation

I've added the remaining voices for Kokoro-82M; it now supports all 54 voices with all accents/languages.

I also removed the 30-second cap by making sure it computes all chunks in sequence, not just the first. I've tested it on outputs that are ~10 minutes long. Do note that when used as an MCP server, the tool will time out after 1 minute; nothing I can do about that right now.

# Other Thoughts

Lots of MCP use cases involve manipulating media (image editing, ASR, etc.). I've avoided adding tools like this so far for two reasons:

1. Most of these solutions would require assigning it a ZeroGPU slot.
2. The current process of uploading files like images to a Gradio space is still a bit rough. It's doable but requires additional tools.

Both of these points make it a bit painful for local usage. I'm open to suggestions for other tools that rely on text.