Do you think domain-specific embedding fine-tuners are needed? I've been working with embeddings for marketing use cases and noticed something: most embeddings don't get marketing concepts very well. They're trained in general-purpose ways.

# The Issue I'm Seeing

When I search marketing content with general embeddings, the results tend to miss marketing-specific relationships between concepts.
# My Question

Do you think domain-specific embeddings are needed for marketing? Some thoughts:
- Marketing has its own vocabulary and concept relationships
- General models trained on Wikipedia/web crawl miss these nuances
- But is fine-tuning worth the effort vs just using more retrieval tricks?
# Quick Example

I fine-tuned all-mpnet-base-v2 on ~1000 marketing concept pairs and saw 15-20% better retrieval accuracy (a rough sketch of the setup is below). But I'm curious:
- Has anyone else tried this for marketing or other domains?
- When do you think domain-specific embeddings are actually necessary vs overkill?
- Are there better approaches I'm missing?
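For context, here's a minimal sketch of the kind of fine-tuning I mean. My exact pairs and loss aren't shown in this post; this assumes sentence-transformers with in-batch negatives (MultipleNegativesRankingLoss), and the example pairs are made up:

```python
# Hedged sketch: pair-based fine-tuning of all-mpnet-base-v2 with sentence-transformers.
# Assumes positive (concept, definition/related concept) pairs and in-batch negatives.
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("all-mpnet-base-v2")

# Illustrative marketing concept pairs; replace with your ~1000 real pairs.
pairs = [
    ("CAC", "customer acquisition cost"),
    ("ROAS", "return on ad spend"),
    ("lookalike audience", "audience modeled on existing customers"),
]
train_examples = [InputExample(texts=[a, b]) for a, b in pairs]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# In-batch negatives work well when you only have positive pairs.
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=100,
)
model.save("all-mpnet-base-v2-marketing")
```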
Exciting News! We've released a Performance Marketing Expert Dataset from Hawky.ai [www.hawky.ai] (Hawky-ai).
This dataset empowers AI models with cutting-edge strategies for Meta, Google Ads, and TikTok campaigns. It includes:
1. Multi-platform strategies for e-commerce, DTC, B2B, and more
2. Creative optimization and audience targeting insights
3. ROI-driven recommendations based on 2025 best practices
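If it's published as a Hugging Face dataset, loading it should be a one-liner. The repo id below is a placeholder, not the real path; check the Hawky-ai org page for the actual name:

```python
# Minimal sketch using the `datasets` library; the repo id is hypothetical.
from datasets import load_dataset

ds = load_dataset("Hawky-ai/performance-marketing-expert")  # placeholder repo id
print(ds)
print(ds["train"][0])
```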
Try the Hugging Face Space demo for Logics-MLLM/Logics-Parsing, the latest multimodal VLM from the Logics Team at Alibaba Group. It enables end-to-end document parsing with precise content extraction in Markdown format, and it also generates a clean HTML representation of the document while preserving its logical structure.
Additionally, I've integrated one of my recent works, prithivMLmods/Gliese-OCR-7B-Post1.0, which also excels at document comprehension.
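If you want to try either model outside the Space, a rough local sketch assuming they plug into the standard transformers image-text-to-text pipeline (the model cards may recommend model-specific code, and the prompt below is just an example):

```python
# Hedged sketch: document parsing with the transformers image-text-to-text pipeline.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Logics-MLLM/Logics-Parsing")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/page.png"},  # your document page
        {"type": "text", "text": "Parse this document page into Markdown."},
    ],
}]

out = pipe(text=messages, max_new_tokens=1024, return_full_text=False)
print(out[0]["generated_text"])  # output structure may vary slightly by transformers version
```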
Just wanted to share something exciting I've been exploring: Qwen3-Omni and how it's transforming marketing workflows.
At Hawky.ai we recently started experimenting with Qwen3 for analysis and optimization. What makes it special?
Unlike traditional tools that look at text, images, or audio separately, Qwen3-Omni analyzes everything together. It handles 119 languages, processes 40-minute audio sequences, and understands both images and videos, all at once.
The cool part? It's 2-3x faster than similar models thanks to its MoE architecture.
Real applications I'm seeing:
- Ad Analysis: it scores video ads by combining visual elements, audio tone, and text, giving 25% better CTR predictions than single-mode tools.
- Campaign Localization: drop in one ad, get 10 localized versions with native voiceovers in under a minute. Perfect for testing across markets.
- Market Research: feed it competitor content, podcasts, or UGC videos. It extracts actionable insights like "3-second hooks boost retention by 15%" and saves about 70% of analysis time.
- Quality Checks: automatically catches lip-sync errors and audio-visual mismatches.
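To make the ad-analysis idea concrete, here's a hedged sketch of calling Qwen3-Omni through an OpenAI-compatible endpoint (e.g. a vLLM or DashScope deployment). The base_url, model name, and scoring prompt are assumptions, not our production pipeline:

```python
# Hedged sketch: multimodal ad scoring via an OpenAI-compatible Qwen3-Omni endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local vLLM-style server
    api_key="EMPTY",
)

messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Score this ad (0-100) for hook strength, audio tone, "
                                 "and message clarity, then explain briefly."},
        {"type": "image_url", "image_url": {"url": "https://example.com/ad_keyframe.jpg"}},
        # Audio/video parts depend on the serving stack; many servers accept base64
        # "input_audio" parts, e.g.:
        # {"type": "input_audio", "input_audio": {"data": b64_wav, "format": "wav"}},
    ],
}]

resp = client.chat.completions.create(model="qwen3-omni", messages=messages)
print(resp.choices[0].message.content)
```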
Try Banana Zoom, an advanced image-enhancement web app that lets users select regions of an image for AI-powered upscaling and detail refinement. Using Google's Gemini 2.5 Flash Image ("nano banana"), it analyzes selections, generates context-aware enhancements, and produces high-resolution outputs. Simply drag-and-drop or upload images, make precise or fixed-size selections, and watch improvements in real time with smooth zoom and pixel-dissolve effects.
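For a sense of the kind of call such an app could make under the hood (this is a guess at the approach, not Banana Zoom's actual code), here's a sketch using the google-genai SDK: crop the selected region and ask Gemini 2.5 Flash Image to re-render it with finer detail.

```python
# Hedged sketch: region enhancement with Gemini 2.5 Flash Image via google-genai.
from google import genai
from PIL import Image

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

image = Image.open("photo.jpg")
region = image.crop((400, 300, 900, 800))  # user-selected box: (left, top, right, bottom)

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=[region, "Upscale this crop and refine fine details; keep the content faithful."],
)

# Save the first image part returned by the model.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("region_enhanced.png", "wb") as f:
            f.write(part.inline_data.data)
        break
```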
Photo-Mate-i2i: a space for experimenting with adapters for image manipulation using Kontext adapters, including Photo-Restore-i2i, PhotoCleanser-i2i, Polaroid-Warm-i2i, Yarn-Photo-i2i, Monochrome-Pencil, and more. Try out the demo, and to learn more, visit the app page or the respective model pages!
The Formative Mind: Theories of Consciousness as Practice
Instead of treating consciousness as a passive byproduct of a powerful unconscious engine, think of it as the engine itself: a process that builds rich representations (self-organizing), predicts and models its own processing (metarepresentation), and thereby brings an agent and its world into being (individuation). A brief synthesis.
Dropping some experimental adapters for FLUX.1-Kontext-dev, including Photo-Restore-i2i, PhotoCleanser-i2i, Polaroid-Warm-i2i, Yarn-Photo-i2i, and Monochrome-Pencil. These were trained under various settings with minimal image pairs to achieve optimal results. The target (end) pairs in the datasets were synthesized using Gemini-2.5-Flash-Image-Preview and other models.
Note: All the above models share the same auto-labeling multimodal VLM captioning model, prithivMLmods/DeepCaption-VLA-7B, which is used for refining edit instructions and accurately understanding attributions for the generations.
There's now a custom Deep_Research tool in my Nymbo/Tools MCP server! TL;DR: The agent using the tools writes a summary of your requests and up to five DuckDuckGo searches (up to 50 results). Each of the webpages found in the searches is then fetched and given to our researcher (Qwen3-235B-A22B-Thinking-2507). The researcher sees the summary, searched queries, and fetched links, then writes a thorough research report. The agent using the tool provides the user with a summary of the report and a link to download research_report.txt. The researcher's instructions are similar to some leaked Perplexity sys prompts.
# Deep_Research Tool
It accomplishes everything in under a minute so it doesn't hit MCP's 60-second timeout, mostly thanks to Cerebras. The only thing required to make this work is an HF_READ_TOKEN for inference.
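For anyone curious about the shape of the flow, here's a simplified sketch of the same idea (not the actual Nymbo/Tools source): run the agent's queries through DuckDuckGo, fetch the pages, then hand everything to the researcher model in one call.

```python
# Hedged sketch of a search -> fetch -> single researcher call pipeline.
import requests
from duckduckgo_search import DDGS  # newer releases ship under the "ddgs" package name
from huggingface_hub import InferenceClient

def deep_research(summary: str, queries: list[str], hf_token: str) -> str:
    results = []
    with DDGS() as ddg:
        for q in queries[:5]:                       # up to five searches
            results += ddg.text(q, max_results=10)  # roughly 50 results total

    pages = []
    for r in results:
        try:
            pages.append(requests.get(r["href"], timeout=10).text[:5000])
        except requests.RequestException:
            continue

    researcher = InferenceClient(
        model="Qwen/Qwen3-235B-A22B-Thinking-2507",
        provider="cerebras",  # assumed; Cerebras is what keeps this under a minute
        token=hf_token,
    )
    prompt = (f"Request summary:\n{summary}\n\nQueries: {queries}\n\nSources:\n"
              + "\n---\n".join(pages))
    out = researcher.chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=4096,
    )
    return out.choices[0].message.content
```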
The Deep_Research tool could certainly be improved. It still needs some sort of mechanism for sorting URLs by importance (I've got some ideas, but I don't want it to be the responsibility of the agent using the tool). I'll probably add a second researcher to filter out the bad sources before running inference with the big researcher. I'm hellbent on keeping this all within the scope of one tool call.
# More Fetch/Web Search Improvements
The Search_DuckDuckGo tool has been further enhanced. It now allows the agent to browse through all pages of results. The results also now include the published date (if detected). It also now supports every DDG search type! The default DDG search is called text, but it can now also search by news, images, videos, and books.
The Fetch_Webpage tool now specifies how much of the page has been truncated and the cursor index, allowing it to pick up where it left off without re-consuming tokens. The model can now also choose to strip CSS selectors to remove excess noise, and there's a new URL Scraper mode that only returns URLs found on the full page.
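For reference, the underlying duckduckgo_search calls look roughly like this; the tool's own parameter names differ, this just illustrates the search types mentioned above.

```python
# Sketch of the DDG search types exposed by the duckduckgo_search package.
from duckduckgo_search import DDGS

with DDGS() as ddg:
    text_hits = ddg.text("sentence embeddings", max_results=10)
    news_hits = ddg.news("open source LLMs", max_results=10)
    image_hits = ddg.images("hugging face logo", max_results=10)
    video_hits = ddg.videos("transformer architecture explained", max_results=10)
    # Book search is newer and may not exist in older package versions.

for hit in news_hits:
    print(hit.get("date"), hit["title"], hit["url"])  # news results include a published date
```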
Many of 'em pinged me asking to make nano-banana-aio available on hf.co/spaces, so I've ported the app's tech stack to make it compatible with deployment on Spaces. (It can be accessed with your own Gemini API key.)
The Nano Banana AIO (All-in-One) app offers seamless image-manipulation features, including single/multiple image adaptation, a canvas for free-style drawing that feeds creative image generation, and standard text-to-image generation.
Experiment Tracker: check out the training on our TrackioApp Tonic/l-android-control
Live Model Demo: upload an Android screenshot and instructions to see the model in action! Tonic/l-operator-demo
Built in a garage, funded by pre-orders, no VC. Now we're scaling to 1k installer units.
We're giving 50 limited-edition prototypes to investors, installers & researchers who want to co-design the sovereign smart home.
Drop "EUSKERA" in the comments if you want an invite, tag a friend who still thinks Alexa is "convenient," and smash ♥️ if AI should belong to people, not servers.
With the same passion, trust, and momentum to contribute to the community, I'm excited to do some amazing things to wrap up Q3 and Q4 of 2025. And importantly, I've been lucky enough to receive some knowledge and guidance from @merve to build open-source demos and stuff. Thank you for the belief.
Introducing Gliese-OCR-7B-Post1.0, a document content-structure retrieval VLM designed for content extraction (OCR) and summarization. This is the third model in the Camel Doc OCR VLM series, following Camel-Doc-OCR-062825. The new version fixes formal table-reconstruction issues in both English and Chinese, achieving optimal performance for long-context inference. This model also shows significant improvements in LaTeX and Markdown rendering for OCR tasks.
POINTS-Reader is a powerful, distillation-free vision-language model for end-to-end document conversion that sets new SoTA benchmarks. The demo is now available on HF (Extraction, Preview, Documentation). The input consists of a fixed prompt and a document image, while the output contains only a string (the text extracted from the document image).
Build something cool with Nano Banana aka Gemini 2.5 Flash Image AIO [All-in-One]. Draw and transform on canvas, edit images, and generate images, all in one place!
Constructed with the Gemini API (GCP). Try it here: prithivMLmods/Nano-Banana-AIO (Added the Space recently! - Sep 18 '25)
Ever dreamed of training your own Large Language Model from scratch? What if I told you it doesn't require a supercomputer or a PhD in ML?
Introducing LLM Trainer - the educational framework that makes LLM training accessible to EVERYONE! Whether you're on a CPU-only laptop or scaling to distributed GPUs, we've got you covered.
Why LLM Trainer? Because existing tools are either too simplistic (hiding the magic) or too complex (requiring expert knowledge). We bridge the gap with:
- Educational transparency: every component built from scratch with clear code
- CPU-first approach: start training immediately, no GPU needed
- Full customization: modify anything you want
- Seamless scaling: from laptop to cluster without code changes
- HuggingFace integration: works with existing models & tokenizers
Key highlights:
- Built-in tokenizers (BPE, WordPiece, HF wrappers)
- Complete Transformer implementation from scratch
- Optimized for CPU training
- Advanced features: mixed precision, gradient checkpointing, multiple generation strategies
- Comprehensive monitoring & metrics
Perfect for:
- Students learning transformers
- Researchers prototyping new ideas
- Developers building domain-specific models
Ready to train your first LLM? It's easier than you think!
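To give a feel for the intended workflow, here's a purely illustrative sketch; the package, class, and parameter names below are placeholders based on the feature list above, not the framework's actual API, so check its README for the real entry points.

```python
# Hypothetical usage sketch of an LLM Trainer-style educational framework.
from llm_trainer import Trainer, TrainerConfig  # placeholder module/class names

config = TrainerConfig(
    tokenizer="bpe",                      # built-in BPE tokenizer
    n_layers=4, d_model=256, n_heads=4,   # tiny model that fits on a CPU
    device="cpu",
    mixed_precision=False,                # enable on GPU runs
    gradient_checkpointing=False,
)

trainer = Trainer(config)
trainer.train(dataset_path="data/tiny_corpus.txt", epochs=1)
print(trainer.generate("Once upon a time", max_new_tokens=50))
```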
Dropped HeadshotX: a super-realistic headshot adapter for Qwen/Qwen-Image, an image generation model by Qwen. It is an advanced LoRA adaptation of the Qwen-Image model and an upgraded version of prithivMLmods/Qwen-Image-Studio-Realism, offering more precise portrait rendering with a strong focus on realism. The model was trained on diverse face types from across the world, labeled with florence2-en and caption-optimized using prithivMLmods/DeepCaption-VLA-7B: 11 types × 5 different face types (Asian, Hispanic, Caucasian, Latina, Middle Eastern, etc.).
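A hedged sketch of loading a Qwen-Image LoRA with diffusers, assuming a recent diffusers release with Qwen-Image support; the adapter repo id below is a guess, so check the author's model page for the real HeadshotX path.

```python
# Hedged sketch: Qwen-Image base pipeline + a headshot LoRA adapter.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)
pipe.to("cuda")

pipe.load_lora_weights("prithivMLmods/Qwen-Image-HeadshotX")  # hypothetical repo id

image = pipe(
    prompt="studio headshot of a middle-aged man, soft key light, 85mm lens, realistic skin",
    num_inference_steps=30,
).images[0]
image.save("headshot.png")
```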
I have a few updates to my MCP server I wanna share: a new Memory tool and improvements to web search & speech generation.
# Memory_Manager Tool
We now have a Memory_Manager tool. Ask ChatGPT to write all its memories verbatim, then tell gpt-oss-20b to save each one using the tool, then take them anywhere! It stores memories in a memories.json file in the repo, no external database required.
The Memory_Manager tool is currently hidden from the HF space because it's intended for local use. It's enabled by providing an HF_READ_TOKEN in the env secrets, although it doesn't actually use the key for anything. There's probably a cleaner way of ensuring memory is only used locally; I'll come back to this.
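The core idea is simple enough to sketch in a few lines (this is the concept, not the actual Nymbo/Tools implementation): memories live in a plain memories.json next to the server, no external database.

```python
# Minimal sketch of a memories.json-backed memory store.
import json
from pathlib import Path

MEMORY_FILE = Path("memories.json")

def save_memory(text: str) -> str:
    """Append one memory string and persist the whole list to disk."""
    memories = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    memories.append(text)
    MEMORY_FILE.write_text(json.dumps(memories, indent=2))
    return f"Saved memory #{len(memories)}"

def list_memories() -> list[str]:
    """Return every stored memory (empty list if nothing saved yet)."""
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
```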
# Fetch & Websearch
The Fetch_Webpage tool has been simplified a lot. It now converts the page to Markdown and returns the page with three length settings (Brief, Standard, Full). This is a lot more reliable than the old custom extraction method.
The Search_DuckDuckGo tool has a few small improvements. The input is easier for small models to get right, and the output is more readable.
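Roughly, the simplified fetch approach boils down to something like this (a sketch of the idea, not the tool's exact code): fetch, convert to Markdown, then truncate to one of the three lengths.

```python
# Hedged sketch: fetch a page, convert HTML to Markdown, truncate by length setting.
import requests
from markdownify import markdownify  # HTML-to-Markdown conversion

LIMITS = {"Brief": 2_000, "Standard": 8_000, "Full": None}  # character caps (illustrative)

def fetch_webpage(url: str, length: str = "Standard") -> str:
    html = requests.get(url, timeout=15).text
    md = markdownify(html)
    limit = LIMITS[length]
    return md if limit is None else md[:limit]
```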
# Speech Generation
I've added the remaining voices for Kokoro-82M; it now supports all 54 voices with all accents/languages.
I also removed the 30-second cap by making sure it computes all chunks in sequence, not just the first. I've tested it on outputs that are ~10 minutes long. Do note that when used as an MCP server, the tool will time out after 1 minute; nothing I can do about that right now.
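The chunk-then-concatenate approach looks roughly like this with the kokoro package (my tool's own code may differ slightly):

```python
# Hedged sketch: generate all Kokoro chunks in sequence, then concatenate the audio.
import numpy as np
import soundfile as sf
from kokoro import KPipeline

pipeline = KPipeline(lang_code="a")  # American English
text = "A long script that used to get cut off at thirty seconds... " * 20

chunks = []
for _, _, audio in pipeline(text, voice="af_heart"):
    chunks.append(np.asarray(audio))  # one chunk per split segment

sf.write("speech.wav", np.concatenate(chunks), 24000)  # Kokoro outputs 24 kHz audio
```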
# Other Thoughts
Lots of MCP use cases involve manipulating media (image editing, ASR, etc.). I've avoided adding tools like this so far for two reasons:
1. Most of these solutions would require assigning it a ZeroGPU slot.
2. The current process of uploading files like images to a Gradio space is still a bit rough. It's doable but requires additional tools.
Both of these points make it a bit painful for local usage. I'm open to suggestions for other tools that rely on text.