AI & ML interests

None defined yet.

Recent Activity

merve 
posted an update about 5 hours ago
Nymbo 
posted an update 2 days ago
I have a few updates to my MCP server I wanna share: New Memory tool, improvements to web search & speech generation.

# Memory_Manager Tool

We now have a Memory_Manager tool. Ask ChatGPT to write all its memories verbatim, then tell gpt-oss-20b to save each one using the tool, then take them anywhere! It stores memories in a memories.json file in the repo, no external database required.
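The post doesn't include the tool's code. As a rough illustration of what a file-backed memory store like this involves, here is a minimal Python sketch. The function names (`save_memory`, `load_memories`, `search_memories`) and the `{"id", "text"}` record layout are assumptions, not the actual Memory_Manager schema.

```python
import json
from pathlib import Path

# Hypothetical default location; the post only says memories live in memories.json in the repo.
MEMORY_FILE = Path("memories.json")


def load_memories(path: Path = MEMORY_FILE) -> list[dict]:
    """Return all saved memories, or an empty list if the file doesn't exist yet."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def save_memory(text: str, path: Path = MEMORY_FILE) -> dict:
    """Append one memory and rewrite the whole JSON file (no external database)."""
    memories = load_memories(path)
    entry = {"id": len(memories) + 1, "text": text}
    memories.append(entry)
    path.write_text(json.dumps(memories, indent=2, ensure_ascii=False), encoding="utf-8")
    return entry


def search_memories(query: str, path: Path = MEMORY_FILE) -> list[dict]:
    """Case-insensitive substring search over stored memories."""
    q = query.lower()
    return [m for m in load_memories(path) if q in m["text"].lower()]
```

Rewriting the whole file on every save keeps it human-readable and easy to carry to another setup, at the cost of scaling poorly to very large memory sets.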

The Memory_Manager tool is currently hidden from the HF space because it's intended for local use. It's enabled by providing an HF_READ_TOKEN in the env secrets, although the tool doesn't actually use the key for anything. There's probably a cleaner way of ensuring memory is only used locally; I'll come back to this.

# Fetch & Websearch

The Fetch_Webpage tool has been simplified a lot. It now converts the page to Markdown and returns it at one of three length settings (Brief, Standard, Full). This is a lot more reliable than the old custom extraction method.
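A rough sketch of how three length settings could be applied, assuming the page has already been converted to Markdown (the post doesn't say which converter is used). The character budgets in `LENGTH_LIMITS` and the `trim_markdown` name are hypothetical.

```python
# Hypothetical character budgets; the post names the three settings but not their limits.
LENGTH_LIMITS = {"Brief": 2_000, "Standard": 8_000, "Full": None}


def trim_markdown(markdown: str, length: str = "Standard") -> str:
    """Trim already-converted Markdown to the requested length setting.

    Cuts at the last paragraph break before the limit, when there is one,
    so the output doesn't stop mid-paragraph.
    """
    if length not in LENGTH_LIMITS:
        raise ValueError(f"length must be one of {list(LENGTH_LIMITS)}")
    limit = LENGTH_LIMITS[length]
    if limit is None or len(markdown) <= limit:
        return markdown
    cut = markdown.rfind("\n\n", 0, limit)
    if cut <= 0:
        cut = limit  # no paragraph break found: hard cut at the budget
    return markdown[:cut].rstrip() + "\n\n[truncated]"
```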

The Search_DuckDuckGo tool has a few small improvements. The input is easier for small models to get right, and the output is more readable.

# Speech Generation

I've added the remaining voices for Kokoro-82M. It now supports all 54 voices across all accents/languages.

I also removed the 30-second cap by making sure it computes all chunks in sequence, not just the first. I've tested it on outputs that are ~10 minutes long. Do note that when used as an MCP server, the tool will time out after 1 minute; there's nothing I can do about that right now.
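The fix described above (synthesizing every chunk in order rather than only the first) can be sketched like this. `synth` is a hypothetical stand-in for the Kokoro-82M call, audio is modeled as a plain list of samples, and the sentence-based splitting with a 200-character budget is an assumption rather than the space's actual chunking rule.

```python
import re
from typing import Callable, List

# Stand-in type for the real TTS call: text in, audio samples out.
Synth = Callable[[str], List[float]]


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars becomes its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for s in sentences:
        if not s:
            continue
        candidate = f"{current} {s}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = s
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def synthesize_long(text: str, synth: Synth, max_chars: int = 200) -> List[float]:
    """Run synth on every chunk in sequence and concatenate the audio."""
    audio: List[float] = []
    for chunk in split_into_chunks(text, max_chars):
        audio.extend(synth(chunk))
    return audio
```

Because chunks are processed one after another, total runtime grows with text length, which is why long outputs can still hit the MCP client's 1-minute timeout.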

# Other Thoughts

Lots of MCP use cases involve manipulating media (image editing, ASR, etc.). I've avoided adding tools like this so far for two reasons:

1. Most of these solutions would require assigning the space a ZeroGPU slot.
2. The current process of uploading files like images to a Gradio space is still a bit rough. It's doable but requires additional tools.

Both of these points make it a bit painful for local usage. I'm open to suggestions for other tools that rely on text.
merve 
posted an update 7 days ago
large AI labs dropped so many open models last week 🔥 don't miss out on them

→ Apple released on-device vision LMs apple/fastvlm-68ac97b9cd5cacefdd04872e & apple/mobileclip2-68ac947dcb035c54bcd20c47
→ OpenGVLab released InternVL3.5, 32 new vision LMs with one based on gpt-oss! (OS) OpenGVLab/internvl35-68ac87bd52ebe953485927fb
→ MSFT released a killer small TTS model (OS) microsoft/VibeVoice-1.5B

find more here: https://huggingface.co/collections/merve/august-29-releases-68b5a3754cfb8abf59e2b486
merve 
posted an update 13 days ago
kadirnar 
posted an update 14 days ago
What can you do with the VyvoTTS library?

- Using the PT model, you can train a model in a language it was never trained on. There’s no need for large datasets.
- With the PT model, you can easily replicate the voice of any character you want. Just 1k samples are enough.
- You can add emotion support with a small dataset.

Github: https://github.com/Vyvo-Labs/VyvoTTS
HuggingFace: Vyvo
Nymbo 
posted an update 15 days ago
I built a general-use MCP space ~ Fetch webpages, DuckDuckGo search, Python code execution, Kokoro TTS, Image Gen, Video Gen.

# Tools

1. Fetch webpage
2. Web search via DuckDuckGo (very concise, low excess context)
3. Python code executor
4. Kokoro-82M speech generation
5. Image Generation (use any model from HF Inference Providers)
6. Video Generation (use any model from HF Inference Providers)

The first four tools can be used without any API keys whatsoever. DDG search is free, and code execution and speech gen run on CPU. Having an HF_READ_TOKEN in the env variables will show all tools; if no key is present, the Image/Video Gen tools are hidden.
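A minimal sketch of the key-gated tool list described above. Only `Fetch_Webpage` and `Search_DuckDuckGo` are tool names used elsewhere in these posts; the other names and the `visible_tools` helper are placeholders. The check is on presence only, matching the described behavior, not on whether the token is valid.

```python
import os

# Tools that work without any API key (only the first two names appear in the posts).
KEYLESS_TOOLS = ["Fetch_Webpage", "Search_DuckDuckGo", "Execute_Python", "Generate_Speech"]
# Hypothetical names for the tools that need HF Inference Providers access.
TOKEN_TOOLS = ["Generate_Image", "Generate_Video"]


def visible_tools(env=None) -> list[str]:
    """Return the tools to expose: keyless ones always, gated ones only if HF_READ_TOKEN is set."""
    env = os.environ if env is None else env
    tools = list(KEYLESS_TOOLS)
    if env.get("HF_READ_TOKEN"):  # empty string counts as "no key"
        tools += TOKEN_TOOLS
    return tools
```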

Nymbo/Tools
Nymbo 
posted an update 23 days ago
If you're using Jan-v1-4B for local MCP-based web search, I highly recommend trying out Intelligent-Internet/II-Search-4B.

Very impressed with this lil guy; it deserves more downloads. It's based on the original version of Qwen3-4B, but I find that it questions reality way less often. Jan-v1 seems to think that everything it sees is synthetic data and constantly gaslights me.
Parveshiiii 
posted an update about 1 month ago
🚀 Just Dropped: MathX-5M — Your Gateway to Math-Savvy GPTs

👨‍🔬 Wanna fine-tune your own GPT for math?
🧠 Building a reasoning agent that actually *thinks*?
📊 Benchmarking multi-step logic across domains?

Say hello to **MathX-5M** (XenArcAI/MathX-5M), a **5 million+ sample** dataset crafted for training and evaluating math reasoning models at scale.

Built by **XenArcAI**, it’s optimized for:
- 🔍 Step-by-step reasoning with , , and formats
- 🧮 Coverage from arithmetic to advanced algebra and geometry
- 🧰 Plug-and-play with Gemma, Qwen, Mistral, and other open LLMs
- 🧵 Compatible with Harmony, Alpaca, and OpenChat-style instruction formats
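As an example of what plugging into an Alpaca-style format can look like in practice, the sketch below wraps one math sample in the standard Alpaca prompt template (no-input variant). The `problem`/`solution` field names and the `to_alpaca` helper are assumptions; check the dataset card for MathX-5M's actual column names.

```python
# Standard Alpaca prompt template, no-input variant.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def to_alpaca(record: dict) -> dict:
    """Map a hypothetical {'problem', 'solution'} record to Alpaca fields plus a training string."""
    instruction = record["problem"].strip()
    output = record["solution"].strip()
    return {
        "instruction": instruction,
        "input": "",
        "output": output,
        # Full prompt + target, as typically fed to supervised fine-tuning.
        "text": ALPACA_TEMPLATE.format(instruction=instruction) + output,
    }
```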

Whether you're prototyping a math tutor, testing agentic workflows, or just want your GPT to solve equations like a pro—**MathX-5M is your launchpad**.

🔗 Dive in: XenArcAI/MathX-5M

Let’s make open-source models *actually* smart at math.
#FineTuneYourGPT #MathX5M #OpenSourceAI #LLM #XenArcAI #Reasoning #Gemma #Qwen #Mistral

merve 
posted an update about 1 month ago
GPT-4.1-mini level model right in your iPhone 🤯

openbmb/MiniCPM-V-4 is only 4B while surpassing GPT-4.1-mini in vision benchmarks 🔥

allows commercial use as well!
merve 
posted an update about 1 month ago
we're all sleeping on this OCR model rednote-hilab/dots.ocr 🔥

dots.ocr is a new 3B model with sota performance, support for 100 languages & allowing commercial use! 🤯

single e2e model that converts images, including tables, formulas, and more, into markdown 📝
try it: MohamedRashad/Dots-OCR
merve 
posted an update about 1 month ago
massive releases and tons of Flux.1 Krea LoRAs past week!
here are some of the picks; find more models in the collection 🫡 merve/releases-august-2-6890c14248203522b7d0267f

LLMs 💬
> Tencent dropped tencent/Hunyuan-7B-Instruct
> Qwen released Qwen/Qwen3-Coder-30B-A3B-Instruct, a 30B MoE with 3B active params for coding (OS)

vision/multimodal
> RedNote released rednote-hilab/dots.ocr - 3B OCR model (OS)
> Cohere released CohereLabs/command-a-vision-07-2025 - 112B (dense!) VLM for 6 languages
> StepFun-AI shipped stepfun-ai/step3 - 321B MoE VLM (OS)
> Skywork shipped Skywork/Skywork-UniPic-1.5B - new any-to-any model (image+text → image+text) (OS)
merve 
posted an update about 1 month ago
Parveshiiii 
posted an update about 1 month ago
🚀 Launch Alert: Dev-Stack-Agents
Meet your 50-agent senior AI team — principal-level experts in engineering, AI, DevOps, security, product, and more — all bundled into one modular repo.

+ Code. Optimize. Scale. Secure.
- Full-stack execution, Claude-powered. No human bottlenecks.


🔧 Built for Claude Code
Seamlessly plug into Claude’s dev environment:

* 🧠 Each .md file = a fully defined expert persona
* ⚙️ Claude indexes them as agents with roles, skills & strategy
* 🤖 You chat → Claude auto-routes to the right agent(s)
* ✍️ Want precision? Just call @agent-name directly
* 👥 Complex task? Mention multiple agents for team execution

Examples:

"@security-auditor please review auth flow for risks"
"@cloud-architect + @devops-troubleshooter → design a resilient multi-region setup"
"@ai-engineer + @legal-advisor → build a privacy-safe RAG pipeline"


🔗 https://github.com/Parveshiiii/Dev-Stack-Agents
MIT License | Claude-Ready | PRs Welcome
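In Claude Code the routing itself is done by Claude, so the post doesn't describe any parser shipping with the repo. Purely to illustrate the @mention convention in the examples above, here is a hypothetical Python helper that extracts agent names from a message and maps them to persona files named `<agent-name>.md`; the naming scheme and directory layout are assumptions.

```python
import re
from pathlib import Path

# Agent names: lowercase words joined by hyphens, e.g. "@cloud-architect".
MENTION = re.compile(r"@([a-z0-9]+(?:-[a-z0-9]+)*)")


def mentioned_agents(message: str) -> list[str]:
    """Return agent names mentioned in a message, in order, without duplicates."""
    seen: list[str] = []
    for name in MENTION.findall(message):
        if name not in seen:
            seen.append(name)
    return seen


def agent_files(message: str, agents_dir: Path) -> list[Path]:
    """Map each mention to <agents_dir>/<name>.md, skipping agents with no persona file."""
    return [
        agents_dir / f"{name}.md"
        for name in mentioned_agents(message)
        if (agents_dir / f"{name}.md").exists()
    ]
```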

merve 
posted an update about 1 month ago
past week in open AI was insane 🔥 here are some picks, find more here merve/releases-july-25-688768ca47fe3693407e02d1

💬 LLMs & VLMs
> Qwen/Qwen3-235B-A22B-Thinking-2507 had a new update (OS)
> Qwen/Qwen3-Coder-480B-A35B-Instruct is out with 480B total 35B active params 🤯 (OS)
> AllenAI dropped an update to allenai/olmOCR-7B-0725 📝
> InternLM released internlm/Intern-S1 - 235B Qwen3 MoE + 6B InternViT encoder (OS)
> OmniSVG/OmniSVG is a new SVG generation VLM (OS)

🖼️ image/video/3D generation
> WanAI released Wan2.2 series - both T2V and I2V 14B models for high-quality video generation (OS) multimodalart/wan-22-688767e313337b434ed55112
> Tencent dropped tencent/HunyuanWorld-1 - image-to-3D scene generation model
merve 
posted an update about 1 month ago
🤯 241B VLM with apache-2.0 license internlm/Intern-S1

internlm released Intern-S1: multimodal reasoning model based on 235B MoE Qwen3 and 6B InternViT 😍

benchmarks look great (👑 best model ✅ best open model)
sayakpaul 
posted an update about 2 months ago
Fast LoRA inference for Flux with Diffusers and PEFT 🚨

There are great materials that demonstrate how to optimize inference for popular image generation models, such as Flux. However, very few cover how to serve LoRAs fast, despite LoRAs being an inseparable part of their adoption.

In our latest post, @BenjaminB and I show different techniques to optimize LoRA inference for the Flux family of models for image generation. Our recipe includes the use of:

1. torch.compile
2. Flash Attention 3 (when compatible)
3. Dynamic FP8 weight quantization (when compatible)
4. Hotswapping to avoid recompilation when swapping in new LoRAs 🤯
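Hotswapping (item 4 above) avoids recompilation by copying a new adapter's weights into tensors whose shapes the compiled graph already expects. Adapters can have different ranks, so one way to keep shapes fixed, and to our understanding the idea behind PEFT's target-rank option for compiled hotswapping, is to zero-pad every adapter up to the largest rank. The pure-Python sketch below (no torch; `matmul` and `pad_lora` are illustrative helpers, and the LoRA scaling factor is left aside) checks that padding leaves the update B·A unchanged. The linked blog post covers the actual Diffusers/PEFT calls.

```python
# A LoRA update is delta_W = B @ A, with A of shape (r, d_in) and B of shape (d_out, r).


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def pad_lora(A, B, target_rank):
    """Zero-pad A to target_rank rows and B to target_rank columns.

    The extra rows/columns contribute only zeros, so B_pad @ A_pad == B @ A,
    but every adapter now has the same tensor shapes.
    """
    r = len(A)
    if target_rank < r:
        raise ValueError("target_rank must be >= the adapter's rank")
    d_in = len(A[0])
    A_pad = [row[:] for row in A] + [[0.0] * d_in for _ in range(target_rank - r)]
    B_pad = [row[:] + [0.0] * (target_rank - r) for row in B]
    return A_pad, B_pad
```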

We have tested our recipe with Flux.1-Dev on both H100 and RTX 4090 and achieve at least a *2x speedup* on both GPUs. We believe our recipe is grounded in the reality of how LoRA-based use cases are generally served, so we hope it will be beneficial to the community 🤗

Even though our recipe was tested primarily with NVIDIA GPUs, it should also work with AMD GPUs.

Learn the details and the full code here:
https://huggingface.co/blog/lora-fast
merve 
posted an update about 2 months ago
so many open LLMs and image LoRAs dropped past week, here are some picks for you 🫡 merve/releases-july-18-687e3fbd2ab9b39c51f9238b

LLMs
> ByteDance released a bunch of translation models called Seed-X-RM (7B) ByteDance-Seed/Seed-X-RM-7B
> NVIDIA released reasoning models, the 32B of which surpasses the giant Qwen3-235B, with a cc-by-4.0 license 👏 nvidia/openreasoning-nemotron-687730dae0170059860f1f01
> LG released a new EXAONE model (32B) LGAI-EXAONE/EXAONE-4.0-32B

VLMs/any-to-any
> vidore/colqwen-omni-v0.1 is a new any-to-any retriever (MIT)
> HiDream-ai/HiDream-E1-1 is image+text in image+text out model (MIT)

LoRAs
> There's a bunch of LoRAs based on Flux Kontext, gotta check out the collection 🤠
merve 
posted an update about 2 months ago
ariG23498 
posted an update about 2 months ago
merve 
posted an update about 2 months ago