Mengqi Li
Kullpar
1 follower · 7 following
AI & ML interests
None yet
Recent Activity
reacted to yeonseok-zeticai's post with 👍 · 3 days ago
Hi everyone, I’ve been running small language models (SLLMs) directly on smartphones — completely offline, with no cloud backend or server API calls. I wanted to share:
1. ⚡ Tokens/sec performance across several SLLMs
2. 🤖 Observations on hardware utilization (where the workload actually runs)
3. 📏 Trade-offs between model size, latency, and feasibility for mobile apps
There are reports for the models below:
- QWEN3 0.6B
- NVIDIA/Nemotron QWEN 1.5B
- SimpleScaling S1
- TinyLlama
- Unsloth-tuned Llama 3.2 1B
- Naver HyperClova 0.5B
📜 Comparable benchmark reports (no cloud, all on-device):
I’d really value your thoughts on:
- Creative ideas to further optimize inference under these hardware constraints
- Other compact LLMs worth testing on-device
- Experiences you’ve had trying to deploy LLMs at the edge
If there’s interest, I’m happy to share more details on the test setup, hardware specs, or the tooling we used for these comparisons. Thanks for taking a look, and you can build your own at "https://mlange.zetic.ai"!
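Point 1 of the post is a tokens/sec comparison. As a rough illustration only (not the author's tooling), here is a minimal Python sketch of how such a throughput number could be measured; `measure_tokens_per_sec`, `generate_fn`, and the dummy backend are hypothetical stand-ins for whatever on-device runtime is actually used.

```python
import time
from typing import Callable, List


def measure_tokens_per_sec(generate_fn: Callable[[str, int], List[int]],
                           prompt: str,
                           max_new_tokens: int = 128) -> float:
    """Time one generation call and return generated tokens per second.

    `generate_fn` is a placeholder for the real on-device backend
    (e.g. a llama.cpp binding or a vendor NPU SDK) and is expected to
    return the list of newly generated token ids.
    """
    start = time.perf_counter()
    new_tokens = generate_fn(prompt, max_new_tokens)
    elapsed = time.perf_counter() - start
    return len(new_tokens) / elapsed


if __name__ == "__main__":
    # Dummy backend so the harness runs anywhere: it sleeps to mimic
    # decoding latency and returns fake token ids.
    def dummy_generate(prompt: str, max_new_tokens: int) -> List[int]:
        time.sleep(0.5)
        return list(range(max_new_tokens))

    tps = measure_tokens_per_sec(dummy_generate, "Hello, world", 64)
    print(f"{tps:.1f} tokens/sec")
```

In practice the same harness would be run per model and per hardware target (CPU, GPU, NPU), averaging over several prompts to smooth out warm-up effects.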
reacted to yeonseok-zeticai's post with 🔥 · 3 days ago
upvoted a paper · 21 days ago
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
Organizations
ElementQi
Papers (1)
arxiv: 2506.03077
Models (0)
None public yet
Datasets (0)
None public yet