Mengqi Li
Kullpar
1 follower · 7 following
AI & ML interests
None yet
Recent Activity
reacted to yeonseok-zeticai's post with 👍 · 3 days ago
Hi everyone, I’ve been running small language models (SLLMs) directly on smartphones — completely offline, with no cloud backend or server API calls. I wanted to share:
1. ⚡ Tokens/sec performance across several SLLMs
2. 🤖 Observations on hardware utilization (where the workload actually runs)
3. 📏 Trade-offs between model size, latency, and feasibility for mobile apps
There are reports for the models below:
- QWEN3 0.6B
- NVIDIA/Nemotron QWEN 1.5B
- SimpleScaling S1
- TinyLlama
- Unsloth-tuned Llama 3.2 1B
- Naver HyperClova 0.5B
📜 Comparable benchmark reports (no cloud, all on-device):
I’d really value your thoughts on:
- Creative ideas to further optimize inference under these hardware constraints
- Other compact LLMs worth testing on-device
- Experiences you’ve had trying to deploy LLMs at the edge
If there’s interest, I’m happy to share more details on the test setup, hardware specs, or the tooling we used for these comparisons. Thanks for taking a look, and you can build your own at "https://mlange.zetic.ai"!
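Point 1 of the post is a tokens/sec comparison. As a rough illustration only (not the author's tooling), here is a minimal Python sketch of how such a throughput number could be measured; `measure_tokens_per_sec`, `generate_fn`, and the dummy backend are hypothetical stand-ins for whatever on-device runtime is actually used.

```python
import time
from typing import Callable, List


def measure_tokens_per_sec(generate_fn: Callable[[str, int], List[int]],
                           prompt: str,
                           max_new_tokens: int = 128) -> float:
    """Time one generation call and return generated tokens per second.

    `generate_fn` is a placeholder for the real on-device backend
    (e.g. a llama.cpp binding or a vendor NPU SDK) and is expected to
    return the list of newly generated token ids.
    """
    start = time.perf_counter()
    new_tokens = generate_fn(prompt, max_new_tokens)
    elapsed = time.perf_counter() - start
    return len(new_tokens) / elapsed


if __name__ == "__main__":
    # Dummy backend so the harness runs anywhere: it sleeps to mimic
    # decoding latency and returns fake token ids.
    def dummy_generate(prompt: str, max_new_tokens: int) -> List[int]:
        time.sleep(0.5)
        return list(range(max_new_tokens))

    tps = measure_tokens_per_sec(dummy_generate, "Hello, world", 64)
    print(f"{tps:.1f} tokens/sec")
```

In practice the same harness would be run per model and per hardware target (CPU, GPU, NPU), averaging over several prompts to smooth out warm-up effects.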
reacted to yeonseok-zeticai's post with 🔥 · 3 days ago
upvoted a paper · 21 days ago
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
Organizations
ElementQi
Papers (1)
arxiv: 2506.03077
Models (0)
None public yet
Datasets (0)
None public yet