philipp-zettl

AI & ML interests

NLP/CV/Multimodal learning

philipp-zettl's activity

upvoted a paper about 7 hours ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published 2 days ago • 80

liked a Space about 8 hours ago

HuggingDiscussions

🏢

Join discussions on Hugging Face Hub

reacted to hexgrad's post with 🔥 7 days ago

Post

8216

https://huggingface.co/hexgrad/Kokoro-82M got an upgrade! ⬆️ More voices, more languages, `pip install kokoro`, and still 82M parameters.

GitHub: https://github.com/hexgrad/kokoro
PyPI: https://pypi.org/project/kokoro/
Space: https://huggingface.co/spaces/hexgrad/Kokoro-TTS

11 replies

upvoted an article 10 days ago

FineWeb2-C: Help Build Better Language Models in Your Language

Article • and 5 others • Dec 23, 2024 • 18

reacted to mitkox's post with 🚀 10 days ago

Post

2170

llama.cpp is 26.8% faster than ollama.
I upgraded both and, using the same settings, ran the same DeepSeek R1 Distill 1.5B on the same hardware. It's an apples-to-apples comparison.

Total duration:
llama.cpp 6.85 sec <- 26.8% faster
ollama 8.69 sec

Breakdown by phase:
Model loading
llama.cpp 241 ms <- 2x faster
ollama 553 ms

Prompt processing
llama.cpp 416.04 tokens/s with an eval time of 45.67 ms <- 10x faster
ollama 42.17 tokens/s with an eval time of 498 ms

Token generation
llama.cpp 137.79 tokens/s with an eval time of 6.62 sec <- 13% faster
ollama 122.07 tokens/s with an eval time of 7.64 sec

llama.cpp is LLM inference in C/C++; ollama adds abstraction layers and marketing.

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.

7 replies
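
The speedup figures in the post above can be checked with a few lines of arithmetic. The sketch below is not from the post: the helper names `percent_faster` and `speed_ratio` are hypothetical, and it assumes "X% faster" means the ratio of the two measurements minus one. Only the timings and throughputs are copied from the post.

```python
# Numbers copied from the benchmark post; everything else is illustrative.

def percent_faster(slow: float, fast: float) -> float:
    """Percent speedup for durations (lower is better): slow / fast - 1."""
    return (slow / fast - 1.0) * 100.0


def speed_ratio(higher: float, lower: float) -> float:
    """Plain ratio between two measurements, e.g. tokens/s or milliseconds."""
    return higher / lower


# Total duration in seconds (llama.cpp 6.85, ollama 8.69).
total_speedup_pct = percent_faster(slow=8.69, fast=6.85)

# The same gap expressed as time saved relative to the slower run.
time_saved_pct = (1.0 - 6.85 / 8.69) * 100.0

# Model loading in milliseconds (llama.cpp 241, ollama 553).
load_ratio = speed_ratio(553, 241)

# Prompt processing in tokens/s (llama.cpp 416.04, ollama 42.17).
prompt_ratio = speed_ratio(416.04, 42.17)

# Token generation in tokens/s (llama.cpp 137.79, ollama 122.07).
generation_speedup_pct = (speed_ratio(137.79, 122.07) - 1.0) * 100.0

if __name__ == "__main__":
    print(f"total duration:    {total_speedup_pct:.2f}% faster")
    print(f"time saved:        {time_saved_pct:.2f}% less wall-clock time")
    print(f"model loading:     {load_ratio:.2f}x")
    print(f"prompt processing: {prompt_ratio:.2f}x")
    print(f"token generation:  {generation_speedup_pct:.2f}% faster")
```

Under that reading, the results line up with the post's rounded claims: roughly 26.9% for total duration, about 2.3x for model loading, about 9.9x for prompt processing, and about 12.9% for token generation. The same total-duration gap corresponds to about 21% less wall-clock time, so the headline number depends on which baseline is used.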

reacted to fdaudens's post with ❤️ 10 days ago

Post

8110

Yes, DeepSeek R1's release is impressive. But the real story is what happened in just 7 days after:

- Original release: 8 models, 540K downloads. Just the beginning...

- The community turned those open-weight models into 550+ new models on Hugging Face. Total downloads? 2.5M, nearly 5X the originals.

The reason? DeepSeek models are open-weight, letting anyone build on top of them. Interesting to note that the community focused on quantized versions for better efficiency & accessibility. They want models that use less memory, run faster, and are more energy-efficient.

When you empower builders, innovation explodes. For everyone. 🚀

The most popular community model? @bartowski's DeepSeek-R1-Distill-Qwen-32B-GGUF version, with 1M downloads alone.