
Bluesky Community
community
AI & ML interests
Tools for Bluesky 🦋
Recent Activity
bluesky-community's activity

BrigitteTousi posted an update 3 days ago
Post
LeRobot goes to driving school! 🚗🚗🚗
Hugging Face just announced a new collab with Yaak to bring the largest open-source self-driving dataset to LeRobot!
Major kudos to HF's @cadene , as well as @sandhawalia , @Shnissen and the Yaak team!
Check out the blog post here: https://huggingface.co/blog/lerobot-goes-to-driving-school

BrigitteTousi posted an update 4 days ago
Post
I was chatting with @peakji, one of the cofounders of Manus AI, who told me he was on Hugging Face (very cool!).
He shared an interesting insight which is that agentic capabilities might be more of an alignment problem rather than a foundational capability issue. Similar to the difference between GPT-3 and InstructGPT, some open-source foundation models are simply trained to 'answer everything in one response regardless of the complexity of the question' - after all, that's the user preference in chatbot use cases. Just a bit of post-training on agentic trajectories can make an immediate and dramatic difference.
As a thank you to the community, he shared 100 invite codes, first-come first-served; just use "HUGGINGFACE" to get access!
Post
10,000+ models based on DeepSeek R1 have been publicly shared on Hugging Face! Which ones are your favorites? https://huggingface.co/models?sort=trending&search=r1. Truly a game-changer!
Post
Super happy to welcome Nvidia as our latest enterprise hub customer. They have almost 2,000 team members using Hugging Face, and close to 20,000 followers of their org. Can't wait to see what they'll open-source for all of us in the coming months!
Nvidia's org: https://huggingface.co/nvidia
Enterprise hub: https://huggingface.co/enterprise

davanstrien posted an update 14 days ago
Post
Introducing "Hugging Face Dataset Spotlight" 🎙️
I'm excited to share the first episode of our AI-generated podcast series focusing on nice datasets from the Hugging Face Hub!
This first episode explores mathematical reasoning datasets:
- SynthLabsAI/Big-Math-RL-Verified: Over 250,000 rigorously verified problems spanning multiple difficulty levels and mathematical domains
- open-r1/OpenR1-Math-220k: 220,000 math problems with multiple reasoning traces, verified for accuracy using Math Verify and Llama-3.3-70B models.
- facebook/natural_reasoning: 1.1 million general reasoning questions carefully deduplicated and decontaminated from existing benchmarks, showing superior scaling effects when training models like Llama3.1-8B-Instruct.
Plus a bonus segment on bespokelabs/bespoke-manim!
https://www.youtube.com/watch?v=-TgmRq45tW4

davanstrien posted an update 15 days ago
Post
Quick POC: Turn a Hugging Face dataset card into a short podcast introducing the dataset using all open models.
I think I'm the only weirdo who would enjoy listening to something like this, though.
Here is an example for eth-nlped/stepverify

davanstrien posted an update 22 days ago
Post
Hacked together a way to log trl GRPO training completions to a 🤗 dataset repo. This allows you to:
- Track rewards from multiple reward functions
- Treat the completion and rewards from training as a "proper" dataset and do EDA
- Share results for open science
The implementation is super hacky, but I'm curious if people would find this useful.
To push completions to the Hub, you just need two extra parameters:
log_completions=True
log_completions_hub_repo='your-username/repo-name'
Example dataset: davanstrien/test-logs
Colab: https://colab.research.google.com/drive/1wzBFPVthRYYTp-mEYlznLg_e_0Za1M3g
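For a sense of what such logging produces, here is a minimal sketch of the idea, independent of trl: collect each completion together with the scores from multiple reward functions into flat records that can be pushed to the Hub as a dataset for EDA. The field names are illustrative, not trl's actual schema.

```python
def log_completion_records(prompts, completions, reward_scores):
    """Build one flat record per completion.

    reward_scores maps a reward-function name to a list of scores,
    one score per completion.
    """
    records = []
    for i, (prompt, completion) in enumerate(zip(prompts, completions)):
        record = {"prompt": prompt, "completion": completion}
        # One column per reward function, so you can compare them later.
        for name, scores in reward_scores.items():
            record[f"reward_{name}"] = scores[i]
        records.append(record)
    return records

rows = log_completion_records(
    prompts=["2+2=?"],
    completions=["4"],
    reward_scores={"format": [1.0], "accuracy": [1.0]},
)
```

Each record is a plain dict, so the list drops straight into `datasets.Dataset.from_list` or a DataFrame for analysis.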
Post
What are the best organizations to follow on @huggingface?
Off the top of my head:
- Deepseek (35,000 followers): https://huggingface.co/deepseek-ai
- Meta Llama (27,000 followers): https://huggingface.co/meta-llama
- Black Forest Labs (11,000 followers): https://huggingface.co/black-forest-labs
- OpenAI (5,000 followers): https://huggingface.co/openai
- Nvidia (16,000 followers): https://huggingface.co/nvidia
- Microsoft (9,000 followers): https://huggingface.co/microsoft
- AllenAI (2,000 followers): https://huggingface.co/allenai
- Mistral (5,000 followers): https://huggingface.co/mistralai
- xAI (600 followers): https://huggingface.co/xai-org
- Stability AI (16,000 followers): https://huggingface.co/stabilityai
- Qwen (16,000 followers): https://huggingface.co/Qwen
- GoogleAI (8,000 followers): https://huggingface.co/google
- Unsloth (3,000 followers): https://huggingface.co/unsloth
- Bria AI (4,000 followers): https://huggingface.co/briaai
- NousResearch (1,300 followers): https://huggingface.co/NousResearch
Bonus, the agent course org with 17,000 followers: https://huggingface.co/agents-course
Post
We crossed 1B+ tokens routed to our inference provider partners on HF, a feature we released just a few days ago.
Just getting started, of course, but early users seem to like it, and we're always happy to partner with cool startups in the ecosystem.
Have you been using any integration and how can we make it better?
https://huggingface.co/blog/inference-providers

davanstrien posted an update 26 days ago
Post
Dataset descriptions for trending Hugging Face datasets? Powered by a Smol model
davanstrien/Smol-Hub-tldr

davanstrien posted an update 28 days ago
Post
How do you make 1M+ Hugging Face models & datasets more discoverable?
davanstrien/Smol-Hub-tldr!
I fine-tuned HuggingFaceTB/SmolLM2-360M to generate one-line summaries from a model or dataset README.
Its own self-description?
"A model for generating concise summaries of model & dataset cards from the Hugging Face Hub"
The goal? Make it easier to find the right models and datasets for your specific needs. It's already powering a semantic search for datasets Space.
It's still a WIP, but thanks to @loubnabnl , @anton-l , @eliebak et al. for cooking such a nice base model for fine-tuning small, efficient models for specific domains and tasks.
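A summarizer like this is typically fed a cleaned-up version of the card text rather than the raw file. As a hypothetical illustration (the exact preprocessing Smol-Hub-tldr uses is not documented here), one might strip the YAML metadata block that Hub cards open with and truncate the rest to a character budget:

```python
def prepare_card_text(readme: str, max_chars: int = 2000) -> str:
    """Strip the leading YAML front-matter block and truncate the body."""
    text = readme.strip()
    # Hub model/dataset cards usually open with a block fenced by `---`.
    if text.startswith("---"):
        end = text.find("---", 3)
        if end != -1:
            text = text[end + 3:].lstrip()
    return text[:max_chars]

card = """---
license: mit
---
# My dataset
A collection of examples."""
body = prepare_card_text(card)
```

The truncated body (here starting at "# My dataset") is what would be handed to the model as the summarization input.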

davanstrien posted an update 29 days ago
Post
Made some significant updates to my 🤗 semantic datasets search app. If you love falling into a wiki black hole, you might like this...
https://huggingface.co/spaces/librarian-bots/huggingface-datasets-semantic-search

cfahlgren1 authored a paper about 1 month ago

davanstrien posted an update about 1 month ago
Post
Why choose between strong LLM reasoning and efficient models?
Use DeepSeek to generate high-quality training data, then distil that knowledge into ModernBERT answerdotai/ModernBERT-base for fast, efficient classification.
Blog post: https://danielvanstrien.xyz/posts/2025/deepseek/distil-deepseek-modernbert.html
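The pipeline above can be sketched in miniature. Here a trivial keyword rule stands in for the DeepSeek "teacher" (a real pipeline would prompt the LLM and parse its answer into a label); the resulting (text, label) pairs are the training set you would then distil into a small classifier such as ModernBERT:

```python
def teacher_label(text: str) -> str:
    """Stand-in for an LLM labeling call; illustration only."""
    return "positive" if "great" in text.lower() else "negative"

def build_distillation_set(texts):
    # Each unlabeled text becomes a labeled training example
    # for the small "student" classifier.
    return [{"text": t, "label": teacher_label(t)} for t in texts]

train = build_distillation_set(
    ["This library is great!", "Docs are confusing."]
)
```

The attraction of the pattern is that the expensive model runs once at dataset-creation time, while inference cost is paid by the small student.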

cfahlgren1 posted an update about 1 month ago
Post
If you haven't seen it yet, we just released Inference Providers 🔥
> 4 new serverless inference providers on the Hub 🤯
> Use your HF API key or personal key with all providers
> Chat with DeepSeek R1, V3, and more on the HF Hub
> We support SambaNova, Together AI, Replicate, and fal.ai 💪
Best of all, we don't charge any markup on top of the provider. Have you tried it out yet? HF Pro accounts get $2 of free usage for provider inference.

davanstrien posted an update about 1 month ago
Post
Updated the ColPali Query Generator Space davanstrien/ColPali-Query-Generator to use Qwen/Qwen2.5-VL-7B-Instruct.
Given an input image, it generates several queries along with explanations to justify them. This approach can generate synthetic data for fine-tuning ColPali models.
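To show how generated queries become training data, here is a small sketch: for each image, the generator returns several (query, explanation) suggestions, and keeping the queries paired with the image path yields synthetic (image, query) examples. The field names are illustrative, not the Space's actual output schema.

```python
def to_training_pairs(image_path, suggestions):
    """suggestions: list of {"query": ..., "explanation": ...} dicts.

    The explanation justifies each query to a human reviewer; only the
    (image, query) pair is kept for fine-tuning.
    """
    return [{"image": image_path, "query": s["query"]} for s in suggestions]

pairs = to_training_pairs(
    "invoice_001.png",
    [
        {"query": "total amount due", "explanation": "targets the sum field"},
        {"query": "invoice date", "explanation": "targets the header"},
    ],
)
```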
Post
AI is not a zero-sum game. Open-source AI is the tide that lifts all boats!

davanstrien posted an update about 2 months ago
Post
Big step for multilingual AI data! 🌍
The Hugging Face community has rated educational content in languages spoken by 1.6 billion people! New additions:
• Japanese
• Italian
• Old High German
Learn more and contribute: https://huggingface.co/blog/davanstrien/fineweb2-community
These ratings can help enhance training data for major world languages.