AI & ML interests

None defined yet.

Recent Activity

AmeeeeeΒ 
posted an update 10 days ago
view post
Post
1542
πŸ€— Here’s a fun educational video I made to show how Sheets and AI can upgrade your structured content.

Better tables and clearer messages with just a little help from open-source AI!

aisheets/sheets
AmeeeeeΒ 
posted an update 25 days ago
view post
Post
1754
With Sheets, try a new way to create structured content with the help of AI!

No installs. No login. Just open a link and 🀩

This app lets you create a dataset by importing a file or starting from a prompt.

What’s different about SHEETS?
πŸ”Ž Web search integration to ground answers in real-world data
πŸ“š In-context learning from validated sources
πŸ”— Transparent sourcing β€” every result is linked
🧩 Runs on multiple open-source models

Fight hallucinations and start creating content you can rely on.

dvilasueroΒ 
posted an update 25 days ago
view post
Post
2608
Super excited to launch Hugging Face Sheets: Spreadsheets meet AI and unstructured data.

A few months ago, we started imagining new ways to build and transform datasets with the latest open-source models.

Today, I'm thrilled to introduce our first step in this direction.


In a nutshell:

πŸ“ Effortlessly run prompts and models over your data.
🌐 Agentic search for accuracy and real-time information.
πŸ–ΌοΈ Familiar, minimalistic interface for interacting with data.
🎯 Human feedback 2.0: Your input directly improves generated data.
πŸ’― Access hundreds of open models and leading inference providers.

Go to this space to try it out!

aisheets/sheets

Leave your questions below, we're just getting started!
  • 2 replies
Β·
davanstrienΒ 
posted an update 27 days ago
view post
Post
2874
Inspired by Hugging Face's official MCP server, I've developed a complementary tool that exposes my semantic search API to enhance discovery across the HF platform.

Key capabilities:

- AI-powered semantic search for models and datasets
- Parameter count analysis via safetensors metadata
- Trending content discovery
- Find similar models/datasets functionality
- 11 tools total for enhanced ecosystem navigation

The semantic search goes beyond simple keyword matching, understanding context and relationships between different models and datasets.

Example query: "Find around 10 reasoning Hugging Face datasets published in 2025 focusing on topics other than maths and science. Show a link and a short summary for each dataset." (results in video!)

https://github.com/davanstrien/hub-semantic-search-mcp
loubnabnlΒ 
posted an update about 2 months ago
davanstrienΒ 
posted an update 2 months ago
view post
Post
2261
Came across a very nice submission from @marcodsn for the reasoning datasets competition (https://huggingface.co/blog/bespokelabs/reasoning-datasets-competition).

The dataset distils reasoning chains from arXiv research papers in biology and economics. Some nice features of the dataset:

- Extracts both the logical structure AND researcher intuition from academic papers
- Adopts the persona of researchers "before experiments" to capture exploratory thinking
- Provides multi-short and single-long reasoning formats with token budgets - Shows 7.2% improvement on MMLU-Pro Economics when fine-tuning a 3B model

It's created using the Curator framework with plans to scale across more scientific domains and incorporate multi-modal reasoning with charts and mathematics.

I personally am very excited about datasets like this, which involve creativity in their creation and don't just rely on $$$ to produce a big dataset with little novelty.

Dataset can be found here: marcodsn/academic-chains (give it a like!)
thomwolfΒ 
posted an update 3 months ago
view post
Post
5388
If you've followed the progress of robotics in the past 18 months, you've likely noticed how robotics is increasingly becoming the next frontier that AI will unlock.

At Hugging Faceβ€”in robotics and across all AI fieldsβ€”we believe in a future where AI and robots are open-source, transparent, and affordable; community-built and safe; hackable and fun. We've had so much mutual understanding and passion working with the Pollen Robotics team over the past year that we decided to join forces!

You can already find our open-source humanoid robot platform Reachy 2 on the Pollen website and the Pollen community and people here on the hub at pollen-robotics

We're so excited to build and share more open-source robots with the world in the coming months!
  • 1 reply
Β·
davanstrienΒ 
posted an update 3 months ago
view post
Post
1700
I've created a v1 dataset ( davanstrien/reasoning-required) and model ( davanstrien/ModernBERT-based-Reasoning-Required) to help curate "wild text" data for generating reasoning examples beyond the usual code/math/science domains.

- I developed a "Reasoning Required" dataset with a 0-4 scoring system for reasoning complexity
- I used educational content from HuggingFaceFW/fineweb-edu, adding annotations for domains, reasoning types, and example questions

My approach enables a more efficient workflow: filter text with small models first, then use LLMs only on high-value content.

This significantly reduces computation costs while expanding reasoning dataset domain coverage.