Dataset Tools

community

AI & ML interests

Tools for creating and exploring datasets

Recent Activity

Dataset-Tools's activity

fdaudens
posted an update about 14 hours ago
💪 The open-source community is really unstoppable:

+5M total downloads for DeepSeek models on hf.co
+4M are from the 700 models created by the community
That's 30% more than yesterday!
davanstrien
posted an update about 23 hours ago
prithivMLmods
posted an update 1 day ago
Deepswipe by
.
.
.
. Deepseek 🐬🗿

Everything is now in recovery. 📉📈
fdaudens
posted an update 1 day ago
🚀 The open source community is unstoppable: 4M total downloads for DeepSeek models on Hugging Face, with 3.2M coming from the +600 models created by the community.

That's 30% more than yesterday!
Tonic
posted an update 2 days ago
🙋🏻‍♂️ Hey there folks,

our team made a game during the @mistral-game-jam and we're trying to win the community award!

try our game out and drop us a ❤️ like, which is basically a vote for us!

Mistral-AI-Game-Jam/TextToSurvive

hope you like it!
davanstrien
posted an update 2 days ago
fdaudens
posted an update 3 days ago
Yes, DeepSeek R1's release is impressive. But the real story is what happened in just 7 days after:

- Original release: 8 models, 540K downloads. Just the beginning...

- The community turned those open-weight models into +550 NEW models on Hugging Face. Total downloads? 2.5M, nearly 5X the originals.

The reason? DeepSeek models are open-weight, letting anyone build on top of them. Interesting to note that the community focused on quantized versions for better efficiency & accessibility. They want models that use less memory, run faster, and are more energy-efficient.

When you empower builders, innovation explodes. For everyone. 🚀

The most popular community model? @bartowski's DeepSeek-R1-Distill-Qwen-32B-GGUF version: 1M downloads alone.
davanstrien
posted an update 3 days ago
🌍 Big step for multilingual AI data!

The Hugging Face community has rated educational content in languages spoken by 1.6 billion people! New additions:
• Japanese
• Italian
• Old High German

Learn more and contribute: https://huggingface.co/blog/davanstrien/fineweb2-community

These ratings can help enhance training data for major world languages.
davidberenstein1957
posted an update 3 days ago
fdaudens
posted an update 9 days ago
davidberenstein1957
posted an update 9 days ago
fdaudens
posted an update 10 days ago
Reminder: Don't. Use. ChatGPT. As. A. Calculator. Seriously. 🤖

Loved listening to @sasha on Hard Fork. It really made me think.

A few takeaways that hit home:
- Individual culpability only gets you so far. The real priority: demanding accountability and transparency from companies.
- Evaluate if generative AI is the right tool for certain tasks (like search) before using it.

Curious about the full conversation? https://www.nytimes.com/2025/01/17/podcasts/hardfork-tiktok-rednote-environment.html Give it a listen, it's worth it! 🌍
prithivMLmods
posted an update 10 days ago
davidberenstein1957
posted an update 13 days ago
nataliaElv
posted an update 13 days ago
New chapter in the Hugging Face NLP course! 🤗 🚀

We've added a new chapter about the very basics of Argilla to the Hugging Face NLP course. Learn how to set up an Argilla instance, load & annotate datasets, and export them to the Hub.

Any feedback for improvements welcome!

https://huggingface.co/learn/nlp-course/chapter10
prithivMLmods
posted an update 14 days ago
ChemQwen-vL [ Qwen for Chem Vision ] 🧑🏻‍🔬

🧪 Model: prithivMLmods/ChemQwen-vL

ChemQwen-vL is a vision-language model fine-tuned from the Qwen2VL-2B Instruct model. It was trained on chemical compounds represented in the International Chemical Identifier (InChI) format and is optimized for chemical compound identification. Given an image of a compound, the model generates its InChI and a description. Its architecture operates within a multi-modal framework, combining image-text-text capabilities. It has been fine-tuned using datasets from: https://iupac.org/projects/

📒 Colab Demo: https://tinyurl.com/2pn8x6u7, Collection: https://tinyurl.com/2mt5bjju

Inference with the documentation is possible with the help of the ReportLab library. https://pypi.org/project/reportlab/

🤗: @prithivMLmods
Tonic
posted an update 14 days ago
🙋🏻‍♂️ Hey there folks,

Facebook AI just released JASCO models that make music stems.

you can try it out here: Tonic/audiocraft

hope you like it
fdaudens
posted an update 15 days ago
AI agents are coming. But who's in control?

@meg, one of the best researchers in AI ethics, makes a critical point about autonomy: fully autonomous systems carry unknowable risks because they operate on computer logic rather than human logic.

The solution? Build systems that support & assist rather than override human decisions.

I highly recommend reading the blog post written by Meg, @evijit, @sasha, and @giadap. They define different levels of agent autonomy & provide a values-based analysis of risks, benefits, and uses of AI agents to help you make better decisions.

👉 https://huggingface.co/blog/ethics-soc-7