Daniel van Strien PRO

davanstrien

AI & ML interests

Machine Learning Librarian

Recent Activity

updated a dataset about 5 hours ago
data-is-better-together/fineweb-c-progress
updated a dataset about 5 hours ago
librarian-bots/model_cards_with_metadata
liked a model about 6 hours ago
Qwen/Qwen2.5-VL-32B-Instruct

Organizations

Hugging Face; Notebooks-explorers; Nasjonalbiblioteket AI Lab; Living with Machines; BigScience Workshop; Spaces-explorers; BigScience Catalogue Data; Hacks/Hackers; flyswot; BigScience: LMs for Historical Texts; Webhooks Explorers (BETA); HuggingFaceM4; Open Access AI Collective; HF Canonical Model Maintainers; BigLAM: BigScience Libraries, Archives and Museums; Hugging Face OSS Metrics; ImageIN; Stable Diffusion Bias Eval; Librarian Bots; Blog-explorers; Hacktoberfest 2023; Hugging Face Smol Models Research; geospatial; HPLT; HF-IA-archiving; 2A2I Legacy Models & Datasets; testy; DIBT-for-Klingon; Wikimedia Movement; DIBT-for-Esperanto; Journalists on Hugging Face; PleIAs; Persian AI Community; HuggingFaceFW; Data Is Better Together; Social Post Explorers; OMOTO AI; academic-datasets; HuggingFaceFW-Dev; Hugging Face Discord Community; UCSF-JHU Opioid Industry Documents Archive; Dataset Tools; PDFPages; dibt-private; Data Is Better Together Contributor; Bluesky Community; Open R1

davanstrien's activity

published an article 2 months ago

Explore, Curate and Vector Search Any Hugging Face Dataset with Nomic Atlas

By MaxNomic and 4 others • 30
published an article 3 months ago

FineWeb2-C: Help Build Better Language Models in Your Language

By davanstrien and 5 others • 19
published an article 4 months ago

Open Preference Dataset for Text-to-Image Generation by the 🤗 Community

By davidberenstein1957 and 6 others • 56
published an article 4 months ago

Let’s make a generation of amazing image generation models

By burtenshaw and 4 others • 33
published an article 4 months ago

Share your open ML datasets on Hugging Face Hub!

By davanstrien and 3 others • 28
published an article 6 months ago

Scaling AI-based Data Processing with Hugging Face + Dask

By scj13 and 3 others • 30
published an article 9 months ago

Introducing Synthetic Data Workshop: Your Gateway to Easy Synthetic Dataset Creation

By davanstrien • 12
published an article 9 months ago

Data Is Better Together: A Look Back and Forward

By sdiazlor and 2 others • 20
published an article 10 months ago

Synthetic dataset generation techniques: generating custom sentence similarity data

By davanstrien • 16
published an article 10 months ago

Synthetic dataset generation techniques: Self-Instruct

By davanstrien • 14
published an article 11 months ago

Can we create pedagogically valuable multi-turn synthetic datasets from Cosmopedia?

By davanstrien • 8
published an article about 1 year ago

Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models

By loubnabnl and 2 others • 82
published an article about 1 year ago
published an article over 1 year ago

Extracting Insights from Model Cards Using Open Large Language Models

By davanstrien
published an article over 1 year ago

Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model

By VictorSanh and 10 others • 31
published an article over 1 year ago

Huggy Lingo: Using Machine Learning to Improve Language Metadata on the Hugging Face Hub

By davanstrien • 1
published an article almost 2 years ago

The Hugging Face Hub for Galleries, Libraries, Archives and Museums

By davanstrien • 1
published an article almost 2 years ago

Introducing BERTopic Integration with Hugging Face Hub

By davanstrien and 1 other • 8
published an article about 2 years ago