Hynek Kydlicek

hynky

AI & ML interests

Data-processing

Recent Activity

updated a dataset about 14 hours ago

macrodata/aloha_static_battery_ep005_009

published a dataset about 14 hours ago

macrodata/aloha_static_battery_ep005_009

updated a dataset about 14 hours ago

macrodata/aloha_static_battery_ep000_004

liked a Space 1 day ago

The Synthetic Data Playbook: Generating Trillions of the Finest Tokens

Explore synthetic data experiments in a bookshelf view

liked a Space 29 days ago

QED-Nano: Teaching a Tiny Model to Prove Hard Theorems

Who needs 1T parameters? Olympiad proofs with a 4B model

liked a dataset 2 months ago

HuggingFaceFW/finetranslations

Viewer • Updated Jan 9 • 3.33B • 26.9k • 276

liked a Space 2 months ago

FinePDFs: Liberating 3T of the finest tokens from PDFs

liked a Space 3 months ago

Evaluation Guidebook

Explore LLM benchmark trends over time

liked a dataset 6 months ago

HuggingFaceFW/finepdfs

Viewer • Updated Jan 9 • 476M • 36k • 824

liked a Space 7 months ago

Bringing paper to life: A modern template for scientific writing

Explore and download a modern scientific paper template

liked a Space about 1 year ago

The Ultra-Scale Playbook

The ultimate guide to training LLM on large GPU Clusters

liked a dataset about 1 year ago

data-is-better-together/fineweb-c

Viewer • Updated Jul 8, 2025 • 88.7k • 930 • 58

liked a dataset over 1 year ago

HuggingFaceFW/fineweb-2

Viewer • Updated Oct 27, 2025 • 4.48B • 70.8k • 768

liked a Space over 1 year ago

Number Tokenization Blog

Explore how tokenization affects arithmetic in LLMs

liked 2 datasets over 1 year ago

CohereLabs/Global-MMLU

Viewer • Updated Aug 14, 2025 • 602k • 10k • 150

ClusterlabAi/InstAr-500k

Viewer • Updated Jul 30, 2024 • 481k • 94 • 15

liked a Space over 1 year ago

Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks

Evaluate multilingual models using FineTasks

liked a dataset over 1 year ago

LLM360/TxT360

Updated May 26, 2025 • 31k • 248

liked 2 Spaces over 1 year ago

Hub LFS Analysis

An analysis of LFS files on the Hub.

TxT360: Trillion Extracted Text

Explore and download the TxT360 LLM pre‑training dataset

liked a dataset over 1 year ago

Cleanlab/bad_data_gsm8k_svamp.csv

Viewer • Updated Apr 25, 2024 • 34 • 48 • 3

liked a Space over 1 year ago

Datasets Metrics Explorer

Launch an interactive demo interface

liked a dataset over 1 year ago

ThaiSyntheticQA/ThaiQA-v1

Viewer • Updated Jul 24, 2024 • 12.7k • 23 • 4