Eric Xian

ericxian1997

AI & ML interests

None yet

Recent Activity

liked a Space 6 days ago

LLM360/TxT360

Organizations

liked a Space 6 days ago

111

TxT360: Trillion Extracted Text

📖

Create a large, deduplicated dataset for LLM pre-training

liked a dataset 3 months ago

HuggingFaceTB/smollm-corpus

Viewer • Updated Sep 6, 2024 • 237M • 8.81k • 340

upvoted a paper 3 months ago

MAGA: MAssive Genre-Audience Reformulation to Pretraining Corpus Expansion

Paper • 2502.04235 • Published Feb 6 • 22

liked a dataset 3 months ago

m-a-p/FineFineWeb

Viewer • Updated Dec 19, 2024 • 4.89B • 3.28M • 61

liked a model 4 months ago

jinaai/jina-embeddings-v3

Feature Extraction • Updated Feb 24 • 3.94M • 1.02k

liked a Space 4 months ago

2.72k

The Ultra-Scale Playbook

🌌

The ultimate guide to training LLM on large GPU Clusters

liked a dataset 4 months ago

PrimeIntellect/SYNTHETIC-1

Viewer • Updated Feb 21 • 1.99M • 839 • 55

liked a model 4 months ago

perplexity-ai/r1-1776

Text Generation • Updated Feb 26 • 12.1k • 2.28k

liked a dataset 6 months ago

allenai/dolmino-mix-1124

Viewer • Updated Dec 17, 2024 • 165M • 19k • 62

liked a model 7 months ago

Snowflake/snowflake-arctic-embed-m

liked 2 datasets 7 months ago

opencsg/chinese-fineweb-edu

Viewer • Updated Jan 20 • 84.6M • 5.72k • 103

SirNeural/flan_v2

Viewer • Updated Feb 24, 2023 • 336M • 595 • 193

liked a model 7 months ago

xiaozaa/catvton-flux-alpha

Image-to-Image • Updated Nov 26, 2024 • 659 • 45

liked a dataset 7 months ago

AI-MO/NuminaMath-CoT

Viewer • Updated Nov 25, 2024 • 860k • 3.52k • 457

liked 3 models 8 months ago

liked a dataset 8 months ago

HuggingFaceFW/fineweb-edu

Viewer • Updated Jan 31 • 3.3B • 119k • 701

liked a model 8 months ago

kenhktsui/fineweb-edu-fasttext-classifier

Text Classification • Updated Jun 6, 2024 • 54 • 4

liked a dataset 8 months ago

di-zhang-fdu/OpenLongCoT-Pretrain

Viewer • Updated Oct 28, 2024 • 103k • 82 • 86
