Zengzhi Wang

SinclairWang

AI & ML interests

Data Engineering for Generative AI

SinclairWang's activity

upvoted an article 3 days ago

Releasing the largest multilingual open pretraining dataset

88
upvoted an article about 1 month ago

Scaling AI-based Data Processing with Hugging Face + Dask

23
upvoted an article about 2 months ago

RegMix: Data Mixture as Regression for Language Model Pre-training

10