
xincheng yang

Vertax

AI & ML interests

None yet

Recent Activity

updated a dataset 1 day ago
Vertax/xense_bi_arx5_tie_shoelaces_tactile
published a dataset 1 day ago
Vertax/xense_bi_arx5_tie_shoelaces_tactile
reacted to andito's post with ❤️ 1 day ago
Finally, our new paper is out! "FineVision: Open Data Is All You Need"! 🥳
https://huggingface.co/papers/2510.17269

If you've ever trained a VLM, you know this problem: nobody shares their data mixtures. It's a black box, making replicating SOTA work impossible. We wanted to change that.

FineVision unifies 200 sources into 24 million samples. With 17.3 million images and 9.5 billion answer tokens, it's the largest open resource of its kind.

In the paper, we share how we built it:
🔍 finding and cleaning data at scale
🧹 removing excessive duplicates across sources
🤗 decontaminating against 66 public benchmarks

My favorite part is Figure 6 (in the video!). It's our visual diversity analysis. It shows that FineVision isn't just bigger; it's more balanced and conceptually richer than other open datasets. NVIDIA's Eagle 2 paper highlighted just how critical this visual diversity is, and our results confirm it: models trained on FineVision consistently outperform those trained on any other open dataset on 11 benchmarks! 🎉

To celebrate the paper, I'm also releasing a concatenated and shuffled version of the full dataset!
👉 `HuggingFaceM4/FineVision_full_shuffled`

It's ready to stream, so you can start training your own models right away:

    from datasets import load_dataset
    d = load_dataset("HuggingFaceM4/FineVision_full_shuffled", split="train", streaming=True)
    print(next(iter(d)))

A big shoutout to the first authors: Luis Wiedmann and Orr Zohar. They are rockstars!
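The streaming snippet in the post prints a single sample. To look at a few samples without iterating the whole stream, a minimal sketch along these lines may help. The `peek` helper and the `fake_stream` generator are hypothetical names introduced here for illustration, and the `{"id": ...}` records are placeholders, not the dataset's real schema.

```python
from itertools import islice
from typing import Any, Iterable, Iterator


def peek(stream: Iterable[Any], n: int = 3) -> list[Any]:
    """Return the first n items of an iterable, pulling no more than n from it."""
    return list(islice(stream, n))


def fake_stream() -> Iterator[dict]:
    # Hypothetical stand-in for a streamed dataset: an endless generator of
    # placeholder records. The real sample fields are not shown in the post.
    i = 0
    while True:
        yield {"id": i}
        i += 1


if __name__ == "__main__":
    # With the real dataset, the object returned by
    # load_dataset(..., streaming=True) could be passed to peek() instead.
    for sample in peek(fake_stream(), n=2):
        print(sample)
```

Because `islice` stops after `n` items, this works on an unbounded or very large stream; with the actual dataset it would still need network access to fetch the first samples.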

Organizations

Nanjing University of Science and Technology