Huu Nguyen

huu-ontocord

AI & ML interests

None yet

Recent Activity

updated a dataset about 11 hours ago

ontocord/MixtureVitae-300BT

updated a model 1 day ago

ontocord/seed2

updated a dataset 6 days ago

ontocord/MixtureVitae-Backup

Organizations

commented a paper 9 days ago

Rewriting Pre-Training Data Boosts LLM Performance in Math and Code

Paper • 2505.02881 • Published May 5 • 3

commented 5 papers 10 days ago

A Typology for Exploring the Mitigation of Shortcut Behavior

Paper • 2203.03668 • Published Mar 4, 2022 • 1

MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation

Paper • 2305.15296 • Published May 24, 2023 • 1

Class Attribute Inference Attacks: Inferring Sensitive Class Information by Diffusion-Based Attribute Manipulations

Paper • 2303.09289 • Published Mar 16, 2023 • 2

Rewriting Pre-Training Data Boosts LLM Performance in Math and Code

Paper • 2505.02881 • Published May 5 • 3

PL-Guard: Benchmarking Language Model Safety for Polish

Paper • 2506.16322 • Published 28 days ago • 1

commented a paper 13 days ago

Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs

Paper • 2507.02778 • Published 14 days ago • 9

commented a paper 15 days ago

EmoNet-Face: An Expert-Annotated Benchmark for Synthetic Emotion Recognition

Paper • 2505.20033 • Published May 26 • 4

New activity in hkust-nlp/CodeIO-PyEdu-Reasoning 29 days ago

License info. needed

#3 opened 4 months ago by shuanga

New activity in openbmb/Ultra-FineWeb 30 days ago

could you release the URLs associated with each item in the dataset.

#14 opened about 1 month ago by huu-ontocord

New activity in ontocord/MixtureVitae-300BT 30 days ago

Cool dataset

#2 opened 30 days ago by Fishtiks

New activity in BAAI/CCI4.0-M2-CoT-v1 about 1 month ago

Very good dataset

#3 opened about 1 month ago by huu-ontocord

New activity in nvidia/Nemotron-MIND about 1 month ago

Document ID -> OWM Urls

#2 opened about 1 month ago by huu-ontocord

New activity in open-r1/Mixture-of-Thoughts about 1 month ago

Request a license

#1 opened about 2 months ago by TuzelKO

New activity in openbmb/DCAD-2000 2 months ago

Missing some fineweb-2

#2 opened 2 months ago by huu-ontocord

commented a paper 5 months ago

Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization

Paper • 2502.19261 • Published Feb 26 • 7

New activity in Qwen/Qwen2.5-7B-Instruct-1M 5 months ago

Did the base 1M start from qwen2.5-7b?

#4 opened 5 months ago by huu-ontocord

New activity in HuggingFaceTB/smoltalk 5 months ago

Update SmolTalk distilabel pipelines link.

#6 opened 5 months ago by tranhd95

New activity in HuggingFaceFV/finevideo 7 months ago

Cleanup TTS

#16 opened 10 months ago by huu-ontocord

New activity in lmms-lab/muchomusic 7 months ago

Can you add a description of dataset and license?

#3 opened 7 months ago by huu-ontocord
