
See unidisc/datasets/preprocessing for instructions on how to preprocess datasets.

We support the following datasets:

  • Cambrian
  • CapsFusion
  • CC12M
  • DataComp1B
  • JourneyDB
  • LAION400M
  • MMC4
  • PixelProse

Additionally, we generated our own synthetic dataset, available here, and provide the generation scripts as well as the raw data.