See `unidisc/datasets/preprocessing` for instructions on how to preprocess datasets.

We support the following datasets:

- Cambrian
- CapsFusion
- CC12M
- DataComp1B
- JourneyDB
- LAION400M
- MMC4
- PixelProse

Additionally, we generated our own synthetic dataset, available [on Hugging Face](https://huggingface.co/datasets/aswerdlow/unidisc_hq). We also provide the [generation scripts](../unidisc/datasets/preprocessing/unidisc_dataset/README.md) and the raw data.