timaeus's Collections

Datasets: Pile Subsets

Datasets of 100k rows each, filtered from https://huggingface.co/datasets/monology/pile-uncopyrighted. They don't include Books3, BookCorpus2, OpenSubtitles, YTSub
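
A rough sketch of how one such 100k-row subset could be carved out of the source dataset. It is not the collection author's actual procedure. It assumes each record carries its Pile component name under meta["pile_set_name"], and the helper names filter_subset and take_subset are hypothetical. The toy records stand in for the real data so the example runs offline.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


def filter_subset(records: Iterable[Record], subset_name: str) -> Iterator[Record]:
    """Yield only records whose meta["pile_set_name"] equals subset_name.

    Assumption: the component name lives under meta["pile_set_name"].
    """
    for record in records:
        meta = record.get("meta") or {}
        if meta.get("pile_set_name") == subset_name:
            yield record


def take_subset(
    records: Iterable[Record], subset_name: str, limit: int = 100_000
) -> List[Record]:
    """Collect at most `limit` matching records, stopping as soon as it is reached."""
    return list(islice(filter_subset(records, subset_name), limit))


if __name__ == "__main__":
    # Toy stand-in for the real stream. With the `datasets` library installed,
    # the records could instead come from something like:
    #   load_dataset("monology/pile-uncopyrighted", split="train", streaming=True)
    toy_records = [
        {"text": "paper one", "meta": {"pile_set_name": "ArXiv"}},
        {"text": "some code", "meta": {"pile_set_name": "Github"}},
        {"text": "paper two", "meta": {"pile_set_name": "ArXiv"}},
        {"text": "paper three", "meta": {"pile_set_name": "ArXiv"}},
    ]
    for row in take_subset(toy_records, "ArXiv", limit=2):
        print(row["text"])
```

Because the filter is a generator and islice stops at the limit, this approach works on a streamed dataset without reading the whole source.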