Existing speech corpora, preprocessed into Hugging Face datasets that share a consistent format and column-naming scheme.
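
As a rough illustration of what a consistent column-naming scheme means in practice, the sketch below maps the differing column names of two corpora onto one shared schema. It is only a sketch: the corpus names (`corpus_a`, `corpus_b`), the original column names, and the target columns (`audio`, `text`, `speaker_id`) are all hypothetical and are not taken from the actual datasets.

```python
# Hypothetical shared schema; the real column names are not stated in the description.
TARGET_COLUMNS = ("audio", "text", "speaker_id")

# Hypothetical per-corpus mappings from original column names to the shared names.
COLUMN_MAPS = {
    "corpus_a": {"wav": "audio", "sentence": "text", "client_id": "speaker_id"},
    "corpus_b": {"file": "audio", "transcript": "text", "speaker": "speaker_id"},
}


def normalize_example(corpus, example):
    """Rename one example's columns to the shared schema and drop any extras."""
    mapping = COLUMN_MAPS[corpus]
    # Columns without a mapping keep their original name.
    renamed = {mapping.get(key, key): value for key, value in example.items()}
    missing = [col for col in TARGET_COLUMNS if col not in renamed]
    if missing:
        raise KeyError(f"{corpus} example is missing columns: {missing}")
    # Emit columns in a fixed order so every corpus ends up with the same layout.
    return {col: renamed[col] for col in TARGET_COLUMNS}


example_a = normalize_example(
    "corpus_a", {"wav": "a.wav", "sentence": "hello", "client_id": "s1"}
)
example_b = normalize_example(
    "corpus_b", {"file": "b.flac", "transcript": "world", "speaker": "s2", "lang": "en"}
)
```

After normalization both examples expose the same columns in the same order, so downstream code does not need per-corpus branches. With the `datasets` library, a comparable renaming step could be done with `Dataset.rename_columns`, which takes a mapping from old to new column names.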