feature request: download as Parquet?

#1
by julien-c - opened

CSV is OK, but Parquet is Hub-native :)

Hugging Face Sheets org

Should be really easy to add! cc @frascuchon

Hugging Face Sheets org

It's done, @julien-c
[Screenshot, 2025-06-12 at 14.10.40]

nice!!! will use it

julien-c changed discussion status to closed

@kszucs @lhoestq is there something special to check here in AISheets to make sure the exported Parquet will chunk/deduplicate nicely?

Not sure what the app uses for the Parquet export. It's likely written in JS, so it's unlikely to use Arrow's Xet-friendly Parquet writer.
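Xet deduplicates at the level of content-defined chunks. That means a new version of a file only deduplicates well against an older one if an edit leaves most chunk boundaries where they were. Below is a toy, standard-library Python sketch of content-defined chunking with a gear-style rolling hash. The chunk-size parameters, the gear table and the test data are illustrative assumptions, not Xet's actual implementation.

```python
import hashlib
import random

# Illustrative parameters only, NOT the values Xet actually uses.
MIN_SIZE = 1024          # never cut a chunk shorter than this
MAX_SIZE = 64 * 1024     # always cut once a chunk reaches this
MASK_BITS = 12           # ~4 KiB expected gap between content-defined cut points
MASK = ((1 << MASK_BITS) - 1) << (64 - MASK_BITS)  # test the top bits of the hash
U64 = (1 << 64) - 1

# "Gear" table: one pseudo-random 64-bit value per byte value, fixed by a seed.
_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def cdc_chunks(data: bytes) -> list[bytes]:
    """Cut wherever the rolling hash of the most recent bytes matches the mask."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        # Shift-and-add: a byte's contribution leaves the 64-bit hash after 64 steps,
        # so once a chunk is >= 64 bytes long the hash depends only on the last 64 bytes.
        h = ((h << 1) + GEAR[b]) & U64
        length = i + 1 - start
        if (length >= MIN_SIZE and (h & MASK) == 0) or length >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def fixed_chunks(data: bytes, size: int = 4096) -> list[bytes]:
    """Baseline: cut every `size` bytes regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared_fraction(old: list[bytes], new: list[bytes]) -> float:
    """Fraction of the new chunks whose content already exists in the old version."""
    seen = {hashlib.sha256(c).digest() for c in old}
    return sum(hashlib.sha256(c).digest() in seen for c in new) / len(new)


original = random.Random(0).randbytes(200_000)
edited = original[:1000] + b"NEWROWS" + original[1000:]  # insert 7 bytes near the start

cdc_shared = shared_fraction(cdc_chunks(original), cdc_chunks(edited))
fixed_shared = shared_fraction(fixed_chunks(original), fixed_chunks(edited))
print(f"content-defined: {cdc_shared:.0%} reused, fixed-size: {fixed_shared:.0%} reused")
```

The insertion only disturbs the chunk that contains it. After that, the content-defined cut points line up again. With fixed-size chunking, every later boundary shifts and almost nothing is reused. This is why a Parquet writer that keeps its layout stable across small edits deduplicates better on Xet.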

Hugging Face Sheets org

Exactly, it's all TypeScript code.

The current export is straightforward, even simplistic:

  1. The data is stored in DuckDB.
  2. We export the file locally:
     COPY (SELECT ...) TO 'file.parquet' (FORMAT PARQUET)
  3. We push the file to the Hub with the Hub JS library.

Here's an example of the result: https://huggingface.co/datasets/dvilasuero/hands_playing_instruments

Hugging Face Sheets org

Thanks, @lhoestq! Does the datasets library do any special preparation before uploading Parquet files, or is using huggingface_hub + hf_xet enough?
