Introducing AI Sheets: a tool to work with datasets using open AI models!
🧭TL;DR
Hugging Face AI Sheets is a new, open-source tool for building, enriching, and transforming datasets using AI models with no code. The tool can be deployed locally or on the Hub. It lets you use thousands of open models from the Hugging Face Hub via Inference Providers or local models, including gpt-oss from OpenAI!
Useful links
Try the tool for free (no installation required): https://huggingface.co/spaces/aisheets/sheets
Install and run locally: https://github.com/huggingface/sheets
What is AI Sheets
AI Sheets is a no-code tool for building, transforming, and enriching datasets using (open) AI models. It’s tightly integrated with the Hub and the open-source AI ecosystem.
AI Sheets uses an easy-to-learn user interface, similar to a spreadsheet. The tool is built around quick experimentation, starting with small datasets before running long/costly data generation pipelines.
In AI Sheets, you create new columns by writing prompts. You can iterate as many times as you need, and edit or validate cells to teach the model what you want. But more on this later!
What can I use it for
You can use AI Sheets to:
Compare and vibe test models. Imagine you want to test the latest models on your data. You can import a dataset with prompts/questions, and create several columns (one per model) with a prompt like this: "Answer the following: {{prompt}}", where prompt is a column in your dataset. You can validate the results manually or create a new column with an LLM-as-a-judge prompt like this: "Evaluate the responses to the following question: {{prompt}}. Response 1: {{model1}}. Response 2: {{model2}}", where model1 and model2 are columns in your dataset with different model responses.
Improve prompts for your data and specific models. Imagine you want to build an application to process customer requests and give automatic answers. You can load a sample dataset with customer requests and iterate with different prompts and models to generate responses. One cool feature of AI Sheets is that you can provide feedback by editing or validating cells. These example cells will be added to your prompts automatically. You can think of it as a tool to fine-tune prompts and add few-shot examples to them very efficiently, by looking at your data in real time!
Transform a dataset. Imagine you want to clean up a column of your dataset. You can add a new column with a prompt like "Remove extra punctuation marks from the following text: {{text}}", where text is a column in your dataset containing the texts you want to clean up.
Classify a dataset. Imagine you want to classify some content in your dataset. You can add a new column with a prompt like "Categorize the following text: {{text}}", where text is a column in your dataset containing the texts you want to categorize.
Analyze a dataset. Imagine you want to extract the main ideas in your dataset. You can add a new column with a prompt like this: "Extract the most important ideas from the following: {{text}}", where text is a column in your dataset containing the texts you want to analyze.
Enrich a dataset. Imagine you have a dataset with addresses that are missing zip codes. You can add a new column with a prompt like this: "Find the zip code of the following address: {{address}}" (in this case, you must enable the "Search the web" option to ensure accurate results).
Generate a synthetic dataset. Imagine you need a dataset with realistic emails, but the data is not available for data privacy reasons. You can create a dataset with a prompt like this: "Write a short description of a professional in the field of pharma companies" and name the column person_bio. Then you can create another column with a prompt like this: "Write a realistic professional email as it was written by the following person: {{person_bio}}".
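To make the {{column}} placeholder idea concrete, here is a minimal Python sketch of how a prompt template could be filled in from one row of a dataset. It is an illustration only: the function name render_prompt, the regular expression, and the sample row are assumptions made for this example, not AI Sheets internals.

```python
import re

# Matches {{column_name}} placeholders, allowing optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def render_prompt(template: str, row: dict) -> str:
    """Replace every {{column}} placeholder with the value from `row`."""

    def substitute(match: re.Match) -> str:
        column = match.group(1)
        if column not in row:
            raise KeyError(f"Row has no column named {column!r}")
        return str(row[column])

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical row: one question plus the answers of two models.
row = {"prompt": "Name the capital of France", "model1": "Paris", "model2": "Lyon"}

answer_prompt = render_prompt("Answer the following: {{prompt}}", row)
judge_prompt = render_prompt(
    "Evaluate the responses to the following question: {{prompt}}. "
    "Response 1: {{model1}}. Response 2: {{model2}}",
    row,
)
print(answer_prompt)
print(judge_prompt)
```

Each row produces its own prompt, which is why a single column prompt can generate a different cell for every row.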
Now let’s dive into how to use it!
How to use it
AI Sheets gives you two ways to start: import existing data or generate a dataset from scratch. Once your data is loaded, you can refine it by adding columns, editing cells, and regenerating content.
Getting started
To get started, either create a dataset from scratch by describing it in natural language, or import an existing dataset.
Generate Dataset from Scratch
Best for: Getting familiar with AI Sheets, brainstorming, rapid experiments, and creating test datasets.
Think of this as an auto-dataset or prompt-to-dataset feature—you describe what you want, and AI Sheets creates the entire dataset structure and content for you.
When to use this:
- You're exploring AI Sheets for the first time
- You need synthetic data for testing or prototyping
- Data accuracy and diversity are not critical (e.g., brainstorming use cases, quick research, generating test datasets)
- You want to experiment with ideas quickly
How it works:
- Describe the dataset you want in the prompt area
- Example: "A list of fictional startups with name, industry, and slogan"
- AI Sheets generates the schema and creates 5 sample rows
- Extend to up to 1,000 rows or modify the prompt to change structure
Example
If you type this prompt: "cities of the world, alongside countries they belong to and a landmark image for each, generated in Ghibli style", AI Sheets will automatically generate a dataset with three columns.
This dataset contains only five rows, but you can add more cells by dragging down on each column, including the image one! You can also write items in any of the cells and complete the others by dragging.
The following sections will show you how to iterate and expand the dataset.
Import your dataset (recommended)
Best for: Most use cases where you want to transform, classify, enrich, and analyze real-world data.
This is recommended for most use cases, as importing real data gives you more control and flexibility than starting from scratch.
When to use this:
- You have existing data to transform or enrich using AI models
- You want to generate synthetic data, and accuracy and diversity are important
How it works:
- Upload your data in XLS, TSV, CSV, or Parquet format
- Ensure your file includes at least one column name and one row of data
- Upload up to 1,000 rows (unlimited columns)
- Your data appears in a familiar spreadsheet format
Pro tip: If your file contains minimal data, you can manually add more entries by typing directly into the spreadsheet.
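If your source file is larger than the limits above, you can trim it before uploading. The following is a minimal sketch using only the Python standard library; the function name prepare_csv and the sample data are assumptions made for this example. It checks for at least one column name and one data row, then keeps the header plus the first 1,000 rows.

```python
import csv
import io

MAX_ROWS = 1000  # upload limit stated above


def prepare_csv(text: str, max_rows: int = MAX_ROWS) -> str:
    """Validate CSV text and keep the header plus at most `max_rows` data rows."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows or not any(name.strip() for name in rows[0]):
        raise ValueError("The file needs at least one column name")
    header, data = rows[0], rows[1:]
    if not data:
        raise ValueError("The file needs at least one row of data")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(data[:max_rows])
    return out.getvalue()


# Hypothetical sample; max_rows=2 is used only to keep the demo short.
sample = "text,label\nhello,greeting\nbye,farewell\nthanks,gratitude\n"
trimmed = prepare_csv(sample, max_rows=2)
print(trimmed)
```

Taking the first rows is the simplest choice; if row order in your file is meaningful (for example, sorted by category), a random sample may represent the data better.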
Working with your dataset
Once your data is loaded (regardless of how you started), you'll see it in an editable spreadsheet interface. Here's what you need to know:
Understanding AI Sheets
- Imported cells: Manually editable but can't be modified by AI prompts
- AI-generated cells: Can be regenerated and refined using prompts and your feedback (edits + thumbs-up)
- New columns: Always AI-powered and fully customizable
Getting Started with AI columns
- Click the "+" button to add a new column
- Choose from recommended actions:
- Extract specific information
- Summarize long text
- Translate content
- Or write custom prompts with "Do something with {{column}}"
Refining and expanding the dataset
Now that you have AI columns, you can improve their results and expand your data. You can improve results by providing feedback through manual edits and likes or by adjusting the column configuration. Both require regeneration to take effect.
1. How to add more cells
- Drag down: From the last cell in a column to generate additional rows immediately
- No regeneration needed - new cells are created instantly
- You can use this to regenerate errored cells too
2. Manual editing and feedback
- Edit cells: Click any cell to edit content directly - this gives the model examples of your preferred output
- Like results: Use thumbs-up to mark examples of good output
- Regenerate to apply feedback to other cells in the column.
Under the hood, these manually edited and liked cells will be used as few-shot examples for generating the cells when you regenerate or add more cells in the column!
3. Adjust column configuration
Change the prompt, switch models or providers, or modify settings, then regenerate to get better results.
Rewrite the prompt
- Each column has its own generation prompt
- Edit anytime to change or improve output
- Column regenerates with new results
Switch models/providers
- Try different models to compare their performance.
- Some are more accurate, creative, or structured than others for specific tasks.
- Some providers have faster inference and different context lengths; test different providers for the selected model.
Toggle Search
- Enable: Model pulls up-to-date information from the web
- Disable: Offline, model-only generation
Exporting your final dataset to the Hub
Once you're happy with your new dataset, export it to the Hub! This has the additional benefit of generating a config file you can reuse for (1) generating more data with HF Jobs using this script, and (2) reusing the prompts for downstream applications, including the few-shot examples from your edited and liked cells.
Here's an example dataset created with AI Sheets, which produces this config.
Running data generation scripts using HF Jobs
If you want to generate a larger dataset, you can use the above-mentioned config and script, like this:
# script.py runs the pipeline and config.yml holds the prompts.
# --num-rows 100 limits the run to 100 rows; leave it out for the full dataset.
hf jobs uv run \
  -s HF_TOKEN=$HF_TOKEN \
  https://huggingface.co/datasets/aisheets/uv-scripts/raw/main/extend_dataset/script.py \
  --config https://huggingface.co/datasets/dvilasuero/nemotron-personas-kimi-questions/raw/main/config.yml \
  --num-rows 100 \
  nvidia/Nemotron-Personas dvilasuero/nemotron-kimi-qa-distilled
Examples
This section provides examples of datasets you can build with AI Sheets to inspire your next project.
Vibe testing and comparing models
AI Sheets is your perfect companion if you want to test the latest models on different prompts and data you care about.
You just need to import a dataset (or create one from scratch) and then add different columns with the models you want to test.
Then, you can either inspect the results manually or add a column to use LLMs to judge the quality of each model.
Below is an example comparing open frontier models for mini web apps. AI Sheets lets you see the interactive results and play with each app. Additionally, the dataset includes several columns that use LLMs to judge and compare the quality of the apps.
Example dataset exported from a session like the one we just described: https://huggingface.co/datasets/dvilasuero/jsvibes-qwen-gpt-oss-judged
Config:
columns:
  gpt-oss:
    modelName: openai/gpt-oss-120b
    modelProvider: groq
    userPrompt: Create a complete, runnable HTML+JS file implementing {{description}}
    searchEnabled: false
    columnsReferences:
      - description
  eval-qwen-coder:
    modelName: Qwen/Qwen3-Coder-480B-A35B-Instruct
    modelProvider: cerebras
    userPrompt: "Please compare the two apps and tell me which one is better and why:\n\nApp description:\n\n{{description}}\n\nmodel 1:\n\n{{qwen3-coder}}\n\nmodel 2:\n\n{{gpt-oss}}\n\nKeep it very short and focus on whether they work well for the purpose, make sure they work and are not incomplete, and the code quality, not on visual appeal and unrequested features. Assume the models might provide non working solutions, so be careful to assess that\n\nRespond with:\n\nchosen: {model 1, model 2}\n\nreason: ..."
    searchEnabled: false
    columnsReferences:
      - gpt-oss
      - description
      - qwen3-coder
  eval-gpt-oss:
    modelName: openai/gpt-oss-120b
    modelProvider: groq
    userPrompt: "Please compare the two apps and tell me which one is better and why:\n\nApp description:\n\n{{description}}\n\nmodel 1:\n\n{{qwen3-coder}}\n\nmodel 2:\n\n{{gpt-oss}}\n\nKeep it very short and focus on whether they work well for the purpose, make sure they work and are not incomplete, and the code quality, not on visual appeal and unrequested features. Assume the models might provide non working solutions, so be careful to assess that\n\nRespond with:\n\nchosen: {model 1, model 2}\n\nreason: ..."
    searchEnabled: false
    columnsReferences:
      - gpt-oss
      - description
      - qwen3-coder
  eval-kimi:
    modelName: moonshotai/Kimi-K2-Instruct
    modelProvider: groq
    userPrompt: "Please compare the two apps and tell me which one is better and why:\n\nApp description:\n\n{{description}}\n\nmodel 1:\n\n{{qwen3-coder}}\n\nmodel 2:\n\n{{gpt-oss}}\n\nKeep it very short and focus on whether they work well for the purpose, make sure they work and are not incomplete, and the code quality, not on visual appeal and unrequested features. Assume the models might provide non working solutions, so be careful to assess that\n\nRespond with:\n\nchosen: {model 1, model 2}\n\nreason: ..."
    searchEnabled: false
    columnsReferences:
      - gpt-oss
      - description
      - qwen3-coder
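The judge prompts above ask for a reply in the form "chosen: ..." followed by "reason: ...", which makes the verdicts easy to tally once the dataset is exported. The sketch below shows one way to do that in Python. The judge outputs are made up for illustration, and the helper names parse_choice and tally are assumptions; real judge responses may deviate from the requested format (for example, by adding markdown), so unparseable replies are counted separately rather than guessed.

```python
import re
from collections import Counter

# Looks for "chosen: model 1" or "chosen: model 2", ignoring case and spacing.
VERDICT = re.compile(r"chosen:\s*(model\s*[12])", re.IGNORECASE)


def parse_choice(response: str):
    """Return 'model 1' or 'model 2', or None when no verdict is found."""
    match = VERDICT.search(response)
    if match is None:
        return None
    digit = match.group(1)[-1]  # last character is the model number
    return f"model {digit}"


def tally(responses):
    """Count verdicts across judge responses; unparseable ones go under None."""
    return Counter(parse_choice(r) for r in responses)


# Hypothetical judge outputs, not taken from the example dataset.
judge_outputs = [
    "chosen: model 1\n\nreason: model 2 is incomplete",
    "Chosen: Model 2\nreason: model 1 throws on load",
    "chosen: model 1\nreason: both work, model 1 is cleaner",
    "I cannot decide.",
]
counts = tally(judge_outputs)
print(counts)
```

In the config above, model 1 corresponds to the qwen3-coder column and model 2 to the gpt-oss column, so the counts map directly to the two models being compared.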
Add categories to a Hub dataset
AI Sheets can also augment existing datasets and help you with quick data analysis and data science projects that involve analyzing text datasets.
Here's an example of adding categories to an existing Hub dataset.
A cool feature is that you can manually validate or edit the initial categorization outputs and regenerate the full column to improve the results, as seen below:
Config:
columns:
  category:
    modelName: moonshotai/Kimi-K2-Instruct
    modelProvider: groq
    userPrompt: |-
      Categorize the main topics of the following question:
      {{question}}
    prompt: "
      You are a rigorous, intelligent data-processing engine. Generate only the
      requested response format, with no explanations following the user
      instruction. You might be provided with positive, accurate examples of how
      the user instruction must be completed.
      # Examples
      The following are correct, accurate example outputs with respect to the
      user instruction:
      ## Example
      ### Input
      question: Given the area of a parallelogram is 420 square centimeters and
      its height is 35 cm, find the corresponding base. Show all work and label
      your answer.
      ### Output
      Mathematics – Geometry
      ## Example
      ### Input
      question: What is the minimum number of red squares required to ensure
      that each of $n$ green axis-parallel squares intersects 4 red squares,
      assuming the green squares can be scaled and translated arbitrarily
      without intersecting each other?
      ### Output
      Geometry, Combinatorics
      # User instruction
      Categorize the main topics of the following question:
      {{question}}
      # Your response
      "
    searchEnabled: false
    columnsReferences:
      - question
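The prompt field in this config shows how edited and liked cells end up as few-shot examples: a fixed instruction block, an Examples section with one Input/Output pair per validated cell, the user instruction, and a response marker. Here is a minimal Python sketch that assembles a prompt with the same sections. The function name build_prompt, the exact blank-line layout, and the final sample question are assumptions for illustration; the tool's own assembly code may differ.

```python
# Instruction block copied from the config above.
SYSTEM = (
    "You are a rigorous, intelligent data-processing engine. Generate only the "
    "requested response format, with no explanations following the user "
    "instruction. You might be provided with positive, accurate examples of how "
    "the user instruction must be completed."
)


def build_prompt(instruction: str, examples: list) -> str:
    """Assemble instruction block, few-shot examples, and the user instruction.

    `examples` is a list of (inputs, output) pairs, where `inputs` maps
    column names to cell values and `output` is the validated cell.
    """
    parts = [SYSTEM, ""]
    if examples:
        parts += [
            "# Examples",
            "The following are correct, accurate example outputs with respect "
            "to the user instruction:",
            "",
        ]
        for inputs, output in examples:
            parts += ["## Example", "### Input"]
            parts += [f"{name}: {value}" for name, value in inputs.items()]
            parts += ["### Output", output, ""]
    parts += ["# User instruction", instruction, "", "# Your response"]
    return "\n".join(parts)


# One validated cell from the config above, used as a few-shot example.
examples = [
    (
        {
            "question": "Given the area of a parallelogram is 420 square "
            "centimeters and its height is 35 cm, find the corresponding base. "
            "Show all work and label your answer."
        },
        "Mathematics – Geometry",
    )
]

# Hypothetical new row to categorize.
instruction = (
    "Categorize the main topics of the following question:\n"
    "What is the derivative of x^2?"
)
prompt = build_prompt(instruction, examples)
print(prompt)
```

Because the examples come from cells you approved, each edit or thumbs-up changes the prompt used for the remaining cells the next time you regenerate the column.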
Evaluate models with LLMs-as-Judge
Another use case is evaluating model outputs with an LLM-as-a-judge approach. This can be useful for comparing models or assessing the quality of an existing dataset, for example, before fine-tuning a model on a dataset from the Hugging Face Hub.
In the first example, we combined vibe testing with a judge LLM column. Here's the judge prompt:
Example dataset: https://huggingface.co/datasets/dvilasuero/jsvibes-qwen-gpt-oss-judged
Config:
columns:
  object_name:
    modelName: meta-llama/Llama-3.3-70B-Instruct
    modelProvider: groq
    userPrompt: Generate the name of a common day to day object
    searchEnabled: false
    columnsReferences: []
  object_description:
    modelName: meta-llama/Llama-3.3-70B-Instruct
    modelProvider: groq
    userPrompt: Describe a {{object_name}} with adjectives and short word groups separated by commas. No more than 10 words
    searchEnabled: false
    columnsReferences:
      - object_name
  object_image_with_desc:
    modelName: multimodalart/isometric-skeumorphic-3d-bnb
    modelProvider: fal-ai
    userPrompt: RBNBICN, icon, white background, isometric perspective, {{object_name}} , {{object_description}}
    searchEnabled: false
    columnsReferences:
      - object_description
      - object_name
  object_image_without_desc:
    modelName: multimodalart/isometric-skeumorphic-3d-bnb
    modelProvider: fal-ai
    userPrompt: "RBNBICN, icon, white background, isometric perspective, {{object_name}} "
    searchEnabled: false
    columnsReferences:
      - object_name
  glowing_colors:
    modelName: multimodalart/isometric-skeumorphic-3d-bnb
    modelProvider: fal-ai
    userPrompt: "RBNBICN, icon, white background, isometric perspective, {{object_name}}, glowing colors "
    searchEnabled: false
    columnsReferences:
      - object_name
  flux:
    modelName: black-forest-labs/FLUX.1-dev
    modelProvider: fal-ai
    userPrompt: Create an isometric icon for the object {{object_name}} based on {{object_description}}
    searchEnabled: false
    columnsReferences:
      - object_description
      - object_name
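The columnsReferences entries in this config describe which columns each prompt reads from, so they also imply an order in which columns can be generated: a column can only be filled once the columns it references exist. The sketch below derives one valid order with Python's standard graphlib module. Whether AI Sheets or the generation script schedules columns this way is an assumption; the dependency data itself is copied from the config above.

```python
from graphlib import TopologicalSorter

# Column -> columns its prompt references (from columnsReferences above).
references = {
    "object_name": [],
    "object_description": ["object_name"],
    "object_image_with_desc": ["object_description", "object_name"],
    "object_image_without_desc": ["object_name"],
    "glowing_colors": ["object_name"],
    "flux": ["object_description", "object_name"],
}

# static_order() yields each column only after all columns it depends on.
order = list(TopologicalSorter(references).static_order())
print(order)
```

Here object_name always comes first because every other column depends on it, directly or through object_description. Columns that do not depend on each other, such as glowing_colors and object_image_without_desc, can appear in either order.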
Next steps
You can try AI Sheets without installing anything or download and deploy it locally from the GitHub repo. To run it locally and get the most out of it, we recommend subscribing to PRO, which gives you 20x monthly inference usage.
If you have questions or suggestions, let us know in the Community tab or by opening an issue on GitHub.