@hannayukhymenko on Hugging Face: "Releasing the Jupyter Agent Dataset! 🚀 Built from 7 TB of real Kaggle…"

hannayukhymenko

posted an update 6 days ago

Releasing the Jupyter Agent Dataset! 🚀

Built from 7 TB of real Kaggle datasets and 20k notebooks, with real code execution traces generated using Qwen3-Coder and E2B.
Training on this data dramatically improves a model's ability to execute code and analyze data.

We ( @baptistecolle @hannayukhymenko @lvwerra ) have created a novel synthetic data generation pipeline with efficient scaffolding, which gives a big performance boost after training your coding agent 🔥 Starting from real Kaggle notebooks and datasets, we generate synthetic notebooks that aim to analyze the datasets and answer factual questions about them more efficiently. We simulate a real code execution environment either by prompting LLMs or by running the code in E2B sandboxes. The result is a dataset of 50k+ high-quality LLM-generated notebooks that can help your agent get better at data analysis and question answering.
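To make the idea of a code execution trace concrete, here is a minimal sketch, not the actual pipeline. It runs notebook-style cells in order in one shared namespace and records what each cell printed and whether it failed. The function name `run_cells` and the trace fields are made up for illustration, and plain `exec()` stands in for an isolated environment such as an E2B sandbox, so it offers no isolation at all.

```python
import contextlib
import io
import traceback


def run_cells(cells):
    """Run code cells in order in one shared namespace and record a trace.

    Hypothetical helper for illustration only. exec() is NOT a sandbox;
    it stands in for an isolated environment such as E2B.
    """
    namespace = {}  # shared state across cells, like a notebook kernel
    trace = []
    for index, source in enumerate(cells):
        stdout = io.StringIO()
        error = None
        try:
            # Capture everything the cell prints
            with contextlib.redirect_stdout(stdout):
                exec(source, namespace)
        except Exception:
            # Keep only the final traceback line, e.g. "NameError: ..."
            error = traceback.format_exc().strip().splitlines()[-1]
        trace.append(
            {
                "cell": index,
                "source": source,
                "stdout": stdout.getvalue(),
                "error": error,
            }
        )
        if error is not None:
            break  # stop at the first failing cell
    return trace


# Toy "notebook": two working cells and one that fails
cells = [
    "prices = [3.0, 4.5, 6.0]",
    "print(sum(prices) / len(prices))",
    "print(undefined_name)",
]
trace = run_cells(cells)
for step in trace:
    print(step["cell"], repr(step["stdout"]), step["error"])
```

Each trace entry pairs a cell's source with its real output or error, which is the kind of record a model can be trained on. In a real setup the `exec()` call would be replaced by a request to an isolated sandbox.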

Link: data-agents/jupyter-agent-dataset

donhuvy

4 days ago

Please help me, the Space has an error.

hannayukhymenko

4 days ago

Does https://huggingface.co/spaces/lvwerra/jupyter-agent-2 work for you?
