Nihal Nayak

nihalnayak

https://nihalnayak.github.io/

Recent Activity

published a dataset 18 days ago

nihalnayak/dataset_selection

Viewer • Updated 18 days ago • 17.2k • 163

updated a dataset 18 days ago

nihalnayak/dataset_selection

Viewer • Updated 18 days ago • 17.2k • 163

updated a dataset 19 days ago

nihalnayak/tulu-v2-unfiltered-folds

Viewer • Updated 19 days ago • 1M • 142

published a dataset 19 days ago

nihalnayak/tulu-v2-unfiltered-folds

Viewer • Updated 19 days ago • 1M • 142

New activity in nvidia/NV-Embed-v2 24 days ago

Update modeling_nvembed.py

#44 opened 24 days ago by

updated a collection 10 months ago

Bonito

Models and datasets from the Bonito paper (https://arxiv.org/abs/2402.18334) • 8 items • Updated Oct 1, 2024 • 1

New activity in BatsResearch/Llama-3.1-8B-bonito-v1 11 months ago

Use in Ollama/lm studio etc

#1 opened 11 months ago by

liked a model 12 months ago

BatsResearch/Llama-3.1-8B-bonito-v1

Text Generation • 8B • Updated Aug 13, 2024 • 160 • 6

updated a model 12 months ago

BatsResearch/Llama-3.1-8B-bonito-v1

Text Generation • 8B • Updated Aug 13, 2024 • 160 • 6

updated a collection about 1 year ago

Bonito

Models and datasets from the Bonito paper (https://arxiv.org/abs/2402.18334) • 8 items • Updated Oct 1, 2024 • 1

liked a dataset about 1 year ago

ludwigschmidt/squadshifts

Updated Jan 18, 2024 • 2.49k • 4

liked a Space about 1 year ago

Bonito

Generate task-specific instructions and responses from text

replied to davanstrien's post about 1 year ago

I just created a Spaces demo for Bonito!

Link: https://huggingface.co/spaces/nihalnayak/bonito

Feel free to clone/copy the code and add it to your collection.

reacted to davanstrien's post with ❤️ about 1 year ago

Several methods/models have recently been shared to generate synthetic data from minimal or no initial seeds, essentially creating data directly from raw text.

IMO, these approaches that rely on smaller models for synthetic data generation are quite valuable for scaling up synthetic data and democratizing access to creating domain-specific synthetic datasets.

I've compiled a collection of Gradio demos showcasing some of these methods here: davanstrien/synthetic-data-generation-demos-667573f248b97360ff3668a5

5 replies

replied to davanstrien's post about 1 year ago

@davanstrien Thanks for the wonderful demos! Just wanted to highlight that we recently released Bonito, an open-source model that converts a user's raw text into an instruction-tuning dataset. It would be awesome if you could add our model to the collection! Happy to help :)

Model: https://huggingface.co/BatsResearch/bonito-v1
Paper: https://arxiv.org/abs/2402.18334
GitHub: https://github.com/BatsResearch/bonito

updated 2 datasets about 1 year ago

BatsResearch/bonito-experiment

Viewer • Updated Jun 11, 2024 • 4.11M • 342 • 10

BatsResearch/ctga-v1

Viewer • Updated Jun 11, 2024 • 1.65M • 70 • 18

updated a model about 1 year ago

BatsResearch/bonito-v1

Text Generation • Updated Jun 11, 2024 • 80 • 96

updated a collection over 1 year ago

Bonito

Models and datasets from the Bonito paper (https://arxiv.org/abs/2402.18334) • 8 items • Updated Oct 1, 2024 • 1