Joblib · English · llm · human-feedback · weak supervision · data filtering
Christopher Glaze committed
Commit cbaea3e · 1 Parent(s): 4bd3fde

Add blogpost link

Files changed (1): README.md (+4 −0)
README.md CHANGED
@@ -17,6 +17,10 @@ tags:
 - data filtering
 ---

+## See [our blogpost](https://snorkel.ai/how-we-built-a-better-genai-with-programmatic-data-development/) for a more in-depth discussion of the model.
+
+<br>
+
 # Summary
 Instruction tuning has emerged as an important step in developing performant large language models (LLMs) for generative AI tasks. While industry-backed LLMs such as ChatGPT, Bard, Claude, and even the open-source Llama 2 have relied on massive, expensive proprietary datasets unavailable to the public, the open source community has banded together to create similar datasets such as OpenAssistant and Dolly that are available to everyone. However, high variance in the quality and distribution of responses collected by volunteers has limited the quality of resulting open source models.