Joblib · English · llm · human-feedback · weak supervision · data filtering
Christopher Glaze committed
Commit cbaea3e · 1 Parent(s): 4bd3fde

Add blogpost link

Files changed (1): README.md (+4 −0)
README.md CHANGED
@@ -17,6 +17,10 @@ tags:
 - data filtering
 ---

+## See [our blogpost](https://snorkel.ai/how-we-built-a-better-genai-with-programmatic-data-development/) for a more in-depth discussion of the model.
+
+<br>
+
 # Summary
 Instruction tuning has emerged as an important step in developing performant large language models (LLMs) for generative AI tasks. While industry-backed LLMs such as ChatGPT, Bard, Claude, and even the open-source Llama 2 have relied on massive, expensive proprietary datasets unavailable to the public, the open source community has banded together to create similar datasets such as OpenAssistant and Dolly that are available to everyone. However, high variance in the quality and distribution of responses collected by volunteers has limited the quality of resulting open source models.