---
license: cc-by-nc-4.0
---
<div style="width: auto; margin-left: auto; margin-right: auto; background-color:black">
<img src="https://assets-global.website-files.com/6423879a8f63c1bb18d74bfa/648818d56d04c3bdf36d71ab_Refuel_rev8-01_ts-p-1600.png" alt="Refuel.ai" style="width: 100%; min-width: 400px; display: block; margin: auto;">
</div>

## Model Details

We’re thrilled to introduce RefuelLLM-2 and RefuelLLM-2-small, the next version of our large language models purpose-built for data labeling, enrichment, and cleaning.

1. RefuelLLM-2 (83.82%) outperforms all current state-of-the-art LLMs, including GPT-4-Turbo (80.88%), Claude-3-Opus (79.19%) and Gemini-1.5-Pro (74.59%), across a benchmark of ~30 data labeling tasks.
2. RefuelLLM-2 is a Mixtral-8x7B base model, trained on a corpus of 2750+ datasets spanning tasks such as classification, reading comprehension, structured attribute extraction and entity resolution.
3. RefuelLLM-2-small (79.67%), aka Llama-3-Refueled, outperforms all comparable LLMs including Claude-3-Sonnet (70.99%), Claude-3-Haiku (69.23%) and GPT-3.5-Turbo (68.13%). The model was trained with the same recipe as RefuelLLM-2, but on top of the Llama-3-8B base.

As part of this announcement, we are open-sourcing RefuelLLM-2-small for the community to build on top of.

**Model developers** Refuel AI

**Input** Models accept text input only.

**Output** Models generate text only.

**Model Architecture** RefuelLLM-2-small is built on top of Llama-3-8B-Instruct, an auto-regressive language model that uses an optimized transformer architecture.

**Model Release Date** May 8, 2024.

## How to use

This repository contains weights for RefuelLLM-2-small that are compatible with the Hugging Face `transformers` library.

### Use with transformers

See the snippet below for usage with Transformers:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model weights from the Hugging Face Hub.
model_id = "refuelai/Llama-3-Refueled"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

# A single-turn labeling request, formatted with the model's chat template.
messages = [{"role": "user", "content": "Is this comment toxic or non-toxic: RefuelLLM is the new way to label text data!"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)

outputs = model.generate(inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```

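The same chat-template interface extends to other labeling tasks. The snippet below is a minimal sketch, not part of the official examples: the prompts are made up for illustration, it reuses the `model` and `tokenizer` loaded above, and it decodes only the newly generated tokens rather than the full sequence.

```python
# Illustrative prompts only: one classification task and one structured-extraction task.
prompts = [
    "Is this comment toxic or non-toxic: The support team resolved my issue within minutes!",
    "Extract the product name and price as JSON from: 'The Acme X100 blender is on sale for $79.99.'",
]

for prompt in prompts:
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, return_tensors="pt", add_generation_prompt=True
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=64)
    # Slice off the prompt tokens so only the model's answer is printed.
    completion = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
    print(f"{prompt}\n-> {completion}\n")
```
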
## Training Data

RefuelLLM-2 and RefuelLLM-2-small were both trained on over 4 billion tokens, spanning 2750+ NLP tasks. Our training collection consists primarily of:
1. Human-annotated datasets like Flan, Task Source, and the Aya collection
2. Synthetic datasets like OpenOrca, OpenHermes and WizardLM
3. Proprietary datasets developed or licensed by Refuel

## Benchmarks

In this section, we report the results for Refuel models on our benchmark of labeling tasks. For details on the methodology, see [here](https://refuel.ai/blog-posts/announcing-refuel-llm-2).

<table>
<tr><th rowspan="2">Provider</th><th rowspan="2">Model</th><th colspan="5" style="text-align: center">LLM Output Quality (by task type)</th></tr>
<tr><td>Overall</td><td>Classification</td><td>Reading Comprehension</td><td>Structure Extraction</td><td>Entity Matching</td></tr>
<tr><td>Refuel</td><td>RefuelLLM-2</td><td>83.82%</td><td>84.94%</td><td>76.03%</td><td>88.16%</td><td>92.00%</td></tr>
<tr><td>OpenAI</td><td>GPT-4-Turbo</td><td>80.88%</td><td>81.77%</td><td>72.08%</td><td>84.79%</td><td>97.20%</td></tr>
<tr><td>Refuel</td><td>RefuelLLM-2-small</td><td>79.67%</td><td>81.72%</td><td>70.04%</td><td>84.28%</td><td>92.00%</td></tr>
<tr><td>Anthropic</td><td>Claude-3-Opus</td><td>79.19%</td><td>82.49%</td><td>67.30%</td><td>88.25%</td><td>94.96%</td></tr>
<tr><td>Meta</td><td>Llama3-70B-Instruct</td><td>78.20%</td><td>79.38%</td><td>66.03%</td><td>85.96%</td><td>94.13%</td></tr>
<tr><td>Google</td><td>Gemini-1.5-Pro</td><td>74.59%</td><td>73.52%</td><td>60.67%</td><td>84.27%</td><td>98.48%</td></tr>
<tr><td>Anthropic</td><td>Claude-3-Sonnet</td><td>70.99%</td><td>79.91%</td><td>45.44%</td><td>78.10%</td><td>96.34%</td></tr>
<tr><td>Anthropic</td><td>Claude-3-Haiku</td><td>69.23%</td><td>77.27%</td><td>50.19%</td><td>84.97%</td><td>54.08%</td></tr>
<tr><td>OpenAI</td><td>GPT-3.5-Turbo</td><td>68.13%</td><td>74.39%</td><td>53.21%</td><td>69.40%</td><td>80.41%</td></tr>
<tr><td>Mistral</td><td>Mixtral-8x7B-Instruct</td><td>62.87%</td><td>79.11%</td><td>45.56%</td><td>47.08%</td><td>86.52%</td></tr>
<tr><td>Meta</td><td>Llama3-8B-Instruct</td><td>62.30%</td><td>68.52%</td><td>49.16%</td><td>65.09%</td><td>63.61%</td></tr>
</table>

## Limitations

RefuelLLM-2-small does not have any moderation mechanisms. We look forward to engaging with the community on ways to make the model reliably respect guardrails, allowing for deployment in environments requiring moderated outputs.
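
For deployments that need moderated outputs today, one workable pattern is to gate the model's responses behind an external safety check. The sketch below is hypothetical: `passes_moderation` is a placeholder for whatever moderation classifier or service you already operate, not a component shipped with this model, and the generation code reuses the `model` and `tokenizer` from the usage example above.

```python
def passes_moderation(text: str) -> bool:
    # Placeholder: swap in your own moderation classifier or API call here.
    # This stub only rejects empty outputs.
    return bool(text.strip())

def moderated_label(prompt: str, max_new_tokens: int = 64) -> str:
    # Generate a label with Llama-3-Refueled, then withhold it unless it
    # clears the external moderation check above.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, return_tensors="pt", add_generation_prompt=True
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    completion = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
    return completion if passes_moderation(completion) else "[output withheld by moderation]"
```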