Pinkstack committed on
Commit 2b47c1b · verified · 1 Parent(s): c7dcc7f

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -50,7 +50,7 @@ Advanced, high-quality and **lite** reasoning for a tiny size that you can run o
 
  At original quality, it runs at ~400 tokens/second on a single H100 Nvidia GPU from Friendli.
 
- Trained similarly to Deepseek R1, we used Smollm2 as a base model, then we've SFT fine tuned on reasoning using our own private superthoughts instruct dataset which includes a mix of code, website generation, day-to-day chats, math and counting problems. And then we modified the tokenizer slightly, after the SFT fine tuning we used GRPO to further amplify it's mathematics & problem solving abilities.
+ Trained similarly to Deepseek R1, we used Smollm2 as a base model, then we've SFT fine tuned on reasoning using our own private superthoughts instruct dataset which includes a mix of code, website generation, day-to-day chats, math and counting problems. And then we modified the tokenizer slightly, after the SFT fine tuning we used Grpo (arXiv:2402.03300) to further amplify it's mathematics & problem solving abilities.
 
  <div style="background-color: #ffebee; padding: 16px; border-radius: 4px; border-left: 4px solid #ef5350;">
  <h1 style="color: #c62828; margin: 0 0 8px 0;">⚠️ WARNING</h1>