psynote123 committed
Commit fee6219 · verified · 1 Parent(s): 594ce38

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md (+23 −22)

README.md CHANGED
@@ -43,7 +43,7 @@ __Goals of elastic models:__
 
  > It's important to note that specific quality degradation can vary from model to model. For instance, with an S model, you can have 0.5% degradation as well.
 
-
+ ![Performance Graph](images/performance_graph.png)
  -----
 
  ## Inference
@@ -149,7 +149,7 @@ Benchmarking is one of the most important procedures during model acceleration.
  | Metric/Model | S | M | L | XL | Original | W8A8, int8 |
  |---------------|---|---|---|----|----------|------------|
  | arc_challenge | 65.30 | 66.30 | 66.70 | 66.80 | 66.80 | 65.30 | - |
- | gsm8k | 87.70 | 87.80 | 88.00 | - | - | 87.70 | - |
+ | gsm8k | 87.70 | 88.40 | 87.70 | - | - | 87.70 | - |
  | mmlu | 79.00 | 79.40 | 79.70 | 80.20 | 80.20 | 79.00 | - |
  | piqa | 82.90 | 83.10 | 82.60 | 83.00 | 83.00 | 82.90 | - |
  | winogrande | 78.20 | 79.40 | 79.30 | 79.50 | 79.50 | 78.20 | - |
@@ -160,6 +160,7 @@ Benchmarking is one of the most important procedures during model acceleration.
  * **PIQA**: Evaluates physical commonsense reasoning through questions about everyday physical interactions. Shows model's understanding of real-world physics concepts.
  * **Arc Challenge**: Evaluates grade-school level multiple-choice questions requiring reasoning. Shows model's ability to solve complex reasoning tasks.
  * **Winogrande**: Evaluates commonsense reasoning through sentence completion tasks. Shows model's capability to understand context and resolve ambiguity.
+ * **GSM8K**: GSM8K (Grade School Math 8K) is a dataset of 8.5K high quality linguistically diverse grade school math word problems.
 
  ### Latency benchmarks
 
@@ -167,8 +168,8 @@ __100 input/300 output; tok/s:__
 
  | GPU/Model | S | M | L | XL | Original | W8A8, int8 |
  |-----------|-----|---|---|----|----------|------------|
- | H100 | 90 | -1 | -1 | -1 | -1 | -1 | - |
- | L40S | -1 | -1 | -1 | -1 | -1 | -1 | - |
+ | H100 | 90 | 82 | 72 | 54 | 41 | 95 | - |
+ | L40S | 25 | 24 | 20 | -1 | -1 | 27 | - |
 
 
  ### Performance by Context Size
@@ -181,25 +182,25 @@ The tables below show performance (tokens per second) for different input contexts
 
  | Context | Input Tokens | S | M | L | XL | Original |
  |---------|-------------|---|---|---|----|---------|
- | Small | 93 | 90.3 | - | - | - | - |
- | Medium | 1024 | 89.6 | - | - | - | - |
- | Large | 4096 | 87.5 | - | - | - | - |
+ | Small | 256 | 90.3 | 82.5 | 72.2 | 54.4 | 41.2 | - |
+ | Medium | 1024 | 90.1 | 82.2 | 71.8 | - | 38.8 | - |
+ | Large | 4096 | 88.2 | 81.0 | 70.4 | - | 33.8 | - |
 
  *Batch Size 8:*
 
  | Context | Input Tokens | S | M | L | XL | Original |
  |---------|-------------|---|---|---|----|---------|
- | Small | 93 | 87.3 | - | - | - | - |
- | Medium | 1024 | 79.9 | - | - | - | - |
- | Large | 4096 | 63.2 | - | - | - | - |
+ | Small | 256 | 86.5 | 79.9 | 69.1 | - | 36.7 | - |
+ | Medium | 1024 | 80.3 | 74.9 | 65.1 | - | 29.0 | - |
+ | Large | 4096 | 63.3 | 59.5 | 53.1 | - | 15.5 | - |
 
  *Batch Size 16:*
 
  | Context | Input Tokens | S | M | L | XL | Original |
  |---------|-------------|---|---|---|----|---------|
- | Small | 93 | 85.8 | - | - | - | - |
- | Medium | 1024 | 79.0 | - | - | - | - |
- | Large | 4096 | 62.2 | - | - | - | - |
+ | Small | 256 | 84.7 | 78.1 | 68.0 | - | 32.2 | - |
+ | Medium | 1024 | 79.8 | 73.3 | 64.1 | - | 21.8 | - |
+ | Large | 4096 | 62.5 | 58.1 | 52.7 | - | 9.7 | - |
 
 
  **L40S:**
@@ -208,25 +209,25 @@ The tables below show performance (tokens per second) for different input contexts
 
  | Context | Input Tokens | S | M | L | XL | Original |
  |---------|-------------|---|---|---|----|---------|
- | Small | 93 | - | - | - | - | - |
- | Medium | 1024 | - | - | - | - | - |
- | Large | 4096 | - | - | - | - | - |
+ | Small | 256 | 26.0 | 24.0 | 21.0 | - | - | - |
+ | Medium | 1024 | 25.8 | 23.8 | 20.9 | - | - | - |
+ | Large | 4096 | 25.2 | 23.3 | 20.5 | - | - | - |
 
  *Batch Size 8:*
 
  | Context | Input Tokens | S | M | L | XL | Original |
  |---------|-------------|---|---|---|----|---------|
- | Small | 93 | - | - | - | - | - |
- | Medium | 1024 | - | - | - | - | - |
- | Large | 4096 | - | - | - | - | - |
+ | Small | 256 | 25.3 | 23.4 | 20.5 | - | - | - |
+ | Medium | 1024 | 24.3 | 22.4 | 19.7 | - | - | - |
+ | Large | 4096 | - | - | - | - | - | - |
 
  *Batch Size 16:*
 
  | Context | Input Tokens | S | M | L | XL | Original |
  |---------|-------------|---|---|---|----|---------|
- | Small | 93 | - | - | - | - | - |
- | Medium | 1024 | - | - | - | - | - |
- | Large | 4096 | - | - | - | - | - |
+ | Small | 256 | 24.9 | 22.9 | 20.2 | - | - | - |
+ | Medium | 1024 | 22.8 | 21.1 | - | - | - | - |
+ | Large | 4096 | - | - | - | - | - | - |
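The tok/s figures in the latency tables above come from timing generation (100 input / 300 output tokens per request). The commit does not include the measurement harness, but the idea can be sketched in a model-agnostic way: time one generation call and divide the output token count by elapsed wall-clock time. In the sketch below, the `generate` callable is a hypothetical stand-in for whatever inference call produces the tokens (e.g. a `model.generate(...)` closure); here it is faked with a sleep.

```python
import time
from typing import Callable


def decode_throughput(generate: Callable[[], None], n_output_tokens: int) -> float:
    """Time one generation call and return decode throughput in tokens/sec.

    `generate` is a placeholder for any inference call that emits exactly
    `n_output_tokens` tokens; here we only measure its wall-clock time.
    """
    start = time.perf_counter()
    generate()
    elapsed = time.perf_counter() - start
    return n_output_tokens / elapsed


# Stand-in "model" that just sleeps 50 ms instead of running inference:
tps = decode_throughput(lambda: time.sleep(0.05), n_output_tokens=300)
```

Note that this measures per-request throughput; for the batched tables (batch size 8 and 16), aggregate throughput would additionally scale with the batch size.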