Upload README.md with huggingface_hub
__Goals of elastic models:__

> It's important to note that the exact quality degradation varies from model to model. For instance, an S model may show as little as 0.5% degradation.

-----

## Inference
Benchmarking is one of the most important procedures during model acceleration.

| Metric/Model  | S     | M     | L     | XL    | Original | W8A8, int8 |
|---------------|-------|-------|-------|-------|----------|------------|
| arc_challenge | 65.30 | 66.30 | 66.70 | 66.80 | 66.80    | 65.30      |
| gsm8k         | 87.70 | 88.40 | 87.70 | -     | -        | 87.70      |
| mmlu          | 79.00 | 79.40 | 79.70 | 80.20 | 80.20    | 79.00      |
| piqa          | 82.90 | 83.10 | 82.60 | 83.00 | 83.00    | 82.90      |
| winogrande    | 78.20 | 79.40 | 79.30 | 79.50 | 79.50    | 78.20      |
* **PIQA**: Evaluates physical commonsense reasoning through questions about everyday physical interactions. Shows the model's understanding of real-world physics concepts.
* **Arc Challenge**: Evaluates grade-school level multiple-choice questions requiring reasoning. Shows the model's ability to solve complex reasoning tasks.
* **Winogrande**: Evaluates commonsense reasoning through sentence completion tasks. Shows the model's capability to understand context and resolve ambiguity.
* **GSM8K**: Evaluates grade-school math word problems drawn from a dataset of 8.5K high-quality, linguistically diverse problems. Shows the model's multi-step arithmetic reasoning.
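The section does not state which evaluation harness or settings produced the table above. One common way to reproduce accuracy numbers for these tasks is EleutherAI's lm-evaluation-harness; the sketch below is a minimal, hypothetical example, where the model repo id, dtype, and batch size are placeholders rather than the configuration actually used here.

```python
# Hypothetical reproduction sketch using lm-evaluation-harness (pip install lm-eval).
# The repo id, dtype, and batch size are placeholders -- the README does not specify
# the exact evaluation configuration behind the Metric/Model table.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=<model-repo-id>,dtype=bfloat16",  # placeholder repo id
    tasks=["arc_challenge", "gsm8k", "mmlu", "piqa", "winogrande"],
    batch_size=8,
)

# Print the float-valued metrics per task, mirroring the rows of the table above.
for task, metrics in results["results"].items():
    print(task, {k: v for k, v in metrics.items() if isinstance(v, float)})
```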
### Latency benchmarks

__100 input / 300 output tokens; tok/s:__

| GPU/Model | S  | M  | L  | XL | Original | W8A8, int8 |
|-----------|----|----|----|----|----------|------------|
| H100      | 90 | 82 | 72 | 54 | 41       | 95         |
| L40S      | 25 | 24 | 20 | -  | -        | 27         |
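The numbers above are decoding throughput for 100 input and 300 output tokens. The serving engine used for these measurements is not specified in this section; the sketch below shows one plausible way to take such a measurement with plain `transformers` generation, using a placeholder repo id. A naive loop like this will generally be slower than an optimized engine, so it illustrates the procedure rather than reproduces the table.

```python
# Hypothetical latency sketch: time generation of 300 output tokens from a
# 100-token prompt and report tokens/second. Placeholder repo id; plain
# transformers generation, not the (unspecified) engine behind the table above.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "<model-repo-id>"  # placeholder
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16).to("cuda")

# Build a prompt of exactly 100 input tokens.
input_ids = tok("benchmark " * 200, return_tensors="pt").input_ids[:, :100].to("cuda")

# Warm-up run so CUDA kernels and caches are initialized before timing.
model.generate(input_ids, max_new_tokens=8, do_sample=False)

torch.cuda.synchronize()
start = time.perf_counter()
out = model.generate(input_ids, max_new_tokens=300, min_new_tokens=300, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = out.shape[1] - input_ids.shape[1]
print(f"{new_tokens / elapsed:.1f} tok/s")  # compare against the 100/300 table above
```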
### Performance by Context Size

The tables below show performance (tokens per second) for different input context sizes and batch sizes.

**H100:**

| Context | Input Tokens | S    | M    | L    | XL   | Original |
|---------|--------------|------|------|------|------|----------|
| Small   | 256          | 90.3 | 82.5 | 72.2 | 54.4 | 41.2     |
| Medium  | 1024         | 90.1 | 82.2 | 71.8 | -    | 38.8     |
| Large   | 4096         | 88.2 | 81.0 | 70.4 | -    | 33.8     |

*Batch Size 8:*

| Context | Input Tokens | S    | M    | L    | XL | Original |
|---------|--------------|------|------|------|----|----------|
| Small   | 256          | 86.5 | 79.9 | 69.1 | -  | 36.7     |
| Medium  | 1024         | 80.3 | 74.9 | 65.1 | -  | 29.0     |
| Large   | 4096         | 63.3 | 59.5 | 53.1 | -  | 15.5     |

*Batch Size 16:*

| Context | Input Tokens | S    | M    | L    | XL | Original |
|---------|--------------|------|------|------|----|----------|
| Small   | 256          | 84.7 | 78.1 | 68.0 | -  | 32.2     |
| Medium  | 1024         | 79.8 | 73.3 | 64.1 | -  | 21.8     |
| Large   | 4096         | 62.5 | 58.1 | 52.7 | -  | 9.7      |
**L40S:**

| Context | Input Tokens | S    | M    | L    | XL | Original |
|---------|--------------|------|------|------|----|----------|
| Small   | 256          | 26.0 | 24.0 | 21.0 | -  | -        |
| Medium  | 1024         | 25.8 | 23.8 | 20.9 | -  | -        |
| Large   | 4096         | 25.2 | 23.3 | 20.5 | -  | -        |

*Batch Size 8:*

| Context | Input Tokens | S    | M    | L    | XL | Original |
|---------|--------------|------|------|------|----|----------|
| Small   | 256          | 25.3 | 23.4 | 20.5 | -  | -        |
| Medium  | 1024         | 24.3 | 22.4 | 19.7 | -  | -        |
| Large   | 4096         | -    | -    | -    | -  | -        |

*Batch Size 16:*

| Context | Input Tokens | S    | M    | L    | XL | Original |
|---------|--------------|------|------|------|----|----------|
| Small   | 256          | 24.9 | 22.9 | 20.2 | -  | -        |
| Medium  | 1024         | 22.8 | 21.1 | -    | -  | -        |
| Large   | 4096         | -    | -    | -    | -  | -        |
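A sweep over the same grid (input contexts of 256, 1024, and 4096 tokens at batch sizes 8 and 16) can be scripted in the same spirit. The sketch below again uses a placeholder repo id and plain `transformers` generation rather than the unspecified engine behind the tables, and it assumes 300 output tokens per request; throughput is total generated tokens across the batch divided by wall-clock time.

```python
# Hypothetical sweep over input context sizes and batch sizes, reporting tok/s
# (total new tokens across the batch / wall-clock seconds). Placeholder repo id;
# the 300-token output length is an assumption, not stated in the README.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "<model-repo-id>"  # placeholder
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16).to("cuda")

def tok_s(context_len: int, batch_size: int, new_tokens: int = 300) -> float:
    # Build a dummy prompt of the requested length and replicate it across the batch.
    ids = tok("benchmark " * (2 * context_len), return_tensors="pt").input_ids[:, :context_len]
    ids = ids.repeat(batch_size, 1).to("cuda")
    model.generate(ids, max_new_tokens=8, do_sample=False)  # warm-up
    torch.cuda.synchronize()
    start = time.perf_counter()
    model.generate(ids, max_new_tokens=new_tokens, min_new_tokens=new_tokens, do_sample=False)
    torch.cuda.synchronize()
    return batch_size * new_tokens / (time.perf_counter() - start)

for batch in (8, 16):
    for ctx in (256, 1024, 4096):
        print(f"batch={batch:>2} context={ctx:>4}: {tok_s(ctx, batch):.1f} tok/s")
```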