Upload README.md with huggingface_hub
__Goals of elastic models:__

> It's important to note that the exact quality degradation varies from model to model. For instance, an S model may show as little as 0.5% degradation.

-----

## Inference
Benchmarking is one of the most important procedures during model acceleration.

| Metric/Model  | S     | M     | L     | XL    | Original | W8A8, int8 |
|---------------|-------|-------|-------|-------|----------|------------|
| arc_challenge | 65.30 | 66.30 | 66.70 | 66.80 | 66.80    | 65.30      |
| gsm8k         | 87.70 | 88.40 | 87.70 | -     | -        | 87.70      |
| mmlu          | 79.00 | 79.40 | 79.70 | 80.20 | 80.20    | 79.00      |
| piqa          | 82.90 | 83.10 | 82.60 | 83.00 | 83.00    | 82.90      |
| winogrande    | 78.20 | 79.40 | 79.30 | 79.50 | 79.50    | 78.20      |
* **PIQA**: Evaluates physical commonsense reasoning through questions about everyday physical interactions. Shows the model's understanding of real-world physics concepts.
* **Arc Challenge**: Evaluates grade-school level multiple-choice questions requiring reasoning. Shows the model's ability to solve complex reasoning tasks.
* **Winogrande**: Evaluates commonsense reasoning through sentence completion tasks. Shows the model's capability to understand context and resolve ambiguity.
* **GSM8K**: Evaluates grade-school math word problems drawn from a dataset of 8.5K high-quality, linguistically diverse problems. Shows the model's multi-step arithmetic reasoning.
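The section does not state which evaluation harness or settings produced the table above. One common way to reproduce accuracy numbers for these tasks is EleutherAI's lm-evaluation-harness; the sketch below is a minimal, hypothetical example, where the model repo id, dtype, and batch size are placeholders rather than the configuration actually used here.

```python
# Hypothetical reproduction sketch using lm-evaluation-harness (pip install lm-eval).
# The repo id, dtype, and batch size are placeholders -- the README does not specify
# the exact evaluation configuration behind the Metric/Model table.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=<model-repo-id>,dtype=bfloat16",  # placeholder repo id
    tasks=["arc_challenge", "gsm8k", "mmlu", "piqa", "winogrande"],
    batch_size=8,
)

# Print the float-valued metrics per task, mirroring the rows of the table above.
for task, metrics in results["results"].items():
    print(task, {k: v for k, v in metrics.items() if isinstance(v, float)})
```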
### Latency benchmarks

__100 input / 300 output tokens; tok/s:__

| GPU/Model | S  | M  | L  | XL | Original | W8A8, int8 |
|-----------|----|----|----|----|----------|------------|
| H100      | 90 | 82 | 72 | 54 | 41       | 95         |
| L40S      | 25 | 24 | 20 | -  | -        | 27         |
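The numbers above are decoding throughput for 100 input and 300 output tokens. The serving engine used for these measurements is not specified in this section; the sketch below shows one plausible way to take such a measurement with plain `transformers` generation, using a placeholder repo id. A naive loop like this will generally be slower than an optimized engine, so it illustrates the procedure rather than reproduces the table.

```python
# Hypothetical latency sketch: time generation of 300 output tokens from a
# 100-token prompt and report tokens/second. Placeholder repo id; plain
# transformers generation, not the (unspecified) engine behind the table above.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "<model-repo-id>"  # placeholder
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16).to("cuda")

# Build a prompt of exactly 100 input tokens.
input_ids = tok("benchmark " * 200, return_tensors="pt").input_ids[:, :100].to("cuda")

# Warm-up run so CUDA kernels and caches are initialized before timing.
model.generate(input_ids, max_new_tokens=8, do_sample=False)

torch.cuda.synchronize()
start = time.perf_counter()
out = model.generate(input_ids, max_new_tokens=300, min_new_tokens=300, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = out.shape[1] - input_ids.shape[1]
print(f"{new_tokens / elapsed:.1f} tok/s")  # compare against the 100/300 table above
```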
### Performance by Context Size

The tables below show performance (tokens per second) for different input context sizes and batch sizes.

**H100:**

| Context | Input Tokens | S    | M    | L    | XL   | Original |
|---------|--------------|------|------|------|------|----------|
| Small   | 256          | 90.3 | 82.5 | 72.2 | 54.4 | 41.2     |
| Medium  | 1024         | 90.1 | 82.2 | 71.8 | -    | 38.8     |
| Large   | 4096         | 88.2 | 81.0 | 70.4 | -    | 33.8     |

*Batch Size 8:*

| Context | Input Tokens | S    | M    | L    | XL | Original |
|---------|--------------|------|------|------|----|----------|
| Small   | 256          | 86.5 | 79.9 | 69.1 | -  | 36.7     |
| Medium  | 1024         | 80.3 | 74.9 | 65.1 | -  | 29.0     |
| Large   | 4096         | 63.3 | 59.5 | 53.1 | -  | 15.5     |

*Batch Size 16:*

| Context | Input Tokens | S    | M    | L    | XL | Original |
|---------|--------------|------|------|------|----|----------|
| Small   | 256          | 84.7 | 78.1 | 68.0 | -  | 32.2     |
| Medium  | 1024         | 79.8 | 73.3 | 64.1 | -  | 21.8     |
| Large   | 4096         | 62.5 | 58.1 | 52.7 | -  | 9.7      |
**L40S:**

| Context | Input Tokens | S    | M    | L    | XL | Original |
|---------|--------------|------|------|------|----|----------|
| Small   | 256          | 26.0 | 24.0 | 21.0 | -  | -        |
| Medium  | 1024         | 25.8 | 23.8 | 20.9 | -  | -        |
| Large   | 4096         | 25.2 | 23.3 | 20.5 | -  | -        |

*Batch Size 8:*

| Context | Input Tokens | S    | M    | L    | XL | Original |
|---------|--------------|------|------|------|----|----------|
| Small   | 256          | 25.3 | 23.4 | 20.5 | -  | -        |
| Medium  | 1024         | 24.3 | 22.4 | 19.7 | -  | -        |
| Large   | 4096         | -    | -    | -    | -  | -        |

*Batch Size 16:*

| Context | Input Tokens | S    | M    | L    | XL | Original |
|---------|--------------|------|------|------|----|----------|
| Small   | 256          | 24.9 | 22.9 | 20.2 | -  | -        |
| Medium  | 1024         | 22.8 | 21.1 | -    | -  | -        |
| Large   | 4096         | -    | -    | -    | -  | -        |
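A sweep over the same grid (input contexts of 256, 1024, and 4096 tokens at batch sizes 8 and 16) can be scripted in the same spirit. The sketch below again uses a placeholder repo id and plain `transformers` generation rather than the unspecified engine behind the tables, and it assumes 300 output tokens per request; throughput is total generated tokens across the batch divided by wall-clock time.

```python
# Hypothetical sweep over input context sizes and batch sizes, reporting tok/s
# (total new tokens across the batch / wall-clock seconds). Placeholder repo id;
# the 300-token output length is an assumption, not stated in the README.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "<model-repo-id>"  # placeholder
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16).to("cuda")

def tok_s(context_len: int, batch_size: int, new_tokens: int = 300) -> float:
    # Build a dummy prompt of the requested length and replicate it across the batch.
    ids = tok("benchmark " * (2 * context_len), return_tensors="pt").input_ids[:, :context_len]
    ids = ids.repeat(batch_size, 1).to("cuda")
    model.generate(ids, max_new_tokens=8, do_sample=False)  # warm-up
    torch.cuda.synchronize()
    start = time.perf_counter()
    model.generate(ids, max_new_tokens=new_tokens, min_new_tokens=new_tokens, do_sample=False)
    torch.cuda.synchronize()
    return batch_size * new_tokens / (time.perf_counter() - start)

for batch in (8, 16):
    for ctx in (256, 1024, 4096):
        print(f"batch={batch:>2} context={ctx:>4}: {tok_s(ctx, batch):.1f} tok/s")
```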