TheStageAI
/

Elastic-Mistral-Small-3.1-24B-Instruct-2503

Text Generation

text2text-generation

psynote123 committed on Jun 2

Commit f9a40c0 · verified · 1 Parent(s): e43f0af

Update README.md

Files changed (1)

README.md +0 -8

@@ -164,14 +164,6 @@ Benchmarking is one of the most important procedures during model acceleration.
 ### Latency benchmarks
-__100 input/300 output; tok/s:__
-| GPU/Model | S   | M | L | XL | Original | W8A8, int8 |
-|-----------|-----|---|---|----|----------|------------|
-| H100 | 90 | 82 | 72 | 54 | 41 | 95 | - |
-| L40S | 25 | 23 | 20 | -1 | -1 | 27 | - |
 ### Performance by Context Size
 The tables below show performance (tokens per second) for different input context sizes across different GPU models and batch sizes:
