Update README.md
README.md
CHANGED
@@ -164,14 +164,6 @@ Benchmarking is one of the most important procedures during model acceleration.
 
 ### Latency benchmarks
 
-__100 input/300 output; tok/s:__
-
-| GPU/Model | S | M | L | XL | Original | W8A8, int8 |
-|-----------|-----|---|---|----|----------|------------|
-| H100 | 90 | 82 | 72 | 54 | 41 | 95 | - |
-| L40S | 25 | 23 | 20 | -1 | -1 | 27 | - |
-
-
 ### Performance by Context Size
 
 The tables below show performance (tokens per second) for different input context sizes across different GPU models and batch sizes: