psynote123 committed
Commit fedd277 · verified · Parent: b8efeeb

Update README.md

Files changed (1): README.md (+24 -13)
README.md CHANGED
@@ -50,6 +50,8 @@ __Goals of elastic models:__
 
 ## Inference
 
+ > Compiled versions are currently available only for batch sizes 1, 2, and 4. Other versions are not yet accessible. Stay tuned for updates!
+ 
 To infer our models, you just need to replace the `transformers` import with `elastic_models.transformers`:
 
 ```python
@@ -115,7 +117,7 @@ print(f"# A:\n{output}\n")
 ```
 
 __System requirements:__
- * GPUs: Nvidia GeForce RTX 4090
+ * GPUs: Nvidia GeForce RTX 4090, Nvidia GeForce RTX 5090
 * CPU: AMD, Intel
 * Python: 3.10-3.12
 
@@ -128,8 +130,18 @@ pip install elastic_models[nvidia]\
 --index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple\
 --extra-index-url https://pypi.nvidia.com\
 --extra-index-url https://pypi.org/simple
- 
 pip install flash_attn==2.7.3 --no-build-isolation
+ 
+ # or, for Blackwell (RTX 50-series) support
+ pip install elastic_models[blackwell]\
+ --index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple\
+ --extra-index-url https://pypi.nvidia.com\
+ --extra-index-url https://pypi.org/simple
+ pip install torch==2.7.0+cu128 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
+ # Download the appropriate flash-attention wheel for your system from https://github.com/Zarrac/flashattention-blackwell-wheels-whl-ONLY-5090-5080-5070-5060-flash-attention-/releases/tag/FlashAttention
+ mv flash_attn-2.7.4.post1-rtx5090-torch2.7.0cu128cxx11abiTRUE-cp311-linux_x86_64.whl flash_attn-2.7.4.post1-0rtx5090torch270cu128cxx11abiTRUE-cp311-cp311-linux_x86_64.whl
+ pip install flash_attn-2.7.4.post1-0rtx5090torch270cu128cxx11abiTRUE-cp311-cp311-linux_x86_64.whl
+ 
 pip uninstall apex
 ```
@@ -162,7 +174,6 @@ Benchmarking is one of the most important procedures during model acceleration.
 * **PIQA**: Evaluates physical commonsense reasoning through questions about everyday physical interactions. Shows the model's understanding of real-world physics concepts.
 * **Arc Challenge**: Evaluates grade-school multiple-choice questions that require reasoning. Shows the model's ability to solve complex reasoning tasks.
 * **Winogrande**: Evaluates commonsense reasoning through sentence-completion tasks. Shows the model's capability to understand context and resolve ambiguity.
- * **GSM8K**: GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade-school math word problems.
 
 
 ### Performance by Context Size
@@ -198,31 +209,31 @@ The tables below show performance (tokens per second) for different input contexts:
 | Large | 4096 | 52.5 | - | - | - | - |
 
- <!-- **RTX 5090:**
+ **RTX 5090:**
 
 *Batch Size 1:*
 
 | Context | Input Tokens | S | M | L | XL | Original |
 |---------|-------------|---|---|---|----|---------|
- | Small | 256 | - | - | - | - | - |
- | Medium | 1024 | - | - | - | - | - |
- | Large | 4096 | - | - | - | - | - |
+ | Small | 256 | 100.2 | 88.8 | 81.3 | - | 48.7 |
+ | Medium | 1024 | 99.4 | 88.3 | 80.7 | - | 47.2 |
+ | Large | 4096 | 94.9 | 84.6 | 77.7 | - | 41.1 |
 
 *Batch Size 2:*
 
 | Context | Input Tokens | S | M | L | XL | Original |
 |---------|-------------|---|---|---|----|---------|
- | Small | 256 | - | - | - | - | - |
- | Medium | 1024 | - | - | - | - | - |
- | Large | 4096 | - | - | - | - | - |
+ | Small | 256 | 99.6 | 88.4 | 80.7 | - | 44.8 |
+ | Medium | 1024 | 97.9 | 86.8 | 79.4 | - | 41.8 |
+ | Large | 4096 | 92.3 | 82.3 | 75.6 | - | 33.2 |
 
 *Batch Size 4:*
 
 | Context | Input Tokens | S | M | L | XL | Original |
 |---------|-------------|---|---|---|----|---------|
- | Small | 256 | - | - | - | - | - |
- | Medium | 1024 | - | - | - | - | - |
- | Large | 4096 | - | - | - | - | - | -->
+ | Small | 256 | 97.4 | 86.6 | 79.0 | - | 43.1 |
+ | Medium | 1024 | 94.7 | 84.1 | 77.0 | - | 38.2 |
+ | Large | 4096 | 81.1 | 73.3 | 67.8 | - | 24.5 |
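A note on the `mv` step in the Blackwell install: pip only accepts wheels whose filenames follow the wheel spec (PEP 427), `{dist}-{version}[-{build}]-{python}-{abi}-{platform}.whl`, where the optional build tag must start with a digit. The downloaded filename has `rtx5090` in the build-tag position and is missing a separate ABI tag, so pip rejects it; the rename folds the extra fields into a digit-prefixed build tag (`0rtx5090…`) and supplies `cp311` as the ABI tag. A minimal sketch of that check, using a deliberately simplified pattern (pip's real parser is `packaging.utils.parse_wheel_filename`):

```python
import re

# Simplified sketch of the PEP 427 wheel filename convention:
#   {dist}-{version}[-{build}]-{python}-{abi}-{platform}.whl
# The optional build tag must start with a digit.
WHEEL_RE = re.compile(
    r"^(?P<dist>[^-]+)-(?P<version>[^-]+)"
    r"(?:-(?P<build>\d[^-]*))?"
    r"-(?P<python>[^-]+)-(?P<abi>[^-]+)-(?P<platform>[^-]+)\.whl$"
)

def parse_wheel(name: str):
    """Return the wheel's name components, or None if the filename is invalid."""
    m = WHEEL_RE.match(name)
    return m.groupdict() if m else None

# As downloaded: "rtx5090" is not a valid build tag (no leading digit) and one tag is missing.
bad = "flash_attn-2.7.4.post1-rtx5090-torch2.7.0cu128cxx11abiTRUE-cp311-linux_x86_64.whl"
# After the mv: a digit-prefixed build tag plus cp311-cp311-linux_x86_64 python/abi/platform tags.
good = "flash_attn-2.7.4.post1-0rtx5090torch270cu128cxx11abiTRUE-cp311-cp311-linux_x86_64.whl"

print(parse_wheel(bad))   # None: pip would reject this filename
print(parse_wheel(good))  # parses cleanly
```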
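Read as speedups, the RTX 5090 batch-size-1 numbers work out as follows. The arithmetic assumes the populated columns are S, M, L, and Original (XL is not yet available); a stdlib-only sketch:

```python
# RTX 5090, batch size 1: tokens/sec transcribed from the table
# (assumed column reading: S, M, L, Original; XL not yet available).
rows = {
    "Small (256)":   {"S": 100.2, "M": 88.8, "L": 81.3, "Original": 48.7},
    "Medium (1024)": {"S": 99.4,  "M": 88.3, "L": 80.7, "Original": 47.2},
    "Large (4096)":  {"S": 94.9,  "M": 84.6, "L": 77.7, "Original": 41.1},
}

for ctx, r in rows.items():
    # Speedup of the smallest compiled variant over the uncompiled original.
    print(f"{ctx}: S is {r['S'] / r['Original']:.2f}x faster than Original")
```

The speedup grows with context size, from about 2.06x at 256 input tokens to about 2.31x at 4096, since the compiled variants lose less throughput on long inputs.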
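One pattern worth noting in the batch-size-4 table: the original model's throughput falls off with context length much faster than the compiled versions'. A quick stdlib-only check (S and Original values transcribed from the table, with the last populated column read as Original):

```python
# RTX 5090, batch size 4: tokens/sec for the S and Original columns.
small_ctx = {"S": 97.4, "Original": 43.1}   # 256-token context
large_ctx = {"S": 81.1, "Original": 24.5}   # 4096-token context

for name in ("S", "Original"):
    # Percentage throughput lost going from short to long inputs.
    drop = 100 * (1 - large_ctx[name] / small_ctx[name])
    print(f"{name}: {drop:.0f}% lower throughput at 4096 tokens than at 256")
```

Going from 256- to 4096-token inputs costs the compiled S model about 17% of its throughput, versus about 43% for the original model.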