Update README.md

README.md (CHANGED)
@@ -50,6 +50,8 @@ __Goals of elastic models:__
 
 ## Inference
 
+> Compiled versions are currently available only for batch sizes 1, 2, and 4. Other configurations are not yet available. Stay tuned for updates!
+
 To run inference with our models, simply replace the `transformers` import with `elastic_models.transformers`:
 
 ```python
@@ -115,7 +117,7 @@ print(f"# A:\n{output}\n")
 ```
 
 __System requirements:__
-* GPUs: Nvidia GeForce RTX 4090
+* GPUs: Nvidia GeForce RTX 4090, Nvidia GeForce RTX 5090
 * CPU: AMD, Intel
 * Python: 3.10-3.12
 
@@ -128,8 +130,18 @@ pip install elastic_models[nvidia]\
 --index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple\
 --extra-index-url https://pypi.nvidia.com\
 --extra-index-url https://pypi.org/simple
-
 pip install flash_attn==2.7.3 --no-build-isolation
+
+# or, for Blackwell support
+pip install elastic_models[blackwell]\
+--index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple\
+--extra-index-url https://pypi.nvidia.com\
+--extra-index-url https://pypi.org/simple
+pip install torch==2.7.0+cu128 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
+# download the appropriate flash_attn wheel for your system from https://github.com/Zarrac/flashattention-blackwell-wheels-whl-ONLY-5090-5080-5070-5060-flash-attention-/releases/tag/FlashAttention
+mv flash_attn-2.7.4.post1-rtx5090-torch2.7.0cu128cxx11abiTRUE-cp311-linux_x86_64.whl flash_attn-2.7.4.post1-0rtx5090torch270cu128cxx11abiTRUE-cp311-cp311-linux_x86_64.whl
+pip install flash_attn-2.7.4.post1-0rtx5090torch270cu128cxx11abiTRUE-cp311-cp311-linux_x86_64.whl
+
 pip uninstall apex
 ```
 
@@ -162,7 +174,6 @@ Benchmarking is one of the most important procedures during model acceleration.
 * **PIQA**: Evaluates physical commonsense reasoning through questions about everyday physical interactions. Shows the model's understanding of real-world physics concepts.
 * **Arc Challenge**: Evaluates grade-school-level multiple-choice questions requiring reasoning. Shows the model's ability to solve complex reasoning tasks.
 * **Winogrande**: Evaluates commonsense reasoning through sentence-completion tasks. Shows the model's capability to understand context and resolve ambiguity.
-* **GSM8K**: GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade-school math word problems.
 
 
 ### Performance by Context Size
@@ -198,31 +209,31 @@ The tables below show performance (tokens per second) for different input contexts:
 | Large | 4096 | 52.5 | - | - | - | - | - |
 
 
-
+**RTX 5090:**
 
 *Batch Size 1:*
 
 | Context | Input Tokens | S | M | L | XL | Original |
 |---------|-------------|---|---|---|----|---------|
-| Small | 256 |
-| Medium | 1024 |
-| Large | 4096 |
+| Small | 256 | 100.2 | 88.8 | 81.3 | - | 48.7 | - |
+| Medium | 1024 | 99.4 | 88.3 | 80.7 | - | 47.2 | - |
+| Large | 4096 | 94.9 | 84.6 | 77.7 | - | 41.1 | - |
 
 *Batch Size 2:*
 
 | Context | Input Tokens | S | M | L | XL | Original |
 |---------|-------------|---|---|---|----|---------|
-| Small | 256 |
-| Medium | 1024 |
-| Large | 4096 |
+| Small | 256 | 99.6 | 88.4 | 80.7 | - | 44.8 | - |
+| Medium | 1024 | 97.9 | 86.8 | 79.4 | - | 41.8 | - |
+| Large | 4096 | 92.3 | 82.3 | 75.6 | - | 33.2 | - |
 
 *Batch Size 4:*
 
 | Context | Input Tokens | S | M | L | XL | Original |
 |---------|-------------|---|---|---|----|---------|
-| Small | 256 |
-| Medium | 1024 |
-| Large | 4096 |
+| Small | 256 | 97.4 | 86.6 | 79.0 | - | 43.1 | - |
+| Medium | 1024 | 94.7 | 84.1 | 77.0 | - | 38.2 | - |
+| Large | 4096 | 81.1 | 73.3 | 67.8 | - | 24.5 | - |
 
 
 
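The `mv` step in the Blackwell install section exists because pip only accepts local wheels whose filenames follow PEP 427's convention, `{name}-{version}[-{build}]-{python}-{abi}-{platform}.whl`: the file as downloaded has the hardware/torch info in the build-tag position without the leading digit a build tag requires, and omits the abi tag. A minimal sketch of that check, using a simplified regex for illustration (this is not pip's actual parser):

```python
import re

# PEP 427 wheel filename: {name}-{version}[-{build}]-{python}-{abi}-{platform}.whl
# The optional build tag must start with a digit.
WHEEL_RE = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)"
    r"(-(?P<build>\d[^-]*))?"
    r"-(?P<python>[^-]+)-(?P<abi>[^-]+)-(?P<platform>[^-]+)\.whl$"
)

def is_valid_wheel_name(filename: str) -> bool:
    """Return True if the filename matches PEP 427's wheel naming scheme."""
    return WHEEL_RE.match(filename) is not None

# As downloaded: "rtx5090" sits where a build tag would go but does not start
# with a digit, and the abi tag is missing, so pip rejects the file.
downloaded = "flash_attn-2.7.4.post1-rtx5090-torch2.7.0cu128cxx11abiTRUE-cp311-linux_x86_64.whl"
# After the mv: hardware/torch info becomes a digit-prefixed build tag
# ("0rtx5090...") and the abi tag (second "cp311") is restored.
renamed = "flash_attn-2.7.4.post1-0rtx5090torch270cu128cxx11abiTRUE-cp311-cp311-linux_x86_64.whl"

print(is_valid_wheel_name(downloaded), is_valid_wheel_name(renamed))  # → False True
```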
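As a rough reading of the new RTX 5090, batch-size-1 numbers, the S model runs a bit over 2x faster than the original. The sketch below recomputes those ratios; note the column mapping is my assumption (first numeric column = S, and 48.7/47.2/41.1 = original), since the header row and data rows in the diff disagree on column count:

```python
# Tokens/sec copied from the "RTX 5090, Batch Size 1" table in the diff above.
# Assumed mapping: third value = S model, last nonzero value = original model.
rows = [
    ("Small",  256,  100.2, 48.7),
    ("Medium", 1024, 99.4,  47.2),
    ("Large",  4096, 94.9,  41.1),
]
for name, tokens, s_tps, orig_tps in rows:
    print(f"{name} ({tokens} input tokens): {s_tps / orig_tps:.2f}x speedup")
```

The spread (roughly 2.06x at 256 tokens up to 2.31x at 4096) suggests the compiled models lose less throughput than the original as context grows.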