psynote123 committed
Commit fedd277 · verified · Parent: b8efeeb

Update README.md

Files changed (1): README.md (+24 -13)
README.md CHANGED
@@ -50,6 +50,8 @@ __Goals of elastic models:__
 
 ## Inference
 
+ > Compiled versions are currently available only for batch sizes 1, 2, and 4. Other versions are not yet accessible. Stay tuned for updates!
+ 
 To infer our models, you just need to replace the `transformers` import with `elastic_models.transformers`:
 
 ```python
@@ -115,7 +117,7 @@ print(f"# A:\n{output}\n")
 ```
 
 __System requirements:__
- * GPUs: Nvidia GeForce RTX 4090
+ * GPUs: Nvidia GeForce RTX 4090, Nvidia GeForce RTX 5090
 * CPU: AMD, Intel
 * Python: 3.10-3.12
 
@@ -128,8 +130,18 @@ pip install elastic_models[nvidia]\
 --index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple\
 --extra-index-url https://pypi.nvidia.com\
 --extra-index-url https://pypi.org/simple
- 
 pip install flash_attn==2.7.3 --no-build-isolation
+ 
+ # or, for Blackwell (RTX 50-series) support
+ pip install elastic_models[blackwell]\
+ --index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple\
+ --extra-index-url https://pypi.nvidia.com\
+ --extra-index-url https://pypi.org/simple
+ pip install torch==2.7.0+cu128 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
+ # Download the appropriate flash-attention wheel for your system from https://github.com/Zarrac/flashattention-blackwell-wheels-whl-ONLY-5090-5080-5070-5060-flash-attention-/releases/tag/FlashAttention
+ mv flash_attn-2.7.4.post1-rtx5090-torch2.7.0cu128cxx11abiTRUE-cp311-linux_x86_64.whl flash_attn-2.7.4.post1-0rtx5090torch270cu128cxx11abiTRUE-cp311-cp311-linux_x86_64.whl
+ pip install flash_attn-2.7.4.post1-0rtx5090torch270cu128cxx11abiTRUE-cp311-cp311-linux_x86_64.whl
+ 
 pip uninstall apex
 ```
@@ -162,7 +174,6 @@ Benchmarking is one of the most important procedures during model acceleration.
 * **PIQA**: Evaluates physical commonsense reasoning through questions about everyday physical interactions. Shows the model's understanding of real-world physics concepts.
 * **Arc Challenge**: Evaluates grade-school multiple-choice questions that require reasoning. Shows the model's ability to solve complex reasoning tasks.
 * **Winogrande**: Evaluates commonsense reasoning through sentence-completion tasks. Shows the model's capability to understand context and resolve ambiguity.
- * **GSM8K**: GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade-school math word problems.
 
 
 ### Performance by Context Size
@@ -198,31 +209,31 @@ The tables below show performance (tokens per second) for different input contexts:
 | Large | 4096 | 52.5 | - | - | - | - |
 
- <!-- **RTX 5090:**
+ **RTX 5090:**
 
 *Batch Size 1:*
 
 | Context | Input Tokens | S | M | L | XL | Original |
 |---------|-------------|---|---|---|----|---------|
- | Small | 256 | - | - | - | - | - |
- | Medium | 1024 | - | - | - | - | - |
- | Large | 4096 | - | - | - | - | - |
+ | Small | 256 | 100.2 | 88.8 | 81.3 | - | 48.7 |
+ | Medium | 1024 | 99.4 | 88.3 | 80.7 | - | 47.2 |
+ | Large | 4096 | 94.9 | 84.6 | 77.7 | - | 41.1 |
 
 *Batch Size 2:*
 
 | Context | Input Tokens | S | M | L | XL | Original |
 |---------|-------------|---|---|---|----|---------|
- | Small | 256 | - | - | - | - | - |
- | Medium | 1024 | - | - | - | - | - |
- | Large | 4096 | - | - | - | - | - |
+ | Small | 256 | 99.6 | 88.4 | 80.7 | - | 44.8 |
+ | Medium | 1024 | 97.9 | 86.8 | 79.4 | - | 41.8 |
+ | Large | 4096 | 92.3 | 82.3 | 75.6 | - | 33.2 |
 
 *Batch Size 4:*
 
 | Context | Input Tokens | S | M | L | XL | Original |
 |---------|-------------|---|---|---|----|---------|
- | Small | 256 | - | - | - | - | - |
- | Medium | 1024 | - | - | - | - | - |
- | Large | 4096 | - | - | - | - | - | -->
+ | Small | 256 | 97.4 | 86.6 | 79.0 | - | 43.1 |
+ | Medium | 1024 | 94.7 | 84.1 | 77.0 | - | 38.2 |
+ | Large | 4096 | 81.1 | 73.3 | 67.8 | - | 24.5 |
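A note on the `mv` step in the Blackwell install: pip only accepts wheels whose filenames follow the wheel spec (PEP 427), `{dist}-{version}[-{build}]-{python}-{abi}-{platform}.whl`, where the optional build tag must start with a digit. The downloaded filename has `rtx5090` in the build-tag position and is missing a separate ABI tag, so pip rejects it; the rename folds the extra fields into a digit-prefixed build tag (`0rtx5090…`) and supplies `cp311` as the ABI tag. A minimal sketch of that check, using a deliberately simplified pattern (pip's real parser is `packaging.utils.parse_wheel_filename`):

```python
import re

# Simplified sketch of the PEP 427 wheel filename convention:
#   {dist}-{version}[-{build}]-{python}-{abi}-{platform}.whl
# The optional build tag must start with a digit.
WHEEL_RE = re.compile(
    r"^(?P<dist>[^-]+)-(?P<version>[^-]+)"
    r"(?:-(?P<build>\d[^-]*))?"
    r"-(?P<python>[^-]+)-(?P<abi>[^-]+)-(?P<platform>[^-]+)\.whl$"
)

def parse_wheel(name: str):
    """Return the wheel's name components, or None if the filename is invalid."""
    m = WHEEL_RE.match(name)
    return m.groupdict() if m else None

# As downloaded: "rtx5090" is not a valid build tag (no leading digit) and one tag is missing.
bad = "flash_attn-2.7.4.post1-rtx5090-torch2.7.0cu128cxx11abiTRUE-cp311-linux_x86_64.whl"
# After the mv: a digit-prefixed build tag plus cp311-cp311-linux_x86_64 python/abi/platform tags.
good = "flash_attn-2.7.4.post1-0rtx5090torch270cu128cxx11abiTRUE-cp311-cp311-linux_x86_64.whl"

print(parse_wheel(bad))   # None: pip would reject this filename
print(parse_wheel(good))  # parses cleanly
```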
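Read as speedups, the RTX 5090 batch-size-1 numbers work out as follows. The arithmetic assumes the populated columns are S, M, L, and Original (XL is not yet available); a stdlib-only sketch:

```python
# RTX 5090, batch size 1: tokens/sec transcribed from the table
# (assumed column reading: S, M, L, Original; XL not yet available).
rows = {
    "Small (256)":   {"S": 100.2, "M": 88.8, "L": 81.3, "Original": 48.7},
    "Medium (1024)": {"S": 99.4,  "M": 88.3, "L": 80.7, "Original": 47.2},
    "Large (4096)":  {"S": 94.9,  "M": 84.6, "L": 77.7, "Original": 41.1},
}

for ctx, r in rows.items():
    # Speedup of the smallest compiled variant over the uncompiled original.
    print(f"{ctx}: S is {r['S'] / r['Original']:.2f}x faster than Original")
```

The speedup grows with context size, from about 2.06x at 256 input tokens to about 2.31x at 4096, since the compiled variants lose less throughput on long inputs.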
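One pattern worth noting in the batch-size-4 table: the original model's throughput falls off with context length much faster than the compiled versions'. A quick stdlib-only check (S and Original values transcribed from the table, with the last populated column read as Original):

```python
# RTX 5090, batch size 4: tokens/sec for the S and Original columns.
small_ctx = {"S": 97.4, "Original": 43.1}   # 256-token context
large_ctx = {"S": 81.1, "Original": 24.5}   # 4096-token context

for name in ("S", "Original"):
    # Percentage throughput lost going from short to long inputs.
    drop = 100 * (1 - large_ctx[name] / small_ctx[name])
    print(f"{name}: {drop:.0f}% lower throughput at 4096 tokens than at 256")
```

Going from 256- to 4096-token inputs costs the compiled S model about 17% of its throughput, versus about 43% for the original model.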