---
license: apache-2.0
base_model:
- black-forest-labs/FLUX.1-schnell
---

# Elastic model: Fastest self-serving models. FLUX.1-schnell.

Elastic models are produced by TheStage AI's ANNA (Automated Neural Networks Accelerator). ANNA lets you trade off model size, latency and quality with a simple slider movement. For each model, ANNA produces a series of optimized variants:

* __XL__: Mathematically equivalent neural network, optimized with our DNN compiler. 

* __L__: Near lossless model, with less than 1% degradation obtained on corresponding benchmarks.

* __M__: Faster model, with accuracy degradation less than 1.5%.

* __S__: The fastest model, with accuracy degradation less than 2%.


__Goals of Elastic Models:__

* Provide the fastest models and service for self-hosting.
* Provide flexibility in cost vs quality selection for inference.
* Provide clear quality and latency benchmarks.
* Provide the interface of HF libraries (transformers and diffusers) with a single line of code change.
* Provide models supported on a wide range of hardware, which are pre-compiled and require no JIT.

> Note that the actual quality degradation varies from model to model; an S model, for instance, may show as little as 0.5% degradation.

-----

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6487003ecd55eec571d14f96/ouz3FYQzG8C7Fl3XpNe6t.jpeg)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/Zt16Ce2uT1GNcMHLO-6Yv.png)

## Inference

Currently, our demo model supports 1024x1024, 768x768 and 512x512 outputs without batching (on B200, only 1024x1024); this will be extended in the near future.
To run inference with our models, simply replace the `diffusers` import with `elastic_models.diffusers`:

```python
import torch
from elastic_models.diffusers import FluxPipeline

model_name = 'black-forest-labs/FLUX.1-schnell'
hf_token = ''
device = torch.device("cuda")

# `mode` selects the elastic tier: 'XL', 'L', 'M' or 'S'
pipeline = FluxPipeline.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    token=hf_token,
    mode='S'
)
pipeline.to(device)

prompts = ["Kitten eating a banana"]
output = pipeline(prompt=prompts)

for prompt, output_image in zip(prompts, output.images):
    output_image.save(prompt.replace(' ', '_') + '.png')
```
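
To request one of the other supported resolutions, the standard diffusers `height`/`width` arguments should apply; a minimal sketch, reusing the `pipeline` object from the example above:

```python
# 768x768 and 512x512 are the other sizes the demo currently supports;
# height/width are the standard diffusers pipeline arguments.
output = pipeline(prompt="Kitten eating a banana", height=768, width=768)
output.images[0].save("kitten_768.png")
```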

### Installation


__System requirements:__
* GPUs: H100, L40s (B200 and GeForce RTX 5090 are covered by the `blackwell` install option below)
* CPU: AMD, Intel
* Python: 3.10-3.12


To work with our models, just run these lines in your terminal:

```shell
pip install thestage
pip install "elastic_models[nvidia]"\
 --index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple\
 --extra-index-url https://pypi.nvidia.com\
 --extra-index-url https://pypi.org/simple

# or, for Blackwell (B200 / RTX 5090) support:
pip install "elastic_models[blackwell]"\
 --index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple\
 --extra-index-url https://pypi.nvidia.com\
 --extra-index-url https://pypi.org/simple
# Blackwell also needs a CUDA 12.8 nightly build of torch/torchvision:
pip install -U --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128
pip install -U --pre torchvision --index-url https://download.pytorch.org/whl/nightly/cu128

pip install flash_attn==2.7.3 --no-build-isolation
pip uninstall -y apex
```

The extras are quoted so the commands also work in shells (e.g. zsh) that expand square brackets.
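
As a quick sanity check that the package installed cleanly (the import name matches the Python example above):

```shell
# Verify the install: the package should import without errors.
python -c "import elastic_models; print('elastic_models OK')"
```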

Then go to [app.thestage.ai](https://app.thestage.ai), log in, and generate an API token from your profile page. Set the API token as follows:

```shell
thestage config set --api-token <YOUR_API_TOKEN>
```

Congrats, now you can use accelerated models!

----

## Benchmarks

Benchmarking is one of the most important procedures during model acceleration. We aim to provide clear performance metrics for the models accelerated by our algorithms.

### Quality benchmarks

For quality evaluation we used PSNR, SSIM and CLIP score. PSNR and SSIM are computed against the outputs of the original model, so the mathematically equivalent XL variant scores `inf` PSNR and 1.0 SSIM by construction.
| Metric/Model | S    | M    | L    | XL   | Original |
|--------------|------|------|------|------|----------|
| PSNR         | 29.9 | 30.2 | 31.0 | inf  | inf      |
| SSIM         | 0.66 | 0.71 | 0.86 | 1.0  | 1.0      |
| CLIP         | 11.5 | 11.6 | 11.8 | 11.9 | 11.9     |
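
For reference, PSNR and SSIM between an elastic output and the original model's output for the same prompt and seed can be computed along these lines; a minimal sketch using scikit-image, with hypothetical file names, and not necessarily the exact evaluation pipeline behind the table above:

```python
import numpy as np
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Hypothetical file names: the same prompt/seed rendered by the original
# model and by an elastic (e.g. S) model.
original = np.asarray(Image.open("kitten_original.png").convert("RGB"))
elastic = np.asarray(Image.open("kitten_S.png").convert("RGB"))

psnr = peak_signal_noise_ratio(original, elastic, data_range=255)
ssim = structural_similarity(original, elastic, channel_axis=-1, data_range=255)
print(f"PSNR: {psnr:.1f} dB, SSIM: {ssim:.2f}")
```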


### Latency benchmarks

Time in seconds to generate one 1024x1024 image:
| GPU/Model        | S    | M    | L    | XL   | Original |
|------------------|------|------|------|------|----------|
| H100             | 0.5  | 0.57 | 0.65 | 0.7  | 1.04     |
| L40s             | 1.4  | 1.6  | 1.9  | 2.1  | 2.5      |
| B200             | 0.3  | 0.4  | 0.42 | 0.43 | 0.74     |
| GeForce RTX 5090 | 0.94 | -    | -    | -    | -        |
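
A rough way to reproduce such timings on your own hardware; a minimal sketch assuming the `pipeline` object from the Inference section, with one warmup pass so one-time setup is excluded, then a synchronized wall-clock measurement:

```python
import time
import torch

# Warmup: the first call includes one-time setup that should not be timed.
pipeline(prompt="warmup", height=1024, width=1024)

torch.cuda.synchronize()  # make sure all queued GPU work is done
start = time.perf_counter()
pipeline(prompt="Kitten eating a banana", height=1024, width=1024)
torch.cuda.synchronize()  # wait for the generation to actually finish
print(f"1024x1024 generation: {time.perf_counter() - start:.2f} s")
```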

## Links

* __Platform__: [app.thestage.ai](https://app.thestage.ai)
* __Subscribe for updates__: [TheStageAI X](https://x.com/TheStageAI)
* __Contact email__: [email protected]