# Flux Lite 8B – 1024×1024 (Tensor Parallelism 4, AWS Inf2)

🚀 This repository contains the **compiled NeuronX graph** for running [Freepik’s Flux.1-Lite-8B](https://huggingface.co/Freepik/flux.1-lite-8B) model on **AWS Inferentia2 (Inf2)** instances, optimized for **1024×1024 image generation** with **tensor parallelism = 4**.

The model has been compiled using [🤗 Optimum Neuron](https://huggingface.co/docs/optimum/neuron/index) to leverage AWS NeuronCores for efficient inference at scale.

---
## 🔧 Compilation Details

- **Base model:** `Freepik/flux.1-lite-8B`
- **Framework:** [optimum-neuron](https://github.com/huggingface/optimum-neuron)
- **Tensor parallelism:** `4` (splits the model across 4 NeuronCores)
- **Input resolution:** `1024 × 1024`
- **Batch size:** `1`
- **Precision:** `bfloat16`
- **Auto-casting:** disabled (`auto_cast="none"`)

---
## 📥 Installation

Make sure you are running on an **AWS Inf2 instance** with the [AWS Neuron SDK](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/setup/neuron-intro.html) installed.

```bash
pip install "optimum[neuronx]" torch torchvision
```
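Optionally, verify that enough NeuronCores are available before loading the model (a suggested sanity check, not part of the original instructions; it assumes the Neuron SDK system tools are on your `PATH`):

```bash
# List the Inferentia2 devices and NeuronCores on this instance.
# A TP=4 graph needs at least 4 free NeuronCores (2 per Inferentia2 chip,
# so an inf2.24xlarge or larger).
neuron-ls

# Optionally pin the runtime to exactly 4 NeuronCores to match the TP=4 graph.
export NEURON_RT_NUM_CORES=4
```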
---

## 🚀 Usage

```python
from optimum.neuron import NeuronFluxPipeline

# Load the compiled pipeline from the Hugging Face Hub
pipe = NeuronFluxPipeline.from_pretrained(
    "kutayozbay/flux-lite-8B-1024x1024-tp4",
    device="neuron",           # run on AWS Inf2 NeuronCores
    torch_dtype="bfloat16",
    batch_size=1,
    height=1024,
    width=1024,
    tensor_parallel_size=4,
)
```
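Note that NeuronX graphs are compiled for static shapes: `batch_size`, `height`, and `width` must match the values baked in at compile time (batch size 1 at 1024×1024 here), and any other shape requires re-compilation.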
Then generate an image:

```python
prompt = "A futuristic city skyline at sunset"
image = pipe(prompt).images[0]
image.save("flux_output.png")
```
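At call time the pipeline should accept the usual `diffusers` Flux arguments for steps, guidance, and seeding (an assumption based on the standard `FluxPipeline` call signature, not something this repo documents):

```python
import torch

# Assumption: the Neuron pipeline forwards standard diffusers Flux call arguments.
image = pipe(
    "A futuristic city skyline at sunset",
    num_inference_steps=28,   # fewer steps trade fidelity for speed
    guidance_scale=3.5,       # how strongly the image follows the prompt
    generator=torch.Generator().manual_seed(0),  # reproducible output
).images[0]
image.save("flux_output_seeded.png")
```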
---

## 🛠 Re-compilation Example

To compile this model yourself:

```python
from optimum.neuron import NeuronFluxPipeline

compiler_args = {"auto_cast": "none"}
input_shapes = {"batch_size": 1, "height": 1024, "width": 1024}

pipe = NeuronFluxPipeline.from_pretrained(
    "Freepik/flux.1-lite-8B",
    torch_dtype="bfloat16",
    export=True,                 # trace and compile the model for NeuronX
    tensor_parallel_size=4,      # shard the model across 4 NeuronCores
    **compiler_args,
    **input_shapes,
)

pipe.save_pretrained("flux_lite_neuronx_1024_tp4/")
```
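Once saved, the compiled folder can be pushed to the Hub with the standard `huggingface_hub` API (a minimal sketch; the target repo id is illustrative and assumes you are already logged in via `huggingface-cli login`):

```python
from huggingface_hub import upload_folder

# Upload the compiled NeuronX artifacts to a (pre-created) model repo.
upload_folder(
    folder_path="flux_lite_neuronx_1024_tp4/",
    repo_id="your-username/flux-lite-8B-1024x1024-tp4",  # hypothetical repo id
)
```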