# Flux Lite 8B — 1024×1024 (Tensor Parallelism 4, AWS Inf2)
This repository contains the compiled NeuronX graph for running Freepik's Flux.1-Lite-8B model on AWS Inferentia2 (Inf2) instances, optimized for 1024×1024 image generation with tensor parallelism = 4.

The model was compiled with 🤗 Optimum Neuron to leverage AWS NeuronCores for efficient inference at scale.
## 🔧 Compilation Details

- **Base model:** `Freepik/flux.1-lite-8B`
- **Framework:** `optimum-neuron`
- **Tensor parallelism:** `4` (splits the model across 4 NeuronCores)
- **Input resolution:** `1024 × 1024`
- **Batch size:** `1`
- **Precision:** `bfloat16`
- **Auto-casting:** disabled (`auto_cast="none"`)
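Neuron compilation bakes the input shapes into the graph, which is why the resolution and batch size above are fixed. As a rough sketch of what 1024 × 1024 implies for the transformer, assuming the standard Flux latent layout (an 8× VAE downsampling factor, 2 × 2 latent patching, 16 latent channels — verify these against the model config):

```python
# Sketch of the shapes baked into the compiled graph (assumed Flux latent layout).
VAE_SCALE = 8         # assumed VAE downsampling factor
PATCH = 2             # assumed 2x2 latent packing
LATENT_CHANNELS = 16  # assumed latent channel count

def packed_sequence_length(height: int, width: int) -> int:
    """Number of transformer tokens for a given image resolution."""
    lat_h, lat_w = height // VAE_SCALE, width // VAE_SCALE
    return (lat_h // PATCH) * (lat_w // PATCH)

# At the compiled 1024x1024 resolution the graph processes a fixed-length
# sequence; other resolutions would need a separate compilation.
print(packed_sequence_length(1024, 1024))  # 4096 tokens
```

Calling the compiled pipeline with any other `height`/`width` would change this sequence length and therefore requires re-exporting the model (see the re-compilation example below).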
## 📥 Installation

Make sure you are running on an AWS Inf2 instance with the AWS Neuron SDK installed.

```bash
pip install "optimum[neuron]" torch torchvision
```
## 🚀 Usage

```python
from optimum.neuron import NeuronFluxPipeline

# Load the compiled pipeline from the Hugging Face Hub
pipe = NeuronFluxPipeline.from_pretrained(
    "kutayozbay/flux-lite-8B-1024x1024-tp4",
    device="neuron",  # run on AWS Inf2 NeuronCores
    torch_dtype="bfloat16",
    batch_size=1,
    height=1024,
    width=1024,
    tensor_parallel_size=4,
)

# Generate an image
prompt = "A futuristic city skyline at sunset"
image = pipe(prompt).images[0]
image.save("flux_output.png")
```
## 🔁 Re-compilation Example

To compile this model yourself:

```python
from optimum.neuron import NeuronFluxPipeline

compiler_args = {"auto_cast": "none"}
input_shapes = {"batch_size": 1, "height": 1024, "width": 1024}

pipe = NeuronFluxPipeline.from_pretrained(
    "Freepik/flux.1-lite-8B",
    torch_dtype="bfloat16",
    export=True,
    tensor_parallel_size=4,
    **compiler_args,
    **input_shapes,
)

pipe.save_pretrained("flux_lite_neuronx_1024_tp4/")
```