For more information (including how to compress models yourself), check out https://huggingface.co/DFloat11 and https://github.com/LeanModels/DFloat11

Feel free to request other models for compression as well (whether for the diffusers library, ComfyUI, or any other format), although models that use architectures unfamiliar to me may be more difficult.

## How to Use

### diffusers

```python
import torch
from diffusers import ZImagePipeline, ZImageTransformer2DModel
from dfloat11 import DFloat11Model
from transformers.modeling_utils import no_init_weights

# Load the DFloat11-compressed Qwen3-4B text encoder on the CPU
text_encoder = DFloat11Model.from_pretrained("DFloat11/Qwen3-4B-DF11", device="cpu")

# Build an empty transformer from the original config, then inject the
# DFloat11-compressed weights into it
with no_init_weights():
    transformer = ZImageTransformer2DModel.from_config(
        ZImageTransformer2DModel.load_config(
            "Tongyi-MAI/Z-Image-Turbo", subfolder="transformer"
        ),
        torch_dtype=torch.bfloat16
    ).to(torch.bfloat16)
DFloat11Model.from_pretrained("mingyi456/Z-Image-Turbo-DF11", device="cpu", bfloat16_model=transformer)

# Assemble the pipeline around the compressed components
pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    text_encoder=text_encoder,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
pipe.to("cuda")

prompt = "Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights."

# Generate the image
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=9,  # This actually results in 8 DiT forwards
    guidance_scale=0.0,     # Guidance should be 0 for the Turbo models
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]

image.save("example.png")
```
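If the whole pipeline does not fit in VRAM, the standard diffusers offloading hooks are worth trying in place of the `pipe.to("cuda")` call above; whether they interact cleanly with DFloat11's on-the-fly decompression is an assumption you should verify on your own hardware. A minimal sketch:

```python
# Hypothetical low-VRAM variant: replace pipe.to("cuda") with model-level
# CPU offload, which moves each component to the GPU only while it runs.
# enable_model_cpu_offload() is a standard diffusers API; its interaction
# with DFloat11 decompression is untested here.
pipe.enable_model_cpu_offload()
```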

### ComfyUI

Refer to this model instead.

## Compression details

This is the `pattern_dict` used for compression:

```python
pattern_dict = {
    r"noise_refiner\.\d+": (
        "attention.to_q",
        "attention.to_k",
        "attention.to_v",
        "attention.to_out.0",
        "feed_forward.w1",
        "feed_forward.w2",
        "feed_forward.w3",
        "adaLN_modulation.0"
    ),
    r"context_refiner\.\d+": (
        "attention.to_q",
        "attention.to_k",
        "attention.to_v",
        "attention.to_out.0",
        "feed_forward.w1",
        "feed_forward.w2",
        "feed_forward.w3",
    ),
    r"layers\.\d+": (
        "attention.to_q",
        "attention.to_k",
        "attention.to_v",
        "attention.to_out.0",
        "feed_forward.w1",
        "feed_forward.w2",
        "feed_forward.w3",
        "adaLN_modulation.0"
    ),
    r"cap_embedder": (
        "1",
    )
}
```
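Each key is a regular expression matched against module paths inside the transformer, and the tuple lists the submodules within each matching block whose BFloat16 weight tensors get compressed. Below is a minimal sketch of that matching logic, assuming full-match semantics and using an abbreviated dictionary with illustrative module paths (this is not the dfloat11 package's actual internals):

```python
import re

# Abbreviated copy of the pattern_dict above, for illustration only
pattern_dict = {
    r"layers\.\d+": ("attention.to_q", "feed_forward.w1"),
    r"cap_embedder": ("1",),
}

def compressed_weights(module_path: str) -> list[str]:
    """List the weight tensors that would be DF11-compressed for a module."""
    names = []
    for block_pattern, submodules in pattern_dict.items():
        if re.fullmatch(block_pattern, module_path):
            names += [f"{module_path}.{sub}.weight" for sub in submodules]
    return names

print(compressed_weights("layers.3"))
# ['layers.3.attention.to_q.weight', 'layers.3.feed_forward.w1.weight']
print(compressed_weights("cap_embedder"))
# ['cap_embedder.1.weight']
```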