LCM SDXL (OpenVINO) – UNet INT8 Quantized (NNCF) [UNet-only PTQ]
This repository provides an OpenVINO-optimized Latent Consistency Model (SDXL variant) where only the UNet has been post-training quantized to INT8 using NNCF. All other components (text encoders, VAE encoder/decoder, schedulers, tokenizers) remain in FP16 to preserve output quality while accelerating the most compute‑intensive part.
- The export is based on the OpenVINO notebook latent-consistency-models-image-generation.
- To export the model yourself, see GitHub: ovgenai-lcm-sdxl.
What's Inside
Directory structure mirrors a standard Optimum Intel / OpenVINO diffusion pipeline:
- `unet/` – INT8 quantized OpenVINO IR (`openvino_model.xml`/`.bin`)
- `text_encoder/`, `text_encoder_2/` – FP16 IR
- `vae_encoder/`, `vae_decoder/` – FP16 IR
- `tokenizer/`, `tokenizer_2/` – Tokenizer assets
- `scheduler/` – Scheduler config
- `model_index.json` – Pipeline index (unchanged)
Quantization Summary
- Technique: Post-Training Quantization (PTQ) via `nncf.quantize`
- Scope: UNet only
- Subset size: 200 calibration samples (conceptual_captions subset)
- Bias correction: Disabled (matches the original notebook flow)
- Model type hint: `nncf.ModelType.TRANSFORMER`
- Other submodules: Unmodified FP16
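For reference, a minimal sketch of the PTQ call with these settings, following the original notebook flow. It assumes `calibration_data` (a list of roughly 200 UNet input samples gathered by running the pipeline on the conceptual_captions subset) has already been collected; that collection step is not shown here.

```python
import nncf
import openvino as ov

core = ov.Core()
unet = core.read_model("unet/openvino_model.xml")  # FP16 UNet IR

# calibration_data: ~200 UNet input samples captured during pipeline runs
# on the conceptual_captions subset (preparation assumed, not shown).
quantized_unet = nncf.quantize(
    unet,
    calibration_dataset=nncf.Dataset(calibration_data),
    subset_size=200,
    model_type=nncf.ModelType.TRANSFORMER,
    advanced_parameters=nncf.AdvancedQuantizationParameters(
        disable_bias_correction=True  # matches the original notebook flow
    ),
)
ov.save_model(quantized_unet, "unet_int8/openvino_model.xml")
```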
Usage With OpenVINO GenAI (Python)
```bash
python3 -m venv ov-infer-lcm-sdxl-env
source ov-infer-lcm-sdxl-env/bin/activate
pip install openvino-genai pillow

git lfs install
git clone https://huggingface.co/rpanchum/lcm-sdxl-ov-fp16-quant_unet/

wget https://raw.githubusercontent.com/ravi9/ovgenai-lcm-sdxl/refs/heads/main/run-lcm-sdxl-ov.py
python run-lcm-sdxl-ov.py -m lcm-sdxl-ov-fp16-quant_unet
```
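If you prefer to drive the pipeline directly from Python rather than through the helper script, a minimal sketch using the openvino_genai `Text2ImagePipeline` API (the generation parameters here are illustrative and mirror the Optimum example below):

```python
import openvino_genai
from PIL import Image

# Load the cloned pipeline directory; "CPU" can be swapped for "GPU" / "AUTO".
pipe = openvino_genai.Text2ImagePipeline("lcm-sdxl-ov-fp16-quant_unet", "CPU")

image_tensor = pipe.generate(
    "a beautiful pink unicorn, 8k",
    width=1024,
    height=1024,
    num_inference_steps=4,
)
# The result is a tensor of shape [num_images, H, W, C] with uint8 data.
Image.fromarray(image_tensor.data[0]).save("sample.png")
```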
Usage With Optimum Intel (Python)
```python
from pathlib import Path

from optimum.intel.openvino import OVDiffusionPipeline

model_dir = Path("./lcm-sdxl-ov-fp16-quant_unet")
pipe = OVDiffusionPipeline.from_pretrained(model_dir, device="CPU")  # or "GPU" / "AUTO"

prompt = "a beautiful pink unicorn, 8k"
image = pipe(prompt, num_inference_steps=4, guidance_scale=8.0, height=1024, width=1024).images[0]
image.save("sample.png")
```
Performance Notes
- The UNet dominates diffusion compute time, so quantizing it to INT8 yields most of the achievable speedup; the exact factor depends on hardware (CPU vector width, memory bandwidth, thread count).
- Keeping the remaining modules in FP16 preserves text-conditioning and decoding quality.
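Since the speedup varies with hardware, the simplest check is to time the pipeline locally. A minimal sketch reusing `pipe` and `prompt` from the Optimum example above (the run count and warm-up policy are arbitrary choices):

```python
import time

# Warm-up run: excludes one-time compilation/caching from the measurement.
pipe(prompt, num_inference_steps=4, guidance_scale=8.0)

runs = 3
start = time.perf_counter()
for _ in range(runs):
    pipe(prompt, num_inference_steps=4, guidance_scale=8.0)
elapsed = (time.perf_counter() - start) / runs
print(f"Average latency over {runs} runs: {elapsed:.2f} s")
```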
Reproducibility Steps (High Level)
1. Export the FP16 LCM SDXL pipeline to OpenVINO IR (see GitHub: ovgenai-lcm-sdxl).
2. Collect ~200 UNet calibration samples by running the pipeline on the conceptual_captions subset.
3. Quantize the UNet with `nncf.quantize` using the settings listed in the Quantization Summary above.
4. Replace the FP16 UNet IR in the pipeline directory with the INT8 IR, leaving all other submodules unchanged.
License & Source
The quantized UNet inherits the original model's license (Apache-2.0 is a placeholder here). Verify compatibility with the upstream SDXL LCM license and any dataset usage terms (conceptual_captions) before redistribution.
Acknowledgements
- Original diffusion & LCM concepts: Stability AI / open-source diffusion community.
- OpenVINO Runtime & GenAI library.
- NNCF for PTQ framework.
Model tree for rpanchum/lcm-sdxl-ov-fp16-quant_unet
- Base model: stabilityai/stable-diffusion-xl-base-1.0
- Finetuned from: latent-consistency/lcm-sdxl