LCM SDXL (OpenVINO) – UNet INT8 Quantized (NNCF) [UNet-only PTQ]
This repository provides an OpenVINO-optimized Latent Consistency Model (SDXL variant) where only the UNet has been post-training quantized to INT8 using NNCF. All other components (text encoders, VAE encoder/decoder, schedulers, tokenizers) remain in FP16 to preserve output quality while accelerating the most compute‑intensive part.
- The export is based on the OpenVINO notebook latent-consistency-models-image-generation.
- To export the model yourself, see GitHub: ovgenai-lcm-sdxl.
What's Inside
Directory structure mirrors a standard Optimum Intel / OpenVINO diffusion pipeline:
- `unet/` – INT8 quantized OpenVINO IR (`openvino_model.xml`/`.bin`)
- `text_encoder/`, `text_encoder_2/` – FP16 IR
- `vae_encoder/`, `vae_decoder/` – FP16 IR
- `tokenizer/`, `tokenizer_2/` – Tokenizer assets
- `scheduler/` – Scheduler config
- `model_index.json` – Pipeline index (unchanged)
Quantization Summary
- Technique: Post-Training Quantization (PTQ) via `nncf.quantize`
- Scope: UNet only
- Subset size: 200 calibration samples (conceptual_captions subset)
- Bias correction: Disabled (matches the original notebook flow)
- Model type hint: `nncf.ModelType.TRANSFORMER`
- Other submodules: Unmodified FP16
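For reference, a minimal sketch of the PTQ call with these settings, following the original notebook flow. It assumes `calibration_data` (a list of roughly 200 UNet input samples gathered by running the pipeline on the conceptual_captions subset) has already been collected; that collection step is not shown here.

```python
import nncf
import openvino as ov

core = ov.Core()
unet = core.read_model("unet/openvino_model.xml")  # FP16 UNet IR

# calibration_data: ~200 UNet input samples captured during pipeline runs
# on the conceptual_captions subset (preparation assumed, not shown).
quantized_unet = nncf.quantize(
    unet,
    calibration_dataset=nncf.Dataset(calibration_data),
    subset_size=200,
    model_type=nncf.ModelType.TRANSFORMER,
    advanced_parameters=nncf.AdvancedQuantizationParameters(
        disable_bias_correction=True  # matches the original notebook flow
    ),
)
ov.save_model(quantized_unet, "unet_int8/openvino_model.xml")
```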
Usage With OpenVINO GenAI (Python)
```bash
python3 -m venv ov-infer-lcm-sdxl-env
source ov-infer-lcm-sdxl-env/bin/activate
pip install openvino-genai pillow

git lfs install
git clone https://huggingface.co/rpanchum/lcm-sdxl-ov-fp16-quant_unet/

wget https://raw.githubusercontent.com/ravi9/ovgenai-lcm-sdxl/refs/heads/main/run-lcm-sdxl-ov.py
python run-lcm-sdxl-ov.py -m lcm-sdxl-ov-fp16-quant_unet
```
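If you prefer to drive the pipeline directly from Python rather than through the helper script, a minimal sketch using the openvino_genai `Text2ImagePipeline` API (the generation parameters here are illustrative and mirror the Optimum example below):

```python
import openvino_genai
from PIL import Image

# Load the cloned pipeline directory; "CPU" can be swapped for "GPU" / "AUTO".
pipe = openvino_genai.Text2ImagePipeline("lcm-sdxl-ov-fp16-quant_unet", "CPU")

image_tensor = pipe.generate(
    "a beautiful pink unicorn, 8k",
    width=1024,
    height=1024,
    num_inference_steps=4,
)
# The result is a tensor of shape [num_images, H, W, C] with uint8 data.
Image.fromarray(image_tensor.data[0]).save("sample.png")
```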
Usage With Optimum Intel (Python)
```python
from pathlib import Path

from optimum.intel.openvino import OVDiffusionPipeline

model_dir = Path("./lcm-sdxl-ov-fp16-quant_unet")
pipe = OVDiffusionPipeline.from_pretrained(model_dir, device="CPU")  # or "GPU" / "AUTO"

prompt = "a beautiful pink unicorn, 8k"
image = pipe(prompt, num_inference_steps=4, guidance_scale=8.0, height=1024, width=1024).images[0]
image.save("sample.png")
```
Performance Notes
- The UNet dominates diffusion compute time, so quantizing it to INT8 yields most of the achievable speedup; the exact factor depends on hardware (CPU vector width, memory bandwidth, thread count).
- Keeping the remaining modules in FP16 preserves text-conditioning and decoding quality.
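Since the speedup varies with hardware, the simplest check is to time the pipeline locally. A minimal sketch reusing `pipe` and `prompt` from the Optimum example above (the run count and warm-up policy are arbitrary choices):

```python
import time

# Warm-up run: excludes one-time compilation/caching from the measurement.
pipe(prompt, num_inference_steps=4, guidance_scale=8.0)

runs = 3
start = time.perf_counter()
for _ in range(runs):
    pipe(prompt, num_inference_steps=4, guidance_scale=8.0)
elapsed = (time.perf_counter() - start) / runs
print(f"Average latency over {runs} runs: {elapsed:.2f} s")
```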
Reproducibility Steps (High Level)
1. Export the FP16 LCM SDXL pipeline to OpenVINO IR (see GitHub: ovgenai-lcm-sdxl).
2. Collect ~200 UNet calibration samples by running the pipeline on the conceptual_captions subset.
3. Quantize the UNet with `nncf.quantize` using the settings listed in the Quantization Summary above.
4. Replace the FP16 UNet IR in the pipeline directory with the INT8 IR, leaving all other submodules unchanged.
License & Source
The quantized UNet inherits the original model's license (Apache-2.0 is a placeholder here). Verify compatibility with the upstream SDXL LCM license and any dataset usage terms (conceptual_captions) before redistribution.
Acknowledgements
- Original diffusion & LCM concepts: Stability AI / open-source diffusion community.
- OpenVINO Runtime & GenAI library.
- NNCF for PTQ framework.
Model tree for rpanchum/lcm-sdxl-ov-fp16-quant_unet
- Base model: stabilityai/stable-diffusion-xl-base-1.0
- Finetuned from: latent-consistency/lcm-sdxl