LCM SDXL (OpenVINO) – UNet INT8 Quantized (NNCF) [UNet-only PTQ]

This repository provides an OpenVINO-optimized Latent Consistency Model (SDXL variant) in which only the UNet has been post-training quantized to INT8 using NNCF. All other components (text encoders, VAE encoder/decoder, scheduler, tokenizers) remain in FP16 to preserve output quality while accelerating the most compute-intensive part of the pipeline.

What's Inside

Directory structure mirrors a standard Optimum Intel / OpenVINO diffusion pipeline:

  • unet/ – INT8 quantized OpenVINO IR (openvino_model.xml/bin)
  • text_encoder/, text_encoder_2/ – FP16 IR
  • vae_encoder/, vae_decoder/ – FP16 IR
  • tokenizer/, tokenizer_2/ – Tokenizer assets
  • scheduler/ – Scheduler config
  • model_index.json – Pipeline index (unchanged)

Quantization Summary

  • Technique: Post-Training Quantization (PTQ) via nncf.quantize
  • Scope: UNet only
  • Subset size: 200 calibration samples (conceptual_captions subset)
  • Bias correction: Disabled (matches original notebook flow)
  • Model type hint: nncf.ModelType.TRANSFORMER
  • Other submodules: Unmodified FP16

Usage With OpenVINO GenAI (Python)

python3 -m venv ov-infer-lcm-sdxl-env
source ov-infer-lcm-sdxl-env/bin/activate
pip install openvino-genai pillow 

git lfs install
git clone https://huggingface.co/rpanchum/lcm-sdxl-ov-fp16-quant_unet/
wget https://raw.githubusercontent.com/ravi9/ovgenai-lcm-sdxl/refs/heads/main/run-lcm-sdxl-ov.py

python run-lcm-sdxl-ov.py -m lcm-sdxl-ov-fp16-quant_unet
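
The downloaded run-lcm-sdxl-ov.py script drives generation end to end. A minimal equivalent sketch, assuming the openvino_genai Text2ImagePipeline API (the lazy import keeps the sketch loadable even where OpenVINO GenAI is not installed):

```python
from pathlib import Path


def generate_image(model_dir: str, prompt: str, out_path: str = "sample.png",
                   steps: int = 4, size: int = 1024) -> Path:
    """Run text-to-image with OpenVINO GenAI and save the result as a PNG."""
    # Imported lazily so this sketch can be read/imported without the runtime.
    import openvino_genai as ov_genai
    from PIL import Image

    # Text2ImagePipeline loads the whole exported pipeline from disk
    # (INT8 UNet plus the remaining FP16 submodules).
    pipe = ov_genai.Text2ImagePipeline(model_dir, "CPU")  # or "GPU" / "AUTO"

    # LCM needs only a few denoising steps; generate() returns an image tensor.
    image_tensor = pipe.generate(prompt, width=size, height=size,
                                 num_inference_steps=steps)
    Image.fromarray(image_tensor.data[0]).save(out_path)
    return Path(out_path)
```

For example, generate_image("lcm-sdxl-ov-fp16-quant_unet", "a beautiful pink unicorn, 8k") mirrors what the helper script does.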

Usage With Optimum Intel (Python)

from optimum.intel.openvino import OVDiffusionPipeline
from pathlib import Path

model_dir = Path("./lcm-sdxl-ov-fp16-quant_unet")
pipe = OVDiffusionPipeline.from_pretrained(model_dir, device="CPU")  # or "GPU" / "AUTO"

prompt = "a beautiful pink unicorn, 8k"
image = pipe(prompt, num_inference_steps=4, guidance_scale=8.0, height=1024, width=1024).images[0]
image.save("sample.png")

Performance Notes

  • UNet dominates diffusion compute time; INT8 compression provides speedup (exact factor depends on hardware: CPU vector width, memory bandwidth, thread count).
  • Remaining FP16 modules ensure text conditioning & decoding quality are preserved.
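
To measure the actual speedup on a given machine, a small timing helper can be used; this is a generic sketch (pure Python, no OpenVINO dependency) where the callable passed in stands for any pipeline invocation such as the pipe(...) call above:

```python
import time
from statistics import median


def benchmark(fn, warmup: int = 1, runs: int = 5) -> float:
    """Return median wall-clock seconds of `fn()` over `runs` timed calls.

    Warmup calls absorb one-time costs (model compilation, cache population)
    so the reported number reflects steady-state latency.
    """
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return median(timings)
```

Comparing benchmark(lambda: pipe(prompt, num_inference_steps=4)) between this INT8-UNet export and an all-FP16 export gives the speedup factor on your hardware.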

Reproducibility Steps (High Level)
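
The Quantization Summary above implies roughly the following flow. This is a hedged sketch, not the exact notebook code: the calibration-data plumbing (collecting UNet inputs while running the pipeline over ~200 conceptual_captions prompts) is elided, and the caller is expected to supply those inputs.

```python
def quantize_unet(unet_model, calibration_inputs, subset_size: int = 200):
    """UNet-only PTQ mirroring the settings listed in the Quantization Summary."""
    # Imported lazily; requires the `nncf` package.
    import nncf

    # calibration_inputs: an iterable of UNet input dicts captured from
    # pipeline runs over the conceptual_captions prompts (collection elided).
    calibration_dataset = nncf.Dataset(calibration_inputs)

    return nncf.quantize(
        unet_model,                        # ov.Model for the UNet IR
        calibration_dataset,
        subset_size=subset_size,           # 200 calibration samples
        model_type=nncf.ModelType.TRANSFORMER,
        advanced_parameters=nncf.AdvancedQuantizationParameters(
            disable_bias_correction=True   # matches the original notebook flow
        ),
    )
```

The resulting ov.Model is then serialized over unet/openvino_model.xml, leaving every other submodule untouched.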

License & Source

The quantized UNet inherits the original model's license (Apache-2.0 placeholder here). Ensure compatibility with upstream SDXL LCM license and any dataset usage terms (conceptual_captions) before redistribution.

Acknowledgements

  • Original diffusion & LCM concepts: Stability AI / open-source diffusion community.
  • OpenVINO Runtime & GenAI library.
  • NNCF for PTQ framework.