DFloat11 Compressed Model: OmniGen2/OmniGen2 (MLLM)
This is a DFloat11 losslessly compressed version of the original OmniGen2/OmniGen2
model. It reduces model size by 32% compared to the original BFloat16 model, while maintaining bit-identical outputs and supporting efficient GPU inference.
Performance Comparison
| Metric | OmniGen2 (BFloat16) | OmniGen2 (DFloat11) |
|---|---|---|
| Model Size | 16.23 GB | 11.11 GB |
| Peak GPU Memory (1024×1024 image generation) | 18.41 GB | 14.36 GB |
| Generation Time (A100 GPU) | 25 seconds | 27 seconds |
How to Use
A complete usage guide is available in our GitHub repository (forked from the official OmniGen2 repository).
https://github.com/LeanModels/OmniGen2-DFloat11
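For orientation only, below is a minimal sketch of how a DFloat11-compressed component is typically swapped into a BFloat16 pipeline. The import path for `OmniGen2Pipeline`, the `bfloat16_model` argument, and the `pipe.mllm` attribute name are assumptions based on other DFloat11 model cards, not the official instructions; follow the linked repository for the supported workflow.

```python
import torch
from dfloat11 import DFloat11Model  # pip install dfloat11[cuda12] (assumed extra name)

# Assumed import path from the forked OmniGen2 repository.
from omnigen2.pipelines.omnigen2.pipeline_omnigen2 import OmniGen2Pipeline

# Load the original pipeline definition in BFloat16.
pipe = OmniGen2Pipeline.from_pretrained("OmniGen2/OmniGen2", torch_dtype=torch.bfloat16)

# Replace the MLLM component's weights with the DFloat11-compressed version.
# `bfloat16_model` and the `pipe.mllm` attribute are assumptions for this sketch.
DFloat11Model.from_pretrained(
    "DFloat11/OmniGen2-mllm-DF11",
    device="cpu",
    bfloat16_model=pipe.mllm,
)

pipe.to("cuda")  # weights are decompressed on the fly on the GPU during inference
```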
How It Works
We apply Huffman coding to losslessly compress the exponent bits of BFloat16 model weights, which are highly compressible (their 8 bits carry only ~2.6 bits of actual information). To enable fast inference, we implement a highly efficient CUDA kernel that performs on-the-fly weight decompression directly on the GPU.
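As a rough illustration of why the exponent bits compress so well, the sketch below estimates the Shannon entropy of the 8-bit exponent field over a tensor of BFloat16 values; for weights with a roughly Gaussian distribution, as in typical trained layers, this comes out to only a few bits per value. The random tensor here is a stand-in, not the actual OmniGen2 weights.

```python
import torch

def exponent_entropy(weights: torch.Tensor) -> float:
    """Estimate the Shannon entropy (in bits) of the BFloat16 exponent field."""
    w = weights.to(torch.bfloat16).contiguous()
    # Reinterpret each BFloat16 value as a 16-bit integer:
    # 1 sign bit, 8 exponent bits, 7 mantissa bits.
    bits = w.view(torch.int16).to(torch.int32)
    exponents = (bits >> 7) & 0xFF  # extract the 8 exponent bits
    counts = torch.bincount(exponents.flatten(), minlength=256).float()
    probs = counts / counts.sum()
    probs = probs[probs > 0]
    return float(-(probs * probs.log2()).sum())

# Stand-in for trained weights: small, roughly Gaussian values.
example = torch.randn(1_000_000) * 0.02
print(f"exponent entropy ~ {exponent_entropy(example):.2f} bits out of 8")
```

Because the exponent field carries far less than 8 bits of information, an entropy code such as Huffman coding can store it much more compactly without losing any information, which is where the ~32% size reduction comes from.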
The result is a model that is ~32% smaller, delivers bit-identical outputs, and achieves performance comparable to the original BFloat16 model.
Learn more in our research paper.
Learn More