arxiv:2507.07997

MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization

Published on Jul 10

Abstract

Vector Quantized Variational Autoencoders (VQ-VAEs) are fundamental models that compress continuous visual data into discrete tokens. Existing methods have tried to improve the quantization strategy for better reconstruction quality; however, a large gap still exists between VQ-VAEs and VAEs. To narrow this gap, we propose MGVQ, a novel method that augments the representation capability of discrete codebooks, eases codebook optimization, and minimizes information loss, thereby enhancing reconstruction quality. Specifically, we retain the latent dimension to preserve encoded features and incorporate a set of sub-codebooks for quantization. Furthermore, we construct comprehensive zero-shot benchmarks at 512p and 2K resolutions to rigorously evaluate the reconstruction performance of existing methods. MGVQ achieves state-of-the-art performance on both ImageNet and eight zero-shot benchmarks among all VQ-VAEs. Notably, it significantly outperforms SD-VAE on ImageNet (rFID 0.49 vs. 0.91) and achieves superior PSNR on all zero-shot benchmarks. These results highlight the superiority of MGVQ in reconstruction and pave the way for fidelity-preserving HD image processing. Code will be publicly available at https://github.com/MKJia/MGVQ.

AI-generated summary

MGVQ enhances VQ-VAEs by augmenting discrete codebooks, preserving the latent dimension, and incorporating sub-codebooks, achieving state-of-the-art performance in image reconstruction.
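To make the core idea concrete, here is a minimal PyTorch sketch of multi-group quantization as the abstract describes it: the full latent dimension is kept, and quantization is factorized across a set of sub-codebooks, one per group. Everything here is an illustrative assumption rather than the released MGVQ implementation; the class name MultiGroupQuantizer, the group count, the distance metric, and the straight-through gradient trick are not confirmed by the paper.

```python
import torch
import torch.nn as nn

class MultiGroupQuantizer(nn.Module):
    """Hypothetical sketch: split each latent vector into G groups and
    quantize every group with its own sub-codebook. Names and details
    are illustrative, not taken from the MGVQ release."""

    def __init__(self, latent_dim: int, num_groups: int, codes_per_group: int):
        super().__init__()
        assert latent_dim % num_groups == 0
        self.num_groups = num_groups
        self.group_dim = latent_dim // num_groups
        # One independent sub-codebook per group.
        self.codebooks = nn.ModuleList(
            nn.Embedding(codes_per_group, self.group_dim)
            for _ in range(num_groups)
        )

    def forward(self, z: torch.Tensor):
        # z: (batch, latent_dim). The latent dimension is preserved;
        # only the nearest-neighbor search is factorized across groups.
        groups = z.chunk(self.num_groups, dim=-1)
        quantized, indices = [], []
        for g, codebook in zip(groups, self.codebooks):
            # Nearest sub-codebook entry per group (Euclidean distance).
            dist = torch.cdist(g, codebook.weight)  # (batch, codes_per_group)
            idx = dist.argmin(dim=-1)               # (batch,)
            q = codebook(idx)                       # (batch, group_dim)
            # Straight-through estimator so gradients reach the encoder.
            quantized.append(g + (q - g).detach())
            indices.append(idx)
        return torch.cat(quantized, dim=-1), torch.stack(indices, dim=-1)

if __name__ == "__main__":
    quant = MultiGroupQuantizer(latent_dim=16, num_groups=4, codes_per_group=256)
    z = torch.randn(8, 16)
    z_q, idx = quant(z)
    print(z_q.shape, idx.shape)  # torch.Size([8, 16]) torch.Size([8, 4])
```

One plausible reading of why this helps: with G groups of K codes each, the quantizer can express K^G distinct code combinations per latent vector while each lookup only searches K entries, so representational capacity grows combinatorially without making any single codebook harder to optimize.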
