Topic-VQ-VAE: Leveraging Latent Codebooks for Flexible Topic-Guided Document Generation
Abstract
This paper introduces a novel approach to topic modeling that utilizes latent codebooks from a Vector-Quantized Variational Auto-Encoder (VQ-VAE), discretely encapsulating the rich information of pre-trained embeddings such as those from a pre-trained language model. Based on a novel interpretation of the latent codebooks and embeddings as a conceptual bag-of-words, we propose a new generative topic model called Topic-VQ-VAE (TVQ-VAE), which inversely generates the original documents associated with each latent codebook. TVQ-VAE can visualize topics through a variety of generative distributions, including the traditional BoW distribution and autoregressive image generation. Our experimental results on document analysis and image generation demonstrate that TVQ-VAE effectively captures topic contexts that reveal the underlying structure of the dataset, while supporting flexible forms of document generation. The official implementation of TVQ-VAE is available at https://github.com/clovaai/TVQ-VAE.
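To make the "conceptual bag-of-words" reading of a VQ-VAE codebook concrete, the sketch below shows one way it could be operationalized: each document is mapped to discrete code indices, the per-document histogram of codes is treated like a BoW count matrix, and a standard topic model is fit over it. This is an illustrative sketch only, not the paper's TVQ-VAE generative model; `vq_encode`, `CODEBOOK_SIZE`, and the use of scikit-learn's LDA are assumptions made for the example.

```python
# Sketch: VQ-VAE code histograms as a "conceptual bag-of-words" for topic modeling.
# `vq_encode` is a hypothetical stand-in for a pre-trained VQ-VAE encoder + quantizer
# that returns the sequence of codebook indices for one document (or image).
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

CODEBOOK_SIZE = 512  # assumed size of the VQ-VAE codebook


def codes_to_bow(code_indices, codebook_size=CODEBOOK_SIZE):
    """Count how often each latent code appears in one document."""
    return np.bincount(np.asarray(code_indices), minlength=codebook_size)


def conceptual_bow_matrix(documents, vq_encode):
    """Stack per-document code histograms into a (num_docs, codebook_size) matrix."""
    return np.vstack([codes_to_bow(vq_encode(doc)) for doc in documents])


# Usage (placeholders): fit a plain LDA over the code histograms to expose topic structure.
# X = conceptual_bow_matrix(documents, vq_encode)
# lda = LatentDirichletAllocation(n_components=20, random_state=0)
# doc_topics = lda.fit_transform(X)   # per-document topic proportions
# topic_codes = lda.components_       # each topic as a distribution over latent codes
```

In this view, topics are distributions over codebook entries rather than over vocabulary words, which is what allows the same topic structure to drive different generators (e.g., BoW decoding or autoregressive image generation) in the paper's setting.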
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- DeTiME: Diffusion-Enhanced Topic Modeling using Encoder-decoder based LLM (2023)
- Let the Pretrained Language Models "Imagine" for Short Texts Topic Modeling (2023)
- Prompting Large Language Models for Topic Modeling (2023)
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation (2023)
- KeyGen2Vec: Learning Document Embedding via Multi-label Keyword Generation in Question-Answering (2023)