---
datasets:
- ffurfaro/PixelBytes-Pokemon
language: en
library_name: pytorch
license: mit
pipeline_tag: text-to-image
tags:
- image-generation
- text-generation
- multimodal
---

# PixelBytes: Unified Multimodal Generation

Welcome to the **PixelBytes** repository! This project features models designed to generate text and images simultaneously, pixel by pixel, using a unified embedding.

## Overview

### Key Concepts

- **Image Transformer**: Generates images pixel by pixel.
- **Bi-Mamba+**: A bidirectional model for time series prediction.
- **MambaByte**: A token-free selective state-space model that operates directly on bytes.

The PixelBytes model generates mixed sequences of text and images, using line breaks to handle transitions between them and keeping image dimensions consistent.
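
To make the idea of a mixed sequence concrete, here is a minimal sketch of one way text and pixels could share a single token stream. The vocabulary layout is an assumption made for illustration only: text is encoded as UTF-8 bytes, pixel values are shifted by a hypothetical `PIXEL_OFFSET` so they do not collide with bytes, and the newline byte closes each image row. The function names are also hypothetical, and the actual PixelBytes encoding may differ.

```python
NEWLINE = ord("\n")   # byte 10, used here as the row and modality separator
PIXEL_OFFSET = 256    # hypothetical: pixel ids are placed above the byte range


def encode_text(text: str) -> list[int]:
    """Encode text as raw UTF-8 byte values (0-255)."""
    return list(text.encode("utf-8"))


def encode_image(rows: list[list[int]]) -> list[int]:
    """Flatten an image row by row, ending every row with a line break.

    Rejects ragged input so every row has the same width, which is one
    simple way to keep image dimensions consistent.
    """
    width = len(rows[0])
    tokens: list[int] = []
    for row in rows:
        if len(row) != width:
            raise ValueError("all image rows must have the same width")
        tokens.extend(PIXEL_OFFSET + pixel for pixel in row)
        tokens.append(NEWLINE)
    return tokens


def encode_mixed(text: str, rows: list[list[int]]) -> list[int]:
    """Concatenate text and image tokens, separated by a line break."""
    return encode_text(text) + [NEWLINE] + encode_image(rows)


# A two-character caption followed by a 2x2 image of palette indices.
sequence = encode_mixed("hi", [[0, 1], [2, 3]])
```

Because every row ends with the same separator, a decoder can recover the image width by counting tokens between line breaks, without storing the dimensions separately.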

## Dataset

We use the **PixelBytes-Pokemon** dataset, available on Hugging Face: [PixelBytes-Pokemon](https://huggingface.co/datasets/ffurfaro/PixelBytes-Pokemon). It contains text and image sequences of Pokémon for training our model.
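
The dataset can likely be fetched with the Hugging Face `datasets` library. The sketch below assumes that library is installed (`pip install datasets`) and that the repository loads with its default configuration; the helper name `load_pixelbytes` is hypothetical. No split or column names are assumed, so print the result to see what is available.

```python
def load_pixelbytes(repo_id: str = "ffurfaro/PixelBytes-Pokemon"):
    """Download the dataset from the Hugging Face Hub and return it.

    `datasets` is imported lazily so that defining this helper does not
    require the library; calling it does, and also needs network access.
    """
    from datasets import load_dataset

    # Without a `split` argument this returns every available split.
    return load_dataset(repo_id)


# Example usage (downloads data on first call):
# dataset = load_pixelbytes()
# print(dataset)  # shows the splits and column names
```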

## Models Trained

- **8 LSTM Models**: Bidirectional + 1, 2, 3 layers (including p_embed + bi-2 layers)
- **6 Mamba Models**: Bidirectional + 1, 2, 3 layers
- **3 Transformer Models**: 1, 2, 3 layers
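
As a rough illustration of the LSTM family listed above, the following is a minimal PyTorch sketch of a two-layer bidirectional LSTM that reads a window of unified token ids and scores the next token. It is not the architecture used in this repository: the class name `BiLSTMNextToken`, the vocabulary size of 512, and the embedding and hidden sizes are all hypothetical values chosen for the example.

```python
import torch
from torch import nn


class BiLSTMNextToken(nn.Module):
    """Hypothetical next-token scorer over a shared text/pixel vocabulary."""

    def __init__(self, vocab_size: int, embed_dim: int = 32,
                 hidden_dim: int = 64, num_layers: int = 2):
        super().__init__()
        # One embedding table shared by text and pixel tokens.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                            batch_first=True, bidirectional=True)
        # Forward and backward states are concatenated, hence 2 * hidden_dim.
        self.head = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        embedded = self.embed(tokens)            # (batch, time, embed_dim)
        _, (h_n, _) = self.lstm(embedded)        # h_n: (layers * 2, batch, hidden)
        # Last layer: h_n[-2] is the forward state, h_n[-1] the backward state.
        summary = torch.cat([h_n[-2], h_n[-1]], dim=-1)
        return self.head(summary)                # (batch, vocab_size)


model = BiLSTMNextToken(vocab_size=512)
context = torch.randint(0, 512, (4, 16))  # 4 random windows of 16 token ids
logits = model(context)                   # one score per vocabulary entry
```

Changing `num_layers` to 1 or 3 gives the other depths mentioned in the list; setting `bidirectional=False` would also require shrinking the input of `head` to `hidden_dim`.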

---

Thank you for exploring **PixelBytes**! We hope this model aids your multimodal generation projects.