---
datasets:
- ffurfaro/PixelBytes-Pokemon
language: en
library_name: pytorch
license: mit
pipeline_tag: text-to-image
tags:
- image-generation
- text-generation
- multimodal
---
# PixelBytes: Unified Multimodal Generation
Welcome to the **PixelBytes** repository! This project provides models that generate text and images together in a single sequence, with images produced pixel by pixel, using a unified embedding.
## Overview
### Key Concepts
- **Image Transformer**: Generates images pixel by pixel.
- **Bi-Mamba+**: A bidirectional model for time series prediction.
- **MambaByte**: A token-free selective state-space model that operates directly on bytes.

The PixelBytes model generates mixed sequences of text and images, handling transitions with line breaks and maintaining image dimension consistency.
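
To make the idea of a mixed sequence concrete, here is a minimal sketch of one way text and pixels could share a single token stream. It is not the encoding used by this repository: the vocabulary layout (`PIXEL_OFFSET`), the helper names, and the use of palette indices for pixels are all assumptions made for illustration. A line break closes the text part and every image row, and the encoder rejects rows of unequal width so that image dimensions stay consistent.

```python
# Illustrative only: a hypothetical unified vocabulary, not the repository's actual one.
NEWLINE = ord("\n")   # byte 10, used here as the text/image and row separator
PIXEL_OFFSET = 256    # assumption: pixel ids are placed above the byte range 0..255


def encode_text(text):
    """Encode text as raw UTF-8 byte values (0..255)."""
    return list(text.encode("utf-8"))


def encode_image(rows):
    """Encode an image given as equal-length rows of palette indices.

    Each row is followed by a line break so the width can be recovered later.
    """
    width = len(rows[0])
    tokens = []
    for row in rows:
        if len(row) != width:
            raise ValueError("all image rows must have the same width")
        tokens.extend(PIXEL_OFFSET + p for p in row)
        tokens.append(NEWLINE)
    return tokens


def build_sequence(text, rows):
    """Concatenate text and image tokens, with a line break at the transition."""
    return encode_text(text) + [NEWLINE] + encode_image(rows)


def split_rows(tokens):
    """Recover pixel rows from a mixed sequence, ignoring text bytes."""
    rows, current = [], []
    for t in tokens:
        if t == NEWLINE:
            if current:
                rows.append(current)
                current = []
        elif t >= PIXEL_OFFSET:
            current.append(t - PIXEL_OFFSET)
    return rows
```

For example, `build_sequence("Hi", [[1, 2], [3, 4]])` yields the two text bytes, a line break, and then two pixel rows that each end in a line break; `split_rows` recovers the original 2x2 grid from that sequence.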
## Dataset
We use the **PixelBytes-Pokemon** dataset, available on Hugging Face: [PixelBytes-Pokemon](https://huggingface.co/datasets/ffurfaro/PixelBytes-Pokemon). It contains text and image sequences of Pokémon for training our model.
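
As a starting point, the dataset can probably be loaded with the Hugging Face `datasets` library, as sketched below. The dataset identifier comes from the link above; the helper name `load_pixelbytes` is hypothetical, and the available splits and column names are not documented here, so inspect the returned object before relying on any of them.

```python
DATASET_ID = "ffurfaro/PixelBytes-Pokemon"


def load_pixelbytes():
    """Download (or load from cache) the PixelBytes-Pokemon dataset.

    Requires the `datasets` package and network access on first use.
    """
    # Imported inside the function so this snippet can be imported
    # even when the optional dependency is not installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID)
```

Calling `load_pixelbytes()` and printing the result lists the splits and columns that the dataset actually provides.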
## Models Trained
- **8 LSTM Models**: Bidirectional + 1, 2, 3 layers (including p_embed + bi-2 layers)
- **6 Mamba Models**: Bidirectional + 1, 2, 3 layers
- **3 Transformer Models**: 1, 2, 3 layers
---
Thank you for exploring **PixelBytes**! We hope these models aid your multimodal generation projects.