# DALL·E Mini

_Generate images from a text prompt_

Our logo was generated with DALL·E mini using the prompt "logo of an armchair in the shape of an avocado".

## How to use it?

There are several ways to use DALL·E mini to create your own images:

* use [the official DALL·E Mini demo](https://huggingface.co/spaces/dalle-mini/dalle-mini)
* experiment with the pipeline step by step through our [`inference pipeline notebook`](tools/inference/inference_pipeline.ipynb)

  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/borisdayma/dalle-mini/blob/main/tools/inference/inference_pipeline.ipynb)

* spin off your own app with the [DALL-E Playground repository](https://github.com/saharmor/dalle-playground) (thanks [Sahar](https://twitter.com/theaievangelist))
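
If you would rather drive the pipeline from your own code, the sketch below mirrors the broad flow of the inference notebook: tokenize the prompt, sample image tokens with the sequence-to-sequence model, then decode those tokens into pixels with the VQGAN. This is a rough sketch only; `DalleBart`, `DalleBartProcessor`, and the decoding call are assumed to match the notebook's API, which may change between releases, so treat the notebook as the reference.

```python
# Hedged sketch of programmatic generation; the inference notebook is the
# authoritative version, and the exact API may differ between releases.
import jax
from dalle_mini import DalleBart, DalleBartProcessor  # assumed public API
from vqgan_jax.modeling_flax_vqgan import VQModel

model = DalleBart.from_pretrained("flax-community/dalle-mini")
processor = DalleBartProcessor.from_pretrained("flax-community/dalle-mini")
vqgan = VQModel.from_pretrained("dalle-mini/vqgan_imagenet_f16_16384")

# Tokenize the prompt, sample one sequence of image tokens, decode to pixels.
prompt = processor(["logo of an armchair in the shape of an avocado"])
encoded = model.generate(**prompt, prng_key=jax.random.PRNGKey(0), params=model.params)
images = vqgan.decode_code(encoded.sequences[..., 1:], params=vqgan.params)  # drop BOS
```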

## How does it work?

Refer to [our report](https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-mini--Vmlldzo4NjIxODA).

## Contributing

Join the community on the [LAION Discord](https://discord.gg/xBPBXfcFHd).
Any contribution is welcome, from reporting issues to proposing fixes/improvements or testing the model with cool prompts!
## Development

For inference only, use `pip install git+https://github.com/borisdayma/dalle-mini.git`.
For development, clone the repo and use `pip install -e ".[dev]"`.
Before making a PR, check style with `make style`.

### Image Encoder

We use a VQGAN from [taming-transformers](https://github.com/CompVis/taming-transformers), which can also be fine-tuned.

Use [patil-suraj/vqgan-jax](https://github.com/patil-suraj/vqgan-jax) if you want to convert a checkpoint to JAX (does not support Gumbel).

Any image encoder that turns an image into a fixed sequence of tokens can be used.
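
As a concrete illustration of "image → fixed sequence of tokens", here is a minimal round trip through the VQGAN. It assumes the `vqgan-jax` API (`encode` returning quantized states plus token indices, `decode_code` for the reverse direction); exact signatures may vary between versions.

```python
# Minimal VQGAN round trip: image -> discrete token grid -> reconstructed image.
# Assumes the vqgan-jax API; signatures may differ between versions.
import jax.numpy as jnp
from vqgan_jax.modeling_flax_vqgan import VQModel

vqgan = VQModel.from_pretrained("dalle-mini/vqgan_imagenet_f16_16384")

pixels = jnp.zeros((1, 256, 256, 3))          # one normalized 256x256 RGB image
quant_states, indices = vqgan.encode(pixels)  # image -> fixed sequence of tokens
recon = vqgan.decode_code(indices, params=vqgan.params)  # tokens -> image
print(indices.shape)  # one token per 16x16 patch -> 256 tokens for a 256px image
```
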
### Training of DALL·E mini
Use [`tools/train/train.py`](tools/train/train.py).

You can also adjust the [sweep configuration file](https://docs.wandb.ai/guides/sweeps).

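The repo drives sweeps from a YAML configuration file, but an equivalent sweep can be sketched with the plain `wandb` Python API. The parameter names below are illustrative only, not the ones actually used in training.

```python
# Illustrative hyperparameter sweep via the wandb Python API; the project
# itself uses a YAML sweep file, and these parameter names are made up.
import wandb

sweep_config = {
    "method": "random",
    "metric": {"name": "train/loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 1e-3},
        "gradient_accumulation_steps": {"values": [1, 2, 4]},
    },
}
sweep_id = wandb.sweep(sweep_config, project="dalle-mini")
# Each agent then pulls a configuration and runs the training entry point:
# wandb.agent(sweep_id, function=train_entry_point)
```
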
Trained models are on 🤗 Model Hub:

* [VQGAN-f16-16384](https://huggingface.co/dalle-mini/vqgan_imagenet_f16_16384) for encoding/decoding images
* [DALL·E mini](https://huggingface.co/flax-community/dalle-mini) for generating images from a text prompt

### Where does the logo come from?

The "armchair in the shape of an avocado" was used by OpenAI when releasing DALL·E.

## Acknowledgements

* 🤗 Hugging Face for organizing [the FLAX/JAX community week](https://github.com/huggingface/transformers/tree/master/examples/research_projects/jax-projects)
* Google [TPU Research Cloud (TRC) program](https://sites.research.google/trc/) for providing computing resources
* [Weights & Biases](https://wandb.com/) for providing the infrastructure for experiment tracking and model management

## Authors & Contributors
DALL·E mini was initially developed by:

* [Boris Dayma](https://github.com/borisdayma)
* [Suraj Patil](https://github.com/patil-suraj)
* [Pedro Cuenca](https://github.com/pcuenca)
* [Khalid Saifullah](https://github.com/khalidsaifullaah)
* [Tanishq Abraham](https://github.com/tmabraham)
* [Phúc Lê Khắc](https://github.com/lkhphuc)
* [Luke Melas](https://github.com/lukemelas)
* [Ritobrata Ghosh](https://github.com/ghosh-r)

Many thanks to the people who helped make it better:

* the [DALLE-Pytorch](https://discord.gg/xBPBXfcFHd) and [EleutherAI](https://www.eleuther.ai/) communities for testing and exchanging cool ideas
* [Rohan Anil](https://github.com/rohan-anil) for adding the Distributed Shampoo optimizer
* [Phil Wang](https://github.com/lucidrains) for providing many cool implementations of transformer variants and sharing interesting insights through [x-transformers](https://github.com/lucidrains/x-transformers)
* [Katherine Crowson](https://github.com/crowsonkb) for [super conditioning](https://twitter.com/RiversHaveWings/status/1478093658716966912)

## Citing DALL·E mini

Image encoder from "[Taming Transformers for High-Resolution Image Synthesis](https://arxiv.org/abs/2012.09841)".

Sequence to sequence model based on "[BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461v1)" with implementation of a few variants:

* "[GLU Variants Improve Transformer](https://arxiv.org/abs/2002.05202)"
* "[DeepNet: Scaling Transformers to 1,000 Layers](https://arxiv.org/abs/2203.00555)"
* "[NormFormer: Improved Transformer Pretraining with Extra Normalization](https://arxiv.org/abs/2110.09456)"
* "[Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030)"
* "[CogView: Mastering Text-to-Image Generation via Transformers](https://arxiv.org/abs/2105.13290v2)"
* "[Root Mean Square Layer Normalization](https://arxiv.org/abs/1910.07467)"
* "[Sinkformers: Transformers with Doubly Stochastic Attention](https://arxiv.org/abs/2110.11773)"

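For a flavor of what these variants look like in code, here is a minimal Flax sketch of the GEGLU feed-forward block from "GLU Variants Improve Transformer". It illustrates the idea only and is not the exact module used in this repo.

```python
# Minimal GEGLU feed-forward block ("GLU Variants Improve Transformer").
# Illustrative only; not the exact implementation used in dalle-mini.
import flax.linen as nn

class GLUFeedForward(nn.Module):
    dim: int     # model width
    hidden: int  # inner feed-forward width

    @nn.compact
    def __call__(self, x):
        gate = nn.gelu(nn.Dense(self.hidden)(x))  # GELU-activated gate branch
        value = nn.Dense(self.hidden)(x)          # linear value branch
        return nn.Dense(self.dim)(gate * value)   # elementwise gate, project back
```
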
Main optimizer (Distributed Shampoo) from "[Scalable Second Order Optimization for Deep Learning](https://arxiv.org/abs/2002.09018)".