qiying committed · Commit b4df535 · 1 Parent(s): 062dacc

Create README.md

Files changed (1): README.md (+55, -0)
<div align='center'>

<h1>Emu: An Open Multimodal Generalist</h1>
<h3><a href="">Generative Pretraining in Multimodality</a></h3>

[Quan Sun](https://github.com/Quan-Sun)<sup>1*</sup>, [Qiying Yu](https://yqy2001.github.io)<sup>2,1*</sup>, [Yufeng Cui]()<sup>1*</sup>, [Fan Zhang]()<sup>1*</sup>, [Xiaosong Zhang](https://github.com/zhangxiaosong18)<sup>1*</sup>, [Yueze Wang]()<sup>1</sup>, [Hongcheng Gao]()<sup>1</sup>, [Jingjing Liu](https://air.tsinghua.edu.cn/en/info/1046/1194.htm)<sup>2</sup>, [Tiejun Huang](https://scholar.google.com/citations?user=knvEK4AAAAAJ&hl=en)<sup>1,3</sup>, [Xinlong Wang](https://www.xloong.wang/)<sup>1</sup>

<sup>1</sup> [BAAI](https://www.baai.ac.cn/english.html), <sup>2</sup> [THU](https://air.tsinghua.edu.cn), <sup>3</sup> [PKU](https://english.pku.edu.cn/) <br><sup>*</sup> Equal Contribution

| [Paper]() | [Demo (soon)]() |
</div>

**Emu** is a Large Multimodal Model (LMM) trained with a unified autoregressive objective, *i.e.*, predicting the next element in a multimodal sequence, where an element can be either a visual embedding or a textual token. Trained under this objective, **Emu** can serve as a generalist interface for diverse multimodal tasks, such as image captioning, image/video question answering, and text-to-image generation, along with new abilities like in-context text and image generation, and image blending.
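
As an illustration only (not the released training code), the sketch below shows one way such a unified next-element objective could be written: a single causal sequence interleaves text tokens and visual embeddings, with a classification loss where the next element is a text token and a regression loss where it is a visual embedding. All names, dimensions, and the choice of MSE are assumptions.

```python
# Illustrative sketch of a unified next-element objective over an interleaved
# sequence of text tokens and visual embeddings. NOT the released Emu code;
# names, dimensions, and the MSE visual loss are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NextElementLoss(nn.Module):
    def __init__(self, hidden_dim: int, vocab_size: int, visual_dim: int):
        super().__init__()
        self.text_head = nn.Linear(hidden_dim, vocab_size)    # classify the next text token
        self.visual_head = nn.Linear(hidden_dim, visual_dim)  # regress the next visual embedding

    def forward(self, hidden_states, next_is_visual, next_text_ids, next_visual_embs):
        # hidden_states:    (seq_len, hidden_dim) outputs of a causal transformer
        # next_is_visual:   (seq_len,) bool, True where the *next* element is visual
        # next_text_ids:    (num_text_positions,) target token ids
        # next_visual_embs: (num_visual_positions, visual_dim) target embeddings
        text_loss = F.cross_entropy(self.text_head(hidden_states[~next_is_visual]), next_text_ids)
        visual_loss = F.mse_loss(self.visual_head(hidden_states[next_is_visual]), next_visual_embs)
        return text_loss + visual_loss
```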

## Setup

Clone the GitHub repository and install the required packages:

```shell
git clone https://github.com/baaivision/Emu
cd Emu

pip install -r requirements.txt
```

## Model Weights

We release the pretrained and instruction-tuned weights of **Emu**. Our weights are subject to LLaMA's [license](https://github.com/facebookresearch/llama/blob/main/LICENSE).

| Model name | Weight                                                                          |
| ---------- | ------------------------------------------------------------------------------- |
| **Emu**    | [🤗 HF link](https://huggingface.co/BAAI/Emu/blob/main/Emu-pretrain.pt) (27GB)  |
| **Emu-I**  | [🤗 HF link](https://huggingface.co/BAAI/Emu/blob/main/Emu-instruct.pt) (27GB)  |
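
To fetch a checkpoint programmatically, a minimal sketch using `huggingface_hub` is shown below; the `repo_id` and filenames come from the table above, and the variable names are illustrative.

```python
# Minimal sketch: download a released checkpoint with huggingface_hub.
# repo_id and filename are taken from the table above; variable names are
# illustrative only.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(repo_id="BAAI/Emu", filename="Emu-instruct.pt")
print(ckpt_path)  # local path, e.g. to pass as --ckpt-path below
```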

## Model Usage

At present, we provide inference code for image captioning and visual question answering:

```sh
python emu_inference.py --instruct --ckpt-path $Instruct_CKPT_PATH
```
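
As a hedged end-to-end example, the snippet below combines the checkpoint download from the previous section with the command above; the script name and flags are copied verbatim from that command, and everything else is an assumption.

```python
# Hedged end-to-end sketch: fetch the instruction-tuned checkpoint and invoke
# the inference script shown above. Script name and flags come from the README
# command; the use of subprocess here is purely illustrative.
import subprocess
from huggingface_hub import hf_hub_download

ckpt = hf_hub_download(repo_id="BAAI/Emu", filename="Emu-instruct.pt")
subprocess.run(["python", "emu_inference.py", "--instruct", "--ckpt-path", ckpt], check=True)
```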

## Citation

If you find Emu useful for your research and applications, please consider citing:

```
@article{Emu,
  title={Generative Pretraining in Multimodality},
  author={Sun, Quan and Yu, Qiying and Cui, Yufeng and Zhang, Fan and Zhang, Xiaosong and Wang, Yueze and Gao, Hongcheng and Liu, Jingjing and Huang, Tiejun and Wang, Xinlong},
  publisher={arXiv:},
  year={2023},
}
```