---
library_name: paddlenlp
license: apache-2.0
tags:
- summarization
language:
- zh
---

[![paddlenlp-banner](https://user-images.githubusercontent.com/1371212/175816733-8ec25eb0-9af3-4380-9218-27c154518258.png)](https://github.com/PaddlePaddle/PaddleNLP)

# PaddlePaddle/unimo-text-1.0

## Introduction

Existing pre-training methods focus on either single-modal or multi-modal tasks, and cannot effectively adapt to each other.
They can only utilize single-modal data (i.e., text or images) or limited multi-modal data (i.e., image-text pairs).
In this work, we propose a unified-modal pre-training architecture, namely UNIMO, which can effectively adapt to both single-modal and multi-modal
understanding and generation tasks. Large-scale free text corpora and image collections are utilized to improve the capability of visual
and textual understanding, and cross-modal contrastive learning (CMCL) is leveraged to align the textual and visual information into a unified
semantic space over a corpus of image-text pairs. Because non-paired single-modal data is abundant, the model can utilize a much larger scale of
data to learn more generalizable representations. Moreover, textual knowledge and visual knowledge can enhance each other in the unified semantic space.
The experimental results show that UNIMO significantly improves the performance of several single-modal and multi-modal downstream tasks.

More details: https://arxiv.org/abs/2012.15409

## Available Models

- **unimo-text-1.0**, *12 layers, 12 heads, 768 hidden size, pretrained model*
- **unimo-text-1.0-large**, *24 layers, 16 heads, 1024 hidden size, pretrained model*
- **unimo-text-1.0-lcsts-new**, *12 layers, 12 heads, 768 hidden size, fine-tuned on the lcsts-new Chinese summarization dataset*
- **unimo-text-1.0-summary**, *12 layers, 12 heads, 768 hidden size, fine-tuned on several in-house Chinese summarization datasets*
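
For quick comparison, the variants listed above can be written down as a small lookup table. The sketch below is illustrative only: `UnimoTextVariant`, `VARIANTS`, and `summarization_checkpoints` are hypothetical names that are not part of PaddleNLP, and the per-head dimension is simply the hidden size divided by the number of heads.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class UnimoTextVariant:
    """Hypothetical record of the sizes listed in this model card."""

    num_layers: int
    num_heads: int
    hidden_size: int
    finetuned_on: Optional[str] = None  # None means a plain pretrained model

    @property
    def head_dim(self) -> int:
        # Per-head dimension: hidden size split evenly across attention heads.
        return self.hidden_size // self.num_heads


# Values copied from the "Available Models" list; nothing else is assumed.
VARIANTS = {
    "unimo-text-1.0": UnimoTextVariant(12, 12, 768),
    "unimo-text-1.0-large": UnimoTextVariant(24, 16, 1024),
    "unimo-text-1.0-lcsts-new": UnimoTextVariant(12, 12, 768, "lcsts-new"),
    "unimo-text-1.0-summary": UnimoTextVariant(
        12, 12, 768, "in-house Chinese summarization datasets"
    ),
}


def summarization_checkpoints() -> List[str]:
    """Return the names of variants that were fine-tuned for summarization."""
    return [name for name, v in VARIANTS.items() if v.finetuned_on is not None]
```

Both the base and the large configuration work out to 64 dimensions per attention head, and `summarization_checkpoints()` returns the two fine-tuned entries in list order.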

## How to Use?

Click on the *Use in paddlenlp* button on the top right!
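
For readers who prefer to load the model from a script, a minimal sketch follows. It assumes a PaddleNLP release that provides `UNIMOTokenizer` and `UNIMOLMHeadModel` in `paddlenlp.transformers` and that accepts the `from_hf_hub=True` argument in `from_pretrained`; the snippet shown by the button may differ, so treat that snippet as the reference. `load_unimo` is a hypothetical helper name, and the import is deferred so the module can be imported without PaddleNLP installed.

```python
# Repository ID taken from the title of this model card.
MODEL_ID = "PaddlePaddle/unimo-text-1.0"


def load_unimo(model_id: str = MODEL_ID):
    """Load the tokenizer and model from the Hugging Face Hub (sketch).

    Assumes `paddlepaddle` and a recent `paddlenlp` are installed.
    """
    # Imported lazily so that merely defining this helper has no dependencies.
    from paddlenlp.transformers import UNIMOLMHeadModel, UNIMOTokenizer

    tokenizer = UNIMOTokenizer.from_pretrained(model_id, from_hf_hub=True)
    model = UNIMOLMHeadModel.from_pretrained(model_id, from_hf_hub=True)
    model.eval()  # switch off dropout for inference
    return tokenizer, model


# Example call (downloads weights on first use):
# tokenizer, model = load_unimo()
```

The base checkpoint is only pretrained; for Chinese summarization, one of the fine-tuned variants listed under Available Models is likely the better starting point.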

## Citation Info

```text
@article{li2020unimo,
  title = {UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning},
  author = {Li, Wei and Gao, Can and Niu, Guocheng and Xiao, Xinyan and Liu, Hao and Liu, Jiachen and Wu, Hua and Wang, Haifeng},
  journal = {arXiv preprint arXiv:2012.15409},
  year = {2020}
}
```