---
license: apache-2.0
language:
- en
tags:
- text-generation
- non-autoregressive-generation
- early-exit
---

# ELMER

The ELMER model was proposed in [**ELMER: A Non-Autoregressive Pre-trained Language Model for Efficient and Effective Text Generation**](https://arxiv.org/abs/2210.13304) by Junyi Li, Tianyi Tang, Wayne Xin Zhao, Jian-Yun Nie and Ji-Rong Wen.

Detailed information and instructions can be found at [https://github.com/RUCAIBox/ELMER](https://github.com/RUCAIBox/ELMER).

## Model Description

ELMER is an efficient and effective PLM for non-autoregressive (NAR) text generation, which generates tokens at different layers by leveraging the early-exit technique.

The architecture of ELMER is a variant of the standard Transformer encoder-decoder and makes three technical contributions:

1. For the decoder, we replace the original masked multi-head attention with bi-directional multi-head attention akin to the encoder. Therefore, ELMER dynamically adjusts the output length by emitting an end token "[EOS]" at any position.
2. Leveraging early exit, ELMER injects "off-ramps" at each decoder layer, which make predictions with intermediate hidden states. If ELMER exits at the $l$-th layer, we copy the $l$-th hidden states to the subsequent layers (see the sketch after this list).
3. ELMER utilizes a novel pre-training objective, layer permutation language modeling (LPLM), to pre-train on large-scale corpora. LPLM permutes the exit layer for each token from 1 to the maximum layer $L$.
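
As a rough illustration of the off-ramp idea in point 2, the toy sketch below runs a stack of bi-directional decoder layers, lets each position exit as soon as a per-layer prediction head is confident enough, and copies the hidden state of an exited position through the remaining layers. This is **not** the ELMER implementation (that lives in the GitHub repository above): the class name, the confidence-threshold exit rule, the omission of cross-attention, and all sizes are illustrative assumptions.

```python
# Toy sketch of early exit with per-layer "off-ramps"; all names and
# hyperparameters are hypothetical, not ELMER's actual implementation.
import torch
import torch.nn as nn

class ToyEarlyExitDecoder(nn.Module):
    def __init__(self, num_layers=6, hidden=64, vocab=100, exit_threshold=0.9):
        super().__init__()
        # Bi-directional (non-causal) self-attention layers, as in ELMER's decoder
        # (cross-attention to the encoder is omitted for brevity).
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
             for _ in range(num_layers)]
        )
        # One "off-ramp" prediction head per decoder layer.
        self.off_ramps = nn.ModuleList(
            [nn.Linear(hidden, vocab) for _ in range(num_layers)]
        )
        self.exit_threshold = exit_threshold

    def forward(self, x):
        # x: (batch, seq_len, hidden)
        exited = torch.zeros(x.shape[:2], dtype=torch.bool, device=x.device)
        tokens = torch.zeros(x.shape[:2], dtype=torch.long, device=x.device)
        for layer, off_ramp in zip(self.layers, self.off_ramps):
            h = layer(x)
            # Positions that already exited keep their earlier hidden state:
            # "copy the l-th hidden states to the subsequent layers".
            x = torch.where(exited.unsqueeze(-1), x, h)
            probs = off_ramp(x).softmax(-1)
            conf, pred = probs.max(-1)
            # A position exits as soon as its off-ramp is confident enough
            # (a confidence threshold is just one possible exit rule).
            newly_exited = (conf > self.exit_threshold) & ~exited
            tokens = torch.where(newly_exited, pred, tokens)
            exited |= newly_exited
        # Positions that never exited take the final layer's prediction.
        tokens = torch.where(exited, tokens, pred)
        return tokens

decoder = ToyEarlyExitDecoder()
print(decoder(torch.randn(2, 5, 64)).shape)  # torch.Size([2, 5])
```

For pre-training, point 3 replaces the confidence rule: LPLM assigns each token an exit layer between 1 and $L$, so the off-ramps at all layers receive training signal; the exact permutation scheme is described in the paper and repository.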

## Examples

To fine-tune ELMER on non-autoregressive text generation, load the pre-trained checkpoint:

```python
>>> # ELMER reuses the BART tokenizer and model classes from Transformers
>>> from transformers import BartTokenizer as ElmerTokenizer
>>> from transformers import BartForConditionalGeneration as ElmerForConditionalGeneration

>>> # Load the pre-trained ELMER checkpoint
>>> tokenizer = ElmerTokenizer.from_pretrained("RUCAIBox/elmer")
>>> model = ElmerForConditionalGeneration.from_pretrained("RUCAIBox/elmer")
```
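
Below is a minimal sketch of what a single fine-tuning step with these classes could look like, assuming the standard seq2seq interface of the underlying BART classes. The toy source/target pair, optimizer settings, and plain cross-entropy loss are illustrative assumptions only; ELMER's actual non-autoregressive fine-tuning recipe (early-exit losses, exit strategies) is documented in the GitHub repository above.

```python
# Hypothetical single fine-tuning step; not ELMER's official training recipe.
import torch
from transformers import BartTokenizer as ElmerTokenizer
from transformers import BartForConditionalGeneration as ElmerForConditionalGeneration

tokenizer = ElmerTokenizer.from_pretrained("RUCAIBox/elmer")
model = ElmerForConditionalGeneration.from_pretrained("RUCAIBox/elmer")

# Toy source/target pair (illustrative only).
batch = tokenizer("an example source sentence", return_tensors="pt")
labels = tokenizer("a toy target sequence", return_tensors="pt").input_ids

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

model.train()
outputs = model(input_ids=batch.input_ids,
                attention_mask=batch.attention_mask,
                labels=labels)      # returns a standard seq2seq LM loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```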