|
---
license: mit
language:
- en
tags:
- t5
model-index:
- name: metro_t0pp_basepp
  results:
  - task:
      type: natural-language-inference
    dataset:
      type: super_glue
      name: RTE
      config: rte
      split: validation
    metrics:
    - type: accuracy
      value: 77.79783393501806
  - task:
      type: natural-language-inference
    dataset:
      type: super_glue
      name: CB
      config: cb
      split: validation
    metrics:
    - type: accuracy
      value: 69.52380952380955
  - task:
      type: natural-language-inference
    dataset:
      type: anli
      name: ANLI R1
      split: dev_r1
    metrics:
    - type: accuracy
      value: 39.693333333333335
  - task:
      type: natural-language-inference
    dataset:
      type: anli
      name: ANLI R2
      split: dev_r2
    metrics:
    - type: accuracy
      value: 36.61333333333334
  - task:
      type: natural-language-inference
    dataset:
      type: anli
      name: ANLI R3
      split: dev_r3
    metrics:
    - type: accuracy
      value: 40.08333333333334
  - task:
      type: coreference-resolution
    dataset:
      type: super_glue
      name: WSC
      config: wsc.fixed
      split: validation
    metrics:
    - type: accuracy
      value: 61.44230769230769
  - task:
      type: coreference-resolution
    dataset:
      type: winogrande
      name: Winogrande XL
      config: winogrande_xl
      split: validation
    metrics:
    - type: accuracy
      value: 54.55406471981057
  - task:
      type: multiple-choice-qa
    dataset:
      type: super_glue
      name: COPA
      config: copa
      split: validation
    metrics:
    - type: accuracy
      value: 83.875
  - task:
      type: multiple-choice-qa
    dataset:
      type: story_cloze
      name: StoryCloze 2016
      config: '2016'
      split: validation
    metrics:
    - type: accuracy
      value: 90.88188134687333
  - task:
      type: multiple-choice-qa
    dataset:
      type: hellaswag
      name: HellaSwag
      split: validation
    metrics:
    - type: accuracy
      value: 68.5421230830512
  - task:
      type: word-sense-disambiguation
    dataset:
      type: super_glue
      name: WiC
      config: wic
      split: validation
    metrics:
    - type: accuracy
      value: 67.58620689655174
---
|
|
|
Official repository: https://github.com/gonglinyuan/metro_t0 |
|
|
|
# METRO-T0 |
|
|
|
Paper: [Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers](https://arxiv.org/abs/2305.12567) (ACL 2023) |
|
|
|
METRO-T0 is a T5-style text-to-text Transformer pretrained with model-generated pretraining signals and then prompt-finetuned on the family of public NLP tasks proposed in [T0](https://arxiv.org/abs/2110.08207).

METRO-T0 is highly parameter-efficient: for example, METRO-T0-Large++ (775M parameters) outperforms GPT-3 (175B parameters) and T0-3B (3B parameters) on a wide range of NLP tasks.
|
|
|
 |
|
|
|
 |
|
|
|
## Use METRO-T0++-Base++ |
|
|
|
To use METRO-T0++-Base++ in PyTorch (Python 3.7+, PyTorch 1.12+, and transformers 4.17+ are required), refer to the code snippet below:
|
|
|
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# trust_remote_code=True is needed because this checkpoint ships custom model code.
model = AutoModelForSeq2SeqLM.from_pretrained("gonglinyuan/metro_t0pp_basepp", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("gonglinyuan/metro_t0pp_basepp", trust_remote_code=True)

input_text = "Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy"
# Tokenize the prompt, truncating it to the model's 512-token input limit.
inputs = tokenizer([input_text], max_length=512, truncation=True, add_special_tokens=True, return_tensors="pt").input_ids
# Greedy decoding (do_sample=False) gives a deterministic answer.
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # expected output: positive
```
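On classification-style tasks like the one above, the T0 line of work typically evaluates with rank classification: the model scores each answer choice by its log-likelihood, and the highest-scoring choice is taken as the prediction. Below is a minimal sketch of that idea, assuming the two answer choices `positive` and `negative`; it is an illustration, not the exact harness behind the numbers reported in the metadata above.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Hypothetical rank-classification sketch: pick the answer choice whose
# tokens the model assigns the highest log-likelihood.
model = AutoModelForSeq2SeqLM.from_pretrained("gonglinyuan/metro_t0pp_basepp", trust_remote_code=True).eval()
tokenizer = AutoTokenizer.from_pretrained("gonglinyuan/metro_t0pp_basepp", trust_remote_code=True)

prompt = "Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy"
choices = ["positive", "negative"]  # assumed answer choices for this prompt

input_ids = tokenizer([prompt], return_tensors="pt").input_ids
scores = []
for choice in choices:
    labels = tokenizer([choice], return_tensors="pt").input_ids
    with torch.no_grad():
        # The seq2seq loss is the mean cross-entropy over the label tokens;
        # multiplying by the label length recovers the total log-likelihood.
        loss = model(input_ids=input_ids, labels=labels).loss
    scores.append(-loss.item() * labels.shape[1])

print(choices[max(range(len(choices)), key=scores.__getitem__)])  # expected: positive
```

Whether to length-normalize the scores is a design choice; the sketch above compares total (length-multiplied) log-likelihoods.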
|
|
|
## Other METRO-T0 Models |
|
|
|
| Model                                                                       | # Parameters | Pretraining Data | Prompt-Finetuning Data |
|-----------------------------------------------------------------------------|--------------|------------------|------------------------|
| [METRO-T0-Base](https://huggingface.co/gonglinyuan/metro_t0_base)           | 226M         | Wikibook (16G)   | T0 Train               |
| [METRO-T0+-Base](https://huggingface.co/gonglinyuan/metro_t0p_base)         | 226M         | Wikibook (16G)   | T0+ Train              |
| [METRO-T0++-Base](https://huggingface.co/gonglinyuan/metro_t0pp_base)       | 226M         | Wikibook (16G)   | T0++ Train             |
| [METRO-T0-Base++](https://huggingface.co/gonglinyuan/metro_t0_basepp)       | 256M         | 160G corpus      | T0 Train               |
| [METRO-T0+-Base++](https://huggingface.co/gonglinyuan/metro_t0p_basepp)     | 256M         | 160G corpus      | T0+ Train              |
| [METRO-T0++-Base++](https://huggingface.co/gonglinyuan/metro_t0pp_basepp)   | 256M         | 160G corpus      | T0++ Train             |
| [METRO-T0-Large++](https://huggingface.co/gonglinyuan/metro_t0_largepp)     | 775M         | 160G corpus      | T0 Train               |
| [METRO-T0+-Large++](https://huggingface.co/gonglinyuan/metro_t0p_largepp)   | 775M         | 160G corpus      | T0+ Train              |
| [METRO-T0++-Large++](https://huggingface.co/gonglinyuan/metro_t0pp_largepp) | 775M         | 160G corpus      | T0++ Train             |
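
All of these checkpoints expose the same interface, so any row in the table can be loaded by swapping in its repository ID. A minimal sketch, using METRO-T0++-Large++ as an example:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Any METRO-T0 variant from the table above loads the same way;
# only the repository ID changes.
repo_id = "gonglinyuan/metro_t0pp_largepp"
model = AutoModelForSeq2SeqLM.from_pretrained(repo_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
```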
|
|
|
|
|
## Citation |
|
|
|
If you find the code and models useful for your research, please cite the following paper: |
|
|
|
```bibtex
@misc{gong2023modelgenerated,
    title={Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers},
    author={Linyuan Gong and Chenyan Xiong and Xiaodong Liu and Payal Bajaj and Yiqing Xie and Alvin Cheung and Jianfeng Gao and Xia Song},
    year={2023},
    eprint={2305.12567},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2305.12567}
}
```