Add library name and pipeline tag
This PR adds the `library_name` and `pipeline_tag` metadata, ensuring the "how to use" button appears and that people can find your model at https://huggingface.co/models?pipeline_tag=text-generation. It also adds the paper abstract and a link to the demo.
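For context, the `pipeline_tag` added here is the same field the Hub's task filter matches on. A minimal sketch of the equivalent programmatic query, assuming a `huggingface_hub` version recent enough to expose the `pipeline_tag` and `library` arguments on `list_models`:

```python
# Minimal sketch: listing Hub models the same way the
# https://huggingface.co/models?pipeline_tag=text-generation page filters them.
# Assumption: a recent huggingface_hub release that supports the pipeline_tag
# and library keyword arguments on list_models.
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(
    pipeline_tag="text-generation",  # set by the pipeline_tag this PR adds
    library="transformers",          # set by the library_name this PR adds
    limit=5,
):
    print(model.id)
```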
README.md

```diff
@@ -1,22 +1,27 @@
 ---
-license: apache-2.0
+base_model:
+- mistralai/Mistral-7B-v0.1
 datasets:
 - allenai/tulu-v2-sft-mixture
 - weqweasdas/preference_dataset_mixture2_and_safe_pku
 language:
 - en
-base_model:
-- mistralai/Mistral-7B-v0.1
+license: apache-2.0
+library_name: transformers
+pipeline_tag: text-generation
 ---
+
 # TESS 2 RM
 
 This model is the reward model used for reward guidance decoding.
 This model was finetuned from Mistral 7B v0.1: first instruction-tuned on the Tulu 2 SFT mixture, and then RM-trained on the preference dataset mixture found [here](https://huggingface.co/datasets/weqweasdas/preference_dataset_mixture2_and_safe_pku).
 For more details, please check out our paper [TESS-2: A Large-Scale, Generalist Diffusion Language Model](https://arxiv.org/abs/2502.13917).
 
+We introduce TESS 2, a general instruction-following diffusion language model that outperforms contemporary instruction-tuned diffusion models, as well as matches and sometimes exceeds strong autoregressive (AR) models. We train TESS 2 by first adapting a strong AR model via continued pretraining with the usual cross-entropy as diffusion loss, and then performing further instruction tuning. We find that adaptation training as well as the choice of the base model is crucial for training good instruction-following diffusion models. We further propose reward guidance, a novel and modular inference-time guidance procedure to align model outputs without needing to train the underlying model. Finally, we show that TESS 2 further improves with increased inference-time compute, highlighting the utility of diffusion LMs in having fine-grained controllability over the amount of compute used at inference time. Code and models are available at https://github.com/hamishivi/tess-2.
+
 ## Using this model
 
-This model is intended to be used with the repository https://github.com/hamishivi/tess-2 for guiding diffusion LM generations.
+This model is intended to be used with the repository https://github.com/hamishivi/tess-2 for guiding diffusion LM generations. You can also try out the demo: https://huggingface.co/spaces/hamishivi/tess-2-demo.
 
 To run this, first clone https://github.com/hamishivi/tess-2.
 
```
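The card points to the tess-2 repository as the intended entry point for reward-guided decoding. As a minimal sketch only, here is how one might score a completion with this RM directly through `transformers`, assuming the checkpoint loads as a standard sequence-classification head with a single scalar output and a Tulu-style chat format; the repo id below is a hypothetical placeholder, and the tess-2 codebase remains the authoritative way to run reward guidance:

```python
# Minimal sketch (not the official usage): scoring text with the RM via
# transformers. Assumptions not confirmed by the model card: the checkpoint
# exposes a sequence-classification head with one scalar label, the prompt
# format is Tulu-style, and "hamishivi/tess-2-rm" is a placeholder repo id.
# For actual reward-guided decoding, clone https://github.com/hamishivi/tess-2
# as the README instructs.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "hamishivi/tess-2-rm"  # hypothetical id; substitute the real repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=1,                # reward models emit a single scalar score
    torch_dtype=torch.bfloat16,
)
model.eval()

# Tulu-style prompt formatting (assumed, matching the Tulu 2 SFT mixture).
text = "<|user|>\nWhat is 2+2?\n<|assistant|>\n2+2 equals 4."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    reward = model(**inputs).logits.squeeze(-1)
print(f"reward score: {reward.item():.3f}")
```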