Update README.md

[Paper](https://arxiv.org/abs/2505.20161) [Project Page](https://nvlabs.github.io/prismatic-synthesis/)

PrismNLI-0.4B is a compact yet powerful model, purpose-built for natural language inference (NLI) and zero-shot classification.
**Despite its small size, it delivers state-of-the-art performance on 8 NLI benchmarks**, making it a go-to solution for high-accuracy, low-latency applications.

PrismNLI-0.4B is fine-tuned from [deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large)
on our high-quality dataset [PrismNLI](https://huggingface.co/datasets/Jaehun/PrismNLI), curated specifically to improve generalization of the trained model.

The enhancement includes:
- Instead of starting from scratch, we start from [deberta-v3-large-zeroshot-v2.0](https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v2.0), a checkpoint of
  deberta-v3-large trained on diverse classification data.
- Following prior work on entailment models, we reformulate the traditional 3-way NLI classification—`entailment`, `neutral`, and `contradiction`—into a binary setup:
  `entailment` vs. `not-entailment`. This simplification helps the model act as a **universal classifier** by simply asking: *Is this hypothesis true, given the premise?* (see the sketch below)
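
To make the reformulation concrete, here is a minimal sketch of the label collapsing, assuming the standard 3-way label names (the exact preprocessing behind the released model may differ):

```python
# Illustrative sketch, not the authors' released code: collapse the
# standard 3-way NLI labels into the binary scheme described above.
THREE_WAY_TO_BINARY = {
    "entailment": "entailment",
    "neutral": "not-entailment",
    "contradiction": "not-entailment",
}

def to_binary(label: str) -> str:
    # `neutral` and `contradiction` both count as `not-entailment`.
    return THREE_WAY_TO_BINARY[label]
```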

| **Model** | **Average** | **HANS** | **WNLI** | **ANLI-r1** | **ANLI-r2** | **ANLI-r3** | **Diagnostics** | **BigBench** | **Control** |
|-----------|-------------|----------|----------|-------------|-------------|-------------|-----------------|--------------|-------------|
| deberta-v3-large-zeroshot-v2.0 | 79.47 | 81.28 | 70.68 | 86.40 | 77.60 | 77.50 | 83.59 | 87.03 | 71.68 |
| modernBERT-large-zeroshot-v2.0 | 74.78 | 80.30 | 66.00 | 81.20 | 71.50 | 71.67 | 82.05 | 73.18 | 72.30 |
| deberta-v3-large-mfalw | 80.62 | 81.10 | **74.08** | 86.30 | **79.90** | 78.33 | 85.22 | 85.61 | 74.40 |
| PrismNLI-0.4B | **82.88** | **90.68** | 72.95 | **87.70** | 78.80 | **79.58** | **86.22** | **90.59** | **76.52** |

## Training Data

The model has been fine-tuned on 515K NLI datapoints from [PrismNLI](https://huggingface.co/datasets/Jaehun/PrismNLI), a synthetic dataset designed to improve the generalization of
NLI models. The dataset was generated by [Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) via our algorithm, Prismatic Synthesis, which scales synthetic data while improving the diversity of generated samples.

## Model Usage

The model can be used as a standard NLI (entailment detection) classifier. Label `0` denotes `entailment`, and label `1` denotes `not-entailment`.
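
As a minimal sketch of direct NLI scoring, assuming the standard `transformers` sequence-classification interface (the premise and hypothesis strings here are illustrative):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the tokenizer and classification model once, then reuse them.
model_name = "Jaehun/PrismNLI-0.4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "The pizza was baked in a wood-fired oven and topped with buffalo mozzarella."
hypothesis = "The dish was cooked in an oven."

# Encode the premise-hypothesis pair and score it; per the label mapping
# above, index 0 is entailment and index 1 is not-entailment.
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)[0]
print({"entailment": probs[0].item(), "not-entailment": probs[1].item()})
```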

Beyond NLI, the model can serve as a zero-shot classifier:

```python
from transformers import pipeline

text = "It was baked in a wood-fired oven and topped with San Marzano tomatoes and buffalo mozzarella."
hypothesis_template = "This text is about {}"
classes_verbalized = ['pizza', 'pasta', 'salad', 'sushi']

# The pipeline slots each candidate class into the template and scores the
# resulting hypothesis for entailment against the input text.
zeroshot_classifier = pipeline("zero-shot-classification", model="Jaehun/PrismNLI-0.4B")
output = zeroshot_classifier(text, classes_verbalized, hypothesis_template=hypothesis_template, multi_label=False)
```

The output will look like:

```python
{
  'sequence': 'It was baked in a wood-fired oven and topped with San Marzano tomatoes and buffalo mozzarella.',
  'labels': ['pizza', 'pasta', 'salad', 'sushi'],
  'scores': [0.9982, 0.0014, 0.0002, 0.0002],
}
```
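
Note that with `multi_label=False` the scores are normalized across the candidate labels (they sum to 1), so the pipeline picks a single best class; if a text may plausibly match several classes at once, `multi_label=True` scores each label independently.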

## Citation

If you find this model useful, please consider citing us!