|
|
--- |
|
|
language: |
|
|
- en |
|
|
tags: |
|
|
- sentence-transformers |
|
|
- cross-encoder |
|
|
- reranker |
|
|
- generated_from_trainer |
|
|
- dataset_size:942069 |
|
|
- loss:PrecomputedDistillationLoss |
|
|
base_model: jhu-clsp/ettin-encoder-17m |
|
|
datasets: |
|
|
- dleemiller/all-nli-distill |
|
|
pipeline_tag: text-classification |
|
|
library_name: sentence-transformers |
|
|
metrics: |
|
|
- f1_macro |
|
|
- f1_micro |
|
|
- f1_weighted |
|
|
model-index: |
|
|
- name: CrossEncoder based on jhu-clsp/ettin-encoder-17m |
|
|
results: |
|
|
- task: |
|
|
type: cross-encoder-classification |
|
|
name: Cross Encoder Classification |
|
|
dataset: |
|
|
name: AllNLI dev |
|
|
type: AllNLI-dev |
|
|
metrics: |
|
|
- type: f1_macro |
|
|
value: 0.843215238686306 |
|
|
name: F1 Macro |
|
|
- type: f1_micro |
|
|
value: 0.8435163046243068 |
|
|
name: F1 Micro |
|
|
- type: f1_weighted |
|
|
value: 0.8438547382511594 |
|
|
name: F1 Weighted |
|
|
- task: |
|
|
type: cross-encoder-classification |
|
|
name: Cross Encoder Classification |
|
|
dataset: |
|
|
name: AllNLI test |
|
|
type: AllNLI-test |
|
|
metrics: |
|
|
- type: f1_macro |
|
|
value: 0.8442865676487733 |
|
|
name: F1 Macro |
|
|
- type: f1_micro |
|
|
value: 0.8446784696784697 |
|
|
name: F1 Micro |
|
|
- type: f1_weighted |
|
|
value: 0.8449960204914074 |
|
|
name: F1 Weighted |
|
|
--- |
|
|
|
|
|
# EttinX Cross-Encoder: Natural Language Inference (NLI) |
|
|
|
|
|
This cross-encoder performs sequence classification over sentence pairs, predicting contradiction, entailment, and neutral labels. It is drop-in compatible with comparable Sentence Transformers cross-encoders.
|
|
|
|
|
To train this model, I added teacher logits from `dleemiller/ModernCE-large-nli` to the AllNLI dataset, published as `dleemiller/all-nli-distill`. Distilling from these precomputed logits significantly improves performance over standard training.
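For intuition, here is a minimal sketch of precomputed-logit distillation. The temperature-scaled KL plus cross-entropy blend below is a common formulation and an assumption, not necessarily the exact `PrecomputedDistillationLoss` implementation:

```python
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, alpha=0.5, temperature=2.0):
    """Blend soft teacher targets with hard gold labels (common distillation recipe)."""
    # KL divergence between temperature-softened student and teacher distributions
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    # Standard cross-entropy against the gold NLI labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

Because the teacher logits are stored in the dataset, the large teacher never has to run during training, which keeps fine-tuning cheap.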
|
|
|
|
|
This 17M-parameter architecture is based on ModernBERT and is an excellent candidate for lightweight **CPU inference**.
|
|
|
|
|
--- |
|
|
|
|
|
## Features |
|
|
- **High performing:** Achieves **80.47%** (MNLI mismatched) and **86.95%** (SNLI test) micro F1.
|
|
- **Efficient architecture:** Based on the Ettin-17m encoder design (17M parameters), offering faster inference speeds. |
|
|
- **Extended context length:** Processes sequences up to 8192 tokens, great for LLM output evals. |
|
|
|
|
|
--- |
|
|
|
|
|
## Performance |
|
|
|
|
|
| Model | MNLI Mismatched | SNLI Test | Context Length | # Parameters | |
|
|
|---------------------------|-------------------|--------------|----------------|----------------| |
|
|
| `dleemiller/ModernCE-large-nli` | **0.9202** | 0.9110 | 8192 | 395M | |
|
|
| `dleemiller/ModernCE-base-nli` | 0.9034 | 0.9025 | 8192 | 149M | |
|
|
| `cross-encoder/deberta-v3-large` | 0.9049 | 0.9220 | 512 | 435M | |
|
|
| `cross-encoder/deberta-v3-base` | 0.9004 | 0.9234 | 512 | 184M | |
|
|
| `cross-encoder/nli-distilroberta-base` | 0.8398 | 0.8838 | 512 | 82M | |
|
|
| `dleemiller/EttinX-nli-xxs` | 0.8047 | 0.8695 | 8192 | 17M | |
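These numbers can be spot-checked with a short script. A sketch for SNLI (the label remapping assumes the head order shown in the usage example below and the standard SNLI label convention; verify both before relying on the result):

```python
from datasets import load_dataset
from sklearn.metrics import f1_score
from sentence_transformers import CrossEncoder

model = CrossEncoder("dleemiller/EttinX-nli-xxs")

# SNLI test split; drop unlabeled pairs (label == -1)
snli = load_dataset("snli", split="test").filter(lambda ex: ex["label"] != -1)
pairs = list(zip(snli["premise"], snli["hypothesis"]))

# Model head order: 0=contradiction, 1=entailment, 2=neutral
pred = model.predict(pairs).argmax(axis=1)

# SNLI convention: 0=entailment, 1=neutral, 2=contradiction
remap = {0: 2, 1: 0, 2: 1}  # model index -> SNLI label
print(f1_score(snli["label"], [remap[int(p)] for p in pred], average="micro"))
```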
|
|
|
|
|
|
|
|
--- |
|
|
|
|
|
## Usage |
|
|
|
|
|
To use EttinX for NLI tasks, you can load the model with the Hugging Face `sentence-transformers` library: |
|
|
|
|
|
```python |
|
|
from sentence_transformers import CrossEncoder |
|
|
|
|
|
# Load EttinX model |
|
|
model = CrossEncoder("dleemiller/EttinX-nli-xxs") |
|
|
|
|
|
scores = model.predict([ |
|
|
('A man is eating pizza', 'A man eats something'), |
|
|
('A black race car starts up in front of a crowd of people.', 'A man is driving down a lonely road.') |
|
|
]) |
|
|
|
|
|
# Convert scores to labels |
|
|
label_mapping = ['contradiction', 'entailment', 'neutral'] |
|
|
labels = [label_mapping[score_max] for score_max in scores.argmax(axis=1)] |
|
|
# ['entailment', 'contradiction'] |
|
|
``` |
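The checkpoint is a standard sequence-classification model, so it can also be loaded with plain `transformers`, which can be convenient for CPU-only deployments (a sketch, assuming the usual 3-way head):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dleemiller/EttinX-nli-xxs"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Tokenize premise/hypothesis pairs together
features = tokenizer(
    ["A man is eating pizza"],
    ["A man eats something"],
    padding=True, truncation=True, return_tensors="pt",
)

with torch.no_grad():
    logits = model(**features).logits

label_mapping = ["contradiction", "entailment", "neutral"]
print([label_mapping[i] for i in logits.argmax(dim=-1)])  # ['entailment']
```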
|
|
|
|
|
--- |
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Pretraining |
|
|
We initialize the model from the pretrained `jhu-clsp/ettin-encoder-17m` weights.
|
|
|
|
|
Details: |
|
|
- Batch size: 512 |
|
|
- Learning rate: 1e-4 |
|
|
- Attention dropout: 0.1
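For reference, attention dropout is set on the model config when the classification head is initialized. A sketch (the `attention_dropout` field name follows the ModernBERT-style config; confirm it against the actual checkpoint config):

```python
from transformers import AutoConfig, AutoModelForSequenceClassification

# Initialize a 3-way classification head on the pretrained encoder,
# raising attention dropout for fine-tuning
config = AutoConfig.from_pretrained("jhu-clsp/ettin-encoder-17m", num_labels=3)
config.attention_dropout = 0.1
model = AutoModelForSequenceClassification.from_pretrained(
    "jhu-clsp/ettin-encoder-17m", config=config
)
```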
|
|
|
|
|
### Fine-Tuning |
|
|
Fine-tuning was performed on the `dleemiller/all-nli-distill` dataset. |
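The dataset is available on the Hub; the quickest way to see its schema (including the teacher-logit fields) is to inspect it directly rather than assume column names:

```python
from datasets import load_dataset

# Print the columns and one example; the exact fields are defined
# by the dataset itself
ds = load_dataset("dleemiller/all-nli-distill", split="train")
print(ds.column_names)
print(ds[0])
```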
|
|
|
|
|
### Evaluation Results
|
|
The model achieved the following test-set micro F1 scores after fine-tuning:


- **MNLI Mismatched:** 0.8047
|
|
- **SNLI:** 0.8695 |
|
|
|
|
|
--- |
|
|
|
|
|
## Model Card |
|
|
|
|
|
- **Architecture:** Ettin-encoder-17m |
|
|
- **Fine-Tuning Data:** `dleemiller/all-nli-distill` |
|
|
|
|
|
--- |
|
|
|
|
|
## Thank You |
|
|
|
|
|
Thanks to the Johns Hopkins CLSP team for providing the Ettin encoder models, and the Sentence Transformers team for their leadership in transformer encoder models.
|
|
|
|
|
--- |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model in your research, please cite: |
|
|
|
|
|
```bibtex |
|
|
@misc{ettinxnli2025,
|
|
author = {Miller, D. Lee}, |
|
|
title = {EttinX NLI: An NLI cross encoder model}, |
|
|
year = {2025}, |
|
|
publisher = {Hugging Face Hub}, |
|
|
url = {https://huggingface.co/dleemiller/EttinX-nli-xxs}, |
|
|
} |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## License |
|
|
|
|
|
This model is licensed under the [MIT License](LICENSE). |