abecerr1

Add new CrossEncoder model

3ee6d97 verified 4 months ago

metadata

tags:
  - sentence-transformers
  - cross-encoder
pipeline_tag: text-ranking
library_name: sentence-transformers

CrossEncoder

This is a Cross Encoder model trained using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

Model Details

Model Description

Model Type: Cross Encoder
Maximum Sequence Length: 512 tokens
Number of Output Labels: 1 label

Model Sources

Documentation: Sentence Transformers Documentation
Documentation: Cross Encoder Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Cross Encoders on Hugging Face

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("BSC-NLP4BIA/Distemist-CE-Reranker")
# Get scores for pairs of texts
pairs = [
    ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
    ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
    ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (3,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'How many calories in an egg',
    [
        'There are on average between 55 and 80 calories in an egg depending on its size.',
        'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
        'Most of the calories in an egg come from the yellow yolk in the center.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
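
The list of `corpus_id`/`score` dictionaries returned by `rank` can also be rebuilt from the raw output of `predict`. The following is a minimal sketch, assuming the ranking is simply the candidates sorted by descending score. The helper name `rank_from_scores` and the example scores are hypothetical and not part of the library or of this model's output.

```python
from typing import Optional, Sequence


def rank_from_scores(scores: Sequence[float], top_k: Optional[int] = None) -> list:
    """Sort candidate indices by descending score (hypothetical helper).

    Returns dictionaries with the same keys as the rank() output shown above.
    """
    # Indices of the candidates, highest score first; ties keep their input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    return [{"corpus_id": i, "score": float(scores[i])} for i in order]


# Made-up scores for three candidate passages, NOT real model output.
example_scores = [0.2, 0.9, 0.5]
for entry in rank_from_scores(example_scores):
    print(entry["corpus_id"], entry["score"])
```

With real data, `example_scores` would be replaced by the array returned from `model.predict(pairs)`, and `corpus_id` then indexes into the list of candidate texts.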

Training Details

Framework Versions

Python: 3.11.3
Sentence Transformers: 4.1.0
Transformers: 4.51.3
PyTorch: 2.7.0+cu126
Accelerate:
Datasets:
Tokenizers: 0.21.1
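
To compare a local environment against the versions listed above, a short check along these lines may help. It is a sketch only: it assumes the PyPI distribution names `sentence-transformers`, `transformers`, `torch`, and `tokenizers`, and it leaves out Accelerate and Datasets because no version is given for them. The helper name `compare_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; packages without a listed version are omitted.
EXPECTED = {
    "sentence-transformers": "4.1.0",
    "transformers": "4.51.3",
    "torch": "2.7.0+cu126",
    "tokenizers": "0.21.1",
}


def compare_versions(expected: dict) -> dict:
    """Map each package to (expected version, installed version or None)."""
    report = {}
    for package, wanted in expected.items():
        try:
            found = version(package)
        except PackageNotFoundError:
            found = None  # package is not installed in this environment
        report[package] = (wanted, found)
    return report


for package, (wanted, found) in compare_versions(EXPECTED).items():
    status = "ok" if found == wanted else "differs"
    print(f"{package}: expected {wanted}, installed {found} ({status})")
```

A differing version does not necessarily mean the model will fail to load; the listed versions only describe the training environment.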
