CrossEncoder based on answerdotai/ModernBERT-base

This is a Cross Encoder model finetuned from answerdotai/ModernBERT-base using the sentence-transformers library. It computes scores for pairs of texts, which can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Cross Encoder
  • Base model: answerdotai/ModernBERT-base
  • Maximum Sequence Length: 8192 tokens
  • Number of Output Labels: 1 label
  • Language: en

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-ModernBERT-base-gooaq-bce")
# Get scores for pairs of texts
pairs = [
    ['should you take ibuprofen with high blood pressure?', "In general, people with high blood pressure should use acetaminophen or possibly aspirin for over-the-counter pain relief. Unless your health care provider has said it's OK, you should not use ibuprofen, ketoprofen, or naproxen sodium. If aspirin or acetaminophen doesn't help with your pain, call your doctor."],
    ['how old do you have to be to work in sc?', 'The general minimum age of employment for South Carolina youth is 14, although the state allows younger children who are performers to work in show business. If their families are agricultural workers, children younger than age 14 may also participate in farm labor.'],
    ['how to write a topic proposal for a research paper?', "['Write down the main topic of your paper. ... ', 'Write two or three short sentences under the main topic that explain why you chose that topic. ... ', 'Write a thesis sentence that states the angle and purpose of your research paper. ... ', 'List the items you will cover in the body of the paper that support your thesis statement.']"],
    ['how much does aaf pay players?', 'These dates provided an opportunity for players cut at the NFL roster deadline, and each player signed a non-guaranteed three-year contract worth a total of $250,000 ($70,000 in 2019; $80,000 in 2020; $100,000 in 2021), with performance-based and fan-interaction incentives allowing for players to earn more.'],
    ['is jove and zeus the same?', 'Jupiter, or Jove, in Roman mythology is the king of the gods and the god of sky and thunder, equivalent to Zeus in Greek traditions.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'should you take ibuprofen with high blood pressure?',
    [
        "In general, people with high blood pressure should use acetaminophen or possibly aspirin for over-the-counter pain relief. Unless your health care provider has said it's OK, you should not use ibuprofen, ketoprofen, or naproxen sodium. If aspirin or acetaminophen doesn't help with your pain, call your doctor.",
        'The general minimum age of employment for South Carolina youth is 14, although the state allows younger children who are performers to work in show business. If their families are agricultural workers, children younger than age 14 may also participate in farm labor.',
        "['Write down the main topic of your paper. ... ', 'Write two or three short sentences under the main topic that explain why you chose that topic. ... ', 'Write a thesis sentence that states the angle and purpose of your research paper. ... ', 'List the items you will cover in the body of the paper that support your thesis statement.']",
        'These dates provided an opportunity for players cut at the NFL roster deadline, and each player signed a non-guaranteed three-year contract worth a total of $250,000 ($70,000 in 2019; $80,000 in 2020; $100,000 in 2021), with performance-based and fan-interaction incentives allowing for players to earn more.',
        'Jupiter, or Jove, in Roman mythology is the king of the gods and the god of sky and thunder, equivalent to Zeus in Greek traditions.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
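
The list returned by `model.rank` has one entry per candidate text, holding the candidate's position in the input list (`corpus_id`) and its score, ordered from highest to lowest score. The sketch below reproduces that output shape from a plain list of scores without loading the model. The helper name `rank_from_scores` and the example scores are hypothetical; this illustrates the format only and is not the library's implementation.

```python
def rank_from_scores(scores, top_k=None):
    """Sort candidate indices by score, highest first (illustrative helper)."""
    ranked = sorted(
        ({"corpus_id": i, "score": float(s)} for i, s in enumerate(scores)),
        key=lambda entry: entry["score"],
        reverse=True,
    )
    return ranked if top_k is None else ranked[:top_k]


# Made-up scores for five candidate texts, in input order.
example_scores = [0.91, 0.02, 0.10, 0.05, 0.33]
ranking = rank_from_scores(example_scores, top_k=3)
print([entry["corpus_id"] for entry in ranking])  # [0, 4, 2]
```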

Evaluation

Metrics

Cross Encoder Reranking

| Metric | gooaq-dev | NanoMSMARCO | NanoNFCorpus | NanoNQ |
| --- | --- | --- | --- | --- |
| map | 0.7386 (+0.0063) | 0.5463 (+0.0567) | 0.3300 (+0.0595) | 0.6707 (+0.2500) |
| mrr@10 | 0.7360 (+0.0068) | 0.5401 (+0.0626) | 0.5409 (+0.0410) | 0.6737 (+0.2471) |
| ndcg@10 | 0.7880 (+0.0064) | 0.6203 (+0.0799) | 0.3660 (+0.0410) | 0.7246 (+0.2240) |
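
For readers unfamiliar with the reranking metrics, here is a minimal sketch of MRR@10 and NDCG@10 for a single query with binary relevance labels. It assumes the labels are listed in the order the model ranked the candidates and that every relevant document is among those candidates. The function names are hypothetical, and the evaluator that produced the numbers above may differ in details.

```python
import math


def mrr_at_k(relevance, k=10):
    """Reciprocal rank of the first relevant item in the top k (0.0 if none)."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(relevance, k=10):
    """NDCG@k for binary relevance labels given in ranked order."""
    def dcg(labels):
        return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(labels, start=1))

    # The ideal ordering places all relevant items first.
    ideal = sorted(relevance, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(relevance[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical ranking: the second and fourth candidates are relevant.
ranked_labels = [0, 1, 0, 1]
print(mrr_at_k(ranked_labels))   # 0.5
print(ndcg_at_k(ranked_labels))
```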

Cross Encoder Nano BEIR

| Metric | Value |
| --- | --- |
| map | 0.5157 (+0.1221) |
| mrr@10 | 0.5849 (+0.1169) |
| ndcg@10 | 0.5703 (+0.1149) |
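
The Nano BEIR values are consistent with an unweighted mean of the NanoMSMARCO, NanoNFCorpus, and NanoNQ columns in the reranking table. The short check below recomputes those means from the reported per-dataset numbers; the variable names are illustrative only.

```python
# Per-dataset values copied from the Cross Encoder Reranking table.
per_dataset = {
    "map":     {"NanoMSMARCO": 0.5463, "NanoNFCorpus": 0.3300, "NanoNQ": 0.6707},
    "mrr@10":  {"NanoMSMARCO": 0.5401, "NanoNFCorpus": 0.5409, "NanoNQ": 0.6737},
    "ndcg@10": {"NanoMSMARCO": 0.6203, "NanoNFCorpus": 0.3660, "NanoNQ": 0.7246},
}

# Unweighted mean over the three datasets, rounded to four decimals.
nano_beir_mean = {
    metric: round(sum(values.values()) / len(values), 4)
    for metric, values in per_dataset.items()
}
print(nano_beir_mean)  # {'map': 0.5157, 'mrr@10': 0.5849, 'ndcg@10': 0.5703}
```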

Training Details

Training Dataset

Unnamed Dataset

  • Size: 580,740 training samples
  • Columns: query, response, and label
  • Approximate statistics based on the first 1000 samples:
    | | query | response | label |
    | --- | --- | --- | --- |
    | type | string | string | int |
    | details | min: 17 characters; mean: 42.5 characters; max: 91 characters | min: 51 characters; mean: 253.83 characters; max: 385 characters | 1: 100.00% |
  • Samples:
    | query | response | label |
    | --- | --- | --- |
    | what is the difference between a certificate and associate's degree? | Certificate degrees are extremely focused in their objective(s) and are related to a specific job or career niche. ... Certificates are often obtained as an add-on to an associate degree. Associate degree programs require two years of full-time classroom attendance in order to complete a degree. | 1 |
    | what is the difference between 5star and inverter ac? | An inverter AC works on variable speed compressor whereas a 5-star rated non-inverter AC have single speed compressor. It changes its speed as per the heat load and number of people. The need of Stabilizer: A stabilizer is installed with the AC to maintain an optimum voltage range during the power fluctuations. | 1 |
    | what is the difference between gas and electric cars? | A gas-powered car has a fuel tank, which supplies gasoline to the engine. The engine then turns a transmission, which turns the wheels. Move your mouse over the parts for a 3-D view. An electric car, on the other hand, has a set of batteries that provides electricity to an electric motor. | 1 |
  • Loss: BinaryCrossEntropyLoss with these parameters:
    {
        "activation_fct": "torch.nn.modules.linear.Identity",
        "pos_weight": 5
    }
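
To make the two loss parameters concrete: an identity activation means the loss is applied to the raw logit, and `pos_weight` multiplies the loss term for positive pairs. The pure-Python sketch below implements binary cross-entropy on logits with a positive-class weight, which is the formula used by `torch.nn.BCEWithLogitsLoss`. That the library's `BinaryCrossEntropyLoss` reduces to exactly this is an assumption here, and the function names are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_bce_with_logits(logit, label, pos_weight=5.0):
    """Binary cross-entropy on a raw logit; positives are scaled by pos_weight."""
    log_sigmoid = -softplus(-logit)            # log(sigmoid(logit))
    log_one_minus_sigmoid = -softplus(logit)   # log(1 - sigmoid(logit))
    return -(pos_weight * label * log_sigmoid + (1 - label) * log_one_minus_sigmoid)


# At logit 0 the model is undecided: a missed positive costs 5x a missed negative.
loss_pos = weighted_bce_with_logits(0.0, 1)
loss_neg = weighted_bce_with_logits(0.0, 0)
print(loss_pos / loss_neg)
```

With `pos_weight` set to 5, an undecided prediction on a positive pair is penalised five times as heavily as the same prediction on a negative pair.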
    

Evaluation Dataset

gooaq

  • Dataset: gooaq at b089f72
  • Size: 3,012,496 evaluation samples
  • Columns: query, response, and label
  • Approximate statistics based on the first 1000 samples:
    | | query | response | label |
    | --- | --- | --- | --- |
    | type | string | string | int |
    | details | min: 18 characters; mean: 43.05 characters; max: 88 characters | min: 51 characters; mean: 252.39 characters; max: 386 characters | 1: 100.00% |
  • Samples:
    | query | response | label |
    | --- | --- | --- |
    | should you take ibuprofen with high blood pressure? | In general, people with high blood pressure should use acetaminophen or possibly aspirin for over-the-counter pain relief. Unless your health care provider has said it's OK, you should not use ibuprofen, ketoprofen, or naproxen sodium. If aspirin or acetaminophen doesn't help with your pain, call your doctor. | 1 |
    | how old do you have to be to work in sc? | The general minimum age of employment for South Carolina youth is 14, although the state allows younger children who are performers to work in show business. If their families are agricultural workers, children younger than age 14 may also participate in farm labor. | 1 |
    | how to write a topic proposal for a research paper? | ['Write down the main topic of your paper. ... ', 'Write two or three short sentences under the main topic that explain why you chose that topic. ... ', 'Write a thesis sentence that states the angle and purpose of your research paper. ... ', 'List the items you will cover in the body of the paper that support your thesis statement.'] | 1 |
  • Loss: BinaryCrossEntropyLoss with these parameters:
    {
        "activation_fct": "torch.nn.modules.linear.Identity",
        "pos_weight": 5
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • seed: 12
  • bf16: True
  • dataloader_num_workers: 4
  • load_best_model_at_end: True
  • batch_sampler: no_duplicates
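
As a rough illustration of how `learning_rate`, `warmup_ratio`, and the default linear scheduler interact, the sketch below computes the learning rate at a given optimizer step. It assumes a single device, so the step count is derived from the 580,740 training samples and a batch size of 64, and it assumes the usual linear warmup followed by linear decay to zero. The function name is hypothetical, and the `no_duplicates` batch sampler may change the exact number of batches slightly.

```python
import math


def linear_warmup_decay_lr(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Learning rate under linear warmup followed by linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Assumed: one device, no gradient accumulation, last partial batch kept.
total_steps = math.ceil(580_740 / 64)
warmup_steps = math.ceil(total_steps * 0.1)
print(total_steps, warmup_steps)  # 9075 908

for step in (0, 454, 908, 5000, total_steps):
    print(step, linear_warmup_decay_lr(step, total_steps))
```

The derived total of 9,075 steps agrees with the training log, where step 9000 corresponds to epoch 0.9917.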

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 12
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 4
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

| Epoch | Step | Training Loss | Validation Loss | gooaq-dev_ndcg@10 | NanoMSMARCO_ndcg@10 | NanoNFCorpus_ndcg@10 | NanoNQ_ndcg@10 | NanoBEIR_mean_ndcg@10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| -1 | -1 | - | - | 0.1879 (-0.5937) | 0.0748 (-0.4656) | 0.2012 (-0.1238) | 0.0414 (-0.4592) | 0.1058 (-0.3496) |
| 0.0001 | 1 | 1.1971 | - | - | - | - | - | - |
| 0.0220 | 200 | 1.1557 | - | - | - | - | - | - |
| 0.0441 | 400 | 0.9119 | - | - | - | - | - | - |
| 0.0661 | 600 | 0.5124 | - | - | - | - | - | - |
| 0.0882 | 800 | 0.4225 | - | - | - | - | - | - |
| 0.1102 | 1000 | 0.3876 | 1.3811 | 0.7192 (-0.0624) | 0.5171 (-0.0233) | 0.3438 (+0.0187) | 0.5647 (+0.0641) | 0.4752 (+0.0198) |
| 0.1322 | 1200 | 0.3563 | - | - | - | - | - | - |
| 0.1543 | 1400 | 0.3155 | - | - | - | - | - | - |
| 0.1763 | 1600 | 0.3181 | - | - | - | - | - | - |
| 0.1983 | 1800 | 0.289 | - | - | - | - | - | - |
| 0.2204 | 2000 | 0.283 | 0.6710 | 0.7528 (-0.0289) | 0.5559 (+0.0155) | 0.3445 (+0.0194) | 0.6592 (+0.1585) | 0.5198 (+0.0645) |
| 0.2424 | 2200 | 0.2745 | - | - | - | - | - | - |
| 0.2645 | 2400 | 0.2575 | - | - | - | - | - | - |
| 0.2865 | 2600 | 0.2762 | - | - | - | - | - | - |
| 0.3085 | 2800 | 0.2489 | - | - | - | - | - | - |
| 0.3306 | 3000 | 0.2259 | 0.7575 | 0.7696 (-0.0121) | 0.4982 (-0.0422) | 0.3555 (+0.0305) | 0.6483 (+0.1476) | 0.5007 (+0.0453) |
| 0.3526 | 3200 | 0.2576 | - | - | - | - | - | - |
| 0.3747 | 3400 | 0.2384 | - | - | - | - | - | - |
| 0.3967 | 3600 | 0.2431 | - | - | - | - | - | - |
| 0.4187 | 3800 | 0.206 | - | - | - | - | - | - |
| 0.4408 | 4000 | 0.2381 | 0.9594 | 0.7774 (-0.0042) | 0.5649 (+0.0245) | 0.3666 (+0.0416) | 0.6842 (+0.1836) | 0.5386 (+0.0832) |
| 0.4628 | 4200 | 0.2196 | - | - | - | - | - | - |
| 0.4848 | 4400 | 0.2153 | - | - | - | - | - | - |
| 0.5069 | 4600 | 0.217 | - | - | - | - | - | - |
| 0.5289 | 4800 | 0.1982 | - | - | - | - | - | - |
| 0.5510 | 5000 | 0.2172 | 0.6249 | 0.7864 (+0.0047) | 0.6029 (+0.0625) | 0.3833 (+0.0583) | 0.7029 (+0.2022) | 0.5630 (+0.1077) |
| 0.5730 | 5200 | 0.2145 | - | - | - | - | - | - |
| 0.5950 | 5400 | 0.213 | - | - | - | - | - | - |
| 0.6171 | 5600 | 0.2117 | - | - | - | - | - | - |
| 0.6391 | 5800 | 0.2102 | - | - | - | - | - | - |
| 0.6612 | 6000 | 0.2125 | 0.7420 | 0.7834 (+0.0017) | 0.5907 (+0.0503) | 0.3771 (+0.0521) | 0.7176 (+0.2169) | 0.5618 (+0.1064) |
| 0.6832 | 6200 | 0.1995 | - | - | - | - | - | - |
| 0.7052 | 6400 | 0.1978 | - | - | - | - | - | - |
| 0.7273 | 6600 | 0.1857 | - | - | - | - | - | - |
| 0.7493 | 6800 | 0.1811 | - | - | - | - | - | - |
| 0.7713 | 7000 | 0.2055 | 1.1528 | 0.7827 (+0.0011) | 0.6152 (+0.0748) | 0.3730 (+0.0480) | 0.7190 (+0.2184) | 0.5691 (+0.1137) |
| 0.7934 | 7200 | 0.1855 | - | - | - | - | - | - |
| 0.8154 | 7400 | 0.1829 | - | - | - | - | - | - |
| 0.8375 | 7600 | 0.1901 | - | - | - | - | - | - |
| 0.8595 | 7800 | 0.1862 | - | - | - | - | - | - |
| 0.8815 | 8000 | 0.1858 | 0.6424 | 0.7880 (+0.0064) | 0.6203 (+0.0799) | 0.3660 (+0.0410) | 0.7246 (+0.2240) | 0.5703 (+0.1149) |
| 0.9036 | 8200 | 0.1545 | - | - | - | - | - | - |
| 0.9256 | 8400 | 0.1729 | - | - | - | - | - | - |
| 0.9477 | 8600 | 0.1657 | - | - | - | - | - | - |
| 0.9697 | 8800 | 0.1698 | - | - | - | - | - | - |
| 0.9917 | 9000 | 0.1658 | 0.6904 | 0.7898 (+0.0081) | 0.6011 (+0.0606) | 0.3612 (+0.0361) | 0.7165 (+0.2159) | 0.5596 (+0.1042) |
| -1 | -1 | - | - | 0.7880 (+0.0064) | 0.6203 (+0.0799) | 0.3660 (+0.0410) | 0.7246 (+0.2240) | 0.5703 (+0.1149) |

  • The saved checkpoint is the step 8000 row (epoch 0.8815); the final row, evaluated after training, reports the same metrics.
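
The final row of the log repeats the step 8000 metrics, and step 8000 is also the evaluation with the highest NanoBEIR mean NDCG@10. The sketch below recovers that row from the logged evaluations and shows that the lowest validation loss falls on a different step. This suggests, but does not prove, that checkpoint selection followed the NanoBEIR mean rather than the validation loss. The tuple layout is an illustrative choice.

```python
# (step, validation_loss, NanoBEIR_mean_ndcg@10), copied from the training log.
eval_rows = [
    (1000, 1.3811, 0.4752),
    (2000, 0.6710, 0.5198),
    (3000, 0.7575, 0.5007),
    (4000, 0.9594, 0.5386),
    (5000, 0.6249, 0.5630),
    (6000, 0.7420, 0.5618),
    (7000, 1.1528, 0.5691),
    (8000, 0.6424, 0.5703),
    (9000, 0.6904, 0.5596),
]

best_by_ndcg = max(eval_rows, key=lambda row: row[2])
best_by_loss = min(eval_rows, key=lambda row: row[1])
print(best_by_ndcg[0])  # 8000
print(best_by_loss[0])  # 5000
```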

Framework Versions

  • Python: 3.11.10
  • Sentence Transformers: 3.5.0.dev0
  • Transformers: 4.49.0.dev0
  • PyTorch: 2.6.0.dev20241112+cu121
  • Accelerate: 1.2.0
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
