CrossEncoder based on answerdotai/ModernBERT-base
This is a Cross Encoder model finetuned from answerdotai/ModernBERT-base using the sentence-transformers library. It computes scores for pairs of texts, which can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Cross Encoder
- Base model: answerdotai/ModernBERT-base
- Maximum Sequence Length: 8192 tokens
- Number of Output Labels: 1 label
- Language: en
Model Sources
- Documentation: Sentence Transformers Documentation
- Documentation: Cross Encoder Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Cross Encoders on Hugging Face
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-ModernBERT-base-gooaq-bce")
# Get scores for pairs of texts
pairs = [
    ['should you take ibuprofen with high blood pressure?', "In general, people with high blood pressure should use acetaminophen or possibly aspirin for over-the-counter pain relief. Unless your health care provider has said it's OK, you should not use ibuprofen, ketoprofen, or naproxen sodium. If aspirin or acetaminophen doesn't help with your pain, call your doctor."],
    ['how old do you have to be to work in sc?', 'The general minimum age of employment for South Carolina youth is 14, although the state allows younger children who are performers to work in show business. If their families are agricultural workers, children younger than age 14 may also participate in farm labor.'],
    ['how to write a topic proposal for a research paper?', "['Write down the main topic of your paper. ... ', 'Write two or three short sentences under the main topic that explain why you chose that topic. ... ', 'Write a thesis sentence that states the angle and purpose of your research paper. ... ', 'List the items you will cover in the body of the paper that support your thesis statement.']"],
    ['how much does aaf pay players?', 'These dates provided an opportunity for players cut at the NFL roster deadline, and each player signed a non-guaranteed three-year contract worth a total of $250,000 ($70,000 in 2019; $80,000 in 2020; $100,000 in 2021), with performance-based and fan-interaction incentives allowing for players to earn more.'],
    ['is jove and zeus the same?', 'Jupiter, or Jove, in Roman mythology is the king of the gods and the god of sky and thunder, equivalent to Zeus in Greek traditions.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'should you take ibuprofen with high blood pressure?',
    [
        "In general, people with high blood pressure should use acetaminophen or possibly aspirin for over-the-counter pain relief. Unless your health care provider has said it's OK, you should not use ibuprofen, ketoprofen, or naproxen sodium. If aspirin or acetaminophen doesn't help with your pain, call your doctor.",
        'The general minimum age of employment for South Carolina youth is 14, although the state allows younger children who are performers to work in show business. If their families are agricultural workers, children younger than age 14 may also participate in farm labor.',
        "['Write down the main topic of your paper. ... ', 'Write two or three short sentences under the main topic that explain why you chose that topic. ... ', 'Write a thesis sentence that states the angle and purpose of your research paper. ... ', 'List the items you will cover in the body of the paper that support your thesis statement.']",
        'These dates provided an opportunity for players cut at the NFL roster deadline, and each player signed a non-guaranteed three-year contract worth a total of $250,000 ($70,000 in 2019; $80,000 in 2020; $100,000 in 2021), with performance-based and fan-interaction incentives allowing for players to earn more.',
        'Jupiter, or Jove, in Roman mythology is the king of the gods and the god of sky and thunder, equivalent to Zeus in Greek traditions.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
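`rank` returns one dictionary per document, with the highest-scoring document first. The plain-Python sketch below reproduces that ordering step and output shape. `rank_by_scores` is a hypothetical helper and not part of the Sentence Transformers API. The scores are made-up placeholders, not outputs of this model.

```python
from typing import List


def rank_by_scores(scores: List[float]) -> List[dict]:
    """Hypothetical helper: sort document indices by descending score."""
    # Pair each document index (its corpus_id) with its score, then sort high to low.
    ranked = [{"corpus_id": i, "score": s} for i, s in enumerate(scores)]
    ranked.sort(key=lambda item: item["score"], reverse=True)
    return ranked


# Made-up scores for five candidate documents (placeholders, not real model output).
example_scores = [8.1, -3.2, -4.0, -2.7, -3.9]
for item in rank_by_scores(example_scores):
    print(item["corpus_id"], item["score"])
```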
Evaluation
Metrics
Cross Encoder Reranking
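- Datasets: `gooaq-dev`, `NanoMSMARCO`, `NanoNFCorpus` and `NanoNQ`
- Evaluated with `CERerankingEvaluator`
- Values in parentheses: change relative to the initial (pre-reranking) order of the candidate documents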
Metric | gooaq-dev | NanoMSMARCO | NanoNFCorpus | NanoNQ |
---|---|---|---|---|
map | 0.7386 (+0.0063) | 0.5463 (+0.0567) | 0.3300 (+0.0595) | 0.6707 (+0.2500) |
mrr@10 | 0.7360 (+0.0068) | 0.5401 (+0.0626) | 0.5409 (+0.0410) | 0.6737 (+0.2471) |
ndcg@10 | 0.7880 (+0.0064) | 0.6203 (+0.0799) | 0.3660 (+0.0410) | 0.7246 (+0.2240) |
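The three metrics can be illustrated for a single query. The sketch takes binary relevance labels (1 = relevant) listed in the order the model ranks the candidates. It uses the standard textbook definitions and is not the evaluator's code. It assumes every relevant document appears in the candidate list. The evaluator averages such per-query values over all queries, and its handling of edge cases may differ.

```python
import math
from typing import List


def average_precision(rels: List[int]) -> float:
    """Mean of precision@k over the ranks k that hold a relevant document."""
    hits, precisions = 0, []
    for k, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mrr_at_k(rels: List[int], k: int = 10) -> float:
    """Reciprocal rank of the first relevant document within the top k (0 if none)."""
    for rank, rel in enumerate(rels[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(rels: List[int], k: int = 10) -> float:
    """DCG of the top k divided by the DCG of the ideal (relevance-sorted) ordering."""
    def dcg(values: List[int]) -> float:
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(values[:k]))

    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0


# Relevant documents ranked 2nd and 4th by the model.
ranking = [0, 1, 0, 1]
print(average_precision(ranking), mrr_at_k(ranking), ndcg_at_k(ranking))
```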
Cross Encoder Nano BEIR
- Dataset: `NanoBEIR_mean`
- Evaluated with `CENanoBEIREvaluator`
Metric | Value |
---|---|
map | 0.5157 (+0.1221) |
mrr@10 | 0.5849 (+0.1169) |
ndcg@10 | 0.5703 (+0.1149) |
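`NanoBEIR_mean` appears to be the unweighted mean of the NanoMSMARCO, NanoNFCorpus and NanoNQ scores in the reranking table above. The check below recomputes it from those rounded values and matches the reported figures to within rounding. It is a consistency check on the published numbers, not the evaluator's code.

```python
from statistics import mean

# (score, change) per dataset, copied from the Cross Encoder Reranking table.
per_dataset = {
    "map": {"NanoMSMARCO": (0.5463, 0.0567), "NanoNFCorpus": (0.3300, 0.0595), "NanoNQ": (0.6707, 0.2500)},
    "mrr@10": {"NanoMSMARCO": (0.5401, 0.0626), "NanoNFCorpus": (0.5409, 0.0410), "NanoNQ": (0.6737, 0.2471)},
    "ndcg@10": {"NanoMSMARCO": (0.6203, 0.0799), "NanoNFCorpus": (0.3660, 0.0410), "NanoNQ": (0.7246, 0.2240)},
}


def nanobeir_mean(rows):
    """Unweighted mean of the per-dataset scores and of their changes."""
    return mean(v for v, _ in rows.values()), mean(d for _, d in rows.values())


for metric, rows in per_dataset.items():
    value, change = nanobeir_mean(rows)
    # Inputs are already rounded to 4 decimals, so the last digit may differ slightly.
    print(f"{metric}: {value:.4f} ({change:+.4f})")
```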
Training Details
Training Dataset
Unnamed Dataset
- Size: 580,740 training samples
- Columns: `query`, `response`, and `label`
- Approximate statistics based on the first 1000 samples:
Statistic | query | response | label |
---|---|---|---|
type | string | string | int |
details | min: 17 characters, mean: 42.5 characters, max: 91 characters | min: 51 characters, mean: 253.83 characters, max: 385 characters | 1: 100.00% |
- Samples:
query | response | label |
---|---|---|
what is the difference between a certificate and associate's degree? | Certificate degrees are extremely focused in their objective(s) and are related to a specific job or career niche. ... Certificates are often obtained as an add-on to an associate degree. Associate degree programs require two years of full-time classroom attendance in order to complete a degree. | 1 |
what is the difference between 5star and inverter ac? | An inverter AC works on variable speed compressor whereas a 5-star rated non-inverter AC have single speed compressor. It changes its speed as per the heat load and number of people. The need of Stabilizer: A stabilizer is installed with the AC to maintain an optimum voltage range during the power fluctuations. | 1 |
what is the difference between gas and electric cars? | A gas-powered car has a fuel tank, which supplies gasoline to the engine. The engine then turns a transmission, which turns the wheels. Move your mouse over the parts for a 3-D view. An electric car, on the other hand, has a set of batteries that provides electricity to an electric motor. | 1 |
- Loss: `BinaryCrossEntropyLoss` with these parameters:
```json
{
    "activation_fct": "torch.nn.modules.linear.Identity",
    "pos_weight": 5
}
```
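With an identity activation, the loss is computed directly on the model's raw output (a logit). The sketch below assumes `BinaryCrossEntropyLoss` applies PyTorch's `BCEWithLogitsLoss` after that activation. Under that assumption, `pos_weight` multiplies the loss term of positive (label 1) pairs, so one positive weighs as much as five negatives. The pure-Python function writes out that formula for illustration. It is not the library's code.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_bce_with_logits(logits: Sequence[float], labels: Sequence[float], pos_weight: float = 5.0) -> float:
    """Mean of -[pos_weight * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]."""
    total = 0.0
    for x, y in zip(logits, labels):
        # log(sigmoid(x)) = -softplus(-x) and log(1 - sigmoid(x)) = -softplus(x)
        total += pos_weight * y * softplus(-x) + (1.0 - y) * softplus(x)
    return total / len(logits)


# A logit of 0 is a 50/50 guess: the positive costs 5 * ln 2, the negative costs ln 2.
print(weighted_bce_with_logits([0.0, 0.0], [1.0, 0.0]))
```

Setting `pos_weight` near the ratio of negatives to positives is a common way to balance the two classes. The card does not state why 5 was chosen.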
Evaluation Dataset
gooaq
- Dataset: gooaq at b089f72
- Size: 3,012,496 evaluation samples
- Columns: `query`, `response`, and `label`
- Approximate statistics based on the first 1000 samples:
Statistic | query | response | label |
---|---|---|---|
type | string | string | int |
details | min: 18 characters, mean: 43.05 characters, max: 88 characters | min: 51 characters, mean: 252.39 characters, max: 386 characters | 1: 100.00% |
- Samples:
query | response | label |
---|---|---|
should you take ibuprofen with high blood pressure? | In general, people with high blood pressure should use acetaminophen or possibly aspirin for over-the-counter pain relief. Unless your health care provider has said it's OK, you should not use ibuprofen, ketoprofen, or naproxen sodium. If aspirin or acetaminophen doesn't help with your pain, call your doctor. | 1 |
how old do you have to be to work in sc? | The general minimum age of employment for South Carolina youth is 14, although the state allows younger children who are performers to work in show business. If their families are agricultural workers, children younger than age 14 may also participate in farm labor. | 1 |
how to write a topic proposal for a research paper? | ['Write down the main topic of your paper. ... ', 'Write two or three short sentences under the main topic that explain why you chose that topic. ... ', 'Write a thesis sentence that states the angle and purpose of your research paper. ... ', 'List the items you will cover in the body of the paper that support your thesis statement.'] | 1 |
- Loss: `BinaryCrossEntropyLoss` with these parameters:
```json
{
    "activation_fct": "torch.nn.modules.linear.Identity",
    "pos_weight": 5
}
```
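The dataset statistics in both sections count characters, not tokens, over the first 1000 rows. The sketch below shows how such a summary could be produced. `describe_columns` is a hypothetical helper, and the two example rows are shortened versions of the samples above, not rows loaded from the dataset.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Tuple


def describe_columns(rows: List[Tuple[str, str, int]], limit: int = 1000) -> Dict[str, object]:
    """Hypothetical helper: character-length stats for text columns, label shares for the label column."""
    rows = rows[:limit]  # the card reports statistics over the first 1000 samples
    stats: Dict[str, object] = {}
    for col, name in enumerate(["query", "response"]):
        lengths = [len(row[col]) for row in rows]
        stats[name] = {"min": min(lengths), "mean": round(mean(lengths), 2), "max": max(lengths)}
    counts = Counter(row[2] for row in rows)
    stats["label"] = {label: f"{100 * n / len(rows):.2f}%" for label, n in counts.items()}
    return stats


# Shortened example rows, for illustration only.
rows = [
    ("is jove and zeus the same?", "Jupiter, or Jove, in Roman mythology is the king of the gods.", 1),
    ("how much does aaf pay players?", "Each player signed a non-guaranteed three-year contract.", 1),
]
print(describe_columns(rows))
```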
Training Hyperparameters
Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `seed`: 12
- `bf16`: True
- `dataloader_num_workers`: 4
- `load_best_model_at_end`: True
- `batch_sampler`: no_duplicates
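These settings and the 580,740 training samples fix the length of the run. The back-of-the-envelope sketch below assumes a single device and no gradient accumulation. It also assumes warmup steps are rounded up, as in Hugging Face `TrainingArguments`. One epoch is then ceil(580,740 / 64) = 9,075 steps, which matches the training logs (step 9000 is logged at epoch 0.9917). A 10% warmup ratio gives 908 steps of linear warmup, followed by a linear decay to zero (`lr_scheduler_type: linear`).

```python
import math


def linear_schedule_lr(step: int, total_steps: int, warmup_steps: int, peak_lr: float) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


num_samples = 580_740   # training samples (Training Dataset section)
batch_size = 64         # per_device_train_batch_size, assuming a single device
warmup_ratio = 0.1
peak_lr = 2e-5          # learning_rate

# The last partial batch is kept (dataloader_drop_last: False), hence the ceiling.
steps_per_epoch = math.ceil(num_samples / batch_size)
warmup_steps = math.ceil(steps_per_epoch * warmup_ratio)  # assumed round-up convention

print(steps_per_epoch, warmup_steps)
print(round(9000 / steps_per_epoch, 4))  # epoch fraction at step 9000
print(linear_schedule_lr(4000, steps_per_epoch, warmup_steps, peak_lr))
```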
All Hyperparameters
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 12
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 4
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
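`batch_sampler: no_duplicates` requests batches in which no text value appears twice. The sketch below illustrates the idea with a greedy, multi-pass grouping over the text columns only. The label column is ignored, since every positive shares label 1. `no_duplicate_batches` is a hypothetical, simplified stand-in, and the library's own sampler differs in details such as shuffling.

```python
from typing import List, Sequence


def no_duplicate_batches(samples: Sequence[Sequence[str]], batch_size: int) -> List[List[int]]:
    """Greedily group sample indices so that no text repeats within a batch (hypothetical helper)."""
    remaining = list(range(len(samples)))
    batches: List[List[int]] = []
    while remaining:
        batch: List[int] = []
        seen: set = set()
        deferred: List[int] = []
        for idx in remaining:
            texts = set(samples[idx])  # text columns only; label columns are left out
            if len(batch) < batch_size and not (texts & seen):
                batch.append(idx)
                seen |= texts
            else:
                deferred.append(idx)  # retry this sample in a later batch
        batches.append(batch)
        remaining = deferred
    return batches


# (query, response) pairs; "q1" and "a1" each occur twice.
samples = [("q1", "a1"), ("q1", "a2"), ("q2", "a3"), ("q3", "a1")]
print(no_duplicate_batches(samples, batch_size=2))
```

The first sample of every pass always fits into the empty batch, so each pass makes progress and the loop terminates.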
Training Logs
Epoch | Step | Training Loss | Validation Loss | gooaq-dev_ndcg@10 | NanoMSMARCO_ndcg@10 | NanoNFCorpus_ndcg@10 | NanoNQ_ndcg@10 | NanoBEIR_mean_ndcg@10 |
---|---|---|---|---|---|---|---|---|
-1 | -1 | - | - | 0.1879 (-0.5937) | 0.0748 (-0.4656) | 0.2012 (-0.1238) | 0.0414 (-0.4592) | 0.1058 (-0.3496) |
0.0001 | 1 | 1.1971 | - | - | - | - | - | - |
0.0220 | 200 | 1.1557 | - | - | - | - | - | - |
0.0441 | 400 | 0.9119 | - | - | - | - | - | - |
0.0661 | 600 | 0.5124 | - | - | - | - | - | - |
0.0882 | 800 | 0.4225 | - | - | - | - | - | - |
0.1102 | 1000 | 0.3876 | 1.3811 | 0.7192 (-0.0624) | 0.5171 (-0.0233) | 0.3438 (+0.0187) | 0.5647 (+0.0641) | 0.4752 (+0.0198) |
0.1322 | 1200 | 0.3563 | - | - | - | - | - | - |
0.1543 | 1400 | 0.3155 | - | - | - | - | - | - |
0.1763 | 1600 | 0.3181 | - | - | - | - | - | - |
0.1983 | 1800 | 0.289 | - | - | - | - | - | - |
0.2204 | 2000 | 0.283 | 0.6710 | 0.7528 (-0.0289) | 0.5559 (+0.0155) | 0.3445 (+0.0194) | 0.6592 (+0.1585) | 0.5198 (+0.0645) |
0.2424 | 2200 | 0.2745 | - | - | - | - | - | - |
0.2645 | 2400 | 0.2575 | - | - | - | - | - | - |
0.2865 | 2600 | 0.2762 | - | - | - | - | - | - |
0.3085 | 2800 | 0.2489 | - | - | - | - | - | - |
0.3306 | 3000 | 0.2259 | 0.7575 | 0.7696 (-0.0121) | 0.4982 (-0.0422) | 0.3555 (+0.0305) | 0.6483 (+0.1476) | 0.5007 (+0.0453) |
0.3526 | 3200 | 0.2576 | - | - | - | - | - | - |
0.3747 | 3400 | 0.2384 | - | - | - | - | - | - |
0.3967 | 3600 | 0.2431 | - | - | - | - | - | - |
0.4187 | 3800 | 0.206 | - | - | - | - | - | - |
0.4408 | 4000 | 0.2381 | 0.9594 | 0.7774 (-0.0042) | 0.5649 (+0.0245) | 0.3666 (+0.0416) | 0.6842 (+0.1836) | 0.5386 (+0.0832) |
0.4628 | 4200 | 0.2196 | - | - | - | - | - | - |
0.4848 | 4400 | 0.2153 | - | - | - | - | - | - |
0.5069 | 4600 | 0.217 | - | - | - | - | - | - |
0.5289 | 4800 | 0.1982 | - | - | - | - | - | - |
0.5510 | 5000 | 0.2172 | 0.6249 | 0.7864 (+0.0047) | 0.6029 (+0.0625) | 0.3833 (+0.0583) | 0.7029 (+0.2022) | 0.5630 (+0.1077) |
0.5730 | 5200 | 0.2145 | - | - | - | - | - | - |
0.5950 | 5400 | 0.213 | - | - | - | - | - | - |
0.6171 | 5600 | 0.2117 | - | - | - | - | - | - |
0.6391 | 5800 | 0.2102 | - | - | - | - | - | - |
0.6612 | 6000 | 0.2125 | 0.7420 | 0.7834 (+0.0017) | 0.5907 (+0.0503) | 0.3771 (+0.0521) | 0.7176 (+0.2169) | 0.5618 (+0.1064) |
0.6832 | 6200 | 0.1995 | - | - | - | - | - | - |
0.7052 | 6400 | 0.1978 | - | - | - | - | - | - |
0.7273 | 6600 | 0.1857 | - | - | - | - | - | - |
0.7493 | 6800 | 0.1811 | - | - | - | - | - | - |
0.7713 | 7000 | 0.2055 | 1.1528 | 0.7827 (+0.0011) | 0.6152 (+0.0748) | 0.3730 (+0.0480) | 0.7190 (+0.2184) | 0.5691 (+0.1137) |
0.7934 | 7200 | 0.1855 | - | - | - | - | - | - |
0.8154 | 7400 | 0.1829 | - | - | - | - | - | - |
0.8375 | 7600 | 0.1901 | - | - | - | - | - | - |
0.8595 | 7800 | 0.1862 | - | - | - | - | - | - |
0.8815 | 8000 | 0.1858 | 0.6424 | 0.7880 (+0.0064) | 0.6203 (+0.0799) | 0.3660 (+0.0410) | 0.7246 (+0.2240) | 0.5703 (+0.1149) |
0.9036 | 8200 | 0.1545 | - | - | - | - | - | - |
0.9256 | 8400 | 0.1729 | - | - | - | - | - | - |
0.9477 | 8600 | 0.1657 | - | - | - | - | - | - |
0.9697 | 8800 | 0.1698 | - | - | - | - | - | - |
0.9917 | 9000 | 0.1658 | 0.6904 | 0.7898 (+0.0081) | 0.6011 (+0.0606) | 0.3612 (+0.0361) | 0.7165 (+0.2159) | 0.5596 (+0.1042) |
-1 | -1 | - | - | 0.7880 (+0.0064) | 0.6203 (+0.0799) | 0.3660 (+0.0410) | 0.7246 (+0.2240) | 0.5703 (+0.1149) |
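- The saved checkpoint is the row at step 8000 (shown in bold in the original card); its scores match the final row. Rows with epoch and step -1 are evaluations run outside the training loop: before training (first row) and after loading the saved checkpoint (last row).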
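The evaluation rows of the log can also be scanned programmatically. The sketch below copies the logged validation loss and `NanoBEIR_mean_ndcg@10` values. It finds the highest mean NDCG@10 at step 8000, matching the final row, while the lowest validation loss occurs at step 5000. This pattern is consistent with selecting the checkpoint by NDCG rather than by loss, although the card does not list the selection metric.

```python
# (step, validation loss, NanoBEIR_mean_ndcg@10) copied from the evaluation rows of the training log.
eval_rows = [
    (1000, 1.3811, 0.4752),
    (2000, 0.6710, 0.5198),
    (3000, 0.7575, 0.5007),
    (4000, 0.9594, 0.5386),
    (5000, 0.6249, 0.5630),
    (6000, 0.7420, 0.5618),
    (7000, 1.1528, 0.5691),
    (8000, 0.6424, 0.5703),
    (9000, 0.6904, 0.5596),
]

best_by_ndcg = max(eval_rows, key=lambda row: row[2])  # highest mean NDCG@10
best_by_loss = min(eval_rows, key=lambda row: row[1])  # lowest validation loss
print("highest NanoBEIR mean NDCG@10:", best_by_ndcg)
print("lowest validation loss:", best_by_loss)
```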
Framework Versions
- Python: 3.11.10
- Sentence Transformers: 3.5.0.dev0
- Transformers: 4.49.0.dev0
- PyTorch: 2.6.0.dev20241112+cu121
- Accelerate: 1.2.0
- Datasets: 3.2.0
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
Model tree for tomaarsen/reranker-ModernBERT-base-gooaq-bce
Base model
answerdotai/ModernBERT-base