CrossEncoder based on microsoft/MiniLM-L12-H384-uncased

This is a Cross Encoder model fine-tuned from microsoft/MiniLM-L12-H384-uncased on the ms_marco dataset using the sentence-transformers library. It computes a relevance score for each pair of texts, which can be used for text reranking and semantic search.

Model Details

Model Description

Model Sources

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listnet")
# Get scores for pairs of texts
pairs = [
    ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
    ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
    ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (3,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'How many calories in an egg',
    [
        'There are on average between 55 and 80 calories in an egg depending on its size.',
        'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
        'Most of the calories in an egg come from the yellow yolk in the center.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
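The list returned by rank holds a corpus_id and a score per text, as the comment above shows. A minimal sketch of mapping that output back to the original texts and keeping the best k follows. The helper name top_k_documents is hypothetical, and the scores in example_ranks are made up for illustration; they are not outputs of this model.

```python
def top_k_documents(ranks, documents, k=2):
    """Return the k highest-scoring (document, score) pairs.

    `ranks` is assumed to be a list of dicts with 'corpus_id' and 'score'
    keys, matching the shape shown in the usage example.
    """
    ordered = sorted(ranks, key=lambda r: r["score"], reverse=True)
    return [(documents[r["corpus_id"]], r["score"]) for r in ordered[:k]]


documents = [
    "There are on average between 55 and 80 calories in an egg depending on its size.",
    "Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.",
    "Most of the calories in an egg come from the yellow yolk in the center.",
]

# Made-up scores, for illustration only.
example_ranks = [
    {"corpus_id": 2, "score": 0.4},
    {"corpus_id": 0, "score": 0.9},
    {"corpus_id": 1, "score": 0.1},
]

top = top_k_documents(example_ranks, documents, k=2)
for text, score in top:
    print(f"{score:.2f}  {text}")
```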

Evaluation

Metrics

Cross Encoder Reranking

Metric    NanoMSMARCO        NanoNFCorpus       NanoNQ
map       0.5020 (+0.0124)   0.3389 (+0.0684)   0.5833 (+0.1626)
mrr@10    0.4884 (+0.0109)   0.5581 (+0.0582)   0.5848 (+0.1581)
ndcg@10   0.5545 (+0.0141)   0.3595 (+0.0345)   0.6487 (+0.1481)
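For readers unfamiliar with the metric names, the sketch below shows one common way to compute mrr@10 and ndcg@10 for a single query. It assumes the input is the list of relevance labels in ranked order, that this list covers every candidate for the query, and that DCG uses linear gain. These are textbook definitions, not necessarily the exact code path of the evaluator that produced the table.

```python
import math


def mrr_at_k(relevance, k=10):
    """Reciprocal rank of the first relevant item within the top k."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def dcg_at_k(relevance, k=10):
    """Discounted cumulative gain with linear gain and a log2 discount."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevance[:k], start=1)
    )


def ndcg_at_k(relevance, k=10):
    """DCG normalised by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


# Labels of the documents in the order the model ranked them (illustrative).
ranked_labels = [0, 1, 0, 0, 1]
mrr = mrr_at_k(ranked_labels)    # first relevant item is at rank 2
ndcg = ndcg_at_k(ranked_labels)
print(mrr, round(ndcg, 4))
```

The reported figures would then be averages of these per-query values over each dataset.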

Cross Encoder Nano BEIR

Metric Value
map 0.4747 (+0.0812)
mrr@10 0.5437 (+0.0757)
ndcg@10 0.5209 (+0.0655)
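The Nano BEIR values appear to be the unweighted mean of the three per-dataset scores above; the training log column name NanoBEIR_mean_ndcg@10 points the same way. A quick arithmetic check under that assumption:

```python
# Per-dataset scores copied from the Cross Encoder Reranking table,
# in the order NanoMSMARCO, NanoNFCorpus, NanoNQ.
per_dataset = {
    "map": [0.5020, 0.3389, 0.5833],
    "mrr@10": [0.4884, 0.5581, 0.5848],
    "ndcg@10": [0.5545, 0.3595, 0.6487],
}

# Unweighted (macro) mean across the three datasets.
nano_beir = {name: sum(vals) / len(vals) for name, vals in per_dataset.items()}

for name, value in nano_beir.items():
    print(f"{name}: {value:.4f}")
```

Because the inputs are already rounded to four decimals, the recomputed mean can differ from the reported value in the last digit.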

Training Details

Training Dataset

ms_marco

  • Dataset: ms_marco at a47ee7a
  • Size: 82,326 training samples
  • Columns: query, docs, and labels
  • Approximate statistics based on the first 1000 samples:
    • query (string): min 11 characters, mean 33.24 characters, max 101 characters
    • docs (list): size 10 elements
    • labels (list): size 10 elements
  • Samples:
    query docs labels
    what are fiber lasers ['From Wikipedia, the free encyclopedia. A fiber laser or fibre laser is a laser in which the active gain medium is an optical fiber doped with rare-earth elements such as erbium, ytterbium, neodymium, dysprosium, praseodymium, and thulium. They are related to doped fiber amplifiers, which provide light amplification without lasing. Many high-power fiber lasers are based on double-clad fiber. The gain medium forms the core of the fiber, which is surrounded by two layers of cladding. The lasing mode propagates in the core, while a multimode pump beam propagates in the inner cladding layer. The outer cladding keeps this pump light confined.', 'The fiber laser is a variation on the standard solid-state laser, with the medium being a clad fiber rather than a rod, a slab, or a disk. Laser light is emitted by a dopant in the central core of the fiber, and the core structure can range from simple to fairly complex. The doped fiber has a cavity mirror on each end; in practice, these are fiber ... [1, 0, 0, 0, 0, ...]
    fast can boar run ['A wild boar can run at speeds of 30-35mph which is about 48.3-56.3km/h. As for weight, a wild boar weighs around 52-91kg which is about 115-200 pounds. Wild boars are native to Europe, Africa, and some parts of Asia. The body of a wild boar is around 0.8-2 meters long which is about 2.6-6.6 feet long.', 'Wild Turkeys can run at speeds up to 25 mph, and they can fly up to 55 mph. However, if being hunted by someone for the Thanksgiving or Christmas table-Who know how fast the … y will run or fly!', 'A wild hog can reach speeds of up to 35 mph when running at full speed. A hippo can run over 30 mph! report this answer. Updated on Wednesday, February 01 2012 at 03:09PM EST. Source: www.texasboars.com/...', "Les. Brown bears-are extremely fast, capable of running in short bursts as high as of 40 mph (64 km/h). Polar bears-have been clocked at a top speed of 35 mph (56 km/h), along a a road in Churchill, Canada. Grizzly bears-can reach top speeds of up to 30 mph (48km/h), but they can't m... [1, 0, 0, 0, 0, ...]
    what plant would grow in shade ['Hostas are among the showiest and easy-to-grow perennial plants that grow in shade. They also offer the most variety of any of the multiple shade plants. Choose from miniatures that stay only a couple of inches wide or giants that sprawl 6 feet across or more. Japanese forestgrass (Hakonechloa macra) is a wonderful grass for plants that grow in shade. It offers a lovely waterfall-like habit and variegated varieties have bight gold, yellow, or white in the foliage.', 'Lilyturf (Liriope) is an easy-to-grow favorite shade plant. Loved for its grassy foliage and spikes of blue or white flowers in late summer, as well as its resistance to deer and rabbits, lilyturf is practically a plant-it-and-forget garden resident. It grows best in Zones 5-10 and grows a foot tall. Japanese forestgrass (Hakonechloa macra) is a wonderful grass for plants that grow in shade. It offers a lovely waterfall-like habit and variegated varieties have bight gold, yellow, or white in the foliage.', "Gardening in ... [1, 1, 0, 0, 0, ...]
  • Loss: ListNetLoss with these parameters:
    {
        "eps": 1e-10,
        "pad_value": -1
    }
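To make the loss and its two parameters concrete, here is a minimal pure-Python sketch of the top-one ListNet objective from Cao et al. (2007): a cross-entropy between the softmax of the labels and the softmax of the predicted scores. How eps and pad_value are used here is an assumption (eps guards the logarithm, and positions labelled pad_value are ignored); the library implementation may differ in details.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(scores, labels, eps=1e-10, pad_value=-1):
    """Top-one ListNet loss for a single query (illustrative sketch).

    Positions whose label equals pad_value are treated as padding and skipped.
    """
    kept = [(s, l) for s, l in zip(scores, labels) if l != pad_value]
    if not kept:
        return 0.0
    kept_scores, kept_labels = zip(*kept)
    predicted = softmax(list(kept_scores))
    target = softmax([float(l) for l in kept_labels])
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


# Ten documents scored identically: the loss equals ln(10), about 2.3026.
uniform = listnet_loss([0.0] * 10, [1] + [0] * 9)
# Scoring the relevant document highest lowers the loss.
good = listnet_loss([5.0, 0.0, 0.0], [1, 0, 0])
bad = listnet_loss([0.0, 5.0, 0.0], [1, 0, 0])
print(round(uniform, 4), good < bad)
```

With ten documents per query, the uniform-score value of ln(10) gives a rough reference point when reading the loss values in the training logs.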
    

Evaluation Dataset

ms_marco

  • Dataset: ms_marco at a47ee7a
  • Size: 82,326 evaluation samples
  • Columns: query, docs, and labels
  • Approximate statistics based on the first 1000 samples:
    • query (string): min 11 characters, mean 33.6 characters, max 97 characters
    • docs (list): size 10 elements
    • labels (list): size 10 elements
  • Samples:
    query docs labels
    can blue cheese cause mold allergic reaction ['Mold Allergy. The blue spots found in blue cheese are mold. If you’ve been diagnosed with a mold allergy, eating blue cheese can trigger common mold allergic reaction symptoms. Mold allergies commonly arise from airborne spores during the spring, summer and fall months. Inhaled mold spores cause inflammation in the eyes, throat and sinuses. If eating blue cheese causes inflammation to develop anywhere in your body, make an appointment with your doctor because you may have an allergy to one or more of its ingredients. Blue cheese contains two highly allergenic substances: milk and mold. Most symptoms caused by an allergic reaction are the result of inflammation in soft tissue in different parts of the body. Your doctor may recommend allergy testing to determine the cause of the inflammation', 'Blue cheese allergy is a condition that has puzzled food experts quite a bit. The unique gourmet cheese with a mottled appearance can cause your body to swell up making you feel extremely uncomf... [1, 0, 0, 0, 0, ...]
    what does it cost for a facebook ad ['Contributed by Jason Alleger. The cost of Facebook ads depends on a few factors, but generally ranges from $.05 – $5 per click. Facebook increases the cost of ads based on (a) targeting, (b) bids and (c) engagement. The more targeted your ads are, the more expensive they become. If you were to target ads to all Facebook users (all 1.06 billion), then you would pay just pennies. Sponsored Stories: 400 clicks to Facebook page – $200 ($.50 per click). Promoted Posts: 20,000 views – $100 ($5 per 1,000 views). It takes a lot of work to keep the cost-per-click down, as the advertiser needs to constantly be updating their ads to keep the cost low.', 'Can anyone who has advertised on facebook describe how much it cost you overall? Also, is there anyone who can mention if facebook advertising (and the specific type of facebook ad-social ad/etc, age group) was positive or negative for them in their ventures? Best Answer: Setting up an ad account and advertising on Facebook is easy. You can do ... [1, 0, 0, 0, 0, ...]
    how can ants get in dishwasher ["Full Answer. Ants usually find their way into a dishwasher through the dryer vents or the drain. Although most people's first reaction is to turn to pesticides to solve the problem, the chemicals contained in pesticides can be harmful for children and pets.", "No ants in the house. I've used traps on both sides of dishwasher and under the sink where the drain and supply holes are. We have put vinegar in the dishwasher drain & have let it sit there for three days and the ants still come back. They are only in side the dishwasher never on the counter ,floor, sink.", '1 Then leave them alone for a number of weeks. 2 Exterior: Sprinkle granular ant bait around ant hills, along ant trails; again, anywhere they appear. 3 Pets will not be injured by these baits. 4 The ants quickly take the bait below ground to the queen, destroying the colony.', "A: Empty the dishwasher completely, and pour 1 gallon of vinegar down the dishwasher's drain. Leave this for a few minutes so any ants appearin... [1, 0, 0, 0, 0, ...]
  • Loss: ListNetLoss with these parameters:
    {
        "eps": 1e-10,
        "pad_value": -1
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • seed: 12
  • bf16: True
  • load_best_model_at_end: True

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 12
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss NanoMSMARCO_ndcg@10 NanoNFCorpus_ndcg@10 NanoNQ_ndcg@10 NanoBEIR_mean_ndcg@10
-1 -1 - - 0.0444 (-0.4960) 0.2663 (-0.0587) 0.0478 (-0.4528) 0.1195 (-0.3359)
0.0001 1 2.0806 - - - - -
0.0230 200 2.0875 - - - - -
0.0459 400 2.097 - - - - -
0.0689 600 2.0844 - - - - -
0.0918 800 2.0771 - - - - -
0.1148 1000 2.0699 - - - - -
0.1377 1200 2.0864 - - - - -
0.1607 1400 2.0676 - - - - -
0.1836 1600 2.0772 2.0761 0.5280 (-0.0125) 0.3529 (+0.0279) 0.5989 (+0.0983) 0.4933 (+0.0379)
0.2066 1800 2.0822 - - - - -
0.2295 2000 2.0777 - - - - -
0.2525 2200 2.075 - - - - -
0.2755 2400 2.0717 - - - - -
0.2984 2600 2.0854 - - - - -
0.3214 2800 2.0765 - - - - -
0.3443 3000 2.0678 - - - - -
0.3673 3200 2.076 2.0741 0.5368 (-0.0037) 0.3781 (+0.0531) 0.5847 (+0.0841) 0.4999 (+0.0445)
0.3902 3400 2.0749 - - - - -
0.4132 3600 2.0735 - - - - -
0.4361 3800 2.0636 - - - - -
0.4591 4000 2.0749 - - - - -
0.4820 4200 2.0745 - - - - -
0.5050 4400 2.0716 - - - - -
0.5279 4600 2.0741 - - - - -
0.5509 4800 2.0724 2.0735 0.5633 (+0.0229) 0.3703 (+0.0453) 0.6102 (+0.1095) 0.5146 (+0.0592)
0.5739 5000 2.0788 - - - - -
0.5968 5200 2.0711 - - - - -
0.6198 5400 2.0708 - - - - -
0.6427 5600 2.0645 - - - - -
0.6657 5800 2.0684 - - - - -
0.6886 6000 2.0731 - - - - -
0.7116 6200 2.0745 - - - - -
0.7345 6400 2.067 2.0722 0.5510 (+0.0105) 0.3441 (+0.0190) 0.5927 (+0.0921) 0.4959 (+0.0405)
0.7575 6600 2.0657 - - - - -
0.7804 6800 2.0798 - - - - -
0.8034 7000 2.0693 - - - - -
0.8264 7200 2.074 - - - - -
0.8493 7400 2.0744 - - - - -
0.8723 7600 2.0688 - - - - -
0.8952 7800 2.0515 - - - - -
0.9182 8000 2.0765 2.0723 0.5545 (+0.0141) 0.3595 (+0.0345) 0.6487 (+0.1481) 0.5209 (+0.0655)
0.9411 8200 2.0777 - - - - -
0.9641 8400 2.073 - - - - -
0.9870 8600 2.0726 - - - - -
-1 -1 - - 0.5545 (+0.0141) 0.3595 (+0.0345) 0.6487 (+0.1481) 0.5209 (+0.0655)
  • The saved checkpoint is the row at step 8000 (shown in bold in the original table); its scores match the final evaluation row.

Environmental Impact

Carbon emissions were measured using CodeCarbon.

  • Energy Consumed: 0.236 kWh
  • Carbon Emitted: 0.092 kg of CO2
  • Hours Used: 0.862 hours
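Two derived quantities follow directly from these three figures. The short calculation below is plain arithmetic on the reported numbers; the variable names are illustrative, and the results are only as precise as the rounded inputs.

```python
energy_kwh = 0.236   # Energy Consumed
co2_kg = 0.092       # Carbon Emitted
hours = 0.862        # Hours Used

# Average power draw over the run, in watts.
avg_power_w = energy_kwh / hours * 1000

# Implied carbon intensity of the electricity, in kg CO2 per kWh.
carbon_intensity = co2_kg / energy_kwh

print(f"average power: {avg_power_w:.0f} W")
print(f"carbon intensity: {carbon_intensity:.2f} kg CO2/kWh")
```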

Training Hardware

  • On Cloud: No
  • GPU Model: 1 x NVIDIA GeForce RTX 3090
  • CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
  • RAM Size: 31.78 GB

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 3.5.0.dev0
  • Transformers: 4.48.3
  • PyTorch: 2.5.0+cu121
  • Accelerate: 1.3.0
  • Datasets: 2.20.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ListNetLoss

@inproceedings{cao2007learning,
    title={Learning to rank: from pairwise approach to listwise approach},
    author={Cao, Zhe and Qin, Tao and Liu, Tie-Yan and Tsai, Ming-Feng and Li, Hang},
    booktitle={Proceedings of the 24th international conference on Machine learning},
    pages={129--136},
    year={2007}
}