CrossEncoder based on distilbert/distilroberta-base
This is a Cross Encoder model finetuned from distilbert/distilroberta-base on the quora-duplicates dataset using the sentence-transformers library. It computes scores for pairs of texts, which can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Cross Encoder
- Base model: distilbert/distilroberta-base
- Maximum Sequence Length: 514 tokens
- Training Dataset: quora-duplicates
- Language: en
Model Sources
- Documentation: Sentence Transformers Documentation
- Documentation: Cross Encoder Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Cross Encoders on Hugging Face
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-distilroberta-base-quora-duplicates")

# Get scores for pairs...
pairs = [
    ['What is the step by step guide to invest in share market in india?', 'What is the step by step guide to invest in share market?'],
    ['What is the story of Kohinoor (Koh-i-Noor) Diamond?', 'What would happen if the Indian government stole the Kohinoor (Koh-i-Noor) diamond back?'],
    ['How can I increase the speed of my internet connection while using a VPN?', 'How can Internet speed be increased by hacking through DNS?'],
    ['Why am I mentally very lonely? How can I solve it?', 'Find the remainder when [math]23^{24}[/math] is divided by 24,23?'],
    ['Which one dissolve in water quikly sugar, salt, methane and carbon di oxide?', 'Which fish would survive in salt water?'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# ... or rank different texts based on similarity to a single text
ranks = model.rank(
    'What is the step by step guide to invest in share market in india?',
    [
        'What is the step by step guide to invest in share market?',
        'What would happen if the Indian government stole the Kohinoor (Koh-i-Noor) diamond back?',
        'How can Internet speed be increased by hacking through DNS?',
        'Find the remainder when [math]23^{24}[/math] is divided by 24,23?',
        'Which fish would survive in salt water?',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
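The ranking result shown in the last comment is a list of dictionaries, each pairing a document's position in the input list with its score, ordered from highest to lowest score. As a rough illustration of that output shape only (not the library's implementation), the sketch below builds the same structure from precomputed scores. The helper name `rank_from_scores` and the example scores are made up for illustration and are not model outputs.

```python
def rank_from_scores(scores, top_k=None):
    """Hypothetical helper: order document indices by descending score."""
    ranked = sorted(
        ({"corpus_id": i, "score": float(s)} for i, s in enumerate(scores)),
        key=lambda item: item["score"],
        reverse=True,
    )
    # Optionally keep only the best top_k entries.
    return ranked if top_k is None else ranked[:top_k]


# Made-up scores for five candidate documents (not real model outputs).
example_scores = [0.91, 0.02, 0.10, 0.01, 0.03]
ranking = rank_from_scores(example_scores, top_k=3)
print(ranking)
```

Each `corpus_id` indexes back into the list of candidate texts, so the original strings can be recovered with a simple lookup.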
Evaluation
Metrics
Cross Encoder Classification
- Datasets: quora-duplicates-dev and quora-duplicates-test
- Evaluated with CEClassificationEvaluator
| Metric | quora-duplicates-dev | quora-duplicates-test |
|---|---|---|
| accuracy | 0.8938 | 0.8938 |
| accuracy_threshold | 0.5089 | 0.5091 |
| f1 | 0.8612 | 0.8612 |
| f1_threshold | 0.3856 | 0.3858 |
| precision | 0.8183 | 0.8183 |
| recall | 0.9089 | 0.9089 |
| average_precision | 0.9203 | 0.9203 |
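The `accuracy_threshold` and `f1_threshold` rows report the score cut-offs at which the evaluator obtained its best accuracy and F1. A minimal sketch of how such a cut-off could be applied follows, assuming the model's scores lie between 0 and 1 and that a score at or above the cut-off counts as a duplicate (the exact comparison used by the evaluator is an assumption here). The function names and example scores are hypothetical. The last lines check that the reported F1 is consistent with the reported precision and recall.

```python
# Dev-set thresholds copied from the table above.
ACCURACY_THRESHOLD = 0.5089
F1_THRESHOLD = 0.3856


def label_duplicates(scores, threshold):
    """Hypothetical helper: 1 = duplicate, 0 = not duplicate."""
    return [1 if score >= threshold else 0 for score in scores]


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Made-up scores, not real model outputs.
example_scores = [0.95, 0.45, 0.10]
print(label_duplicates(example_scores, ACCURACY_THRESHOLD))  # [1, 0, 0]
print(label_duplicates(example_scores, F1_THRESHOLD))        # [1, 1, 0]

# Consistency check against the table: precision 0.8183, recall 0.9089.
print(round(f1_from(0.8183, 0.9089), 4))  # 0.8612
```

The lower F1 threshold accepts more pairs as duplicates, which fits the table's pattern of recall (0.9089) exceeding precision (0.8183).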
Training Details
Training Dataset
quora-duplicates
- Dataset: quora-duplicates at 451a485
- Size: 404,290 training samples
- Columns: sentence1, sentence2, and label
- Approximate statistics based on the first 1000 samples:

| | sentence1 | sentence2 | label |
|---|---|---|---|
| type | string | string | int |
| details | min: 1 characters; mean: 59.15 characters; max: 354 characters | min: 6 characters; mean: 60.74 characters; max: 399 characters | 0: ~64.20%; 1: ~35.80% |

- Samples:

| sentence1 | sentence2 | label |
|---|---|---|
| What are the features of the Indian caste system? | What triggers you the most when you play video games? | 0 |
| What is the best place to learn Mandarin Chinese in Singapore? | What is the best place in Singapore for durian in December? | 0 |
| What will be Hillary Clinton's India policy if she wins the election? | How would the bilateral relationship between India and the USA be under Hillary Clinton's presidency? | 1 |

- Loss: BinaryCrossEntropyLoss
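For readers unfamiliar with the loss named above, the following is a minimal sketch of binary cross-entropy on a single raw score (logit) and a 0/1 label, written in plain Python. It is the textbook formula in a numerically stable form, not the library's code, and it assumes the model emits one logit per sentence pair that is squashed with a sigmoid. The function names are illustrative.

```python
import math


def binary_cross_entropy(logit, label):
    """Stable form of -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(logit)."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def naive_binary_cross_entropy(logit, label):
    """Direct textbook form, shown only for comparison."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


# An uninformed prediction (logit 0, probability 0.5) costs ln 2.
print(round(binary_cross_entropy(0.0, 1), 4))  # 0.6931

# A confident correct prediction costs far less than a confident wrong one.
print(binary_cross_entropy(4.0, 1) < binary_cross_entropy(-4.0, 1))  # True
```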
Evaluation Dataset
quora-duplicates
- Dataset: quora-duplicates at 451a485
- Size: 404,290 evaluation samples
- Columns: sentence1, sentence2, and label
- Approximate statistics based on the first 1000 samples:

| | sentence1 | sentence2 | label |
|---|---|---|---|
| type | string | string | int |
| details | min: 11 characters; mean: 57.9 characters; max: 244 characters | min: 12 characters; mean: 59.33 characters; max: 221 characters | 0: ~62.00%; 1: ~38.00% |

- Samples:

| sentence1 | sentence2 | label |
|---|---|---|
| What is the step by step guide to invest in share market in india? | What is the step by step guide to invest in share market? | 0 |
| What is the story of Kohinoor (Koh-i-Noor) Diamond? | What would happen if the Indian government stole the Kohinoor (Koh-i-Noor) diamond back? | 0 |
| How can I increase the speed of my internet connection while using a VPN? | How can Internet speed be increased by hacking through DNS? | 0 |

- Loss: BinaryCrossEntropyLoss
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- num_train_epochs: 1
- warmup_ratio: 0.1
- bf16: True
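To make `warmup_ratio: 0.1` concrete, here is a small sketch of a linear learning-rate schedule with warmup, using the `learning_rate` of 5e-05 and the `linear` scheduler type listed under All Hyperparameters. The total step count of 6,005 is an estimate inferred from the training log (step 6000 falls at epoch 0.9992), not a value stated in this card, and the function is an illustrative reimplementation rather than the trainer's own code.

```python
import math

LEARNING_RATE = 5e-05  # from the hyperparameter list
WARMUP_RATIO = 0.1     # from the hyperparameter list
TOTAL_STEPS = 6005     # assumption: estimated from the training log

# Number of steps spent ramping the learning rate up from zero.
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def learning_rate_at(step):
    """Linear ramp up to the peak, then linear decay to zero."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TOTAL_STEPS - warmup_steps)


for step in (0, 300, warmup_steps, 3000, TOTAL_STEPS):
    print(step, f"{learning_rate_at(step):.2e}")
```

Under these assumptions the learning rate peaks at the end of warmup and decays to zero by the final step.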
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
Training Logs
| Epoch | Step | Training Loss | Validation Loss | quora-duplicates-dev_average_precision | quora-duplicates-test_average_precision |
|---|---|---|---|---|---|
| -1 | -1 | - | - | 0.3711 | - |
| 0.0167 | 100 | 0.6574 | - | - | - |
| 0.0333 | 200 | 0.4804 | - | - | - |
| 0.0500 | 300 | 0.4406 | - | - | - |
| 0.0666 | 400 | 0.4208 | - | - | - |
| 0.0833 | 500 | 0.3929 | 0.3958 | 0.8210 | - |
| 0.0999 | 600 | 0.3986 | - | - | - |
| 0.1166 | 700 | 0.3743 | - | - | - |
| 0.1332 | 800 | 0.3938 | - | - | - |
| 0.1499 | 900 | 0.3602 | - | - | - |
| 0.1665 | 1000 | 0.3714 | 0.3437 | 0.8565 | - |
| 0.1832 | 1100 | 0.3486 | - | - | - |
| 0.1998 | 1200 | 0.3479 | - | - | - |
| 0.2165 | 1300 | 0.3417 | - | - | - |
| 0.2331 | 1400 | 0.3425 | - | - | - |
| 0.2498 | 1500 | 0.3353 | 0.3264 | 0.8742 | - |
| 0.2664 | 1600 | 0.3335 | - | - | - |
| 0.2831 | 1700 | 0.3274 | - | - | - |
| 0.2998 | 1800 | 0.3284 | - | - | - |
| 0.3164 | 1900 | 0.3118 | - | - | - |
| 0.3331 | 2000 | 0.3073 | 0.3282 | 0.8826 | - |
| 0.3497 | 2100 | 0.3233 | - | - | - |
| 0.3664 | 2200 | 0.3072 | - | - | - |
| 0.3830 | 2300 | 0.314 | - | - | - |
| 0.3997 | 2400 | 0.3065 | - | - | - |
| 0.4163 | 2500 | 0.3046 | 0.2877 | 0.8930 | - |
| 0.4330 | 2600 | 0.2857 | - | - | - |
| 0.4496 | 2700 | 0.285 | - | - | - |
| 0.4663 | 2800 | 0.2957 | - | - | - |
| 0.4829 | 2900 | 0.2965 | - | - | - |
| 0.4996 | 3000 | 0.2824 | 0.2842 | 0.8998 | - |
| 0.5162 | 3100 | 0.3019 | - | - | - |
| 0.5329 | 3200 | 0.2841 | - | - | - |
| 0.5495 | 3300 | 0.2981 | - | - | - |
| 0.5662 | 3400 | 0.2878 | - | - | - |
| 0.5828 | 3500 | 0.278 | 0.2803 | 0.9061 | - |
| 0.5995 | 3600 | 0.2841 | - | - | - |
| 0.6162 | 3700 | 0.2794 | - | - | - |
| 0.6328 | 3800 | 0.2808 | - | - | - |
| 0.6495 | 3900 | 0.27 | - | - | - |
| 0.6661 | 4000 | 0.2719 | 0.2697 | 0.9091 | - |
| 0.6828 | 4100 | 0.2792 | - | - | - |
| 0.6994 | 4200 | 0.2669 | - | - | - |
| 0.7161 | 4300 | 0.2696 | - | - | - |
| 0.7327 | 4400 | 0.2642 | - | - | - |
| 0.7494 | 4500 | 0.2684 | 0.2591 | 0.9140 | - |
| 0.7660 | 4600 | 0.2593 | - | - | - |
| 0.7827 | 4700 | 0.2756 | - | - | - |
| 0.7993 | 4800 | 0.2584 | - | - | - |
| 0.8160 | 4900 | 0.2525 | - | - | - |
| 0.8326 | 5000 | 0.267 | 0.2540 | 0.9168 | - |
| 0.8493 | 5100 | 0.2612 | - | - | - |
| 0.8659 | 5200 | 0.2607 | - | - | - |
| 0.8826 | 5300 | 0.2565 | - | - | - |
| 0.8993 | 5400 | 0.2432 | - | - | - |
| 0.9159 | 5500 | 0.2568 | 0.2489 | 0.9198 | - |
| 0.9326 | 5600 | 0.2572 | - | - | - |
| 0.9492 | 5700 | 0.2658 | - | - | - |
| 0.9659 | 5800 | 0.2568 | - | - | - |
| 0.9825 | 5900 | 0.2539 | - | - | - |
| 0.9992 | 6000 | 0.2458 | 0.2503 | 0.9203 | - |
| -1 | -1 | - | - | - | 0.9203 |
Environmental Impact
Carbon emissions were measured using CodeCarbon.
- Energy Consumed: 0.069 kWh
- Carbon Emitted: 0.027 kg of CO2
- Hours Used: 0.214 hours
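As a quick sanity check on the three figures above, the arithmetic below derives the average power draw and the implied carbon intensity. Both derived values are back-of-the-envelope estimates computed here from the reported numbers; they are not reported by CodeCarbon in this card.

```python
energy_kwh = 0.069  # Energy Consumed
co2_kg = 0.027      # Carbon Emitted
hours = 0.214       # Hours Used

# Average power over the run, in watts.
average_power_w = energy_kwh / hours * 1000

# Implied grid carbon intensity, in kg of CO2 per kWh.
carbon_intensity = co2_kg / energy_kwh

print(f"{average_power_w:.0f} W average draw")     # 322 W average draw
print(f"{carbon_intensity:.2f} kg CO2 per kWh")    # 0.39 kg CO2 per kWh
```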
Training Hardware
- On Cloud: No
- GPU Model: 1 x NVIDIA GeForce RTX 3090
- CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
- RAM Size: 31.78 GB
Framework Versions
- Python: 3.11.6
- Sentence Transformers: 3.5.0.dev0
- Transformers: 4.49.0.dev0
- PyTorch: 2.5.0+cu121
- Accelerate: 1.3.0
- Datasets: 2.20.0
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
Model tree for tomaarsen/reranker-distilroberta-base-quora-duplicates
- Base model: distilbert/distilroberta-base
- Training dataset: quora-duplicates