SentenceTransformer based on answerdotai/ModernBERT-base

This is a sentence-transformers model finetuned from answerdotai/ModernBERT-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: answerdotai/ModernBERT-base
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
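
The Pooling module above averages ModernBERT's token embeddings over the non-padding tokens (mean pooling) to produce the single 768-dimensional sentence vector. The snippet below is a minimal sketch of that step using the base encoder directly via 🤗 Transformers; it is for illustration only and loads the raw answerdotai/ModernBERT-base weights, not this fine-tuned model.

import torch
from transformers import AutoModel, AutoTokenizer

# Illustration of mean pooling with the base encoder (not this fine-tuned model)
tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
encoder = AutoModel.from_pretrained("answerdotai/ModernBERT-base")

batch = tokenizer(["are there any csi shows still on?"], return_tensors="pt", padding=True)
with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state   # [batch, seq_len, 768]

# Average over non-padding tokens, matching pooling_mode_mean_tokens=True above
mask = batch["attention_mask"].unsqueeze(-1).float()         # [batch, seq_len, 1]
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)                              # torch.Size([1, 768])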

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("as-bessonov/reranker_searchengines_cos2")
# Run inference
sentences = [
    'are there any csi shows still on?',
    'Natalie Davis (a.k.a. "The Miniature Serial Killer") is a fictional character on the CBS crime drama CSI: Crime Scene Investigation, portrayed by Jessica Collins. The Miniature Killer was introduced in the seventh-season premiere; after a season-long arc, she was identified as Natalie Davis in the finale.',
    'The answer is Ne. These 3 elements belong to the same period (row) with Ne having 1 more proton ( and electron) than F, which itself has one more proton ( and electron) than O. ... Hence, Ne has a smaller atomic radius.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.0462, -0.1366],
#         [ 0.0462,  1.0000,  0.2051],
#         [-0.1366,  0.2051,  1.0000]])
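
The training data (see below) pairs short queries with passages and a relevance label, so the same embeddings can also be used to rank candidate passages for a query. A minimal sketch, reusing the model loaded above; the query and passages here are illustrative:

# Rank candidate passages for a query by cosine similarity (illustrative data)
query = "are socks safe for babies?"
passages = [
    "Thick socks can prevent hypothermia and illness in cold weather.",
    "Crew socks are the most common length and pair well with any shoe.",
]

query_embedding = model.encode([query])
passage_embeddings = model.encode(passages)

scores = model.similarity(query_embedding, passage_embeddings)[0]
for idx in scores.argsort(descending=True).tolist():
    print(f"{scores[idx].item():.4f}  {passages[idx]}")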

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,966,986 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min: 8 tokens, mean: 12.03 tokens, max: 22 tokens
    • sentence2: string; min: 12 tokens, mean: 57.41 tokens, max: 125 tokens
    • label: float; min: 0.0, mean: 0.18, max: 1.0
  • Samples:
    • sentence1: are socks safe for babies?
      sentence2: While you may be able to skip socks during summer, they're an essential layer during most months of the year. This is especially true during winter when thick socks can prevent hypothermia and illness. When you and baby leave the house in cold weather, always pack one or two extra pairs of socks in your diaper bag.
      label: 1.0
    • sentence1: are socks safe for babies?
      sentence2: Crew socks: This is the most common length, but crew socks are far from average! This height falls in the middle of the calf and pairs well with any shoe. ... Trouser socks/tall socks/mid-calf socks: Trouser socks tend to be a bit higher than your average crew sock, but they don't completely cover the calf.
      label: 0.0
    • sentence1: are socks safe for babies?
      sentence2: In fact Birkenstocks are almost made to be easily worn with socks. You can go with the outdoorsie wool and hiking socks in your basic earth colors or you can add some pizzaz and get some “statement” socks to spice it up a bit. ... So, yeah, you can wear socks with your Birks.
      label: 0.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
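
CosineSimilarityLoss computes the cosine similarity between the embeddings of sentence1 and sentence2 and regresses it toward the float label using the MSELoss shown above. The sketch below outlines how such a run could be set up with the Sentence Transformers trainer; the tiny in-memory dataset is illustrative, not the actual 1,966,986-sample training data.

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses

# Illustrative two-sample dataset in the (sentence1, sentence2, label) format described above
train_dataset = Dataset.from_dict({
    "sentence1": ["are socks safe for babies?", "are socks safe for babies?"],
    "sentence2": ["Thick socks can prevent hypothermia and illness.", "Crew socks are the most common length."],
    "label": [1.0, 0.0],
})

model = SentenceTransformer("answerdotai/ModernBERT-base")
# cosine(embedding1, embedding2) is pushed toward the label via MSE
loss = losses.CosineSimilarityLoss(model)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()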
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 512
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • seed: 12
  • bf16: True
  • dataloader_num_workers: 4
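
For reference, these non-default values map directly onto SentenceTransformerTrainingArguments, which could be passed to the trainer in the sketch above. The output_dir below is a placeholder, not the actual path used:

from sentence_transformers import SentenceTransformerTrainingArguments

# Non-default hyperparameters from above; output_dir is a placeholder
args = SentenceTransformerTrainingArguments(
    output_dir="output/reranker_searchengines_cos2",
    per_device_train_batch_size=512,
    learning_rate=2e-5,
    num_train_epochs=1,
    warmup_ratio=0.1,
    seed=12,
    bf16=True,
    dataloader_num_workers=4,
)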

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 512
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 12
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 4
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0026 10 0.6724
0.0052 20 0.6722
0.0078 30 0.6434
0.0104 40 0.613
0.0130 50 0.4804
0.0156 60 0.2451
0.0182 70 0.1604
0.0208 80 0.1501
0.0234 90 0.1533
0.0260 100 0.152
0.0286 110 0.1466
0.0312 120 0.1487
0.0338 130 0.1438
0.0364 140 0.1487
0.0390 150 0.1453
0.0416 160 0.1472
0.0442 170 0.143
0.0469 180 0.144
0.0495 190 0.1472
0.0521 200 0.1448
0.0547 210 0.1439
0.0573 220 0.1415
0.0599 230 0.1373
0.0625 240 0.1487
0.0651 250 0.1477
0.0677 260 0.1442
0.0703 270 0.1441
0.0729 280 0.1458
0.0755 290 0.1406
0.0781 300 0.1439
0.0807 310 0.1438
0.0833 320 0.1457
0.0859 330 0.1439
0.0885 340 0.1373
0.0911 350 0.1422
0.0937 360 0.1455
0.0963 370 0.1406
0.0989 380 0.1458
0.1015 390 0.1406
0.1041 400 0.1447
0.1067 410 0.1379
0.1093 420 0.1433
0.1119 430 0.1408
0.1145 440 0.1421
0.1171 450 0.1375
0.1197 460 0.1434
0.1223 470 0.1384
0.1249 480 0.1407
0.1275 490 0.1429
0.1301 500 0.1365
0.1327 510 0.1438
0.1353 520 0.1379
0.1379 530 0.1397
0.1406 540 0.1378
0.1432 550 0.143
0.1458 560 0.1368
0.1484 570 0.1408
0.1510 580 0.1424
0.1536 590 0.1361
0.1562 600 0.1396
0.1588 610 0.1349
0.1614 620 0.1347
0.1640 630 0.1328
0.1666 640 0.1389
0.1692 650 0.1297
0.1718 660 0.1331
0.1744 670 0.1309
0.1770 680 0.1348
0.1796 690 0.128
0.1822 700 0.1302
0.1848 710 0.1281
0.1874 720 0.1306
0.1900 730 0.1329
0.1926 740 0.1294
0.1952 750 0.1289
0.1978 760 0.1235
0.2004 770 0.1233
0.2030 780 0.1271
0.2056 790 0.1248
0.2082 800 0.1227
0.2108 810 0.1271
0.2134 820 0.1225
0.2160 830 0.1261
0.2186 840 0.128
0.2212 850 0.1238
0.2238 860 0.1283
0.2264 870 0.1281
0.2290 880 0.1291
0.2317 890 0.1275
0.2343 900 0.1285
0.2369 910 0.1262
0.2395 920 0.1184
0.2421 930 0.1205
0.2447 940 0.1228
0.2473 950 0.1281
0.2499 960 0.125
0.2525 970 0.1247
0.2551 980 0.1225
0.2577 990 0.1239
0.2603 1000 0.1228
0.2629 1010 0.1215
0.2655 1020 0.1211
0.2681 1030 0.1222
0.2707 1040 0.1242
0.2733 1050 0.1176
0.2759 1060 0.1208
0.2785 1070 0.1172
0.2811 1080 0.1234
0.2837 1090 0.1206
0.2863 1100 0.1202
0.2889 1110 0.116
0.2915 1120 0.117
0.2941 1130 0.1207
0.2967 1140 0.1214
0.2993 1150 0.1206
0.3019 1160 0.1183
0.3045 1170 0.1265
0.3071 1180 0.1225
0.3097 1190 0.1179
0.3123 1200 0.1205
0.3149 1210 0.1186
0.3175 1220 0.1199
0.3201 1230 0.1189
0.3227 1240 0.1142
0.3254 1250 0.1225
0.3280 1260 0.1206
0.3306 1270 0.1164
0.3332 1280 0.1208
0.3358 1290 0.1163
0.3384 1300 0.1148
0.3410 1310 0.1118
0.3436 1320 0.1174
0.3462 1330 0.1196
0.3488 1340 0.1128
0.3514 1350 0.1125
0.3540 1360 0.1108
0.3566 1370 0.114
0.3592 1380 0.1197
0.3618 1390 0.115
0.3644 1400 0.1158
0.3670 1410 0.1099
0.3696 1420 0.1122
0.3722 1430 0.1121
0.3748 1440 0.1133
0.3774 1450 0.1105
0.3800 1460 0.1163
0.3826 1470 0.1149
0.3852 1480 0.1119
0.3878 1490 0.112
0.3904 1500 0.1125
0.3930 1510 0.1182
0.3956 1520 0.11
0.3982 1530 0.1102
0.4008 1540 0.108
0.4034 1550 0.1109
0.4060 1560 0.1211
0.4086 1570 0.1123
0.4112 1580 0.1134
0.4138 1590 0.1157
0.4164 1600 0.1103
0.4191 1610 0.1146
0.4217 1620 0.1106
0.4243 1630 0.1141
0.4269 1640 0.1107
0.4295 1650 0.1132
0.4321 1660 0.1067
0.4347 1670 0.1136
0.4373 1680 0.1107
0.4399 1690 0.1103
0.4425 1700 0.1068
0.4451 1710 0.1118
0.4477 1720 0.1098
0.4503 1730 0.1113
0.4529 1740 0.1132
0.4555 1750 0.1136
0.4581 1760 0.1079
0.4607 1770 0.1124
0.4633 1780 0.1061
0.4659 1790 0.1099
0.4685 1800 0.1075
0.4711 1810 0.1097
0.4737 1820 0.1083
0.4763 1830 0.1117
0.4789 1840 0.1061
0.4815 1850 0.1076
0.4841 1860 0.1102
0.4867 1870 0.1098
0.4893 1880 0.1066
0.4919 1890 0.1082
0.4945 1900 0.1142
0.4971 1910 0.1081
0.4997 1920 0.1089
0.5023 1930 0.1076
0.5049 1940 0.1055
0.5075 1950 0.1097
0.5102 1960 0.105
0.5128 1970 0.1061
0.5154 1980 0.1064
0.5180 1990 0.111
0.5206 2000 0.1032
0.5232 2010 0.1061
0.5258 2020 0.1099
0.5284 2030 0.1093
0.5310 2040 0.1084
0.5336 2050 0.112
0.5362 2060 0.1034
0.5388 2070 0.1088
0.5414 2080 0.1067
0.5440 2090 0.1175
0.5466 2100 0.111
0.5492 2110 0.104
0.5518 2120 0.1081
0.5544 2130 0.1086
0.5570 2140 0.1045
0.5596 2150 0.106
0.5622 2160 0.1125
0.5648 2170 0.109
0.5674 2180 0.103
0.5700 2190 0.1035
0.5726 2200 0.1069
0.5752 2210 0.1077
0.5778 2220 0.1036
0.5804 2230 0.1099
0.5830 2240 0.1092
0.5856 2250 0.1028
0.5882 2260 0.1043
0.5908 2270 0.1054
0.5934 2280 0.1021
0.5960 2290 0.1078
0.5986 2300 0.1054
0.6012 2310 0.108
0.6039 2320 0.104
0.6065 2330 0.1028
0.6091 2340 0.1086
0.6117 2350 0.1061
0.6143 2360 0.1062
0.6169 2370 0.1082
0.6195 2380 0.1056
0.6221 2390 0.1043
0.6247 2400 0.1066
0.6273 2410 0.1091
0.6299 2420 0.1035
0.6325 2430 0.1058
0.6351 2440 0.1065
0.6377 2450 0.1055
0.6403 2460 0.1046
0.6429 2470 0.1011
0.6455 2480 0.1043
0.6481 2490 0.11
0.6507 2500 0.1029
0.6533 2510 0.1025
0.6559 2520 0.1052
0.6585 2530 0.1071
0.6611 2540 0.1065
0.6637 2550 0.1054
0.6663 2560 0.106
0.6689 2570 0.1075
0.6715 2580 0.1012
0.6741 2590 0.1049
0.6767 2600 0.1051
0.6793 2610 0.1013
0.6819 2620 0.0972
0.6845 2630 0.1102
0.6871 2640 0.106
0.6897 2650 0.1039
0.6923 2660 0.1066
0.6950 2670 0.1044
0.6976 2680 0.1036
0.7002 2690 0.1023
0.7028 2700 0.1024
0.7054 2710 0.1011
0.7080 2720 0.1021
0.7106 2730 0.106
0.7132 2740 0.1053
0.7158 2750 0.0988
0.7184 2760 0.1006
0.7210 2770 0.0983
0.7236 2780 0.1083
0.7262 2790 0.1042
0.7288 2800 0.1045
0.7314 2810 0.1025
0.7340 2820 0.1066
0.7366 2830 0.1019
0.7392 2840 0.1023
0.7418 2850 0.1007
0.7444 2860 0.1033
0.7470 2870 0.1056
0.7496 2880 0.1008
0.7522 2890 0.1027
0.7548 2900 0.1045
0.7574 2910 0.1003
0.7600 2920 0.1063
0.7626 2930 0.1081
0.7652 2940 0.1002
0.7678 2950 0.1021
0.7704 2960 0.1003
0.7730 2970 0.1015
0.7756 2980 0.104
0.7782 2990 0.1049
0.7808 3000 0.1034
0.7834 3010 0.1021
0.7860 3020 0.0998
0.7887 3030 0.0965
0.7913 3040 0.1059
0.7939 3050 0.1045
0.7965 3060 0.1029
0.7991 3070 0.1028
0.8017 3080 0.1019
0.8043 3090 0.104
0.8069 3100 0.101
0.8095 3110 0.103
0.8121 3120 0.1001
0.8147 3130 0.1
0.8173 3140 0.1042
0.8199 3150 0.1039
0.8225 3160 0.104
0.8251 3170 0.1031
0.8277 3180 0.1045
0.8303 3190 0.1018
0.8329 3200 0.1006
0.8355 3210 0.1011
0.8381 3220 0.1028
0.8407 3230 0.0964
0.8433 3240 0.1027
0.8459 3250 0.098
0.8485 3260 0.1001
0.8511 3270 0.1014
0.8537 3280 0.1027
0.8563 3290 0.0999
0.8589 3300 0.1013
0.8615 3310 0.1014
0.8641 3320 0.1023
0.8667 3330 0.1038
0.8693 3340 0.0993
0.8719 3350 0.1011
0.8745 3360 0.1054
0.8771 3370 0.1003
0.8798 3380 0.1012
0.8824 3390 0.1015
0.8850 3400 0.1023
0.8876 3410 0.1026
0.8902 3420 0.1003
0.8928 3430 0.0989
0.8954 3440 0.1045
0.8980 3450 0.1039
0.9006 3460 0.0998
0.9032 3470 0.1038
0.9058 3480 0.1012
0.9084 3490 0.1023
0.9110 3500 0.1001
0.9136 3510 0.1058
0.9162 3520 0.1042
0.9188 3530 0.0995
0.9214 3540 0.0988
0.9240 3550 0.0996
0.9266 3560 0.1008
0.9292 3570 0.1016
0.9318 3580 0.1052
0.9344 3590 0.1038
0.9370 3600 0.1014
0.9396 3610 0.1018
0.9422 3620 0.0987
0.9448 3630 0.1021
0.9474 3640 0.1015
0.9500 3650 0.0983
0.9526 3660 0.1022
0.9552 3670 0.1075
0.9578 3680 0.1049
0.9604 3690 0.0993
0.9630 3700 0.1014
0.9656 3710 0.0984
0.9682 3720 0.0963
0.9708 3730 0.1052
0.9735 3740 0.0958
0.9761 3750 0.1003
0.9787 3760 0.1046
0.9813 3770 0.1044
0.9839 3780 0.1036
0.9865 3790 0.1027
0.9891 3800 0.1006
0.9917 3810 0.1023
0.9943 3820 0.0992
0.9969 3830 0.1014
0.9995 3840 0.1008

Framework Versions

  • Python: 3.12.3
  • Sentence Transformers: 5.0.0
  • Transformers: 4.52.4
  • PyTorch: 2.7.0a0+79aa17489c.nv25.04
  • Accelerate: 1.8.1
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}