all-MiniLM-L6-v17-pair_score

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Model Sources

  • Documentation: https://sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
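
In plain transformers terms, the Pooling and Normalize modules above correspond to attention-mask-aware mean pooling followed by L2 normalization. The snippet below is a minimal sketch of that pipeline; it loads the base checkpoint purely for illustration, whereas in practice the fine-tuned weights from this repository would be used.

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Base checkpoint used here only to illustrate the module pipeline
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
bert = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

sentences = ["electronic instrument", "Salad"]
encoded = tokenizer(sentences, padding=True, truncation=True, max_length=256, return_tensors="pt")

with torch.no_grad():
    token_embeddings = bert(**encoded).last_hidden_state  # (batch, seq_len, 384)

# (1) Pooling: mean over non-padding tokens
mask = encoded["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# (2) Normalize: unit-length vectors, so dot products equal cosine similarities
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # torch.Size([2, 384])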

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("youssefkhalil320/all-MiniLM-L6-v17-pair_score")
# Run inference
sentences = [
    'electronic instrument',
    'sirlion',
    'Salad',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
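
The card lists semantic search among the intended uses; a small sketch of that pattern is shown below. The corpus and query strings are placeholders for illustration, not training data.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("youssefkhalil320/all-MiniLM-L6-v17-pair_score")

# Placeholder corpus and query
corpus = ["grilled sirloin steak", "caesar salad", "analog synthesizer"]
query = "electronic instrument"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# Cosine similarity between the query and each corpus entry
scores = model.similarity(query_embedding, corpus_embeddings)  # shape [1, 3]
best = int(scores.argmax())
print(corpus[best], float(scores[0, best]))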

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • learning_rate: 2e-05
  • num_train_epochs: 2
  • warmup_ratio: 0.1
  • fp16: True

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

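A hypothetical training sketch that maps the non-default hyperparameters above onto the Sentence Transformers trainer is shown below. It assumes a pair-score dataset (two sentences plus a float similarity score) and the CoSENTLoss cited at the end of this card; the dataset contents here are placeholders, not the actual training data.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CoSENTLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Placeholder pair-score data: two text columns and a float similarity score
train_dataset = Dataset.from_dict({
    "sentence1": ["electronic instrument", "grilled sirloin steak"],
    "sentence2": ["analog synthesizer", "caesar salad"],
    "score": [0.9, 0.1],
})
eval_dataset = Dataset.from_dict({
    "sentence1": ["electric guitar"],
    "sentence2": ["stringed instrument"],
    "score": [0.8],
})

args = SentenceTransformerTrainingArguments(
    output_dir="all-MiniLM-L6-v17-pair_score",
    eval_strategy="steps",
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    learning_rate=2e-5,
    num_train_epochs=2,
    warmup_ratio=0.1,
    fp16=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=CoSENTLoss(model),
)
trainer.train()
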
Training Logs

Epoch Step Training Loss
0.0122 100 10.562
0.0244 200 10.0184
0.0366 300 9.398
0.0488 400 8.8197
0.0610 500 8.3899
0.0733 600 7.8989
0.0855 700 7.6515
0.0977 800 7.3998
0.1099 900 7.166
0.1221 1000 6.9383
0.1343 1100 6.6043
0.1465 1200 6.3584
0.1587 1300 6.0252
0.1709 1400 5.7639
0.1831 1500 5.6496
0.1953 1600 5.2169
0.2075 1700 5.1389
0.2198 1800 4.9316
0.2320 1900 4.8547
0.2442 2000 4.6022
0.2564 2100 4.7122
0.2686 2200 4.5965
0.2808 2300 3.9285
0.2930 2400 4.0168
0.3052 2500 4.2677
0.3174 2600 4.147
0.3296 2700 4.101
0.3418 2800 3.8629
0.3540 2900 3.86
0.3663 3000 3.5607
0.3785 3100 3.8495
0.3907 3200 3.5558
0.4029 3300 3.7251
0.4151 3400 3.5233
0.4273 3500 3.8677
0.4395 3600 3.3688
0.4517 3700 3.479
0.4639 3800 3.1691
0.4761 3900 3.1791
0.4883 4000 3.2925
0.5005 4100 2.6573
0.5128 4200 2.8804
0.5250 4300 3.0418
0.5372 4400 2.7162
0.5494 4500 2.8449
0.5616 4600 2.7159
0.5738 4700 2.5733
0.5860 4800 2.5866
0.5982 4900 2.9195
0.6104 5000 2.0384
0.6226 5100 2.6745
0.6348 5200 2.3901
0.6471 5300 2.2872
0.6593 5400 2.0086
0.6715 5500 2.198
0.6837 5600 1.9139
0.6959 5700 2.0432
0.7081 5800 2.1445
0.7203 5900 2.5626
0.7325 6000 2.1707
0.7447 6100 2.1568
0.7569 6200 2.0102
0.7691 6300 2.0012
0.7813 6400 1.8381
0.7936 6500 1.7552
0.8058 6600 1.9704
0.8180 6700 1.6397
0.8302 6800 1.8857
0.8424 6900 1.8036
0.8546 7000 1.721
0.8668 7100 1.6888
0.8790 7200 1.7908
0.8912 7300 1.5851
0.9034 7400 1.7986
0.9156 7500 1.2549
0.9278 7600 1.5765
0.9401 7700 1.4524
0.9523 7800 1.2767
0.9645 7900 1.1604
0.9767 8000 1.557
0.9889 8100 1.1124
1.0011 8200 1.3092
1.0133 8300 1.598
1.0255 8400 1.6242
1.0377 8500 1.4893
1.0499 8600 1.0693
1.0621 8700 0.9369
1.0743 8800 1.1275
1.0866 8900 1.3307
1.0988 9000 1.0498
1.1110 9100 1.2496
1.1232 9200 1.1011
1.1354 9300 1.0483
1.1476 9400 1.2593
1.1598 9500 0.9409
1.1720 9600 1.0609
1.1842 9700 1.1829
1.1964 9800 1.0511
1.2086 9900 0.919
1.2209 10000 0.9473
1.2331 10100 1.2604
1.2453 10200 1.17
1.2575 10300 1.181
1.2697 10400 0.9092
1.2819 10500 0.9655
1.2941 10600 1.058
1.3063 10700 1.283
1.3185 10800 1.1552
1.3307 10900 0.858
1.3429 11000 0.8581
1.3551 11100 1.1272
1.3674 11200 1.0127
1.3796 11300 0.7372
1.3918 11400 0.913
1.4040 11500 0.8728
1.4162 11600 1.1358
1.4284 11700 0.9387
1.4406 11800 0.8424
1.4528 11900 0.8999
1.4650 12000 1.2505
1.4772 12100 1.0151
1.4894 12200 0.8013
1.5016 12300 1.1422
1.5139 12400 1.1518
1.5261 12500 1.0553
1.5383 12600 0.9228
1.5505 12700 1.2036
1.5627 12800 1.1064
1.5749 12900 0.7599
1.5871 13000 0.6376
1.5993 13100 1.002
1.6115 13200 0.9072
1.6237 13300 0.9645
1.6359 13400 0.9208
1.6482 13500 1.1439
1.6604 13600 1.3721
1.6726 13700 0.8702
1.6848 13800 0.9476
1.6970 13900 1.1247
1.7092 14000 1.1059
1.7214 14100 0.9272
1.7336 14200 0.8893
1.7458 14300 0.6242
1.7580 14400 0.6779
1.7702 14500 0.7436
1.7824 14600 0.7655
1.7947 14700 0.7952
1.8069 14800 1.1916
1.8191 14900 0.7219
1.8313 15000 0.7313
1.8435 15100 0.8224
1.8557 15200 0.8756
1.8679 15300 0.622
1.8801 15400 1.0309
1.8923 15500 0.7322
1.9045 15600 0.9327
1.9167 15700 0.8632
1.9289 15800 1.0087
1.9412 15900 0.6738
1.9534 16000 0.8936
1.9656 16100 0.8083
1.9778 16200 0.7114
1.9900 16300 0.9119

Framework Versions

  • Python: 3.8.10
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.2
  • PyTorch: 2.4.1+cu118
  • Accelerate: 1.0.1
  • Datasets: 3.0.1
  • Tokenizers: 0.20.3
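
To approximate this environment, the listed versions can be pinned at install time. This is a sketch; the PyTorch index URL below targets the CUDA 11.8 build and may need adjusting for other setups.

pip install sentence-transformers==3.1.1 transformers==4.45.2 accelerate==1.0.1 datasets==3.0.1 tokenizers==0.20.3
pip install torch==2.4.1 --index-url https://download.pytorch.org/whl/cu118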

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}