all-MiniLM-L6-v9-pair_score

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0
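
These properties can be checked programmatically after loading the model. A minimal sketch, assuming the repository id of this model:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("youssefkhalil320/all-MiniLM-L6-v9-pair_score")
print(model.max_seq_length)                      # 256
print(model.get_sentence_embedding_dimension())  # 384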

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
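
The same pipeline (transformer, mean pooling over token embeddings, then L2 normalization) can be reproduced with plain transformers. A minimal sketch, assuming the checkpoint's transformer weights load with AutoModel, as with the base all-MiniLM-L6-v2:

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("youssefkhalil320/all-MiniLM-L6-v9-pair_score")
model = AutoModel.from_pretrained("youssefkhalil320/all-MiniLM-L6-v9-pair_score")

sentences = ["no artificial flavouring food", "rubber dog toy"]
encoded = tokenizer(sentences, padding=True, truncation=True, max_length=256, return_tensors="pt")

with torch.no_grad():
    token_embeddings = model(**encoded).last_hidden_state  # [batch, seq_len, 384]

# Pooling module: mean over token embeddings, ignoring padding positions
mask = encoded["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# Normalize module: L2-normalize so dot product equals cosine similarity
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # torch.Size([2, 384])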

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("youssefkhalil320/all-MiniLM-L6-v9-pair_score")
# Run inference
sentences = [
    'no artificial flavouring food',
    'rubber dog toy',
    'tourmaline ceramic brush',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
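
Because the embeddings are L2-normalized and compared with cosine similarity, the model can also be used for semantic search by encoding a query and a corpus separately. A minimal sketch (the query and corpus sentences are illustrative):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("youssefkhalil320/all-MiniLM-L6-v9-pair_score")

query = "chew toy for dogs"
corpus = [
    "no artificial flavouring food",
    "rubber dog toy",
    "tourmaline ceramic brush",
]

query_embedding = model.encode([query])      # [1, 384]
corpus_embeddings = model.encode(corpus)     # [3, 384]

# Cosine similarity between the query and every corpus sentence
scores = model.similarity(query_embedding, corpus_embeddings)  # [1, 3]
best = scores.argmax().item()
print(corpus[best], scores[0, best].item())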

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
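
Given the CoSENTLoss citation below and the pair-score naming, a fine-tune with these hyperparameters could look roughly like the sketch that follows. The (sentence1, sentence2, score) dataset is hypothetical and stands in for the actual training data, which is not documented here:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CoSENTLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Hypothetical pair-score data: two text columns plus a similarity score label
train_dataset = Dataset.from_dict({
    "sentence1": ["no artificial flavouring food", "rubber dog toy"],
    "sentence2": ["food with no artificial flavours", "tourmaline ceramic brush"],
    "score": [0.9, 0.1],
})
eval_dataset = train_dataset  # placeholder; a held-out split would be used in practice

args = SentenceTransformerTrainingArguments(
    output_dir="all-MiniLM-L6-v9-pair_score",
    eval_strategy="steps",
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    learning_rate=2e-5,
    num_train_epochs=1,
    warmup_ratio=0.1,
    fp16=True,  # requires a CUDA device
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=CoSENTLoss(model),
)
trainer.train()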

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0.0053 100 13.2077
0.0107 200 12.3835
0.016 300 10.7699
0.0213 400 9.2679
0.0267 500 8.2638
0.032 600 7.69
0.0373 700 7.2751
0.0427 800 6.8786
0.048 900 6.7811
0.0533 1000 6.5834
0.0587 1100 6.3517
0.064 1200 6.2272
0.0693 1300 6.1943
0.0747 1400 6.1038
0.08 1500 6.1216
0.0853 1600 6.1429
0.0907 1700 5.8876
0.096 1800 5.8074
0.1013 1900 5.6261
0.1067 2000 5.838
0.112 2100 5.7161
0.1173 2200 5.5388
0.1227 2300 5.5654
0.128 2400 5.5196
0.1333 2500 5.3665
0.1387 2600 5.2952
0.144 2700 5.4131
0.1493 2800 5.2104
0.1547 2900 5.2176
0.16 3000 4.9406
0.1653 3100 4.8781
0.1707 3200 5.08
0.176 3300 5.1495
0.1813 3400 4.8717
0.1867 3500 4.8196
0.192 3600 4.8065
0.1973 3700 4.718
0.2027 3800 4.7111
0.208 3900 4.6759
0.2133 4000 4.7733
0.2187 4100 4.7041
0.224 4200 4.7898
0.2293 4300 4.8974
0.2347 4400 4.4939
0.24 4500 4.4107
0.2453 4600 4.4831
0.2507 4700 4.4571
0.256 4800 4.1461
0.2613 4900 4.5198
0.2667 5000 4.4998
0.272 5100 4.2135
0.2773 5200 4.441
0.2827 5300 4.2669
0.288 5400 4.0964
0.2933 5500 4.2048
0.2987 5600 4.2123
0.304 5700 4.3391
0.3093 5800 4.3366
0.3147 5900 4.1775
0.32 6000 3.9954
0.3253 6100 4.141
0.3307 6200 4.09
0.336 6300 3.9517
0.3413 6400 3.9844
0.3467 6500 3.8902
0.352 6600 3.571
0.3573 6700 3.7686
0.3627 6800 3.7766
0.368 6900 4.0305
0.3733 7000 4.2835
0.3787 7100 3.8102
0.384 7200 3.5178
0.3893 7300 3.8828
0.3947 7400 3.9125
0.4 7500 3.8578
0.4053 7600 3.7391
0.4107 7700 3.7178
0.416 7800 3.6572
0.4213 7900 3.835
0.4267 8000 3.4354
0.432 8100 3.6725
0.4373 8200 3.2932
0.4427 8300 3.7056
0.448 8400 3.9801
0.4533 8500 3.7294
0.4587 8600 3.6412
0.464 8700 3.4301
0.4693 8800 3.4932
0.4747 8900 3.1855
0.48 9000 3.4505
0.4853 9100 3.4431
0.4907 9200 3.0782
0.496 9300 3.3604
0.5013 9400 3.3833
0.5067 9500 3.2887
0.512 9600 3.1361
0.5173 9700 3.7856
0.5227 9800 3.4907
0.528 9900 3.4553
0.5333 10000 3.2604
0.5387 10100 3.4325
0.544 10200 3.319
0.5493 10300 3.3623
0.5547 10400 3.4278
0.56 10500 3.0365
0.5653 10600 3.1647
0.5707 10700 3.387
0.576 10800 3.0888
0.5813 10900 3.2073
0.5867 11000 3.0386
0.592 11100 3.222
0.5973 11200 3.1902
0.6027 11300 3.2242
0.608 11400 2.9589
0.6133 11500 2.831
0.6187 11600 3.0551
0.624 11700 2.8091
0.6293 11800 3.2146
0.6347 11900 3.1964
0.64 12000 2.9525
0.6453 12100 3.2989
0.6507 12200 2.9683
0.656 12300 2.9026
0.6613 12400 3.1533
0.6667 12500 2.7657
0.672 12600 3.09
0.6773 12700 3.1612
0.6827 12800 2.9614
0.688 12900 3.0533
0.6933 13000 2.7601
0.6987 13100 2.9242
0.704 13200 2.5517
0.7093 13300 2.9859
0.7147 13400 2.7317
0.72 13500 2.7578
0.7253 13600 3.1413
0.7307 13700 3.0612
0.736 13800 2.8295
0.7413 13900 2.6263
0.7467 14000 2.7181
0.752 14100 2.8643
0.7573 14200 2.903
0.7627 14300 2.7787
0.768 14400 2.991
0.7733 14500 2.8306
0.7787 14600 2.4423
0.784 14700 2.8633
0.7893 14800 2.7031
0.7947 14900 3.1548
0.8 15000 2.798
0.8053 15100 2.8189
0.8107 15200 3.0114
0.816 15300 2.5909
0.8213 15400 3.1911
0.8267 15500 2.8341
0.832 15600 2.7644
0.8373 15700 2.7604
0.8427 15800 2.7829
0.848 15900 2.572
0.8533 16000 2.8229
0.8587 16100 2.5582
0.864 16200 2.6958
0.8693 16300 2.8728
0.8747 16400 3.2461
0.88 16500 2.6323
0.8853 16600 2.6517
0.8907 16700 2.5449
0.896 16800 2.9283
0.9013 16900 2.8587
0.9067 17000 2.6408
0.912 17100 2.8004
0.9173 17200 2.8961
0.9227 17300 2.4204
0.928 17400 3.0084
0.9333 17500 2.8667
0.9387 17600 2.6908
0.944 17700 2.4349
0.9493 17800 2.7648
0.9547 17900 2.8743
0.96 18000 2.8606
0.9653 18100 2.6969
0.9707 18200 2.808
0.976 18300 3.0887
0.9813 18400 2.8279
0.9867 18500 2.8218
0.992 18600 2.3288
0.9973 18700 2.7652

Framework Versions

  • Python: 3.8.10
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.2
  • PyTorch: 2.4.1+cu118
  • Accelerate: 1.0.1
  • Datasets: 3.0.1
  • Tokenizers: 0.20.3
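
One way to approximate this environment is to pin the listed versions at install time (PyTorch 2.4.1 with the matching cu118 build is installed separately from the PyTorch index):

pip install "sentence-transformers==3.1.1" "transformers==4.45.2" "accelerate==1.0.1" "datasets==3.0.1" "tokenizers==0.20.3"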

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}
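
For reference, the objective described in the cited post (implemented as CoSENTLoss in Sentence Transformers) is, roughly, a ranking loss over cosine similarities: for every two pairs in the batch whose gold scores satisfy s(i,j) > s(k,l), the model is pushed to rank cos(u_i, u_j) above cos(u_k, u_l), with λ a scale factor (20 by default in the library):

\mathcal{L} = \log\left(1 + \sum_{s(i,j) > s(k,l)} e^{\lambda\left(\cos(u_k, u_l) - \cos(u_i, u_j)\right)}\right)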