all-MiniLM-L6-v18-pair_score

This is a sentence-transformers model fine-tuned from youssefkhalil320/all-MiniLM-L6-v14-pair_score. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: youssefkhalil320/all-MiniLM-L6-v14-pair_score
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions (see the quick check below)
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0
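
A quick way to confirm the sequence length and embedding dimensionality listed above (a minimal sketch; the repository id is taken from this card's title):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("youssefkhalil320/all-MiniLM-L6-v18-pair_score")

print(model.max_seq_length)                      # 256
print(model.get_sentence_embedding_dimension())  # 384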

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
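
For reference, the same pipeline (BERT encoder, mean pooling, L2 normalization) can be reproduced directly with the transformers library. This is a hedged sketch rather than part of the original card; it assumes the tokenizer and weights load from the same repository id:

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

model_id = "youssefkhalil320/all-MiniLM-L6-v18-pair_score"  # repository id from this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
bert = AutoModel.from_pretrained(model_id)

def mean_pooling(token_embeddings, attention_mask):
    # Average token embeddings, ignoring padding positions
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

sentences = ["medical accessory", "bedroom chair"]
encoded = tokenizer(sentences, padding=True, truncation=True, max_length=256, return_tensors="pt")
with torch.no_grad():
    output = bert(**encoded)

embeddings = mean_pooling(output.last_hidden_state, encoded["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)  # matches the Normalize() module above
print(embeddings.shape)  # torch.Size([2, 384])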

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("youssefkhalil320/all-MiniLM-L6-v18-pair_score")
# Run inference
sentences = [
    'medical accessory',
    'bedroom chair',
    'handball',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
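
Because the embeddings are L2-normalized, cosine similarity amounts to a dot product, so the model can be dropped into a simple semantic-search loop. A minimal sketch; the corpus strings below are illustrative and not taken from the training data:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("youssefkhalil320/all-MiniLM-L6-v18-pair_score")

corpus = ["bedroom chair", "office desk", "handball", "stethoscope"]  # illustrative corpus
query = "medical accessory"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode(query)

# Cosine similarity between the query and every corpus entry (shape [1, 4])
scores = model.similarity(query_embedding, corpus_embeddings)
best = int(scores.argmax())
print(corpus[best], float(scores[0][best]))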

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • learning_rate: 2e-05
  • num_train_epochs: 2
  • warmup_ratio: 0.1
  • fp16: True

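A rough sketch of a training setup consistent with the non-default hyperparameters above. It assumes the CoSENT loss cited at the bottom of this card and a (sentence1, sentence2, score) pair dataset; the example rows and column names are illustrative, not the card's actual training data:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CoSENTLoss

# Base model named in the Model Description above
model = SentenceTransformer("youssefkhalil320/all-MiniLM-L6-v14-pair_score")

# Illustrative pair-score rows; the real training set is not documented here
train_dataset = Dataset.from_dict({
    "sentence1": ["medical accessory", "bedroom chair"],
    "sentence2": ["stethoscope", "handball"],
    "score": [0.9, 0.1],
})

args = SentenceTransformerTrainingArguments(
    output_dir="all-MiniLM-L6-v18-pair_score",
    num_train_epochs=2,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,  # stand-in; a held-out split would be used in practice
    loss=CoSENTLoss(model),
)
trainer.train()
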
All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0.0244 100 15.2955
0.0488 200 12.0245
0.0732 300 9.5958
0.0977 400 9.0251
0.1221 500 8.8519
0.1465 600 8.7625
0.1709 700 8.7085
0.1953 800 8.6742
0.2197 900 8.6344
0.2441 1000 8.5984
0.2686 1100 8.5607
0.2930 1200 8.5395
0.3174 1300 8.4973
0.3418 1400 8.4809
0.3662 1500 8.4588
0.3906 1600 8.4489
0.4150 1700 8.4255
0.4395 1800 8.4189
0.4639 1900 8.4012
0.4883 2000 8.3793
0.5127 2100 8.3654
0.5371 2200 8.3369
0.5615 2300 8.3483
0.5859 2400 8.3441
0.6104 2500 8.3357
0.6348 2600 8.3009
0.6592 2700 8.2997
0.6836 2800 8.2901
0.7080 2900 8.2797
0.7324 3000 8.2837
0.7568 3100 8.2856
0.7812 3200 8.2753
0.8057 3300 8.275
0.8301 3400 8.2577
0.8545 3500 8.2296
0.8789 3600 8.2368
0.9033 3700 8.244
0.9277 3800 8.2259
0.9521 3900 8.2262
0.9766 4000 8.252
1.0010 4100 8.1966
1.0254 4200 8.1923
1.0498 4300 8.1938
1.0742 4400 8.1771
1.0986 4500 8.1956
1.1230 4600 8.1826
1.1475 4700 8.1726
1.1719 4800 8.1826
1.1963 4900 8.1699
1.2207 5000 8.1576
1.2451 5100 8.1849
1.2695 5200 8.1753
1.2939 5300 8.1661
1.3184 5400 8.153
1.3428 5500 8.158
1.3672 5600 8.1577
1.3916 5700 8.152
1.4160 5800 8.1716
1.4404 5900 8.1528
1.4648 6000 8.1561
1.4893 6100 8.1658
1.5137 6200 8.1379
1.5381 6300 8.1437
1.5625 6400 8.1329
1.5869 6500 8.1251
1.6113 6600 8.1359
1.6357 6700 8.1334
1.6602 6800 8.1187
1.6846 6900 8.1461
1.7090 7000 8.1294
1.7334 7100 8.0935
1.7578 7200 8.1494
1.7822 7300 8.105
1.8066 7400 8.0987
1.8311 7500 8.1524
1.8555 7600 8.1228
1.8799 7700 8.1228
1.9043 7800 8.1315
1.9287 7900 8.1236
1.9531 8000 8.1499
1.9775 8100 8.1092

Framework Versions

  • Python: 3.8.10
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.2
  • PyTorch: 2.4.1+cu118
  • Accelerate: 1.0.1
  • Datasets: 3.0.1
  • Tokenizers: 0.20.3

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}