SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
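Each module above has a direct counterpart in plain PyTorch: the Transformer module produces token embeddings with MiniLM, the Pooling module averages them over the attention mask, and Normalize scales each vector to unit length so that cosine similarity reduces to a dot product. The following is a minimal sketch of that equivalent computation using transformers directly; it assumes the checkpoint's transformer weights and tokenizer load with AutoModel/AutoTokenizer, as is typical for Sentence Transformers repositories.

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Devy1/MiniLM-cosqa-128")
bert = AutoModel.from_pretrained("Devy1/MiniLM-cosqa-128")

def encode(texts):
    batch = tokenizer(texts, padding=True, truncation=True, max_length=256, return_tensors="pt")
    with torch.no_grad():
        token_embeddings = bert(**batch).last_hidden_state   # (batch, seq_len, 384)
    mask = batch["attention_mask"].unsqueeze(-1).float()      # (batch, seq_len, 1)
    pooled = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)  # mean pooling
    return F.normalize(pooled, p=2, dim=1)                    # unit-length 384-dim vectors

print(encode(["bottom 5 rows in python"]).shape)  # torch.Size([1, 384])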

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Devy1/MiniLM-cosqa-128")
# Run inference
sentences = [
    'bottom 5 rows in python',
    'def table_top_abs(self):\n        """Returns the absolute position of table top"""\n        table_height = np.array([0, 0, self.table_full_size[2]])\n        return string_to_array(self.floor.get("pos")) + table_height',
    'def refresh(self, document):\n\t\t""" Load a new copy of a document from the database.  does not\n\t\t\treplace the old one """\n\t\ttry:\n\t\t\told_cache_size = self.cache_size\n\t\t\tself.cache_size = 0\n\t\t\tobj = self.query(type(document)).filter_by(mongo_id=document.mongo_id).one()\n\t\tfinally:\n\t\t\tself.cache_size = old_cache_size\n\t\tself.cache_write(obj)\n\t\treturn obj',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.4828, -0.0626],
#         [ 0.4828,  1.0000, -0.0528],
#         [-0.0626, -0.0528,  1.0000]])
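Since the training pairs couple short natural-language queries with Python function bodies (see Training Details below), a natural application is code search: embed a corpus of snippets once, then rank it against an embedded query by cosine similarity. The sketch below uses sentence_transformers.util.semantic_search; the corpus, query, and top_k value are illustrative placeholders.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Devy1/MiniLM-cosqa-128")

# Illustrative corpus of code snippets; replace with your own functions.
corpus = [
    "def head(df, n=5):\n    return df.iloc[:n]",
    "def tail(df, n=5):\n    return df.iloc[-n:]",
    "def normalize(v):\n    return v / np.linalg.norm(v)",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_embedding = model.encode("bottom 5 rows in python", convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(round(hit["score"], 4), corpus[hit["corpus_id"]].splitlines()[0])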

Training Details

Training Dataset

Unnamed Dataset

  • Size: 9,020 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min: 6 tokens, mean: 9.67 tokens, max: 21 tokens
    • positive: string; min: 40 tokens, mean: 86.17 tokens, max: 256 tokens
  • Samples:
    • anchor: 1d array in char datatype in python
      positive:
      def _convert_to_array(array_like, dtype):
          """
          Convert Matrix attributes which are array-like or buffer to array.
          """
          if isinstance(array_like, bytes):
              return np.frombuffer(array_like, dtype=dtype)
          return np.asarray(array_like, dtype=dtype)
    • anchor: python condition non none
      positive:
      def _not(condition=None, **kwargs):
          """
          Return the opposite of input condition.

          :param condition: condition to process.

          :result: not condition.
          :rtype: bool
          """

          result = True

          if condition is not None:
              result = not run(condition, **kwargs)

          return result
    • anchor: accessing a column from a matrix in python
      positive:
      def get_column(self, X, column):
          """Return a column of the given matrix.

          Args:
              X: numpy.ndarray or pandas.DataFrame.
              column: int or str.

          Returns:
              np.ndarray: Selected column.
          """
          if isinstance(X, pd.DataFrame):
              return X[column].values

          return X[:, column]
  • Loss: MultipleNegativesRankingLoss with these parameters (a conceptual sketch of the loss follows below):
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
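MultipleNegativesRankingLoss treats every other positive in the batch as a negative for a given anchor: it computes the scaled similarity matrix between anchor and positive embeddings and applies cross-entropy with the matching pair on the diagonal as the target. Below is a conceptual sketch of that computation; the embeddings are random placeholders, while scale=20.0 and cosine similarity match the parameters above.

import torch
import torch.nn.functional as F

# Stand-in embeddings for a batch of 4 (anchor, positive) pairs, 384 dimensions each.
anchors = F.normalize(torch.randn(4, 384), dim=1)
positives = F.normalize(torch.randn(4, 384), dim=1)

scale = 20.0
scores = scale * anchors @ positives.T   # scaled cosine similarities (rows: anchors, columns: positives)
labels = torch.arange(scores.size(0))    # pair i's true positive is column i; the rest act as in-batch negatives
loss = F.cross_entropy(scores, labels)
print(loss.item())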
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 128
  • fp16: True

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}
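
The values above map onto SentenceTransformerTrainingArguments. Below is a minimal sketch of a training setup consistent with the non-default values (per_device_train_batch_size=128, fp16=True) and a few of the listed defaults; the output directory and the one-row dataset are placeholders, not the actual training data.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Placeholder (anchor, positive) pairs; the real dataset has 9,020 such rows.
train_dataset = Dataset.from_dict({
    "anchor": ["accessing a column from a matrix in python"],
    "positive": ["def get_column(self, X, column): ..."],
})

args = SentenceTransformerTrainingArguments(
    output_dir="minilm-cosqa-128",     # placeholder path
    per_device_train_batch_size=128,
    fp16=True,                         # requires a CUDA device
    num_train_epochs=3,
    learning_rate=5e-5,
    seed=42,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=MultipleNegativesRankingLoss(model),
)
trainer.train()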

Training Logs

Epoch Step Training Loss
0.0141 1 0.6881
0.0282 2 0.4421
0.0423 3 0.3636
0.0563 4 0.4092
0.0704 5 0.4558
0.0845 6 0.5227
0.0986 7 0.6376
0.1127 8 0.4178
0.1268 9 0.2803
0.1408 10 0.3843
0.1549 11 0.3998
0.1690 12 0.3264
0.1831 13 0.4509
0.1972 14 0.4697
0.2113 15 0.3188
0.2254 16 0.5552
0.2394 17 0.3308
0.2535 18 0.4426
0.2676 19 0.3757
0.2817 20 0.2844
0.2958 21 0.3652
0.3099 22 0.341
0.3239 23 0.3956
0.3380 24 0.4095
0.3521 25 0.3498
0.3662 26 0.3957
0.3803 27 0.4788
0.3944 28 0.4238
0.4085 29 0.3866
0.4225 30 0.4671
0.4366 31 0.358
0.4507 32 0.4684
0.4648 33 0.4192
0.4789 34 0.3826
0.4930 35 0.3387
0.5070 36 0.4292
0.5211 37 0.4378
0.5352 38 0.3185
0.5493 39 0.3687
0.5634 40 0.3171
0.5775 41 0.3343
0.5915 42 0.4706
0.6056 43 0.3747
0.6197 44 0.3272
0.6338 45 0.4118
0.6479 46 0.4688
0.6620 47 0.3684
0.6761 48 0.3609
0.6901 49 0.3521
0.7042 50 0.3533
0.7183 51 0.3788
0.7324 52 0.3182
0.7465 53 0.5793
0.7606 54 0.2803
0.7746 55 0.2695
0.7887 56 0.2853
0.8028 57 0.3116
0.8169 58 0.3542
0.8310 59 0.3445
0.8451 60 0.2799
0.8592 61 0.3178
0.8732 62 0.4737
0.8873 63 0.2121
0.9014 64 0.2585
0.9155 65 0.3238
0.9296 66 0.3203
0.9437 67 0.4475
0.9577 68 0.3722
0.9718 69 0.4047
0.9859 70 0.3056
1.0 71 0.316
1.0141 72 0.2711
1.0282 73 0.3488
1.0423 74 0.2413
1.0563 75 0.2434
1.0704 76 0.2602
1.0845 77 0.3006
1.0986 78 0.237
1.1127 79 0.2614
1.1268 80 0.2456
1.1408 81 0.2305
1.1549 82 0.2774
1.1690 83 0.3028
1.1831 84 0.2037
1.1972 85 0.2905
1.2113 86 0.2048
1.2254 87 0.2459
1.2394 88 0.2291
1.2535 89 0.2319
1.2676 90 0.2755
1.2817 91 0.3138
1.2958 92 0.3555
1.3099 93 0.2908
1.3239 94 0.2602
1.3380 95 0.2615
1.3521 96 0.2041
1.3662 97 0.2629
1.3803 98 0.2508
1.3944 99 0.248
1.4085 100 0.2601
1.4225 101 0.3114
1.4366 102 0.3201
1.4507 103 0.2574
1.4648 104 0.2371
1.4789 105 0.2041
1.4930 106 0.2454
1.5070 107 0.3303
1.5211 108 0.29
1.5352 109 0.3327
1.5493 110 0.2741
1.5634 111 0.258
1.5775 112 0.3228
1.5915 113 0.2989
1.6056 114 0.2769
1.6197 115 0.3744
1.6338 116 0.3053
1.6479 117 0.1675
1.6620 118 0.2337
1.6761 119 0.2505
1.6901 120 0.2304
1.7042 121 0.2369
1.7183 122 0.1978
1.7324 123 0.1929
1.7465 124 0.2212
1.7606 125 0.2175
1.7746 126 0.1839
1.7887 127 0.3059
1.8028 128 0.1996
1.8169 129 0.3
1.8310 130 0.3051
1.8451 131 0.2272
1.8592 132 0.2503
1.8732 133 0.3077
1.8873 134 0.1847
1.9014 135 0.2437
1.9155 136 0.2333
1.9296 137 0.2111
1.9437 138 0.162
1.9577 139 0.4412
1.9718 140 0.1282
1.9859 141 0.2651
2.0 142 0.1055
2.0141 143 0.2316
2.0282 144 0.243
2.0423 145 0.1892
2.0563 146 0.19
2.0704 147 0.172
2.0845 148 0.185
2.0986 149 0.2481
2.1127 150 0.2651
2.1268 151 0.2511
2.1408 152 0.1761
2.1549 153 0.2215
2.1690 154 0.2275
2.1831 155 0.2621
2.1972 156 0.2255
2.2113 157 0.201
2.2254 158 0.1372
2.2394 159 0.1941
2.2535 160 0.2225
2.2676 161 0.1713
2.2817 162 0.1045
2.2958 163 0.2273
2.3099 164 0.2474
2.3239 165 0.312
2.3380 166 0.2274
2.3521 167 0.1991
2.3662 168 0.1511
2.3803 169 0.2248
2.3944 170 0.2025
2.4085 171 0.258
2.4225 172 0.2163
2.4366 173 0.4012
2.4507 174 0.2397
2.4648 175 0.1978
2.4789 176 0.2071
2.4930 177 0.147
2.5070 178 0.2424
2.5211 179 0.1345
2.5352 180 0.2506
2.5493 181 0.1275
2.5634 182 0.3284
2.5775 183 0.2063
2.5915 184 0.1483
2.6056 185 0.2051
2.6197 186 0.2439
2.6338 187 0.252
2.6479 188 0.2126
2.6620 189 0.2156
2.6761 190 0.153
2.6901 191 0.2481
2.7042 192 0.2481
2.7183 193 0.1539
2.7324 194 0.1224
2.7465 195 0.1924
2.7606 196 0.196
2.7746 197 0.2172
2.7887 198 0.1999
2.8028 199 0.1932
2.8169 200 0.1758
2.8310 201 0.2173
2.8451 202 0.1792
2.8592 203 0.2228
2.8732 204 0.2013
2.8873 205 0.2197
2.9014 206 0.1942
2.9155 207 0.1798
2.9296 208 0.2064
2.9437 209 0.2901
2.9577 210 0.202
2.9718 211 0.1809
2.9859 212 0.176
3.0 213 0.1733

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 5.1.1
  • Transformers: 4.56.2
  • PyTorch: 2.8.0+cu128
  • Accelerate: 1.10.1
  • Datasets: 4.1.1
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}