SentenceTransformer based on sentence-transformers/all-MiniLM-L12-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L12-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L12-v2
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
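
The three modules above correspond to a standard mean-pooling pipeline. Below is a minimal sketch of what they compute, written directly against the Transformers API (it assumes the repository exposes the usual config and tokenizer files, as Sentence Transformers checkpoints do; the encode() call in the Usage section is the supported path).

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

repo = "hanwenzhu/all-MiniLM-L12-v2-lr2e-4-bs256-nneg3-ml-ne5-apr25"
tokenizer = AutoTokenizer.from_pretrained(repo)
bert = AutoModel.from_pretrained(repo)

sentences = ["List.count_not_add_count", "lie_zsmul"]

# (0) Transformer: tokenize (max_seq_length = 128) and run the BertModel
batch = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors="pt")
with torch.no_grad():
    token_embeddings = bert(**batch).last_hidden_state  # [batch, seq_len, 384]

# (1) Pooling: mean over non-padding tokens (pooling_mode_mean_tokens=True)
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# (2) Normalize: unit-length vectors, so dot products equal cosine similarities
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # torch.Size([2, 384])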

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("hanwenzhu/all-MiniLM-L12-v2-lr2e-4-bs256-nneg3-ml-ne5-apr25")
# Run inference
sentences = [
    'Mathlib.Data.Bool.Count#6',
    'List.count_not_add_count',
    'lie_zsmul',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
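
Because the embeddings are unit-normalized, cosine similarity can be used directly for retrieval. The sketch below ranks a few candidate premise names against a proof-state identifier; the strings are taken from the examples in this card and are illustrative only.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("hanwenzhu/all-MiniLM-L12-v2-lr2e-4-bs256-nneg3-ml-ne5-apr25")

state = "Mathlib.Data.Bool.Count#6"
premises = ["List.count_not_add_count", "lie_zsmul", "Nat.le_refl"]

state_embedding = model.encode([state])
premise_embeddings = model.encode(premises)

# model.similarity returns a [1, len(premises)] tensor of cosine scores
scores = model.similarity(state_embedding, premise_embeddings)[0]
for premise, score in sorted(zip(premises, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {premise}")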

Training Details

Training Dataset

Unnamed Dataset

  • Size: 8,429,545 training samples
  • Columns: state_name and premise_name
  • Approximate statistics based on the first 1000 samples:
    |         | state_name                                         | premise_name                                      |
    |---------|----------------------------------------------------|---------------------------------------------------|
    | type    | string                                             | string                                            |
    | details | min: 10 tokens, mean: 15.51 tokens, max: 22 tokens | min: 3 tokens, mean: 11.11 tokens, max: 40 tokens |
  • Samples:
    | state_name                         | premise_name            |
    |------------------------------------|-------------------------|
    | Mathlib.Algebra.Colimit.Module#111 | DirectSum.induction_on  |
    | Mathlib.Algebra.Colimit.Module#111 | map_add                 |
    | Mathlib.Algebra.Colimit.Module#111 | AddMonoidHom.comp_assoc |
  • Loss: loss.MaskedCachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
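
The masked cached loss is custom to this training setup, but the parameters above configure the standard multiple negatives ranking (in-batch negatives) objective: cosine similarities scaled by 20.0 and fed to a cross-entropy over the batch. Below is a minimal sketch of that core objective, without the caching or masking.

import torch
import torch.nn.functional as F

def mnr_loss(state_embs: torch.Tensor, premise_embs: torch.Tensor, scale: float = 20.0) -> torch.Tensor:
    # Cosine similarity of every state against every premise in the batch
    sims = F.cosine_similarity(state_embs.unsqueeze(1), premise_embs.unsqueeze(0), dim=-1) * scale
    # The matching premise for state i is premise i; all other premises act as in-batch negatives
    labels = torch.arange(sims.size(0))
    return F.cross_entropy(sims, labels)

# Toy usage with random 384-dimensional embeddings
print(mnr_loss(torch.randn(8, 384), torch.randn(8, 384)).item())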
    

Evaluation Dataset

Unnamed Dataset

  • Size: 2,120 evaluation samples
  • Columns: state_name and premise_name
  • Approximate statistics based on the first 1000 samples:
    |         | state_name                                         | premise_name                                      |
    |---------|----------------------------------------------------|---------------------------------------------------|
    | type    | string                                             | string                                            |
    | details | min: 10 tokens, mean: 16.26 tokens, max: 26 tokens | min: 3 tokens, mean: 11.83 tokens, max: 33 tokens |
  • Samples:
    | state_name                            | premise_name            |
    |---------------------------------------|-------------------------|
    | Batteries.Control.ForInStep.Lemmas#10 | ForInStep.done_bindList |
    | Batteries.Data.ByteArray#12           | Fin.val_lt_of_le        |
    | Batteries.Data.ByteArray#12           | Nat.le_refl             |
  • Loss: loss.MaskedCachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 64
  • learning_rate: 0.0002
  • num_train_epochs: 5.0
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.03
  • bf16: True
  • dataloader_num_workers: 4
  • resume_from_checkpoint: True
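
A hedged sketch of how these non-default values map onto the Sentence Transformers v3 training API follows. The actual run used the custom loss.MaskedCachedMultipleNegativesRankingLoss; the library's CachedMultipleNegativesRankingLoss stands in for it here, and the toy datasets reuse the sample pairs shown above (the duplicate state in the toy training set is the kind of in-batch false negative the masked variant presumably handles).

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")

# Toy (state_name, premise_name) pairs standing in for the real 8.4M-pair dataset
train_dataset = Dataset.from_dict({
    "state_name": ["Mathlib.Algebra.Colimit.Module#111", "Mathlib.Algebra.Colimit.Module#111"],
    "premise_name": ["DirectSum.induction_on", "map_add"],
})
eval_dataset = Dataset.from_dict({
    "state_name": ["Batteries.Control.ForInStep.Lemmas#10"],
    "premise_name": ["ForInStep.done_bindList"],
})

# In-batch-negatives loss with the scale / cosine-similarity settings listed above
loss = CachedMultipleNegativesRankingLoss(model, scale=20.0)

args = SentenceTransformerTrainingArguments(
    output_dir="all-MiniLM-L12-v2-premise-retrieval",  # hypothetical output path
    eval_strategy="steps",
    per_device_train_batch_size=256,
    per_device_eval_batch_size=64,
    learning_rate=2e-4,
    num_train_epochs=5.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    bf16=True,
    dataloader_num_workers=4,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()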

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 0.0002
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5.0
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.03
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 4
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: True
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
4.9520 163060 0.6739
4.9523 163070 0.7073
4.9526 163080 0.7318
4.9529 163090 0.6999
4.9532 163100 0.6871
4.9535 163110 0.6941
4.9538 163120 0.7311
4.9541 163130 0.6666
4.9544 163140 0.6841
4.9547 163150 0.7188
4.9551 163160 0.7337
4.9554 163170 0.6917
4.9557 163180 0.6745
4.9560 163190 0.7139
4.9563 163200 0.696
4.9566 163210 0.7142
4.9569 163220 0.6719
4.9572 163230 0.6492
4.9575 163240 0.7019
4.9578 163250 0.701
4.9581 163260 0.7217
4.9584 163270 0.6953
4.9587 163280 0.6928
4.9590 163290 0.6868
4.9593 163300 0.6912
4.9596 163310 0.7042
4.9599 163320 0.6771
4.9602 163330 0.7192
4.9605 163340 0.6948
4.9608 163350 0.7118
4.9611 163360 0.6937
4.9614 163370 0.6885
4.9617 163380 0.6518
4.9620 163390 0.7212
4.9623 163400 0.7011
4.9626 163410 0.6819
4.9629 163420 0.68
4.9633 163430 0.6884
4.9636 163440 0.7004
4.9639 163450 0.6905
4.9642 163460 0.7149
4.9645 163470 0.7228
4.9648 163480 0.7009
4.9651 163490 0.7261
4.9654 163500 0.687
4.9657 163510 0.6717
4.9660 163520 0.7126
4.9663 163530 0.7223
4.9666 163540 0.7014
4.9669 163550 0.6969
4.9672 163560 0.7203
4.9675 163570 0.7086
4.9678 163580 0.6947
4.9681 163590 0.7196
4.9684 163600 0.6756
4.9687 163610 0.6892
4.9690 163620 0.719
4.9693 163630 0.7274
4.9696 163640 0.6894
4.9699 163650 0.7596
4.9702 163660 0.6815
4.9705 163670 0.6792
4.9708 163680 0.658
4.9711 163690 0.6973
4.9715 163700 0.6555
4.9718 163710 0.7155
4.9721 163720 0.6896
4.9724 163730 0.6631
4.9727 163740 0.6781
4.9730 163750 0.7014
4.9733 163760 0.6866
4.9736 163770 0.7077
4.9739 163780 0.6985
4.9742 163790 0.6926
4.9745 163800 0.7179
4.9748 163810 0.706
4.9751 163820 0.7228
4.9754 163830 0.7007
4.9757 163840 0.6748
4.9760 163850 0.7414
4.9763 163860 0.6943
4.9766 163870 0.7068
4.9769 163880 0.6576
4.9772 163890 0.6958
4.9775 163900 0.7205
4.9778 163910 0.7117
4.9781 163920 0.6775
4.9784 163930 0.655
4.9787 163940 0.698
4.9790 163950 0.6913
4.9793 163960 0.6906
4.9797 163970 0.662
4.9800 163980 0.6731
4.9803 163990 0.6722
4.9806 164000 0.7155
4.9809 164010 0.692
4.9812 164020 0.6726
4.9815 164030 0.7109
4.9818 164040 0.6764
4.9821 164050 0.6889
4.9824 164060 0.6978
4.9827 164070 0.7357
4.9830 164080 0.6892
4.9833 164090 0.6848
4.9836 164100 0.6877
4.9839 164110 0.7118
4.9842 164120 0.6916
4.9845 164130 0.6752
4.9848 164140 0.7099
4.9851 164150 0.6937
4.9854 164160 0.7149
4.9857 164170 0.6705
4.9860 164180 0.6962
4.9863 164190 0.7078
4.9866 164200 0.7003
4.9869 164210 0.6927
4.9872 164220 0.7375
4.9875 164230 0.7055
4.9879 164240 0.6788
4.9882 164250 0.6631
4.9885 164260 0.7268
4.9888 164270 0.6968
4.9891 164280 0.6878
4.9894 164290 0.7003
4.9897 164300 0.6862
4.9900 164310 0.7128
4.9903 164320 0.6515
4.9906 164330 0.7074
4.9909 164340 0.706
4.9912 164350 0.6826
4.9915 164360 0.6824
4.9918 164370 0.7031
4.9921 164380 0.7036
4.9924 164390 0.7109
4.9927 164400 0.7091
4.9930 164410 0.6946
4.9933 164420 0.6801
4.9936 164430 0.7044
4.9939 164440 0.7027
4.9942 164450 0.6749
4.9945 164460 0.6933
4.9948 164470 0.709
4.9951 164480 0.6765
4.9954 164490 0.7224
4.9957 164500 0.7002
4.9961 164510 0.7148
4.9964 164520 0.7119
4.9967 164530 0.6932
4.9970 164540 0.7499
4.9973 164550 0.6967
4.9976 164560 0.6849
4.9979 164570 0.7077
4.9982 164580 0.6726
4.9985 164590 0.6885
4.9988 164600 0.7229
4.9991 164610 0.6601
4.9994 164620 0.6994
4.9997 164630 0.6934
5.0 164640 0.6601

Framework Versions

  • Python: 3.11.8
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.1
  • PyTorch: 2.5.1.post302
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.20.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MaskedCachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}