SentenceTransformer based on sentence-transformers/all-MiniLM-L12-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L12-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L12-v2
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
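
The three modules above correspond to a standard mean-pooling pipeline. Below is a minimal sketch of what they compute, written directly against the Transformers API (it assumes the repository exposes the usual config and tokenizer files, as Sentence Transformers checkpoints do; the encode() call in the Usage section is the supported path).

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

repo = "hanwenzhu/all-MiniLM-L12-v2-lr2e-4-bs256-nneg3-ml-ne5-apr25"
tokenizer = AutoTokenizer.from_pretrained(repo)
bert = AutoModel.from_pretrained(repo)

sentences = ["List.count_not_add_count", "lie_zsmul"]

# (0) Transformer: tokenize (max_seq_length = 128) and run the BertModel
batch = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors="pt")
with torch.no_grad():
    token_embeddings = bert(**batch).last_hidden_state  # [batch, seq_len, 384]

# (1) Pooling: mean over non-padding tokens (pooling_mode_mean_tokens=True)
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# (2) Normalize: unit-length vectors, so dot products equal cosine similarities
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # torch.Size([2, 384])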

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("hanwenzhu/all-MiniLM-L12-v2-lr2e-4-bs256-nneg3-ml-ne5-apr25")
# Run inference
sentences = [
    'Mathlib.Data.Bool.Count#6',
    'List.count_not_add_count',
    'lie_zsmul',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
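
Because the embeddings are unit-normalized, cosine similarity can be used directly for retrieval. The sketch below ranks a few candidate premise names against a proof-state identifier; the strings are taken from the examples in this card and are illustrative only.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("hanwenzhu/all-MiniLM-L12-v2-lr2e-4-bs256-nneg3-ml-ne5-apr25")

state = "Mathlib.Data.Bool.Count#6"
premises = ["List.count_not_add_count", "lie_zsmul", "Nat.le_refl"]

state_embedding = model.encode([state])
premise_embeddings = model.encode(premises)

# model.similarity returns a [1, len(premises)] tensor of cosine scores
scores = model.similarity(state_embedding, premise_embeddings)[0]
for premise, score in sorted(zip(premises, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {premise}")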

Training Details

Training Dataset

Unnamed Dataset

  • Size: 8,429,545 training samples
  • Columns: state_name and premise_name
  • Approximate statistics based on the first 1000 samples:
    |         | state_name                                         | premise_name                                      |
    |---------|----------------------------------------------------|---------------------------------------------------|
    | type    | string                                             | string                                            |
    | details | min: 10 tokens, mean: 15.51 tokens, max: 22 tokens | min: 3 tokens, mean: 11.11 tokens, max: 40 tokens |
  • Samples:
    | state_name                         | premise_name            |
    |------------------------------------|-------------------------|
    | Mathlib.Algebra.Colimit.Module#111 | DirectSum.induction_on  |
    | Mathlib.Algebra.Colimit.Module#111 | map_add                 |
    | Mathlib.Algebra.Colimit.Module#111 | AddMonoidHom.comp_assoc |
  • Loss: loss.MaskedCachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
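
The masked cached loss is custom to this training setup, but the parameters above configure the standard multiple negatives ranking (in-batch negatives) objective: cosine similarities scaled by 20.0 and fed to a cross-entropy over the batch. Below is a minimal sketch of that core objective, without the caching or masking.

import torch
import torch.nn.functional as F

def mnr_loss(state_embs: torch.Tensor, premise_embs: torch.Tensor, scale: float = 20.0) -> torch.Tensor:
    # Cosine similarity of every state against every premise in the batch
    sims = F.cosine_similarity(state_embs.unsqueeze(1), premise_embs.unsqueeze(0), dim=-1) * scale
    # The matching premise for state i is premise i; all other premises act as in-batch negatives
    labels = torch.arange(sims.size(0))
    return F.cross_entropy(sims, labels)

# Toy usage with random 384-dimensional embeddings
print(mnr_loss(torch.randn(8, 384), torch.randn(8, 384)).item())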
    

Evaluation Dataset

Unnamed Dataset

  • Size: 2,120 evaluation samples
  • Columns: state_name and premise_name
  • Approximate statistics based on the first 1000 samples:
    |         | state_name                                         | premise_name                                      |
    |---------|----------------------------------------------------|---------------------------------------------------|
    | type    | string                                             | string                                            |
    | details | min: 10 tokens, mean: 16.26 tokens, max: 26 tokens | min: 3 tokens, mean: 11.83 tokens, max: 33 tokens |
  • Samples:
    | state_name                            | premise_name            |
    |---------------------------------------|-------------------------|
    | Batteries.Control.ForInStep.Lemmas#10 | ForInStep.done_bindList |
    | Batteries.Data.ByteArray#12           | Fin.val_lt_of_le        |
    | Batteries.Data.ByteArray#12           | Nat.le_refl             |
  • Loss: loss.MaskedCachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 64
  • learning_rate: 0.0002
  • num_train_epochs: 5.0
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.03
  • bf16: True
  • dataloader_num_workers: 4
  • resume_from_checkpoint: True
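
A hedged sketch of how these non-default values map onto the Sentence Transformers v3 training API follows. The actual run used the custom loss.MaskedCachedMultipleNegativesRankingLoss; the library's CachedMultipleNegativesRankingLoss stands in for it here, and the toy datasets reuse the sample pairs shown above (the duplicate state in the toy training set is the kind of in-batch false negative the masked variant presumably handles).

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")

# Toy (state_name, premise_name) pairs standing in for the real 8.4M-pair dataset
train_dataset = Dataset.from_dict({
    "state_name": ["Mathlib.Algebra.Colimit.Module#111", "Mathlib.Algebra.Colimit.Module#111"],
    "premise_name": ["DirectSum.induction_on", "map_add"],
})
eval_dataset = Dataset.from_dict({
    "state_name": ["Batteries.Control.ForInStep.Lemmas#10"],
    "premise_name": ["ForInStep.done_bindList"],
})

# In-batch-negatives loss with the scale / cosine-similarity settings listed above
loss = CachedMultipleNegativesRankingLoss(model, scale=20.0)

args = SentenceTransformerTrainingArguments(
    output_dir="all-MiniLM-L12-v2-premise-retrieval",  # hypothetical output path
    eval_strategy="steps",
    per_device_train_batch_size=256,
    per_device_eval_batch_size=64,
    learning_rate=2e-4,
    num_train_epochs=5.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    bf16=True,
    dataloader_num_workers=4,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()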

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 0.0002
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5.0
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.03
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 4
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: True
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
4.9520 163060 0.6739
4.9523 163070 0.7073
4.9526 163080 0.7318
4.9529 163090 0.6999
4.9532 163100 0.6871
4.9535 163110 0.6941
4.9538 163120 0.7311
4.9541 163130 0.6666
4.9544 163140 0.6841
4.9547 163150 0.7188
4.9551 163160 0.7337
4.9554 163170 0.6917
4.9557 163180 0.6745
4.9560 163190 0.7139
4.9563 163200 0.696
4.9566 163210 0.7142
4.9569 163220 0.6719
4.9572 163230 0.6492
4.9575 163240 0.7019
4.9578 163250 0.701
4.9581 163260 0.7217
4.9584 163270 0.6953
4.9587 163280 0.6928
4.9590 163290 0.6868
4.9593 163300 0.6912
4.9596 163310 0.7042
4.9599 163320 0.6771
4.9602 163330 0.7192
4.9605 163340 0.6948
4.9608 163350 0.7118
4.9611 163360 0.6937
4.9614 163370 0.6885
4.9617 163380 0.6518
4.9620 163390 0.7212
4.9623 163400 0.7011
4.9626 163410 0.6819
4.9629 163420 0.68
4.9633 163430 0.6884
4.9636 163440 0.7004
4.9639 163450 0.6905
4.9642 163460 0.7149
4.9645 163470 0.7228
4.9648 163480 0.7009
4.9651 163490 0.7261
4.9654 163500 0.687
4.9657 163510 0.6717
4.9660 163520 0.7126
4.9663 163530 0.7223
4.9666 163540 0.7014
4.9669 163550 0.6969
4.9672 163560 0.7203
4.9675 163570 0.7086
4.9678 163580 0.6947
4.9681 163590 0.7196
4.9684 163600 0.6756
4.9687 163610 0.6892
4.9690 163620 0.719
4.9693 163630 0.7274
4.9696 163640 0.6894
4.9699 163650 0.7596
4.9702 163660 0.6815
4.9705 163670 0.6792
4.9708 163680 0.658
4.9711 163690 0.6973
4.9715 163700 0.6555
4.9718 163710 0.7155
4.9721 163720 0.6896
4.9724 163730 0.6631
4.9727 163740 0.6781
4.9730 163750 0.7014
4.9733 163760 0.6866
4.9736 163770 0.7077
4.9739 163780 0.6985
4.9742 163790 0.6926
4.9745 163800 0.7179
4.9748 163810 0.706
4.9751 163820 0.7228
4.9754 163830 0.7007
4.9757 163840 0.6748
4.9760 163850 0.7414
4.9763 163860 0.6943
4.9766 163870 0.7068
4.9769 163880 0.6576
4.9772 163890 0.6958
4.9775 163900 0.7205
4.9778 163910 0.7117
4.9781 163920 0.6775
4.9784 163930 0.655
4.9787 163940 0.698
4.9790 163950 0.6913
4.9793 163960 0.6906
4.9797 163970 0.662
4.9800 163980 0.6731
4.9803 163990 0.6722
4.9806 164000 0.7155
4.9809 164010 0.692
4.9812 164020 0.6726
4.9815 164030 0.7109
4.9818 164040 0.6764
4.9821 164050 0.6889
4.9824 164060 0.6978
4.9827 164070 0.7357
4.9830 164080 0.6892
4.9833 164090 0.6848
4.9836 164100 0.6877
4.9839 164110 0.7118
4.9842 164120 0.6916
4.9845 164130 0.6752
4.9848 164140 0.7099
4.9851 164150 0.6937
4.9854 164160 0.7149
4.9857 164170 0.6705
4.9860 164180 0.6962
4.9863 164190 0.7078
4.9866 164200 0.7003
4.9869 164210 0.6927
4.9872 164220 0.7375
4.9875 164230 0.7055
4.9879 164240 0.6788
4.9882 164250 0.6631
4.9885 164260 0.7268
4.9888 164270 0.6968
4.9891 164280 0.6878
4.9894 164290 0.7003
4.9897 164300 0.6862
4.9900 164310 0.7128
4.9903 164320 0.6515
4.9906 164330 0.7074
4.9909 164340 0.706
4.9912 164350 0.6826
4.9915 164360 0.6824
4.9918 164370 0.7031
4.9921 164380 0.7036
4.9924 164390 0.7109
4.9927 164400 0.7091
4.9930 164410 0.6946
4.9933 164420 0.6801
4.9936 164430 0.7044
4.9939 164440 0.7027
4.9942 164450 0.6749
4.9945 164460 0.6933
4.9948 164470 0.709
4.9951 164480 0.6765
4.9954 164490 0.7224
4.9957 164500 0.7002
4.9961 164510 0.7148
4.9964 164520 0.7119
4.9967 164530 0.6932
4.9970 164540 0.7499
4.9973 164550 0.6967
4.9976 164560 0.6849
4.9979 164570 0.7077
4.9982 164580 0.6726
4.9985 164590 0.6885
4.9988 164600 0.7229
4.9991 164610 0.6601
4.9994 164620 0.6994
4.9997 164630 0.6934
5.0 164640 0.6601

Framework Versions

  • Python: 3.11.8
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.1
  • PyTorch: 2.5.1.post302
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.20.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MaskedCachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}