SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
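
The three modules map to: (0) the MPNet encoder with a 384-token cap, (1) attention-mask-aware mean pooling, and (2) L2-normalization, so dot products between outputs equal cosine similarities. Below is a minimal sketch reproducing this pipeline with plain transformers, assuming the checkpoint ID noystl/recomb-pred-all-mpnet-base used elsewhere in this card:

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

model_id = "noystl/recomb-pred-all-mpnet-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)

def embed(texts):
    # (0) Transformer: tokenize with max_seq_length=384 and encode with MPNet
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=384, return_tensors="pt")
    with torch.no_grad():
        tokens = encoder(**batch).last_hidden_state  # (batch, seq_len, 768)
    # (1) Pooling: mean over non-padding tokens only
    mask = batch["attention_mask"].unsqueeze(-1).float()
    pooled = (tokens * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
    # (2) Normalize: unit vectors, so dot product equals cosine similarity
    return F.normalize(pooled, p=2, dim=1)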

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("noystl/recomb-pred-all-mpnet-base")
# Run inference
sentences = [
    "Background: Patients find it difficult to use dexterous prosthetic hands without a suitable control system, highlighting a need for improved grasp performance and ease of operation. Existing methods may not adequately address the challenges faced by users, particularly those with inferior myoelectric signals, in effectively controlling prosthetic devices.\nContribution: Combine 'myoelectric signal' and ",
    'a unified framework for collaborative decoding between large and small language models (Large Language Models and small language models)',
    'joint biomedical entity linking and event extraction',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
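
Because the embeddings are unit-normalized, the similarity scores can be used directly to rank candidate contributions against a query. Continuing the snippet above with illustrative candidates taken from the training samples shown later in this card (the abridged query string is an assumption, not a released artifact):

# Rank candidate recombination answers for one training-style query
query = ("Background: The study addresses the challenge of action segmentation "
         "under weak supervision. Contribution: Combine 'a Hidden Markov Model' and")
candidates = [
    "a multilayer perceptron",
    "synthetic occlusion augmentation during training",
    "robustness of deep learning methods",
]
q_emb = model.encode([query])
c_emb = model.encode(candidates)
scores = model.similarity(q_emb, c_emb)[0]  # cosine similarities, shape [3]
for cand, score in sorted(zip(candidates, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {cand}")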

Training Details

Training Dataset

Unnamed Dataset

  • Size: 784,827 training samples
  • Columns: query, answer, and label
  • Approximate statistics based on the first 1000 samples:
    |         | query                                              | answer                                           | label                 |
    |---------|----------------------------------------------------|--------------------------------------------------|-----------------------|
    | type    | string                                             | string                                           | int                   |
    | details | min: 60 tokens, mean: 77.86 tokens, max: 93 tokens | min: 3 tokens, mean: 8.82 tokens, max: 64 tokens | 0: ~96.70%, 1: ~3.30% |
  • Samples (all three share the same query):

    Query: Background: The study addresses the challenge of action segmentation under weak supervision, where the available ground truth only indicates the presence of actions without providing their temporal ordering or occurrence timing in training videos. This limitation necessitates the development of a method to generate pseudo-ground truth for effective training and improve performance in action segmentation and alignment tasks.
    Contribution: Combine 'a Hidden Markov Model' and

    | answer                                           | label |
    |--------------------------------------------------|-------|
    | a multilayer perceptron                          | 1     |
    | synthetic occlusion augmentation during training | 0     |
    | robustness of deep learning methods              | 0     |
  • Loss: ContrastiveLoss with these parameters:
    {
        "distance_metric": "SiameseDistanceMetric.COSINE_DISTANCE",
        "margin": 0.5,
        "size_average": true
    }
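
A minimal sketch of how rows like the samples above would feed this loss in sentence-transformers (the Dataset assembly is hypothetical and the query string abridged; only the loss parameters are taken from this card):

from datasets import Dataset
from sentence_transformers import SentenceTransformer, losses
from sentence_transformers.losses import SiameseDistanceMetric

# Hypothetical reconstruction of three training rows from the samples above
query = ("Background: The study addresses the challenge of action segmentation "
         "under weak supervision. Contribution: Combine 'a Hidden Markov Model' and")
train_dataset = Dataset.from_dict({
    "query": [query] * 3,
    "answer": [
        "a multilayer perceptron",
        "synthetic occlusion augmentation during training",
        "robustness of deep learning methods",
    ],
    "label": [1, 0, 0],
})

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
# Label-1 pairs are pulled together; label-0 pairs are pushed apart until
# their cosine distance exceeds the 0.5 margin.
loss = losses.ContrastiveLoss(
    model=model,
    distance_metric=SiameseDistanceMetric.COSINE_DISTANCE,
    margin=0.5,
    size_average=True,
)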
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 64
  • learning_rate: 1.9218937402834593e-05
  • num_train_epochs: 2
  • warmup_ratio: 0.08278167292320517
  • bf16: True
  • batch_sampler: no_duplicates
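
Expressed as sentence-transformers training arguments (a sketch: output_dir is a hypothetical path; the remaining values are copied from the list above), these would be passed to a SentenceTransformerTrainer together with the dataset and loss sketched earlier:

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output/recomb-pred-all-mpnet-base",  # hypothetical path
    per_device_train_batch_size=64,
    learning_rate=1.9218937402834593e-05,
    num_train_epochs=2,
    warmup_ratio=0.08278167292320517,
    bf16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)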

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1.9218937402834593e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.08278167292320517
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0.0082 100 0.0104
0.0163 200 0.0068
0.0245 300 0.005
0.0326 400 0.0041
0.0408 500 0.0054
0.0489 600 0.004
0.0571 700 0.0037
0.0652 800 0.0037
0.0734 900 0.0049
0.0815 1000 0.0038
0.0897 1100 0.004
0.0979 1200 0.0037
0.1060 1300 0.004
0.1142 1400 0.0049
0.1223 1500 0.0038
0.1305 1600 0.0036
0.1386 1700 0.0037
0.1468 1800 0.0045
0.1549 1900 0.0038
0.1631 2000 0.0034
0.1712 2100 0.0034
0.1794 2200 0.0035
0.1876 2300 0.0045
0.1957 2400 0.0036
0.2039 2500 0.0036
0.2120 2600 0.0033
0.2202 2700 0.004
0.2283 2800 0.0036
0.2365 2900 0.0033
0.2446 3000 0.0033
0.2528 3100 0.0037
0.2609 3200 0.0038
0.2691 3300 0.0033
0.2773 3400 0.0034
0.2854 3500 0.0033
0.2936 3600 0.0041
0.3017 3700 0.0033
0.3099 3800 0.0033
0.3180 3900 0.0032
0.3262 4000 0.004
0.3343 4100 0.0035
0.3425 4200 0.0031
0.3506 4300 0.0033
0.3588 4400 0.0033
0.3670 4500 0.0039
0.3751 4600 0.0032
0.3833 4700 0.0034
0.3914 4800 0.0031
0.3996 4900 0.004
0.4077 5000 0.0032
0.4159 5100 0.0031
0.4240 5200 0.0031
0.4322 5300 0.0032
0.4403 5400 0.0039
0.4485 5500 0.0031
0.4567 5600 0.003
0.4648 5700 0.0032
0.4730 5800 0.0038
0.4811 5900 0.0033
0.4893 6000 0.0031
0.4974 6100 0.0032
0.5056 6200 0.0033
0.5137 6300 0.0033
0.5219 6400 0.0032
0.5300 6500 0.0031
0.5382 6600 0.0032
0.5464 6700 0.0038
0.5545 6800 0.003
0.5627 6900 0.003
0.5708 7000 0.0029
0.5790 7100 0.0038
0.5871 7200 0.0032
0.5953 7300 0.0031
0.6034 7400 0.003
0.6116 7500 0.003
0.6198 7600 0.0039
0.6279 7700 0.0031
0.6361 7800 0.0031
0.6442 7900 0.0031
0.6524 8000 0.0039
0.6605 8100 0.003
0.6687 8200 0.003
0.6768 8300 0.003
0.6850 8400 0.0028
0.6931 8500 0.0035
0.7013 8600 0.0031
0.7095 8700 0.003
0.7176 8800 0.0026
0.7258 8900 0.0034
0.7339 9000 0.0033
0.7421 9100 0.003
0.7502 9200 0.0027
0.7584 9300 0.0029
0.7665 9400 0.0034
0.7747 9500 0.0029
0.7828 9600 0.0028
0.7910 9700 0.0027
0.7992 9800 0.0033
0.8073 9900 0.0031
0.8155 10000 0.0029
0.8236 10100 0.0028
0.8318 10200 0.0031
0.8399 10300 0.0031
0.8481 10400 0.003
0.8562 10500 0.0029
0.8644 10600 0.0028
0.8725 10700 0.0033
0.8807 10800 0.003
0.8889 10900 0.0029
0.8970 11000 0.0027
0.9052 11100 0.0033
0.9133 11200 0.0029
0.9215 11300 0.0029
0.9296 11400 0.0029
0.9378 11500 0.003
0.9459 11600 0.0034
0.9541 11700 0.0031
0.9622 11800 0.0027
0.9704 11900 0.0029
0.9786 12000 0.0034
0.9867 12100 0.0032
0.9949 12200 0.003
1.0030 12300 0.0032
1.0112 12400 0.0028
1.0193 12500 0.003
1.0275 12600 0.0027
1.0356 12700 0.0034
1.0438 12800 0.0029
1.0519 12900 0.0025
1.0601 13000 0.0028
1.0683 13100 0.0026
1.0764 13200 0.0035
1.0846 13300 0.0026
1.0927 13400 0.0028
1.1009 13500 0.0026
1.1090 13600 0.0034
1.1172 13700 0.0028
1.1253 13800 0.0027
1.1335 13900 0.0026
1.1416 14000 0.0031
1.1498 14100 0.0025
1.1580 14200 0.0025
1.1661 14300 0.0025
1.1743 14400 0.0024
1.1824 14500 0.0031
1.1906 14600 0.0025
1.1987 14700 0.0024
1.2069 14800 0.0025
1.2150 14900 0.0029
1.2232 15000 0.0025
1.2313 15100 0.0025
1.2395 15200 0.0023
1.2477 15300 0.0024
1.2558 15400 0.0029
1.2640 15500 0.0023
1.2721 15600 0.0023
1.2803 15700 0.0023
1.2884 15800 0.0032
1.2966 15900 0.0023
1.3047 16000 0.0023
1.3129 16100 0.0024
1.3210 16200 0.0025
1.3292 16300 0.0028
1.3374 16400 0.0023
1.3455 16500 0.0021
1.3537 16600 0.0023
1.3618 16700 0.0029
1.3700 16800 0.0023
1.3781 16900 0.0023
1.3863 17000 0.0025
1.3944 17100 0.0028
1.4026 17200 0.0023
1.4107 17300 0.0023
1.4189 17400 0.0023
1.4271 17500 0.0023
1.4352 17600 0.0029
1.4434 17700 0.0022
1.4515 17800 0.0022
1.4597 17900 0.0023
1.4678 18000 0.0026
1.4760 18100 0.0024
1.4841 18200 0.0023
1.4923 18300 0.0024
1.5004 18400 0.0024
1.5086 18500 0.0026
1.5168 18600 0.0022
1.5249 18700 0.0023
1.5331 18800 0.0023
1.5412 18900 0.003
1.5494 19000 0.002
1.5575 19100 0.0022
1.5657 19200 0.0023
1.5738 19300 0.0023
1.5820 19400 0.0028
1.5901 19500 0.0022
1.5983 19600 0.0023
1.6065 19700 0.0022
1.6146 19800 0.0028
1.6228 19900 0.0022
1.6309 20000 0.0023
1.6391 20100 0.0025
1.6472 20200 0.0028
1.6554 20300 0.0023
1.6635 20400 0.0021
1.6717 20500 0.0022
1.6798 20600 0.0022
1.6880 20700 0.0025
1.6962 20800 0.0024
1.7043 20900 0.0023
1.7125 21000 0.0021
1.7206 21100 0.0024
1.7288 21200 0.0024
1.7369 21300 0.0023
1.7451 21400 0.0022
1.7532 21500 0.0021
1.7614 21600 0.0025
1.7696 21700 0.0023
1.7777 21800 0.002
1.7859 21900 0.0022
1.7940 22000 0.0025
1.8022 22100 0.0022
1.8103 22200 0.0023
1.8185 22300 0.0022
1.8266 22400 0.0021
1.8348 22500 0.0025
1.8429 22600 0.0025
1.8511 22700 0.0022
1.8593 22800 0.0023
1.8674 22900 0.0026
1.8756 23000 0.0022
1.8837 23100 0.0022
1.8919 23200 0.0022
1.9000 23300 0.0024
1.9082 23400 0.0022
1.9163 23500 0.0022
1.9245 23600 0.0023
1.9326 23700 0.0023
1.9408 23800 0.0027
1.9490 23900 0.0023
1.9571 24000 0.0023
1.9653 24100 0.0022
1.9734 24200 0.0027
1.9816 24300 0.0025
1.9897 24400 0.0023
1.9979 24500 0.0025

Framework Versions

  • Python: 3.11.2
  • Sentence Transformers: 3.3.1
  • Transformers: 4.49.0
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.0.1
  • Datasets: 3.1.0
  • Tokenizers: 0.21.0

Citation

BibTeX

@misc{sternlicht2025chimeraknowledgebaseidea,
      title={CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature}, 
      author={Noy Sternlicht and Tom Hope},
      year={2025},
      eprint={2505.20779},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.20779}, 
}

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ContrastiveLoss

@inproceedings{hadsell2006dimensionality,
    author={Hadsell, R. and Chopra, S. and LeCun, Y.},
    booktitle={2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)},
    title={Dimensionality Reduction by Learning an Invariant Mapping},
    year={2006},
    volume={2},
    number={},
    pages={1735-1742},
    doi={10.1109/CVPR.2006.100}
}
