ModernBERT Embed base fitness health Matryoshka

This is a sentence-transformers model fine-tuned from kokojake/modernbert-embed-base-fitness-health-matryoshka on the json dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
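
The architecture above chains a Transformer encoder, mean pooling over the token embeddings, and L2 normalization. A minimal pure-Python sketch of the last two stages follows, assuming toy token vectors and an attention mask. The helper names `mean_pool` and `l2_normalize` are illustrative and not part of the library.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask value is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vector):
    """Scale a vector to unit length, mirroring what a Normalize step does."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy input: three real tokens and one padding token, 4 dimensions each
# (the real model produces 768 dimensions per token).
tokens = [
    [1.0, 0.0, 2.0, 0.0],
    [3.0, 0.0, 0.0, 0.0],
    [2.0, 0.0, 1.0, 0.0],
    [9.0, 9.0, 9.0, 9.0],  # padding position, masked out below
]
mask = [1, 1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_embedding = l2_normalize(pooled)
print(pooled)
print(sentence_embedding)
```

Because the final vector has unit length, the cosine similarity between two such embeddings reduces to a dot product.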

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("kokojake/modernbert-embed-base-fitness-health-matryoshka-8-epochs")
# Run inference
sentences = [
    'With this policy paper we offer our teams and partners the basis for understanding what we \nbelieve physical and functional rehabilitation covers today. While we can look back and see how far we have come, we are also aware of the changes we have yet to go through. Hence it \nhighlights the directions we will be taking in the coming years, for example the user-centred approach, the quality and sustainability of services in developing countries, improving the \nprofessional training process, and connecting with user groups.\nWe have a small, highly-motivated and ambitious team that, with limited resources (always too limited!), manages daily tours-de-force to improve our practices, capitalise, train, \ninnovate, structure and improve the position of physical and functional rehabilitation in the world. This document follows naturally from that, highlighting how the medical field plays an \nessential role in enhancing the social participation of people with disabilities. The beneficiary becomes a new participant in his own health, allowing him to be a stakeholder in building the \nsocieties of today and tomorrow.\nThanks to all who contributed, and happy reading.',
    'role of medical field in social participation for disabilities',
    'selection and adjustment of assistive products for patient independence',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
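
Because the model was trained with MatryoshkaLoss at 768, 512, 256, 128 and 64 dimensions, an embedding can be shortened to one of those sizes by keeping its leading components and re-normalizing. The sketch below shows that step on toy vectors in plain Python. `truncate_and_normalize` and `cosine` are hypothetical helpers, and the 8-dimensional vectors are stand-ins for real model output. Recent Sentence Transformers releases also accept a `truncate_dim` argument on the `SentenceTransformer` constructor; check it against the installed version before relying on it.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def cosine(a, b):
    """For unit-length vectors, cosine similarity is the dot product."""
    return sum(x * y for x, y in zip(a, b))

# Toy 8-dimensional stand-ins for the 768-dimensional model output.
query = [0.6, 0.5, 0.4, 0.3, 0.2, 0.2, 0.1, 0.1]
doc = [0.5, 0.6, 0.3, 0.4, 0.1, 0.3, 0.2, 0.1]

# Compare the same pair at progressively smaller sizes.
for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    d = truncate_and_normalize(doc, dim)
    print(dim, round(cosine(q, d), 4))
```

Smaller sizes reduce storage and search cost, at the price of the lower retrieval scores reported under Evaluation.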

Evaluation

Metrics

Information Retrieval (dataset: dim_768)

Metric Value
cosine_accuracy@1 0.5211
cosine_accuracy@3 0.5211
cosine_accuracy@5 0.5211
cosine_accuracy@10 0.5578
cosine_precision@1 0.5211
cosine_precision@3 0.5211
cosine_precision@5 0.5211
cosine_precision@10 0.4754
cosine_recall@1 0.0649
cosine_recall@3 0.1947
cosine_recall@5 0.3245
cosine_recall@10 0.5487
cosine_ndcg@10 0.5375
cosine_mrr@10 0.5272
cosine_map@100 0.606

Information Retrieval (dataset: dim_512)

Metric Value
cosine_accuracy@1 0.5062
cosine_accuracy@3 0.5062
cosine_accuracy@5 0.5062
cosine_accuracy@10 0.5461
cosine_precision@1 0.5062
cosine_precision@3 0.5057
cosine_precision@5 0.5056
cosine_precision@10 0.4611
cosine_recall@1 0.0636
cosine_recall@3 0.1904
cosine_recall@5 0.3173
cosine_recall@10 0.5348
cosine_ndcg@10 0.5233
cosine_mrr@10 0.5129
cosine_map@100 0.595

Information Retrieval (dataset: dim_256)

Metric Value
cosine_accuracy@1 0.4859
cosine_accuracy@3 0.4859
cosine_accuracy@5 0.4859
cosine_accuracy@10 0.5227
cosine_precision@1 0.4859
cosine_precision@3 0.4859
cosine_precision@5 0.4859
cosine_precision@10 0.4407
cosine_recall@1 0.0612
cosine_recall@3 0.1837
cosine_recall@5 0.3062
cosine_recall@10 0.5121
cosine_ndcg@10 0.5017
cosine_mrr@10 0.4921
cosine_map@100 0.5785

Information Retrieval (dataset: dim_128)

Metric Value
cosine_accuracy@1 0.4688
cosine_accuracy@3 0.4688
cosine_accuracy@5 0.4688
cosine_accuracy@10 0.507
cosine_precision@1 0.4688
cosine_precision@3 0.4688
cosine_precision@5 0.4686
cosine_precision@10 0.43
cosine_recall@1 0.0584
cosine_recall@3 0.1753
cosine_recall@5 0.292
cosine_recall@10 0.4963
cosine_ndcg@10 0.4854
cosine_mrr@10 0.4751
cosine_map@100 0.5618

Information Retrieval (dataset: dim_64)

Metric Value
cosine_accuracy@1 0.4141
cosine_accuracy@3 0.4141
cosine_accuracy@5 0.4141
cosine_accuracy@10 0.4586
cosine_precision@1 0.4141
cosine_precision@3 0.4141
cosine_precision@5 0.4141
cosine_precision@10 0.3875
cosine_recall@1 0.0512
cosine_recall@3 0.1535
cosine_recall@5 0.2558
cosine_recall@10 0.4471
cosine_ndcg@10 0.4339
cosine_mrr@10 0.4215
cosine_map@100 0.5068
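
The tables above report standard ranking metrics. As a rough illustration of how they relate, the sketch below computes accuracy@k, precision@k, recall@k, MRR@k and NDCG@k for a single hypothetical query with binary relevance. The document IDs are made up and `ir_metrics` is an illustrative helper. An evaluator would average such per-query values over the whole query set.

```python
import math

def ir_metrics(ranked_ids, relevant_ids, k):
    """Binary-relevance ranking metrics for one query, cut off at rank k."""
    top_k = ranked_ids[:k]
    hits = [1 if doc_id in relevant_ids else 0 for doc_id in top_k]

    accuracy = 1.0 if any(hits) else 0.0       # at least one relevant hit
    precision = sum(hits) / k                   # share of top k that is relevant
    recall = sum(hits) / len(relevant_ids)      # share of relevant docs found

    # Reciprocal rank of the first relevant document (0 if none in top k).
    mrr = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            mrr = 1.0 / rank
            break

    # DCG with a log2 discount, normalized by the best achievable DCG.
    dcg = sum(hit / math.log2(rank + 1) for rank, hit in enumerate(hits, start=1))
    ideal_hits = min(k, len(relevant_ids))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    ndcg = dcg / idcg

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "mrr": mrr, "ndcg": ndcg}

# Hypothetical query: 4 relevant passages, 2 of them retrieved in the top 3.
ranked = ["d7", "d2", "d9", "d4", "d1"]
relevant = {"d2", "d9", "d4", "d5"}
print(ir_metrics(ranked, relevant, k=3))
```

When a query has more relevant passages than k, recall@k cannot exceed k divided by the number of relevant passages, so a low recall@1 does not by itself indicate poor ranking.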

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 11,518 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
              positive         anchor
    type      string           string
    min       10 tokens        4 tokens
    mean      244.14 tokens    10.92 tokens
    max       412 tokens       39 tokens
  • Samples:
    Each sample lists the positive passage first, then its anchor query.
    Sample 1, positive:
    certainty as rated in the Cochrane
    review (89) due to indirectness], 35
    trials); and

    improve function (trivial effect, low certainty evidence [downgraded
    #
    Review findings: values and
    preferences relevant to older
    people
    GRADE-
    CERQual
    assessment
    of confidence
    Explanation of confidence
    assessment
    12 Older people emphasized the
    importance of continuity of
    physical exercises to maintain
    mobility and reduce pain. A lack of
    continuity of physical exercise and
    instruction could have adverse
    effects.
    LOW No/very minor concerns
    regarding methodological
    limitations, moderate
    concerns regarding coherence,
    minor concerns regarding
    adequacy, and minor concerns
    regarding relevance.
    13 Older people also valued
    educational materials to
    accompany exercise programmes,
    such as drawings and descriptions
    of the exercises.
    LOW
    Minor concerns regarding
    methodological limitations, no/very minor concerns
    regarding coherence, serious
    concerns regarding adequacy,
    and serious con...
    Sample 1, anchor:
    importance of physical exercise continuity for older adults mobility and pain reduction
    Sample 2, positive:
    Phosphodiesterase-5
    inhibitors
    Prescription and/or administration (if injection) of the medicine and providing
    education and advice on the safe intake or administration (if self-directed) and potential adverse effects of the medicine.
    Physical exercise
    training
    A variety of physical exercises (e.g. aerobic or strengthening exercises, balance or
    coordination exercises, mind–body exercises), with or without weight-bearing, are suitable to improve exercise capacity, muscle strength, joint mobility, voluntary
    movement, balance, gait and walking, as well as helping to reduce pain and fatigue. Regular physical exercise training (including education and advice on exercises)
    is planned according to an individual’s needs, guided or assisted and, if feasible,
    performed self-directed following education and advice on the appropriate exercises.
    Sample 2, anchor:
    Phosphodiesterase-5 inhibitors prescription administration and adverse effects
    Sample 3, positive:
    and occasional guidelines (Bach-Faig 2011):
    Every day

    Three main meals should contain three basic elements: o Cereals: 1-2 servings per meal (preferably whole grain), such as bread, pasta,
    rice, and couscous
    Sample 3, anchor:
    three main meals basic elements cereals whole grain
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
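
The configuration above applies MultipleNegativesRankingLoss at each of the five embedding sizes and sums the results with equal weights. The following pure-Python sketch illustrates that idea on a toy batch: for every truncation size, each anchor is scored against all positives in the batch, and its own positive is treated as the correct class. The similarity scale of 20, the toy vectors, and all helper names are assumptions for illustration; this is not the library's implementation.

```python
import math

def normalize(vec):
    """Rescale a vector to unit length so dot products are cosine scores."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled cosine scores; positives[i] is the target for anchors[i]."""
    anchors = [normalize(a) for a in anchors]
    positives = [normalize(p) for p in positives]
    total = 0.0
    for i, anchor in enumerate(anchors):
        scores = [scale * sum(x * y for x, y in zip(anchor, pos)) for pos in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        top = max(scores)
        log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_sum_exp - scores[i]
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss computed on the first `dim` components."""
    loss = 0.0
    for dim, weight in zip(dims, weights):
        truncated_a = [a[:dim] for a in anchors]
        truncated_p = [p[:dim] for p in positives]
        loss += weight * in_batch_ranking_loss(truncated_a, truncated_p)
    return loss

# Toy batch of 3 (anchor, positive) pairs with 4-dimensional embeddings.
anchors = [[1.0, 0.1, 0.0, 0.2], [0.0, 1.0, 0.1, 0.0], [0.1, 0.0, 1.0, 0.3]]
positives = [[0.9, 0.2, 0.1, 0.1], [0.1, 0.8, 0.0, 0.1], [0.0, 0.1, 0.9, 0.4]]

print(matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1]))
```

Training every prefix of the embedding this way is what allows the shorter vectors to remain usable on their own.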
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
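
Two consequences of these settings are worth spelling out. With a per-device batch size of 32 and 16 gradient-accumulation steps, each optimizer update covers 32 × 16 = 512 samples, assuming a single device. The learning rate warms up linearly over the first 10% of updates and then follows a cosine decay from 2e-05 toward zero. The sketch below mirrors that shape in plain Python. The total of 88 updates is taken from the training log, the rounding of the warmup length is an assumption, and `lr_at` is a hypothetical helper rather than a library function.

```python
import math

BASE_LR = 2e-05
TOTAL_STEPS = 88          # final step shown in the training log
WARMUP_RATIO = 0.1
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # assumed rounding

def lr_at(step):
    """Linear warmup followed by cosine decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

# per_device_train_batch_size * gradient_accumulation_steps
effective_batch = 32 * 16
print("effective batch size:", effective_batch)

# Learning rate at the start, end of warmup, midpoint, and final step.
for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, f"{lr_at(step):.2e}")
```

Gradient accumulation does not enlarge the pool of in-batch negatives: the ranking loss still compares each anchor against the positives of its own batch of 32.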

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.4444 10 16.2146 - - - - -
0.8889 20 15.6805 - - - - -
1.0 23 - 0.5287 0.5234 0.5034 0.4812 0.4255
1.3111 30 13.2495 - - - - -
1.7556 40 13.4064 - - - - -
2.0 46 - 0.5320 0.5199 0.5019 0.4776 0.4306
2.1778 50 11.3483 - - - - -
2.6222 60 11.7323 - - - - -
3.0 69 - 0.5375 0.5216 0.5014 0.4816 0.4288
3.0444 70 11.3371 - - - - -
3.4889 80 10.2106 - - - - -
3.8444 88 - 0.5375 0.5233 0.5017 0.4854 0.4339
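  • The bold row denotes the saved checkpoint. Bold formatting is not preserved in this rendering; the values reported under Evaluation match the final row (step 88).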
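
The fractional epochs in the log follow from the dataset size and the batch settings. As a worked check, assuming a single device and that every epoch yields ceil(11,518 / 32) = 360 batches, there are 360 / 16 = 22.5 optimizer updates per epoch, with the leftover half-window flushed at the end of each epoch. The sketch below reproduces the logged epoch values from the step numbers under those assumptions; `epoch_at` is a hypothetical helper.

```python
import math

SAMPLES = 11_518
BATCH_SIZE = 32
GRAD_ACCUM = 16

batches_per_epoch = math.ceil(SAMPLES / BATCH_SIZE)      # 360
updates_per_epoch = batches_per_epoch / GRAD_ACCUM       # 22.5
steps_logged_per_epoch = math.ceil(updates_per_epoch)    # 23, the last one partial

def epoch_at(step):
    """Fractional epoch reached after `step` optimizer updates."""
    full_epochs, remainder = divmod(step, steps_logged_per_epoch)
    if remainder == 0:
        return float(full_epochs)
    return full_epochs + remainder / updates_per_epoch

# Step numbers taken from the training log above.
for step in (10, 20, 23, 30, 46, 50, 69, 70, 88):
    print(step, round(epoch_at(step), 4))
```

The log ends at step 88 rather than at a whole number of epochs, which is consistent with a planned total of 4 × 22 full updates.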

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 4.0.2
  • Transformers: 4.51.1
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 149M params (Safetensors, tensor type F32)