ModernBERT Embed base fitness health Matryoshka

This is a sentence-transformers model finetuned from kokojake/modernbert-embed-base-fitness-health-matryoshka-8-epochs-25k on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: kokojake/modernbert-embed-base-fitness-health-matryoshka-8-epochs-25k
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: json (20,792 samples)

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
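
Because the final Normalize() module L2-normalizes every embedding, dot products and cosine similarities coincide. A minimal sketch verifying this property, using the model name from the Usage section below:

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("kokojake/modernbert-embed-base-fitness-health-matryoshka-epoch-15")
embeddings = model.encode(["protein intake and sarcopenia prevention"])

# Every row has unit L2 norm, so dot product equals cosine similarity
print(np.linalg.norm(embeddings, axis=1))  # ~ [1.]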

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("kokojake/modernbert-embed-base-fitness-health-matryoshka-epoch-15")
# Run inference
sentences = [
    'Kasperczyk A, Kasperczyk S, Vendemiale G. An open-label, single-center \npilot study to test the effects of an amino acid mixture in older patients admitted to internal medicine wards. Nutrition. 2020;69:110588.\n\t38.\t Paddon-Jones D, Rasmussen BB. Dietary protein recommendations and the prevention of sarcopenia. Curr Opin Clin Nutr Metab Care. \n2009;12(1):86–90. 39.\t MacKenzie-Shalders KL, King NA, Byrne NM, Slater GJ. Increasing Protein \nDistribution Has No Effect on Changes in Lean Mass During a Rugby Preseason. Int J Sport Nutr Exerc Metab. 2016;26(1):1–7. 40.\t Zanini B, Simonetto A, Zubani M, Castellano M, Gilioli G: The Effects of \nCow-Milk Protein Supplementation in Elderly Population: Systematic Review and Narrative Synthesis. Nutrients. 2020, 12(9).',
    'dietary protein recommendations for sarcopenia prevention',
    'effect of sitting time on obesity and diabetes',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
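
Because this model was trained with MatryoshkaLoss at dimensions 768, 512, 256, 128, and 64 (see Training Details), embeddings can be truncated to any of those sizes with a modest quality trade-off (see Evaluation). A sketch using the truncate_dim argument of SentenceTransformer, available in recent sentence-transformers releases:

from sentence_transformers import SentenceTransformer

# Load with a truncated output dimensionality (ideally one of the trained Matryoshka dims)
model = SentenceTransformer(
    "kokojake/modernbert-embed-base-fitness-health-matryoshka-epoch-15",
    truncate_dim=256,
)
embeddings = model.encode([
    "dietary protein recommendations for sarcopenia prevention",
    "effect of sitting time on obesity and diabetes",
])
print(embeddings.shape)
# (2, 256)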

Evaluation

Metrics

The five Information Retrieval blocks below report cosine retrieval metrics at each Matryoshka dimension (768, 512, 256, 128, and 64); the dimension labels are inferred from the matching dim_*_cosine_ndcg@10 columns in the Training Logs.

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.5578
cosine_accuracy@3 0.5634
cosine_accuracy@5 0.5768
cosine_accuracy@10 0.6573
cosine_precision@1 0.5578
cosine_precision@3 0.5588
cosine_precision@5 0.5547
cosine_precision@10 0.4887
cosine_recall@1 0.0767
cosine_recall@3 0.2295
cosine_recall@5 0.3723
cosine_recall@10 0.6075
cosine_ndcg@10 0.5889
cosine_mrr@10 0.5732
cosine_map@100 0.6481

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.5508
cosine_accuracy@3 0.5573
cosine_accuracy@5 0.5703
cosine_accuracy@10 0.6512
cosine_precision@1 0.5508
cosine_precision@3 0.5521
cosine_precision@5 0.5483
cosine_precision@10 0.4848
cosine_recall@1 0.0757
cosine_recall@3 0.2262
cosine_recall@5 0.367
cosine_recall@10 0.6016
cosine_ndcg@10 0.5828
cosine_mrr@10 0.5667
cosine_map@100 0.643

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.55
cosine_accuracy@3 0.5543
cosine_accuracy@5 0.5707
cosine_accuracy@10 0.6538
cosine_precision@1 0.55
cosine_precision@3 0.5506
cosine_precision@5 0.5471
cosine_precision@10 0.4857
cosine_recall@1 0.0752
cosine_recall@3 0.2248
cosine_recall@5 0.3653
cosine_recall@10 0.6018
cosine_ndcg@10 0.5821
cosine_mrr@10 0.5659
cosine_map@100 0.6428

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.5197
cosine_accuracy@3 0.5223
cosine_accuracy@5 0.5348
cosine_accuracy@10 0.6205
cosine_precision@1 0.5197
cosine_precision@3 0.5197
cosine_precision@5 0.5152
cosine_precision@10 0.4587
cosine_recall@1 0.0706
cosine_recall@3 0.2108
cosine_recall@5 0.3417
cosine_recall@10 0.5644
cosine_ndcg@10 0.5472
cosine_mrr@10 0.5343
cosine_map@100 0.6136

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.4526
cosine_accuracy@3 0.4552
cosine_accuracy@5 0.4751
cosine_accuracy@10 0.5552
cosine_precision@1 0.4526
cosine_precision@3 0.4528
cosine_precision@5 0.4511
cosine_precision@10 0.4055
cosine_recall@1 0.0619
cosine_recall@3 0.1849
cosine_recall@5 0.3015
cosine_recall@10 0.5027
cosine_ndcg@10 0.4836
cosine_mrr@10 0.4681
cosine_map@100 0.5521
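
These metrics come from an information-retrieval evaluation over query/passage pairs like the training samples shown below. A minimal sketch of how such numbers can be reproduced with sentence-transformers' InformationRetrievalEvaluator; the queries, corpus, and relevant_docs mappings here are hypothetical placeholders for a held-out split:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("kokojake/modernbert-embed-base-fitness-health-matryoshka-epoch-15")

# Hypothetical held-out data: id -> text
queries = {"q1": "dietary protein recommendations for sarcopenia prevention"}
corpus = {"d1": "Paddon-Jones D, Rasmussen BB. Dietary protein recommendations ..."}
relevant_docs = {"q1": {"d1"}}  # query id -> set of relevant corpus ids

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="dim_768",
    truncate_dim=768,  # assumption: score at a chosen Matryoshka dimension
)
print(evaluator(model))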

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 20,792 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    • positive: string; min 4 / mean 220.24 / max 415 tokens
    • anchor: string; min 5 / mean 11.15 / max 41 tokens
  • Samples:
    • positive: "interpretations, if a common framework like the ICF is used”23, the unit recommends using the ICF for communications outside the association, particularly in research contexts. Health conditions (disorder or disease) Activities © WHO, International Classification of Functioning, Disability and Health, 2001 Participation Body Functions and Structures Environmental Factors Personal Factors"
      anchor: "ICF usage in research communications for health disorders"
    • positive: "Physiol. Regul. Integr. Comp. Physiol. 2015, 309, R767–R779. [CrossRef] 39. Laurentino, G.C.; Ugrinowitsch, C.; Roschel, H.; Aoki, M.S.; Soares, A.G.; Neves, M.; Aihara, A.Y.; Fernandes"
      anchor: "Laurentino et al. research on integrative physiology"
    • positive: "Telling your client to “push through your heels” when performing a squat or “explode through your hips or push through your feet” when performing jumping and sprinting movements are examples of internal cues. You also may utilize external cues to enhance motor learning and performance in all populations. External cues—or external focus of attention—direct a client’s attention toward the effect the movement will have on the surrounding environment and the movement outcome, as it relates to the exercise being performed (Winkelman et al., 2017;"
      anchor: "effect of external focus of attention on motor learning"
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    

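This configuration corresponds to the usual sentence-transformers pattern of wrapping MultipleNegativesRankingLoss in MatryoshkaLoss. A minimal sketch of the equivalent code, with the base-model name taken from the description above:

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("kokojake/modernbert-embed-base-fitness-health-matryoshka-8-epochs-25k")

# In-batch negatives ranking loss, applied at every Matryoshka dimension with equal weight
base_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(
    model,
    base_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1],
)
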
Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
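
These values map directly onto SentenceTransformerTrainingArguments. A sketch of the equivalent configuration; output_dir is a placeholder, and save_strategy is an assumption needed so load_best_model_at_end can align saving with the per-epoch evaluation:

from sentence_transformers.training_args import (
    BatchSamplers,
    SentenceTransformerTrainingArguments,
)

args = SentenceTransformerTrainingArguments(
    output_dir="modernbert-embed-fitness-health",  # placeholder
    num_train_epochs=4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    tf32=True,
    optim="adamw_torch_fused",
    eval_strategy="epoch",
    save_strategy="epoch",  # assumption: required by load_best_model_at_end
    load_best_model_at_end=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)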

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.2462 10 7.2777 - - - - -
0.4923 20 7.6341 - - - - -
0.7385 30 7.1497 - - - - -
0.9846 40 6.8322 0.5820 0.5741 0.5679 0.5308 0.4724
1.2462 50 6.779 - - - - -
1.4923 60 5.5133 - - - - -
1.7385 70 6.1867 - - - - -
1.9846 80 6.0276 0.5829 0.5798 0.5769 0.5409 0.4897
2.2462 90 4.971 - - - - -
2.4923 100 5.0184 - - - - -
2.7385 110 5.1473 - - - - -
2.9846 120 5.6456 0.5880 0.5830 0.5780 0.5472 0.4872
3.2462 130 5.0487 - - - - -
3.4923 140 4.7154 - - - - -
3.7385 150 5.1362 - - - - -
3.9846 160 4.931 0.5889 0.5828 0.5821 0.5472 0.4836 *
  • The row marked with * denotes the saved checkpoint; its dim_*_cosine_ndcg@10 values match the Evaluation section above.

Framework Versions

  • Python: 3.11.12
  • Sentence Transformers: 4.0.2
  • Transformers: 4.51.2
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}