EmbeddingGemma-300m finetuned on the Medical Instruction and RetrIeval Dataset (MIRIAD)

This is a sentence-transformers model finetuned from google/embeddinggemma-300m on the miriad/miriad-4.4M dataset (specifically, the first 100,000 question-passage pairs from tomaarsen/miriad-4.4M-split). It maps sentences and documents to a 768-dimensional dense vector space and is intended for medical information retrieval: given a detailed medical question, it searches for relevant passages (up to 1k tokens) from scientific medical papers.

This model was trained using code from our EmbeddingGemma blog post to showcase how EmbeddingGemma can be finetuned on specific domains or tasks for even stronger performance. It is not affiliated with Google.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: google/embeddinggemma-300m
  • Maximum Sequence Length: 1024 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: miriad-4.4_m-split
  • Language: en
  • License: apache-2.0

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (4): Normalize()
)
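
The module list above can be read as a pipeline: token embeddings are mean-pooled, passed through two bias-free linear layers (768 → 3072 → 768, identity activation), and then L2-normalized. The following is a minimal NumPy sketch of stages (1) to (4) only. It uses random stand-in weights and random token embeddings rather than the model's real parameters, and the function name `embed_from_tokens` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for the two bias-free Dense layers (768 -> 3072 -> 768).
W1 = rng.standard_normal((768, 3072)) * 0.02
W2 = rng.standard_normal((3072, 768)) * 0.02

def embed_from_tokens(token_embeddings, attention_mask):
    """Mean-pool token vectors, apply two linear maps, then L2-normalize."""
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    # (1) Pooling: average only over non-padding tokens.
    pooled = (token_embeddings * mask).sum(axis=1) / mask.sum(axis=1)
    # (2) and (3) Dense layers with identity activation and no bias.
    projected = pooled @ W1 @ W2
    # (4) Normalize: unit-length rows, so a dot product equals cosine similarity.
    return projected / np.linalg.norm(projected, axis=1, keepdims=True)

# Two fake "sentences" of up to 5 tokens; the second has 2 padding positions.
tokens = rng.standard_normal((2, 5, 768))
mask = np.array([[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]])
embeddings = embed_from_tokens(tokens, mask)
print(embeddings.shape)
```

Because the final stage normalizes every row, the cosine similarity listed under Model Description reduces to a plain matrix product between query and document embeddings.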

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence-transformers/embeddinggemma-300m-medical")
# Run inference
queries = [
    "What are some potential limitations in projecting the future demand for joint replacement surgeries?\n",
]
documents = [
    "We also asked whether current trends are advancing according to earlier expectations [6] .\n\n Our study has several limitations. Our projections are based on the historical growth trajectory of joint replacement surgeries, and do not take into account potential limitations in the availability of surgeons or limited economic resources by private and public payers and hospitals in the future. For example, a shortage in the number of surgeons will have a substantial influence on the actual number of procedures that are performed. We also have not incorporated the potential for future alternative technologies, such as cartilage regeneration or tissue engineering, or drug therapies that limit the progression of joint diseases, which may preempt the need for TJR. We were also unable to account for the potential impact of changes in economy, which may place additional economic burden on patients to pay substantial out-of pocket expenses for these procedures, depending on their insurance coverage. Our study also did not consider potential changes in healthcare policies, such as adoption of volume standards or regionalization of TJR to high volume centers [5] , which could limit the access to care and decrease the future demand. The above economic, policy, and scientific factors cannot be readily incorporated in the statistical model. Our study was also focused on the procedural trends in the U.S.; followup research may include an analysis of trends in other countries, though the availability of historical TJR trends in other countries may be limited. Nonetheless, these limitations in no way diminish the importance of conducting and regularly updating surgical projections to help guide future research, surgeon training, and public health policy decisions. Our study also incorporated a more conservative projection, which relied only on the future changes in population growth, while maintaining current rates of adoption of TJR. 
Despite these limitations, our current findings are expected to have implications in the private coverage and reimbursement of joint replacement procedures in the future, as patients less than 65 years of age are not typically covered by Medicare, which today funds the majority of total joint replacement procedures in the United States.\n\n We found the relative size of the young patient population for TJR has grown between 1993 and 2006. While 25% to 32% of primary or revision TJRs were performed in patients less than 65 years old in 1993, these proportions have increased to 40% to 46% in the most recent NIS data. The increasing trend in younger patients undergoing TJR has also been reported for different, but partly overlapping, historical periods. For example, Jain et al. reported that the proportion of primary TKA patients aged less than 60 years increased from 12.5% to 19.5% (+56%) between 1990-1993 and 1998-2000 [4] . In addition, for patients aged under 70 years, the proportion increased by 9% from 45.6% to 49.6%. Due to the difference in the stratification by age categories, we were unable to make a direct comparison with the data by Jain et al. [4] . However, our findings that the historical volume of TJR procedures in the younger patient population have been increasing is consistent with these previously reported trends.\n\n While we previously forecasted an increase in demand for primary hip and knee replacement in 2030 by 174% and 673% [6] , respectively, the current study underscores the contribution that young patients are expected to play in the Fig. 2A -B Historical incidence of primary total hip arthroplasty (A) and primary total knee arthroplasty (B) from 1993-2006, superimposed with previous projections [6] , and the updated projections from the current study. The dotted lines represent the 95% CI for the projections.\n\n future utilization of primary TJR surgery, if historical trends in prevalence continue into the future. 
The statistical modeling approach we have employed in the current and previous study fits a multivariate but linear Poisson regression model to the historical prevalence of TJR procedures. However, because the size of the population subgroups is free to change nonlinearly in the future based on the Census Bureau's projection, the actual projected incidence of surgical demand is therefore not constrained to be a linear function over time. The demand for primary hip and knee arthroplasty between 2004 and 2006 generally exceeded our previous projections, which employed an identical methodology. However, we are unable to judge, based on the limited window of new data for validation, whether a more complex modeling approach would provide a more reliable forecast of demand for surgical procedures.\n\n Our previous methodology provided a reasonable shortterm forecast of the demand for revision hip and knee surgeries between 2004 and 2006. In particular, for 2006, we observed a slight decrease in the estimated number of primary THA and TKA procedures compared to 2005 (Fig. 2 ), but this decrease fell within the uncertainty of the estimates.",
    '23 In cases of splenic B-cell lymphomas that do not fulfill the World Health Organization 2008 criteria for better established or provisional entities, a diagnosis of splenic B-cell lymphoma/leukemia unclassifiable should be preferred.\n\n Differentiating SMZL from lymphoplasmacytic lymphoma (LPL) may be challenging, particularly on BM biopsy, because SMZL may show a monoclonal serum component and plasmacytic morphology, and both entities lack a distinct phenotype. LPL, which develops primarily in the spleen, homogeneously infiltrates the white pulp without MZ pattern and without monocytoid B cells. MYD88 L265P mutation, present in almost all cases of LPL and rare in SMZL, may be a useful diagnostic tool. 25 A further diagnostic pitfall may be represented by detection of a BM clonal infiltrate in cases of non-CLL monoclonal B lymphocytosis. 26 Finally, secondary splenic localization of EMZL presents a pattern that overlaps with that of SMZL, but clinical dissemination is crucial for differentiation. Splenic involvement virtually excludes a diagnosis of nodal MZL; apart from the differential expression of IRTA1, which is negative in SMZL, 11, 22 clinical correlation is critical for reaching a correct diagnosis when dealing with a BM biopsy.\n\n The cellular origin of SMZL is still debated, and its identification is essential to correctly classify this lymphoma and to elucidate its pathobiology. According to the World Health Organization classification, the postulated normal counterpart of SMZL is a B cell of unknown differentiation stage. 11 According to studies of Ig gene rearrangements, a derivation from antigen-experienced B cells has been postulated in the \n\n -, ,25% of cases; -/1, 25%-50% of cases; 1/-, 50%-75% of cases; 1, .75% of cases.\n\n FL, follicular lymphoma; NMZL, nodal marginal zone lymphoma; SDRPL, splenic diffuse red pulp lymphoma.\n\n *Sporadic cases reported.\n\n vast majority of SMZL. 
[27] [28] [29] Skewing of the Ig gene repertoire toward the use of the IGHV1-2*04 allele in SMZL suggests that they could derive from a progenitor population adapted in the spleen to particular antigenic challenges, although definitive answers on the issue of the cell of origin of SMZL will admittedly be provided only through multidisciplinary examination of the immune repertoire and transcriptome of normal B-cell populations of the spleen compartments.\n\n The contribution of antigen stimulation to SMZL pathogenesis is suggested by the highly restricted Ig gene repertoire, including stereotyped configuration of the B-cell receptor (BCR) in ;10% of cases 30 and selective usage of the Ig heavy chain variable IGHV1-2*04 allele in ;30%.\n\n 31 Although the epitope recognized by IGHV1-2*04-expressing BCR is unknown, the features of IGHV1-2*04 rearrangements, including minimal somatic mutations and the long complementarity-determining region 3 sequence with common motifs, suggest a possible selection of T-cell-independent MZ B cells by superantigens and thus a role of antigenic drive in lymphomagenesis.\n\n Cytogenetic and genetic lesions SMZL lacks recurrent chromosomal translocations, including translocations that are typical of other lymphoma types such as the t(14;18) translocation affecting BCL2 in follicular lymphoma, the t(11;14) translocation affecting CCND1 in MCL, and the t(11;18), t(14;18), and t(1;14) translocations affecting the BIRC3/MALT1, MALT1, and BCL10 genes, respectively, in EMZL. The lack of these abnormalities may help distinguish SMZL from pathologically mimicking tumors. 
Approximately 30% of SMZL show hemizygous 7q deletion, which is also frequently seen in splenic B-cell lymphoma/leukemia unclassifiable, but rarely in other lymphoma subtypes.\n\n 32,33 The gene(s) targeted by the 7q deletion remain obscure despite the combined investigation of genomic and transcriptomic profiles and mutation analysis of a number of candidate genes.\n\n Unbiased genomic studies have unraveled the typical coding genome of SMZL. [37] [38] [39] [40] [41] [42] [43] However, because of the limited number of SMZL genomes and/or exomes available so far, the full spectrum of lesions that contribute to the malignant transformation of SMZL remains unknown.',
    'The chronic inflammation of rheumatoid arthritis mainly affects the synovial membranes of multiple joints and potentially involves vasculitis and pulmonary, ocular and cardiovascular systems. After the onset of the inflammation, the synovium changes dramatically (Edwards, 1998) . The synovial intima is filled with B-lymphocytes engaged in antibody production against unknown antigens (Bläß, Engel, & Burmester, 1999) . Infiltrations of plasma cells into the synovia are highly associated with inflammation of rheumatoid arthritis (Dong, Li, Liu, & Zhu, 2009; Reparon-Schuijt et al., 1998) . The resulting immune complexes activate macrophages and complement and drive a T-cell dependent antibody production in the synovial tissue. The immune complexes are mainly rheumatoid factors that are defined as auto-antibodies against Fc-fragments of IgG (Tighe & Carson, 2001) and occur in about 90% of rheumatoid arthritis patients (Dörner, Egerer, Feist, & Burmester, 2004) . Normally\n\n Rheumatoid arthritis is a desastrous progressive autoimmune disease for which no causative cure is available, simply because the eliciting antigens are unknown despite intesive research efforts. Most patients have also Rheumatiod factor activity where antibodies bind to their own structures within the constant region. Here we considered, wether mutations in the constant regions of immunoglobulins could represent the eliciting antigens.\n\n rheumatoid factors bind to an antibody-antigen complex and facilitate clearance by binding to Fcreceptors, fixation of complement and antigen processing by B-lymphocytes (Carson, Chen, & Kipps, 1991) . The rheumatoid factor binding site resides in CH 2 -CH 3 domain of Fc (Artandi, Calame, Morrison, & Bonagura, 1992; Bonagura et al., 1998; Sutton et al., 1998) . 
However, rheumatoid factors are also found in other conditions of B-cell hyperreactivity.\n\n The driving force for autoimmune diseases are self-reactive antibodies directed against "altered self" which can be modified proteins (Trouw, Huizinga, & Toes, 2013) . So far posttranslational modifications have been detected in citrullinated antigens that are highly specific for rheumatoid arthritis. Citrulline residues arise from arginine by peptidyl arginine deiminase. However, this posttranslational modification cannot fully explain the pathogenesis of rheumatoid arthritis (Klareskog, Amara, & Malmström, 2014) .\n\n Changes of IgG glycosylation in the IgG were also thought to be involved in rheumatoid arthritis (Parekh et al., 1985) , but recent studies showed that the glycosylation loci are not associated with rheumatoid arthritis (Yarwood et al., 2016) .\n\n Other modifications include oxidized IgG that are recognized by circulating lymphocytes leading to a proliferative response and secrete IL-2 (Grinnell, Yoshida, & Jasin, 2005) . IgG is also covalently cross linked by reactive oxygen and nitric oxide products secreted by inflammatory cells (Uesugi, Hayashi, & Jasin, 1998) .\n\n IgG has long been implicated in the pathogenesis of rheumatoid arthritis. When immune complexes from synovial fluids of patients with rheumatoid arthritis were analyzed for their constituents, mainly IgG and IgM antibodies were found (Male & Roitt, 1981) . They did not contain antibodies with rheumatoid factor specificity and a structural alteration of the IgG was considered as a cause for antigenicity (Carter, Makh, Ponsford, & Elson, 1989) . Sutton, Corper, Bonagura, and Taussig (2000) suggested that rheumatoid factors bind Fc-region and foreign antigen antigens simultaneously and the affinity is potentiated by somatic mutation. 
Indeed, Fc-binding antibodies from rheumatoid arthritis synovial fluids show imprints of an antigen-dependent process of somatic hypermutation and clonal selection in the variable regions of the L-and H-chains (Van Esch et al., 2003) . It is clear that the synovium of patients with rheumatoid arthritis is prone to mutations (Firestein, 2010) and several multi-evidence genes in genome wide studies have been identified (Whitaker et al., 2015) .',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 768) (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.7784, -0.0542,  0.0875]])
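
The similarity matrix has one row per query and one column per document, so retrieval amounts to sorting each row. The sketch below ranks documents with NumPy using the scores printed above; `rank_documents` is a hypothetical helper, not part of Sentence Transformers.

```python
import numpy as np

def rank_documents(similarities, top_k=3):
    """Return, per query, (document index, score) pairs sorted by descending score."""
    scores = np.asarray(similarities, dtype=float)
    order = np.argsort(-scores, axis=1)[:, :top_k]
    return [[(int(j), float(scores[i, j])) for j in row] for i, row in enumerate(order)]

# Scores copied from the example output above (1 query x 3 documents).
example_scores = [[0.7784, -0.0542, 0.0875]]
ranking = rank_documents(example_scores)
print(ranking[0])
```

Here the joint replacement passage (index 0) ranks first, well ahead of the two unrelated passages.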

Evaluation

Metrics

Information Retrieval

  • Datasets: miriad-eval-1kq-31kd and miriad-test-1kq-31kd, i.e. 1k eval/test queries searched against 31k passages (the 1k eval/test passages plus 30k train passages)
  • Evaluated with InformationRetrievalEvaluator
Metric               miriad-eval-1kq-31kd  miriad-test-1kq-31kd
cosine_accuracy@1    0.822                 0.802
cosine_accuracy@3    0.926                 0.907
cosine_accuracy@5    0.945                 0.942
cosine_accuracy@10   0.976                 0.963
cosine_precision@1   0.822                 0.802
cosine_precision@3   0.3087                0.3023
cosine_precision@5   0.189                 0.1884
cosine_precision@10  0.0976                0.0963
cosine_recall@1      0.822                 0.802
cosine_recall@3      0.926                 0.907
cosine_recall@5      0.945                 0.942
cosine_recall@10     0.976                 0.963
cosine_ndcg@10       0.9026                0.8862
cosine_mrr@10        0.8788                0.8611
cosine_map@100       0.8797                0.863
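
In the table, accuracy@k equals recall@k and precision@k equals accuracy@k divided by k. That pattern is what one would expect if each query has exactly one relevant passage, which is assumed in the sketch below. The function `retrieval_metrics` is a hypothetical illustration and not the InformationRetrievalEvaluator implementation.

```python
def retrieval_metrics(ranks, k):
    """Metrics at cutoff k when every query has exactly one relevant passage.

    `ranks` holds the 1-based rank of the relevant passage for each query,
    or None if it was not retrieved at all.
    """
    n = len(ranks)
    hits = sum(1 for r in ranks if r is not None and r <= k)
    accuracy = hits / n          # the relevant passage appears in the top k
    recall = hits / n            # identical, since there is only one relevant passage
    precision = hits / (n * k)   # at most one relevant passage among k retrieved
    mrr = sum(1.0 / r for r in ranks if r is not None and r <= k) / n
    return {"accuracy": accuracy, "recall": recall, "precision": precision, "mrr": mrr}

# Toy ranks for 4 queries (illustrative values, not taken from the evaluation).
toy = retrieval_metrics([1, 2, 4, None], k=3)
print(toy)

# Consistency check against the reported table: precision@3 = accuracy@3 / 3.
print(round(0.926 / 3, 4))
```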

Training Details

Training Dataset

miriad-4.4_m-split

  • Dataset: miriad-4.4_m-split at 596b9ab
  • Size: 100,000 training samples
  • Columns: question and passage_text
  • Approximate statistics based on the first 1000 samples:
             question        passage_text
    type     string          string
    min      7 tokens        481 tokens
    mean     20.79 tokens    945.6 tokens
    max      60 tokens       1024 tokens
  • Samples:
    question passage_text
    What factors may contribute to increased pulmonary conduit durability in patients who undergo the Ross operation compared to those with right ventricular outflow tract obstruction?
    I n 1966, Ross and Somerville 1 reported the first use of an aortic homograft to establish right ventricle-to-pulmonary artery continuity in a patient with tetralogy of Fallot and pulmonary atresia. Since that time, pulmonary position homografts have been used in a variety of right-sided congenital heart lesions. Actuarial 5-year homograft survivals for cryopreserved homografts are reported to range between 55% and 94%, with the shortest durability noted in patients less than 2 years of age. 4 Pulmonary position homografts also are used to replace pulmonary autografts explanted to repair left-sided outflow disease (the Ross operation). Several factors may be likely to favor increased pulmonary conduit durability in Ross patients compared with those with right ventricular outflow tract obstruction, including later age at operation (allowing for larger homografts), more normal pulmonary artery architecture, absence of severe right ventricular hypertrophy, and more natural positioning of ...
    How does MCAM expression in hMSC affect the growth and maintenance of hematopoietic progenitors?
    After culture in a 3-dimensional hydrogel-based matrix, which constitutes hypoxic conditions, MCAM expression is lost. Concordantly, Tormin et al. demonstrated that MCAM is down-regulated under hypoxic conditions. 10 Furthermore, it was shown by others and our group that oxygen tension causes selective modification of hematopoietic cell and mesenchymal stromal cell interactions in co-culture systems as well as influence HSPC metabolism. [44] [45] [46] Thus, the observed differences between Sharma et al. and our data in HSPC supporting capacity of hMSC are likely due to the different culture conditions used. Further studies are required to clarify the influence of hypoxia in our model system. Altogether these findings provide further evidence for the importance of MCAM in supporting HSPC. Furthermore, previous reports have shown that MCAM is down-regulated in MSC after several passages as well as during aging and differentiation. 19, 47 Interestingly, MCAM overexpression in hMSC enhance...
    What is the relationship between Fanconi anemia and breast and ovarian cancer susceptibility genes?
    ( 31 ) , of which 5% -10 % may be caused by genetic factors ( 32 ) , up to half a million of these patients may be at risk of secondary hereditary neoplasms. The historic observation of twofold to fi vefold increased risks of cancers of the ovary, thyroid, and connective tissue after breast cancer ( 33 ) presaged the later syndromic association of these tumors with inherited mutations of BRCA1, BRCA2, PTEN, and p53 ( 16 ) . By far the largest cumulative risk of a secondary cancer in BRCA mutation carriers is associated with cancer in the contralateral breast, which may reach a risk of 29.5% at 10 years ( 34 ) . The Breast Cancer Linkage Consortium ( 35 , 36 ) also documented threefold to fi vefold increased risks of subsequent cancers of prostate, pancreas, gallbladder, stomach, skin (melanoma), and uterus in BRCA2 mutation carriers and twofold increased risks of prostate and pancreas cancer in BRCA1 mutation carriers; these results are based largely on self-reported family history inf...
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 8,
        "gather_across_devices": false
    }
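
CachedMultipleNegativesRankingLoss optimizes an in-batch-negatives ranking objective, and the caching technique from Gao et al. (cited under Citation) lets it run with large batches in limited memory. The sketch below illustrates only the underlying objective, without any caching, using the `scale` of 20.0 and cosine similarity listed above. The function name is hypothetical, and `mini_batch_size` is left out because it is understood here to affect memory use rather than the loss value.

```python
import numpy as np

def in_batch_negatives_loss(query_emb, passage_emb, scale=20.0):
    """Cross-entropy over scaled cosine similarities; passage i is the positive for query i."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = passage_emb / np.linalg.norm(passage_emb, axis=1, keepdims=True)
    logits = scale * (q @ p.T)                             # shape (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
queries = rng.standard_normal((4, 8))
aligned = queries + 0.01 * rng.standard_normal((4, 8))   # positives close to their queries
shuffled = aligned[::-1]                                  # positives paired with the wrong queries
print(in_batch_negatives_loss(queries, aligned))
print(in_batch_negatives_loss(queries, shuffled))
```

Every other passage in the batch acts as a negative for a given question, which is why a large batch (128 here) together with the no_duplicates batch sampler matters for this loss.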
    

Evaluation Dataset

miriad-4.4_m-split

  • Dataset: miriad-4.4_m-split at 596b9ab
  • Size: 1,000 evaluation samples
  • Columns: question and passage_text
  • Approximate statistics based on the first 1000 samples:
             question        passage_text
    type     string          string
    min      7 tokens        465 tokens
    mean     20.91 tokens    943.1 tokens
    max      61 tokens       1024 tokens
  • Samples:
    question passage_text
    What are some hereditary cancer syndromes that can result in various forms of cancer?
    Hereditary Cancer Syndromes, including Hereditary Breast and Ovarian Cancer (HBOC) and Lynch Syndrome (LS), can result in various forms of cancer due to germline mutations in cancer predisposition genes. While the major contributory genes for these syndromes have been identified and well-studied (BRCA1/ BRCA2 for HBOC and MSH2/MSH6/MLH1/PMS2/ EPCAM for LS), there remains a large percentage of associated cancer cases that are negative for germline mutations in these genes, including 80% of women with a personal or family history of breast cancer who are negative for BRCA1/2 mutations [1] . Similarly, between 30 and 50% of families fulfill stringent criteria for LS and test negative for germline mismatch repair gene mutations [2] . Adding complexity to these disorders is the significant overlap in the spectrum of cancers observed between various hereditary cancer syndromes, including many cancer susceptibility syndromes. Some that contribute to elevated breast cancer risk include Li-Frau...
    How do MAK-4 and MAK-5 exert their antioxidant properties?
    Hybrid F1 mice were injected with urethane (300 mg/kg) at 8 days of age. A group was then put on a MAK-supplemented diet, another group was fed a standard pellet diet. At 36 weeks of age the mice were sacrificed and the livers examined for the presence of tumors mouse (Panel A) and for the number of nodules per mouse (Panel B) (* p < 0.05, ** P < 0.001). Statistical analysis was performed by Two Way ANOVA Test followed by Post Hoc Bonferroni analysis.

    We than measured the influence of the MAK-4+5 combination on the expression of the three liver-specific connexins (cx26, cx32, and cx43). The level of cx26 expression was similar in all the groups of mice treated with the MAK-supplemented diet and in the control (Figure 4, Panel A) . A significant, time-dependent increase in cx32 was observed in the liver of all the groups of MAK treated mice compared to the normal diet-fed controls. Cx32 expression increased 2-fold after 1 week of treatment, and 3-to 4-fold at 3 months (Figure 4, Pane...
    What are the primary indications for a decompressive craniectomy, and what role does neurocritical care play in determining the suitability of a patient for this procedure?
    Decompressive craniectomy is a valid neurosurgical strategy now a day as an alternative to control an elevated intracranial pressure (ICP) and controlling the risk of uncal and/or subfalcine herniation, in refractory cases to the postural, ventilator, and pharmacological measures to control it. The neurocritical care and the ICP monitorization are key determinants to identify and postulate the inclusion criteria to consider a patient as candidate to this procedure, as it is always considered a rescue surgical technique. Head trauma and ischemic or hemorrhagic cerebrovascular disease with progressive deterioration due to mass effect are some of the cases that may require a decompressive craniectomy with its different variants. However, this procedure per se can have complications described in the postcraniectomy syndrome and may occur in short, medium, or even long term.

    1,2 The paradoxical herniation is a condition in which there is a deviation of the midline with mass effect, even t...
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 8,
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • prompts: {'question': 'task: search result | query: ', 'passage_text': 'title: none | text: '}
  • batch_sampler: no_duplicates
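
The `prompts` entry maps each dataset column to a string that is prepended to the text before tokenization during training. The sketch below shows the resulting model input; `apply_prompt` is a hypothetical helper for illustration. At inference time, `encode_query` and `encode_document` are expected to apply the model's stored query and document prompts in the same way, though that is an assumption about the saved configuration rather than something stated in this card.

```python
# Prompt strings copied from the hyperparameters above.
PROMPTS = {
    "question": "task: search result | query: ",
    "passage_text": "title: none | text: ",
}

def apply_prompt(column, text):
    """Prepend the column-specific prompt to the raw text (hypothetical helper)."""
    return PROMPTS[column] + text

example = apply_prompt("question", "How do MAK-4 and MAK-5 exert their antioxidant properties?")
print(example)
```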

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: {'question': 'task: search result | query: ', 'passage_text': 'title: none | text: '}
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss miriad-eval-1kq-31kd_cosine_ndcg@10 miriad-test-1kq-31kd_cosine_ndcg@10
-1 -1 - - 0.8474 0.8340
0.0256 20 0.1019 - - -
0.0512 40 0.0444 - - -
0.0767 60 0.0408 - - -
0.1023 80 0.0462 - - -
0.1279 100 0.0542 0.0525 0.8616 -
0.1535 120 0.0454 - - -
0.1790 140 0.0403 - - -
0.2046 160 0.0463 - - -
0.2302 180 0.0508 - - -
0.2558 200 0.0497 0.0449 0.8643 -
0.2813 220 0.0451 - - -
0.3069 240 0.0445 - - -
0.3325 260 0.0489 - - -
0.3581 280 0.0452 - - -
0.3836 300 0.0461 0.0406 0.8832 -
0.4092 320 0.0415 - - -
0.4348 340 0.04 - - -
0.4604 360 0.0399 - - -
0.4859 380 0.0423 - - -
0.5115 400 0.0352 0.0316 0.8823 -
0.5371 420 0.0408 - - -
0.5627 440 0.0356 - - -
0.5882 460 0.0371 - - -
0.6138 480 0.0276 - - -
0.6394 500 0.028 0.0280 0.8807 -
0.6650 520 0.0302 - - -
0.6905 540 0.0345 - - -
0.7161 560 0.0325 - - -
0.7417 580 0.033 - - -
0.7673 600 0.0314 0.0264 0.8910 -
0.7928 620 0.033 - - -
0.8184 640 0.029 - - -
0.8440 660 0.0396 - - -
0.8696 680 0.0266 - - -
0.8951 700 0.0262 0.0240 0.8968 -
0.9207 720 0.0262 - - -
0.9463 740 0.0327 - - -
0.9719 760 0.0293 - - -
0.9974 780 0.0304 - - -
-1 -1 - - 0.9026 0.8862
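
The Epoch column follows from the hyperparameters: 100,000 samples at a batch size of 128 on a single GPU give 782 optimizer steps, so step 20 corresponds to epoch 0.0256 and step 780 to epoch 0.9974. The sketch below reproduces that arithmetic and the shape of a linear schedule with warmup. The rounding of warmup steps is an assumption, and `epoch_at` and `linear_lr` are hypothetical helpers rather than Trainer functions.

```python
import math

train_samples = 100_000
batch_size = 128          # per_device_train_batch_size, single GPU
warmup_ratio = 0.1
base_lr = 2e-05

steps_per_epoch = math.ceil(train_samples / batch_size)
# Assumption: warmup steps are rounded up from ratio * total steps.
warmup_steps = math.ceil(steps_per_epoch * warmup_ratio)

def epoch_at(step):
    """Fraction of the epoch completed after `step` optimizer steps."""
    return step / steps_per_epoch

def linear_lr(step):
    """Linear warmup to base_lr, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * (steps_per_epoch - step) / (steps_per_epoch - warmup_steps)

print(steps_per_epoch, warmup_steps)
print(round(epoch_at(20), 4), round(epoch_at(780), 4))
```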

Environmental Impact

Carbon emissions were measured using CodeCarbon.

  • Energy Consumed: 0.828 kWh
  • Carbon Emitted: 0.331 kg of CO2
  • Hours Used: 5.520 hours
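
As a rough sanity check on these figures, the average power draw and the implied carbon intensity can be derived from the three reported values. This is plain arithmetic on the numbers above, not output from CodeCarbon.

```python
energy_kwh = 0.828
carbon_kg = 0.331
hours = 5.520

average_power_w = energy_kwh / hours * 1000   # mean draw over the run, in watts
carbon_intensity = carbon_kg / energy_kwh     # kg CO2 per kWh implied by the two figures

print(round(average_power_w), round(carbon_intensity, 3))
```

The result is about 150 W of average draw and roughly 0.4 kg of CO2 per kWh.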

Training Hardware

  • On Cloud: No
  • GPU Model: 1 x NVIDIA GeForce RTX 3090
  • CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
  • RAM Size: 31.78 GB

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 5.2.0.dev0
  • Transformers: 4.56.0.dev0
  • PyTorch: 2.7.1+cu126
  • Accelerate: 1.6.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}