EmbeddingGemma-300m finetuned on the Medical Instruction and RetrIeval Dataset (MIRIAD)

This is a sentence-transformers model finetuned from google/embeddinggemma-300m on the miriad/miriad-4.4M dataset (specifically, the first 100,000 question-passage pairs from tomaarsen/miriad-4.4M-split). It maps sentences and documents to a 768-dimensional dense vector space and is intended for medical information retrieval: searching passages of scientific medical papers (up to 1,024 tokens) with detailed medical questions.

This model was trained using code from our EmbeddingGemma blog post to show how EmbeddingGemma can be finetuned on a specific domain or task for even stronger performance. It is not affiliated with Google.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: google/embeddinggemma-300m
Maximum Sequence Length: 1024 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Training Dataset:
- miriad-4.4_m-split (the first 100,000 samples of the default subset)
Language: en
License: apache-2.0

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (4): Normalize()
)
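Read top to bottom, the modules above form a small pipeline on top of the Gemma3 encoder: mean-pool the token embeddings (prompt tokens included, since include_prompt is True), apply two bias-free Dense layers (768 → 3072 → 768) whose activation is Identity, then L2-normalize. The NumPy sketch below reproduces only that post-encoder arithmetic. It is an illustration under stated assumptions, not the library code: random token embeddings stand in for Gemma3TextModel outputs, and w1 and w2 are hypothetical random matrices rather than the trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padding positions (mask == 0)."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid division by zero
    return summed / counts

def embed(token_embeddings, attention_mask, w1, w2):
    pooled = mean_pool(token_embeddings, attention_mask)  # Pooling: (batch, 768)
    hidden = pooled @ w1.T                                # Dense 768 -> 3072, no bias, Identity
    out = hidden @ w2.T                                   # Dense 3072 -> 768, no bias, Identity
    return out / np.linalg.norm(out, axis=1, keepdims=True)  # Normalize()

# Stand-in inputs: 2 sequences of 5 tokens; the second ends with 2 padding positions.
tokens = rng.normal(size=(2, 5, 768)).astype(np.float32)
mask = np.array([[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]])
# Hypothetical random weights, stored like torch.nn.Linear as (out_features, in_features).
w1 = rng.normal(scale=0.02, size=(3072, 768)).astype(np.float32)
w2 = rng.normal(scale=0.02, size=(768, 3072)).astype(np.float32)

embeddings = embed(tokens, mask, w1, w2)
print(embeddings.shape)                    # (2, 768)
print(np.linalg.norm(embeddings, axis=1))  # approximately [1. 1.]
```

Because neither Dense layer has a bias or a nonlinearity, the pair is mathematically a single 768 × 768 linear map (w2 @ w1). The final Normalize() is what makes cosine similarity and dot product interchangeable for this model's embeddings.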

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence-transformers/embeddinggemma-300m-medical")
# Run inference
queries = [
    "What are some potential limitations in projecting the future demand for joint replacement surgeries?\n",
]
documents = [
    # Each long passage is split across lines using implicit string concatenation.
    "We also asked whether current trends are advancing according to earlier expectations [6] .\n\n Our study has several limitations. Our projections are based on the historical growth trajectory of joint replacement surgeries, and do not take into account potential limitations in the availability of surgeons or limited economic resources by private and public payers and hospitals in the future. For example, a shortage in the number of surgeons will have a substantial influence on the actual number of procedures that are performed. We also have not incorporated the potential for future alternative technologies, such as cartilage regeneration or tissue engineering, or drug therapies that limit the progression of joint diseases, which may preempt the need for TJR. We were also unable to account for the potential impact of changes in economy, which may place additional economic burden on patients to pay substantial out-of pocket expenses for these procedures, depending on their insurance coverage. Our study also did not consider potential changes in healthcare policies, such as adoption of volume standards or regionalization of TJR to high volume centers [5] , which could limit the access to care and decrease the future demand. The above economic, policy, and scientific factors cannot be readily incorporated in the statistical model. Our study was also focused on the procedural trends in the U.S.; followup research may include an analysis of trends in other countries, though the availability of historical TJR trends in other countries may be limited. Nonetheless, these limitations in no way diminish the importance of conducting and regularly updating surgical projections to help guide future research, surgeon training, and public health policy decisions. Our study also incorporated a more conservative projection, which relied only on the future changes in population growth, while maintaining current rates of adoption of TJR. "
        "Despite these limitations, our current findings are expected to have implications in the private coverage and reimbursement of joint replacement procedures in the future, as patients less than 65 years of age are not typically covered by Medicare, which today funds the majority of total joint replacement procedures in the United States.\n\n We found the relative size of the young patient population for TJR has grown between 1993 and 2006. While 25% to 32% of primary or revision TJRs were performed in patients less than 65 years old in 1993, these proportions have increased to 40% to 46% in the most recent NIS data. The increasing trend in younger patients undergoing TJR has also been reported for different, but partly overlapping, historical periods. For example, Jain et al. reported that the proportion of primary TKA patients aged less than 60 years increased from 12.5% to 19.5% (+56%) between 1990-1993 and 1998-2000 [4] . In addition, for patients aged under 70 years, the proportion increased by 9% from 45.6% to 49.6%. Due to the difference in the stratification by age categories, we were unable to make a direct comparison with the data by Jain et al. [4] . However, our findings that the historical volume of TJR procedures in the younger patient population have been increasing is consistent with these previously reported trends.\n\n While we previously forecasted an increase in demand for primary hip and knee replacement in 2030 by 174% and 673% [6] , respectively, the current study underscores the contribution that young patients are expected to play in the Fig. 2A -B Historical incidence of primary total hip arthroplasty (A) and primary total knee arthroplasty (B) from 1993-2006, superimposed with previous projections [6] , and the updated projections from the current study. The dotted lines represent the 95% CI for the projections.\n\n future utilization of primary TJR surgery, if historical trends in prevalence continue into the future. "
        "The statistical modeling approach we have employed in the current and previous study fits a multivariate but linear Poisson regression model to the historical prevalence of TJR procedures. However, because the size of the population subgroups is free to change nonlinearly in the future based on the Census Bureau's projection, the actual projected incidence of surgical demand is therefore not constrained to be a linear function over time. The demand for primary hip and knee arthroplasty between 2004 and 2006 generally exceeded our previous projections, which employed an identical methodology. However, we are unable to judge, based on the limited window of new data for validation, whether a more complex modeling approach would provide a more reliable forecast of demand for surgical procedures.\n\n Our previous methodology provided a reasonable shortterm forecast of the demand for revision hip and knee surgeries between 2004 and 2006. In particular, for 2006, we observed a slight decrease in the estimated number of primary THA and TKA procedures compared to 2005 (Fig. 2 ), but this decrease fell within the uncertainty of the estimates.",
    '23 In cases of splenic B-cell lymphomas that do not fulfill the World Health Organization 2008 criteria for better established or provisional entities, a diagnosis of splenic B-cell lymphoma/leukemia unclassifiable should be preferred.\n\n Differentiating SMZL from lymphoplasmacytic lymphoma (LPL) may be challenging, particularly on BM biopsy, because SMZL may show a monoclonal serum component and plasmacytic morphology, and both entities lack a distinct phenotype. LPL, which develops primarily in the spleen, homogeneously infiltrates the white pulp without MZ pattern and without monocytoid B cells. MYD88 L265P mutation, present in almost all cases of LPL and rare in SMZL, may be a useful diagnostic tool. 25 A further diagnostic pitfall may be represented by detection of a BM clonal infiltrate in cases of non-CLL monoclonal B lymphocytosis. 26 Finally, secondary splenic localization of EMZL presents a pattern that overlaps with that of SMZL, but clinical dissemination is crucial for differentiation. Splenic involvement virtually excludes a diagnosis of nodal MZL; apart from the differential expression of IRTA1, which is negative in SMZL, 11, 22 clinical correlation is critical for reaching a correct diagnosis when dealing with a BM biopsy.\n\n The cellular origin of SMZL is still debated, and its identification is essential to correctly classify this lymphoma and to elucidate its pathobiology. According to the World Health Organization classification, the postulated normal counterpart of SMZL is a B cell of unknown differentiation stage. 11 According to studies of Ig gene rearrangements, a derivation from antigen-experienced B cells has been postulated in the \n\n -, ,25% of cases; -/1, 25%-50% of cases; 1/-, 50%-75% of cases; 1, .75% of cases.\n\n FL, follicular lymphoma; NMZL, nodal marginal zone lymphoma; SDRPL, splenic diffuse red pulp lymphoma.\n\n *Sporadic cases reported.\n\n vast majority of SMZL. '
        '[27] [28] [29] Skewing of the Ig gene repertoire toward the use of the IGHV1-2*04 allele in SMZL suggests that they could derive from a progenitor population adapted in the spleen to particular antigenic challenges, although definitive answers on the issue of the cell of origin of SMZL will admittedly be provided only through multidisciplinary examination of the immune repertoire and transcriptome of normal B-cell populations of the spleen compartments.\n\n The contribution of antigen stimulation to SMZL pathogenesis is suggested by the highly restricted Ig gene repertoire, including stereotyped configuration of the B-cell receptor (BCR) in ;10% of cases 30 and selective usage of the Ig heavy chain variable IGHV1-2*04 allele in ;30%.\n\n 31 Although the epitope recognized by IGHV1-2*04-expressing BCR is unknown, the features of IGHV1-2*04 rearrangements, including minimal somatic mutations and the long complementarity-determining region 3 sequence with common motifs, suggest a possible selection of T-cell-independent MZ B cells by superantigens and thus a role of antigenic drive in lymphomagenesis.\n\n Cytogenetic and genetic lesions SMZL lacks recurrent chromosomal translocations, including translocations that are typical of other lymphoma types such as the t(14;18) translocation affecting BCL2 in follicular lymphoma, the t(11;14) translocation affecting CCND1 in MCL, and the t(11;18), t(14;18), and t(1;14) translocations affecting the BIRC3/MALT1, MALT1, and BCL10 genes, respectively, in EMZL. The lack of these abnormalities may help distinguish SMZL from pathologically mimicking tumors. '
        'Approximately 30% of SMZL show hemizygous 7q deletion, which is also frequently seen in splenic B-cell lymphoma/leukemia unclassifiable, but rarely in other lymphoma subtypes.\n\n 32,33 The gene(s) targeted by the 7q deletion remain obscure despite the combined investigation of genomic and transcriptomic profiles and mutation analysis of a number of candidate genes.\n\n Unbiased genomic studies have unraveled the typical coding genome of SMZL. [37] [38] [39] [40] [41] [42] [43] However, because of the limited number of SMZL genomes and/or exomes available so far, the full spectrum of lesions that contribute to the malignant transformation of SMZL remains unknown.',
    'The chronic inflammation of rheumatoid arthritis mainly affects the synovial membranes of multiple joints and potentially involves vasculitis and pulmonary, ocular and cardiovascular systems. After the onset of the inflammation, the synovium changes dramatically (Edwards, 1998) . The synovial intima is filled with B-lymphocytes engaged in antibody production against unknown antigens (Bläß, Engel, & Burmester, 1999) . Infiltrations of plasma cells into the synovia are highly associated with inflammation of rheumatoid arthritis (Dong, Li, Liu, & Zhu, 2009; Reparon-Schuijt et al., 1998) . The resulting immune complexes activate macrophages and complement and drive a T-cell dependent antibody production in the synovial tissue. The immune complexes are mainly rheumatoid factors that are defined as auto-antibodies against Fc-fragments of IgG (Tighe & Carson, 2001) and occur in about 90% of rheumatoid arthritis patients (Dörner, Egerer, Feist, & Burmester, 2004) . Normally\n\n Rheumatoid arthritis is a desastrous progressive autoimmune disease for which no causative cure is available, simply because the eliciting antigens are unknown despite intesive research efforts. Most patients have also Rheumatiod factor activity where antibodies bind to their own structures within the constant region. Here we considered, wether mutations in the constant regions of immunoglobulins could represent the eliciting antigens.\n\n rheumatoid factors bind to an antibody-antigen complex and facilitate clearance by binding to Fcreceptors, fixation of complement and antigen processing by B-lymphocytes (Carson, Chen, & Kipps, 1991) . The rheumatoid factor binding site resides in CH 2 -CH 3 domain of Fc (Artandi, Calame, Morrison, & Bonagura, 1992; Bonagura et al., 1998; Sutton et al., 1998) . '
        'However, rheumatoid factors are also found in other conditions of B-cell hyperreactivity.\n\n The driving force for autoimmune diseases are self-reactive antibodies directed against "altered self" which can be modified proteins (Trouw, Huizinga, & Toes, 2013) . So far posttranslational modifications have been detected in citrullinated antigens that are highly specific for rheumatoid arthritis. Citrulline residues arise from arginine by peptidyl arginine deiminase. However, this posttranslational modification cannot fully explain the pathogenesis of rheumatoid arthritis (Klareskog, Amara, & Malmström, 2014) .\n\n Changes of IgG glycosylation in the IgG were also thought to be involved in rheumatoid arthritis (Parekh et al., 1985) , but recent studies showed that the glycosylation loci are not associated with rheumatoid arthritis (Yarwood et al., 2016) .\n\n Other modifications include oxidized IgG that are recognized by circulating lymphocytes leading to a proliferative response and secrete IL-2 (Grinnell, Yoshida, & Jasin, 2005) . IgG is also covalently cross linked by reactive oxygen and nitric oxide products secreted by inflammatory cells (Uesugi, Hayashi, & Jasin, 1998) .\n\n IgG has long been implicated in the pathogenesis of rheumatoid arthritis. When immune complexes from synovial fluids of patients with rheumatoid arthritis were analyzed for their constituents, mainly IgG and IgM antibodies were found (Male & Roitt, 1981) . They did not contain antibodies with rheumatoid factor specificity and a structural alteration of the IgG was considered as a cause for antigenicity (Carter, Makh, Ponsford, & Elson, 1989) . Sutton, Corper, Bonagura, and Taussig (2000) suggested that rheumatoid factors bind Fc-region and foreign antigen antigens simultaneously and the affinity is potentiated by somatic mutation. '
        'Indeed, Fc-binding antibodies from rheumatoid arthritis synovial fluids show imprints of an antigen-dependent process of somatic hypermutation and clonal selection in the variable regions of the L-and H-chains (Van Esch et al., 2003) . It is clear that the synovium of patients with rheumatoid arthritis is prone to mutations (Firestein, 2010) and several multi-evidence genes in genome wide studies have been identified (Whitaker et al., 2015) .',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 768) (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.7784, -0.0542,  0.0875]])
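Because every embedding has unit length after Normalize(), the cosine scores from model.similarity equal plain dot products, and retrieval amounts to sorting documents by score. The sketch below illustrates both points with NumPy; cosine_scores and top_k are hypothetical helpers, not Sentence Transformers APIs, and the score matrix is copied from the output printed above. With the real model, query_embeddings and document_embeddings could be passed in place of the random unit vectors.

```python
import numpy as np

def cosine_scores(query_embs, doc_embs):
    """Cosine similarity matrix; for unit-length inputs this equals query_embs @ doc_embs.T."""
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    return q @ d.T

def top_k(scores, k):
    """Indices of the k highest-scoring documents for each query, best first."""
    return np.argsort(-scores, axis=1)[:, :k]

# Scores copied from the example output above (1 query x 3 documents).
scores = np.array([[0.7784, -0.0542, 0.0875]])
print(top_k(scores, k=3))  # [[0 2 1]]: the joint-replacement passage ranks first

# With unit-length vectors, cosine similarity and dot product agree.
rng = np.random.default_rng(0)
a = rng.normal(size=(2, 8))
a /= np.linalg.norm(a, axis=1, keepdims=True)
b = rng.normal(size=(3, 8))
b /= np.linalg.norm(b, axis=1, keepdims=True)
print(np.allclose(cosine_scores(a, b), a @ b.T))  # True
```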

Evaluation

Metrics

Information Retrieval

Datasets: miriad-eval-1kq-31kd and miriad-test-1kq-31kd. Each uses 1k queries (from the eval or test split, respectively) searched against a 31k-passage corpus: the 1k eval or test passages plus 30k passages from the train split.
Evaluated with InformationRetrievalEvaluator

Metric	miriad-eval-1kq-31kd	miriad-test-1kq-31kd
cosine_accuracy@1	0.822	0.802
cosine_accuracy@3	0.926	0.907
cosine_accuracy@5	0.945	0.942
cosine_accuracy@10	0.976	0.963
cosine_precision@1	0.822	0.802
cosine_precision@3	0.3087	0.3023
cosine_precision@5	0.189	0.1884
cosine_precision@10	0.0976	0.0963
cosine_recall@1	0.822	0.802
cosine_recall@3	0.926	0.907
cosine_recall@5	0.945	0.942
cosine_recall@10	0.976	0.963
cosine_ndcg@10	0.9026	0.8862
cosine_mrr@10	0.8788	0.8611
cosine_map@100	0.8797	0.863
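Each query in these benchmarks has exactly one relevant passage, which explains the pattern in the table: accuracy@k and recall@k are identical, and precision@k is accuracy@k divided by k (for example, 0.926 / 3 ≈ 0.3087). The sketch below computes the metrics from the 1-based rank of the relevant passage for each query. It is a simplified re-derivation under that single-relevant-passage assumption, not the InformationRetrievalEvaluator source; the ranks in the example are hypothetical.

```python
import math

def ir_metrics(ranks, k=10):
    """Metrics for queries that each have exactly one relevant passage.

    ranks: 1-based rank of the relevant passage per query (None if not retrieved).
    """
    n = len(ranks)
    hit = [r is not None and r <= k for r in ranks]
    accuracy = sum(hit) / n       # share of queries with the passage in the top k
    recall = accuracy             # one relevant passage, so recall@k == accuracy@k
    precision = accuracy / k      # at most one relevant item among the k results
    mrr = sum(1 / r for r, h in zip(ranks, hit) if h) / n
    # With a single relevant passage the ideal DCG is 1 / log2(2) = 1.
    ndcg = sum(1 / math.log2(r + 1) for r, h in zip(ranks, hit) if h) / n
    return {"accuracy": accuracy, "recall": recall, "precision": precision,
            "mrr": mrr, "ndcg": ndcg}

# Hypothetical ranks for four queries; the last one was not retrieved at all.
print(ir_metrics([1, 2, 4, None], k=3))

# Cross-check against the eval column of the table above.
print(round(0.926 / 3, 4), round(0.976 / 10, 4))  # 0.3087 0.0976
```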

Training Details

Training Dataset

miriad-4.4_m-split

Dataset: miriad-4.4_m-split at 596b9ab
Size: 100,000 training samples
Columns: question and passage_text
Approximate statistics based on the first 1000 samples:
	question	passage_text
type	string	string
details	min: 7 tokens mean: 20.79 tokens max: 60 tokens	min: 481 tokens mean: 945.6 tokens max: 1024 tokens

Samples:

question	passage_text
`What factors may contribute to increased pulmonary conduit durability in patients who undergo the Ross operation compared to those with right ventricular outflow tract obstruction?`	I n 1966, Ross and Somerville 1 reported the first use of an aortic homograft to establish right ventricle-to-pulmonary artery continuity in a patient with tetralogy of Fallot and pulmonary atresia. Since that time, pulmonary position homografts have been used in a variety of right-sided congenital heart lesions. Actuarial 5-year homograft survivals for cryopreserved homografts are reported to range between 55% and 94%, with the shortest durability noted in patients less than 2 years of age. 4 Pulmonary position homografts also are used to replace pulmonary autografts explanted to repair left-sided outflow disease (the Ross operation). Several factors may be likely to favor increased pulmonary conduit durability in Ross patients compared with those with right ventricular outflow tract obstruction, including later age at operation (allowing for larger homografts), more normal pulmonary artery architecture, absence of severe right ventricular hypertrophy, and more natural positioning of ...
`How does MCAM expression in hMSC affect the growth and maintenance of hematopoietic progenitors?`	After culture in a 3-dimensional hydrogel-based matrix, which constitutes hypoxic conditions, MCAM expression is lost. Concordantly, Tormin et al. demonstrated that MCAM is down-regulated under hypoxic conditions. 10 Furthermore, it was shown by others and our group that oxygen tension causes selective modification of hematopoietic cell and mesenchymal stromal cell interactions in co-culture systems as well as influence HSPC metabolism. [44] [45] [46] Thus, the observed differences between Sharma et al. and our data in HSPC supporting capacity of hMSC are likely due to the different culture conditions used. Further studies are required to clarify the influence of hypoxia in our model system. Altogether these findings provide further evidence for the importance of MCAM in supporting HSPC. Furthermore, previous reports have shown that MCAM is down-regulated in MSC after several passages as well as during aging and differentiation. 19, 47 Interestingly, MCAM overexpression in hMSC enhance...
`What is the relationship between Fanconi anemia and breast and ovarian cancer susceptibility genes?`	( 31 ) , of which 5% -10 % may be caused by genetic factors ( 32 ) , up to half a million of these patients may be at risk of secondary hereditary neoplasms. The historic observation of twofold to fi vefold increased risks of cancers of the ovary, thyroid, and connective tissue after breast cancer ( 33 ) presaged the later syndromic association of these tumors with inherited mutations of BRCA1, BRCA2, PTEN, and p53 ( 16 ) . By far the largest cumulative risk of a secondary cancer in BRCA mutation carriers is associated with cancer in the contralateral breast, which may reach a risk of 29.5% at 10 years ( 34 ) . The Breast Cancer Linkage Consortium ( 35 , 36 ) also documented threefold to fi vefold increased risks of subsequent cancers of prostate, pancreas, gallbladder, stomach, skin (melanoma), and uterus in BRCA2 mutation carriers and twofold increased risks of prostate and pancreas cancer in BRCA1 mutation carriers; these results are based largely on self-reported family history inf...

Loss: CachedMultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "mini_batch_size": 8,
    "gather_across_devices": false
}
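CachedMultipleNegativesRankingLoss optimizes the same objective as MultipleNegativesRankingLoss: for each question, its paired passage is the positive and every other passage in the batch serves as a negative, and the loss is cross-entropy over the cosine similarities multiplied by scale (20.0 here). The cached variant, based on the Gao et al. work cited below, changes how gradients are computed (in chunks of mini_batch_size = 8) so that large batches of 128 pairs fit in memory. The NumPy sketch below computes only the loss value, with no gradients or caching, and is an illustration rather than the library implementation.

```python
import numpy as np

def mnrl_loss(questions, passages, scale=20.0):
    """Loss value only: question i should score highest against passage i."""
    q = questions / np.linalg.norm(questions, axis=1, keepdims=True)
    p = passages / np.linalg.norm(passages, axis=1, keepdims=True)
    logits = scale * (q @ p.T)                            # (batch, batch) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))            # positives sit on the diagonal

rng = np.random.default_rng(0)
questions = rng.normal(size=(4, 16))
aligned = questions + 0.01 * rng.normal(size=(4, 16))  # passage i is close to question i
shuffled = aligned[[1, 2, 3, 0]]                        # passage i now belongs to another question
print(mnrl_loss(questions, aligned) < mnrl_loss(questions, shuffled))  # True
```

Every off-diagonal passage acts as a negative, which is why duplicate texts inside one batch would be harmful: a duplicate of the positive would be scored as a negative.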

Evaluation Dataset

miriad-4.4_m-split

Dataset: miriad-4.4_m-split at 596b9ab
Size: 1,000 evaluation samples
Columns: question and passage_text
Approximate statistics based on the first 1000 samples:
	question	passage_text
type	string	string
details	min: 7 tokens mean: 20.91 tokens max: 61 tokens	min: 465 tokens mean: 943.1 tokens max: 1024 tokens

Samples:

question	passage_text
`What are some hereditary cancer syndromes that can result in various forms of cancer?`	Hereditary Cancer Syndromes, including Hereditary Breast and Ovarian Cancer (HBOC) and Lynch Syndrome (LS), can result in various forms of cancer due to germline mutations in cancer predisposition genes. While the major contributory genes for these syndromes have been identified and well-studied (BRCA1/ BRCA2 for HBOC and MSH2/MSH6/MLH1/PMS2/ EPCAM for LS), there remains a large percentage of associated cancer cases that are negative for germline mutations in these genes, including 80% of women with a personal or family history of breast cancer who are negative for BRCA1/2 mutations [1] . Similarly, between 30 and 50% of families fulfill stringent criteria for LS and test negative for germline mismatch repair gene mutations [2] . Adding complexity to these disorders is the significant overlap in the spectrum of cancers observed between various hereditary cancer syndromes, including many cancer susceptibility syndromes. Some that contribute to elevated breast cancer risk include Li-Frau...
`How do MAK-4 and MAK-5 exert their antioxidant properties?`	Hybrid F1 mice were injected with urethane (300 mg/kg) at 8 days of age. A group was then put on a MAK-supplemented diet, another group was fed a standard pellet diet. At 36 weeks of age the mice were sacrificed and the livers examined for the presence of tumors mouse (Panel A) and for the number of nodules per mouse (Panel B) (* p < 0.05, ** P < 0.001). Statistical analysis was performed by Two Way ANOVA Test followed by Post Hoc Bonferroni analysis. We than measured the influence of the MAK-4+5 combination on the expression of the three liver-specific connexins (cx26, cx32, and cx43). The level of cx26 expression was similar in all the groups of mice treated with the MAK-supplemented diet and in the control (Figure 4, Panel A) . A significant, time-dependent increase in cx32 was observed in the liver of all the groups of MAK treated mice compared to the normal diet-fed controls. Cx32 expression increased 2-fold after 1 week of treatment, and 3-to 4-fold at 3 months (Figure 4, Pane...
`What are the primary indications for a decompressive craniectomy, and what role does neurocritical care play in determining the suitability of a patient for this procedure?`	Decompressive craniectomy is a valid neurosurgical strategy now a day as an alternative to control an elevated intracranial pressure (ICP) and controlling the risk of uncal and/or subfalcine herniation, in refractory cases to the postural, ventilator, and pharmacological measures to control it. The neurocritical care and the ICP monitorization are key determinants to identify and postulate the inclusion criteria to consider a patient as candidate to this procedure, as it is always considered a rescue surgical technique. Head trauma and ischemic or hemorrhagic cerebrovascular disease with progressive deterioration due to mass effect are some of the cases that may require a decompressive craniectomy with its different variants. However, this procedure per se can have complications described in the postcraniectomy syndrome and may occur in short, medium, or even long term. 1,2 The paradoxical herniation is a condition in which there is a deviation of the midline with mass effect, even t...

Loss: CachedMultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "mini_batch_size": 8,
    "gather_across_devices": false
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 128
per_device_eval_batch_size: 128
learning_rate: 2e-05
num_train_epochs: 1
warmup_ratio: 0.1
fp16: True
prompts: {'question': 'task: search result | query: ', 'passage_text': 'title: none | text: '}
batch_sampler: no_duplicates
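The prompts setting maps each dataset column to a text prefix, so during training every question was encoded as "task: search result | query: " followed by the question, and every passage as "title: none | text: " followed by the passage. At inference time, encode_query and encode_document in the usage example are expected to apply the matching query and document prompts stored with the model, assuming the saved configuration keeps EmbeddingGemma's prompts; when calling model.encode directly, the prefix can be supplied through its prompt argument. The helper below only illustrates the string construction, and its name is hypothetical.

```python
# Prefixes copied from the `prompts` training argument above, keyed by column name.
PROMPTS = {
    "question": "task: search result | query: ",
    "passage_text": "title: none | text: ",
}

def add_prompt(column, texts):
    """Hypothetical helper: prepend the prompt for `column` to every text."""
    prefix = PROMPTS[column]
    return [prefix + text for text in texts]

example = add_prompt("question", ["How does MCAM expression in hMSC affect the growth and maintenance of hematopoietic progenitors?"])
print(example[0])
# task: search result | query: How does MCAM expression in hMSC affect the growth and maintenance of hematopoietic progenitors?
```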

All Hyperparameters


overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 128
per_device_eval_batch_size: 128
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: {'question': 'task: search result | query: ', 'passage_text': 'title: none | text: '}
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}
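With 100,000 training pairs, a per-device batch size of 128, a single GPU, no gradient accumulation and dataloader_drop_last set to False, one epoch takes ceil(100,000 / 128) = 782 optimizer steps. Combined with warmup_ratio 0.1 and the linear scheduler, that gives ceil(0.1 × 782) = 79 warmup steps, assuming the trainer rounds the warmup ratio up, as recent transformers versions do. The function below sketches the resulting learning-rate curve using the formula of the linear-with-warmup schedule; it is an illustration, not the trainer code.

```python
import math

def linear_warmup_lr(step, total_steps, warmup_steps, peak_lr=2e-5):
    """Learning rate at `step`: linear warmup to peak_lr, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

total_steps = math.ceil(100_000 / 128)        # 782 steps for one epoch
warmup_steps = math.ceil(0.1 * total_steps)   # 79 warmup steps
for step in (0, 40, warmup_steps, 400, total_steps):
    print(step, f"{linear_warmup_lr(step, total_steps, warmup_steps):.2e}")
```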

Training Logs

Epoch	Step	Training Loss	Validation Loss	miriad-eval-1kq-31kd_cosine_ndcg@10	miriad-test-1kq-31kd_cosine_ndcg@10
-1	-1	-	-	0.8474	0.8340
0.0256	20	0.1019	-	-	-
0.0512	40	0.0444	-	-	-
0.0767	60	0.0408	-	-	-
0.1023	80	0.0462	-	-	-
0.1279	100	0.0542	0.0525	0.8616	-
0.1535	120	0.0454	-	-	-
0.1790	140	0.0403	-	-	-
0.2046	160	0.0463	-	-	-
0.2302	180	0.0508	-	-	-
0.2558	200	0.0497	0.0449	0.8643	-
0.2813	220	0.0451	-	-	-
0.3069	240	0.0445	-	-	-
0.3325	260	0.0489	-	-	-
0.3581	280	0.0452	-	-	-
0.3836	300	0.0461	0.0406	0.8832	-
0.4092	320	0.0415	-	-	-
0.4348	340	0.04	-	-	-
0.4604	360	0.0399	-	-	-
0.4859	380	0.0423	-	-	-
0.5115	400	0.0352	0.0316	0.8823	-
0.5371	420	0.0408	-	-	-
0.5627	440	0.0356	-	-	-
0.5882	460	0.0371	-	-	-
0.6138	480	0.0276	-	-	-
0.6394	500	0.028	0.0280	0.8807	-
0.6650	520	0.0302	-	-	-
0.6905	540	0.0345	-	-	-
0.7161	560	0.0325	-	-	-
0.7417	580	0.033	-	-	-
0.7673	600	0.0314	0.0264	0.8910	-
0.7928	620	0.033	-	-	-
0.8184	640	0.029	-	-	-
0.8440	660	0.0396	-	-	-
0.8696	680	0.0266	-	-	-
0.8951	700	0.0262	0.0240	0.8968	-
0.9207	720	0.0262	-	-	-
0.9463	740	0.0327	-	-	-
0.9719	760	0.0293	-	-	-
0.9974	780	0.0304	-	-	-
-1	-1	-	-	0.9026	0.8862
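The Epoch column is the optimizer step divided by the number of steps per epoch. Assuming 782 steps per epoch (ceil(100,000 / 128), as the dataset size and batch size imply), the logged values can be reproduced exactly. The rows with -1 in both the Epoch and Step columns appear to be evaluations run outside the training loop, before training starts and after it ends; the final row matches the nDCG@10 values in the evaluation table.

```python
import math

steps_per_epoch = math.ceil(100_000 / 128)  # 782, assuming the last partial batch is kept

# (step, epoch) pairs copied from the training log above.
logged = [(20, 0.0256), (100, 0.1279), (400, 0.5115), (780, 0.9974)]
for step, epoch in logged:
    print(step, round(step / steps_per_epoch, 4), epoch)
```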

Environmental Impact

Carbon emissions were measured using CodeCarbon.

Energy Consumed: 0.828 kWh
Carbon Emitted: 0.331 kg of CO2
Hours Used: 5.520 hours
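The three reported figures imply two derived quantities: an average power draw of about 0.15 kW over the run and a carbon intensity of roughly 0.40 kg CO2 per kWh. Both are simple ratios of the numbers above, computed here for illustration; they are not separate CodeCarbon outputs.

```python
energy_kwh = 0.828   # Energy Consumed
co2_kg = 0.331       # Carbon Emitted
hours = 5.520        # Hours Used

avg_power_kw = energy_kwh / hours   # mean power over the run
intensity = co2_kg / energy_kwh     # kg CO2 per kWh implied by the two totals
print(f"{avg_power_kw:.3f} kW, {intensity:.3f} kg CO2/kWh")  # 0.150 kW, 0.400 kg CO2/kWh
```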

Training Hardware

On Cloud: No
GPU Model: 1 x NVIDIA GeForce RTX 3090
CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
RAM Size: 31.78 GB

Framework Versions

Python: 3.11.6
Sentence Transformers: 5.2.0.dev0
Transformers: 4.56.0.dev0
PyTorch: 2.7.1+cu126
Accelerate: 1.6.0
Datasets: 3.6.0
Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
