EmbeddingGemma-300m finetuned on the Medical Instruction and RetrIeval Dataset (MIRIAD)
This is a sentence-transformers model finetuned from google/embeddinggemma-300m on the miriad/miriad-4.4M dataset (specifically the first 100,000 question-passage pairs from tomaarsen/miriad-4.4M-split). It maps sentences and documents to a 768-dimensional dense vector space and can be used for medical information retrieval. It is specifically designed for searching passages (up to 1k tokens) of scientific medical papers using detailed medical questions.
This model has been trained using code from our EmbeddingGemma blogpost to showcase how the EmbeddingGemma model can be finetuned on specific domains/tasks for even stronger performance. It is not affiliated with Google.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: google/embeddinggemma-300m
- Maximum Sequence Length: 1024 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset:
- miriad-4.4_m-split (the first 100,000 samples of the default subset)
- Language: en
- License: apache-2.0
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
(3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
(4): Normalize()
)
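The module list above can be read as a small pipeline: mean pooling over token vectors, two bias-free linear projections (768 to 3072, then 3072 to 768) with identity activation, and L2 normalization. The following is a minimal pure-Python sketch of that data flow, not the library's implementation. The function names, the random weights, and the toy sizes (4 and 16 instead of 768 and 3072) are illustrative assumptions.

```python
import math
import random


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_vectors[0])
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def linear_no_bias(vector, weight):
    """Bias-free linear layer with identity activation; weight is (out, in)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]


def l2_normalize(vector):
    """Scale the vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def embed(token_vectors, attention_mask, w_up, w_down):
    """Pooling -> Dense (up) -> Dense (down) -> Normalize, as in the module list."""
    pooled = mean_pool(token_vectors, attention_mask)
    hidden = linear_no_bias(pooled, w_up)
    projected = linear_no_bias(hidden, w_down)
    return l2_normalize(projected)


# Toy sizes (assumption): 4 -> 16 -> 4 stands in for 768 -> 3072 -> 768.
rng = random.Random(0)
dim, hidden_dim = 4, 16
w_up = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(hidden_dim)]
w_down = [[rng.uniform(-1, 1) for _ in range(hidden_dim)] for _ in range(dim)]
tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]
mask = [1, 1, 1, 0, 0]  # the last two positions are padding

embedding = embed(tokens, mask, w_up, w_down)
print(len(embedding), sum(x * x for x in embedding))
```

Because the final step normalizes every embedding, cosine similarity between two outputs reduces to a plain dot product.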
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence-transformers/embeddinggemma-300m-medical")
# Run inference
queries = [
"What are some potential limitations in projecting the future demand for joint replacement surgeries?\n",
]
documents = [
"We also asked whether current trends are advancing according to earlier expectations [6] .\n\n Our study has several limitations. Our projections are based on the historical growth trajectory of joint replacement surgeries, and do not take into account potential limitations in the availability of surgeons or limited economic resources by private and public payers and hospitals in the future. For example, a shortage in the number of surgeons will have a substantial influence on the actual number of procedures that are performed. We also have not incorporated the potential for future alternative technologies, such as cartilage regeneration or tissue engineering, or drug therapies that limit the progression of joint diseases, which may preempt the need for TJR. We were also unable to account for the potential impact of changes in economy, which may place additional economic burden on patients to pay substantial out-of pocket expenses for these procedures, depending on their insurance coverage. Our study also did not consider potential changes in healthcare policies, such as adoption of volume standards or regionalization of TJR to high volume centers [5] , which could limit the access to care and decrease the future demand. The above economic, policy, and scientific factors cannot be readily incorporated in the statistical model. Our study was also focused on the procedural trends in the U.S.; followup research may include an analysis of trends in other countries, though the availability of historical TJR trends in other countries may be limited. Nonetheless, these limitations in no way diminish the importance of conducting and regularly updating surgical projections to help guide future research, surgeon training, and public health policy decisions. Our study also incorporated a more conservative projection, which relied only on the future changes in population growth, while maintaining current rates of adoption of TJR. 
Despite these limitations, our current findings are expected to have implications in the private coverage and reimbursement of joint replacement procedures in the future, as patients less than 65 years of age are not typically covered by Medicare, which today funds the majority of total joint replacement procedures in the United States.\n\n We found the relative size of the young patient population for TJR has grown between 1993 and 2006. While 25% to 32% of primary or revision TJRs were performed in patients less than 65 years old in 1993, these proportions have increased to 40% to 46% in the most recent NIS data. The increasing trend in younger patients undergoing TJR has also been reported for different, but partly overlapping, historical periods. For example, Jain et al. reported that the proportion of primary TKA patients aged less than 60 years increased from 12.5% to 19.5% (+56%) between 1990-1993 and 1998-2000 [4] . In addition, for patients aged under 70 years, the proportion increased by 9% from 45.6% to 49.6%. Due to the difference in the stratification by age categories, we were unable to make a direct comparison with the data by Jain et al. [4] . However, our findings that the historical volume of TJR procedures in the younger patient population have been increasing is consistent with these previously reported trends.\n\n While we previously forecasted an increase in demand for primary hip and knee replacement in 2030 by 174% and 673% [6] , respectively, the current study underscores the contribution that young patients are expected to play in the Fig. 2A -B Historical incidence of primary total hip arthroplasty (A) and primary total knee arthroplasty (B) from 1993-2006, superimposed with previous projections [6] , and the updated projections from the current study. The dotted lines represent the 95% CI for the projections.\n\n future utilization of primary TJR surgery, if historical trends in prevalence continue into the future. 
The statistical modeling approach we have employed in the current and previous study fits a multivariate but linear Poisson regression model to the historical prevalence of TJR procedures. However, because the size of the population subgroups is free to change nonlinearly in the future based on the Census Bureau's projection, the actual projected incidence of surgical demand is therefore not constrained to be a linear function over time. The demand for primary hip and knee arthroplasty between 2004 and 2006 generally exceeded our previous projections, which employed an identical methodology. However, we are unable to judge, based on the limited window of new data for validation, whether a more complex modeling approach would provide a more reliable forecast of demand for surgical procedures.\n\n Our previous methodology provided a reasonable shortterm forecast of the demand for revision hip and knee surgeries between 2004 and 2006. In particular, for 2006, we observed a slight decrease in the estimated number of primary THA and TKA procedures compared to 2005 (Fig. 2 ), but this decrease fell within the uncertainty of the estimates.",
'23 In cases of splenic B-cell lymphomas that do not fulfill the World Health Organization 2008 criteria for better established or provisional entities, a diagnosis of splenic B-cell lymphoma/leukemia unclassifiable should be preferred.\n\n Differentiating SMZL from lymphoplasmacytic lymphoma (LPL) may be challenging, particularly on BM biopsy, because SMZL may show a monoclonal serum component and plasmacytic morphology, and both entities lack a distinct phenotype. LPL, which develops primarily in the spleen, homogeneously infiltrates the white pulp without MZ pattern and without monocytoid B cells. MYD88 L265P mutation, present in almost all cases of LPL and rare in SMZL, may be a useful diagnostic tool. 25 A further diagnostic pitfall may be represented by detection of a BM clonal infiltrate in cases of non-CLL monoclonal B lymphocytosis. 26 Finally, secondary splenic localization of EMZL presents a pattern that overlaps with that of SMZL, but clinical dissemination is crucial for differentiation. Splenic involvement virtually excludes a diagnosis of nodal MZL; apart from the differential expression of IRTA1, which is negative in SMZL, 11, 22 clinical correlation is critical for reaching a correct diagnosis when dealing with a BM biopsy.\n\n The cellular origin of SMZL is still debated, and its identification is essential to correctly classify this lymphoma and to elucidate its pathobiology. According to the World Health Organization classification, the postulated normal counterpart of SMZL is a B cell of unknown differentiation stage. 11 According to studies of Ig gene rearrangements, a derivation from antigen-experienced B cells has been postulated in the \n\n -, ,25% of cases; -/1, 25%-50% of cases; 1/-, 50%-75% of cases; 1, .75% of cases.\n\n FL, follicular lymphoma; NMZL, nodal marginal zone lymphoma; SDRPL, splenic diffuse red pulp lymphoma.\n\n *Sporadic cases reported.\n\n vast majority of SMZL. 
[27] [28] [29] Skewing of the Ig gene repertoire toward the use of the IGHV1-2*04 allele in SMZL suggests that they could derive from a progenitor population adapted in the spleen to particular antigenic challenges, although definitive answers on the issue of the cell of origin of SMZL will admittedly be provided only through multidisciplinary examination of the immune repertoire and transcriptome of normal B-cell populations of the spleen compartments.\n\n The contribution of antigen stimulation to SMZL pathogenesis is suggested by the highly restricted Ig gene repertoire, including stereotyped configuration of the B-cell receptor (BCR) in ;10% of cases 30 and selective usage of the Ig heavy chain variable IGHV1-2*04 allele in ;30%.\n\n 31 Although the epitope recognized by IGHV1-2*04-expressing BCR is unknown, the features of IGHV1-2*04 rearrangements, including minimal somatic mutations and the long complementarity-determining region 3 sequence with common motifs, suggest a possible selection of T-cell-independent MZ B cells by superantigens and thus a role of antigenic drive in lymphomagenesis.\n\n Cytogenetic and genetic lesions SMZL lacks recurrent chromosomal translocations, including translocations that are typical of other lymphoma types such as the t(14;18) translocation affecting BCL2 in follicular lymphoma, the t(11;14) translocation affecting CCND1 in MCL, and the t(11;18), t(14;18), and t(1;14) translocations affecting the BIRC3/MALT1, MALT1, and BCL10 genes, respectively, in EMZL. The lack of these abnormalities may help distinguish SMZL from pathologically mimicking tumors. 
Approximately 30% of SMZL show hemizygous 7q deletion, which is also frequently seen in splenic B-cell lymphoma/leukemia unclassifiable, but rarely in other lymphoma subtypes.\n\n 32,33 The gene(s) targeted by the 7q deletion remain obscure despite the combined investigation of genomic and transcriptomic profiles and mutation analysis of a number of candidate genes.\n\n Unbiased genomic studies have unraveled the typical coding genome of SMZL. [37] [38] [39] [40] [41] [42] [43] However, because of the limited number of SMZL genomes and/or exomes available so far, the full spectrum of lesions that contribute to the malignant transformation of SMZL remains unknown.',
'The chronic inflammation of rheumatoid arthritis mainly affects the synovial membranes of multiple joints and potentially involves vasculitis and pulmonary, ocular and cardiovascular systems. After the onset of the inflammation, the synovium changes dramatically (Edwards, 1998) . The synovial intima is filled with B-lymphocytes engaged in antibody production against unknown antigens (Bläß, Engel, & Burmester, 1999) . Infiltrations of plasma cells into the synovia are highly associated with inflammation of rheumatoid arthritis (Dong, Li, Liu, & Zhu, 2009; Reparon-Schuijt et al., 1998) . The resulting immune complexes activate macrophages and complement and drive a T-cell dependent antibody production in the synovial tissue. The immune complexes are mainly rheumatoid factors that are defined as auto-antibodies against Fc-fragments of IgG (Tighe & Carson, 2001) and occur in about 90% of rheumatoid arthritis patients (Dörner, Egerer, Feist, & Burmester, 2004) . Normally\n\n Rheumatoid arthritis is a desastrous progressive autoimmune disease for which no causative cure is available, simply because the eliciting antigens are unknown despite intesive research efforts. Most patients have also Rheumatiod factor activity where antibodies bind to their own structures within the constant region. Here we considered, wether mutations in the constant regions of immunoglobulins could represent the eliciting antigens.\n\n rheumatoid factors bind to an antibody-antigen complex and facilitate clearance by binding to Fcreceptors, fixation of complement and antigen processing by B-lymphocytes (Carson, Chen, & Kipps, 1991) . The rheumatoid factor binding site resides in CH 2 -CH 3 domain of Fc (Artandi, Calame, Morrison, & Bonagura, 1992; Bonagura et al., 1998; Sutton et al., 1998) . 
However, rheumatoid factors are also found in other conditions of B-cell hyperreactivity.\n\n The driving force for autoimmune diseases are self-reactive antibodies directed against "altered self" which can be modified proteins (Trouw, Huizinga, & Toes, 2013) . So far posttranslational modifications have been detected in citrullinated antigens that are highly specific for rheumatoid arthritis. Citrulline residues arise from arginine by peptidyl arginine deiminase. However, this posttranslational modification cannot fully explain the pathogenesis of rheumatoid arthritis (Klareskog, Amara, & Malmström, 2014) .\n\n Changes of IgG glycosylation in the IgG were also thought to be involved in rheumatoid arthritis (Parekh et al., 1985) , but recent studies showed that the glycosylation loci are not associated with rheumatoid arthritis (Yarwood et al., 2016) .\n\n Other modifications include oxidized IgG that are recognized by circulating lymphocytes leading to a proliferative response and secrete IL-2 (Grinnell, Yoshida, & Jasin, 2005) . IgG is also covalently cross linked by reactive oxygen and nitric oxide products secreted by inflammatory cells (Uesugi, Hayashi, & Jasin, 1998) .\n\n IgG has long been implicated in the pathogenesis of rheumatoid arthritis. When immune complexes from synovial fluids of patients with rheumatoid arthritis were analyzed for their constituents, mainly IgG and IgM antibodies were found (Male & Roitt, 1981) . They did not contain antibodies with rheumatoid factor specificity and a structural alteration of the IgG was considered as a cause for antigenicity (Carter, Makh, Ponsford, & Elson, 1989) . Sutton, Corper, Bonagura, and Taussig (2000) suggested that rheumatoid factors bind Fc-region and foreign antigen antigens simultaneously and the affinity is potentiated by somatic mutation. 
Indeed, Fc-binding antibodies from rheumatoid arthritis synovial fluids show imprints of an antigen-dependent process of somatic hypermutation and clonal selection in the variable regions of the L-and H-chains (Van Esch et al., 2003) . It is clear that the synovium of patients with rheumatoid arthritis is prone to mutations (Firestein, 2010) and several multi-evidence genes in genome wide studies have been identified (Whitaker et al., 2015) .',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 768) (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.7784, -0.0542, 0.0875]])
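For retrieval, the similarity row for a query is typically sorted to get a ranked list of passages. The sketch below does this in plain Python using the three scores printed above; `rank_documents` is a hypothetical helper, not part of Sentence Transformers. With the real output you could pass `similarities[0].tolist()` as the score list.

```python
def rank_documents(similarity_row, top_k=3):
    """Return (document_index, score) pairs, most similar first."""
    ranked = sorted(enumerate(similarity_row), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Scores copied from the example output above (one query, three documents).
scores = [0.7784, -0.0542, 0.0875]
ranking = rank_documents(scores)
for doc_index, score in ranking:
    print(f"document {doc_index}: {score:.4f}")
```

Here the first document, which is the passage the question was written for, ranks well ahead of the two unrelated passages.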
Evaluation
Metrics
Information Retrieval
- Datasets: miriad-eval-1kq-31kd and miriad-test-1kq-31kd, i.e. 1k eval/test queries and 31k passages (of which 1k are eval/test passages and 30k are train passages)
- Evaluated with InformationRetrievalEvaluator
Metric | miriad-eval-1kq-31kd | miriad-test-1kq-31kd |
---|---|---|
cosine_accuracy@1 | 0.822 | 0.802 |
cosine_accuracy@3 | 0.926 | 0.907 |
cosine_accuracy@5 | 0.945 | 0.942 |
cosine_accuracy@10 | 0.976 | 0.963 |
cosine_precision@1 | 0.822 | 0.802 |
cosine_precision@3 | 0.3087 | 0.3023 |
cosine_precision@5 | 0.189 | 0.1884 |
cosine_precision@10 | 0.0976 | 0.0963 |
cosine_recall@1 | 0.822 | 0.802 |
cosine_recall@3 | 0.926 | 0.907 |
cosine_recall@5 | 0.945 | 0.942 |
cosine_recall@10 | 0.976 | 0.963 |
cosine_ndcg@10 | 0.9026 | 0.8862 |
cosine_mrr@10 | 0.8788 | 0.8611 |
cosine_map@100 | 0.8797 | 0.863 |
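In the table, recall@k equals accuracy@k and precision@k equals accuracy@k divided by k (for example 0.926 / 3 ≈ 0.3087). This is what you would expect if every query has exactly one relevant passage. The sketch below shows how the per-query metrics behave under that assumption. It is a simplified illustration with made-up document ids, not the InformationRetrievalEvaluator code.

```python
import math


def metrics_at_k(ranked_ids, relevant_id, k):
    """Metrics for one query, assuming exactly one relevant passage."""
    top_k = ranked_ids[:k]
    rank = top_k.index(relevant_id) + 1 if relevant_id in top_k else None
    hit = 1.0 if rank else 0.0
    return {
        "accuracy": hit,
        "precision": hit / k,  # at most one of the k results can be relevant
        "recall": hit,  # one relevant passage, so recall equals accuracy
        "mrr": 1.0 / rank if rank else 0.0,
        "ndcg": 1.0 / math.log2(rank + 1) if rank else 0.0,  # ideal DCG is 1
    }


def average(per_query):
    """Average each metric over all queries."""
    return {name: sum(m[name] for m in per_query) / len(per_query) for name in per_query[0]}


# Hypothetical rankings: (retrieved ids in order, id of the relevant passage).
toy_runs = [
    (["d1", "d2", "d3"], "d1"),  # relevant passage at rank 1
    (["d5", "d2", "d9"], "d2"),  # relevant passage at rank 2
    (["d7", "d8", "d3"], "d4"),  # relevant passage not retrieved
]
scores = average([metrics_at_k(ranked, relevant, k=3) for ranked, relevant in toy_runs])
print(scores)
```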
Training Details
Training Dataset
miriad-4.4_m-split
- Dataset: miriad-4.4_m-split at 596b9ab
- Size: 100,000 training samples
- Columns: question and passage_text
- Approximate statistics based on the first 1000 samples:
  | question | passage_text |
---|---|---|
type | string | string |
details | min: 7 tokens, mean: 20.79 tokens, max: 60 tokens | min: 481 tokens, mean: 945.6 tokens, max: 1024 tokens |
- Samples (each question is followed by its truncated passage_text):
What factors may contribute to increased pulmonary conduit durability in patients who undergo the Ross operation compared to those with right ventricular outflow tract obstruction?
I n 1966, Ross and Somerville 1 reported the first use of an aortic homograft to establish right ventricle-to-pulmonary artery continuity in a patient with tetralogy of Fallot and pulmonary atresia. Since that time, pulmonary position homografts have been used in a variety of right-sided congenital heart lesions. Actuarial 5-year homograft survivals for cryopreserved homografts are reported to range between 55% and 94%, with the shortest durability noted in patients less than 2 years of age. 4 Pulmonary position homografts also are used to replace pulmonary autografts explanted to repair left-sided outflow disease (the Ross operation). Several factors may be likely to favor increased pulmonary conduit durability in Ross patients compared with those with right ventricular outflow tract obstruction, including later age at operation (allowing for larger homografts), more normal pulmonary artery architecture, absence of severe right ventricular hypertrophy, and more natural positioning of ...
How does MCAM expression in hMSC affect the growth and maintenance of hematopoietic progenitors?
After culture in a 3-dimensional hydrogel-based matrix, which constitutes hypoxic conditions, MCAM expression is lost. Concordantly, Tormin et al. demonstrated that MCAM is down-regulated under hypoxic conditions. 10 Furthermore, it was shown by others and our group that oxygen tension causes selective modification of hematopoietic cell and mesenchymal stromal cell interactions in co-culture systems as well as influence HSPC metabolism. [44] [45] [46] Thus, the observed differences between Sharma et al. and our data in HSPC supporting capacity of hMSC are likely due to the different culture conditions used. Further studies are required to clarify the influence of hypoxia in our model system. Altogether these findings provide further evidence for the importance of MCAM in supporting HSPC. Furthermore, previous reports have shown that MCAM is down-regulated in MSC after several passages as well as during aging and differentiation. 19, 47 Interestingly, MCAM overexpression in hMSC enhance...
What is the relationship between Fanconi anemia and breast and ovarian cancer susceptibility genes?
( 31 ) , of which 5% -10 % may be caused by genetic factors ( 32 ) , up to half a million of these patients may be at risk of secondary hereditary neoplasms. The historic observation of twofold to fi vefold increased risks of cancers of the ovary, thyroid, and connective tissue after breast cancer ( 33 ) presaged the later syndromic association of these tumors with inherited mutations of BRCA1, BRCA2, PTEN, and p53 ( 16 ) . By far the largest cumulative risk of a secondary cancer in BRCA mutation carriers is associated with cancer in the contralateral breast, which may reach a risk of 29.5% at 10 years ( 34 ) . The Breast Cancer Linkage Consortium ( 35 , 36 ) also documented threefold to fi vefold increased risks of subsequent cancers of prostate, pancreas, gallbladder, stomach, skin (melanoma), and uterus in BRCA2 mutation carriers and twofold increased risks of prostate and pancreas cancer in BRCA1 mutation carriers; these results are based largely on self-reported family history inf...
- Loss: CachedMultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim", "mini_batch_size": 8, "gather_across_devices": false }
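The loss treats the other passages in a batch as negatives: for each question, cosine similarities to all passages are multiplied by the scale (20.0) and fed to a cross-entropy loss whose target is the question's own passage. The sketch below illustrates only that in-batch ranking objective in plain Python, with made-up 2-dimensional vectors. The gradient caching that lets the cached variant run with mini-batches of 8 (described in the Gao et al. reference cited below) affects memory use and is deliberately left out.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def in_batch_ranking_loss(query_vecs, passage_vecs, scale=20.0):
    """Mean cross-entropy where passage i is the positive for query i."""
    losses = []
    for i, query in enumerate(query_vecs):
        logits = [scale * cos_sim(query, passage) for passage in passage_vecs]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (assumption): two questions and two candidate passages.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # each passage is close to its own question
swapped = [[0.1, 0.9], [0.9, 0.1]]  # each passage is close to the other question

loss_aligned = in_batch_ranking_loss(queries, aligned)
loss_swapped = in_batch_ranking_loss(queries, swapped)
print(loss_aligned, loss_swapped)
```

The loss is small when each question is most similar to its own passage and large when another passage in the batch scores higher, which is why larger batches provide more and harder negatives.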
Evaluation Dataset
miriad-4.4_m-split
- Dataset: miriad-4.4_m-split at 596b9ab
- Size: 1,000 evaluation samples
- Columns: question and passage_text
- Approximate statistics based on the first 1000 samples:
  | question | passage_text |
---|---|---|
type | string | string |
details | min: 7 tokens, mean: 20.91 tokens, max: 61 tokens | min: 465 tokens, mean: 943.1 tokens, max: 1024 tokens |
- Samples (each question is followed by its truncated passage_text):
What are some hereditary cancer syndromes that can result in various forms of cancer?
Hereditary Cancer Syndromes, including Hereditary Breast and Ovarian Cancer (HBOC) and Lynch Syndrome (LS), can result in various forms of cancer due to germline mutations in cancer predisposition genes. While the major contributory genes for these syndromes have been identified and well-studied (BRCA1/ BRCA2 for HBOC and MSH2/MSH6/MLH1/PMS2/ EPCAM for LS), there remains a large percentage of associated cancer cases that are negative for germline mutations in these genes, including 80% of women with a personal or family history of breast cancer who are negative for BRCA1/2 mutations [1] . Similarly, between 30 and 50% of families fulfill stringent criteria for LS and test negative for germline mismatch repair gene mutations [2] . Adding complexity to these disorders is the significant overlap in the spectrum of cancers observed between various hereditary cancer syndromes, including many cancer susceptibility syndromes. Some that contribute to elevated breast cancer risk include Li-Frau...
How do MAK-4 and MAK-5 exert their antioxidant properties?
Hybrid F1 mice were injected with urethane (300 mg/kg) at 8 days of age. A group was then put on a MAK-supplemented diet, another group was fed a standard pellet diet. At 36 weeks of age the mice were sacrificed and the livers examined for the presence of tumors mouse (Panel A) and for the number of nodules per mouse (Panel B) (* p < 0.05, ** P < 0.001). Statistical analysis was performed by Two Way ANOVA Test followed by Post Hoc Bonferroni analysis.
We than measured the influence of the MAK-4+5 combination on the expression of the three liver-specific connexins (cx26, cx32, and cx43). The level of cx26 expression was similar in all the groups of mice treated with the MAK-supplemented diet and in the control (Figure 4, Panel A) . A significant, time-dependent increase in cx32 was observed in the liver of all the groups of MAK treated mice compared to the normal diet-fed controls. Cx32 expression increased 2-fold after 1 week of treatment, and 3-to 4-fold at 3 months (Figure 4, Pane...
What are the primary indications for a decompressive craniectomy, and what role does neurocritical care play in determining the suitability of a patient for this procedure?
Decompressive craniectomy is a valid neurosurgical strategy now a day as an alternative to control an elevated intracranial pressure (ICP) and controlling the risk of uncal and/or subfalcine herniation, in refractory cases to the postural, ventilator, and pharmacological measures to control it. The neurocritical care and the ICP monitorization are key determinants to identify and postulate the inclusion criteria to consider a patient as candidate to this procedure, as it is always considered a rescue surgical technique. Head trauma and ischemic or hemorrhagic cerebrovascular disease with progressive deterioration due to mass effect are some of the cases that may require a decompressive craniectomy with its different variants. However, this procedure per se can have complications described in the postcraniectomy syndrome and may occur in short, medium, or even long term.
1,2 The paradoxical herniation is a condition in which there is a deviation of the midline with mass effect, even t...
- Loss: CachedMultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim", "mini_batch_size": 8, "gather_across_devices": false }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 128
- per_device_eval_batch_size: 128
- learning_rate: 2e-05
- num_train_epochs: 1
- warmup_ratio: 0.1
- fp16: True
- prompts: {'question': 'task: search result | query: ', 'passage_text': 'title: none | text: '}
- batch_sampler: no_duplicates
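These settings determine the length of the run and the learning-rate curve. With 100,000 samples and a batch size of 128 on a single GPU, one epoch is about 782 optimizer steps, which agrees with the Training Logs (step 780 is logged at epoch 0.9974, and 780 / 782 ≈ 0.9974). The sketch below works through that arithmetic and a linear warmup-then-decay schedule. The rounding of the warmup step count and the exact shape of the schedule are assumptions about the trainer's behavior, and `linear_schedule_lr` is a hypothetical helper.

```python
import math


def linear_schedule_lr(step, total_steps, warmup_steps, peak_lr):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


num_samples = 100_000
batch_size = 128
peak_lr = 2e-05
warmup_ratio = 0.1

# One epoch, no gradient accumulation, last partial batch kept.
total_steps = math.ceil(num_samples / batch_size)
# Assumption: the warmup step count is rounded up.
warmup_steps = math.ceil(warmup_ratio * total_steps)

print(total_steps, warmup_steps)
for step in (0, 40, warmup_steps, 400, total_steps):
    print(step, linear_schedule_lr(step, total_steps, warmup_steps, peak_lr))
```

Under these assumptions the learning rate reaches its peak of 2e-05 after roughly the first tenth of the epoch and then decreases linearly to zero by the final step.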
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 128
- per_device_eval_batch_size: 128
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: {'question': 'task: search result | query: ', 'passage_text': 'title: none | text: '}
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}
Training Logs
Epoch | Step | Training Loss | Validation Loss | miriad-eval-1kq-31kd_cosine_ndcg@10 | miriad-test-1kq-31kd_cosine_ndcg@10 |
---|---|---|---|---|---|
-1 | -1 | - | - | 0.8474 | 0.8340 |
0.0256 | 20 | 0.1019 | - | - | - |
0.0512 | 40 | 0.0444 | - | - | - |
0.0767 | 60 | 0.0408 | - | - | - |
0.1023 | 80 | 0.0462 | - | - | - |
0.1279 | 100 | 0.0542 | 0.0525 | 0.8616 | - |
0.1535 | 120 | 0.0454 | - | - | - |
0.1790 | 140 | 0.0403 | - | - | - |
0.2046 | 160 | 0.0463 | - | - | - |
0.2302 | 180 | 0.0508 | - | - | - |
0.2558 | 200 | 0.0497 | 0.0449 | 0.8643 | - |
0.2813 | 220 | 0.0451 | - | - | - |
0.3069 | 240 | 0.0445 | - | - | - |
0.3325 | 260 | 0.0489 | - | - | - |
0.3581 | 280 | 0.0452 | - | - | - |
0.3836 | 300 | 0.0461 | 0.0406 | 0.8832 | - |
0.4092 | 320 | 0.0415 | - | - | - |
0.4348 | 340 | 0.04 | - | - | - |
0.4604 | 360 | 0.0399 | - | - | - |
0.4859 | 380 | 0.0423 | - | - | - |
0.5115 | 400 | 0.0352 | 0.0316 | 0.8823 | - |
0.5371 | 420 | 0.0408 | - | - | - |
0.5627 | 440 | 0.0356 | - | - | - |
0.5882 | 460 | 0.0371 | - | - | - |
0.6138 | 480 | 0.0276 | - | - | - |
0.6394 | 500 | 0.028 | 0.0280 | 0.8807 | - |
0.6650 | 520 | 0.0302 | - | - | - |
0.6905 | 540 | 0.0345 | - | - | - |
0.7161 | 560 | 0.0325 | - | - | - |
0.7417 | 580 | 0.033 | - | - | - |
0.7673 | 600 | 0.0314 | 0.0264 | 0.8910 | - |
0.7928 | 620 | 0.033 | - | - | - |
0.8184 | 640 | 0.029 | - | - | - |
0.8440 | 660 | 0.0396 | - | - | - |
0.8696 | 680 | 0.0266 | - | - | - |
0.8951 | 700 | 0.0262 | 0.0240 | 0.8968 | - |
0.9207 | 720 | 0.0262 | - | - | - |
0.9463 | 740 | 0.0327 | - | - | - |
0.9719 | 760 | 0.0293 | - | - | - |
0.9974 | 780 | 0.0304 | - | - | - |
-1 | -1 | - | - | 0.9026 | 0.8862 |
Environmental Impact
Carbon emissions were measured using CodeCarbon.
- Energy Consumed: 0.828 kWh
- Carbon Emitted: 0.331 kg of CO2
- Hours Used: 5.520 hours
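Two derived quantities make these figures easier to interpret: the average power draw and the implied carbon intensity of the electricity. The short calculation below uses only the three reported numbers. It assumes the tracked duration covers the same period as the energy measurement, which the card does not state explicitly.

```python
# Reported values from the list above.
energy_kwh = 0.828
co2_kg = 0.331
hours = 5.520

# Average power over the tracked period, in watts.
average_power_w = energy_kwh / hours * 1000
# Implied emissions per unit of electricity, in kg CO2 per kWh.
carbon_intensity = co2_kg / energy_kwh

print(f"{average_power_w:.0f} W, {carbon_intensity:.3f} kg CO2/kWh")
```

This works out to an average draw of about 150 W and roughly 0.4 kg of CO2 per kWh.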
Training Hardware
- On Cloud: No
- GPU Model: 1 x NVIDIA GeForce RTX 3090
- CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
- RAM Size: 31.78 GB
Framework Versions
- Python: 3.11.6
- Sentence Transformers: 5.2.0.dev0
- Transformers: 4.56.0.dev0
- PyTorch: 2.7.1+cu126
- Accelerate: 1.6.0
- Datasets: 3.6.0
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
CachedMultipleNegativesRankingLoss
@misc{gao2021scaling,
title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
year={2021},
eprint={2101.06983},
archivePrefix={arXiv},
primaryClass={cs.LG}
}