CoCondenser trained on MIRIAD question-passage tuples
This is a SPLADE Sparse Encoder model finetuned from Luyu/co-condenser-marco on the miriad-4.4_m-split dataset using the sentence-transformers library. It maps sentences & paragraphs to a 30522-dimensional sparse vector space and can be used for semantic search and sparse retrieval.
Model Details
Model Description
- Model Type: SPLADE Sparse Encoder
- Base model: Luyu/co-condenser-marco
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 30522 dimensions
- Similarity Function: Dot Product
- Training Dataset: miriad-4.4_m-split
- Language: en
- License: apache-2.0
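The similarity function is a plain dot product, and almost all of the 30522 dimensions are zero. As a result, only vocabulary entries that are active in both vectors contribute to a score. The sketch below shows this in pure Python using hypothetical `{index: weight}` dictionaries. It is an illustration only, not how the library stores its sparse tensors internally.

```python
def to_sparse(dense):
    """Keep only the non-zero entries of a dense vector as {index: weight}."""
    return {i: w for i, w in enumerate(dense) if w != 0.0}


def sparse_dot(a, b):
    """Dot product of two {index: weight} dicts, iterating over the smaller one."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b[i] for i, w in a.items() if i in b)


VOCAB_SIZE = 30522  # output dimensionality of this model

# Hypothetical toy vectors with only a couple of active vocabulary dimensions.
query = [0.0] * VOCAB_SIZE
doc = [0.0] * VOCAB_SIZE
query[100], query[2000] = 1.5, 0.5
doc[2000], doc[30000] = 2.0, 3.0

q_sparse, d_sparse = to_sparse(query), to_sparse(doc)
dense_score = sum(x * y for x, y in zip(query, doc))
print(sparse_dot(q_sparse, d_sparse), dense_score)  # 1.0 1.0
```

Only dimension 2000 is shared, so the sparse score equals the dense score while touching two entries instead of 30522.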
Model Sources
- Documentation: Sentence Transformers Documentation
- Documentation: Sparse Encoder Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sparse Encoders on Hugging Face
Full Model Architecture
SparseEncoder(
(0): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False}) with MLMTransformer model: BertForMaskedLM
(1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
)
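The `SpladePooling` module turns per-token MLM logits into a single vocabulary-sized vector. Below is a minimal NumPy sketch. It assumes the standard SPLADE formulation, w_j = max_i log(1 + ReLU(logit_ij)) taken over non-padding tokens. The library's own class may differ in details such as batching or chunking.

```python
import numpy as np


def splade_max_pool(mlm_logits, attention_mask):
    """Pool per-token MLM logits into one sparse, vocabulary-sized vector.

    mlm_logits: array of shape (seq_len, vocab_size), the MLM head output.
    attention_mask: array of shape (seq_len,), 1 for real tokens, 0 for padding.
    """
    # ReLU keeps only positive evidence; log1p dampens very large logits.
    weights = np.log1p(np.maximum(mlm_logits, 0.0))
    # Weights are >= 0, so zeroing padded rows removes them from the max.
    weights = weights * attention_mask[:, None]
    # Max over the sequence: each vocabulary term keeps its strongest token.
    return weights.max(axis=0)


rng = np.random.default_rng(0)
# Toy logits for 6 positions over BERT's 30522-token vocabulary, shifted
# negative so that, as in a trained model, only a few terms become active.
logits = rng.normal(loc=-3.0, size=(6, 30522))
mask = np.array([1, 1, 1, 1, 0, 0])  # last two positions are padding
vec = splade_max_pool(logits, mask)
print(vec.shape, int((vec > 0).sum()), "active dimensions")
```

In the real model, sparsity comes from training with FLOPS regularization rather than from shifted random logits. This toy only mimics that outcome.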
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SparseEncoder
# Download from the 🤗 Hub
model = SparseEncoder("tomaarsen/splade-cocondenser-base-miriad-1e-5")
# Run inference
queries = [
"How have infection control measures been effective in preventing nosocomial transmission of TB?\n",
]
documents = [
'Henry M. Blumberg, MD In this issue of Infection Control and Hospital Epidemiology, a potpourri of tuberculosis (TB)-related articles are being published. 1-7 Tuberculosisrelated issues have been an important focus for the past decade for those in infection control and hospital epidemiology, especially in urban areas where the large majority of TB cases occur, 8 but also, because of federal regulations, for those in low-endemic areas or areas where no TB cases occur (approximately half of the counties in the United States).\n\n The resurgence of TB beginning in the mid1980s in the United States (in large part, due to failure and underfunding of the public health infrastructure and to the epidemic of human immunodeficiency virus [HIV] infection) and outbreaks of TB have highlighted the risk of nosocomial transmission of TB. 9,10 These outbreaks affected both healthcare workers (HCWs) and patients. The fact that outbreaks in New York and Miami, among others, involved multidrug-resistant (MDR) strains that were associated with high morbidity and mortality among HIV-infected individuals punctuated the importance of effective TB infection control measures. Commingling of patients with unsuspected TB and those who were quite immunosuppressed led to amplification of nosocomial transmission. A decade ago, few institutions were prepared for the changing epidemiology of TB.\n\n Several recent studies have demonstrated that infection control measures are effective in preventing nosocomial transmission of TB, 11-13 and two reports in this issue, from institutions in Kentucky 1 and New York, 2 provide additional data on decreases in HCW tuberculin skin-test (TST) conversions following implementation of TB infection control measures. In most studies, multiple interventions (administrative controls, environmental controls, and respiratory protection) were initiated at approximately the same time, making it more difficult to identify the most crucial aspect of the program. '
'The importance of TB infection control measures in contributing to the decline in TB cases in the United States, as well as the reduction in the number of MDR-TB cases in New York City, often has been understated. Increased federal funding for TB control activities and expansion of directly observed therapy clearly are important in efforts to prevent TB, but the initial decline in TB cases and in MDR TB in the United States beginning in 1993 likely was due, in large part, to interruption of TB transmission within healthcare facilities. Unfortunately, increased funding for TB control in the United States in the last 5 years often has not trickled down to inner-city hospitals, which frequently are the first line in the battle against TB.\n\n From our experience and that of others, it appears clear that administrative controls are the most important component of a TB infection control program. At Grady Memorial Hospital in Atlanta, we were able to decrease TB exposure episodes markedly and concomitantly to decrease HCW TST conversions after implementing an expanded respiratory isolation policy. 11 We continue to isolate appropriately approximately 95% of those subsequently diagnosed with TB. We were able to reduce TST conver-sion rates markedly during a period of time in which we had isolation rooms that would be considered suboptimal by Centers for Disease Control and Prevention (CDC) guidelines 14 (rooms that were under negative pressure but had less than six air changes per hour) and were using submicron masks. Implementation of better-engineered isolation rooms (>12 air changes per hour) with the completion of renovations to the hospital may have put us in better compliance with regulatory agencies and made the staff feel more secure, but has had little impact on further reducing low rates of HCW TST conversions. '
'In addition, the termination of outbreaks and reduction of TST conversion rates at several institutions took place before introduction of National Institute for Occupational Safety and Health-approved masks and fit testing. 2,15,16 United States healthcare institutions are required by regulatory mandates to develop a "respiratory protection program" (including fit testing), which can be time-consuming, expensive, and logistically difficult. 17 Data published to date suggest that the impact of formal fit testing on proper mask use is small. 18 These federal mandates also have turned some well-meaning (trying to comply fully with the Occupational Safety and Health Administration [OSHA] regulations) but misguided infection control practitioners into "facial hair police." These types of processes divert time, effort, and resources away from what truly is effective in preventing nosocomial transmission of TB, as well as from other important infection control activities such as preventing nosocomial bloodstream infections or transmission of highly resistant pathogens such as vancomycin-resistant Enterococcus or preparing for the onslaught of vancomycin-resistant Staphylococcus aureus. At a time when US healthcare institutions are under enormous pressure due to healthcare reform, market forces, and managed care, it is essential that federal regulatory agencies look carefully at scientific data when issuing regulations.',
'Drug Reaction with Eosinophilia and Systemic Symptoms (DRESS) syndrome is a severe and potentially life-threatening hypersensitivity reaction caused by exposure to certain medications (Phillips et al., 2011; Bocquet et al., 1996) . It is extremely heterogeneous in its manifestation but has characteristic delayed-onset cutaneous and multisystem features with a protracted natural history. The reaction typically starts with a fever, followed by widespread skin eruption of variable nature. This progresses to inflammation of internal organs such as hepatitis, pneumonitis, myocarditis and nephritis, and haematological abnormalities including eosinophilia and atypical lymphocytosis (Kardaun et al., 2013; Cho et al., 2017) .\n\n DRESS syndrome is most commonly classified according to the international scoring system developed by the RegiSCAR group (Kardaun et al., 2013) . RegiSCAR accurately defines the syndrome by considering the major manifestations, with each feature scored between −1 and 2, and 9 being the maximum total number of points. According to this classification, a score of < 2 means no case, 2-3 means possible case, 4-5 means probable case, and 6 or above means definite DRESS syndrome. Table 1 gives an overview of the RegiSCAR scoring system. DRESS syndrome usually develops 2 to 6 weeks after exposure to the causative drug, with resolution of symptoms after drug withdrawal in the majority of cases (Husain et al., 2013a) . Some patients require supportive treatment with corticosteroids, although there is a lack of evidence surrounding the most effective dose, route and duration of the therapy (Adwan, 2017) . Although extremely rare, with an estimated population risk of between 1 and 10 in 10,000 drug exposures, it is significant due to its high mortality rate, at around 10% (Tas and The pathogenesis of DRESS syndrome remains largely unknown. '
'Current evidence suggests that patients may be genetically predisposed to this form of hypersensitivity, with a superimposed risk resulting from Human Herpes Virus (HHV) exposure and subsequent immune reactivation (Cho et al., 2017; Husain et al., 2013a) . In fact, the serological detection of HHV-6 has even been proposed as an additional diagnostic marker for DRESS syndrome (Shiohara et al., 2007) . Other potential risk factors identified are family history (Sullivan and Shear, 2001; Pereira De Silva et al., 2011) and concomitant drug use, particularly antibiotics . DRESS syndrome appears to occur in patients of any age, with patient demographics from several reviews finding age ranges between 6 and 89 years (Picard et al., 2010; Kano et al., 2015; Cacoub et al., 2013) . DRESS syndrome was first described as an adverse reaction to antiepileptic therapy, but has since been recognised as a complication of an extremely wide range of medications (Adwan, 2017) . In rheumatology, it has been classically associated with allopurinol and sulfasalazine, but has also been documented in association with many other drugs including leflunomide, hydroxychloroquine, febuxostat and NSAIDs (Adwan, 2017) . Recent evidence has also identified a significant risk of DRESS syndrome with strontium ranelate use (Cacoub et al., 2013) . Thus far, that is the only anti-osteoporotic drug associated with DRESS syndrome, although there are various cases of other adverse cutaneous reactions linked to anti-osteoporotic medications, ranging from benign maculopapular eruption to Stevens-Johnson syndrome (SJS) and Toxic Epidermal Necrolysis (TEN) . '
'Denosumab, an antiresorptive RANK ligand (RANKL) inhibitor licensed for osteoporosis, is currently known to be associated with some dermatological manifestations including dermatitis, eczema, pruritus and, less commonly, cellulitis (Prolia, n.d.).\n\n We hereby describe the first documented case of DRESS syndrome associated with denosumab treatment.\n\n The patient is a 76-year old female with osteoporosis and a background of alcoholic fatty liver disease and lower limb venous insufficiency. Osteoporosis was first diagnosed in 2003 and treated with risedronate, calcium and vitamin D, until 2006. While on this treatment, the patient sustained T12 and L3 fractures, the latter treated with kyphoplasty, and was therefore deemed a non-responder to risedronate.',
"The regulation of these events is known to go awry in certain pathologies especially in diseases associated with neurodegeneration. Mitochondrial fission helps to enhance the number of mitochondria, which can be efficiently distributed to each corner of neuronal cells and thus helps them to maintain their energy demands. Mitochondrial fission is highly essential during the periods of energy starvation to produce new, efficient mitochondrial energy generating systems. However, enhanced fission associated with bioenergetic crisis causes BAX foci formation on mitochondrial membrane and thus causes mitochondrial outer membrane permeabilization (MOMP), releasing cytochrome c and other pro apoptotic mediators into cytosol, results in apoptosis [93] . Impairment in the mitochondrial dynamics has also been observed in case of inflammatory neuropathies and oxaliplatin induced neuropathy [94] . Excessive nitric oxide is known to cause s-nitrosylation of dynamin related protein-1 (Drp-1), and increases the mitochondrial fission [95, 96] . Tumor necrosis factor-α (TNF-α) reported to inhibit the kinensin 1 protein, and thus impairs trafficking by halting mitochondrial movement along axons [97] . In addition to impaired dynamics, aggregates of abnormal shaped, damaged mitochondria are responsible for aberrant mitochondrial trafficking, which contributes to axonal degeneration observed in various peripheral neuropathies [81] .\n\n Autophagy is the discerning cellular catabolic process responsible for recycling the damaged proteins/ organelles in the cells [98] . Mitophagy is a selective autophagic process involved in recycling of damaged mitochondria and helps in supplying the constituents for mitochondrial biogenesis [99] . Excessive accumulation and impaired clearance of dysfunctional mitochondria are known to be observed in various disorders associated with oxidative stress [100] . "
"Oxidative damage to Atg 4, a key component involved in mitophagy causes impaired autophagosome formation and clearance of damaged mitochondria [101] . Loss in the function of molecular chaperons and associated accumulation of damaged proteins are known to be involved in various peripheral neuropathies including trauma induced neuropathy [102, 103] . A model of demyelinating neuropathy corresponds to the accumulation of improperly folded myelin protein PMP-22 is also being observed recently [104, 105] .\n\n Mitochondrial dysfunction and associated disturbances are well connected to neuroinflammatory changes that occur in various neurodegenerative diseases [106] . Dysfunctional mitochondria are also implicated in several pathologies such as cardiovascular and neurodegenerative diseases. Several mitochondrial toxins have been found to inhibit the respiration in microglial cells and also inhibit IL-4 induced alternative anti inflammatory response and thus potentiates neuroinflammation [107] . Mitochondrial ROS are well identified to be involved in several inflammatory pathways such as NF-κB, MAPK activation [108] . Similarly, the pro inflammatory mediators released as a result of an inflammatory episode found to be interfere with the functioning of the mitochondrial electron transport chain and thus compromise ATP production [109] . TNF-α is known to inhibit the complex I, IV of ETC and decreases energy production. Nitric oxide (NO) is a potent inhibitor of cytochrome c oxidase (complex IV) and similarly IL-6 is also known to enhance mitochondrial generation of superoxide [110] . Mitochondrial dysfunction initiates inflammation by increased formation of complexes of damaged mitochondrial parts and cytoplasmic pattern recognition receptors (PRR's). The resulting inflammasome directed activation of interleukin-1β production, which starts an immune response and leads to Fig. (4) . "
"Mitotoxicity in peripheral neuropathies: Various pathophysiological insults like hyperglycemic, chemotherapeutic and traumatic injury to the peripheral nerves results in mitochondrial dysfunction through enhanced generation of ROS induced biomolecular damage and bioenergetic crisis. Following the nerve injury accumulation of mitochondria occurs resulting in the release of mtDNA & formyl peptides into circulation which acts as Death associated molecular patterns (DAMP's). These are recognized by immune cells as foreign bodies and can elicit a local immune/inflammatory response. Interaction between inflammatory mediators and structural proteins involved in mitochondrial trafficking will cause impairment in mitochondrial motility. Oxidative stress induced damage to the mt proteins like Atg4, Parkin etc cause insufficient mitophagy. Excess nitrosative stress also results in excessive mt fission associated with apoptosis. In addition, mtDNA damage impairs its transcription and reduces mitochondrial biogenesis. Ca 2+ dyshomeostasis, loss in mitochondrial potential and bioenergetic crisis cause neuronal death via apoptosis/necrosis. All these modifications cause defects in ultra structure, physiology and trafficking of mitochondria resulting in loss of neuronal function producing peripheral neuropathy.",
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 30522] [3, 30522]
# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 29.6354, 3.0391, 0.0017]])
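The `similarity` call above scores every document against the query. For larger corpora, sparse vectors are usually served from an inverted index, so a query only touches the postings of its roughly 25 active dimensions. Below is a toy, in-memory sketch. It assumes documents have already been converted to hypothetical `{token_id: weight}` dicts, for example from the non-zero entries of `document_embeddings`. A production setup would more likely use a search engine with native sparse-vector support.

```python
from collections import defaultdict
import heapq


def build_index(docs):
    """Map each active token id to a postings list of (doc_id, weight)."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(docs):
        for token_id, weight in vec.items():
            index[token_id].append((doc_id, weight))
    return index


def search(index, query, top_k=10):
    """Accumulate dot-product scores by walking only the query's postings."""
    scores = defaultdict(float)
    for token_id, q_weight in query.items():
        for doc_id, d_weight in index.get(token_id, ()):
            scores[doc_id] += q_weight * d_weight
    return heapq.nlargest(top_k, scores.items(), key=lambda item: item[1])


# Hypothetical toy vectors; real ones have ~25 (query) or ~136 (passage) active dims.
docs = [{1: 2.0, 7: 1.0}, {7: 0.5, 9: 2.0}, {3: 1.0}]
index = build_index(docs)
print(search(index, {7: 2.0, 9: 0.25}, top_k=2))  # [(0, 2.0), (1, 1.5)]
```

Document 2 shares no dimension with the query, so it is never scored. This is where sparse retrieval saves work compared with computing a full similarity matrix.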
Evaluation
Metrics
Sparse Information Retrieval
- Datasets: miriad_eval and miriad_test
- Evaluated with SparseInformationRetrievalEvaluator
Metric | miriad_eval | miriad_test |
---|---|---|
dot_accuracy@1 | 0.7433 | 0.7391 |
dot_accuracy@3 | 0.8552 | 0.8547 |
dot_accuracy@5 | 0.891 | 0.8869 |
dot_accuracy@10 | 0.9226 | 0.9194 |
dot_precision@1 | 0.7433 | 0.7391 |
dot_precision@3 | 0.2851 | 0.2849 |
dot_precision@5 | 0.1782 | 0.1774 |
dot_precision@10 | 0.0923 | 0.0919 |
dot_recall@1 | 0.7433 | 0.7391 |
dot_recall@3 | 0.8552 | 0.8547 |
dot_recall@5 | 0.891 | 0.8869 |
dot_recall@10 | 0.9226 | 0.9194 |
dot_ndcg@10 | 0.8348 | 0.8315 |
dot_mrr@10 | 0.8065 | 0.8031 |
dot_map@100 | 0.809 | 0.8058 |
query_active_dims | 25.3797 | 25.4315 |
query_sparsity_ratio | 0.9992 | 0.9992 |
corpus_active_dims | 135.7358 | 136.0033 |
corpus_sparsity_ratio | 0.9956 | 0.9955 |
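The numbers in this table are consistent with each query having exactly one relevant passage. Recall@k equals accuracy@k, and precision@k is accuracy@k divided by k (0.8552 / 3 ≈ 0.2851). Each sparsity ratio is 1 minus the mean number of active dimensions divided by 30522. Below is a minimal per-query sketch under that single-relevant-passage assumption. The evaluator averages such per-query values over all queries, and its exact implementation is not reproduced here.

```python
import math


def query_metrics(ranked_ids, relevant_id, k_values=(1, 3, 5, 10)):
    """Retrieval metrics for one query that has exactly one relevant passage.

    ranked_ids: candidate ids sorted by descending dot-product score.
    """
    rank = ranked_ids.index(relevant_id) + 1 if relevant_id in ranked_ids else None
    metrics = {}
    for k in k_values:
        hit = rank is not None and rank <= k
        metrics[f"accuracy@{k}"] = float(hit)
        metrics[f"precision@{k}"] = hit / k   # at most one relevant item in the top k
        metrics[f"recall@{k}"] = float(hit)   # one relevant item in total
    in_top10 = rank is not None and rank <= 10
    metrics["mrr@10"] = 1.0 / rank if in_top10 else 0.0
    # Binary relevance with one relevant item: the ideal DCG is 1 / log2(2) = 1.
    metrics["ndcg@10"] = 1.0 / math.log2(rank + 1) if in_top10 else 0.0
    return metrics


def sparsity_ratio(active_dims, vocab_size=30522):
    """Share of the vocabulary that is zero, given the mean number of active dims."""
    return 1.0 - active_dims / vocab_size


m = query_metrics(["d7", "d2", "d9"], relevant_id="d2")
print(m["accuracy@1"], m["accuracy@3"], m["precision@3"], m["mrr@10"])
# 0.0 1.0 0.3333333333333333 0.5
print(round(sparsity_ratio(25.3797), 4), round(sparsity_ratio(135.7358), 4))
# 0.9992 0.9956
```

With a single relevant passage, average precision reduces to 1 / rank. That is why map@100 can only slightly exceed mrr@10, as it does in the table.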
Training Details
Training Dataset
miriad-4.4_m-split
- Dataset: miriad-4.4_m-split at 596b9ab
- Size: 100,000 training samples
- Columns: question and passage_text
- Approximate statistics based on the first 1000 samples:
Field | question | passage_text |
---|---|---|
type | string | string |
details | min: 9 tokens, mean: 23.38 tokens, max: 71 tokens | min: 511 tokens, mean: 512.0 tokens, max: 512 tokens |
- Samples (each question is followed by its passage_text, truncated):
What factors may contribute to increased pulmonary conduit durability in patients who undergo the Ross operation compared to those with right ventricular outflow tract obstruction?
I n 1966, Ross and Somerville 1 reported the first use of an aortic homograft to establish right ventricle-to-pulmonary artery continuity in a patient with tetralogy of Fallot and pulmonary atresia. Since that time, pulmonary position homografts have been used in a variety of right-sided congenital heart lesions. Actuarial 5-year homograft survivals for cryopreserved homografts are reported to range between 55% and 94%, with the shortest durability noted in patients less than 2 years of age. 4 Pulmonary position homografts also are used to replace pulmonary autografts explanted to repair left-sided outflow disease (the Ross operation). Several factors may be likely to favor increased pulmonary conduit durability in Ross patients compared with those with right ventricular outflow tract obstruction, including later age at operation (allowing for larger homografts), more normal pulmonary artery architecture, absence of severe right ventricular hypertrophy, and more natural positioning of ...
How does MCAM expression in hMSC affect the growth and maintenance of hematopoietic progenitors?
After culture in a 3-dimensional hydrogel-based matrix, which constitutes hypoxic conditions, MCAM expression is lost. Concordantly, Tormin et al. demonstrated that MCAM is down-regulated under hypoxic conditions. 10 Furthermore, it was shown by others and our group that oxygen tension causes selective modification of hematopoietic cell and mesenchymal stromal cell interactions in co-culture systems as well as influence HSPC metabolism. [44] [45] [46] Thus, the observed differences between Sharma et al. and our data in HSPC supporting capacity of hMSC are likely due to the different culture conditions used. Further studies are required to clarify the influence of hypoxia in our model system. Altogether these findings provide further evidence for the importance of MCAM in supporting HSPC. Furthermore, previous reports have shown that MCAM is down-regulated in MSC after several passages as well as during aging and differentiation. 19, 47 Interestingly, MCAM overexpression in hMSC enhance...
What is the relationship between Fanconi anemia and breast and ovarian cancer susceptibility genes?
( 31 ) , of which 5% -10 % may be caused by genetic factors ( 32 ) , up to half a million of these patients may be at risk of secondary hereditary neoplasms. The historic observation of twofold to fi vefold increased risks of cancers of the ovary, thyroid, and connective tissue after breast cancer ( 33 ) presaged the later syndromic association of these tumors with inherited mutations of BRCA1, BRCA2, PTEN, and p53 ( 16 ) . By far the largest cumulative risk of a secondary cancer in BRCA mutation carriers is associated with cancer in the contralateral breast, which may reach a risk of 29.5% at 10 years ( 34 ) . The Breast Cancer Linkage Consortium ( 35 , 36 ) also documented threefold to fi vefold increased risks of subsequent cancers of prostate, pancreas, gallbladder, stomach, skin (melanoma), and uterus in BRCA2 mutation carriers and twofold increased risks of prostate and pancreas cancer in BRCA1 mutation carriers; these results are based largely on self-reported family history inf...
- Loss: SpladeLoss with these parameters:
{ "loss": "SparseMultipleNegativesRankingLoss(scale=1.0, similarity_fct='dot_score')", "lambda_corpus": 1e-05, "lambda_query": 5e-05 }
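The parameter names and the FlopsLoss citation below suggest that SpladeLoss adds FLOPS sparsity penalties to the in-batch ranking loss, weighted by `lambda_query` and `lambda_corpus`. Below is a NumPy sketch of that combination under this assumption, using scale=1.0 and dot-product scores as configured. The library may include details not shown here, for instance scheduling of the regularizer weights.

```python
import numpy as np


def flops_regularizer(embeddings):
    """FLOPS penalty: sum over vocabulary dims of the squared mean activation."""
    return float(np.sum(np.mean(np.abs(embeddings), axis=0) ** 2))


def in_batch_ranking_loss(query_emb, doc_emb, scale=1.0):
    """Multiple-negatives ranking loss with dot-product scores.

    Row i of doc_emb is the positive for query i; the other rows in the
    batch serve as negatives.
    """
    scores = scale * (query_emb @ doc_emb.T)               # (batch, batch)
    scores = scores - scores.max(axis=1, keepdims=True)    # stabilise the softmax
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))             # cross-entropy on positives


def splade_loss(query_emb, doc_emb, lambda_query=5e-05, lambda_corpus=1e-05):
    """Ranking loss plus separately weighted FLOPS terms for queries and passages."""
    return (in_batch_ranking_loss(query_emb, doc_emb)
            + lambda_query * flops_regularizer(query_emb)
            + lambda_corpus * flops_regularizer(doc_emb))


# Toy batch of 4 pairs over an 8-term vocabulary: each query matches its own passage.
q = 10.0 * np.eye(4, 8)
print(splade_loss(q, q))        # ≈ 0.0015: ranking loss ~0, only FLOPS terms remain
print(splade_loss(q, q[::-1]))  # far larger, because the positives are mismatched
```

The two lambdas let queries and passages be pushed toward different sparsity levels. This fits the table above, where queries average about 25 active dimensions and passages about 136.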
Evaluation Dataset
miriad-4.4_m-split
- Dataset: miriad-4.4_m-split at 596b9ab
- Size: 1,000 evaluation samples
- Columns: question and passage_text
- Approximate statistics based on the first 1000 samples:
Field | question | passage_text |
---|---|---|
type | string | string |
details | min: 8 tokens, mean: 23.55 tokens, max: 74 tokens | min: 512 tokens, mean: 512.0 tokens, max: 512 tokens |
- Samples (each question is followed by its passage_text, truncated):
What are some hereditary cancer syndromes that can result in various forms of cancer?
Hereditary Cancer Syndromes, including Hereditary Breast and Ovarian Cancer (HBOC) and Lynch Syndrome (LS), can result in various forms of cancer due to germline mutations in cancer predisposition genes. While the major contributory genes for these syndromes have been identified and well-studied (BRCA1/ BRCA2 for HBOC and MSH2/MSH6/MLH1/PMS2/ EPCAM for LS), there remains a large percentage of associated cancer cases that are negative for germline mutations in these genes, including 80% of women with a personal or family history of breast cancer who are negative for BRCA1/2 mutations [1] . Similarly, between 30 and 50% of families fulfill stringent criteria for LS and test negative for germline mismatch repair gene mutations [2] . Adding complexity to these disorders is the significant overlap in the spectrum of cancers observed between various hereditary cancer syndromes, including many cancer susceptibility syndromes. Some that contribute to elevated breast cancer risk include Li-Frau...
How do MAK-4 and MAK-5 exert their antioxidant properties?
Hybrid F1 mice were injected with urethane (300 mg/kg) at 8 days of age. A group was then put on a MAK-supplemented diet, another group was fed a standard pellet diet. At 36 weeks of age the mice were sacrificed and the livers examined for the presence of tumors mouse (Panel A) and for the number of nodules per mouse (Panel B) (* p < 0.05, ** P < 0.001). Statistical analysis was performed by Two Way ANOVA Test followed by Post Hoc Bonferroni analysis.
We than measured the influence of the MAK-4+5 combination on the expression of the three liver-specific connexins (cx26, cx32, and cx43). The level of cx26 expression was similar in all the groups of mice treated with the MAK-supplemented diet and in the control (Figure 4, Panel A) . A significant, time-dependent increase in cx32 was observed in the liver of all the groups of MAK treated mice compared to the normal diet-fed controls. Cx32 expression increased 2-fold after 1 week of treatment, and 3-to 4-fold at 3 months (Figure 4, Pane...
What are the primary indications for a decompressive craniectomy, and what role does neurocritical care play in determining the suitability of a patient for this procedure?
Decompressive craniectomy is a valid neurosurgical strategy now a day as an alternative to control an elevated intracranial pressure (ICP) and controlling the risk of uncal and/or subfalcine herniation, in refractory cases to the postural, ventilator, and pharmacological measures to control it. The neurocritical care and the ICP monitorization are key determinants to identify and postulate the inclusion criteria to consider a patient as candidate to this procedure, as it is always considered a rescue surgical technique. Head trauma and ischemic or hemorrhagic cerebrovascular disease with progressive deterioration due to mass effect are some of the cases that may require a decompressive craniectomy with its different variants. However, this procedure per se can have complications described in the postcraniectomy syndrome and may occur in short, medium, or even long term.
1,2 The paradoxical herniation is a condition in which there is a deviation of the midline with mass effect, even t...
- Loss: SpladeLoss with these parameters:
{ "loss": "SparseMultipleNegativesRankingLoss(scale=1.0, similarity_fct='dot_score')", "lambda_corpus": 1e-05, "lambda_query": 5e-05 }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- learning_rate: 2e-05
- num_train_epochs: 1
- warmup_ratio: 0.1
- fp16: True
- batch_sampler: no_duplicates
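With the `linear` scheduler and `warmup_ratio: 0.1`, the learning rate presumably follows the usual Hugging Face Transformers behaviour. It rises linearly from 0 to 2e-05 over the first 10% of steps, then decays linearly back to 0. The training logs imply about 25,000 steps in total, since step 800 corresponds to epoch 0.032. Below is a small sketch of that schedule, assuming the warmup length is `warmup_ratio × total_steps` rounded up.

```python
import math


def linear_schedule_lr(step, total_steps, base_lr=2e-05, warmup_ratio=0.1):
    """Learning rate for linear warmup from 0, then linear decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# 100,000 pairs / batch size 4 / 1 epoch = 25,000 optimizer steps
# (gradient_accumulation_steps is 1).
total_steps = 100_000 // 4
for step in (0, 1_250, 2_500, 13_750, 25_000):
    print(f"step {step:>6}: lr = {linear_schedule_lr(step, total_steps):.2e}")
```

Under these assumptions the peak learning rate is reached at step 2,500 and halves again by step 13,750.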
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}
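`batch_sampler: no_duplicates` matters for this loss because every other passage in a batch is treated as a negative. If the same text appeared twice in a batch, it would be scored as a negative for its own positive. Below is a simplified greedy illustration of the idea. It is not the library's actual sampler implementation, and the function name is hypothetical.

```python
def no_duplicate_batches(pairs, batch_size):
    """Greedily build batches in which no question or passage text repeats.

    pairs: list of (question, passage) tuples. A pair is deferred to a later
    batch if either of its texts is already in the current batch, so in-batch
    negatives never contain a copy of a positive.
    """
    remaining = list(pairs)
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for pair in remaining:
            if len(batch) < batch_size and not seen.intersection(pair):
                batch.append(pair)
                seen.update(pair)
            else:
                deferred.append(pair)
        batches.append(batch)
        remaining = deferred
    return batches


pairs = [("q1", "p1"), ("q2", "p1"), ("q3", "p3"), ("q4", "p4")]
batches = no_duplicate_batches(pairs, batch_size=2)
print(batches)
# [[('q1', 'p1'), ('q3', 'p3')], [('q2', 'p1'), ('q4', 'p4')]]
```

The second question shares passage p1 with the first, so it is moved to the next batch rather than becoming a false negative.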
Training Logs
Epoch | Step | Training Loss | Validation Loss | miriad_eval_dot_ndcg@10 | miriad_test_dot_ndcg@10 |
---|---|---|---|---|---|
0.032 | 800 | 63.8384 | - | - | - |
0.064 | 1600 | 0.254 | - | - | - |
0.096 | 2400 | 0.1055 | - | - | - |
0.128 | 3200 | 0.0646 | - | - | - |
0.16 | 4000 | 0.0716 | 0.0498 | 0.7428 | - |
0.192 | 4800 | 0.0608 | - | - | - |
0.224 | 5600 | 0.0626 | - | - | - |
0.256 | 6400 | 0.0489 | - | - | - |
0.288 | 7200 | 0.0531 | - | - | - |
0.32 | 8000 | 0.0403 | 0.0193 | 0.7973 | - |
0.352 | 8800 | 0.0376 | - | - | - |
0.384 | 9600 | 0.0292 | - | - | - |
0.416 | 10400 | 0.0258 | - | - | - |
0.448 | 11200 | 0.0437 | - | - | - |
0.48 | 12000 | 0.0492 | 0.0306 | 0.7863 | - |
0.512 | 12800 | 0.0359 | - | - | - |
0.544 | 13600 | 0.0285 | - | - | - |
0.576 | 14400 | 0.0301 | - | - | - |
0.608 | 15200 | 0.0384 | - | - | - |
0.64 | 16000 | 0.0232 | 0.0237 | 0.7872 | - |
0.672 | 16800 | 0.042 | - | - | - |
0.704 | 17600 | 0.0316 | - | - | - |
0.736 | 18400 | 0.0183 | - | - | - |
0.768 | 19200 | 0.0363 | - | - | - |
0.8 | 20000 | 0.031 | 0.0293 | 0.8209 | - |
0.832 | 20800 | 0.0237 | - | - | - |
0.864 | 21600 | 0.025 | - | - | - |
0.896 | 22400 | 0.0177 | - | - | - |
0.928 | 23200 | 0.0161 | - | - | - |
0.96 | 24000 | 0.0171 | 0.0221 | 0.8341 | - |
0.992 | 24800 | 0.0263 | - | - | - |
-1 | -1 | - | - | 0.8348 | 0.8315 |
Environmental Impact
Carbon emissions were measured using CodeCarbon.
- Energy Consumed: 0.417 kWh
- Carbon Emitted: 0.162 kg of CO2
- Hours Used: 1.391 hours
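The three reported figures also imply an average power draw and a grid carbon intensity for this run. Below is a back-of-the-envelope check using only those numbers.

```python
energy_kwh = 0.417  # Energy Consumed
carbon_kg = 0.162   # Carbon Emitted
hours = 1.391       # Hours Used

avg_power_kw = energy_kwh / hours          # average draw over the run
carbon_intensity = carbon_kg / energy_kwh  # kg CO2 per kWh implied by the report
print(f"{avg_power_kw * 1000:.0f} W, {carbon_intensity:.3f} kg CO2/kWh")
# 300 W, 0.388 kg CO2/kWh
```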
Training Hardware
- On Cloud: No
- GPU Model: 1 x NVIDIA GeForce RTX 3090
- CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
- RAM Size: 31.78 GB
Framework Versions
- Python: 3.11.6
- Sentence Transformers: 4.2.0.dev0
- Transformers: 4.52.4
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.1
- Datasets: 2.21.0
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
SpladeLoss
@misc{formal2022distillationhardnegativesampling,
title={From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective},
author={Thibault Formal and Carlos Lassance and Benjamin Piwowarski and Stéphane Clinchant},
year={2022},
eprint={2205.04733},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2205.04733},
}
SparseMultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
FlopsLoss
@article{paria2020minimizing,
title={Minimizing flops to learn efficient sparse representations},
author={Paria, Biswajit and Yeh, Chih-Kuan and Yen, Ian EH and Xu, Ning and Ravikumar, Pradeep and P{\'o}czos, Barnab{\'a}s},
journal={arXiv preprint arXiv:2004.05665},
year={2020}
}