SPLADE CoCondenser trained on MS MARCO
This is a SPLADE Sparse Encoder model finetuned from Luyu/co-condenser-marco on the msmarco dataset using the sentence-transformers library. It maps sentences & paragraphs to a 30522-dimensional sparse vector space and can be used for semantic search and sparse retrieval.
Model Details
Model Description
- Model Type: SPLADE Sparse Encoder
- Base model: Luyu/co-condenser-marco
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 30522 dimensions
- Similarity Function: Dot Product
- Training Dataset: msmarco
- Language: en
- License: apache-2.0
Model Sources
- Documentation: Sentence Transformers Documentation
- Documentation: Sparse Encoder Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sparse Encoders on Hugging Face
Full Model Architecture
```
SparseEncoder(
  (0): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False}) with MLMTransformer model: BertForMaskedLM
  (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
)
```
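The SpladePooling module turns the per-token MLM logits into a single non-negative weight per vocabulary entry. Below is a minimal sketch of the computation implied by the config above (ReLU-based saturation followed by max pooling); it is an illustration under those assumptions, not the library's actual implementation:

```python
import torch

def splade_pool(mlm_logits: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Sketch of SpladePooling with pooling_strategy='max', activation_function='relu'.

    mlm_logits: (batch, seq_len, vocab_size) raw BertForMaskedLM logits
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    """
    # Saturating activation log(1 + ReLU(logit)) keeps term weights non-negative
    activations = torch.log1p(torch.relu(mlm_logits))
    # Zero out padding positions so they cannot win the max
    activations = activations * attention_mask.unsqueeze(-1)
    # Max-pool over the sequence dimension: one weight per vocabulary term
    return activations.max(dim=1).values
```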
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SparseEncoder

# Download from the 🤗 Hub
model = SparseEncoder("tomaarsen/splade-cocondenser-msmarco-margin-mse-minilm-small")

# Run inference
queries = [
    "how much would dreamers cost the taxpayers",
]
documents = [
    'Plus, the CBO said the Dreamers would bring an additional 80,000 immigrants to the U.S., adding to the liability. In total, the immigrants and their families would cost taxpayers $26.8 billion, but only pay back $.9 billion in taxes, the CBO said. The analysis found that roughly 3.25 million undocumented immigrants are eligible for Dreamer status, while only 2 million would apply and only 1.6 million would be accepted over the next decade.',
    'Playing Chicken with the $18 Trillion U.S. Economy: The full cost of the last government shutdown two years ago was staggering – it delivered a $24 billion blow to the U.S. economy and taxpayers. Now we may be about to repeat government shutdown history on Dec. 11.',
    'Sustain is defined as to support something or to endure a trial or hardship. 1 An example of sustain is for a foundation to support the house. 2 An example of sustain is to survive days without food or water.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 30522] [3, 30522]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[27.1439, 12.7876, 0.6402]])
```
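To see which vocabulary terms drive a match, you can inspect the largest entries of an embedding directly. A minimal sketch continuing from the snippet above; it assumes `query_embeddings` is the torch tensor of shape (1, 30522) returned there:

```python
import torch

# Which vocabulary terms did the query activate?
embedding = query_embeddings[0]
if embedding.is_sparse:
    # Densify in case the encoder returned sparse COO tensors
    embedding = embedding.to_dense()
top_values, top_indices = torch.topk(embedding, k=10)
tokens = model.tokenizer.convert_ids_to_tokens(top_indices.tolist())
for token, value in zip(tokens, top_values.tolist()):
    print(f"{token:>12}  {value:.2f}")
```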
Evaluation
Metrics
Sparse Information Retrieval
- Datasets: NanoMSMARCO, NanoNFCorpus and NanoNQ
- Evaluated with SparseInformationRetrievalEvaluator
Metric | NanoMSMARCO | NanoNFCorpus | NanoNQ |
---|---|---|---|
dot_accuracy@1 | 0.48 | 0.4 | 0.5 |
dot_accuracy@3 | 0.62 | 0.58 | 0.74 |
dot_accuracy@5 | 0.76 | 0.64 | 0.76 |
dot_accuracy@10 | 0.9 | 0.68 | 0.82 |
dot_precision@1 | 0.48 | 0.4 | 0.5 |
dot_precision@3 | 0.2067 | 0.3733 | 0.2533 |
dot_precision@5 | 0.152 | 0.328 | 0.156 |
dot_precision@10 | 0.09 | 0.274 | 0.09 |
dot_recall@1 | 0.48 | 0.0418 | 0.46 |
dot_recall@3 | 0.62 | 0.0962 | 0.69 |
dot_recall@5 | 0.76 | 0.1174 | 0.7 |
dot_recall@10 | 0.9 | 0.1424 | 0.79 |
dot_ndcg@10 | 0.6685 | 0.3411 | 0.6444 |
dot_mrr@10 | 0.5974 | 0.5052 | 0.617 |
dot_map@100 | 0.6011 | 0.1532 | 0.5955 |
query_active_dims | 57.08 | 50.46 | 54.1 |
query_sparsity_ratio | 0.9981 | 0.9983 | 0.9982 |
corpus_active_dims | 187.3189 | 331.6617 | 211.6348 |
corpus_sparsity_ratio | 0.9939 | 0.9891 | 0.9931 |
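The two dims/sparsity rows are related by construction: sparsity_ratio = 1 − active_dims / 30522. For NanoMSMARCO queries, for example, 1 − 57.08 / 30522 ≈ 0.9981, i.e. fewer than 60 of the 30522 vocabulary dimensions are active for a typical query.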
Sparse Nano BEIR
- Dataset: NanoBEIR_mean
- Evaluated with SparseNanoBEIREvaluator with these parameters: `{"dataset_names": ["msmarco", "nfcorpus", "nq"]}`
Metric | Value |
---|---|
dot_accuracy@1 | 0.46 |
dot_accuracy@3 | 0.6467 |
dot_accuracy@5 | 0.72 |
dot_accuracy@10 | 0.8 |
dot_precision@1 | 0.46 |
dot_precision@3 | 0.2778 |
dot_precision@5 | 0.212 |
dot_precision@10 | 0.1513 |
dot_recall@1 | 0.3273 |
dot_recall@3 | 0.4687 |
dot_recall@5 | 0.5258 |
dot_recall@10 | 0.6108 |
dot_ndcg@10 | 0.5513 |
dot_mrr@10 | 0.5732 |
dot_map@100 | 0.4499 |
query_active_dims | 53.88 |
query_sparsity_ratio | 0.9982 |
corpus_active_dims | 229.4242 |
corpus_sparsity_ratio | 0.9925 |
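These mean figures can be recomputed with the evaluator named above. A minimal sketch, assuming the import path used by recent sentence-transformers releases:

```python
from sentence_transformers import SparseEncoder
from sentence_transformers.sparse_encoder.evaluation import SparseNanoBEIREvaluator

model = SparseEncoder("tomaarsen/splade-cocondenser-msmarco-margin-mse-minilm-small")
evaluator = SparseNanoBEIREvaluator(dataset_names=["msmarco", "nfcorpus", "nq"])
results = evaluator(model)
# The primary metric is the mean dot_ndcg@10 across the three datasets
print(results[evaluator.primary_metric])
```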
Training Details
Training Dataset
msmarco
- Dataset: msmarco at 9e329ed
- Size: 90,000 training samples
- Columns: score, query, positive, and negative
- Approximate statistics based on the first 1000 samples:
| | score | query | positive | negative |
|---|---|---|---|---|
| type | float | string | string | string |
| details | min: -1.28, mean: 13.47, max: 22.27 | min: 4 tokens, mean: 8.97 tokens, max: 21 tokens | min: 19 tokens, mean: 81.41 tokens, max: 220 tokens | min: 17 tokens, mean: 76.39 tokens, max: 195 tokens |
- Samples:
| score | query | positive | negative |
|---|---|---|---|
| 13.85562777519226 | what is a reflective journal? | A reflective journal is a tool that students are encouraged to use to help them understand not just what they have learned while studying but also how they learned it by reflecting on the learning experience itself. | The point is that this approach believes that literature can be used to illuminate some truth about something which is not literature. The difference between this approach and the didactic approach is that didactic approach considers author as a teacher, while the reflective approach considers him or her an observer. |
| 12.178914229075115 | original footloose release | Footloose (2011 film) Footloose is a 2011 American musical dance film directed by Craig Brewer. It is a remake of the 1984 film of the same name and stars Kenny Wormald, Julianne Hough, Andie MacDowell, and Dennis Quaid. The film follows a young man who moves from Boston to a small southern town and protests the town's ban against dancing. | In the 2001 re-release of Thriller they added the second verse of the rap which was recorded but not included on the original here is the second verse by Vincent Price (I heard a 3rd was written but never recorded) The demons squeal in sheer delight. It's you they spy, so plump, so right. |
| 19.897210280100506 | time of day blood pressure | Day Time Blood Pressure. For most people, your body's blood pressure rises during the morning hours and reaches its highest point around midday. This is because your body is preset to increase its functions for anticipated daily activity. Your body reaches its lowest blood pressure at bedtime, between 8 p.m. and 2 a.m. | labetalol is used alone or together with other medicines to treat high blood pressure hypertension high blood pressure adds to the workload of the heart and arteriesif it continues for a long time the heart and arteries may not function properlyabetalol is used alone or together with other medicines to treat high blood pressure hypertension high blood pressure adds to the workload of the heart and arteries |
- Loss: SpladeLoss with these parameters (reconstructed in the sketch below): `{"loss": "SparseMarginMSELoss", "lambda_corpus": 0.08, "lambda_query": 0.1}`
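The loss object can be rebuilt from these parameters. A sketch; the parameter names follow this card and may differ across sentence-transformers versions, and `model` stands in for the SparseEncoder being trained:

```python
from sentence_transformers.sparse_encoder import losses

# SpladeLoss wraps the MarginMSE distillation loss and adds FLOPS-style
# sparsity regularization on the query and document representations.
loss = losses.SpladeLoss(
    model=model,  # the SparseEncoder being trained
    loss=losses.SparseMarginMSELoss(model),
    lambda_corpus=0.08,  # document-side regularization weight
    lambda_query=0.1,    # query-side regularization weight
)
```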
Evaluation Dataset
msmarco
- Dataset: msmarco at 9e329ed
- Size: 10,000 evaluation samples
- Columns: score, query, positive, and negative
- Approximate statistics based on the first 1000 samples:
| | score | query | positive | negative |
|---|---|---|---|---|
| type | float | string | string | string |
| details | min: -2.25, mean: 13.3, max: 22.52 | min: 4 tokens, mean: 9.31 tokens, max: 40 tokens | min: 17 tokens, mean: 79.88 tokens, max: 227 tokens | min: 18 tokens, mean: 77.64 tokens, max: 250 tokens |
- Samples:
| score | query | positive | negative |
|---|---|---|---|
| 11.338554302851358 | victor cruz dance salsa in the super bowl | The popular former Giant – who helped lead the team to a Super Bowl title in the 2011 season – ... Victor Cruz performed his first salsa dance in Chicago. The popular former Giant – who helped lead the team to a Super Bowl title in the 2011 season – caught a 2-yard touchdown pass from Mitch Trubisky in the Bears' 24-17 preseason loss to the Broncos on Thursday night. | Victor Cruz, Giants hammer out deal. Receiver Victor Cruz on Monday signed a six-year contract through the 2018 season with the New York Giants. The contract is worth $46 million and pays him $15.625 million fully guaranteed the first two seasons, a source said. |
| 18.167373975118 | what is the phone number for roblox | im calling roblox hq and the roblox number is 888 858 2569 or if u live in canada its 1888 858 2569 subscibe to us (waffleman514 and twitterelgo) and join our youtube group on our profile Category | [edit] Create A New Place. This is where you define where your game will be published. 1 Go to Roblox.com and login. 2 Click My ROBLOX and then click Places. 3 Click Create Game Place. 4 Fill out the form. 5 Name is the name of the game. |
| 17.668365399042766 | can you freeze cream soup | With a modest investment in time and effort, you can make your own cream of mushroom soup and freeze it for later use. This leaves you firmly in control of the soup's ingredients and enables you to portion the soup in quantities that make sense for you. | I purchased 1.5 lbs of 3 large boneless, skinless chicken breasts. I am cooking them in a crockpot with 1 can of cream of mushroom soup, 1 can cream of chicken soup, 1 can of water and some canned mushrooms...the chicken is all at the bottom. About how long will it take the chicken to cook completely if I set my... show more I purchased 1.5 lbs of 3 large boneless, skinless chicken breasts. |
- Loss: SpladeLoss with these parameters: `{"loss": "SparseMarginMSELoss", "lambda_corpus": 0.08, "lambda_query": 0.1}`
Training Hyperparameters
Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `fp16`: True
- `load_best_model_at_end`: True
- `batch_sampler`: no_duplicates

These values map directly onto the training sketch below.
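A sketch of how these non-default values fit into a training run; `model`, `train_dataset`, `eval_dataset`, and `loss` stand in for the objects described in the sections above, and the output path is hypothetical:

```python
from sentence_transformers import SparseEncoderTrainer, SparseEncoderTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SparseEncoderTrainingArguments(
    output_dir="models/splade-cocondenser-msmarco",  # hypothetical output path
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=1,
    warmup_ratio=0.1,
    fp16=True,
    load_best_model_at_end=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SparseEncoderTrainer(
    model=model,                  # the SparseEncoder from the Usage section
    args=args,
    train_dataset=train_dataset,  # msmarco training split described above
    eval_dataset=eval_dataset,    # msmarco evaluation split described above
    loss=loss,                    # the SpladeLoss configured earlier
)
trainer.train()
```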
All Hyperparameters
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}
Training Logs
Epoch | Step | Training Loss | Validation Loss | NanoMSMARCO_dot_ndcg@10 | NanoNFCorpus_dot_ndcg@10 | NanoNQ_dot_ndcg@10 | NanoBEIR_mean_dot_ndcg@10 |
---|---|---|---|---|---|---|---|
0.0178 | 100 | 795934.08 | - | - | - | - | - |
0.0356 | 200 | 13561.4538 | - | - | - | - | - |
0.0533 | 300 | 118.2925 | - | - | - | - | - |
0.0711 | 400 | 61.485 | - | - | - | - | - |
0.0889 | 500 | 44.6503 | 38.6276 | 0.5126 | 0.2701 | 0.5829 | 0.4552 |
0.1067 | 600 | 38.3666 | - | - | - | - | - |
0.1244 | 700 | 35.2046 | - | - | - | - | - |
0.1422 | 800 | 33.2246 | - | - | - | - | - |
0.16 | 900 | 31.5866 | - | - | - | - | - |
0.1778 | 1000 | 29.3914 | 38.9004 | 0.5849 | 0.3140 | 0.6009 | 0.4999 |
0.1956 | 1100 | 28.9009 | - | - | - | - | - |
0.2133 | 1200 | 29.5258 | - | - | - | - | - |
0.2311 | 1300 | 27.7958 | - | - | - | - | - |
0.2489 | 1400 | 27.0228 | - | - | - | - | - |
0.2667 | 1500 | 25.0953 | 22.5132 | 0.6090 | 0.3377 | 0.6166 | 0.5211 |
0.2844 | 1600 | 25.4396 | - | - | - | - | - |
0.3022 | 1700 | 22.53 | - | - | - | - | - |
0.32 | 1800 | 24.0084 | - | - | - | - | - |
0.3378 | 1900 | 23.5741 | - | - | - | - | - |
0.3556 | 2000 | 23.141 | 22.6775 | 0.6408 | 0.3560 | 0.5984 | 0.5317 |
0.3733 | 2100 | 22.0953 | - | - | - | - | - |
0.3911 | 2200 | 22.2789 | - | - | - | - | - |
0.4089 | 2300 | 20.9582 | - | - | - | - | - |
0.4267 | 2400 | 19.1969 | - | - | - | - | - |
0.4444 | 2500 | 21.047 | 28.3245 | 0.6209 | 0.3487 | 0.6260 | 0.5319 |
0.4622 | 2600 | 20.7531 | - | - | - | - | - |
0.48 | 2700 | 19.8115 | - | - | - | - | - |
0.4978 | 2800 | 18.6278 | - | - | - | - | - |
0.5156 | 2900 | 19.3731 | - | - | - | - | - |
0.5333 | 3000 | 18.4502 | 20.3191 | 0.6390 | 0.3506 | 0.6087 | 0.5328 |
0.5511 | 3100 | 18.4525 | - | - | - | - | - |
0.5689 | 3200 | 17.0456 | - | - | - | - | - |
0.5867 | 3300 | 17.256 | - | - | - | - | - |
0.6044 | 3400 | 17.6203 | - | - | - | - | - |
**0.6222** | **3500** | **18.7721** | **17.7983** | **0.6685** | **0.3411** | **0.6444** | **0.5513** |
0.64 | 3600 | 16.7819 | - | - | - | - | - |
0.6578 | 3700 | 18.6132 | - | - | - | - | - |
0.6756 | 3800 | 15.5466 | - | - | - | - | - |
0.6933 | 3900 | 17.7706 | - | - | - | - | - |
0.7111 | 4000 | 16.6612 | 15.7565 | 0.6727 | 0.3519 | 0.6159 | 0.5468 |
0.7289 | 4100 | 16.4755 | - | - | - | - | - |
0.7467 | 4200 | 16.9832 | - | - | - | - | - |
0.7644 | 4300 | 14.9855 | - | - | - | - | - |
0.7822 | 4400 | 14.6835 | - | - | - | - | - |
0.8 | 4500 | 17.0725 | 18.0495 | 0.6652 | 0.3430 | 0.6423 | 0.5502 |
0.8178 | 4600 | 15.8136 | - | - | - | - | - |
0.8356 | 4700 | 15.6528 | - | - | - | - | - |
0.8533 | 4800 | 15.5791 | - | - | - | - | - |
0.8711 | 4900 | 15.1496 | - | - | - | - | - |
0.8889 | 5000 | 14.7461 | 16.4918 | 0.6373 | 0.3353 | 0.6403 | 0.5376 |
0.9067 | 5100 | 16.3102 | - | - | - | - | - |
0.9244 | 5200 | 14.5521 | - | - | - | - | - |
0.9422 | 5300 | 14.4375 | - | - | - | - | - |
0.96 | 5400 | 15.2282 | - | - | - | - | - |
0.9778 | 5500 | 14.4738 | 15.4439 | 0.6426 | 0.3385 | 0.6334 | 0.5382 |
0.9956 | 5600 | 14.6468 | - | - | - | - | - |
-1 | -1 | - | - | 0.6685 | 0.3411 | 0.6444 | 0.5513 |
- The bold row denotes the saved checkpoint.
Environmental Impact
Carbon emissions were measured using CodeCarbon.
- Energy Consumed: 0.216 kWh
- Carbon Emitted: 0.084 kg of CO2
- Hours Used: 0.609 hours
Training Hardware
- On Cloud: No
- GPU Model: 1 x NVIDIA GeForce RTX 3090
- CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
- RAM Size: 31.78 GB
Framework Versions
- Python: 3.11.6
- Sentence Transformers: 4.2.0.dev0
- Transformers: 4.52.4
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.1
- Datasets: 2.21.0
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
SpladeLoss
```bibtex
@misc{formal2022distillationhardnegativesampling,
    title={From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective},
    author={Thibault Formal and Carlos Lassance and Benjamin Piwowarski and Stéphane Clinchant},
    year={2022},
    eprint={2205.04733},
    archivePrefix={arXiv},
    primaryClass={cs.IR},
    url={https://arxiv.org/abs/2205.04733},
}
```
SparseMarginMSELoss
```bibtex
@misc{hofstätter2021improving,
    title={Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation},
    author={Sebastian Hofstätter and Sophia Althammer and Michael Schröder and Mete Sertkan and Allan Hanbury},
    year={2021},
    eprint={2010.02666},
    archivePrefix={arXiv},
    primaryClass={cs.IR}
}
```
FlopsLoss
```bibtex
@article{paria2020minimizing,
    title={Minimizing flops to learn efficient sparse representations},
    author={Paria, Biswajit and Yeh, Chih-Kuan and Yen, Ian EH and Xu, Ning and Ravikumar, Pradeep and P{\'o}czos, Barnab{\'a}s},
    journal={arXiv preprint arXiv:2004.05665},
    year={2020}
}
```