CoCondenser trained on MS MARCO

This is a SPLADE Sparse Encoder model finetuned from Luyu/co-condenser-marco on the msmarco dataset using the sentence-transformers library. It maps sentences & paragraphs to a 30522-dimensional sparse vector space and can be used for semantic search and sparse retrieval.

Model Details

Model Description

  • Model Type: SPLADE Sparse Encoder
  • Base model: Luyu/co-condenser-marco
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 30522 dimensions
  • Similarity Function: Dot Product
  • Training Dataset: msmarco
  • Language: en
  • License: apache-2.0

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers

Full Model Architecture

SparseEncoder(
  (0): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False}) with MLMTransformer model: BertForMaskedLM 
  (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
)
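The same two-module stack can also be assembled by hand. Below is a minimal sketch, assuming the MLMTransformer and SpladePooling classes from sentence_transformers.sparse_encoder.models (import paths and keyword names may differ between library versions):

from sentence_transformers import SparseEncoder
from sentence_transformers.sparse_encoder.models import MLMTransformer, SpladePooling

# The MLM head scores each token position against the 30522-token BERT
# vocabulary; that vocabulary size becomes the sparse output dimensionality.
mlm = MLMTransformer("Luyu/co-condenser-marco", max_seq_length=512)
# SPLADE pooling: ReLU on the MLM logits, max-pooled over the sequence axis.
pooling = SpladePooling(pooling_strategy="max", activation_function="relu")
model = SparseEncoder(modules=[mlm, pooling])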

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SparseEncoder

# Download from the 🤗 Hub
model = SparseEncoder("tomaarsen/splade-cocondenser-msmarco-margin-mse-minilm-small")
# Run inference
queries = [
    "how much would dreamers cost the taxpayers",
]
documents = [
    'Plus, the CBO said the Dreamers would bring an additional 80,000 immigrants to the U.S., adding to the liability. In total, the immigrants and their families would cost taxpayers $26.8 billion, but only pay back $.9 billion in taxes, the CBO said. The analysis found that roughly 3.25 million undocumented immigrants are eligible for Dreamer status, while only 2 million would apply and only 1.6 million would be accepted over the next decade.',
    'Playing Chicken with the $18 Trillion U.S. Economy: The full cost of the last government shutdown two years ago was staggering – it delivered a $24 billion blow to the U.S. economy and taxpayers. Now we may be about to repeat government shutdown history on Dec. 11.',
    'Sustain is defined as to support something or to endure a trial or hardship. 1  An example of sustain is for a foundation to support the house. 2  An example of sustain is to survive days without food or water.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 30522] [3, 30522]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[27.1439, 12.7876,  0.6402]])
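Because every dimension corresponds to a token in the BERT vocabulary, the sparse embeddings are directly interpretable. As a minimal sketch, assuming the decode helper on SparseEncoder (present in recent sentence-transformers releases; the exact signature may vary by version):

# Map the active dimensions of the query embedding back to vocabulary
# tokens, keeping only the ten highest-weighted entries.
for token, weight in model.decode(query_embeddings[0], top_k=10):
    print(f"{token}: {weight:.2f}")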

Evaluation

Metrics

Sparse Information Retrieval

| Metric | NanoMSMARCO | NanoNFCorpus | NanoNQ |
|:---|:---|:---|:---|
| dot_accuracy@1 | 0.48 | 0.4 | 0.5 |
| dot_accuracy@3 | 0.62 | 0.58 | 0.74 |
| dot_accuracy@5 | 0.76 | 0.64 | 0.76 |
| dot_accuracy@10 | 0.9 | 0.68 | 0.82 |
| dot_precision@1 | 0.48 | 0.4 | 0.5 |
| dot_precision@3 | 0.2067 | 0.3733 | 0.2533 |
| dot_precision@5 | 0.152 | 0.328 | 0.156 |
| dot_precision@10 | 0.09 | 0.274 | 0.09 |
| dot_recall@1 | 0.48 | 0.0418 | 0.46 |
| dot_recall@3 | 0.62 | 0.0962 | 0.69 |
| dot_recall@5 | 0.76 | 0.1174 | 0.7 |
| dot_recall@10 | 0.9 | 0.1424 | 0.79 |
| dot_ndcg@10 | 0.6685 | 0.3411 | 0.6444 |
| dot_mrr@10 | 0.5974 | 0.5052 | 0.617 |
| dot_map@100 | 0.6011 | 0.1532 | 0.5955 |
| query_active_dims | 57.08 | 50.46 | 54.1 |
| query_sparsity_ratio | 0.9981 | 0.9983 | 0.9982 |
| corpus_active_dims | 187.3189 | 331.6617 | 211.6348 |
| corpus_sparsity_ratio | 0.9939 | 0.9891 | 0.9931 |
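The sparsity statistics follow directly from the embeddings: active_dims counts the non-zero entries per vector, and sparsity_ratio is 1 - active_dims / 30522 (e.g., 1 - 57.08 / 30522 ≈ 0.9981). A minimal sketch, reusing the embeddings from the usage example above and assuming they are returned as (possibly sparse) PyTorch tensors:

# Densify if needed, then count non-zero dimensions per embedding.
emb = query_embeddings.to_dense() if query_embeddings.is_sparse else query_embeddings
active_dims = (emb != 0).sum(dim=1).float()
sparsity_ratio = 1.0 - active_dims / emb.shape[1]
print(active_dims.mean().item(), sparsity_ratio.mean().item())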

Sparse Nano BEIR

  • Dataset: NanoBEIR_mean
  • Evaluated with SparseNanoBEIREvaluator with these parameters:
    {
        "dataset_names": [
            "msmarco",
            "nfcorpus",
            "nq"
        ]
    }
    
| Metric | Value |
|:---|:---|
| dot_accuracy@1 | 0.46 |
| dot_accuracy@3 | 0.6467 |
| dot_accuracy@5 | 0.72 |
| dot_accuracy@10 | 0.8 |
| dot_precision@1 | 0.46 |
| dot_precision@3 | 0.2778 |
| dot_precision@5 | 0.212 |
| dot_precision@10 | 0.1513 |
| dot_recall@1 | 0.3273 |
| dot_recall@3 | 0.4687 |
| dot_recall@5 | 0.5258 |
| dot_recall@10 | 0.6108 |
| dot_ndcg@10 | 0.5513 |
| dot_mrr@10 | 0.5732 |
| dot_map@100 | 0.4499 |
| query_active_dims | 53.88 |
| query_sparsity_ratio | 0.9982 |
| corpus_active_dims | 229.4242 |
| corpus_sparsity_ratio | 0.9925 |
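These NanoBEIR numbers can be reproduced by running the evaluator directly. A minimal sketch, assuming the import path below (it has moved between sentence-transformers versions) and that the result keys follow the metric names in the tables above:

from sentence_transformers.sparse_encoder.evaluation import SparseNanoBEIREvaluator

# Evaluate on the same three Nano datasets used in this card.
evaluator = SparseNanoBEIREvaluator(dataset_names=["msmarco", "nfcorpus", "nq"])
results = evaluator(model)
print(results["NanoBEIR_mean_dot_ndcg@10"])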

Training Details

Training Dataset

msmarco

  • Dataset: msmarco at 9e329ed
  • Size: 90,000 training samples
  • Columns: score, query, positive, and negative
  • Approximate statistics based on the first 1000 samples:

    | | score | query | positive | negative |
    |:---|:---|:---|:---|:---|
    | type | float | string | string | string |
    | details | min: -1.28, mean: 13.47, max: 22.27 | min: 4 tokens, mean: 8.97 tokens, max: 21 tokens | min: 19 tokens, mean: 81.41 tokens, max: 220 tokens | min: 17 tokens, mean: 76.39 tokens, max: 195 tokens |

  • Samples:

    | score | query | positive | negative |
    |:---|:---|:---|:---|
    | 13.85562777519226 | what is a reflective journal? | A reflective journal is a tool that students are encouraged to use to help them understand not just what they have learned while studying but also how they learned it by reflecting on the learning experience itself. | The point is that this approach believes that literature can be used to illuminate some truth about something which is not literature. The difference between this approach and the didactic approach is that didactic approach considers author as a teacher, while the reflective approach considers him or her an observer. |
    | 12.178914229075115 | original footloose release | Footloose (2011 film) Footloose is a 2011 American musical dance film directed by Craig Brewer. It is a remake of the 1984 film of the same name and stars Kenny Wormald, Julianne Hough, Andie MacDowell, and Dennis Quaid. The film follows a young man who moves from Boston to a small southern town and protests the town's ban against dancing. | In the 2001 re-release of Thriller they added the second verse of the rap which was recorded but not included on the original here is the second verse by Vincent Price (I heard a 3rd was written but never recorded) The demons squeal in sheer delight. It's you they spy, so plump, so right. |
    | 19.897210280100506 | time of day blood pressure | Day Time Blood Pressure. For most people, your body's blood pressure rises during the morning hours and reaches its highest point around midday. This is because your body is preset to increase its functions for anticipated daily activity. Your body reaches its lowest blood pressure at bedtime, between 8 p.m. and 2 a.m. | labetalol is used alone or together with other medicines to treat high blood pressure hypertension high blood pressure adds to the workload of the heart and arteriesif it continues for a long time the heart and arteries may not function properlyabetalol is used alone or together with other medicines to treat high blood pressure hypertension high blood pressure adds to the workload of the heart and arteries |
  • Loss: SpladeLoss with these parameters (see the code sketch below):
    {
        "loss": "SparseMarginMSELoss",
        "lambda_corpus": 0.08,
        "lambda_query": 0.1
    }
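In code, this configuration corresponds roughly to the sketch below. The lambda_query / lambda_corpus keywords mirror the JSON above and weight the FLOPS sparsity regularization applied to queries and documents; other sentence-transformers versions may use different keyword names:

from sentence_transformers.sparse_encoder.losses import SpladeLoss, SparseMarginMSELoss

# Margin-MSE distillation loss wrapped with SPLADE's FLOPS regularization.
loss = SpladeLoss(
    model=model,
    loss=SparseMarginMSELoss(model),
    lambda_query=0.1,    # regularization strength for query embeddings
    lambda_corpus=0.08,  # regularization strength for document embeddings
)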
    

Evaluation Dataset

msmarco

  • Dataset: msmarco at 9e329ed
  • Size: 10,000 evaluation samples
  • Columns: score, query, positive, and negative
  • Approximate statistics based on the first 1000 samples:

    | | score | query | positive | negative |
    |:---|:---|:---|:---|:---|
    | type | float | string | string | string |
    | details | min: -2.25, mean: 13.3, max: 22.52 | min: 4 tokens, mean: 9.31 tokens, max: 40 tokens | min: 17 tokens, mean: 79.88 tokens, max: 227 tokens | min: 18 tokens, mean: 77.64 tokens, max: 250 tokens |

  • Samples:

    | score | query | positive | negative |
    |:---|:---|:---|:---|
    | 11.338554302851358 | victor cruz dance salsa in the super bowl | The popular former Giant — who helped lead the team to a Super Bowl title in the 2011 season — ... Victor Cruz performed his first salsa dance in Chicago. The popular former Giant — who helped lead the team to a Super Bowl title in the 2011 season — caught a 2-yard touchdown pass from Mitch Trubisky in the Bears’ 24-17 preseason loss to the Broncos on Thursday night. | Victor Cruz, Giants hammer out deal. Receiver Victor Cruz on Monday signed a six-year contract through the 2018 season with the New York Giants. The contract is worth $46 million and pays him $15.625 million fully guaranteed the first two seasons, a source said. |
    | 18.167373975118 | what is the phone number for roblox | im calling roblox hq and the roblox number is 888 858 2569 or if u live in canada its 1888 858 2569 subscibe to us (waffleman514 and twitterelgo) and join our youtube group on our profile | Category [edit] Create A New Place. This is where you define where your game will be published. 1 Go to Roblox.com and login. 2 Click My ROBLOX and then click Places. 3 Click Create Game Place. 4 Fill out the form. 5 Name is the name of the game. |
    | 17.668365399042766 | can you freeze cream soup | With a modest investment in time and effort, you can make your own cream of mushroom soup and freeze it for later use. This leaves you firmly in control of the soup's ingredients and enables you to portion the soup in quantities that make sense for you. | I purchased 1.5 lbs of 3 large boneless, skinless chicken breasts. I am cooking them in a crockpot with 1 can of cream of mushroom soup, 1 can cream of chicken soup, 1 can of water and some canned mushrooms...the chicken is all at the bottom. About how long will it take the chicken to cook completely if I set my... show more I purchased 1.5 lbs of 3 large boneless, skinless chicken breasts. |
  • Loss: SpladeLoss with these parameters:
    {
        "loss": "SparseMarginMSELoss",
        "lambda_corpus": 0.08,
        "lambda_query": 0.1
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
  • batch_sampler: no_duplicates
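Taken together, these non-default hyperparameters correspond roughly to the training sketch below. The dataset id is a hypothetical stand-in (the card only identifies the data as "msmarco at 9e329ed"), and class import paths may differ between sentence-transformers versions:

from datasets import load_dataset
from sentence_transformers.sparse_encoder import (
    SparseEncoder,
    SparseEncoderTrainer,
    SparseEncoderTrainingArguments,
)
from sentence_transformers.sparse_encoder.losses import SpladeLoss, SparseMarginMSELoss
from sentence_transformers.training_args import BatchSamplers

# Hypothetical dataset id; columns are (score, query, positive, negative).
# The card reports 90,000 training and 10,000 evaluation samples.
dataset = load_dataset("sentence-transformers/msmarco", split="train")
dataset = dataset.train_test_split(test_size=10_000)

model = SparseEncoder("Luyu/co-condenser-marco")
loss = SpladeLoss(
    model=model,
    loss=SparseMarginMSELoss(model),
    lambda_query=0.1,
    lambda_corpus=0.08,
)

args = SparseEncoderTrainingArguments(
    output_dir="splade-cocondenser-msmarco",
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
    load_best_model_at_end=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SparseEncoderTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    loss=loss,
)
trainer.train()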

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

| Epoch | Step | Training Loss | Validation Loss | NanoMSMARCO_dot_ndcg@10 | NanoNFCorpus_dot_ndcg@10 | NanoNQ_dot_ndcg@10 | NanoBEIR_mean_dot_ndcg@10 |
|:---|:---|:---|:---|:---|:---|:---|:---|
| 0.0178 | 100 | 795934.08 | - | - | - | - | - |
| 0.0356 | 200 | 13561.4538 | - | - | - | - | - |
| 0.0533 | 300 | 118.2925 | - | - | - | - | - |
| 0.0711 | 400 | 61.485 | - | - | - | - | - |
| 0.0889 | 500 | 44.6503 | 38.6276 | 0.5126 | 0.2701 | 0.5829 | 0.4552 |
| 0.1067 | 600 | 38.3666 | - | - | - | - | - |
| 0.1244 | 700 | 35.2046 | - | - | - | - | - |
| 0.1422 | 800 | 33.2246 | - | - | - | - | - |
| 0.16 | 900 | 31.5866 | - | - | - | - | - |
| 0.1778 | 1000 | 29.3914 | 38.9004 | 0.5849 | 0.3140 | 0.6009 | 0.4999 |
| 0.1956 | 1100 | 28.9009 | - | - | - | - | - |
| 0.2133 | 1200 | 29.5258 | - | - | - | - | - |
| 0.2311 | 1300 | 27.7958 | - | - | - | - | - |
| 0.2489 | 1400 | 27.0228 | - | - | - | - | - |
| 0.2667 | 1500 | 25.0953 | 22.5132 | 0.6090 | 0.3377 | 0.6166 | 0.5211 |
| 0.2844 | 1600 | 25.4396 | - | - | - | - | - |
| 0.3022 | 1700 | 22.53 | - | - | - | - | - |
| 0.32 | 1800 | 24.0084 | - | - | - | - | - |
| 0.3378 | 1900 | 23.5741 | - | - | - | - | - |
| 0.3556 | 2000 | 23.141 | 22.6775 | 0.6408 | 0.3560 | 0.5984 | 0.5317 |
| 0.3733 | 2100 | 22.0953 | - | - | - | - | - |
| 0.3911 | 2200 | 22.2789 | - | - | - | - | - |
| 0.4089 | 2300 | 20.9582 | - | - | - | - | - |
| 0.4267 | 2400 | 19.1969 | - | - | - | - | - |
| 0.4444 | 2500 | 21.047 | 28.3245 | 0.6209 | 0.3487 | 0.6260 | 0.5319 |
| 0.4622 | 2600 | 20.7531 | - | - | - | - | - |
| 0.48 | 2700 | 19.8115 | - | - | - | - | - |
| 0.4978 | 2800 | 18.6278 | - | - | - | - | - |
| 0.5156 | 2900 | 19.3731 | - | - | - | - | - |
| 0.5333 | 3000 | 18.4502 | 20.3191 | 0.6390 | 0.3506 | 0.6087 | 0.5328 |
| 0.5511 | 3100 | 18.4525 | - | - | - | - | - |
| 0.5689 | 3200 | 17.0456 | - | - | - | - | - |
| 0.5867 | 3300 | 17.256 | - | - | - | - | - |
| 0.6044 | 3400 | 17.6203 | - | - | - | - | - |
| **0.6222** | **3500** | **18.7721** | **17.7983** | **0.6685** | **0.3411** | **0.6444** | **0.5513** |
| 0.64 | 3600 | 16.7819 | - | - | - | - | - |
| 0.6578 | 3700 | 18.6132 | - | - | - | - | - |
| 0.6756 | 3800 | 15.5466 | - | - | - | - | - |
| 0.6933 | 3900 | 17.7706 | - | - | - | - | - |
| 0.7111 | 4000 | 16.6612 | 15.7565 | 0.6727 | 0.3519 | 0.6159 | 0.5468 |
| 0.7289 | 4100 | 16.4755 | - | - | - | - | - |
| 0.7467 | 4200 | 16.9832 | - | - | - | - | - |
| 0.7644 | 4300 | 14.9855 | - | - | - | - | - |
| 0.7822 | 4400 | 14.6835 | - | - | - | - | - |
| 0.8 | 4500 | 17.0725 | 18.0495 | 0.6652 | 0.3430 | 0.6423 | 0.5502 |
| 0.8178 | 4600 | 15.8136 | - | - | - | - | - |
| 0.8356 | 4700 | 15.6528 | - | - | - | - | - |
| 0.8533 | 4800 | 15.5791 | - | - | - | - | - |
| 0.8711 | 4900 | 15.1496 | - | - | - | - | - |
| 0.8889 | 5000 | 14.7461 | 16.4918 | 0.6373 | 0.3353 | 0.6403 | 0.5376 |
| 0.9067 | 5100 | 16.3102 | - | - | - | - | - |
| 0.9244 | 5200 | 14.5521 | - | - | - | - | - |
| 0.9422 | 5300 | 14.4375 | - | - | - | - | - |
| 0.96 | 5400 | 15.2282 | - | - | - | - | - |
| 0.9778 | 5500 | 14.4738 | 15.4439 | 0.6426 | 0.3385 | 0.6334 | 0.5382 |
| 0.9956 | 5600 | 14.6468 | - | - | - | - | - |
| -1 | -1 | - | - | 0.6685 | 0.3411 | 0.6444 | 0.5513 |

  • The bold row denotes the saved checkpoint.

Environmental Impact

Carbon emissions were measured using CodeCarbon.

  • Energy Consumed: 0.216 kWh
  • Carbon Emitted: 0.084 kg of CO2
  • Hours Used: 0.609 hours

Training Hardware

  • On Cloud: No
  • GPU Model: 1 x NVIDIA GeForce RTX 3090
  • CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
  • RAM Size: 31.78 GB

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 4.2.0.dev0
  • Transformers: 4.52.4
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.1
  • Datasets: 2.21.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

SpladeLoss

@misc{formal2022distillationhardnegativesampling,
      title={From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective},
      author={Thibault Formal and Carlos Lassance and Benjamin Piwowarski and Stéphane Clinchant},
      year={2022},
      eprint={2205.04733},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2205.04733},
}

SparseMarginMSELoss

@misc{hofstätter2021improving,
    title={Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation},
    author={Sebastian Hofstätter and Sophia Althammer and Michael Schröder and Mete Sertkan and Allan Hanbury},
    year={2021},
    eprint={2010.02666},
    archivePrefix={arXiv},
    primaryClass={cs.IR}
}

FlopsLoss

@article{paria2020minimizing,
    title={Minimizing flops to learn efficient sparse representations},
    author={Paria, Biswajit and Yeh, Chih-Kuan and Yen, Ian EH and Xu, Ning and Ravikumar, Pradeep and P{\'o}czos, Barnab{\'a}s},
    journal={arXiv preprint arXiv:2004.05665},
    year={2020}
}