CoCondenser trained on MS MARCO tuples

This is a SPLADE Sparse Encoder model finetuned from Luyu/co-condenser-marco on the msmarco dataset using the sentence-transformers library. It maps sentences & paragraphs to a 30522-dimensional sparse vector space and can be used for semantic search and sparse retrieval.

Model Details

Model Description

  • Model Type: SPLADE Sparse Encoder
  • Base model: Luyu/co-condenser-marco
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 30522 dimensions
  • Similarity Function: Dot Product
  • Training Dataset: msmarco
  • Language: en
  • License: apache-2.0

Full Model Architecture

SparseEncoder(
  (0): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False}) with MLMTransformer model: BertForMaskedLM 
  (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
)
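
The SpladePooling module converts the per-token MLM logits from BertForMaskedLM into a single 30522-dimensional vector: each logit passes through log(1 + ReLU(x)) and the result is max-pooled over the sequence. A minimal plain-PyTorch sketch of this operation (illustrative, not the library's exact implementation):

import torch

def splade_pooling(mlm_logits: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # mlm_logits: (batch, seq_len, vocab_size) from the MLM head
    # attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    scores = torch.log1p(torch.relu(mlm_logits))    # log(1 + ReLU(x)) saturation
    scores = scores * attention_mask.unsqueeze(-1)  # zero out padding positions
    return scores.max(dim=1).values                 # max pool over the sequence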

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SparseEncoder

# Download from the 🤗 Hub
model = SparseEncoder("tomaarsen/splade-cocondenser-msmarco-margin-mse")
# Run inference
queries = [
    "when did shanghai disneyland open",
]
documents = [
    "Shanghai Disney officially opens: A peek inside. June 17, 2016, 6 p.m. After five years of construction, $5.5 billion in spending and a month of testing to work out the kinks, Shanghai Disney Resort opened to the public just before noon, Shanghai time, on Thursday, June 16 (which was 9 p.m. Wednesday in Anaheim, home of the original Disney park). Shanghai Disneyland features six themed areas, and the resort contains two hotels, a shopping district and 99 acres of gardens, lakes and parkland. We'll keep you updated throughout the week with new details and peeks inside the resort.",
    'Map of the Old City of Shanghai. By the early 1400s, Shanghai had become important enough for Ming dynasty engineers to begin dredging the Huangpu River (also known as Shen). In 1553, a city wall was built around the Old Town (Nanshi) as a defense against the depredations of the Wokou (Japanese pirates).',
    "The conflict is referred to in China as the War of Resistance against Japanese Aggression (1937-45) and the Anti-Fascist War. Japan's expansionist policy of the 1930s, driven by the military, was to set up what it called the Greater East Asia Co-Prosperity Sphere. Marco Polo Bridge, Beijing. We are marking the anniversary of Germany and Japan's surrender in 1945, but it is legitimate to suggest that the incident that sparked the conflict that became WWII occurred not in Poland in 1939 but in China, near this eleven-arched bridge on the outskirts of Beijing, in July 1937. Let's look at the undisputed facts.",
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 30522] [3, 30522]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[31.8057, 19.5344, 12.4372]])
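
Since each dimension corresponds to a vocabulary token, the active dimensions of an embedding can be mapped back to interpretable terms. A small follow-up sketch, continuing from the snippet above; it assumes model.tokenizer exposes the underlying BERT tokenizer and that the returned embeddings may be sparse tensors:

# Inspect the top-weighted tokens of the first query embedding
emb = query_embeddings[0]
emb = emb.to_dense() if emb.is_sparse else emb  # densify if needed
top_weights, top_ids = emb.topk(10)
tokens = model.tokenizer.convert_ids_to_tokens(top_ids.tolist())
for token, weight in zip(tokens, top_weights.tolist()):
    print(f"{token}: {weight:.2f}")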

Evaluation

Metrics

Sparse Information Retrieval

Metric NanoMSMARCO NanoNFCorpus NanoNQ
dot_accuracy@1 0.46 0.38 0.5
dot_accuracy@3 0.64 0.58 0.76
dot_accuracy@5 0.72 0.62 0.8
dot_accuracy@10 0.82 0.74 0.88
dot_precision@1 0.46 0.38 0.5
dot_precision@3 0.2133 0.36 0.26
dot_precision@5 0.144 0.316 0.168
dot_precision@10 0.082 0.27 0.096
dot_recall@1 0.46 0.0397 0.48
dot_recall@3 0.64 0.0752 0.71
dot_recall@5 0.72 0.0936 0.75
dot_recall@10 0.82 0.1467 0.85
dot_ndcg@10 0.6289 0.3304 0.6772
dot_mrr@10 0.5689 0.4958 0.6329
dot_map@100 0.5779 0.1478 0.6167
query_active_dims 56.1 53.68 55.94
query_sparsity_ratio 0.9982 0.9982 0.9982
corpus_active_dims 192.4087 367.5432 228.8362
corpus_sparsity_ratio 0.9937 0.988 0.9925
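
The sparsity ratios in the last rows follow directly from the active-dims counts: the ratio is the fraction of the 30522 vocabulary dimensions that are inactive. For example, for the NanoMSMARCO queries:

vocab_size = 30522
query_active_dims = 56.1  # NanoMSMARCO column above
print(1 - query_active_dims / vocab_size)  # 0.99816..., reported as 0.9982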

Sparse Nano BEIR

  • Dataset: NanoBEIR_mean
  • Evaluated with SparseNanoBEIREvaluator with these parameters:
    {
        "dataset_names": [
            "msmarco",
            "nfcorpus",
            "nq"
        ]
    }
    
Metric Value
dot_accuracy@1 0.4467
dot_accuracy@3 0.66
dot_accuracy@5 0.7133
dot_accuracy@10 0.8133
dot_precision@1 0.4467
dot_precision@3 0.2778
dot_precision@5 0.2093
dot_precision@10 0.1493
dot_recall@1 0.3266
dot_recall@3 0.4751
dot_recall@5 0.5212
dot_recall@10 0.6056
dot_ndcg@10 0.5455
dot_mrr@10 0.5658
dot_map@100 0.4475
query_active_dims 55.24
query_sparsity_ratio 0.9982
corpus_active_dims 246.1716
corpus_sparsity_ratio 0.9919
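
These values are the unweighted means over the three datasets; for example, the aggregate dot_ndcg@10 can be re-derived from the per-dataset table above:

ndcg_at_10 = [0.6289, 0.3304, 0.6772]  # NanoMSMARCO, NanoNFCorpus, NanoNQ
print(sum(ndcg_at_10) / len(ndcg_at_10))  # 0.5455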

Training Details

Training Dataset

msmarco

  • Dataset: msmarco at 9e329ed
  • Size: 90,000 training samples
  • Columns: score, query, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • score (float): min -2.22, mean 13.59, max 22.53
    • query (string): min 4 tokens, mean 9.05 tokens, max 40 tokens
    • positive (string): min 19 tokens, mean 81.18 tokens, max 203 tokens
    • negative (string): min 15 tokens, mean 77.08 tokens, max 249 tokens
  • Samples:
    • Sample 1:
      • score: 4.470368494590124
      • query: where does the bile duct carry its secretions
      • positive: The function of the common bile duct is to carry bile from the liver and the gallbladder into the duodenum, the top of the small intestine directly after the stomach. The bile it carries interacts with ingested fats and fat-soluble vitamins to enable them to be absorbed by the intestine.
      • negative: The gall bladder is a pouch-shaped organ that stores the bile produced by the liver. The gall bladder shares a vessel, called the common bile duct, with the liver. When bile is needed, it moves through the common bile duct into the first part of the small intestine, the duodenum. It is here that the bile breaks down fat.
    • Sample 2:
      • score: 9.550037781397503
      • query: definition of reverse auction
      • positive: Reverse auction. A reverse auction is a type of auction in which the roles of buyer and seller are reversed. In an ordinary auction (also known as a 'forward auction'), buyers compete to obtain goods or services by offering increasingly higher prices. In a reverse auction, the sellers compete to obtain business from the buyer and prices will typically decrease as the sellers underbid each other.
      • negative: No-reserve auction. A No-reserve auction (NR), also known as an absolute auction, is an auction in which the item for sale will be sold regardless of price. From the seller's perspective, advertising an auction as having no reserve price can be desirable because it potentially attracts a greater number of bidders due to the possibility of a bargain.
    • Sample 3:
      • score: 19.58259622255961
      • query: how do i prevent diverticulitis
      • positive: Follow Following Unfollow Pending Disabled. A , Gastroenterology, answered. The suggestion to prevent diverticulitis is to eat a diet high in fiber, and that includes high-fiber whole grains, fruits, vegetables, nuts, and seeds. I’m aware that some gastroenterologists say to avoid all seeds and nuts, so some of you are nuts enough to wash tomato seeds from slices and pick free poppy seeds from buns.
      • negative: The test is fast and easy especially with the newer CT scanners. But does it provide the information needed? CT KUBs are used to screen for a variety of intra-abdominal conditions, including appendicitis, kidney stones, diverticulitis, and others.
  • Loss: SpladeLoss with these parameters:
    {
        "loss": "SparseMarginMSELoss",
        "lambda_corpus": 0.08,
        "lambda_query": 0.1
    }
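
SpladeLoss wraps the SparseMarginMSELoss distillation objective and adds FLOPS regularization on the query and document vectors, weighted by lambda_query and lambda_corpus. A simplified plain-PyTorch sketch of the combined objective (illustrative, not the library's exact implementation):

import torch
import torch.nn.functional as F

def flops(embs: torch.Tensor) -> torch.Tensor:
    # FLOPS regularizer: squared mean activation per vocabulary dimension, summed
    return torch.sum(torch.mean(torch.abs(embs), dim=0) ** 2)

def splade_margin_mse(q, pos, neg, teacher_margin, lambda_query=0.1, lambda_corpus=0.08):
    # Distillation: match the student's (positive - negative) dot-product margin
    # to the cross-encoder teacher's margin from the `score` column
    student_margin = (q * pos).sum(-1) - (q * neg).sum(-1)
    distill = F.mse_loss(student_margin, teacher_margin)
    # Sparsity pressure, applied separately to queries and documents
    reg = lambda_query * flops(q) + lambda_corpus * flops(torch.cat([pos, neg], dim=0))
    return distill + reg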
    

Evaluation Dataset

msmarco

  • Dataset: msmarco at 9e329ed
  • Size: 10,000 evaluation samples
  • Columns: score, query, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • score (float): min -1.34, mean 13.49, max 22.2
    • query (string): min 4 tokens, mean 8.85 tokens, max 27 tokens
    • positive (string): min 14 tokens, mean 80.48 tokens, max 211 tokens
    • negative (string): min 20 tokens, mean 77.44 tokens, max 209 tokens
  • Samples:
    • Sample 1:
      • score: 15.64028427998225
      • query: what is a protected seedbed
      • positive: A seedbed is a plot of garden set aside to grow vegetables seeds, which can later be transplanted. seedbed is a plot of garden set aside to grow vegetables seeds, which can later be transplanted.
      • negative: Several articles within the Confederate States’ Constitution specifically protected slavery within the Confederacy, but some articles of the U.S. Constitution also protected slavery—the Emancipation Proclamation drew a clearer distinction between the two.
    • Sample 2:
      • score: 6.375148057937622
      • query: who founded ecuador
      • positive: The first Spanish settlement in Ecuador was established in 1534 at Quito on the site of an important Incan town of the same name. Another settlement was established four years later near the river Guayas in Guayaquil.
      • negative: Zuleta is a colonial working farm of 4,000 acres (2,000 hectares) that belongs to the family of Mr. Galo Plaza lasso, a former president of Ecuador, for more than 100 years. It was chosen as one of the world’s “Top Ten Finds” by Outside magazine and named as one of the best Ecuador Hotel by National Geographic Traveler.
    • Sample 3:
      • score: 8.436618288358051
      • query: what is aol problem
      • positive: AOL problems. Lots of people are reporting ongoing (RTR:GE) messages from AOL today. This indicates the AOL mail servers are having problems and can’t accept mail. This has nothing to do with spam, filtering or malicious email. This is simply their servers aren’t functioning as well as they should be and so AOL can’t accept all the mail thrown at them. These types of blocks resolve themselves. Update Feb 8, 2016: AOL users are having problems logging in.
      • negative: Executive Director. I have read these complaints of poor service and agree 110%. I'm a college professor and give extra credit to all AOL users and over the 100% highest grade. I thought I phoned AOL and get some chap in India who is a proven scam man and I'm the poor American SOB who gets whacked.
  • Loss: SpladeLoss with these parameters:
    {
        "loss": "SparseMarginMSELoss",
        "lambda_corpus": 0.08,
        "lambda_query": 0.1
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
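
As a rough sketch, these settings map onto a training script along these lines (assuming the SparseEncoderTrainer API from sentence-transformers; the model, dataset, and loss variables refer to the objects described above):

from sentence_transformers.sparse_encoder import (
    SparseEncoderTrainer,
    SparseEncoderTrainingArguments,
)
from sentence_transformers.training_args import BatchSamplers

args = SparseEncoderTrainingArguments(
    output_dir="splade-cocondenser-msmarco-margin-mse",
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
trainer = SparseEncoderTrainer(
    model=model,                  # the SparseEncoder defined above
    args=args,
    train_dataset=train_dataset,  # 90,000 msmarco training samples
    eval_dataset=eval_dataset,    # 10,000 msmarco evaluation samples
    loss=loss,                    # the SpladeLoss configured above
)
trainer.train()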

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss NanoMSMARCO_dot_ndcg@10 NanoNFCorpus_dot_ndcg@10 NanoNQ_dot_ndcg@10 NanoBEIR_mean_dot_ndcg@10
0.0178 100 805201.68 - - - - -
0.0356 200 11999.3975 - - - - -
0.0533 300 124.0031 - - - - -
0.0711 400 62.6813 - - - - -
0.0889 500 46.0329 49.7658 0.4890 0.2543 0.5131 0.4188
0.1067 600 41.2877 - - - - -
0.1244 700 35.3636 - - - - -
0.1422 800 33.3727 - - - - -
0.16 900 29.389 - - - - -
0.1778 1000 31.2482 28.1527 0.5652 0.2875 0.5423 0.4650
0.1956 1100 31.43 - - - - -
0.2133 1200 27.9919 - - - - -
0.2311 1300 26.9214 - - - - -
0.2489 1400 27.5533 - - - - -
0.2667 1500 25.7473 26.8466 0.5837 0.3265 0.6268 0.5123
0.2844 1600 26.7899 - - - - -
0.3022 1700 24.0652 - - - - -
0.32 1800 23.5837 - - - - -
0.3378 1900 24.1051 - - - - -
0.3556 2000 24.6901 22.0851 0.6018 0.3325 0.6359 0.5234
0.3733 2100 21.5136 - - - - -
0.3911 2200 22.066 - - - - -
0.4089 2300 20.8234 - - - - -
0.4267 2400 20.1988 - - - - -
0.4444 2500 20.0342 20.3437 0.5901 0.3222 0.6010 0.5044
0.4622 2600 18.8835 - - - - -
0.48 2700 19.4797 - - - - -
0.4978 2800 19.6199 - - - - -
0.5156 2900 16.6963 - - - - -
0.5333 3000 19.9204 18.0851 0.5915 0.3111 0.6323 0.5116
0.5511 3100 18.7849 - - - - -
0.5689 3200 18.3169 - - - - -
0.5867 3300 17.1938 - - - - -
0.6044 3400 18.0807 - - - - -
0.6222 3500 16.7721 20.1195 0.6012 0.3119 0.6337 0.5156
0.64 3600 16.7909 - - - - -
0.6578 3700 16.4954 - - - - -
0.6756 3800 16.3734 - - - - -
0.6933 3900 17.2231 - - - - -
0.7111 4000 16.8486 17.5785 0.6228 0.3423 0.6553 0.5401
0.7289 4100 18.2939 - - - - -
0.7467 4200 16.1108 - - - - -
0.7644 4300 16.878 - - - - -
0.7822 4400 15.6163 - - - - -
0.8 4500 15.8337 16.1847 0.6286 0.3376 0.6639 0.5434
0.8178 4600 15.5014 - - - - -
0.8356 4700 15.7579 - - - - -
0.8533 4800 15.9361 - - - - -
0.8711 4900 16.3308 - - - - -
0.8889 5000 14.8395 17.4054 0.6221 0.3280 0.6853 0.5451
0.9067 5100 14.8655 - - - - -
0.9244 5200 14.6498 - - - - -
0.9422 5300 15.5189 - - - - -
0.96 5400 14.608 - - - - -
0.9778 5500 15.6019 16.4883 0.6298 0.3317 0.6831 0.5482
0.9956 5600 14.6263 - - - - -
-1 -1 - - 0.6289 0.3304 0.6772 0.5455

Environmental Impact

Carbon emissions were measured using CodeCarbon.

  • Energy Consumed: 0.218 kWh
  • Carbon Emitted: 0.085 kg of CO2
  • Hours Used: 0.618 hours
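
For context, CodeCarbon measurements like these are typically collected by wrapping the training run with its tracker; a minimal sketch (illustrative, not the exact script used):

from codecarbon import EmissionsTracker

tracker = EmissionsTracker()
tracker.start()
try:
    trainer.train()  # the training run described above
finally:
    emissions_kg = tracker.stop()  # returns estimated kg of CO2eq
    print(f"{emissions_kg:.3f} kg CO2eq emitted")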

Training Hardware

  • On Cloud: No
  • GPU Model: 1 x NVIDIA GeForce RTX 3090
  • CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
  • RAM Size: 31.78 GB

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 4.2.0.dev0
  • Transformers: 4.52.4
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.1
  • Datasets: 2.21.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

SpladeLoss

@misc{formal2022distillationhardnegativesampling,
      title={From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective},
      author={Thibault Formal and Carlos Lassance and Benjamin Piwowarski and Stéphane Clinchant},
      year={2022},
      eprint={2205.04733},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2205.04733},
}

SparseMarginMSELoss

@misc{hofstätter2021improving,
    title={Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation},
    author={Sebastian Hofstätter and Sophia Althammer and Michael Schröder and Mete Sertkan and Allan Hanbury},
    year={2021},
    eprint={2010.02666},
    archivePrefix={arXiv},
    primaryClass={cs.IR}
}

FlopsLoss

@article{paria2020minimizing,
    title={Minimizing flops to learn efficient sparse representations},
    author={Paria, Biswajit and Yeh, Chih-Kuan and Yen, Ian EH and Xu, Ning and Ravikumar, Pradeep and P{\'o}czos, Barnab{\'a}s},
    journal={arXiv preprint arXiv:2004.05665},
    year={2020}
}