This is an index created with the splade-index library (version 0.1.2).
You can install the splade-index library with pip:
pip install "splade-index==0.1.2"
# Include extra dependencies like stemmer
pip install "splade-index[full]==0.1.2"
# For huggingface hub usage
pip install huggingface_hub
You can use the following code to load this SPLADE index from the Hugging Face Hub:
from sentence_transformers import SparseEncoder
from splade_index import SPLADE
# Download the SPLADE model that was used to create the index from the HuggingFace Hub
model_id = "naver/splade-v3-distilbert" # the SPLADE model id
model = SparseEncoder(model_id)
repo_id = "rasyosef/natural_questions_108k_splade_index"
# Load a SPLADE index from the Hugging Face model hub
retriever = SPLADE.load_from_hub(repo_id, model=model)
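Once the index is loaded, it can be queried. The helper below is a minimal sketch, not the library's documented API: it assumes that the retriever exposes a `retrieve(queries, k=...)` method returning an object with `doc_ids` and `scores` attributes, each indexable as `[query_index][rank]`. The function name `search` and its parameters are hypothetical; check the splade-index documentation for the version you have installed.

```python
from typing import Any, List, Tuple


def search(retriever: Any, query: str, k: int = 3) -> List[Tuple[Any, float]]:
    """Return (doc_id, score) pairs for the top-k hits of a single query."""
    # ASSUMPTION: retrieve() accepts a list of queries plus k, and its result
    # has doc_ids and scores laid out as [query_index][rank].
    results = retriever.retrieve([query], k=k)
    top_ids = results.doc_ids[0]
    top_scores = results.scores[0]
    # Pair each document id with its score, converting scores to plain floats.
    return [(doc_id, float(score)) for doc_id, score in zip(top_ids, top_scores)]
```

With the `retriever` loaded above, a call such as `search(retriever, "who wrote the declaration of independence", k=3)` would return up to three `(doc_id, score)` pairs, provided the assumptions hold.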
This index was built from a corpus with the following statistics:
Statistic | Value |
---|---|
Number of documents | 108593 |
Number of tokens | 20265694 |
Average tokens per document | 186.62 |
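As a quick consistency check, the average in the table follows from the other two rows: total tokens divided by number of documents. The variable names below are illustrative only.

```python
# Values copied from the statistics table above.
num_documents = 108593
num_tokens = 20265694

# Average tokens per document = total tokens / number of documents.
avg_tokens_per_document = num_tokens / num_documents

print(f"{avg_tokens_per_document:.2f}")  # prints 186.62
```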