‼️Sentence Transformers v5.0 is out! The biggest update yet introduces Sparse Embedding models, improved encode methods, a Router module & much more. Sparse + Dense = 🔥 hybrid search performance!
1️⃣ Sparse Encoder Models - New support for sparse embeddings (30k+ dims, <1% non-zero)
* Full SPLADE, Inference-free SPLADE, CSR support
* 4 new modules, 12 losses, 9 evaluators
* Integration with elastic, opensearch-project, Qdrant, ibm-granite
* Decode embeddings into interpretable (token, weight) pairs (sketched below)
* Hybrid search integration
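Here's a minimal sketch of the new sparse workflow (the model name is just an example, and the exact decode signature may differ slightly; see the blogpost linked below for the canonical API):

```python
from sentence_transformers import SparseEncoder

# Example SPLADE checkpoint; other sparse encoders on the Hub work the same way
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")

# Sparse embeddings: ~30k dimensions, of which <1% are non-zero
embeddings = model.encode([
    "Sparse embeddings keep one weight per vocabulary token.",
    "Hybrid search combines sparse and dense retrieval.",
])

# Decode back into (token, weight) pairs to see what the model actually keys on
for tokens in model.decode(embeddings, top_k=10):
    print(tokens)
```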
2️⃣ Enhanced Encode Methods
* encode_query & encode_document with auto prompts (example below)
* Direct device list passing to encode()
* Cleaner multi-processing
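In practice that looks roughly like this (prompts are only applied when the checkpoint defines them, and the device list assumes two GPUs are available):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

docs = [
    "Hybrid search combines sparse and dense retrieval.",
    "SPLADE produces sparse lexical embeddings.",
]

# encode_query / encode_document automatically apply the model's
# query/document prompts if the checkpoint defines them
query_embedding = model.encode_query("what is hybrid search?")
document_embeddings = model.encode_document(docs)
print(model.similarity(query_embedding, document_embeddings))

# New: pass a list of devices directly to encode for multi-process encoding
document_embeddings = model.encode_document(docs, device=["cuda:0", "cuda:1"])
```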
3️⃣ Router Module & Training
* Different module paths for queries vs documents (sketched below)
* Custom learning rates per parameter group
* Composite loss logging
* Perfect for two-tower architectures
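A rough sketch of a two-tower setup built with the Router module (the backbone and projection heads are illustrative choices, not the one true recipe):

```python
from sentence_transformers import SentenceTransformer, models
from sentence_transformers.models import Router

# Shared backbone, separate projection heads for queries and documents
backbone = models.Transformer("distilbert-base-uncased")
pooling = models.Pooling(backbone.get_word_embedding_dimension())
dim = pooling.get_sentence_embedding_dimension()
query_head = models.Dense(dim, 256)
document_head = models.Dense(dim, 256)

router = Router.for_query_document(
    query_modules=[backbone, pooling, query_head],
    document_modules=[backbone, pooling, document_head],
)
model = SentenceTransformer(modules=[router])

# encode_query / encode_document pick the matching route automatically
query_embedding = model.encode_query("what is hybrid search?")
document_embeddings = model.encode_document(
    ["Hybrid search combines sparse and dense retrieval."]
)
```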
4️⃣ Documentation & Training
* New Training/Loss Overview docs
* 6 training example pages
* Search engine integration examples
Read the comprehensive blogpost about training sparse embedding models: https://huggingface.co/blog/train-sparse-encoder
See the full release notes here: https://github.com/UKPLab/sentence-transformers/releases/v5.0.0
What's next? We would love to hear from the community! What sparse encoder models would you like to see? And what new capabilities should Sentence Transformers handle - multimodal embeddings, late interaction models, or something else? Your feedback shapes our roadmap!
I'm incredibly excited to see the community explore sparse embeddings and hybrid search! The interpretability alone makes this a game-changer for understanding what your models are actually doing.
🙏 Thanks to @tomaarsen for this incredible opportunity!