view article Article Training and Finetuning Sparse Embedding Models with Sentence Transformers v5 By tomaarsen and 1 other • 9 days ago • 85
Korean Embedding Models Collection A collection of embedding models that are optimized for understanding and representing Korean text. • 4 items • Updated 30 days ago • 1
view article Article *Context Is Gold to Find the Gold Passage*: Evaluating and Training Contextual Document Embeddings By manu and 1 other • Jun 2 • 24
NanoBEIR 🍺 Collection A collection of smaller versions of BEIR datasets with 50 queries and up to 10K documents each. • 13 items • Updated Sep 11, 2024 • 17
💜 Kotlin ML Pack Collection A collection of datasets, fine-tuned models and benchmarks to train your models for perfect Kotlin code generation. • 9 items • Updated Jun 11, 2024 • 23
Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers Paper • 2503.00865 • Published Mar 2 • 66
GemmaX2 Collection GemmaX2 language models, including pretrained and instruction-tuned models of 2 sizes, including 2B, 9B. • 7 items • Updated Feb 7 • 22
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 405
view article Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • Feb 7 • 181
Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT Paper • 2402.07440 • Published Feb 12, 2024 • 1
DistilDIRE: A Small, Fast, Cheap and Lightweight Diffusion Synthesized Deepfake Detection Paper • 2406.00856 • Published Jun 2, 2024 • 12
NeMo Curator - Classifier Models Collection Classifier models that can be used in NeMo Curator for labelling/filtering datasets. • 11 items • Updated 2 days ago • 18
Jina CLIP: Your CLIP Model Is Also Your Text Retriever Paper • 2405.20204 • Published May 30, 2024 • 37