Towards Best Practices for Open Datasets for LLM Training Paper • 2501.08365 • Published 4 days ago • 40
Embedding Model Datasets Collection A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 67 items • Updated Jul 3, 2024 • 95
Article Train 400x faster Static Embedding Models with Sentence Transformers 3 days ago • 102
KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model Paper • 2501.01028 • Published 16 days ago • 11
Article Recipe: Preparing Multilingual Speech Datasets for TTS Training By PHBJT • Nov 4, 2024 • 15
Article Deploying Language Models on Azure Kubernetes: A Complete Beginner's Guide By vpkprasanna • Nov 11, 2024 • 2
Article Unlocking the Power of Reasoning: Introducing CriticalThinker-LLaMA-3.1-8B-GGUF and Its Groundbreaking Dataset By theeseus-ai • 22 days ago • 1
Article Fine-tune SmolLM's on custom synthetic data By prithivMLmods • 13 days ago • 16
Article Fine-tune a SmolLM on domain-specific synthetic data from a LLM By davidberenstein1957 • 15 days ago • 30
LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching Paper • 2311.11284 • Published Nov 19, 2023 • 17
ModernBERT Collection Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated 30 days ago • 124
Structured 3D Latents for Scalable and Versatile 3D Generation Paper • 2412.01506 • Published Dec 2, 2024 • 53
Article Welcome FalconMamba: The first strong attention-free 7B model Aug 12, 2024 • 108
Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models Paper • 2407.03181 • Published Jul 3, 2024 • 1
Probably function calling datasets Collection Created using the https://huggingface.co/spaces/librarian-bots/dataset-column-search-api Space. • 39 items • Updated Jul 17, 2024 • 37