hanspeterlyngsoeraaschoujensen/Qwen3_1.7B-8-layer-fineweb_edu-10M-context_2048 Updated 7 days ago • 20
hanspeterlyngsoeraaschoujensen/Qwen3_0.6B-8-layer-fineweb_edu-10M-context_2048 Updated 7 days ago • 32
hanspeterlyngsoeraaschoujensen/Reasoning_Data_25K_DeepScaleR_1.5B_Preview Viewer • Updated Jun 18 • 25.2k • 14
hanspeterlyngsoeraaschoujensen/reasoning_data_DeepScaleR_1.5B_Preview Viewer • Updated Jun 5 • 5.18k • 9
hanspeterlyngsoeraaschoujensen/10K_open_r1_OpenR1_Math_220k_synthetic_dataset Preview • Updated May 12 • 8
The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters Running • 2.85k
Article • Model2Vec: Distill a Small Fast Model from any Sentence Transformer By Pringled and 1 other • Oct 14, 2024 • 95