MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published 4 days ago • 258
MiniCPM Collection The MiniCPM family of LLMs and VLLMs. • 31 items • Updated Oct 22, 2024 • 59
Agent Laboratory: Using LLM Agents as Research Assistants Paper • 2501.04227 • Published 10 days ago • 77
Agentless: Demystifying LLM-based Software Engineering Agents Paper • 2407.01489 • Published Jul 1, 2024 • 57
Article 🐺🐦‍⬛ LLM Comparison/Test: 25 SOTA LLMs (including QwQ) through 59 MMLU-Pro CS benchmark runs By wolfram • Dec 4, 2024 • 76
Quantifying the Carbon Emissions of Machine Learning Paper • 1910.09700 • Published Oct 21, 2019 • 13
KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model Paper • 2501.01028 • Published 16 days ago • 11
Deepseek V3 (All Versions) Collection Deepseek V3 - available in bf16, original, and GGUF formats, with support for 2, 3, 4, 5, 6 and 8-bit quantized versions. • 3 items • Updated 6 days ago • 24
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published Dec 9, 2024 • 74
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5, 2024 • 78
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models Paper • 2309.12284 • Published Sep 21, 2023 • 19
Skywork-o1-Open Collection Skywork o1 open model collections • 3 items • Updated Nov 27, 2024 • 20
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data Paper • 2405.14333 • Published May 23, 2024 • 37
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding Paper • 2412.10302 • Published Dec 13, 2024 • 11