Article • Timm ❤️ Transformers: Use any timm model with transformers • 3 days ago • 25
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Paper • 2501.06282 • Published 8 days ago • 32
$\text{Transformer}^2$: Self-adaptive LLMs Paper • 2501.06252 • Published 9 days ago • 46 • 6
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Paper • 2501.06186 • Published 8 days ago • 55
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published 10 days ago • 230
Scaling Laws for Floating Point Quantization Training Paper • 2501.02423 • Published 13 days ago • 25
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models Paper • 2501.03262 • Published 14 days ago • 82
Article • 🐺🐦‍⬛ LLM Comparison/Test: DeepSeek-V3, QVQ-72B-Preview, Falcon3 10B, Llama 3.3 70B, Nemotron 70B in my updated MMLU-Pro CS benchmark • By wolfram • 16 days ago • 37
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment Paper • 2412.19326 • Published 23 days ago • 18
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search Paper • 2412.18319 • Published 25 days ago • 37