MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence Paper • 2405.15593 • Published May 24, 2024 • 1
SVD-Free Low-Rank Adaptive Gradient Optimization for Large Language Models Paper • 2505.17967 • Published May 23 • 17
MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering Paper • 2505.07782 • Published May 12 • 17
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models Paper • 2504.10449 • Published Apr 14 • 12
Language Models Prefer What They Know: Relative Confidence Estimation via Confidence Preferences Paper • 2502.01126 • Published Feb 3 • 4
RedPajama: an Open Dataset for Training Large Language Models Paper • 2411.12372 • Published Nov 19, 2024 • 56
Feedback-Based Self-Learning in Large-Scale Conversational AI Agents Paper • 1911.02557 • Published Nov 6, 2019
A Vocabulary-Free Multilingual Neural Tokenizer for End-to-End Task Learning Paper • 2204.10815 • Published Apr 22, 2022
Self-Aware Feedback-Based Self-Learning in Large-Scale Conversational AI Paper • 2205.00029 • Published Apr 29, 2022
Training-Free Activation Sparsity in Large Language Models Paper • 2408.14690 • Published Aug 26, 2024