Towards General-Purpose Model-Free Reinforcement Learning Paper • 2501.16142 • Published 3 days ago • 19
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding Paper • 2501.13200 • Published 8 days ago • 60
Post: Pre-training on only a single RTX 4090 is really slow, even for small language models!!! (JingzeShi/doge-slm-677fd879f8c4fd0f43e05458)
Control LLM: Controlled Evolution for Intelligence Retention in LLM Paper • 2501.10979 • Published 11 days ago • 4
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning Paper • 2501.12570 • Published 9 days ago • 20
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published 8 days ago • 267
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training Paper • 2501.11425 • Published 10 days ago • 84
MangaNinja: Line Art Colorization with Precise Reference Following Paper • 2501.08332 • Published 16 days ago • 55
ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning Paper • 2501.06590 • Published 19 days ago • 8