Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't Paper • 2503.16219 • Published 6 days ago
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning Paper • 2503.07572 • Published 16 days ago
LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization Paper • 2502.13922 • Published Feb 19
Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach Paper • 2502.03639 • Published Feb 5
Post: Training a model to reason in the continuous latent space, based on Meta's Coconut. If it all works, I will apply it to the MiniCPM-o SVD-LR. The endgame is a multimodal, adaptive, and efficient foundational on-device AI model.