Nazzaroth2's Collections

RL_Papers in general

updated 2 days ago

  • Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning

    Paper • 2504.08672 • Published Apr 11 • 55

  • A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis

    Paper • 2504.12322 • Published Apr 11 • 28

  • Learning to Reason under Off-Policy Guidance

    Paper • 2504.14945 • Published Apr 21 • 84

  • TTRL: Test-Time Reinforcement Learning

    Paper • 2504.16084 • Published about 1 month ago • 107

  • Absolute Zero: Reinforced Self-play Reasoning with Zero Data

    Paper • 2505.03335 • Published 17 days ago • 154

  • Reasoning Models Better Express Their Confidence

    Paper • 2505.14489 • Published 3 days ago • 16