Microsoft's rStar-Math paper claims that 🤏 ~7B models can match the math skills of o1 using clever train- and test-time techniques. You can now download their prompt templates from Hugging Face!

📄 The paper introduces rStar-Math, which claims to rival OpenAI o1's math reasoning capabilities by integrating Monte Carlo Tree Search (MCTS) with step-by-step verified reasoning trajectories.
🤖 A Process Preference Model (PPM) enables fine-grained evaluation of intermediate steps, improving training data quality.
🧪 The system underwent four rounds of self-evolution, progressively refining both the policy and reward models to tackle Olympiad-level math problems, without GPT-4-based data distillation.
💾 While we wait for the release of code and datasets, you can already download the prompts they used from the HF Hub!

Details and links here 👇
Prompt-templates docs: https://moritzlaurer.github.io/prompt_templates/
Templates on the Hub: MoritzLaurer/rstar-math-prompts
Prompt-templates collection: MoritzLaurer/prompt-templates-6776aa0b0b8a923957920bb4
Paper: https://arxiv.org/pdf/2501.04519
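If you want to pull the templates programmatically, here is a minimal sketch using the `huggingface_hub` client (plus `pyyaml`). The individual filenames inside the repo aren't listed in this post, so the snippet lists the repo's files first; the filename in the download call is hypothetical and should be replaced with one from that listing.

```python
# Minimal sketch: fetch a prompt-template YAML from the Hugging Face Hub.
# Assumes `huggingface_hub` and `pyyaml` are installed; the filename passed
# to hf_hub_download below is hypothetical -- check the listing first.
import yaml
from huggingface_hub import hf_hub_download, list_repo_files

repo_id = "MoritzLaurer/rstar-math-prompts"

# Discover which template files the repo actually contains.
for filename in list_repo_files(repo_id):
    print(filename)

# Download one template (swap in a real filename from the listing above).
path = hf_hub_download(repo_id=repo_id, filename="example_prompt.yaml")
with open(path, encoding="utf-8") as fh:
    template = yaml.safe_load(fh)
print(template)
```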
FACTS is a great paper from @GoogleDeepMind on measuring the factuality of LLM outputs. You can now download their prompt templates from @huggingface to improve LLM-based fact-checking yourself!
📄 The paper introduces the FACTS Grounding benchmark for evaluating the factuality of LLM outputs.
🤖 Fact-checking is automated by an ensemble of LLM judges that verify if a response is fully grounded in a factual reference document.
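To make the ensemble idea concrete, here is a toy sketch, not the paper's implementation: each judge model is asked whether the response is fully grounded in the reference document, and the verdicts are combined by majority vote. `call_llm` is a hypothetical stand-in for whatever inference API you use, and the judge prompt wording is illustrative, not FACTS' actual template.

```python
# Toy sketch of grounding verification with an ensemble of LLM judges.
# NOT the FACTS implementation: `call_llm` is a hypothetical stand-in and
# the prompt wording is illustrative only.
JUDGE_PROMPT = (
    "Reference document:\n{document}\n\n"
    "Response to check:\n{response}\n\n"
    "Is every claim in the response fully supported by the reference "
    "document? Answer with exactly one word: GROUNDED or UNGROUNDED."
)

def call_llm(model: str, prompt: str) -> str:
    """Hypothetical inference call; replace with your provider's client."""
    raise NotImplementedError

def is_grounded(document: str, response: str, judge_models: list[str]) -> bool:
    prompt = JUDGE_PROMPT.format(document=document, response=response)
    verdicts = [
        call_llm(model, prompt).strip().upper().startswith("GROUNDED")
        for model in judge_models
    ]
    # Majority vote across the judge ensemble.
    return sum(verdicts) > len(verdicts) / 2
```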
🧪 The authors tested different prompt templates on held-out data to verify that they generalize.
📚 It's highly educational to read these templates to learn how frontier labs design prompts and to understand their limitations.
💾 You can now download and reuse these prompt templates via the prompt-templates library!
📦 The library simplifies sharing prompt templates on the HF Hub or locally via standardized YAML files. Let's make LLM work more transparent and reproducible by sharing more templates like this!
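As a rough illustration of the local-YAML workflow: the sketch below writes a template to a YAML file and loads it back. The field names (`prompt`, `input_variables`, `metadata`) are assumptions made for this example, not the library's documented schema; see the prompt-templates docs linked above for the real format.

```python
# Rough sketch of sharing a prompt template as a local YAML file.
# The field names below are assumed for illustration; the prompt-templates
# library defines the actual standardized schema (see its docs).
import yaml

template = {
    "prompt": "Check whether the response is grounded in {document}: {response}",
    "input_variables": ["document", "response"],
    "metadata": {"source": "illustrative example"},
}

with open("grounding_template.yaml", "w", encoding="utf-8") as fh:
    yaml.safe_dump(template, fh, sort_keys=False)

with open("grounding_template.yaml", encoding="utf-8") as fh:
    loaded = yaml.safe_load(fh)

print(loaded["prompt"].format(document="<doc>", response="<resp>"))
```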