UltraIF series Collection Open-Sourced model and data for ULTRAIF: Advancing Instruction Following from the Wild. • 6 items • Updated Apr 3 • 3
Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding Paper • 2506.07434 • Published 5 days ago • 7
TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenarios Paper • 2505.12891 • Published 26 days ago • 2
MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling Paper • 2410.13610 • Published Oct 17, 2024
Odysseus Navigates the Sirens' Song: Dynamic Focus Decoding for Factual and Diverse Open-Ended Text Generation Paper • 2503.08057 • Published Mar 11
FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation Paper • 2503.06680 • Published Mar 9 • 20 • 7