Advantage-Guided Distillation for Preference Alignment in Small Language Models Paper • 2502.17927 • Published Feb 25 • 1
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning Paper • 2505.17667 • Published May 23 • 89
QwenLong-CPRS: Towards $\infty$-LLMs with Dynamic Context Optimization Paper • 2505.18092 • Published May 23 • 44
FuseChat-3.0: Preference Optimization Meets Heterogeneous Model Fusion Paper • 2503.04222 • Published Mar 6 • 15
BlockPruner: Fine-grained Pruning for Large Language Models Paper • 2406.10594 • Published Jun 15, 2024 • 1
Weighted-Reward Preference Optimization for Implicit Model Fusion Paper • 2412.03187 • Published Dec 4, 2024 • 12
Mitigating Hallucinations of Large Language Models via Knowledge Consistent Alignment Paper • 2401.10768 • Published Jan 19, 2024 • 2
Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration Paper • 2310.09168 • Published Oct 13, 2023 • 2
Multi-Grained Knowledge Retrieval for End-to-End Task-Oriented Dialog Paper • 2305.10149 • Published May 17, 2023 • 2