✨ Archer2.0

🏹️ Reinforcement Learning for Enhanced Reasoning in LLMs 🎯

Links: Notion | Paper | GitHub | Model | Data | Zhihu (知乎)

Overview

Archer2.0 marks a significant evolution from its predecessor through the introduction of Asymmetric Importance Sampling Policy Optimization (ASPO), designed to overcome fundamental limitations of PPO-Clip. By mitigating entropy collapse and repetitive outputs and preventing premature convergence, ASPO enables more effective reinforcement learning for reasoning.
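
For illustration only, the sketch below contrasts the standard PPO-Clip surrogate with a generic asymmetric treatment of the clipped importance-sampling ratio. It is not the exact ASPO objective from the paper; the functions and the `eps_pos` / `eps_neg` hyperparameters are hypothetical and only show the general idea of handling positive- and negative-advantage tokens differently.

```python
# Illustrative sketch only: a generic asymmetric-clipping surrogate,
# NOT the exact ASPO objective described in the paper.
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Standard PPO-Clip surrogate (returned as a loss to minimize)."""
    ratio = torch.exp(logp_new - logp_old)  # importance-sampling ratio
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, min=1.0 - eps, max=1.0 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()

def asymmetric_clip_loss(logp_new, logp_old, advantages,
                         eps_pos=0.2, eps_neg=0.4):
    """Hypothetical asymmetric variant: a wider clip range for
    negative-advantage tokens so they keep contributing gradient."""
    ratio = torch.exp(logp_new - logp_old)
    eps = torch.where(advantages >= 0,
                      torch.full_like(ratio, eps_pos),
                      torch.full_like(ratio, eps_neg))
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, min=1.0 - eps, max=1.0 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```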

While our math-reasoning models are still training and have not yet converged, we have evaluated Archer2.0 on the LiveCodeBench v5 and v6 code benchmarks. The results are shown in the table below.

LCB v5 covers 2024.08.01–2025.02.01; LCB v6 covers 2025.02.01–2025.05.01.

| Method | LCB v5 avg@8 | LCB v5 pass@8 | LCB v6 avg@16 | LCB v6 pass@16 | Avg. |
|---|---|---|---|---|---|
| DeepSeek-R1-1.5B | 16.7 | 29.0 | 17.2 | 34.4 | 17.0 |
| DAPO | 26.0 | 40.5 | 27.6 | 43.5 | 26.8 |
| DeepCoder-1.5B | 23.3 | 39.1 | 22.6 | 42.0 | 23.0 |
| Nemotron-1.5B | 26.1 | 35.5 | 29.5 | 42.8 | 27.8 |
| Archer-Code-1.5B | 29.4 | 43.7 | 30.2 | 45.8 | 29.8 |
| Archer2.0-Code-1.5B-Preview | 31.5 | 47.0 | 30.5 | 46.0 | 31.0 |
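
For reference, here is a minimal sketch of how avg@k and pass@k can be computed from per-sample correctness, assuming the standard definitions; the actual LiveCodeBench harness may use a different estimator, and the example data is hypothetical.

```python
# Minimal sketch of avg@k / pass@k under the standard definitions; the
# evaluation harness used for the table above may differ (e.g. unbiased pass@k).
from typing import List

def avg_at_k(results: List[List[bool]]) -> float:
    """Average per-sample success rate over all k samples, then over problems."""
    return 100.0 * sum(sum(r) / len(r) for r in results) / len(results)

def pass_at_k(results: List[List[bool]]) -> float:
    """Percentage of problems solved by at least one of the k samples."""
    return 100.0 * sum(any(r) for r in results) / len(results)

# Example: 2 problems, k = 4 samples each (hypothetical data).
samples = [[True, False, False, True], [False, False, False, False]]
print(avg_at_k(samples), pass_at_k(samples))  # 25.0 50.0
```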