AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-RPO-iterDPO-iter2-4k Text Generation • 0.0B • Updated 5 days ago • 17 • 1
AmberYifan/Qwen2.5-7B-Instruct-wildfeedback-iterDPO-iter2-4k Text Generation • 0.0B • Updated 5 days ago • 15 • 1
AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-RPO-iterDPO-iter1-4k Text Generation • 0.0B • Updated 7 days ago • 29 • 1
AmberYifan/Qwen2.5-7B-Instruct-wildfeedback-iterDPO-iter1-4k Text Generation • 0.0B • Updated 7 days ago • 23
AmberYifan/Qwen2.5-14B-Instruct-ultrafeedback-DRIFT-iter2-RPO Text Generation • 0.0B • Updated 8 days ago • 15
AmberYifan/Qwen2.5-14B-Instruct-ultrafeedback-spin-iter2-RPO Text Generation • 0.0B • Updated 8 days ago • 15
AmberYifan/Qwen2.5-14B-Instruct-ultrafeedback-iterdpo-iter2-RPO Text Generation • 0.0B • Updated 8 days ago • 17 • 1
AmberYifan/Qwen2.5-14B-Instruct-ultrafeedback-iterdpo-iter1-RPO Text Generation • 0.0B • Updated 9 days ago • 26
AmberYifan/Qwen2.5-14B-Instruct-ultrafeedback-spin-iter1-RPO Text Generation • 0.0B • Updated 9 days ago • 25 • 1
AmberYifan/Qwen2.5-14B-Instruct-ultrafeedback-drift-iter1-RPO Text Generation • 0.0B • Updated 10 days ago • 20
AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-RPO-DRIFT-iter2-4k Text Generation • 0.0B • Updated 10 days ago • 12 • 1
AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-RPO-DRIFT-iter1-4k Text Generation • 0.0B • Updated 11 days ago • 22 • 1
AmberYifan/Qwen2.5-7B-Instruct-wildfeedback-DRIFT-iter2-RPO Text Generation • 0.0B • Updated 13 days ago • 7
AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-RPO-DRIFT-iter2 Text Generation • 0.0B • Updated 16 days ago • 23
AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-RPO-iterDPO-iter2 Text Generation • 0.0B • Updated 16 days ago • 17
AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-RPO-SPIN-iter2 Text Generation • 0.0B • Updated 16 days ago • 17
AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-RPO-iterDPO-iter1 Text Generation • 0.0B • Updated 17 days ago • 31 • 1
AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-RPO-DRIFT-iter1 Text Generation • 0.0B • Updated 17 days ago • 32 • 1
AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-RPO-SPIN-iter1 Text Generation • 0.0B • Updated 17 days ago • 35 • 1
AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-RPO-DRIFT-iter1 Text Generation • 0.0B • Updated 18 days ago • 6
AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-RPO-iterDPO-iter1 Text Generation • 0.0B • Updated 18 days ago • 6
AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-RPO-SPIN-iter1 Text Generation • 0.0B • Updated 18 days ago • 6
AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-seed-RPO-0.001 Text Generation • 0.0B • Updated 18 days ago • 52
AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-seed-RPO-0.001 Text Generation • 0.0B • Updated 18 days ago • 45
AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-seed-RPO-0.005 Text Generation • 0.0B • Updated 19 days ago • 10
AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-seed-RPO-1.5 Text Generation • 0.0B • Updated 19 days ago • 9
AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-seed-RPO-0.01 Text Generation • 0.0B • Updated 19 days ago • 6
AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-seed-RPO-0.1 Text Generation • 0.0B • Updated 19 days ago • 10
AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-seed-RPO-0.5 Text Generation • 0.0B • Updated 21 days ago • 7