Regularized Self-Play

community

models 44

RegularizedSelfPlay/sppo_reversekl-0.1-Gemma-2-2B-IT-RSPO-Iter3

Text Generation • 3B • Updated Aug 11, 2025 • 1

RegularizedSelfPlay/sppo_reversekl-0.1-Gemma-2-2B-IT-RSPO-Iter2

Text Generation • 3B • Updated Aug 11, 2025 • 1

RegularizedSelfPlay/sppo_reversekl-0.1-Gemma-2-2B-IT-RSPO-Iter1

Text Generation • 3B • Updated Aug 11, 2025

RegularizedSelfPlay/Gemma-2-2B-SPPO-It-Iter1

Text Generation • 3B • Updated Aug 11, 2025 • 4

RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter2-gp-8b-gpm-reg0.5-sppo-reversekl-table

Text Generation • 8B • Updated Jul 30, 2025

RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter2-gp-8b-gpm-reg0.05-sppo-reversekl-table

Text Generation • 8B • Updated Jul 30, 2025

RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter1-gp-8b-gpm-reg0.05-sppo-reversekl-table

Text Generation • 8B • Updated Jul 30, 2025

RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter2-gp-8b-gpm-reg0.1-sppo-forwardimportance10-table

Text Generation • 8B • Updated Jul 30, 2025 • 1

RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter3-gp-8b-gpm-reg0.5-sppo-reversekl-table

Text Generation • 8B • Updated Jul 30, 2025

RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter1-gp-8b-gpm-reg0.5-sppo-reversekl-table

Text Generation • 8B • Updated Jul 30, 2025

datasets 0
