RegularizedSelfPlay/sppo_reversekl-0.1-Gemma-2-2B-IT-RSPO-Iter3 Text Generation • 3B • Updated Aug 11 • 3
RegularizedSelfPlay/sppo_reversekl-0.1-Gemma-2-2B-IT-RSPO-Iter2 Text Generation • 3B • Updated Aug 11 • 3
RegularizedSelfPlay/sppo_reversekl-0.1-Gemma-2-2B-IT-RSPO-Iter1 Text Generation • 3B • Updated Aug 11 • 3
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter2-gp-8b-gpm-reg0.5-sppo-reversekl-table Text Generation • 8B • Updated Jul 30 • 5
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter2-gp-8b-gpm-reg0.05-sppo-reversekl-table Text Generation • 8B • Updated Jul 30 • 4
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter1-gp-8b-gpm-reg0.05-sppo-reversekl-table Text Generation • 8B • Updated Jul 30 • 5
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter2-gp-8b-gpm-reg0.1-sppo-forwardimportance10-table Text Generation • 8B • Updated Jul 30 • 5
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter3-gp-8b-gpm-reg0.5-sppo-reversekl-table Text Generation • 8B • Updated Jul 30 • 6
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter1-gp-8b-gpm-reg0.5-sppo-reversekl-table Text Generation • 8B • Updated Jul 30 • 4
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter3-gp-8b-gpm-table Text Generation • 8B • Updated Jul 30 • 5
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter2-gp-8b-gpm-table Text Generation • 8B • Updated Jul 30 • 5
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter3-gp-8b-gpm-reg0.1-sppo-forwardimportance10-table Text Generation • 8B • Updated Jul 30 • 10
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter3-gp-8b-gpm-reg0.05-sppo-reversekl-table Text Generation • 8B • Updated Jul 30 • 5
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter1-gp-8b-gpm-reg0.1-sppo-forwardimportance10-table Text Generation • 8B • Updated Jul 30 • 5
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter1-gp-8b-gpm-table Text Generation • 8B • Updated Jul 30 • 4
RegularizedSelfPlay/sppo_reverseklnoent-0.5-PromptABC-Mistral-7B-Instruct-SPPO-Iter2 7B • Updated Mar 27 • 2
RegularizedSelfPlay/sppo_reverseklnoent-0.5-PromptABC-Mistral-7B-Instruct-SPPO-Iter3 7B • Updated Mar 27 • 2
RegularizedSelfPlay/sppo_reverseklnoent-0.5-PromptABC-Mistral-7B-Instruct-SPPO-Iter1 7B • Updated Mar 27 • 2