RegularizedSelfPlay/sppo_reversekl-0.1-Gemma-2-2B-IT-RSPO-Iter3
Text Generation • 3B • 7
RegularizedSelfPlay/sppo_reversekl-0.1-Gemma-2-2B-IT-RSPO-Iter2
Text Generation • 3B • 6
RegularizedSelfPlay/sppo_reversekl-0.1-Gemma-2-2B-IT-RSPO-Iter1
Text Generation • 3B • 5
RegularizedSelfPlay/Gemma-2-2B-SPPO-It-Iter1
Text Generation • 3B • 9
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter2-gp-8b-gpm-reg0.5-sppo-reversekl-table
Text Generation • 8B • 7
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter2-gp-8b-gpm-reg0.05-sppo-reversekl-table
Text Generation • 8B • 7
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter1-gp-8b-gpm-reg0.05-sppo-reversekl-table
Text Generation • 8B • 6
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter2-gp-8b-gpm-reg0.1-sppo-forwardimportance10-table
Text Generation • 8B • 3
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter3-gp-8b-gpm-reg0.5-sppo-reversekl-table
Text Generation • 8B • 6
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter1-gp-8b-gpm-reg0.5-sppo-reversekl-table
Text Generation • 8B • 7
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter3-gp-8b-gpm-table
Text Generation • 8B • 7
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter2-gp-8b-gpm-table
Text Generation • 8B • 8
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter3-gp-8b-gpm-reg0.1-sppo-forwardimportance10-table
Text Generation • 8B • 6
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter3-gp-8b-gpm-reg0.05-sppo-reversekl-table
Text Generation • 8B • 6
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter1-gp-8b-gpm-reg0.1-sppo-forwardimportance10-table
Text Generation • 8B • 4
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter1-gp-8b-gpm-table
Text Generation • 8B • 6
RegularizedSelfPlay/Llama-3-8B-Instruct-GPM-8B-SPPO-Iter1
RegularizedSelfPlay/Mistral-7B-Instruct-GPM-8B-SPPO-Iter1
Text Generation • 8B • 6
RegularizedSelfPlay/Mistral-7B-Instruct-GPM-SPPO-Iter2
Text Generation • 7B • 7
RegularizedSelfPlay/Mistral-7B-Instruct-GPM-SPPO-Iter1
Text Generation • 7B • 3
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter3
8B • 6
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter2
8B • 5
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter1
8B • 4
RegularizedSelfPlay/sppo_reversekl-0.5-Llama-3-8B-Instruct-RSPO-Iter3
8B • 5
RegularizedSelfPlay/sppo_reversekl-0.5-Llama-3-8B-Instruct-RSPO-Iter2
8B • 5
RegularizedSelfPlay/sppo_reversekl-0.5-Llama-3-8B-Instruct-RSPO-Iter1
8B • 5
RegularizedSelfPlay/sppo_forward1reverse5-0.1-Llama-3-8B-Instruct-RSPO-Iter1
RegularizedSelfPlay/sppo_reverseklnoent-0.5-PromptABC-Mistral-7B-Instruct-SPPO-Iter2
7B • 4
RegularizedSelfPlay/sppo_reverseklnoent-0.5-PromptABC-Mistral-7B-Instruct-SPPO-Iter3
7B • 5
RegularizedSelfPlay/sppo_reverseklnoent-0.5-PromptABC-Mistral-7B-Instruct-SPPO-Iter1
7B • 5