RegularizedSelfPlay / Llama-3-8B-Instruct-SPPO-Iter3-gp-8b-gpm-reg0.05-sppo-reversekl-table

Text Generation

text-generation-inference

Llama-3-8B-Instruct-SPPO-Iter3-gp-8b-gpm-reg0.05-sppo-reversekl-table / model-00002-of-00007.safetensors

Commit History

Upload LlamaForCausalLM

86963ec
verified

timxiaohangt committed on Jul 30