kz919/DeepSeek-R1-Distill-Qwen-1.5B-GRPO-Cautious-TRL-0.18.0.dev

Text Generation

Generated from Trainer

text-generation-inference

DeepSeek-R1-Distill-Qwen-1.5B-GRPO-Cautious-TRL-0.18.0.dev / trainer_state.json

Model save

37a04ea verified 5 days ago

3.3 MB

File too large to display; see the raw version instead.