An RL fine-tune of PrimeIntellect/Qwen3-1.7B-Wordle-SFT. Details here.