Reverse Text Model Qwen3-0.6B
A simple model that was RL fine-tuned for 20 steps/epochs after SFT to reverse text, using prime-rl (RL training) and reverse-text (RL environment). The results below show the improvement.
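To try the model, here is a minimal inference sketch using the transformers library; the chat-template call and generation settings are assumptions, not details taken from this card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sameersegal/Qwen3-0.6B-Reverse-Text-SFT-RLFT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "Reverse the text character-by-character. Put your answer in <reversed_text> tags."},
    {"role": "user", "content": "The community in Bruck was merged into it"},
]

# Build the prompt with the model's chat template and generate.
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=128)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```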
Comparison with SFT (base) model
The reward (correctness score) distribution has improved for the RLFT model across all rollouts.
At the instance level, comparing the best scores across rollouts shows a mean improvement of 3.73%, with a maximum gain of ~30% and a worst-case regression of ~3% (see the sketch below).
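As an illustration of how that per-instance, best-of-rollouts comparison can be computed (the reward arrays here are hypothetical placeholders, not the actual evaluation data):

```python
import numpy as np

# Hypothetical reward matrices of shape (num_instances, num_rollouts):
# one row per prompt, one column per sampled rollout.
sft_rewards = np.random.rand(100, 8)
rlft_rewards = np.random.rand(100, 8)

# Best score per instance across rollouts, then the per-instance delta.
delta = rlft_rewards.max(axis=1) - sft_rewards.max(axis=1)
print(f"mean: {delta.mean():+.2%}  max: {delta.max():+.2%}  min: {delta.min():+.2%}")
```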
Example Prompt & Reward
Task: reverse-text
Prompt:
- System: “Reverse the text character-by-character. Put your answer in <reversed_text> tags.”
- User: “The community in Bruck was merged into it”
Expected Completion:
<reversed_text>
.ti otni degrem saw kcuBr ni ytinummoc ehT
</reversed_text>
Expected Reward: 0.963855421686747
Note: The reward is based on the longest common subsequence.
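The exact scoring function isn't given here; one plausible sketch (an assumption, not the confirmed reverse-text implementation) is a symmetric LCS ratio, 2 · LCS / (len(completion) + len(target)), which awards partial credit for near-perfect reversals:

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence via dynamic programming."""
    dp = [0] * (len(b) + 1)
    for ch in a:
        prev = 0  # holds dp[j-1] from the previous row
        for j, bj in enumerate(b, start=1):
            cur = dp[j]
            dp[j] = prev + 1 if ch == bj else max(dp[j], dp[j - 1])
            prev = cur
    return dp[-1]


def lcs_reward(completion: str, target: str) -> float:
    """Symmetric LCS ratio in [0, 1]; 1.0 only for an exact match."""
    if not completion and not target:
        return 1.0
    return 2 * lcs_length(completion, target) / (len(completion) + len(target))
```

Under this formulation, a completion with a few transposed characters, like the example above, scores just under 1.0.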
Model tree for sameersegal/Qwen3-0.6B-Reverse-Text-SFT-RLFT
- Base model: Qwen/Qwen3-0.6B-Base
- Finetuned from: PrimeIntellect/Qwen3-0.6B