---
license: mit
datasets:
- PrimeIntellect/Reverse-Text-RL
language:
- en
base_model:
- Qwen/Qwen3-0.6B
- PrimeIntellect/Qwen3-0.6B-Reverse-Text-SFT
---
# Reverse Text Model Qwen3-0.6B
A simple model that was RL fine-tuned for 20 steps/epochs after SFT to reverse text, using [prime-rl](https://github.com/PrimeIntellect-ai/prime-rl/) for RL training and [reverse-text](https://github.com/PrimeIntellect-ai/prime-environments/tree/main/environments/reverse_text) as the RL environment. A minimal inference sketch is shown below, followed by the improvement in results.
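A minimal inference sketch using the standard `transformers` chat API. The repo id below is a hypothetical placeholder for this model card's actual id, and the generation settings are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; substitute this model card's actual id.
model_id = "PrimeIntellect/Qwen3-0.6B-Reverse-Text-RL"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# System prompt matches the task format used by the reverse-text environment.
messages = [
    {"role": "system", "content": "Reverse the text character-by-character. Put your answer in <reversed_text> tags."},
    {"role": "user", "content": "The community in Bruck was merged into it"},
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```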
## Comparison with SFT (base) model
The reward (correctness score) distribution has improved for the RL fine-tuned model across all rollouts.

At the instance level, comparing the best scores across rollouts, we see a mean improvement of 3.73%, with a maximum improvement of ~30% and a worst-case regression of ~3%.

## Example Prompt & Reward
**Task:** `reverse-text`
**Prompt:**
- **System:**
“Reverse the text character-by-character. Put your answer in `<reversed_text>` tags.”
- **User:**
“The community in Bruck was merged into it”
**Expected Completion:**
```text
<reversed_text>
.ti otni degrem saw kcuBr ni ytinummoc ehT
</reversed_text>
```
**Expected Reward:**
0.963855421686747
Note: The reward is based on the longest common subsequence (LCS).
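As a sketch of how such a reward could be computed (the exact scoring in the reverse-text environment may differ), a length-normalized LCS similarity, `2 * LCS / (len(pred) + len(target))`, reproduces the reward above for this example:

```python
def lcs_len(a: str, b: str) -> int:
    # Space-optimized dynamic programming for the longest common subsequence.
    dp = [0] * (len(b) + 1)
    for ca in a:
        prev = 0  # dp value from the previous row, previous column
        for j, cb in enumerate(b, 1):
            cur = dp[j]
            dp[j] = prev + 1 if ca == cb else max(dp[j], dp[j - 1])
            prev = cur
    return dp[-1]

def lcs_reward(pred: str, target: str) -> float:
    # Normalized LCS similarity in [0, 1]; 1.0 means an exact match.
    return 2 * lcs_len(pred, target) / (len(pred) + len(target))

target = "The community in Bruck was merged into it"[::-1]
pred = ".ti otni degrem saw kcuBr ni ytinummoc ehT"  # completion from the example
print(lcs_reward(pred, target))  # ~0.9639, matching the expected reward above
```

The transposed characters in `kcuBr` and the extra leading period are why the example's reward falls slightly below 1.0.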