Model Summary

Qwen-VL-PRM-3B is a vision-language process reward model finetuned from Qwen2.5-VL-3B-Instruct on approximately 300,000 examples. Despite being trained mainly on abstract and elementary reasoning datasets, it delivers strong test-time scaling gains across advanced multimodal reasoning benchmarks when paired with Qwen2.5-VL and Gemma-3 models.
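
As background for how the PRM is used at test time: the model assigns a reward to each intermediate reasoning step, and test-time scaling selects among several sampled candidate solutions using those rewards (e.g., best-of-N). The sketch below is illustrative only; the mean-of-step-scores aggregation and the stand-in scorer are assumptions, not the paper's exact recipe.

```python
# Illustrative best-of-N selection driven by step-level rewards.
# A real run would sample candidates from a policy model (e.g. Qwen2.5-VL or
# Gemma-3) and obtain step scores from the PRM; here a toy scorer stands in.
from statistics import mean

def select_best(candidates, score_step):
    """candidates: list of reasoning chains (each a list of step strings).
    score_step: callable mapping one step to a reward, higher = better."""
    scored = [(mean(score_step(s) for s in chain), chain) for chain in candidates]
    return max(scored, key=lambda pair: pair[0])

# Toy usage with a hard-coded stand-in scorer.
candidates = [
    ["Count the triangles.", "There are 7 triangles.", "Answer: 7"],
    ["Count the triangles.", "There are 9 triangles.", "Answer: 9"],
]
toy_scores = {"There are 7 triangles.": 0.9, "There are 9 triangles.": 0.3}
best_score, best_chain = select_best(candidates, lambda s: toy_scores.get(s, 0.8))
print(best_score, best_chain)
```

The mean over step scores is just one common aggregation choice; the minimum or last-step score are frequently used alternatives.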

Use

The model usage is documented here.
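
In addition to the linked documentation, a minimal loading-and-scoring sketch is shown below. It assumes the checkpoint loads as a standard Qwen2.5-VL model in transformers and that a step is judged by comparing the probabilities of hypothetical "+" / "-" judgment tokens; the actual prompt template and scoring convention are defined in the linked documentation and may differ.

```python
# A minimal sketch, NOT the documented interface: the prompt wording and the
# "+"/"-" judgment convention below are assumptions for illustration only.
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "ob11/Qwen-VL-PRM-3B"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def score_step(image: Image.Image, question: str, step: str) -> float:
    """Return an assumed P('+') that the given reasoning step is correct."""
    messages = [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text",
             "text": f"Question: {question}\nStep: {step}\n"
                     "Is this step correct? Answer + or -."},
        ],
    }]
    prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]
    plus_id = processor.tokenizer.encode("+", add_special_tokens=False)[0]
    minus_id = processor.tokenizer.encode("-", add_special_tokens=False)[0]
    probs = torch.softmax(next_token_logits[[plus_id, minus_id]], dim=-1)
    return probs[0].item()

# Example: score one candidate step for an image question.
img = Image.open("example.png")
print(score_step(img, "How many triangles are in the figure?", "There are 7 triangles."))
```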

Evaluation

In all tables below, Overall is the unweighted mean of the five benchmark scores, numbers in parentheses are absolute gains over the corresponding base model, and "--" marks scores that are not reported.

Commercial Models

| Model | MMMU | PuzzleVQA | AlgoPuzzleVQA | MathVista | MathVision | Overall |
|-------|------|-----------|---------------|-----------|------------|---------|
| GPT-4o | 70.7 | 60.0 | 57.8 | 30.9 | 31.2 | 50.1 |
| o1 | 78.2 | 78.9 | 54.4 | 73.9 | 60.3 | 69.1 |
| o3 | 82.9 | 84.1 | 62.3 | 86.8 | -- | -- |

Qwen-2.5-VL Family

| Model | MMMU | PuzzleVQA | AlgoPuzzleVQA | MathVista | MathVision | Overall |
|-------|------|-----------|---------------|-----------|------------|---------|
| Qwen-2.5-VL-3B | 51.7 | 34.5 | 25.7 | 60.0 | 21.2 | 38.6 |
| + VL-PRM-7B | 53.7 (+2.0) | 44.9 (+10.5) | 28.3 (+2.6) | 64.1 (+4.1) | 21.8 (+0.6) | 42.6 (+4.0) |
| Qwen-2.5-VL-7B | 55.0 | 48.0 | 29.1 | 67.8 | 24.2 | 44.8 |
| + VL-PRM-3B | 57.6 (+2.6) | 55.5 (+7.5) | 33.8 (+4.7) | 70.0 (+2.2) | 26.1 (+1.9) | 48.6 (+3.6) |
| + VL-PRM-7B | 57.4 (+2.4) | 54.8 (+6.8) | 35.3 (+6.2) | 71.0 (+3.2) | 26.2 (+2.0) | 48.9 (+4.1) |
| Qwen-2.5-VL-32B | 66.0 | 46.2 | 26.9 | 76.9 | 36.7 | 50.5 |
| + VL-PRM-3B | 67.0 (+1.0) | 67.1 (+20.8) | 41.6 (+14.7) | 77.7 (+0.8) | 40.5 (+3.8) | 58.7 (+8.2) |
| + VL-PRM-7B | 67.6 (+1.6) | 66.8 (+20.6) | 44.2 (+17.3) | 78.3 (+1.4) | 40.1 (+3.2) | 59.4 (+8.9) |

Gemma-3 Family

| Model | MMMU | PuzzleVQA | AlgoPuzzleVQA | MathVista | MathVision | Overall |
|-------|------|-----------|---------------|-----------|------------|---------|
| Gemma-3-12B | 57.6 | 45.0 | 29.1 | 58.9 | 28.1 | 43.7 |
| + VL-PRM-3B | 60.4 (+2.8) | 57.7 (+12.7) | 39.7 (+10.6) | 60.3 (+1.4) | 33.8 (+5.7) | 50.4 (+6.7) |
| + VL-PRM-7B | 60.2 (+2.6) | 59.0 (+14.0) | 41.1 (+12.0) | 63.3 (+4.4) | 33.9 (+5.8) | 51.5 (+7.8) |
| Gemma-3-27B | 62.9 | 50.8 | 29.9 | 61.6 | 32.4 | 47.5 |
| + VL-PRM-3B | 65.5 (+2.6) | 67.4 (+16.6) | 40.3 (+10.4) | 65.4 (+3.8) | 39.8 (+7.4) | 55.7 (+8.2) |
| + VL-PRM-7B | 64.5 (+1.6) | 67.6 (+16.8) | 41.1 (+11.2) | 65.2 (+3.6) | 40.9 (+8.5) | 55.9 (+8.4) |
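
For clarity, the derived columns can be reproduced from the per-benchmark scores. The snippet below assumes Overall is the plain mean of the five benchmarks (which matches the rows above) and uses the Gemma-3-27B / + VL-PRM-3B row as an example.

```python
# Reproduce the derived columns: Overall = mean of the five benchmark scores,
# and the parenthesised gains = score minus the corresponding base-model score.
base = {"MMMU": 62.9, "PuzzleVQA": 50.8, "AlgoPuzzleVQA": 29.9,
        "MathVista": 61.6, "MathVision": 32.4}            # Gemma-3-27B
with_prm = {"MMMU": 65.5, "PuzzleVQA": 67.4, "AlgoPuzzleVQA": 40.3,
            "MathVista": 65.4, "MathVision": 39.8}        # + VL-PRM-3B

overall = sum(with_prm.values()) / len(with_prm)           # 55.68 -> 55.7
gains = {k: round(with_prm[k] - base[k], 1) for k in base} # e.g. MMMU: +2.6
print(round(overall, 1), gains)
```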

Framework versions

  • TRL: 0.19.1
  • Transformers: 4.55.3
  • PyTorch: 2.7.1
  • Datasets: 3.0.1
  • Tokenizers: 0.21.4

Citations

@misc{ong2025vlprms,
      title={Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned}, 
      author={Brandon Ong and Tej Deep Pala and Vernon Toh and William Chandra Tjhi and Soujanya Poria},
      year={2025},
      eprint={2509.23250},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/pdf/2509.23250}, 
}