Rendering-Aware Reinforcement Learning for Vector Graphics Generation
Abstract
RLRF, a reinforcement learning method utilizing rendering feedback, enhances SVG generation in VLMs, improving accuracy and efficiency by comparing rendered SVGs to original images.
Scalable Vector Graphics (SVG) offer a powerful format for representing visual designs as interpretable code. Recent advances in vision-language models (VLMs) have enabled high-quality SVG generation by framing the problem as a code generation task and leveraging large-scale pretraining. VLMs are particularly suitable for this task as they capture both global semantics and fine-grained visual patterns, while transferring knowledge across vision, natural language, and code domains. However, existing VLM approaches often struggle to produce faithful and efficient SVGs because they never observe the rendered images during training. Although differentiable rendering for autoregressive SVG code generation remains unavailable, rendered outputs can still be compared to original inputs, enabling evaluative feedback suitable for reinforcement learning (RL). We introduce RLRF (Reinforcement Learning from Rendering Feedback), an RL method that enhances SVG generation in autoregressive VLMs by leveraging feedback from rendered SVG outputs. Given an input image, the model generates SVG roll-outs that are rendered and compared to the original image to compute a reward. This visual fidelity feedback guides the model toward producing more accurate, efficient, and semantically coherent SVGs. RLRF significantly outperforms supervised fine-tuning, addressing common failure modes and enabling precise, high-quality SVG generation with strong structural understanding and generalization.
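To make the feedback loop described above concrete, here is a minimal sketch of how a rendering-based reward could be computed: a candidate SVG roll-out is rasterized and scored against the original image. The choice of cairosvg as the renderer and pixel MSE as the fidelity score are illustrative assumptions only; the paper's actual reward design may use different renderers and metrics.

```python
# Minimal sketch of a rendering-based reward (assumptions: cairosvg as the
# renderer, pixel MSE as the fidelity score; the paper's actual reward
# design may differ).
import io

import cairosvg
import numpy as np
from PIL import Image


def rendering_reward(svg_code: str, target: Image.Image) -> float:
    """Render an SVG roll-out and score it against the original image."""
    try:
        png_bytes = cairosvg.svg2png(
            bytestring=svg_code.encode("utf-8"),
            output_width=target.width,
            output_height=target.height,
        )
    except Exception:
        return 0.0  # unparsable or unrenderable SVG gets the lowest reward

    # Composite onto white so transparent regions are not scored as black.
    rendered = Image.open(io.BytesIO(png_bytes)).convert("RGBA")
    background = Image.new("RGBA", rendered.size, "white")
    rendered = Image.alpha_composite(background, rendered).convert("RGB")

    a = np.asarray(rendered, dtype=np.float32) / 255.0
    b = np.asarray(target.convert("RGB"), dtype=np.float32) / 255.0

    mse = float(np.mean((a - b) ** 2))
    return 1.0 - min(mse, 1.0)  # map fidelity to a reward in [0, 1]
```

Because the reward is computed on the rendered output rather than the SVG tokens, no differentiable renderer is required; the score can be plugged directly into a policy-gradient style RL objective over generated roll-outs.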
Community
We introduce RLRF: Reinforcement Learning from Rendering Feedback. RL for SVGs!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning (2025)
- OmniSVG: A Unified Scalable Vector Graphics Generation Model (2025)
- RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning (2025)
- VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning (2025)
- Socratic Chart: Cooperating Multiple Agents for Robust SVG Chart Understanding (2025)
- Style Customization of Text-to-Vector Generation with Image Diffusion Priors (2025)
- Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning (2025)
Great paper! Just a couple questions:
- What was the generation prompt during training for RLRF on Image→SVG?
- Since cairosvg only supports a subset of SVG, did you compare against how the generated SVGs render in a browser?
- Any thoughts on curriculum learning, i.e., starting off with easy cases, then medium ones, etc., instead of doing SFT for bootstrapping?