SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models
Paper • arXiv:2506.04180 • Published • 32
Note "Writing" a long novel is separated into 3 tasks, "Outline", "Writing", "Refining". A regular LLM would generate a novel in a single pass, limiting the coherency. Achieves >50% win against even 600B model using this approach. A Monte Carlo Tree Search method (DPO) is used to rank various responses to the same input.
Note Highlights that there is not enough coordination between agents' physical actions (robots) and agents' fetching of information from the internet. In the experiments, most failures are attributed to this lack of coordination. Even a simple "go from point A to point B" task involves iteratively fetching coordinates and performing actions.
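
A minimal sketch of that "point A to point B" loop, showing how information fetching and physical action must interleave; all names here (`fetch_coordinates`, `move_toward`) are hypothetical placeholders, not an API from the paper:

```python
from typing import Tuple

def fetch_coordinates(place: str) -> Tuple[float, float]:
    """Placeholder for an internet/tool lookup, e.g. a geocoding query."""
    raise NotImplementedError

def move_toward(position, target, step: float = 1.0):
    """Placeholder for one physical action; returns the new position."""
    dx, dy = target[0] - position[0], target[1] - position[1]
    dist = (dx * dx + dy * dy) ** 0.5
    if dist <= step:
        return target
    return (position[0] + step * dx / dist, position[1] + step * dy / dist)

def go_from_a_to_b(start_name: str, goal_name: str):
    # Information-fetching phase: resolve both endpoints before acting.
    position = fetch_coordinates(start_name)
    target = fetch_coordinates(goal_name)
    # Action phase: step toward the goal until it is reached. Losing sync
    # between the fetched information and the actions taken is exactly the
    # coordination gap the note attributes most failures to.
    while position != target:
        position = move_toward(position, target)
    return position
```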