arxiv:2506.07553

GTR-CoT: Graph Traversal as Visual Chain of Thought for Molecular Structure Recognition

Published on Jun 9

· Submitted by

Hoter on Jun 10

Upvote

Authors:

Jingchao Wang ,

Haote Yang ,

Abstract

GTR-Mol-VLM, featuring graph traversal and data-centric principles, outperforms existing models in Optical Chemical Structure Recognition by accurately parsing molecular graphs and handling abbreviated structures.

AI-generated summary

Optical Chemical Structure Recognition (OCSR) is crucial for digitizing chemical knowledge by converting molecular images into machine-readable formats. While recent vision-language models (VLMs) have shown potential in this task, their image-captioning approach often struggles with complex molecular structures and inconsistent annotations. To overcome these challenges, we introduce GTR-Mol-VLM, a novel framework featuring two key innovations: (1) the Graph Traversal as Visual Chain of Thought mechanism that emulates human reasoning by incrementally parsing molecular graphs through sequential atom-bond predictions, and (2) the data-centric principle of Faithfully Recognize What You've Seen, which addresses the mismatch between abbreviated structures in images and their expanded annotations. To support model development, we constructed GTR-CoT-1.3M, a large-scale instruction-tuning dataset with meticulously corrected annotations, and introduced MolRec-Bench, the first benchmark designed for a fine-grained evaluation of graph-parsing accuracy in OCSR. Comprehensive experiments demonstrate that GTR-Mol-VLM achieves superior results compared to specialist models, chemistry-domain VLMs, and commercial general-purpose VLMs. Notably, in scenarios involving molecular images with functional group abbreviations, GTR-Mol-VLM outperforms the second-best baseline by approximately 14 percentage points, both in SMILES-based and graph-based metrics. We hope that this work will drive OCSR technology to more effectively meet real-world needs, thereby advancing the fields of cheminformatics and AI for Science. We will release GTR-CoT at https://github.com/opendatalab/GTR-CoT.

View arXiv page View PDF Add to collection

Community

Hoter

Paper author Paper submitter 2 days ago

We introduce GTR-Mol-VLM, a novel framework for Optical Chemical Structure Recognition (OCSR). GTR-Mol-VLM features in two key innovations: (1) the Graph Traversal as a Visual Chain of Thought mechanism that emulates human reasoning by incrementally parsing molecular graphs through sequential atom-bond predictions, and (2) the data-centric principle of Faithfully Recognize What You’ve Seen, which addresses the mismatch between abbreviated structures in images and their expanded annotations.
To support model development, we constructed GTR-CoT-1.3M, a large-scale instruction-tuning dataset with meticulously corrected annotations. And we introduced MolRec-Bench, the first benchmark designed
for a fine-grained evaluation of graph-parsing accuracy in OCSR.
Comprehensive experiments demonstrate that GTR-Mol-VLM achieves superior results compared to specialist models, chemistry-domain VLMs, and commercial general-purpose VLMs.
Our repo is at https://github.com/opendatalab/GTR-CoT.