Visual Sketchbook: Enabling Reflection and Refinement in MLLMs Chart-to-Code Generation

ACL ARR 2025 February Submission 8505 Authors

16 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract: Chart-to-code is an emerging task with significant potential in data analysis, automated reporting, and education. It requires accurate visual interpretation of charts and the ability to translate this understanding into executable code. However, existing methods often struggle to generate precise code for more complex charts, resulting in non-executable code and inaccurate chart reconstructions. To address these challenges, we introduce Visual Sketchbook, a novel framework that employs a multistage optimization process through iterative multimodal feedback, inspired by recent test-time scaling techniques. Our method decomposes the generation process into reflection and refinement stages, allowing for progressive reasoning and verification. Experiments show that Visual Sketchbook achieves substantial improvements (a 12% gain on average, with a maximum of 17%) in chart-to-code tasks compared to baseline methods. We further demonstrate the effectiveness and generalizability of our proposed method through detailed analysis and ablation studies.
Paper Type: Short
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: cross-modal content generation, multimodality
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 8505