PAGER: Point-Precise GUI Control for Geometric Tasks
A new research paper introduces PAGER, a framework designed to address precision-sensitive GUI tasks requiring point-level accuracy in geometric construction. Unlike standard GUI interactions that tolerate region-level clicks, geometric tasks demand exact coordinate placement to avoid cascading topological errors. The paper presents PAGE Bench, a benchmark containing 4,906 problems and over 224,000 process-supervised pixel-level actions. PAGER aims to bridge the semantic-execution gap in large vision-language models for point-precise control. The work is published on arXiv under identifier 2605.15963.
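The difference between region-level and point-level accuracy can be illustrated with a small sketch. This is not code from the paper: the function names, the 1-pixel tolerance, and the coordinates are all assumptions chosen for illustration. It shows a click that a region-tolerant check would accept but a point-precise check rejects, and how that small offset carries over into a dependent construction (a circle whose radius is defined by the clicked center).

```python
import math

def region_hit(click, box):
    """Region-tolerant check: any click inside the bounding box counts."""
    x, y = click
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom

def point_hit(click, target, tol_px=1.0):
    """Point-precise check: the click must land within tol_px of the target.
    The 1-pixel default is an illustrative assumption, not the paper's metric."""
    return math.dist(click, target) <= tol_px

def circle_radius(center, through):
    """Radius of a circle defined by its center and a point on its circumference."""
    return math.dist(center, through)

# Hypothetical scene: the circle should be centered at A and pass through B.
intended_a = (100.0, 100.0)
point_b = (200.0, 100.0)
click = (104.0, 103.0)                      # actual click, a few pixels off A
box_around_a = (90.0, 90.0, 110.0, 110.0)   # a 20x20 px region around A

region_ok = region_hit(click, box_around_a)  # accepted at region level
point_ok = point_hit(click, intended_a)      # rejected at point level

# The misplaced center propagates: the dependent circle gets the wrong radius.
radius_error = abs(circle_radius(click, point_b) - circle_radius(intended_a, point_b))

print("region-level check passes:", region_ok)
print("point-level check passes:", point_ok)
print("radius error of dependent circle (px):", round(radius_error, 2))
```

Any later primitive built on that circle, such as an intersection point or a tangent, would inherit the same error, which is the kind of cascade the summary above describes.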
Key facts
- PAGER addresses precision-sensitive GUI tasks requiring point-level accuracy.
- Geometric primitives depend on one another topologically, so a single misplaced point can cause cascading errors.
- PAGE Bench includes 4,906 problems and over 224K pixel-level actions.
- The paper is published on arXiv with ID 2605.15963.
- Large vision-language models currently rely on region-tolerant interaction paradigms, in which a click anywhere within a target region is sufficient.
Entities
Institutions
- arXiv