PhotoFlow: AI Agent for Autonomous 3D Virtual Photography
Researchers have introduced PhotoFlow, an AI agent for autonomous 3D virtual photography, along with VPhotoBench, a benchmark of 47 open-license Blender scenes. The system uses a Director-Reviewer-Reflector architecture to navigate 3D scenes without preselected camera poses or reference images, inferring shots from language intent and scene information. The Director creates a soft photographic blueprint and proposes diverse candidate cameras; the Reviewer applies rule checks, visual critique, and pairwise incumbent selection; the Reflector converts failures into region memory, dead-zone suppression, and high-explore relocation. The task is demanding for vision-language models because it requires both complex 3D spatial understanding and abstract aesthetic judgment.
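The closed loop described above can be illustrated with a toy sketch. This is not PhotoFlow's implementation: the `Camera` type, the function names, the distance-based rule check, and the scoring function are all hypothetical stand-ins chosen so the example runs without a renderer or a vision-language model. It only shows the control flow of propose, review, and log failures.

```python
import math
import random
from dataclasses import dataclass

SUBJECT = (0.0, 0.0, 0.0)  # assumption: a single subject at the scene origin


@dataclass(frozen=True)
class Camera:
    """Hypothetical camera candidate, reduced to a position."""
    x: float
    y: float
    z: float

    def distance_to_subject(self):
        return math.dist((self.x, self.y, self.z), SUBJECT)


def director_propose(rng, n, bounds=5.0):
    """Stand-in Director: sample n candidate cameras inside a box."""
    return [
        Camera(rng.uniform(-bounds, bounds),
               rng.uniform(-bounds, bounds),
               rng.uniform(0.5, bounds))
        for _ in range(n)
    ]


def rule_check(cam, min_dist=1.0):
    """Stand-in rule check: reject cameras too close to the subject."""
    return cam.distance_to_subject() >= min_dist


def critique_score(cam, ideal_dist=4.0):
    """Toy substitute for visual critique: best at an assumed ideal distance."""
    return -abs(cam.distance_to_subject() - ideal_dist)


def search(rounds=5, per_round=8, seed=0):
    """Closed-loop search: propose, review, and log rule failures."""
    rng = random.Random(seed)
    incumbent = None
    failures = []  # stand-in for what a Reflector would consume
    for _ in range(rounds):
        for cam in director_propose(rng, per_round):
            if not rule_check(cam):
                failures.append(cam)
                continue
            # Pairwise comparison of the candidate against the incumbent.
            if incumbent is None or critique_score(cam) > critique_score(incumbent):
                incumbent = cam
    return incumbent, failures


best, failed = search()
print(best, len(failed))
```

In a real system the scoring step would presumably involve rendered images and a model's judgment rather than a distance heuristic; the sketch keeps only the incumbent-versus-candidate structure.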
Key facts
- PhotoFlow is a Director-Reviewer-Reflector agent for closed-loop camera search.
- VPhotoBench is a benchmark of 47 open-license Blender scenes.
- The agent operates without preselected camera poses or reference images.
- The Director builds a soft photographic blueprint and proposes diverse candidate cameras.
- The Reviewer combines rule checks, visual critique, and pairwise incumbent selection.
- The Reflector converts failures into region memory, dead-zone suppression, and high-explore relocation.
- The task stresses complex 3D spatial understanding and abstract aesthetic judgment.
- Published on arXiv with ID 2605.23771.
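One way to picture region memory and dead-zone suppression is a grid of cells with failure counts. The sketch below is an assumption made for illustration, not the method from the paper: the `RegionMemory` class, the cell size, and the `dead_after` threshold are hypothetical, and high-explore relocation is not modeled.

```python
from collections import Counter


class RegionMemory:
    """Hypothetical region memory: counts failures per grid cell."""

    def __init__(self, cell_size=1.0, dead_after=2):
        self.cell_size = cell_size    # edge length of one grid cell
        self.dead_after = dead_after  # failures needed to mark a cell dead
        self.failures = Counter()

    def cell(self, position):
        """Map a 3D position to the integer index of its grid cell."""
        return tuple(int(c // self.cell_size) for c in position)

    def record_failure(self, position):
        """Log one failed camera placement in the cell containing it."""
        self.failures[self.cell(position)] += 1

    def is_dead(self, position):
        """A cell is a dead zone once it has accumulated enough failures."""
        return self.failures[self.cell(position)] >= self.dead_after

    def filter(self, positions):
        """Suppress candidate positions that fall inside dead zones."""
        return [p for p in positions if not self.is_dead(p)]


memory = RegionMemory()
memory.record_failure((0.2, 0.3, 1.5))
memory.record_failure((0.7, 0.1, 1.9))  # same cell as the first failure
candidates = [(0.5, 0.5, 1.5), (2.5, 0.5, 1.5)]
print(memory.filter(candidates))
```

Here two failures in the same cell mark it as dead, so later candidates in that cell are dropped before any review step would spend effort on them.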
Entities
Institutions
- arXiv