S2ED: Training-Free Framework for Consistent Story Illustration
Researchers have introduced Story-to-Executable Descriptions (S2ED), a training-free, model-agnostic framework for illustrating multi-frame stories. S2ED converts an entire narrative into a sequence of clear, editable executable descriptions that keep the frames consistent with one another. Three coordinated agents segment the narrative, ground canonical character attributes, and enrich spatial and emotional cues. Because state is propagated through the prompts themselves, it stays interpretable, and drift can be repaired with local edits instead of retraining the generator. On the Flintstones and Shakoo Maku datasets, S2ED improves both sequence-level consistency and character fidelity over strong prompting, large-model planning, and a reference training-based approach, according to both automatic metrics and human evaluations. The paper is available on arXiv.
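The summary does not give S2ED's agent prompts or description schema, so the following is only a minimal Python sketch of the general idea, assuming a simple per-frame record. The names `FrameDescription`, `segment_agent`, `grounding_agent`, `enrichment_agent`, `local_edit`, the `CANON` character sheet, and the keyword rules are all hypothetical, and the rule-based functions merely stand in for the language-model agents the framework actually uses. The sketch shows how reusing one canonical attribute string per character keeps frames consistent, and how a single frame's description can be edited without touching the others or retraining anything.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple

# Hypothetical, illustrative character sheet; not S2ED's real schema.
CANON: Dict[str, str] = {
    "Fred": "Fred, stocky man, black hair, orange shirt with black spots, blue tie",
    "Barney": "Barney, short man, blond hair, brown shirt",
}

# Hypothetical keyword -> emotional cue table for the stand-in enrichment agent.
EMOTION_CUES: Dict[str, str] = {"laughs": "joyful expression", "shouts": "angry expression"}


@dataclass(frozen=True)
class FrameDescription:
    """Hypothetical explicit, editable description of one frame."""
    index: int
    event: str
    characters: Tuple[str, ...]  # canonical attribute strings, in CANON order
    cues: Tuple[str, ...] = ()

    def to_prompt(self) -> str:
        # Flatten the structured description into a text prompt for any generator.
        return "; ".join([self.event, *self.characters, *self.cues])


def segment_agent(story: str) -> List[str]:
    # Stand-in for the segmentation agent: one frame per sentence.
    return [s.strip() for s in story.split(".") if s.strip()]


def grounding_agent(segments: List[str], canon: Dict[str, str]) -> List[FrameDescription]:
    # Stand-in for character grounding: every mention reuses the same canonical
    # text, so character attributes propagate unchanged from frame to frame.
    frames = []
    for i, seg in enumerate(segments):
        chars = tuple(desc for name, desc in canon.items() if name in seg)
        frames.append(FrameDescription(index=i, event=seg, characters=chars))
    return frames


def enrichment_agent(frame: FrameDescription) -> FrameDescription:
    # Stand-in for cue enrichment: add a spatial layout cue and emotion cues.
    cues = []
    if len(frame.characters) == 2:
        cues.append("first character on the left, second on the right")
    for word, cue in EMOTION_CUES.items():
        if word in frame.event:
            cues.append(cue)
    return replace(frame, cues=tuple(cues))


def local_edit(frames: List[FrameDescription], index: int, **changes) -> List[FrameDescription]:
    # Repair drift in one frame by editing only its description;
    # every other frame object is returned unchanged.
    return [replace(f, **changes) if f.index == index else f for f in frames]


story = "Fred laughs at Barney. Barney shouts at Fred. Fred walks home"
frames = [enrichment_agent(f) for f in grounding_agent(segment_agent(story), CANON)]
for f in frames:
    print(f.to_prompt())
```

For example, `local_edit(frames, 2, cues=("smiling",))` returns a new list in which only the third frame's cues change. Keeping descriptions as immutable records makes each edit explicit and leaves the remaining frames, and therefore their rendered images, untouched.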
Key facts
- S2ED is a training-free, model-agnostic framework.
- It converts stories into executable descriptions for consistent rendering.
- Three agents coordinate to segment the narrative, ground canonical character attributes, and enrich spatial and emotional cues.
- Enables local edits to repair drift without retraining.
- Tested on Flintstones and Shakoo Maku datasets.
- Outperforms strong prompting and large-model planning baselines, as well as a reference training-based method.
- Improves sequence-level consistency and character fidelity.
- Paper available on arXiv.
Entities
Institutions
- arXiv