PhyCo: Physics-Controlled Video Generation Framework

ai-technology · 2026-05-01

Researchers have introduced PhyCo, a framework that integrates physical priors into video generation to address the lack of physical consistency in current diffusion models. PhyCo uses a dataset of over 100K photorealistic simulation videos in which friction, restitution, deformation, and force are varied systematically. It fine-tunes a pretrained diffusion model through a ControlNet conditioned on pixel-aligned physical property maps, and applies VLM-guided reward optimization, in which a fine-tuned vision-language model supplies differentiable feedback. The result is controllable generation of physically plausible motion, such as realistic collisions and material responses.
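
To make "pixel-aligned physical property maps" concrete, here is a minimal sketch of one way such maps could be built. It is not PhyCo's actual implementation. It assumes a per-pixel instance mask and per-object scalar properties. The names `PhysicalProps` and `build_property_maps`, the choice of two channels, and the background value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalProps:
    """Hypothetical per-object properties; field names are illustrative."""
    friction: float
    restitution: float


def build_property_maps(instance_mask, props_by_id, background=0.0):
    """Rasterize per-object scalars into one 2D channel per property.

    instance_mask: 2D list of ints, 0 = background, k > 0 = object id.
    Returns a dict mapping property name -> 2D list of floats with the
    same height and width as the mask, so each value lines up with a pixel.
    """
    channels = {"friction": [], "restitution": []}
    for row in instance_mask:
        for name, channel in channels.items():
            channel.append([
                getattr(props_by_id[obj_id], name)
                if obj_id in props_by_id else background
                for obj_id in row
            ])
    return channels


# Toy 3x4 frame with two objects (ids 1 and 2) on a background of zeros.
mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 2],
    [0, 0, 0, 2],
]
props = {
    1: PhysicalProps(friction=0.8, restitution=0.1),
    2: PhysicalProps(friction=0.2, restitution=0.9),
}
maps = build_property_maps(mask, props)
```

In a setup like this, the resulting channels would be stacked and passed to the conditioning branch in the same way a ControlNet consumes an edge or depth map. How PhyCo actually encodes deformation and force is not described here.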

Key facts

PhyCo introduces continuous, interpretable, physically grounded control into video generation.
The dataset includes over 100K photorealistic simulation videos with systematic variation of friction, restitution, deformation, and force.
Physics-supervised fine-tuning uses a ControlNet conditioned on pixel-aligned physical property maps.
VLM-guided reward optimization provides differentiable feedback via a fine-tuned vision-language model.
PhyCo addresses failure modes of video diffusion models such as object drift, unrealistic collisions, and mismatched material responses.
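
The "systematic variation" of simulation parameters mentioned above can be pictured as a grid sweep. The sketch below is only an illustration under stated assumptions: the value ranges, the step counts, and the names `SWEEP`, `linspace`, and `simulation_configs` are hypothetical and do not come from the PhyCo dataset.

```python
from itertools import product


def linspace(lo, hi, n):
    """Return n evenly spaced floats from lo to hi inclusive."""
    if n == 1:
        return [lo]
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


# Assumed ranges and resolutions, chosen only for illustration.
SWEEP = {
    "friction": linspace(0.0, 1.0, 5),
    "restitution": linspace(0.0, 1.0, 5),
    "deformation": linspace(0.0, 1.0, 3),
    "force": linspace(1.0, 10.0, 4),
}


def simulation_configs(sweep):
    """Yield one config dict per point of the Cartesian parameter grid."""
    names = list(sweep)
    for values in product(*(sweep[name] for name in names)):
        yield dict(zip(names, values))


configs = list(simulation_configs(SWEEP))
```

Each config would drive one simulated clip, so every video carries known ground-truth physical parameters. A real pipeline would presumably also vary scenes, objects, and camera poses to reach a dataset of this scale.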
