PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation

European Conference on Computer Vision (ECCV) 2024, Oral Presentation


1Massachusetts Institute of Technology,   2Stanford University,   3Columbia University,   4Cornell University

Poking an object in a variety of ways


PhysDreamer enables realistic, physics-based interaction with static 3D objects.


Abstract

Realistic object interactions are crucial for creating immersive virtual experiences, yet synthesizing realistic 3D object dynamics in response to novel interactions remains a significant challenge. Unlike unconditional or text-conditioned dynamics generation, action-conditioned dynamics requires perceiving the physical material properties of objects, such as object stiffness, and grounding the 3D motion prediction in these properties. However, estimating physical material properties is an open problem due to the lack of material ground-truth data, as measuring these properties for real objects is highly difficult. We present PhysDreamer, a physics-based approach that endows static 3D objects with interactive dynamics by leveraging the object dynamics priors learned by video generation models. By distilling these priors, PhysDreamer enables the synthesis of realistic object responses to novel interactions, such as external forces or agent manipulations. We demonstrate our approach on diverse examples of elastic objects and evaluate the realism of the synthesized interactions through a user study. PhysDreamer takes a step towards more engaging and realistic virtual experiences by enabling static 3D objects to dynamically respond to interactive stimuli in a physically plausible manner.


PhysDreamer = Physics-Based Simulation + Video Diffusion Prior



(Left) Leveraging and distilling prior knowledge of dynamics from a pre-trained video generation model, we estimate a physical material field for the static 3D object. (Right) The physical material field allows synthesizing interactive 3D dynamics under arbitrary forces. We show rendered sequences from two viewpoints, with red arrows indicating force directions.
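As a concrete illustration of this two-stage recipe, below is a minimal, self-contained toy sketch. It is emphatically not the paper's implementation: PhysDreamer estimates a spatially varying material field for a 3D object, supervised by motion distilled from a pre-trained video generation model, whereas here a single damped oscillator stands in for the object, a precomputed reference trajectory stands in for the video prior, and finite-difference gradient descent stands in for differentiating through the simulator. All function names and constants are illustrative assumptions.

```python
import numpy as np


def simulate(stiffness, steps=200, dt=0.01, damping=0.3, force=0.0):
    """Semi-implicit Euler rollout of a damped oscillator.

    `force` is a constant external push standing in for a user poke.
    """
    x, v, traj = 1.0, 0.0, []
    for _ in range(steps):
        a = -stiffness * x - damping * v + force  # unit mass
        v += dt * a
        x += dt * v
        traj.append(x)
    return np.array(traj)


# Stage 1: fit the material parameter to a reference motion. In the
# paper the reference comes from a video generation model; here it is
# a rollout with a hidden "true" stiffness.
reference = simulate(stiffness=4.0)


def loss(k):
    return float(np.mean((simulate(k) - reference) ** 2))


k, lr, eps = 1.0, 0.5, 1e-4
for _ in range(500):
    # Finite-difference gradient, standing in for autodiff through
    # a differentiable simulator.
    grad = (loss(k + eps) - loss(k - eps)) / (2.0 * eps)
    k -= lr * grad
print(f"estimated stiffness: {k:.3f}")  # should recover roughly 4.0

# Stage 2: the fitted material generalizes to novel interactions:
# simulate the response to an external force never seen during fitting.
novel_response = simulate(k, force=0.5)
print(f"displacement under novel force: {novel_response[-1]:.3f}")
```

The point of the toy is the structure: once the material parameter is fitted to one observed motion, the same simulator can answer arbitrary new interactions, which is what the right half of the figure above depicts.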



Results gallery for interactive motions


We present an interactive gallery showcasing our synthesized interactive motions for eight objects: a red rose, telephone, hat, carnation, white rose, orange rose, yellow tulip, and alocasia. In each clip, the drawn circle and arrow roughly indicate the area and direction of the applied force (the circles are illustrative and not drawn precisely).




Comparison with baselines and real captured videos

Visual comparison on seven objects (carnation, hat, telephone, white rose, orange rose, yellow tulip, and alocasia) between our synthesized videos, real captured footage, and two baseline methods, PhysGaussian and DreamGaussian4D. These are the same videos used in our user study; details of the study are given in the experiments section of the paper.

Panels, left to right: Real Capture, Ours, PhysGaussian, DreamGaussian4D.



Citation

Acknowledgements

We would like to thank Peter Yichen Chen, Zhengqi Li, Pingchuan Ma, Minghao Guo, Ge Yang, and Shai Avidan for their help and insightful discussions. This work is supported in part by NSF PHY-2019786 (The NSF AI Institute for Artificial Intelligence and Fundamental Interactions), NSF RI #2211258 and #2338203, ONR MURI N00014-22-1-2740, Quanta Computer, and Samsung.

The website template was borrowed from ReconFusion, Ref-NeRF, and DMD.