ReCamMaster: Camera-Controlled Generative Rendering from A Single Video

1Zhejiang University,  2Kuaishou Technology,  3CUHK,  4HUST

TL;DR: We propose ReCamMaster to re-capture in-the-wild videos with novel camera trajectories.



Demos

Each demo below pairs source videos with the corresponding videos synthesized by ReCamMaster.

Arc Trajectories


Translation Up Trajectories



Translation Down Trajectories



Pan Trajectories



Tilt Trajectories



Zoom In / Zoom Out Trajectories



More Complex Trajectories






Application in 4D Reconstruction





Application in Video Stabilization





Application in Embodied AI





Application in Autonomous Driving




Abstract

Camera control has been actively studied in text- and image-conditioned video generation. However, altering the camera trajectory of a given video remains under-explored, despite its importance in video creation. The task is non-trivial because it imposes the additional constraints of preserving appearance across frames and keeping the scene dynamics synchronized. To address this, we present ReCamMaster, a camera-controlled generative video re-rendering framework that reproduces the dynamic scene of an input video along novel camera trajectories. The core innovation lies in harnessing the generative capabilities of pre-trained text-to-video models through a thoroughly explored video conditioning mechanism. To mitigate the scarcity of qualified training data, we construct a large-scale multi-camera synchronized video dataset using Unreal Engine 5, carefully curated to follow real-world filming characteristics and covering diverse scenes and camera movements; this helps the trained model generalize to in-the-wild videos. We further improve robustness to diverse inputs through a meticulously designed training strategy. Extensive experiments show that our method substantially outperforms existing state-of-the-art approaches and strong baselines. Our method also finds promising applications in video stabilization, super-resolution, and outpainting. Our code and dataset will be publicly available.
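For concreteness, the sketch below shows one plausible way to package a target camera trajectory as a per-frame conditioning signal: each frame carries a 3x4 camera-to-world extrinsic expressed relative to the trajectory's first frame. The function name `relative_extrinsics`, the 12-dimensional flattening, and the relative-pose normalization are illustrative assumptions, not the released data format.

```python
import numpy as np

def relative_extrinsics(cam2world: np.ndarray) -> np.ndarray:
    """Re-express a camera trajectory relative to its first frame.

    cam2world: (T, 4, 4) homogeneous camera-to-world matrices, one per frame.
    Returns:   (T, 12) flattened 3x4 extrinsics in the first frame's coordinate system.
    """
    ref_inv = np.linalg.inv(cam2world[0])     # world -> first-camera coordinates
    rel = ref_inv[None] @ cam2world           # (T, 4, 4); identity at t = 0
    return rel[:, :3, :].reshape(len(cam2world), 12)

# Hypothetical example: a "translation up" trajectory for a 16-frame clip.
T = 16
traj = np.tile(np.eye(4), (T, 1, 1))
traj[:, 1, 3] = np.linspace(0.0, 0.5, T)      # raise the camera by 0.5 scene units
cond = relative_extrinsics(traj)              # (16, 12) per-frame conditioning signal
```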

Method

To re-shoot a source video along novel camera trajectories, we harness the generative capability of pre-trained text-to-video diffusion models by imposing dual conditions, i.e., the source video and the target camera trajectory, through a meticulously designed framework. An overview of the model is depicted below.
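As an illustration of the dual-conditioning idea, the following sketch concatenates clean source-video latents with the noisy target latents along the frame axis, after injecting a learned embedding of the per-frame target camera pose into the target tokens, so that a pre-trained video diffusion backbone can attend jointly to both streams. The module name, tensor layout, and the frame-axis concatenation are assumptions made for exposition, not the exact released architecture.

```python
import torch
import torch.nn as nn

class DualConditioning(nn.Module):
    """Sketch: condition a video diffusion backbone on a source video and a target camera trajectory."""

    def __init__(self, latent_dim: int, cam_dim: int = 12):
        super().__init__()
        # Per-frame camera pose (e.g. flattened 3x4 extrinsics) -> latent channel space.
        self.cam_embed = nn.Sequential(
            nn.Linear(cam_dim, latent_dim), nn.SiLU(), nn.Linear(latent_dim, latent_dim)
        )

    def forward(self, target_latents, source_latents, camera_poses):
        # target_latents: (B, T, N, C) noisy latents of the video to be generated
        # source_latents: (B, T, N, C) clean latents of the input (source) video
        # camera_poses:   (B, T, cam_dim) target trajectory, one pose per frame
        cam = self.cam_embed(camera_poses).unsqueeze(2)     # (B, T, 1, C)
        target = target_latents + cam                       # inject trajectory into target tokens
        # Concatenating along the frame axis lets the backbone's spatio-temporal
        # self-attention relate every target frame to every source frame.
        return torch.cat([source_latents, target], dim=1)   # (B, 2T, N, C)
```

In this sketch, keeping source and target tokens in a single attention sequence is one way to encourage the appearance and motion synchronization the task requires; positional or stream embeddings that distinguish the two halves are omitted for brevity.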


Comparisons

Ablation on Video Conditioning Mechanisms
