Introducing Stable Video 4D, Stability AI's Latest Model for Dynamic Multi-Angle Video Generation
Key Takeaways:
- Stable Video 4D can convert a single object video into multiple novel-view videos from eight different angles.
- With a single inference, Stable Video 4D generates 5 frames across each of these 8 views in approximately 40 seconds.
- Users can specify camera angles, customizing the output to suit specific creative needs.
- Currently in the research phase, this model has potential future applications in game development, video editing, and virtual reality, with ongoing improvements anticipated.
- Stable Video 4D is currently available on Hugging Face.
How It Works
Users begin by uploading a single video and specifying their desired 3D camera poses. Stable Video 4D then generates eight novel-view videos based on the specified camera angles, providing a comprehensive, multi-angle perspective of the subject. These generated videos can be used to efficiently optimize a dynamic 3D representation of the subject.
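For intuition, here is a minimal Python sketch of the inputs and outputs this workflow involves. Every name in it (camera_poses, generate_novel_views, the resolution, the default elevation) is a hypothetical stand-in, not Stability AI's actual API, and the model call is stubbed out; only the 8-view, 5-frame grid structure comes from the description above.

```python
# Conceptual sketch of the SV4D inference flow described above.
# All names are hypothetical; the real entry point is the released
# code and weights on Hugging Face.
import numpy as np

NUM_VIEWS = 8    # novel camera views produced per inference
NUM_FRAMES = 5   # frames produced per view

def camera_poses(num_views: int = NUM_VIEWS, elevation_deg: float = 10.0):
    """User-specified 3D camera poses: evenly spaced azimuths at one elevation."""
    return [{"azimuth_deg": 360.0 * i / num_views, "elevation_deg": elevation_deg}
            for i in range(num_views)]

def generate_novel_views(input_frames: np.ndarray, poses) -> np.ndarray:
    """Stand-in for the model: one inference maps the input video plus camera
    poses to a (views, frames, H, W, 3) grid of novel-view frames."""
    h, w = input_frames.shape[1:3]
    # The real model would run diffusion sampling here; we return a placeholder.
    return np.zeros((len(poses), NUM_FRAMES, h, w, 3), dtype=np.float32)

# A 5-frame RGB input video of a single object (resolution chosen arbitrarily).
video = np.zeros((NUM_FRAMES, 576, 576, 3), dtype=np.float32)
views = generate_novel_views(video, camera_poses())
print(views.shape)  # (8, 5, 576, 576, 3): 40 images from a single inference
```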
Currently, Stable Video 4D can produce 5-frame videos across 8 views in approximately 40 seconds, with the entire 4D optimization process taking around 20 to 25 minutes. Our team envisions future applications in game development, video editing, and virtual reality, where visualizing objects from multiple perspectives enhances realism and immersion.
State-of-the-Art Performance
Unlike previous methods that often require combining image diffusion models, video diffusion models, and multi-view diffusion models, Stable Video 4D (SV4D) can generate multiple novel-view videos simultaneously. This approach greatly improves consistency across the spatial and temporal axes, ensuring consistent object appearance across multiple views and timestamps. It also enables a more lightweight 4D optimization framework, eliminating the need for cumbersome score distillation sampling (SDS) with multiple diffusion models.
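To make that contrast concrete, below is a toy PyTorch sketch of what reconstruction-only 4D optimization looks like: the generated grid of novel-view frames serves directly as the supervision target, so no diffusion model is queried during fitting. `DynamicNeRF` here is a deliberately simplified hypothetical stand-in, not the representation SV4D actually optimizes.

```python
# Hedged sketch of "lightweight" 4D optimization without SDS: because the
# model outputs a consistent (views x frames) grid of images, a dynamic 3D
# representation can be fit with a plain photometric loss.
import torch

VIEWS, FRAMES, H, W = 8, 5, 64, 64
target = torch.rand(VIEWS, FRAMES, H, W, 3)  # generated novel-view grid

class DynamicNeRF(torch.nn.Module):
    """Toy dynamic representation: one learnable image per (view, frame)."""
    def __init__(self):
        super().__init__()
        self.grid = torch.nn.Parameter(torch.zeros(VIEWS, FRAMES, H, W, 3))
    def render(self, v: int, t: int) -> torch.Tensor:
        return self.grid[v, t]

model = DynamicNeRF()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(100):
    v = torch.randint(VIEWS, ()).item()
    t = torch.randint(FRAMES, ()).item()
    # Photometric loss: rendered pixels vs. generated frames, directly.
    loss = torch.nn.functional.mse_loss(model.render(v, t), target[v, t])
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The key design point is the loss: it compares rendered pixels directly against the generated frames, avoiding the repeated diffusion-model queries that make SDS-based pipelines heavy.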
Research and Development
Stable Video 4D is available on Hugging Face and represents our first video-to-video generation model, marking an exciting milestone for Stability AI. We are actively refining the model, optimizing it to handle a wider range of real-world videos beyond the current synthetic datasets it has been trained on.
The Stability AI team is committed to continuous innovation and exploring real-world use cases for this and other technologies. We anticipate that companies will adopt our model and further fine-tune it to meet their unique requirements. The potential for this technology in creating realistic, multi-angle videos is vast, and we are excited to see how it evolves with ongoing research and development.