Stability AI Introduces Stable Video Diffusion Models in Research Preview


Stability AI Releases Stable Video Diffusion
As OpenAI celebrates the return of Sam Altman, its rivals are upping the ante in the AI race. Following Anthropic’s release of Claude 2.1 and Adobe’s reported acquisition of Rephrase.ai, Stability AI has announced the release of Stable Video Diffusion to mark its entry into the video generation space.

Stable Video Diffusion includes two state-of-the-art AI models, SVD and SVD-XT, designed to produce high-quality outputs, matching or surpassing the performance of other AI video generators available for research purposes. The company has open-sourced the image-to-video models as part of its research preview and plans to use user feedback for further refinement.

SVD and SVD-XT are latent diffusion models that take a still image as a conditioning frame and generate 576×1024 video from it. Both models can produce content at between three and 30 frames per second, with outputs lasting up to roughly four seconds. The SVD model generates 14 frames from a still, while SVD-XT extends this to 25.
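
Those short clip lengths follow directly from the small frame counts: duration is simply the number of generated frames divided by the playback rate. The snippet below is a back-of-the-envelope illustration (the frame counts come from the announcement; the playback rates are example values from the stated 3–30 fps range, not official defaults).

```python
# Rough clip-duration arithmetic for SVD (14 frames) and SVD-XT (25 frames).
# fps values are illustrative picks from the announced 3-30 fps range.
for model, num_frames in [("SVD", 14), ("SVD-XT", 25)]:
    for fps in (6, 15, 30):
        duration = num_frames / fps
        print(f"{model}: {num_frames} frames @ {fps} fps -> {duration:.1f} s")
```

At the low end of the frame-rate range a 25-frame clip plays for about four seconds; at 30 fps the same clip lasts under a second.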

The company trained the base model on a large, systematically curated video dataset and then fine-tuned it on a smaller, high-quality dataset. Stability AI used publicly available research datasets for training and fine-tuning, although the exact source remains unclear.

Furthermore, the company detailed in a whitepaper that this model could serve as a base to fine-tune a diffusion model capable of multi-view synthesis, enabling it to generate multiple consistent views of an object using just a single still image.

While external evaluations found SVD outputs to be of high quality, Stability AI acknowledged that the models are far from perfect and plans to refine them, close their current gaps, and introduce new features for commercial applications. The company remains focused on open investigation of the models to surface remaining issues and facilitate safe deployment in the future.

To get started with the new open-source Stable Video Diffusion models, users can find the code in the company’s GitHub repository and the weights required to run the model locally on its Hugging Face page. The company has specified permitted and excluded applications: it currently allows use in design, educational, and creative tools, but prohibits generating “true representations of people or events”.
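
For readers who want to try the model locally, the following is a minimal sketch of image-to-video generation using the SVD-XT weights through Hugging Face’s diffusers library. The model id, resolution, and parameters shown are assumptions based on the Hugging Face release, not the exact setup from Stability AI’s GitHub repository.

```python
# Minimal image-to-video sketch with SVD-XT via Hugging Face diffusers.
# Assumes: pip install diffusers transformers accelerate torch
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Assumed model id on Hugging Face; check Stability AI's page for the exact name.
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()  # helps the model fit on a single consumer GPU

# The conditioning frame: any still image, resized to the 1024x576 training resolution.
image = load_image("input_frame.png").resize((1024, 576))

generator = torch.manual_seed(42)
frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]

# Write the generated frames out as a short video clip.
export_to_video(frames, "generated.mp4", fps=7)
```

With these weights the pipeline returns a 25-frame clip; swapping in the base SVD checkpoint would yield 14 frames instead.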

Stability AI’s release of Stable Video Diffusion underscores the company’s commitment to expanding the possibilities of AI-driven video generation and its dedication to achieving high-quality results for commercial and creative applications.