New AI model for hi-res video generation, Pyramid Flow, is available as open-source software



Ablation study of spatial pyramid at 50k image training step. On the right is a quantitative comparison of the FID results, where our method achieves almost three times the convergence speed.

A team of AI researchers from Peking University, Kuaishou Technology, and Beijing University of Posts and Telecommunications has developed a new AI model called Pyramid Flow that can be used to generate high-resolution (768p) video. The group has written a paper describing how they built the model, its attributes, and the uses to which it might be put, and has posted it on the arXiv preprint server.

Over the past several years, a number of organizations, both private and public, have been scrambling to build AI video generation models. Such models can be used to create applications capable of producing virtual video content for use in television at far lower cost than filming real scenes.

This means that such models are very rapidly increasing in value. In this new effort, the team in China has chosen to make its model open source, which means anyone who chooses to develop an application for it (an inference shell) and run it locally, including for commercial use, can do so at no cost.

The makers of Pyramid Flow have added a new wrinkle to AI video generation: the model generates video in multiple low-resolution stages before producing the final result. The research team claims that an inference shell can generate a five-second video at 384p resolution in 56 seconds.

They point out that their approach generates video using far less computing power, which makes it less expensive. It also dramatically reduces the number of tokens needed for generation, making it more efficient.
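
To give a rough sense of why working at lower resolutions saves tokens, the arithmetic can be sketched in Python. This is an illustrative calculation, not the authors' code: it assumes each coarser stage halves both width and height (so it holds a quarter of the tokens of the next stage) and that generation steps are split evenly across stages. The function names and the example figure of 1,024 tokens are hypothetical.

```python
def tokens_per_stage(full_tokens: int, num_stages: int) -> list[int]:
    """Token count at each stage, coarsest first.

    Assumption: every coarser stage halves width and height,
    so it holds one quarter of the tokens of the next stage.
    """
    return [full_tokens // (4 ** (num_stages - 1 - k)) for k in range(num_stages)]


def average_token_fraction(num_stages: int) -> float:
    """Mean tokens per generation step, relative to running every step
    at full resolution.

    Assumption: steps are divided evenly among the stages.
    """
    return sum(1 / 4 ** k for k in range(num_stages)) / num_stages


if __name__ == "__main__":
    # Hypothetical example: 1,024 tokens at full resolution, three stages.
    print(tokens_per_stage(1024, 3))
    print(average_token_fraction(3))
```

Under these assumptions, a three-stage pyramid processes on average less than half as many tokens per step as a model that works at full resolution throughout. The real savings depend on details of the model that the sketch does not capture.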

A series of underwater explosions, creating bubbles and splashing water. Credit: Yang Jin et al

The team has posted the code for Pyramid Flow on GitHub under an MIT License, along with sample videos that demonstrate the highly realistic results that can be expected from the model. They have also listed the open-source datasets they used to train their model, which together added up to 10 million short videos.

The research team did not mention the impact of ongoing claims made by those who see virtual videos made from open-source databases as violating copyright holders’ rights. However, they do suggest Pyramid Flow could be a suitable tool for use in fine-tuning material, without the need to pay a third party.

More information:
Yang Jin et al, Pyramidal Flow Matching for Efficient Video Generative Modeling, arXiv (2024). DOI: 10.48550/arxiv.2410.05954

pyramid-flow.github.io/

Demo: huggingface.co/spaces/Pyramid-Flow/pyramid-flow

Journal information:
arXiv


© 2024 Science X Network

Citation:
New AI model for hi-res video generation, Pyramid Flow, is available as open-source software (2024, October 14)
retrieved 14 October 2024
from https://techxplore.com/news/2024-10-ai-res-video-generation-pyramid.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.




