New AI model for hi-res video generation, Pyramid Flow, is available as open-source software

October 14, 2024

Ablation study of spatial pyramid at 50k image training step. On the right is a quantitative comparison of the FID results, where our method achieves almost three times the convergence speed.

A team of AI researchers from Peking University, Kuaishou Technology, and Beijing University of Posts and Telecommunications has developed a new AI model called Pyramid Flow that can generate high-resolution (768p) video. The group has written a paper describing how they built the model, its attributes, and the uses to which it might be put, and has posted it on the arXiv preprint server.

Over the past several years, a number of organizations, both private and public, have been scrambling to build AI video generation models. Such models can be used to create applications capable of producing virtual video content for use in television and motion pictures at far lower cost than filming real scenes.

This means that such AI models are rapidly increasing in value. In this new effort, the team in China has chosen to make their model open-source, which means anyone who chooses to develop an application for it (an inference shell) and run it locally, including for commercial use, can do so at no cost.

The makers of Pyramid Flow have added a new wrinkle to AI video generation: the model generates video in multiple low-resolution stages before producing the final result. The research team claims that an inference shell can generate a five-second video at 384p resolution in 56 seconds.

They point out that their approach generates video using far less computing power, which makes it less expensive. It also dramatically reduces the number of tokens needed for video generation, making it more efficient.
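To give a rough sense of why staged generation can cut the token count, the following is a minimal back-of-the-envelope sketch in Python. It is not the team's code. The frame size (768 x 1280), the 8x latent downsampling, the 2 x 2 patch size, the three stages, and the ten steps per stage are all assumptions chosen for illustration, and the function names are hypothetical.

```python
def latent_tokens(height, width, downsample=8, patch=2):
    """Tokens for one frame.

    Assumes (hypothetically) an 8x spatial downsample into latent space
    followed by 2x2 patching; neither figure comes from the article.
    """
    side = downsample * patch
    return (height // side) * (width // side)


def pyramid_cost(height, width, stages=3, steps_per_stage=10):
    """Total tokens processed when each earlier stage halves the resolution."""
    total = 0
    for k in range(stages):
        # Coarsest stage first; the last stage runs at the full resolution.
        scale = 2 ** (stages - 1 - k)
        total += steps_per_stage * latent_tokens(height // scale, width // scale)
    return total


def full_resolution_cost(height, width, stages=3, steps_per_stage=10):
    """Total tokens processed when every step runs at the final resolution."""
    return stages * steps_per_stage * latent_tokens(height, width)


if __name__ == "__main__":
    h, w = 768, 1280  # assumed frame size for a 768p video
    pyr = pyramid_cost(h, w)
    full = full_resolution_cost(h, w)
    print(f"pyramid: {pyr} tokens, full resolution: {full} tokens")
    print(f"ratio: {pyr / full:.2f}")
```

Under these assumed numbers, the staged schedule processes fewer than half as many tokens per frame as running every step at full resolution. The actual savings reported by the researchers depend on their real architecture and schedule, which this sketch does not reproduce.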

A series of underwater explosions, creating bubbles and splashing water. Credit: Yang Jin et al

The team has posted the code for Pyramid Flow on GitHub under an MIT License, along with sample videos that demonstrate the highly realistic results that can be expected from the model. They have also listed the open-source datasets they used to train their model, which together added up to 10 million short videos.

The research team did not mention the impact of ongoing claims made by those who see virtual videos made from open-source databases as violating copyright holders’ rights. However, they do suggest Pyramid Flow could be a suitable tool for use in fine-tuning open-source material, without the need to pay a third party.

More information:
Yang Jin et al, Pyramidal Flow Matching for Efficient Video Generative Modeling, arXiv (2024). DOI: 10.48550/arxiv.2410.05954

pyramid-flow.github.io/

Demo: huggingface.co/spaces/Pyramid-Flow/pyramid-flow

Journal information:
arXiv

Citation:
New AI model for hi-res video generation, Pyramid Flow, is available as open-source software (2024, October 14)
retrieved 14 October 2024
from https://techxplore.com/news/2024-10-ai-res-video-generation-pyramid.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.
