It’s hard to avoid the headlines about AI-based image creation. Image portals are banning artificially generated images, and artists are winning prizes without making a single brush stroke. Now comes the next logical step beyond static images: video generation. Whether it is Meta, OpenAI or Google, many companies are working on their first solutions, which sometimes leads to funny results.
Imagen is already familiar to the web as Google’s tool for creating images from text. An extension of the tool, which was unveiled in May, can now also generate video. As the Imagen team announced in a recent paper, the system is based on a “cascade of diffusion models.” As with the image AI, the system needs only a text prompt, and from that prompt it generates high-resolution video step by step. After a neural network creates an initial video, additional processing stages follow, each improving the spatial resolution and the smoothness of motion. According to Google’s Imagen team, up to 24 frames per second at an HD resolution of 1280×768 pixels are currently possible.
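To make the idea of a cascade more concrete, here is a minimal Python sketch. It does not run any diffusion model; it only tracks how the size of a video could grow as it passes through successive upscaling stages. The starting size, the number of stages and the scale factors are assumptions chosen purely so that the result lands on the 1280×768 pixels and 24 frames per second quoted above. They are not taken from Google's description, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoShape:
    frames: int
    width: int
    height: int
    fps: int


@dataclass(frozen=True)
class Stage:
    """One refinement step; it only records how it would resize the video."""
    name: str
    spatial_scale: int = 1   # multiplies width and height
    temporal_scale: int = 1  # multiplies frame count and frame rate

    def apply(self, video: VideoShape) -> VideoShape:
        return VideoShape(
            frames=video.frames * self.temporal_scale,
            width=video.width * self.spatial_scale,
            height=video.height * self.spatial_scale,
            fps=video.fps * self.temporal_scale,
        )


def run_cascade(base: VideoShape, stages: list[Stage]) -> list[VideoShape]:
    """Return the video shape after the base model and after every stage."""
    shapes = [base]
    for stage in stages:
        shapes.append(stage.apply(shapes[-1]))
    return shapes


# Assumed values for illustration only: a small, choppy base video ...
BASE = VideoShape(frames=16, width=160, height=96, fps=3)

# ... followed by alternating temporal and spatial upscaling stages.
STAGES = [
    Stage("temporal-1", temporal_scale=2),
    Stage("spatial-1", spatial_scale=2),
    Stage("temporal-2", temporal_scale=2),
    Stage("spatial-2", spatial_scale=2),
    Stage("temporal-3", temporal_scale=2),
    Stage("spatial-3", spatial_scale=2),
]

if __name__ == "__main__":
    for shape in run_cascade(BASE, STAGES):
        print(shape)
```

In a real system, each stage would be its own neural network that also receives the text prompt. The sketch only illustrates why a chain of modest upscaling steps can end at a resolution and frame rate that a single model would struggle to produce in one go.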
The exact mathematical and technical details behind the tech company’s latest development can be read in the paper. It also describes what the team learned from using the T5-XXL text encoder, which derives the meaning and the task from the text prompt.
As with the image AI Imagen, the video tool is not available for public testing. The team justifies this with safety concerns: problematic content still has to be sorted out to prevent potential misuse. That is presumably also urgently necessary, because the model was trained on freely available images from the Internet, so explicit content must first be filtered out.
But where is the journey heading? Is this the swan song for artists and creative professionals, now that films, like artworks before them, can be created fully automatically? Or will it instead be a tool that opens up completely new possibilities?