Google today presented Imagen Video, its new artificial intelligence that converts text to video. It looks like an answer to Make-A-Video, the Meta AI that does the same, presented a few days ago.
Diffusion models, applied to machine learning, are revolutionizing image-based artificial intelligence. We have already seen some very popular AIs that create images from text, such as DALL-E or Stable Diffusion. But now comes the second generation, which creates videos from text.
A few days ago Meta presented Make-A-Video, and today Google does the same with Imagen Video, a new AI that converts text to video. In its first version, it generates videos at a resolution of 1280×768 pixels and 24 fps.
Imagen Video, a very cinematic artificial intelligence
Diffusion models are generative models, that is, they generate new data from the data with which they have been trained.
What they do is progressively destroy the training data by adding noise to it, and learn to reverse that process so that they can rebuild data from noise as needed.
For example, if you type the sentence: “An elephant with a party hat strolling along the bottom of the sea”, the AI encodes the sentence so that concepts like “elephant”, “party hat”, or “bottom of the sea” guide the generation. It does not look up images in a database. Instead, it draws on what it learned from training images that match such descriptions, and combines those concepts consistently to obtain an image or a video with what the phrase asks for.
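The "destroy" half of a diffusion model can be illustrated with a few lines of Python. This is a minimal sketch of a generic noising process of the kind used in many diffusion models, not Imagen Video's actual training code. The function names, the linear schedule, and its values (1e-4 to 0.02 over 1000 steps) are assumptions chosen for illustration. The learned "rebuild" half, a neural network that removes the noise step by step, is not shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances for each step, evenly spaced (illustrative values)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_signal(betas):
    """Running product of (1 - beta): the share of the original signal left."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise."""
    keep = math.sqrt(alpha_bar)
    spread = math.sqrt(1.0 - alpha_bar)
    return [keep * v + spread * rng.gauss(0.0, 1.0) for v in x0]


rng = random.Random(0)
pixels = [0.9, -0.3, 0.5, 0.1]  # a toy "image" of four values
alpha_bars = cumulative_signal(linear_beta_schedule(1000))

# Early steps barely change the data; late steps leave almost pure noise.
for t in (0, 499, 999):
    noisy = add_noise(pixels, alpha_bars[t], rng)
    print(t, round(alpha_bars[t], 4), [round(v, 2) for v in noisy])
```

The share of the original signal shrinks at every step, so by the last step the toy image is practically indistinguishable from random noise. Generation runs in the opposite direction: it starts from noise and removes it gradually, guided by the text.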
In the case of Imagen Video, it first creates a low-resolution video of 24×48 pixels at 3 fps and progressively scales it up in resolution and frame rate, until obtaining videos of 1280×768 pixels at 24 fps, about 5 seconds long.
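To get a feel for how much the model has to add between the first and the last stage, the figures above can be turned into a quick calculation. This sketch makes two assumptions: that "24×48" means 24 pixels high by 48 wide (the final format is landscape), and that the clip lasts exactly 5 seconds, although the figure is only approximate. The variable and function names are illustrative.

```python
from fractions import Fraction

# Figures quoted in the article (orientation of the base size is assumed).
BASE = {"width": 48, "height": 24, "fps": 3}
FINAL = {"width": 1280, "height": 768, "fps": 24}
DURATION_S = 5  # "about 5 seconds": rounded, so frame counts are approximate


def scale_factors(base, final):
    """Overall growth between the first and last stage, per dimension."""
    return {key: Fraction(final[key], base[key]) for key in base}


def frame_count(fps, seconds):
    """Number of frames in a clip of the given length."""
    return fps * seconds


factors = scale_factors(BASE, FINAL)
base_frames = frame_count(BASE["fps"], DURATION_S)
final_frames = frame_count(FINAL["fps"], DURATION_S)
base_values = BASE["width"] * BASE["height"] * base_frames
final_values = FINAL["width"] * FINAL["height"] * final_frames

print("scale factors:", {k: float(v) for k, v in factors.items()})
print("frames:", base_frames, "->", final_frames)
print("pixels per clip:", base_values, "->", final_values)
```

Under these assumptions the frame rate grows 8 times and the height 32 times, while the width grows by a slightly smaller factor, so the aspect ratio changes along the way. Almost all of the pixels in the final clip are therefore filled in by the upscaling stages, not by the initial low-resolution video.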
It is capable of generating videos that imitate the style of famous artists, as well as various styles of animation.
As Ars Technica explains, Imagen Video has been trained using the LAION-400M image dataset, made up of more than 400 million images. Google has also added 14 million videos.
Unfortunately, this generates results that are sometimes racist or discriminatory.
That is why Google has decided that, for now, it is not going to make this artificial intelligence public. It wants to apply a series of filters first to avoid controversial results.
Imagen Video, Google's artificial intelligence that converts text to video, promises to generate a media impact similar to that of DALL-E. But for now, we have to settle for looking at the examples on its website.