Apple is once again the protagonist, and this time without having organized any product or software presentation event. The company has published two research papers on its machine learning blog describing how it plans to completely change our lives through smart devices.
What may at first seem like a simple technical advance hides a deeper revolution, one that promises to transform photography, video and artificial intelligence across the Apple ecosystem. And the most interesting thing is that it already has specific names, technologies and concrete examples that give us clues about what is to come.
What will AI look like in a few years?
The first of these advances is called Matrix3D, a photogrammetry model that can reconstruct objects and environments in 3D from just two or three images. If you have ever tried to create a three-dimensional model from photos, you will know that the process is tedious, requires dozens or even hundreds of images, and is error-prone. With Matrix3D, Apple intends to simplify this process to the point of making it accessible to any user, integrating all the necessary steps into a single architecture: from estimating the depth of the image and the position of the camera to generating new viewpoints.
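To make the "single architecture" idea concrete, here is a minimal sketch of what such a unified interface could look like: one call that returns depth, camera poses and a way to render novel views, instead of chaining separate tools for each step. All names, shapes and placeholder values are illustrative assumptions, not Apple's actual API.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Reconstruction:
    """Everything a traditional pipeline produces in separate stages."""
    depth_maps: list    # one estimated depth map per input photo
    camera_poses: list  # one estimated 4x4 pose matrix per input photo

    def render(self, new_pose):
        # A real model would synthesize a novel view from this pose;
        # we return a blank placeholder image.
        return np.zeros((4, 4))

def reconstruct(photos):
    """Hypothetical single entry point: depth, pose and novel-view
    synthesis handled together rather than by chained tools."""
    depths = [np.ones(p.shape[:2]) for p in photos]  # stand-in depth
    poses = [np.eye(4) for _ in photos]              # stand-in poses
    return Reconstruction(depth_maps=depths, camera_poses=poses)

# Just two images, as in the article's "two or three photos" claim.
photos = [np.zeros((4, 4, 3)), np.zeros((4, 4, 3))]
scene = reconstruct(photos)
```

The point of the sketch is the shape of the interface: the user supplies a couple of photos and gets back a full scene object, with no manual intermediate steps.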
Thanks to this new technology, the way we edit photos will change completely, as will the creation of objects in augmented reality applications and the way we consume content on the Vision Pro, which can already turn 2D photos into fully immersive experiences.
This model was trained with a masked learning technique, in which the system must fill in missing parts of the input, like a puzzle with pieces removed. The result is an AI that not only replicates reality but can imagine how to complete it with surprising realism. It also greatly reduces hardware requirements, so it could run even on an iPhone without additional hardware.
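The masked objective described above can be illustrated with a toy example: hide a random subset of "patches", produce a reconstruction, and score it only on the hidden positions. The stand-in "model" below (predicting the mean of the visible patches) is a deliberate simplification; a real system like the one the article describes would use a learned network.

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_training_step(patches, mask_ratio=0.5):
    """One illustrative step of a masked-modeling objective:
    hide half the patches and score a reconstruction on them."""
    n = len(patches)
    masked_idx = rng.choice(n, size=int(n * mask_ratio), replace=False)
    visible = np.delete(patches, masked_idx, axis=0)
    # Stand-in "model": predict every hidden patch as the mean of the
    # visible ones. This is where a trained network would go.
    prediction = visible.mean(axis=0)
    # The loss is computed only on the masked positions, which is what
    # forces the model to learn to imagine the missing content.
    return float(np.mean((patches[masked_idx] - prediction) ** 2))

patches = rng.normal(size=(16, 8))  # 16 toy "patches" of 8 values each
loss = masked_training_step(patches)
```

Training repeats this step with fresh random masks, so the model eventually learns to complete any missing region.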
The second model, called StreamBridge, puts the focus on video. This system turns visual language models (known as V-LLMs) into assistants capable of working in real time. And what does this mean? That you could record a video with your phone and ask questions about what appears on the screen as it happens. For example, you could point the camera at a plant and ask Siri what species it is, or film yourself cooking and receive step-by-step instructions without having to touch the device. Truly impressive.
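The streaming behavior described here can be sketched as a loop: keep a rolling buffer of recent frames and answer questions against whatever is currently in view. This is a hypothetical illustration of the idea, not StreamBridge itself; `describe_frames` stands in for a real visual language model, and frames are represented as text labels for simplicity.

```python
from collections import deque

def describe_frames(frames, question):
    """Stand-in for a V-LLM call: just reports what is in view."""
    return f"{question} -> seen: {', '.join(frames)}"

class StreamingAssistant:
    def __init__(self, buffer_size=4):
        # Rolling visual memory: old frames drop out automatically.
        self.frames = deque(maxlen=buffer_size)

    def on_frame(self, frame):
        self.frames.append(frame)

    def ask(self, question):
        return describe_frames(list(self.frames), question)

assistant = StreamingAssistant()
for frame in ["knife", "onion", "pan", "oil", "onion in pan"]:
    assistant.on_frame(frame)

answer = assistant.ask("What step am I on?")
# -> "What step am I on? -> seen: onion, pan, oil, onion in pan"
```

Because the buffer has a fixed size, the oldest frame ("knife") has already been evicted by the time the question arrives, which is the key property of a real-time assistant: it reasons over what is happening now, not over the whole recording.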
Apple claims that its system not only understands what it sees, but is capable of anticipating. If you are drawing something, it can guide you without being asked. If you are watching a tutorial, it can offer help based on what you are doing.
Both models are a roadmap for what is about to come. Apple Intelligence, which arrived in Spain less than two months ago with features such as smart automatic replies in Mail, advanced notifications and writing tools, could soon incorporate these technologies. We would finally have a Siri that understands what it sees, that can guide you while you record a video or help you model a 3D object from just a couple of photos.