Google’s Gemini AI models can already answer our questions, help us get organized, write documents and even code applications. But in the not-so-distant future, Gemini could also drive vehicles. At least, that is the avenue being explored by Waymo, the subsidiary of Alphabet (Google’s parent company) that specializes in autonomous vehicles and robotaxis.
Today, Waymo is a leader in its field. The Alphabet subsidiary already runs an Uber competitor that operates autonomous cars in a few American cities and completes more than 150,000 trips per week. And while Waymo is satisfied with the technologies it currently uses, it is now exploring whether Gemini’s intelligence could improve its autonomous vehicles.
Waymo recently published a scientific article describing a new technology called End-to-End Multimodal Model for Autonomous Driving (EMMA). “Powered by Gemini, a large multimodal language model developed by Google, EMMA uses a unified end-to-end trained model to generate future autonomous vehicle trajectories directly from sensor data. Trained and optimized specifically for autonomous driving, EMMA leverages Gemini’s extensive world knowledge to better understand complex scenarios on the road,” we read in the Waymo press release.
Why use Gemini?
Waymo’s current approach relies on several independent modules, each handling a different autonomous driving task. The advantage of this design is that each module can be debugged and optimized separately. However, it has a scalability problem: because the modules are optimized for targeted scenarios, the system would have difficulty adapting to new environments.
Large multimodal language models (MLLMs), which understand both text and images, could solve this scalability problem. “Indeed, MLLMs, as general-purpose foundation models, excel in two key areas: (1) they are trained on large, Internet-scale datasets that provide rich ‘knowledge of the world’ beyond what is contained in common driving logs, and (2) they demonstrate superior reasoning skills through techniques such as chain-of-thought reasoning,” we read in the Waymo article.
Challenges to overcome
For now, although generative artificial intelligence has enormous potential for self-driving cars, Waymo believes significant challenges remain. For example, the EMMA system still has limitations in its ability to process video. In addition, it currently takes only images as input, not data from more complex sensors such as LiDAR.
“While EMMA demonstrates promising results, it is still in its early stages with challenges and limitations in embedded deployment, spatial reasoning capability, interpretability and closed-loop simulation. Despite this, we believe our EMMA findings will inspire further research and progress in this area,” says the Waymo article.
- Gemini can already summarize emails, answer questions or even generate computer code
- But Gemini could later be used by the driving systems of autonomous cars
- Waymo, the robotaxi specialist, has designed a new system based on Gemini to handle autonomous driving
- But for the moment, the work is only just beginning: although this system has enormous potential, it also has significant limitations that must first be overcome