Google takes a major step in the development of artificial intelligence with the launch of Gemini 2.0, successor to version 1.5 launched barely 10 months ago. This new version marks the entry into what Google calls “the age of agents”, where AI becomes capable of acting more autonomously and carrying out complex tasks for users.
A more powerful and versatile AI
Gemini 2.0 Flash, the first model of this new generation, brings considerable performance improvements. According to Google, it is twice as fast as Gemini 1.5 Pro while maintaining an equivalent level of quality. It also natively generates images and audio in addition to text, making it a truly multimodal model.
Gemini 2.0’s text-to-speech capabilities are particularly impressive, with eight different voices optimized for various accents and languages. The model can also adjust its speaking rate on demand and handle natural interruptions in a conversation, making interactions more fluid, much like ChatGPT’s voice mode.
Revolutionary AI agents
Google is presenting several projects that demonstrate the capabilities of Gemini 2.0 in the field of AI agents. Project Mariner, an experimental Chrome extension, can navigate and accomplish complex tasks directly in the web browser, with an 83.5% success rate on the WebVoyager benchmark. Project Astra, for its part, gains major improvements such as better multilingual understanding, native use of Google Search, Lens and Maps, and in-session memory extended to up to 10 minutes.
The company is also introducing Jules, a specialized agent capable of integrating directly into GitHub workflows to support developers. In gaming, Google is working with developers like Supercell to create agents that can analyze on-screen action in real time and provide relevant advice to players.
Progressive and secure integration
Google is taking a careful and methodical approach to the Gemini 2.0 rollout. The Flash version is already available via the Gemini API in the Google AI Studio and Vertex AI development platforms, but some advanced features, such as image and audio generation, are reserved for early-access partners until January 2025.
The tech giant places particular emphasis on security and accountability in developing these new capabilities. A dedicated Responsibility and Safety Committee (RSC) actively works to identify and understand potential risks. Specific measures have been put in place, such as protection against the unintentional sharing of sensitive information in Project Astra, and security controls for Project Mariner to prevent fraud and phishing attempts.
As a reminder, Gemini is expected to be integrated into Apple Intelligence alongside ChatGPT. If the partnership with Apple sees the light of day, and if the Gemini 2.0 benchmarks hold up in practice, this new model could give a big boost to the use of artificial intelligence on iPhone and Mac. Unless Google reserves exclusivity for Android…