The collaboration between Apple and Nvidia promises to change the rules of the game in the development of artificial intelligence models. A new method can accelerate token generation in large language models by up to 2.7 times, an improvement that could transform the speed and efficiency of AI-based applications, including Apple Intelligence.

Keep reading to find out how far this collaboration goes; it could prove very important for Apple's future.

A problem that requires innovation

Training and running large language models consumes enormous amounts of compute and time. To overcome these limitations, companies often simply buy more hardware, which drives up costs significantly. In this context, Apple's research has focused on finding innovative ways to reduce those requirements without compromising results.

In early 2024, Apple introduced Recurrent Drafter, or ReDrafter, a speculative decoding technique designed to speed up text generation at inference time. By combining a recurrent neural network (RNN) draft model with beam search and dynamic tree attention, the technique roughly tripled the speed of token generation compared with traditional auto-regressive decoding.
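To give an idea of how speculative decoding works in general, here is a minimal Python sketch of the draft-then-verify loop. It is only an illustration of the concept, not Apple's ReDrafter code; `draft_model` and `target_model` are hypothetical stand-ins for a cheap drafter and the full language model.

```python
# Minimal sketch of speculative decoding (draft-then-verify).
# Not Apple's ReDrafter implementation: `draft_model` and `target_model`
# are hypothetical callables. The draft model proposes k cheap tokens per
# step; the target model verifies them and keeps the accepted prefix.
from typing import Callable, List

def speculative_decode(
    prompt: List[int],
    draft_model: Callable[[List[int], int], List[int]],  # proposes k tokens
    target_model: Callable[[List[int]], int],            # next-token prediction
    k: int = 4,
    max_new_tokens: int = 64,
) -> List[int]:
    tokens = list(prompt)
    generated = 0
    while generated < max_new_tokens:
        # 1. The cheap draft model proposes k candidate tokens.
        draft = draft_model(tokens, k)
        # 2. The target model verifies the candidates; in a real system this
        #    verification happens in a single batched forward pass.
        accepted = 0
        for i, candidate in enumerate(draft):
            if candidate == target_model(tokens + draft[:i]):
                accepted += 1
            else:
                break
        # 3. Keep the accepted prefix plus one token from the target model,
        #    so at least one token is produced every iteration.
        keep = draft[:accepted] + [target_model(tokens + draft[:accepted])]
        tokens.extend(keep)
        generated += len(keep)
    return tokens[: len(prompt) + max_new_tokens]
```

The key point is that each iteration can emit several tokens for roughly the cost of one target-model pass, which is where the speedup comes from when the draft model guesses well.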

Apple’s advance was not limited to its own technology. In a joint effort, Apple worked with Nvidia to integrate ReDrafter into TensorRT-LLM, Nvidia’s inference acceleration framework. This extends the technique beyond Apple Silicon and makes it available to developers using Nvidia GPUs, hardware widely deployed in servers that run large language models in production.

The integration was not trivial. ReDrafter relies on operators that other speculative decoding methods do not use, so Nvidia had to add new operators or expose existing ones for the technique to work efficiently in its framework.
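As a rough illustration of why such operators are needed, the sketch below builds a tree attention mask in which each draft token may attend only to its ancestors in the candidate tree, unlike the plain lower-triangular mask used in ordinary causal decoding. The tree layout is an assumption chosen for the example and has nothing to do with Nvidia's actual kernels.

```python
# Toy tree attention mask for a small draft tree (illustrative only).
import numpy as np

def tree_attention_mask(parents: list) -> np.ndarray:
    """parents[i] is the index of node i's parent in the draft tree (-1 = root)."""
    n = len(parents)
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        node = i
        while node != -1:          # walk up to the root, marking ancestors
            mask[i, node] = True
            node = parents[node]
    return mask

# Two candidate beams sharing a common root token:
# node 0 = root, nodes 1 and 3 branch from it, 2 follows 1, 4 follows 3.
print(tree_attention_mask([-1, 0, 1, 0, 3]).astype(int))
```

Because several candidate branches are verified in one pass, the attention pattern is a tree rather than a single chain, which is exactly the kind of structure standard decoding kernels do not expect.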

After testing a production model with tens of billions of parameters on Nvidia GPUs, the results were impressive: a 2.7x increase in generated tokens per second. This not only means lower latency for end users, but also a reduction in the amount of hardware required, lowering costs for businesses and improving energy efficiency.
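To put the 2.7x figure in perspective, here is a back-of-envelope calculation. The baseline throughput, response length and fleet size are assumed values chosen for illustration, not numbers reported by Apple or Nvidia.

```python
# Back-of-envelope impact of a 2.7x tokens-per-second speedup.
# All baseline figures below are assumptions, not reported data.
baseline_tps = 40.0          # assumed tokens/sec per GPU before ReDrafter
speedup = 2.7
new_tps = baseline_tps * speedup

tokens_per_response = 500    # assumed average response length
print(f"Latency per response: {tokens_per_response / baseline_tps:.1f}s -> "
      f"{tokens_per_response / new_tps:.1f}s")

# GPUs needed to sustain the same aggregate load shrink proportionally.
gpus_before = 100            # assumed fleet size
print(f"GPUs for the same load: {gpus_before} -> {gpus_before / speedup:.0f}")
```

Under these assumptions, a 12.5-second response drops to under 5 seconds, and the same traffic can be served with roughly a third of the GPUs, which is where the cost and energy savings come from.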


A strategic bet for the future

This move is part of a larger effort by Apple to strengthen its artificial intelligence capabilities. Recently, the company confirmed that it is exploring the use of Amazon’s Trainium2 chips to train Apple Intelligence models. According to Apple, this hardware could offer up to a 50% improvement in efficiency during pre-training compared with its current setup.

The collaboration between Apple and Nvidia marks a milestone in the evolution of language models. This innovation will not only accelerate the development of AI-based technologies; it will also open new possibilities for faster, more efficient and more affordable services for both companies and end users.
