Whisper is a new artificial intelligence model from OpenAI that aims to revolutionize speech-to-text and translation technology. According to Ars Technica, it can transcribe and translate interviews, podcasts, conversations and much more. Best of all, its accuracy approaches that of a human.
According to OpenAI, the model was trained on 680,000 hours of audio. It did not just listen, though: during training, Whisper also had to match the spoken words with written text.
Thanks to its neural network, Whisper can use context from the input data to learn associations that carry over into the model's output.
How Whisper works, the AI capable of translating and transcribing any audio input
“The input audio is split into 30-second chunks,” OpenAI explains in its official post. Each chunk then “becomes a spectrogram… and is passed to the encoder.”
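The chunking step can be sketched in a few lines of plain Python. This is only an illustration, not OpenAI's code: the 16 kHz sample rate, the zero-padding of the last chunk, and the helper name `split_into_chunks` are assumptions made for the example, and the spectrogram conversion is left out.

```python
SAMPLE_RATE = 16_000   # assumed sampling rate in Hz
CHUNK_SECONDS = 30     # chunk length quoted in the OpenAI post
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS


def split_into_chunks(samples, chunk_size=CHUNK_SAMPLES):
    """Split a sequence of samples into fixed-size chunks.

    The last chunk is padded with zeros (silence) so that every chunk
    has exactly `chunk_size` samples. Hypothetical helper for illustration.
    """
    chunks = []
    for start in range(0, len(samples), chunk_size):
        chunk = list(samples[start:start + chunk_size])
        chunk.extend([0.0] * (chunk_size - len(chunk)))
        chunks.append(chunk)
    return chunks


# 70 seconds of silence stands in for a real recording.
audio = [0.0] * (70 * SAMPLE_RATE)
chunks = split_into_chunks(audio)
print(len(chunks), len(chunks[-1]))  # prints: 3 480000
```

A 70-second recording therefore yields three chunks: two full ones and a third that holds the remaining 10 seconds followed by 20 seconds of padding.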
But that is not all. A decoder is then trained to predict the corresponding text. How is it done? The text is intermixed with special tokens that direct the single model to carry out a particular task, such as language identification, phrase-level timestamps, multilingual speech transcription, or translation into English.
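To make the idea of special tokens more concrete, here is a small sketch of how a task could be spelled out as a token sequence placed in front of the text. The token strings follow the `<|...|>` format that the Whisper project uses, but the function `build_task_prompt` is a hypothetical helper written for this article, not part of any library.

```python
def build_task_prompt(language, task="transcribe", timestamps=True):
    """Return the special tokens that tell the model what to do.

    Hypothetical helper: it only assembles strings for illustration
    and does not call any real tokenizer.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    tokens = [
        "<|startoftranscript|>",  # marks the start of the output
        f"<|{language}|>",        # language of the audio, e.g. "es"
        f"<|{task}|>",            # transcribe as-is, or translate to English
    ]
    if not timestamps:
        tokens.append("<|notimestamps|>")  # skip phrase-level timestamps
    return tokens


# Spanish audio, translated into English, without timestamps.
print(build_task_prompt("es", task="translate", timestamps=False))
```

The point of the design is that one model handles every task: changing a single token switches it from transcription to translation.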
Best of all, Whisper’s story doesn’t end here. OpenAI has published its code so that it can serve as a foundation for future speech-processing applications and accessibility tools. That opens the door to further improvements from the wider community.
The results are impressive
Ars Technica points out that the technology behind this artificial intelligence is as impressive as its results. To test it, they used a podcast episode that contained a segment in which the audio had been transmitted over a telephone, so the quality left much to be desired.
Despite this, Whisper, run from Python, did a good job of transcribing the audio. Of course, the technology does not work in real time: according to Ars Technica, the transcription took a long time to finish on a mid-range Intel processor. In the end, the result was “much better than AI-powered transcription services we’ve tried in the past.”
But beware, there is some fine print. According to its creators, Whisper could also be misused, for example to identify the speakers in a conversation or to automate surveillance. Still, OpenAI hopes it will be used for good and will allow developers to build far more capable translation and transcription tools.
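For readers who want to try something similar, the following is a minimal sketch of running Whisper from Python and timing it. It assumes the open-source `openai-whisper` package is installed and relies on its `load_model` and `transcribe` calls; the wrapper `transcribe_file`, the `"base"` model size, and the file name are choices made for this example, not details from the Ars Technica test.

```python
import time


def transcribe_file(path, model_name="base", model=None):
    """Transcribe an audio file and report how long it took.

    Hypothetical wrapper: pass an already loaded `model` to reuse it,
    otherwise one is loaded via the openai-whisper package.
    """
    if model is None:
        import whisper  # assumes `pip install openai-whisper`
        model = whisper.load_model(model_name)
    start = time.perf_counter()
    result = model.transcribe(path)  # returns a dict with a "text" entry
    elapsed = time.perf_counter() - start
    return result["text"].strip(), elapsed


# Example call (needs the package and a real audio file):
# text, seconds = transcribe_file("podcast.mp3")
# print(f"Transcribed in {seconds:.1f} s: {text[:80]}")
```

Comparing the elapsed time with the length of the recording gives a rough idea of how far from real time a given processor is.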