While Meta poured effort and large sums of money into a metaverse that has not quite come together and has had its ups and downs, the rest of big tech was betting on artificial intelligence, which has proved to be a revolution among the general public.
But the company behind Facebook, Instagram and WhatsApp is getting its act together and has stepped up its efforts in artificial intelligence.
The launch of its LLaMA model was well received, especially in the open source community. The company then introduced MusicGen, a generative AI tool for creating music. Now comes another project, named Voicebox.
What is Voicebox for?
According to the company, this generative speech AI is the first model capable of generalizing, with cutting-edge performance, to speech-generation tasks for which it has not been specifically trained. It appears to have gone beyond what other models achieve.
You can write a phrase to be converted into speech, and the system creates the synthesized voice. There are different styles to choose from for reading the text. They don’t sound totally natural, but they are not stilted, canned voices either (you can hear how it sounds at this link). In addition, several languages are available: English, French, German, Spanish, Polish and Portuguese, all of them European.
The company explains that, to create Voicebox, Meta engineers trained it on 50,000 hours of speech from audiobooks in English and another 60,000 hours of audiobooks in other languages. As a result, the speakers sound as if they were reading a book, whatever context you give them.
In the future, Voicebox is expected to give natural voices to virtual assistants and non-player characters in the metaverse, and to let people with visual impairments listen to written messages from their friends, read by AI in those friends' voices, among other things.
Other Voicebox Capabilities
Voicebox can produce high-quality audio clips and edit pre-recorded audio (for example, removing car horns or barking dogs) while preserving the content and style of the audio.
It is also possible to supply a written text in another supported language together with an audio clip in your native language. Voicebox will make you “say” that phrase in that language as if it were your native language, according to information from the company.
This artificial intelligence software can also modify an original audio clip of your voice, replacing a word you said with a new one specified in the text prompt.
At the same time, this system can be used to make deepfakes, as is the case with other artificial intelligence tools, and it can be used to carry out scams by impersonating people. To prevent it from being too widely accessible, this software is not open source, unlike LLaMA. Meta has decided not to publish the Voicebox code.
According to information provided by Meta, the company has decided not to make it available to the public because it wants to continue researching AI.
Image | Meta