To train an artificial intelligence, it is necessary to feed it a large amount of data, which it can later analyze and draw on to answer questions. The issue is the origin of that data and whether or not it is protected by copyright. Last year, OpenAI was sued by the New York Times, which accused it of using the newspaper’s content to train its AI.
The New York Times lawsuit against OpenAI and Microsoft is not the first, nor will it be the last, that the companies in charge of training artificial intelligences will face. The latest company to come under fire is NVIDIA, after the leak last week of a series of emails indicating that the manufacturer is using YouTube to train its AI.
When the news broke, Google made a vague statement that neither condemned nor condoned the practice, which is logical since it also benefits from all the content available on its video platform to train the different versions of Gemini that it offers.
Back to NVIDIA: the leak of the emails indicating that the company was using YouTube videos has now led to a lawsuit from a YouTuber.
NVIDIA uses YouTube to train its AI
YouTuber David Millette sued OpenAI a few days ago for allegedly using YouTube video transcripts to train its AI without asking permission from the content creators. A few days later, this same YouTuber, who seems to have plenty of free time, sued NVIDIA on similar grounds, basing the claim not on copyright but on unjust enrichment and unfair competition.
According to the emails leaked last week, NVIDIA trained its AI with more than 400,000 hours of YouTube video per day. In these emails, employees questioned the ethics and legality of using these platforms without official consent, to which the legal department replied that there was no problem.
In his lawsuit, David Millette claims that scraping YouTube to train its AI “is an unfair, immoral, oppressive, unethical and harmful practice for users.” The companies that train AI models never disclose which sources they have used.
Many blogs have grown tired of seeing automated crawlers scrape data from their websites to train artificial intelligences. Reddit is the one taking it most seriously and has adopted measures to prevent them from continuing to do so. What is clear is that, if you don’t pay, no one will give you access to their servers to train an AI.
This bad practice among companies persists because there is still no regulation establishing what is legal and what is illegal regarding content available on social networks, YouTube and websites. They operate in a legal grey area where no complaint is likely to succeed.
A clear example can be found in the response that NVIDIA has published after receiving this complaint, in which it states:
Anyone is free to learn facts and ideas from publicly available sources. The creation of new and transformative works is not only fair and equitable, but exactly what our legal system encourages.