OpenAI and Microsoft are increasingly singled out for their rather odd interpretation of copyright. ChatGPT, their flagship chatbot, was reportedly trained on pirated content.
Many users are already turning to generative AI to create graphic and literary works, sometimes for profit. OpenAI, the startup whose ChatGPT sparked this digital and societal revolution, is now accused by two writers of using their work to train its AI.
Artists criticize the creators of large language models for training their chatbots on sources that normally have to be paid for. While the big names of Silicon Valley aim to profit from artificial intelligence, the authors of the original "sources" may receive no compensation despite their essential contribution.
OpenAI is accused of using pirated books to train ChatGPT
According to TorrentFreak, "This week, authors Paul Tremblay and Mona Awad filed a class action lawsuit against OpenAI, accusing ChatGPT's parent company of copyright infringement. According to them, ChatGPT was partly trained on their copyrighted works, without permission." The charge is serious. How can the plaintiffs be so sure that OpenAI trained its chatbot on their works? For Tremblay and Awad, the proof is obvious: "ChatGPT generates summaries of complainants' copyrighted works, which is only possible if ChatGPT has been trained on these works."
But the accusation goes even further. According to Tremblay and Awad, OpenAI allegedly used content collected from book piracy sites such as Z-Library. While the company remains evasive about the origin of the sources used to train its AI, we know that ChatGPT was trained on at least 360,000 books. Did the company pay to be "inspired" by these works? There is no way to know, but the plaintiffs argue that "OpenAI must have used pirated resources, because there are no legitimate databases containing so many books."