OpenAI Accuses Chinese AI Startup of Data Theft

by Savannah, Feb 21, 2025

OpenAI suspects that DeepSeek, a Chinese AI model significantly cheaper than its Western counterparts, may have been trained using OpenAI's data. The allegation follows substantial market value losses at Nvidia and other AI-related companies, and it has raised concerns across the US tech industry. President Trump even called DeepSeek a "wake-up call."

DeepSeek's R1 model, built on the open-source DeepSeek-V3, reportedly cost far less to train (an estimated $6 million) than Western models. Although that figure has been contested, it has fueled investor anxiety about the massive sums American tech giants are investing in AI. DeepSeek's popularity, evidenced by its top ranking on US app download charts, has only added to that anxiety.

OpenAI and Microsoft are now investigating whether DeepSeek violated OpenAI's terms of service by using "distillation," a technique in which a smaller model is trained on the outputs of a larger one, to build the capabilities of OpenAI's models into its own. OpenAI has confirmed that it is aware of attempts by Chinese and other companies to replicate leading US AI models, and says it is actively pursuing countermeasures, including working with the US government to protect its intellectual property.
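
For readers unfamiliar with the term, the general idea behind distillation can be sketched in a few lines of Python. This is only an illustration of one common textbook formulation, in which a "student" model is scored on how closely its output probabilities match those of a "teacher"; it is not a description of what DeepSeek or OpenAI actually did. The function names, the example scores, and the temperature value are all hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature softens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    Hypothetical helper: the lower the value, the more closely the student
    imitates the teacher on this one example.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Made-up scores over three possible answers (assumed values, not real data).
teacher = [4.0, 1.0, 0.5]
far_student = [0.5, 1.0, 4.0]     # disagrees with the teacher
close_student = [3.8, 1.1, 0.6]   # nearly matches the teacher

loss_far = distillation_loss(teacher, far_student)
loss_close = distillation_loss(teacher, close_student)
print(loss_close < loss_far)
```

In practice, training would repeatedly adjust the student to push this loss down across many examples. When only the teacher's text responses are available rather than its internal scores, a student can instead be trained directly on those responses, which is closer to the scenario described in the allegations.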

David Sacks, President Trump's AI czar, corroborated OpenAI's suspicions, suggesting evidence points towards DeepSeek's use of distillation. He anticipates further actions from leading AI companies to prevent similar incidents.

The situation highlights the irony of OpenAI's accusations, given earlier controversies over its own use of copyrighted internet data to develop ChatGPT. Critics such as Ed Zitron have pointed out the hypocrisy, citing OpenAI's earlier justification for training on copyrighted material, in which the company claimed it was "impossible" to create AI models like ChatGPT without it. OpenAI made that argument in a submission to the UK's House of Lords and has taken a similar position in its defense against the New York Times' lawsuit alleging unlawful use of copyrighted material. That lawsuit follows a similar one filed by 17 authors, including George R. R. Martin. The legal landscape surrounding AI training data and copyright remains complex, particularly in light of a 2018 US Copyright Office ruling that AI-generated art is not copyrightable.

DeepSeek is accused of using OpenAI’s model to train its competitor using distillation. Image credit: Andrey Rudakov/Bloomberg via Getty Images.
