Bill Would Mandate AI Companies Disclose Copyrighted Works Used in Training

A proposed law would require companies to disclose any copyrighted works used in training generative AI models or face fines

Ben Wodecki, Jr. Editor

April 10, 2024

Companies developing generative AI models would be required to reveal copyrighted works used to train their systems under a proposed law.

The Generative AI Copyright Disclosure Act would force model developers like OpenAI to submit a notice to the Register of Copyrights, before releasing a new system, detailing which copyrighted materials were used in both training and fine-tuning.

Companies would have to submit disclosures no later than 30 days before a system's release. Each disclosure would contain a "sufficiently detailed summary" of the copyrighted works used, including URLs for any publicly available training datasets.

The bill’s requirements would also apply retroactively to previously released generative AI systems, meaning models like ChatGPT and Claude would face scrutiny.

The bill was introduced by California congressman Adam Schiff, who called for a balance between respecting creativity and technological progress.

“We must balance the immense potential of AI with the crucial need for ethical guidelines and protections,” Schiff said in a statement. “My Generative AI Copyright Disclosure Act is a pivotal step in this direction. It champions innovation while safeguarding the rights and contributions of creators, ensuring they are aware when their work contributes to AI training datasets.”

Should the bill become law, companies that fail to comply would face civil penalties of at least $5,000.

The Register of Copyrights would be responsible for issuing penalties and for establishing a publicly available online database of the submitted notices, so copyright owners could check whether their works were used in training datasets.

Schiff’s bill has already secured support from media trade organizations and unions, including the Recording Industry Association of America, SAG-AFTRA and the Writers Guild of America.

“Everything generated by AI ultimately originates from a human creative source. That’s why human creative content—intellectual property—must be protected,” said Duncan Crabtree-Ireland, SAG-AFTRA’s national executive director and chief negotiator. “SAG-AFTRA fully supports the Generative AI Copyright Disclosure Act, as this legislation is an important step in ensuring technology serves people and not the other way around.”

Companies developing generative AI models have faced several lawsuits in the past year over claims that copyrighted content was used to train systems without permission.

ChatGPT maker OpenAI is fighting off claims from the New York Times that its chatbot reproduces content from the newspaper's articles. Book authors, music publishers and artists have also sued developers for alleged copyright violations, with Nvidia, Anthropic and Stability AI among those hit with lawsuits.

Research from AI startup Patronus recently found that OpenAI’s GPT-4 was the model that reproduced the most copyrighted content.

To ensure access to copyrighted materials, model builders have sought to strike partnerships with media companies or social media firms to use their vast troves of data for model training.

OpenAI, for example, holds content licenses from Axel Springer and the Associated Press, while Google recently signed a deal with Reddit.

OpenAI claimed in January that it would be “impossible” to develop state-of-the-art models without access to copyrighted materials.

About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
