The U.S. Copyright Office has just weighed in for a third time on the use of copyrighted material to train artificial intelligence (AI) systems. “Copyright and Artificial Intelligence, Part 3: Generative AI Training” takes a deep dive into how fair use and copyright licensing apply to AI training. So can you legally use copyrighted material to train your large language model (LLM)? Unsurprisingly, the government’s 108-page report boils down to “It depends.”

Understanding the Core Issues
At the heart of the report is an examination of whether incorporating copyrighted works into AI training datasets constitutes fair use under U.S. copyright law. The report acknowledges that some uses of copyrighted material in AI training may be transformative and therefore qualify as fair use, but it concludes that large-scale commercial exploitation of copyrighted content without authorization likely exceeds the boundaries of fair use. Both the technology and creative industries need to understand this distinction.
Key Findings
1. Fair Use Analysis: The report applies the four-factor fair use test of 17 U.S.C. § 107 to determine whether you may legally use copyrighted material to train your AI:
- Purpose and Character: Commercial uses that directly compete with the original works are less likely to be considered fair use.
- Nature of the Work: Using highly creative works, such as music or literature, weighs against fair use.
- Amount and Substantiality: Copying entire works or substantial portions of them weakens a fair use defense.
- Effect on the Market: If AI-generated output substitutes for the original work, it harms the market for that work and undermines a fair use claim.
2. Licensing and Market Development: The report encourages the development of licensing frameworks that give AI developers lawful access to copyrighted materials for training. This approach aims to balance the interests of content creators and AI developers.
3. Legal Implications of Unauthorized Use: The report underscores that copying copyrighted works into AI training datasets without authorization can constitute prima facie infringement, exposing companies to potentially catastrophic legal liability.
Implications for AI
This report serves as a primer on the intellectual property infringement risks underlying AI development and deployment. It highlights the need for thorough due diligence on training data sources and encourages companies to pursue licensing agreements to mitigate infringement risk.
Moreover, the report’s findings may influence future litigation and policy-making, potentially reshaping the boundaries of fair use in the context of AI. It is critical that companies using generative AI, and especially those training AI, stay abreast of these developments so they may adapt to the ever-evolving legal landscape.
Conclusion
The U.S. Copyright Office’s report offers a comprehensive analysis of the intersection between generative AI and copyright law, emphasizing the importance of lawful data use and the potential consequences of infringement. As AI technologies continue to advance, companies must keep pace with the evolving complexities of intellectual property law, both to safeguard their own intellectual property rights and to avoid running afoul of third-party rights and the regulatory morass forming around AI.