How the Fair Use Clause is Being Applied to Generative AI
Context
Courts in the U.S. are examining whether using copyrighted content to train Large Language Models (LLMs) without permission constitutes ‘fair use’ under copyright law. This has major implications for AI development and intellectual property rights (IPR) globally, including in India.
Understanding LLMs
- What are LLMs?
  - Large general-purpose models for text classification, Q&A, and text generation.
  - Trained on massive datasets using deep learning (transformer architecture).
  - Use attention mechanisms for contextual understanding.
- Working:
  - Predicts the next word or sequence based on input prompts.
  - Trained on data from Common Crawl, scanned books, and other sources (both authorized & unauthorized).
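To make "predicts the next word" concrete, here is a minimal sketch in Python. It is a toy word-pair (bigram) counter, not how a production LLM works: real LLMs use transformer networks with attention over subword tokens and far larger datasets. The function names `train_bigram_model` and `predict_next_word` and the sample sentence are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follow_counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        follow_counts[current_word][next_word] += 1
    return follow_counts


def predict_next_word(model, prompt):
    """Return the most frequent follower of the prompt's last word, or None."""
    words = prompt.lower().split()
    if not words:
        return None
    last_word = words[-1]
    if last_word not in model:
        # The word never appeared with a follower in the training text.
        return None
    return model[last_word].most_common(1)[0][0]


# Hypothetical training text; a real model would see billions of words.
corpus = "the court held that the use was fair and the use was transformative"
model = train_bigram_model(corpus)
print(predict_next_word(model, "the court considered the"))  # prints "use"
```

Even in this toy version, the prediction depends entirely on what the training text contained, which is why the source of the training data sits at the centre of the legal debate.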
The Legal Grey Area
- Central Issue:
  - Does training AI on copyrighted material without permission = copyright infringement?
- Data Sources:
  - Public domain + copyrighted + potentially unauthorized content.
- Key Legal Test (U.S.): 4-Factor Fair Use Test (17 U.S.C. § 107):
  - Purpose & character of use – Is it transformative (does it serve a new purpose)?
  - Nature of the work – Factual vs. creative (e.g., fictional) content.
  - Amount and substantiality of the portion used – Both qualitative & quantitative.
  - Effect on market value – Does it harm the market for the original work?
  (Most decisive: Transformative use & market impact)
Recent U.S. Cases
1. Anthropic Case (Andrea Bartz vs. Anthropic PBC)
- Issue: Claude trained on scanned/purchased + illegally sourced books.
- Judgment:
  - Print-to-digital conversion & training = Fair use (transformative).
  - Illegal source downloading ≠ Fair use → Separate infringement analysis.
2. Meta Case (Kadrey vs. Meta Platforms)
- Issue: LLaMA trained on books from unauthorized sources.
- Judgment:
  - Training = Transformative → Fair use.
  - No proven market harm → Meta won summary judgment.
  - But: Torrent distribution still under scrutiny.
3. Thomson Reuters vs. Ross Intelligence (Precedent)
- Issue: Ross used Thomson Reuters' Westlaw headnotes to build an AI legal research tool that retrieves legal content (non-GenAI).
- Judgment: No fair use → No transformation + direct market competition.
Implications for India
- Indian Position:
  - No explicit ‘fair use’ → Only ‘fair dealing’ under Sections 52 & 39 of the Copyright Act, 1957.
  - Exceptions: Research, private study, criticism, review, reporting, etc.
- Relevance:
  - With initiatives like AIRAWAT (AI research hub) and Bhashini (multilingual AI), clarity on copyright use in AI training is essential.
  - Need for policy updates balancing innovation with IPR protection.
Conclusion
U.S. courts are setting important early precedents by recognizing AI training as potentially transformative fair use, but the use of illegally sourced material remains contentious. For India, with its growing AI ecosystem, clearer guidelines on the use of copyrighted data for AI training are urgently needed to foster innovation while protecting creators’ rights.





