

Reddit’s Lawsuit Against AI Scrapers: What it Means for the Industry

Brett Trout

If you are involved in content creation, AI development, or simply follow the big picture of the internet economy, the recent lawsuit filed by Reddit is one you should understand and follow. It is about far more than one company suing another. It touches on who controls data, how AI gets trained, and what the rules for AI training will look like going forward.


What happened: Reddit sues over alleged large-scale scraping

On October 22, 2025, Reddit filed a lawsuit in New York federal court against Perplexity AI and three web-scraping services: SerpApi (Texas), Oxylabs UAB (Lithuania), and AWMProxy (former Russian botnet).

Reddit accuses the web scraping defendants of:

  • Scraping billions of Reddit posts/comments without permission.  
  • Circumventing anti-scraping protections by Reddit and Google by using Google search result pages as a back door to Reddit content.
  • Violating laws including copyright, unfair competition, and unjust enrichment. 

Reddit accuses Perplexity of ignoring Reddit’s cease-and-desist letter, sent in May 2024, and then increasing references/citations of Reddit content by “forty-fold.”

In short, Reddit claims that the defendants treated its user-generated content as free training fuel for AI systems without a licensing agreement, discouraging other companies from entering into paid licenses with Reddit to use its content.


Why this matters: implications for data rights, AI training, and content platforms

Data control and monetization

Reddit’s case highlights that platforms may increasingly assert that user-generated content is not “free for anyone to use” at scale for commercial AI systems. Reddit argues that it already licenses its content to major players like Google LLC and OpenAI under agreements that include terms protecting Reddit.

If Reddit succeeds, content platforms will gain leverage in negotiations with AI companies in demanding higher compensation and/or tighter control over AI access to content.  

Training data and “public data” assumptions

Many AI developers have assumed that publicly available internet content is fair game for training large language models or answer engines. Reddit’s suit challenges that assumption, especially where a scraper circumvents security measures to reach user content that is already licensed to third parties.

Scraper liability and indirect access

Reddit alleges the defendants did not scrape Reddit directly, but scraped Google search result pages to harvest Reddit content (i.e., an indirect path). If the court awards Reddit damages for this type of indirect access through search engine results pages (SERPs), this could broaden liability for many other types of indirect access to content. 

Precedent for AI business models

This lawsuit could set a new legal precedent, changing how AI companies build business models, especially those that rely on massive ingestion of web content without any kind of license. The outcome may force large AI companies to license more content, while squeezing smaller AI companies out of the market altogether. 


Key Legal Claims & Technical Issues

Let’s break down the major legal and technical points at play.

Legal claims

  • Bypassing Reddit and Google Security: Reddit is claiming that the defendants have violated the Digital Millennium Copyright Act (DMCA), which prohibits anyone from circumventing technological measures controlling access to copyrighted works.  
  • Unfair competition: Reddit claims that access to Reddit content has given the defendants an undue competitive advantage.
  • Unjust enrichment: Reddit argues that the defendants have been unjustly enriched through their access to Reddit content.
  • Civil Conspiracy: Reddit argues SerpApi and Perplexity have entered into agreements for the purpose of illegally circumventing Reddit’s technological control measures to gain access to Reddit content.  

Technical / access issues

  • Violation of Google and Reddit terms of service. 
  • Use of proxies, fake user-agent strings, shifting IP addresses, bots that mimic human users, rate limit circumvention, and CAPTCHA circumvention.
  • Indirect scraping of Reddit content through Google SERPs, rather than directly from Reddit’s API or website.
  • Increased citation volume after the cease-and-desist letter, which Reddit takes as evidence of unauthorized use.
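
For readers less familiar with the mechanics, the link between rate limits and shifting IP addresses can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SlidingWindowLimiter` class, the `count_allowed` helper, the limit of 3 requests per 60 seconds, and the documentation-range IP addresses are all hypothetical, and nothing here reflects how Reddit or Google actually implement their protections.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client key in any `window`-second span."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # client key -> timestamps of recent requests

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


def count_allowed(limiter, keys):
    """Send one request per key at t = 0, 1, 2, ... seconds and count how many pass."""
    return sum(limiter.allow(key, float(t)) for t, key in enumerate(keys))


# Hypothetical scenario 1: one address sends 10 requests in 10 seconds.
single_ip = count_allowed(SlidingWindowLimiter(3, 60.0), ["203.0.113.1"] * 10)

# Hypothetical scenario 2: the same 10 requests arrive from 10 different addresses.
rotating = count_allowed(
    SlidingWindowLimiter(3, 60.0), [f"203.0.113.{i}" for i in range(10)]
)

print(single_ip, rotating)
```

In this toy setup, the single address is cut off once it reaches the limit, while the rotating addresses are each counted separately and never reach it. That is the basic reason a limit keyed to the IP address can be sidestepped by a proxy network.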

What this means for Reddit and the AI industry

For Reddit

  • Reddit is attempting to set a precedent for its model of licensing user-generated content for AI training.
  • Reddit may also set a precedent that SERP scraping of user-generated content can be blocked and compensated.
  • Reddit is firing a shot across the bow of AI companies: if you want to train on Reddit’s third-party content, you have to pay to play.

For AI companies and content platforms

  • AI firms may have to take greater care in how they obtain and use third-party content for training. “Publicly accessible” may not necessarily mean “free for AI training.”
  • Other platforms with valuable user-generated content may now demand licensing, or institute stricter API or access rules.
  • Scraper services and data-broker services may face increased risk of liability if user-generated content platforms pursue them as enablers of unauthorized data acquisition.
  • A win for Reddit may spur more litigation and possibly state and federal regulation around training-data sourcing, fairness, transparency, and consent.
  • Enormous liability and/or license fees may push smaller AI platforms out of the market and prevent new AI platforms from obtaining the capital needed to launch.

For content creators and users

  • Users who generate content on Reddit will still be left empty-handed when Reddit licenses their content to third parties.
  • The “value” of online communities may become more visible and monetized rather than assumed to be free, possibly leading to new platforms developing a revenue-sharing model (similar to YouTube) for users who generate the content.
  • Platforms may implement more visible policies about how user content may be used in AI training.

Outlook: Watch Points and What to Monitor

Here are the key developments to watch:

  1. Court rulings: How the court handles these legal claims: decisions on scraping via search results, indirect access, circumventing technological security measures, unfair competition, and the role of robots.txt/crawling protections.
  2. Licensing agreements: Whether more platforms follow Reddit’s path and demand paid licenses for AI training data.
  3. AI company responses: Will AI developers change their data-acquisition strategies, increase transparency, or negotiate deals proactively?
  4. Regulatory action: Legislators or regulators may step in to clarify rights around data, training, consent, and attribution for AI systems. Federal legislation is needed to preempt a hodgepodge of ill-considered state laws. 
  5. Technical safeguards: Platforms may strengthen anti-scraping measures, auditing of access logs, and contracts with data brokers and scrapers.
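
As a rough idea of what auditing access logs might look like in practice, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `audit` function, the `(ip, user_agent)` record shape, the thresholds of 100 requests and 3 user-agent strings, and the sample data are hypothetical, and a real platform would rely on far richer signals.

```python
from collections import Counter, defaultdict


def audit(records, max_requests=100, max_agents=3):
    """Return a dict of IP -> list of reasons for IPs that look automated.

    `records` is an iterable of (ip, user_agent) pairs. Both thresholds
    are illustrative assumptions, not recommended values.
    """
    requests = Counter()
    agents = defaultdict(set)
    for ip, user_agent in records:
        requests[ip] += 1
        agents[ip].add(user_agent)

    flagged = {}
    for ip, count in requests.items():
        reasons = []
        if count > max_requests:
            reasons.append("high request volume")
        if len(agents[ip]) > max_agents:
            reasons.append("many distinct user-agent strings")
        if reasons:
            flagged[ip] = reasons
    return flagged


# Hypothetical log sample: one heavy client cycling through five
# user-agent strings, and one ordinary visitor.
sample = (
    [("198.51.100.7", f"Browser/{i % 5}") for i in range(150)]
    + [("198.51.100.8", "Browser/1")] * 4
)
report = audit(sample)
print(report)
```

The sketch flags only the first address, both for volume and for the number of user-agent strings it presented. Signals like these are easy to compute, which is also why the evasion techniques described in the complaint aim to spread traffic across many addresses and identities.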

Closing Thoughts

The Reddit v. Perplexity lawsuit is not just another tech dispute. It could reshape how platforms resell your content, how AI systems are trained, and how the internet handles user-generated content in general. If you are a creator, business owner, platform manager, or AI developer, you would be well-advised to follow this case closely.

Posted in AI, Artificial Intelligence, Internet Law, Iowa Law, Litigation, social media.