Reddit sues data scrapers and Perplexity over unauthorized content access

Reddit filed a comprehensive federal lawsuit on October 22, 2025, naming four defendants accused of circumventing technological controls to access and scrape its content. The complaint, filed in the United States District Court for the Southern District of New York, targets SerpApi LLC, Oxylabs UAB, AWMProxy, and Perplexity AI, Inc., seeking damages and injunctive relief under the Digital Millennium Copyright Act.

The platform's 41-page complaint describes how the defendants allegedly bypassed two layers of security protection. First, the companies evaded Reddit's own anti-scraping measures. Second, they circumvented Google's SearchGuard system to scrape Reddit content directly from Google search engine results pages. "Defendants are similar to would-be bank robbers, who, knowing they cannot get into the bank vault, break into the armored truck carrying the cash instead," the complaint states.


Industrial-scale data extraction documented

During a two-week period in July 2025, the three scraper defendants accessed nearly three billion search engine results pages (SERPs) containing Reddit content through automated processes, according to the complaint. SerpApi accessed 784 million SERPs from July 1 to 6 and 1.05 billion from July 7 to 13. Oxylabs accessed 333 million and 448 million SERPs during the same periods. AWMProxy accessed 217 million and 264 million SERPs.
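As a rough cross-check, the per-defendant figures can be added up. The sketch below uses only the rounded numbers quoted in this article; the variable names are illustrative and the shares are simple arithmetic, not figures from the complaint. The rounded inputs sum to about 3.1 billion, broadly in line with the "nearly three billion" description.

```python
# SERP counts in millions, as quoted in the article (rounded figures).
# Each tuple is (July 1-6, July 7-13).
serps_millions = {
    "SerpApi": (784, 1050),
    "Oxylabs": (333, 448),
    "AWMProxy": (217, 264),
}

# Subtotal per defendant across both periods.
per_defendant = {name: sum(periods) for name, periods in serps_millions.items()}

# Combined total across all three scraper defendants.
total_millions = sum(per_defendant.values())

for name, subtotal in per_defendant.items():
    share = subtotal / total_millions
    print(f"{name}: {subtotal} million SERPs ({share:.0%} of the total)")

print(f"Total: {total_millions / 1000:.2f} billion SERPs")
```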

Reddit maintains over 100 million daily active users engaging in discussions across hundreds of thousands of interest-based communities. This corpus of authentic human discourse has become increasingly valuable to AI companies. "Reddit is a 'top-cited source' of data for these companies," the complaint states, citing industry analysis describing Reddit data as "like manna from heaven" for AI development.

The platform has entered partnership agreements with select AI companies, including Google and OpenAI, that include contractual guardrails to protect user privacy and intellectual property rights. Reddit and Google announced their partnership on February 22, 2024, enabling programmatic access to Reddit content for use in Google's products and services.


Technical circumvention methods detailed

The complaint describes specific techniques the defendant scrapers employ to bypass technological control measures. SerpApi advertises its "Ludicrous Speed Max" feature as using four times the server resources to create multiple parallel requests that "reject bad HTMLs, CAPTCHA and error pages, and other abnormalities." The company's website tells users they "don't need to care" about technological control measures including "HTTP requests, parsing HTML files to JSON, or captcha, IP address, bots detection, maintaining user-agent, HTML headers, [or] being blocked by Google."

Oxylabs operates what it calls "the world's largest ethical proxy network" and offers over 62,000 IP addresses located in New York. The company's website explicitly states its scraping service is meant to "bypass" restrictions, noting that "Oxylabs' Google Search API is constructed to bypass the technical challenges Google has implemented."

AWMProxy was previously operated by a Russian entity in connection with a cybercriminal botnet that was shut down in 2021. According to the complaint, it has since re-established its proxy network and resumed selling access to proxy services that conceal users' location and identity, which enables data-scraping operations.

Reddit used a marked test post to trace Perplexity's data

Reddit employed digital tracking techniques to confirm Perplexity was using Reddit data acquired through scraping of Google SERPs. The platform created a "test post" that could only be crawled by Google's search engine and was not otherwise accessible anywhere on the internet. Within hours, queries to Perplexity's answer engine produced the contents of that test post. "The only way that Perplexity could have obtained that Reddit content and then used it in its 'answer engine' is if it and/or its Co-Defendants scraped Google SERPs for that Reddit content," the complaint states.

Reddit sent a cease-and-desist letter to Perplexity in May 2024, demanding the company stop scraping Reddit data. At that time, Perplexity told Reddit it was not using Reddit content to train any AI models and would respect Reddit's robots.txt directives. However, after Reddit sent its cease-and-desist letter, the volume of Reddit data cited by Perplexity increased forty-fold.

Perplexity is publicly listed as a customer on SerpApi's website. The AI firm acknowledged in an August 2025 blog post that "Reddit has emerged as the most cited domain across AI models globally."

Legal claims under DMCA and unfair competition

The complaint brings six counts against the defendants. Count I alleges all four defendants violated 17 U.S.C. § 1201(a)(1)(A) by circumventing technological measures that effectively control access to copyrighted works. Counts II and III target SerpApi and Oxylabs specifically for trafficking in technology designed to circumvent technological measures under 17 U.S.C. § 1201(a)(2) and § 1201(b).

Count IV alleges unfair competition by all defendants under state common law of New York, claiming they misappropriated Reddit's labor, skill, expenditures, and goodwill while displaying bad faith. Count V alleges unjust enrichment by all defendants at Reddit's expense. Count VI alleges civil conspiracy between SerpApi and Perplexity.

Reddit has implemented multiple anti-scraping systems including robots.txt files, IP rate limits, captcha bot protection, and anomaly-detection tools. The platform's User Agreement explicitly prohibits "scraping the Services without Reddit's prior written consent." Reddit's robots.txt file states, "Reddit believes in an open internet, but not the misuse of public content."
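To illustrate how a robots.txt directive works in practice, here is a minimal sketch using Python's standard `urllib.robotparser` module. The sample rules, the `ExampleBot` user agent, and the example.com URL are hypothetical and are not taken from Reddit's actual robots.txt file. Note that robots.txt is a published instruction that a well-behaved crawler checks before fetching; it does not block requests by itself, which is why platforms pair it with measures such as rate limits and bot detection.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only (not Reddit's actual file).
DISALLOW_ALL = """\
User-agent: *
Disallow: /
"""

ALLOW_ALL = """\
User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://www.example.com/r/some_community/"
print(is_allowed(DISALLOW_ALL, "ExampleBot", url))
print(is_allowed(ALLOW_ALL, "ExampleBot", url))
```

A compliant crawler would run a check like this before each fetch and skip any URL for which it returns False.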

Google likewise prohibits unauthorized automated access to its search engine results pages through technological control measures including SearchGuard. This system is designed to prevent automated systems from accessing and obtaining wholesale search results while allowing individual users access to Google's search results.


Financial implications and prior litigation

Reddit's economic model depends increasingly on content licensing revenue. The platform achieved 31 percent year-over-year growth in daily active users to 108.1 million in Q1 2025, with advertising revenue increasing 61 percent to $358.6 million. The company went public in 2024.

The complaint notes Reddit previously filed a lawsuit against Anthropic on June 4, 2025, in San Francisco Superior Court (Case No. CGC-25-625892) for breach of contract, unjust enrichment, trespass to chattels, and unfair competition. That case alleged the AI company scraped platform data without permission to train Claude chatbot models.

The current lawsuit emphasizes privacy protection mechanisms absent from unauthorized scraping. "By bypassing all of the lawful means for obtaining this content, Defendants also bypass all of the strict restrictions placed on Reddit's licensed business partners that protect Redditors and their anonymity and privacy," the complaint states.

According to the complaint, Reddit has been forced to invest significant resources in hardware, software, and personnel to improve its technical security systems, surveillance, and anti-scraping efforts. The complaint seeks to stop the defendants' conduct to prevent ongoing harm to Reddit's reputation, user trust, and licensing business.

Context in broader AI data disputes

The lawsuit emerges as legal battles intensify across the AI industry over training data rights. Cloudflare data released in August 2025 revealed stark imbalances between how much content AI platforms crawl and how much traffic they refer back to publishers. Perplexity's crawl-to-refer ratio rose 256.7 percent during 2025, climbing from 54 crawls per referral in January to 195 by July.

Earlier in 2025, Cloudflare reported that "Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives." The company's CEO described Perplexity's practices as resembling those of "North Korean hackers" rather than a reputable AI company.

Google has taken its own steps to protect search results from scraping. On September 14, 2025, the search giant eliminated its num=100 SERP parameter, forcing SEO tools to make 10 separate requests instead of one to retrieve 100 search results. The change multiplied request volume, and with it operating costs, by ten for businesses that depend on comprehensive search result analysis.
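The tenfold figure follows from simple pagination arithmetic, sketched below. The function name is illustrative, and the 10-results-per-page default is inferred from the article's "10 separate requests" figure rather than from any Google documentation.

```python
import math


def requests_needed(results_wanted: int, results_per_page: int) -> int:
    """Number of page requests required to collect results_wanted results."""
    return math.ceil(results_wanted / results_per_page)


# Before the change: one request could return up to 100 results.
before = requests_needed(100, 100)

# After the change: assuming 10 results per page, 100 results take 10 requests.
after = requests_needed(100, 10)

print(f"Requests before: {before}, after: {after}, multiplier: {after // before}x")
```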

The complaint details that Reddit data is particularly well-suited to training large language models because it provides "real-time access to evolving and dynamic topics" and "constantly grows and regenerates as users come and interact with their communities and each other in genuine and authentic ways."

Reddit seeks injunctive relief to stop defendants from accessing or using Reddit's website and Google's website for unlawful data scraping, developing or distributing circumvention technology, and using any Reddit data previously obtained through circumvention. The platform also seeks actual, statutory, or compensatory damages; disgorgement of defendants' profits; pre-judgment and post-judgment interest; and attorneys' fees.

The defendants have not yet filed formal responses to the lawsuit. The case is assigned Case No. 25-cv-8736 with a jury trial demanded.


Timeline

February 22, 2024: Reddit and Google announce partnership enabling programmatic access to Reddit content for use in Google's products and services
May 2024: Reddit sends cease-and-desist letter to Perplexity demanding it stop scraping Reddit data
June 4, 2025: Reddit files lawsuit against Anthropic for unauthorized use of Reddit data to train Claude AI models
July 1-13, 2025: Defendants access nearly three billion Google SERPs containing Reddit data during two-week period
August 2025: Cloudflare releases data showing Perplexity's crawl-to-refer ratio reached 195 crawls per referral by July, up from 54 in January
September 14, 2025: Google eliminates num=100 SERP parameter to protect search results from scraping
October 22, 2025: Reddit files lawsuit against SerpApi, Oxylabs, AWMProxy, and Perplexity AI in U.S. District Court for the Southern District of New York


Summary

Who: Reddit Inc. filed the lawsuit against four defendants: SerpApi LLC (a Texas company providing web-scraping tools), Oxylabs UAB (a Lithuanian data scraper), AWMProxy (a proxy service formerly tied to a Russian botnet and now registered in California), and Perplexity AI Inc. (an artificial intelligence company based in San Francisco).

What: The lawsuit alleges the defendants circumvented technological control measures implemented by Reddit and Google to access and scrape Reddit content on an industrial scale. During a two-week period in July 2025, the three scraper defendants accessed nearly three billion search engine results pages containing Reddit data through automated processes that bypassed security restrictions.

When: The complaint was filed on October 22, 2025, in the United States District Court for the Southern District of New York. The alleged circumvention activities occurred throughout 2024 and 2025, with specific data collection documented during July 1-13, 2025.

Where: The lawsuit was filed in the Southern District of New York (Case No. 25-cv-8736). The defendants operate from various locations: SerpApi is based in Austin, Texas; Oxylabs operates from Vilnius, Lithuania; AWMProxy is registered in California; and Perplexity AI maintains offices in San Francisco, California, and New York, New York.

Why: This matters for the marketing community because it addresses fundamental questions about data access, AI training practices, and the value of user-generated content in the artificial intelligence era. Reddit's content licensing revenue has become an increasingly important part of its business model, while AI companies need access to high-quality, current data to support their ambitions. The case could establish important precedents for AI training data rights as the technology industry faces increasing litigation over content usage. It could also affect how marketing professionals access data for competitive intelligence and whether platforms can monetize their content through licensing agreements rather than see it taken through unauthorized scraping.