Publishers rally against AI scraping at IAB Tech Lab summit

Mediavine CRO Amanda Martin joined Google, Meta, and 80+ media executives to establish technical standards forcing AI platforms to respect publisher consent and compensation.

Media executives unite against AI content scraping, with Google and Meta towers amid digital barriers.

Over 80 media executives gathered in New York during the week of July 30, 2025, under the IAB Tech Lab banner to address what many consider an existential threat to digital publishing. Mediavine Chief Revenue Officer Amanda Martin joined representatives from Google, Meta, and numerous other industry leaders in confronting AI companies that scrape publisher content without consent or compensation.

The meeting addressed systematic content extraction by AI platforms for model training and attention capture. Notably absent from the gathering were the AI companies at the center of the controversy: OpenAI, Anthropic, and Perplexity. Their non-attendance spoke volumes about the industry divide, according to attendees.

According to Martin, "Industry bodies cannot dictate commercial decisions by its members, but it can create buy-in and adoption of the standard, or the line in the sand that we're all going to act within." The statement highlights the collaborative approach needed to address what publishers increasingly view as exploitation.

The gathering aimed to develop a comprehensive LLM Content Ingest API that would fundamentally alter how AI platforms access publisher content. Unlike voluntary guidelines that many AI companies have ignored, this proposed framework would enforce publisher consent from the outset.

Many AI platforms currently disregard existing publisher guidance including robots.txt files and crawl without explicit consent. This systematic behavior has prompted publishers to seek technical solutions rather than relying on industry goodwill.
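The weakness publishers are responding to is structural: robots.txt is purely advisory. A crawler checks it only if it chooses to, as this minimal Python sketch illustrates (the user agent "ExampleAIBot" and the rules are illustrative placeholders):

```python
from urllib.robotparser import RobotFileParser

# robots.txt is advisory: nothing forces a crawler to run this check.
# "ExampleAIBot" and the rules below are illustrative placeholders.
rules = """
User-agent: ExampleAIBot
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# A cooperating crawler consults the file before fetching; a
# non-compliant one simply skips this step and fetches anyway.
allowed = parser.can_fetch("ExampleAIBot", "https://publisher.example/article")
print(allowed)  # False: the disallowed bot should not crawl
```

Nothing in the protocol enforces that `False`, which is why publishers are now pursuing technical and contractual mechanisms instead.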

The proposed technical framework centers on four core principles: control, consent, credit, and compensation. Publishers would gain the ability to govern precisely how, when, and by whom their content gets used for AI training or inference.

Martin emphasized that enforcement represents only part of the solution. "It's not about airtight enforcement. It's about collective leverage and long-term sustainability for content creators," she explained during the announcement.

Industry standards gain momentum

The initiative builds on previous efforts by individual companies to address AI content access. Cloudflare launched its pay-per-crawl service on July 1, 2025, allowing content creators to charge AI crawlers for access through HTTP 402 Payment Required responses.
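The mechanism repurposes the long-reserved HTTP 402 status code as a machine-readable paywall signal. As a hedged sketch of how a cooperating crawler might interpret such responses (the "crawler-price" header here is a hypothetical illustration, not a documented Cloudflare field):

```python
def pay_per_crawl_decision(status: int, headers: dict[str, str]) -> str:
    """Map an HTTP response to a cooperating crawler's next step.

    HTTP 402 Payment Required is the signal a pay-per-crawl service
    returns to unpaid crawlers. The "crawler-price" header below is
    a hypothetical placeholder, not a documented field.
    """
    if status == 200:
        return "ingest"  # content served normally
    if status == 402:
        # The publisher charges for access. A crawler with a payment
        # agreement would retry with credentials; one without would
        # record the quoted price (if any) and move on.
        price = headers.get("crawler-price")
        return f"payment-required:{price}" if price else "payment-required"
    if status in (401, 403):
        return "access-denied"  # blocked outright
    return "retry-later"

print(pay_per_crawl_decision(402, {"crawler-price": "0.01"}))
# payment-required:0.01
```

The design choice worth noting: unlike robots.txt, a 402 response is enforced server-side, so the content simply never reaches a crawler that declines to pay.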

PPC Land covered the Cloudflare announcement, noting how the system affects major AI crawlers including Amazonbot (Amazon), Applebot (Apple), ByteSpider (ByteDance), CCBot (Common Crawl), ChatGPT-User (OpenAI), ClaudeBot (Anthropic), and others.

Mediavine expressed enthusiasm about Cloudflare's approach, noting their partnership through BigScoots hosting plans. "Cloudflare's extensive reach–powering nearly 20% of the internet–suggests that this feature is likely to be adopted as a standard by numerous AI companies almost immediately," Mediavine stated.

The company's partnership with BigScoots provides Mediavine publishers with Cloudflare Enterprise CDN integration, making implementation of pay-per-crawl features accessible to their network of more than 18,000 publishers.

Publisher resistance intensifies

The July 30 announcement follows a pattern of increasingly aggressive publisher actions against unauthorized AI content usage. PPC Land reported in September 2024 that Mediavine terminated multiple publisher accounts over AI-generated content concerns, demonstrating the company's commitment to maintaining content quality standards.

Mediavine's policy explicitly states it "does not monetize low-quality, mass-produced, unedited or undisclosed AI content that is scraped from other websites." The company also refuses to "tolerate publishers using AI to create untested recipes or any other form of low-quality content that devalues the contributions of legitimate content creators."

Research data supports growing publisher resistance to AI crawlers. Over 35% of top websites now block OpenAI's GPTBot, according to industry tracking. HUMAN Security documented 107% year-over-year increases in scraping attacks, highlighting the scale of unauthorized content extraction.

The resistance extends beyond blocking crawlers. Publishers are implementing sophisticated detection systems to identify AI-generated content and protect their platforms from low-quality mass-produced material.

Technical framework details

The LLM Content Ingest API would operate as a standardized interface allowing publishers to set specific terms for content access. The system would likely integrate with existing content management systems and web infrastructure.

Publishers would gain granular control over content licensing, including the ability to specify different terms for training data versus inference requests. The framework would also support dynamic pricing based on content value and usage patterns.

The technical approach addresses limitations of current solutions. While robots.txt files provide basic crawling guidance, they lack enforcement mechanisms and offer limited granularity. The proposed API would create binding agreements between publishers and AI platforms.

Implementation would likely involve industry-standard authentication and authorization protocols. Publishers could configure access controls at the domain, subdomain, or individual content level depending on their business requirements.
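The API specification has not been published, so the following is a purely illustrative sketch of the kind of granularity described above: every field name is a hypothetical placeholder showing how a publisher might set different terms for training versus inference at the subdomain or path level.

```python
# Purely illustrative: the LLM Content Ingest API spec is not public,
# so every field name here is a hypothetical placeholder.
access_policy = {
    "publisher": "example-food-blog.com",
    "rules": [
        {"scope": "recipes.example-food-blog.com",    # subdomain-level rule
         "training": {"allowed": False},               # no model training
         "inference": {"allowed": True,                # answering queries OK...
                       "attribution": "required",      # ...with credit
                       "rate_usd_per_1k_requests": 0.50}},
        {"scope": "example-food-blog.com/archive/*",   # path-level rule
         "training": {"allowed": True,
                      "license": "annual-flat-fee"},
         "inference": {"allowed": True,
                       "attribution": "required"}},
    ],
}

def training_allowed(policy: dict, scope: str) -> bool:
    """Look up whether model training is permitted for a given scope."""
    for rule in policy["rules"]:
        if rule["scope"] == scope:
            return rule["training"]["allowed"]
    return False  # default-deny when no rule matches
```

A default-deny lookup like the one above reflects the consent-first stance the coalition is advocating: absent an explicit rule, access is refused rather than assumed.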

Economic implications

The financial impact on publishers has become a central concern driving the coalition efforts. According to Mediavine, "Attribution isn't optional. It's table stakes. Licensing must work for independent publishers, not just the top 1%. Scraping without permission is exploitation."

Current AI training practices effectively transfer economic value from content creators to AI platform operators. Publishers invest resources in content creation while AI companies monetize that content through their platforms.

The proposed framework would establish new revenue streams for publishers while potentially increasing costs for AI companies. This economic rebalancing could affect AI development costs and pricing structures across the industry.

Independent publishers face particular challenges in negotiating individual licensing agreements with major AI platforms. The collective approach through industry standards could provide smaller publishers with negotiating leverage they lack individually.

Industry response and adoption

The initiative represents a significant shift from previous industry approaches to AI content usage. Earlier efforts focused primarily on voluntary compliance and industry self-regulation.

The gathering brought together publishers, platforms, and technology vendors positioned between the competing interests. Google and Meta's participation signaled acknowledgment of publisher concerns, notable given that both companies have faced criticism for using web content in AI training without explicit permission or compensation, and that both operate their own AI initiatives that rely on web content.

The 80+ media executives involved represent a significant portion of digital publishing decision-makers. Their collective participation indicates broad industry consensus about the need for formal standards.

Implementation challenges

Establishing industry-wide standards faces several technical and commercial hurdles. Different AI platforms use varying approaches to content access and processing, making universal implementation complex.

Enforcement mechanisms remain unclear. While technical standards can specify requirements, ensuring compliance across global AI platforms requires international cooperation and potentially regulatory involvement.

Pricing structures present another challenge. Publishers need mechanisms to establish fair market rates for content licensing while AI companies require predictable cost structures for business planning.

Cross-border legal frameworks add complexity. AI platforms and publishers operate across multiple jurisdictions with different intellectual property and contract laws.

Market context

The initiative emerges during a period of rapid AI capability expansion and increasing regulatory scrutiny. European Union AI Act provisions and similar regulations in other jurisdictions are creating compliance requirements for AI companies.

PPC Land's analysis of programmatic advertising trends shows 72% of marketers planning to increase programmatic investment in 2025, highlighting the continued importance of digital advertising revenue for publishers.

Connected TV advertising growth, with budgets projected to double from 14% in 2023 to 28% in 2025, demonstrates publishers' need to diversify revenue sources beyond traditional display advertising.

Privacy-focused advertising changes, with mobile advertising facing particular challenges as 54% of mobile impressions now lack identifier coverage, add pressure on publishers to establish new monetization approaches.

Future implications

The coalition's success could establish precedent for other industries facing similar AI disruption. Publishing represents an early test case for balancing AI development needs with creator compensation.

Successful implementation might encourage similar standards development in other content-heavy industries including music, video, and software development.

The framework could influence AI company business models. Current approaches often assume free access to training data, while formal licensing requirements would necessitate budget allocation for content acquisition.

Publishers achieving sustainable AI licensing revenue might invest more heavily in content quality and production, potentially benefiting both creators and AI training data quality.

Timeline

  • March 2024: Mediavine terminated publisher accounts over AI-generated content violations
  • September 2024: Industry analysis reveals growing publisher resistance to unauthorized content usage
  • July 1, 2025: Cloudflare launches pay-per-crawl service for AI content access
  • July 2, 2025: Mediavine announces support for Cloudflare's initiative and integration plans
  • July 3, 2025: Google ends Recipe Quick View pilot program following publisher concerns, as reported by PPC Land
  • July 30, 2025: More than 80 media executives meet in New York under the IAB Tech Lab banner, with Mediavine CRO Amanda Martin participating alongside Google and Meta representatives

Key terms explained

AI platforms: Technology companies developing artificial intelligence systems that require massive amounts of training data, often sourced from web content. These platforms include OpenAI, Anthropic, Google, Meta, and others that create large language models and other AI applications. The term encompasses both the companies themselves and their technological infrastructure used for content extraction and model training.

Content scraping: The automated process of extracting text, images, and other data from websites without explicit permission from content owners. AI companies use sophisticated crawling systems to gather training data from across the internet, often ignoring publisher guidance like robots.txt files that indicate crawling preferences.

IAB Tech Lab: The Interactive Advertising Bureau's technology laboratory that develops technical standards for digital advertising and publishing. The organization serves as a neutral forum where industry stakeholders can collaborate on solutions to emerging challenges, including AI content usage and publisher compensation frameworks.

LLM Content Ingest API: The proposed technical standard that would create a formal interface between publishers and AI companies for content access. This application programming interface would enforce publisher consent, control access parameters, ensure proper attribution, and facilitate compensation for content usage in AI training and inference.

Publishers: Content creators and media companies that produce original articles, videos, images, and other digital materials. This category includes individual bloggers, major news organizations, niche websites, and digital media companies that rely on advertising revenue and content licensing for sustainability.

Cloudflare: A web infrastructure company providing content delivery network services, security solutions, and other internet services to approximately 20% of websites globally. The company launched a pay-per-crawl service allowing publishers to charge AI crawlers for content access, representing a significant technical development in publisher monetization.

Consent mechanisms: Technical and legal frameworks that require explicit permission before AI systems can access and use publisher content. These mechanisms move beyond voluntary compliance to create binding agreements that specify terms of use, compensation rates, and attribution requirements for content usage.

Attribution requirements: Standards mandating that AI systems properly credit original content sources when using publisher materials for training or generating responses. Publishers argue that attribution represents a fundamental requirement rather than an optional courtesy, essential for maintaining content creator recognition and driving traffic back to original sources.

Revenue compensation: Financial arrangements that provide publishers with payment for AI companies' use of their content in model training and inference. This concept addresses the current practice where AI platforms extract economic value from publisher content without sharing revenue generated through AI applications and services.

Technical standards: Industry-agreed specifications and protocols that govern how AI systems interact with publisher content. These standards would create uniform approaches to content access, licensing, attribution, and compensation across different AI platforms and publishing systems, ensuring consistent implementation and enforcement mechanisms.

Summary

Who: Mediavine CRO Amanda Martin, Google representatives, Meta executives, and 80+ media industry leaders at IAB Tech Lab

What: Development of LLM Content Ingest API standards to enforce publisher consent, control, credit, and compensation for AI content access

When: Coalition meeting occurred during the week of July 30, 2025, with ongoing development of technical standards

Where: New York meeting under IAB Tech Lab facilitation, with global implications for digital publishing and AI development

Why: Publishers seek protection from unauthorized AI content scraping and compensation for content used in AI training and inference, addressing what they characterize as systematic exploitation by AI platforms