Researchers have published a comprehensive comparative analysis of how data protection authorities worldwide frame the legal basis for artificial intelligence training under data protection law - and their findings expose a regulatory system that converges on paper while fracturing in practice.

The paper, titled "How the Legal Basis for AI Training is Framed in Data Protection Guidelines and Interventions: Comparative Perspectives and the Prospect of Global Convergence," was published in International Data Privacy Law by Oxford University Press. Its authors - Wenlong Li of Zhejiang University, Yueming Zhang of Ghent University, Qingqing Zheng of Shandong University, and Aolan Li of Queen Mary University of London - examined 19 regulatory guidelines issued between 2020 and 2024 alongside a series of public enforcement actions against AI providers globally.

The paper was downloaded from Oxford Academic on 29 March 2026, the day before its formal publication date. Vadym Honcharenko, a privacy engineer at Google, shared the research on LinkedIn, noting that the degree of alignment among regulators on risk mitigation was the most interesting aspect of the analysis.

The central finding: agreement in name only

According to the authors, a functional ordering has emerged across jurisdictions that privileges legitimate interest as the dominant legal basis for AI training - but this apparent consensus dissolves when examined at the level of operational requirements. Guidelines rarely resolve deeper procedural and substantive ambiguities. Enforcement interventions often default to minimal safeguards. The result, according to the paper, is that legitimate interest risks becoming "little more than a formality."

The choice of legal basis matters enormously. Under data protection law, if no valid lawful basis can be established, the entire processing operation is rendered unlawful. For AI model training - which involves large-scale data ingestion, opaque model architectures, and difficulty in identifying clear purposes - establishing a valid lawful basis is particularly difficult.

The debate has narrowed, in practice, to a binary between consent and legitimate interest for private-sector AI developers. Consent is normatively appealing because it respects individual autonomy. But the paper describes it as "widely recognized as difficult to operationalize in large-scale training pipelines, particularly in the context of third-party data reuse or web-scraped content." Withdrawal of consent creates cascading compliance challenges when datasets are dynamic.

Legitimate interest, meanwhile, offers a structured pathway. It requires a three-part assessment: whether there is a genuine interest being pursued, whether the processing is necessary for that interest, and whether the interests or fundamental rights of data subjects override the controller's interests. The EDPB clarified how this three-part test applies to AI model development in Opinion 28/2024, adopted on 17 December 2024, stating that controllers may have a legitimate interest in developing AI systems to detect fraudulent content or improve threat detection. But, as PPC Land has noted, that opinion did not resolve how the test is operationalized in mass data collection contexts.
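As a purely illustrative sketch of the logic - the class name, fields, and boolean structure below are hypothetical simplifications, not any regulator's template - the three-part test is a conjunction of checks, with the balancing step asking whether data subjects' rights prevail over the controller's interest:

```python
from dataclasses import dataclass

@dataclass
class LegitimateInterestAssessment:
    """Illustrative model of the Article 6(1)(f) three-part test.

    Real assessments are documented legal analyses, not boolean
    flags; this sketch only shows how the three parts combine.
    """
    purpose: str
    interest_is_genuine: bool      # part 1: a real, lawful, clearly stated interest
    processing_is_necessary: bool  # part 2: no less intrusive means would achieve it
    subjects_rights_prevail: bool  # part 3: data subjects' rights override the interest

    def passes(self) -> bool:
        # Processing is permissible only if all three parts
        # resolve in the controller's favour.
        return (
            self.interest_is_genuine
            and self.processing_is_necessary
            and not self.subjects_rights_prevail
        )

# A fraud-detection scenario, one of the uses the EDPB opinion cites:
lia = LegitimateInterestAssessment(
    purpose="train a model to detect fraudulent content",
    interest_is_genuine=True,
    processing_is_necessary=True,
    subjects_rights_prevail=False,
)
assert lia.passes()  # all three parts favour the controller
```

The point of the structure is that a failure at any single step - a purpose too vague to be genuine, a less intrusive alternative, or a balancing test that favours data subjects - invalidates the basis entirely, which is why regulators' divergence on each step matters so much.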

What the 19 guidelines show

The dataset of 19 guidelines covers authorities across Europe, Asia, Australia, and the Americas. European regulators dominate. The French CNIL and the UK's ICO published detailed, actionable guidance. Others, such as Germany's Bavarian DPA, contributed blog posts. The level of formality varies significantly, and so do the substantive positions.

All 19 guidelines recognize legitimate interest as a relevant basis. But the specifics diverge. According to the paper, only France's CNIL and Germany's LfDI adopted a list-based approach, naming specific scenarios they deem sufficient to establish legitimate interest - fraud detection, threat prevention, or scientific research. Other authorities use broader categories such as "improving the performance of a service or product," which the authors describe as lacking systematization and risking inconsistency.

On consent, four distinct regulatory stances are visible. The EDPB affirms consent retains a critical role, while the EDPS and ICO raise significant concerns about its feasibility in large-scale processing. Authorities in Asia, including ASEAN and Singapore, rhetorically prioritize consent as a default while recognizing legitimate interest in practice. Australia stands apart by expressly requiring explicit consent for sensitive data - photographs and recordings - with no flexibility on that point.

The necessity test receives similarly inconsistent treatment. Most guidelines reiterate the conceptual framework without providing actionable depth. The UK ICO's position shifted visibly between guidance iterations. In an earlier document, the ICO downplayed the need to rigorously justify necessity, arguing alternative approaches to AI development were not practical. In later guidance, the ICO reversed that position, emphasizing that AI developers must justify their chosen approaches - particularly when web scraping is involved - because web scraping constitutes a high-risk processing activity. That shift, according to the paper, "could fundamentally reshape the development pathways for AI systems." German data protection authorities issued their first guidelines on AI and privacy in May 2024, also requiring lawful data sourcing and transparency - a position that aligned with the ICO's later stance.

Enforcement: Italy leads, Brazil surprises, DeepSeek globalizes

The paper catalogues a series of enforcement interventions, most reaching procedural maturity between 2023 and early 2025. Italy's Garante stands out. On 1 April 2023, it suspended ChatGPT, forcing OpenAI to revise privacy policies, implement age verification, and clarify legal bases for data processing. The suspension was lifted by the end of April 2023. Garante's investigation continued, and a January 2024 update adjusted the scope, adding failures related to data breach notifications. At the close of 2024, Garante issued its final decision against OpenAI, imposing a €15 million fine along with mandated public awareness campaigns.

A separate Garante decision, issued on 29 November 2024, targeted the GEDI group - a media organization that supplied OpenAI with training content. Garante scrutinized GEDI's reliance on legitimate interest, finding inadequacies in its Data Protection Impact Assessment and its handling of sensitive data. This pairing of enforcement against both the AI developer and its data supplier is significant: it signals that the legal basis question extends throughout the data supply chain, not just to model operators.

Brazil's National Data Protection Authority provided perhaps the most operationally specific enforcement of the cases reviewed. In July 2024, the ANPD suspended Meta's AI training, citing deficiencies in its legal basis for the processing and threatening a daily fine of BRL 50,000 - approximately €8,389 - for non-compliance. The suspension was lifted by 30 August 2024 after Meta complied. Compliance requirements included reducing the number of steps needed to object to data processing from eight clicks to three, simplifying objection forms, and pledging to exclude data from individuals under 18 from training datasets. A subsequent December 2024 decision against X ordered it to cease processing minors' data for AI training. X was also required to ensure that its Data Protection Impact Assessments and Legitimate Interest Assessments, originally designed for GDPR, were explicitly adapted to Brazilian law. Brazilian regulators, in other words, demanded not merely a transplant of European frameworks but a contextual recalibration.

The UK ICO and Ireland's DPC took a softer approach with Meta. Meta voluntarily paused AI training following its June 2024 privacy policy update. The ICO's engagement culminated in Meta announcing a collaboration to develop a compliance framework anchored in legitimate interest, though the ICO issued a measured statement and refrained from granting formal regulatory approval. The European Commission has since proposed sweeping GDPR amendments through its Digital Omnibus initiative, circulated in November 2025, that would explicitly establish AI training as legitimate interest under Article 6(1)(f) - a move that would partially codify what enforcement has been working around in practice.

DeepSeek's regulatory trajectory demonstrates how the dynamics have shifted since the OpenAI era. Italy's Garante banned DeepSeek on 30 January 2025, finding deficiencies in transparency and the transfer of personal data from the EU to China. The Berlin Data Protection Commissioner went further on 27 June 2025, declaring DeepSeek's operations unlawful and requesting that Apple and Google delist the application from their stores. South Korea's PIPC initiated an investigation, directed removal from domestic app stores, and required the company to revise its privacy policy. Canada and Australia banned DeepSeek on government devices. India banned both DeepSeek and ChatGPT for specific governmental IT infrastructure. The paper notes that, unlike earlier cases shaped by early-stage doctrinal uncertainty, the DeepSeek episode evidences a convergence of cross-border action driven by heightened sensitivity to data flows toward China.

Where regulators agree, where they don't

The paper identifies three tiers of convergence. At the highest level, regulators broadly agree on the right to opt out. Meta revised its European privacy policy under regulatory and judicial pressure to streamline opt-out flows. The ANPD's specific requirement - reducing objection steps from eight to three - is the most operationally precise iteration. Most authorities assert the opt-out should not be overridden by the interests of data controllers, though framing varies: the EDPB implies it is essential, while Spain's AEPD frames it as a best practice.

At a moderate level, there is partial convergence on reasonable expectations and the relevance of consent. Regulators increasingly differentiate between first-party and third-party data processing, with Germany's LfDI noting that individuals are far less likely to expect their data to be shared by third parties when there has been no direct interaction. The CNIL has explicitly flagged that mass web scraping for AI model training typically fails the reasonable expectation test under GDPR.

At the least convergent end, data minimization and the necessity of web scraping remain contested. Most guidelines instruct developers to process only strictly necessary data, but enforcement rarely penalizes over-collection. Benchmarks are absent. Hallucination - where AI models generate inaccurate or fabricated outputs - intersects with data protection accuracy principles, but regulatory treatment is "embryonic." Italy's Garante initially required technical controls to address hallucinations in ChatGPT outputs, but those measures were later diluted. According to the paper, common practices default to data deletion rather than meaningful rectification.

Privacy-enhancing technologies: mentioned, not mandated

Techniques such as differential privacy and federated learning appear repeatedly in the reviewed guidelines as potential safeguards. They are rarely analyzed in operational terms. The Singapore authority suggests that deployment of such techniques should be regarded as "basic controls." The ICO notes that current consultations lack sufficient verifiable evidence to assess the practical efficacy of these safeguards beyond theoretical stages. A June 2025 study by University of Tübingen researchers established that large language models memorize between 0.1 and 10 percent of training data verbatim, which directly challenges claims that AI models can be treated as anonymous even with privacy-enhancing measures in place.
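Since the guidelines invoke these techniques without unpacking them, a minimal sketch of the core idea behind differential privacy may help: calibrated random noise is added to a query result so that any one individual's presence or absence in the dataset has a provably bounded effect on the output. The function names below are illustrative, and this is the textbook Laplace mechanism on a simple count, not any regulator's prescribed control:

```python
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) noise: the difference of two i.i.d.
    exponential variables with mean `scale` is Laplace-distributed."""
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def dp_count(records: list, epsilon: float) -> float:
    """Epsilon-differentially-private count of records.

    A count query has sensitivity 1 (adding or removing one person
    changes the true answer by at most 1), so Laplace noise with
    scale 1/epsilon achieves epsilon-differential privacy.
    """
    return len(records) + laplace_noise(1.0 / epsilon)

# Smaller epsilon means stronger privacy and a noisier answer.
noisy = dp_count(["user"] * 100, epsilon=0.5)
```

Extending this guarantee from a single count to full model training (as in DP-SGD) is far more involved and costly in accuracy, which helps explain why the reviewed guidelines mention such techniques without mandating them - and why memorization findings like the Tübingen study remain troubling even where these measures are deployed.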

What this means for the marketing industry

For marketing professionals and advertising technology companies, the practical implications are substantial. The EDPB's damning digest on how legitimate interest fails in practice - published just yesterday - reinforces the paper's core finding: generic invocations of legitimate interest are not sufficient. Controllers must demonstrate a specific, documented purpose, a necessity case, and a balancing assessment that confronts the actual risks to data subjects. Companies that have built AI-powered advertising tools on the assumption that legitimate interest provides a reliable, low-friction legal basis may find enforcement catching up with them, particularly in jurisdictions following the Italian or Brazilian enforcement model.

The proposed GDPR amendments from the European Commission would, if adopted, create an explicit legitimate interest basis for AI training under Article 88c. But those proposals remain politically contested. Estonia, France, Austria, and Slovenia have formally opposed changes to the GDPR. Germany, in an October 2025 policy document, pushed for even broader AI training exemptions than those proposed by the Commission. The outcome of these negotiations will determine whether the ambiguity described in today's paper is resolved through legislative clarity or persists through continued enforcement uncertainty.

Until that clarity arrives, the paper's conclusion holds: the invocation of legitimate interest risks becoming a façade, one that obscures rather than enables meaningful accountability. For companies processing personal data to train AI models, current regulatory expectations - however inconsistently enforced - require more than conceptual justification.

Timeline

  • February 2020 - Spain's AEPD publishes its introduction to GDPR compliance for AI processings, one of the earliest DPA guidelines on the topic.
  • 1 April 2023 - Italy's Garante suspends ChatGPT; OpenAI implements remedial measures and the suspension is lifted by end of April 2023.
  • November 2023 - Liechtenstein's Datenschutzstelle publishes guidelines on data protection for chatbots.
  • January 2024 - Germany's BayLDA publishes its AI and data protection blogpost; UK ICO publishes guidance on lawful basis for web scraping to train generative AI models.
  • 29 January 2024 - Garante notifies OpenAI of further breaches, adjusting the scope of its investigation to include data breach notification failures.
  • 2 February 2024 - ASEAN publishes its Guide on AI Governance and Ethics.
  • 1 March 2024 - Singapore's PDPC publishes advisory guidelines on personal data in AI recommendation and decision systems.
  • 6 May 2024 - Germany's DSK issues orientations on AI and data protection. Covered by PPC Land.
  • 3 June 2024 - EU's EDPS publishes orientations on generative AI and the EUDPR.
  • 10 June 2024 - France's CNIL publishes two guidance documents: one on legitimate interest for AI development, one on web scraping.
  • 14 June 2024 - ICO commences engagement with Meta following Meta's privacy policy update; Meta voluntarily pauses AI training.
  • 14 June 2024 - 11 DPAs initiate coordinated action against Meta's AI training under GDPR.
  • 2 July 2024 - Brazil's ANPD suspends Meta's privacy policy and AI training practices, threatening daily fines of BRL 50,000.
  • July 2024 - US NIST publishes its Generative AI Profile.
  • 6 August 2024 - Ireland's DPC takes High Court action against X over Grok data use.
  • 12 August 2024 - Nine DPAs initiate coordinated action against X's Grok training under GDPR.
  • 30 August 2024 - ANPD lifts Meta suspension after compliance with requirements.
  • 13 September 2024 - Meta announces UK AI compliance framework in collaboration with ICO.
  • 25 October 2023 - Hong Kong's PCPD intervenes in LinkedIn AI training; LinkedIn pauses use of Hong Kong users' data.
  • 8 October 2024 - EDPB adopts Guidelines 1/2024 on processing based on Article 6(1)(f).
  • 17 October 2024 - Germany's LfDI publishes discussion paper on legal basis for AI (Version 2.0).
  • 21 October 2024 - Australia's OAIC publishes two guidance documents on privacy and generative AI.
  • 28 October 2024 - UK ICO publishes guidance on AI and other products.
  • 29 November 2024 - Garante issues decision against GEDI group over AI training data supply.
  • 17 December 2024 - EDPB adopts Opinion 28/2024 on AI models. Covered by PPC Land.
  • 17 December 2024 - ANPD orders X to cease processing minors' data for AI training.
  • 18 December 2024 - Belgium's DPA publishes information note on AI systems and GDPR; ICO publishes response to generative AI consultation series.
  • 20 December 2024 - Garante issues final decision against OpenAI, imposing €15 million fine.
  • 30 January 2025 - Garante bans DeepSeek from operating in Italy.
  • 7 February 2025 - South Korea's PIPC investigates DeepSeek, directs app store removal.
  • 18 June 2025 - University of Tübingen research establishes LLMs qualify as personal data. Covered by PPC Land.
  • 27 June 2025 - Berlin Data Protection Commissioner declares DeepSeek unlawful, requests delisting from Apple and Google stores.
  • 22 July 2025 - France's CNIL finalizes AI development recommendations under GDPR. Covered by PPC Land.
  • November 2025 - European Commission circulates draft GDPR amendments through Digital Omnibus, proposing explicit legitimate interest basis for AI training. Covered by PPC Land.
  • 29 March 2026 - Paper downloaded from Oxford Academic ahead of publication.
  • 30 March 2026 (today) - Paper published in International Data Privacy Law by Oxford University Press; shared on LinkedIn by Vadym Honcharenko.

Summary

Who: Wenlong Li (Zhejiang University), Yueming Zhang (Ghent University), Qingqing Zheng (Shandong University), and Aolan Li (Queen Mary University of London), with the research shared publicly today by Google privacy engineer Vadym Honcharenko on LinkedIn.

What: A peer-reviewed academic paper published in International Data Privacy Law analyzing 19 data protection authority guidelines issued between 2020 and 2024 and a series of global enforcement actions concerning the legal basis for AI model training. The paper finds that regulators formally converge on legitimate interest while diverging sharply on how it should be operationalized, and warns that the current framework leaves significant gaps in protection and operational clarity.

When: The paper was published today, 30 March 2026, by Oxford University Press. The guidelines and enforcement actions it covers span from February 2020 through June 2025.

Where: The regulatory developments analyzed span the European Union (EDPB, EDPS, France, UK, Germany, Spain, Belgium, Liechtenstein), Latin America (Brazil, through the ANPD), Asia-Pacific (Singapore, ASEAN, Australia, South Korea, Hong Kong), and the United States (NIST). Enforcement actions involved OpenAI, Meta, X (Grok), DeepSeek, the GEDI group, Snap, and LinkedIn.

Why: As AI-specific regulation such as the EU AI Act faces uncertainty, and as the GDPR remains the de facto governance baseline for AI across jurisdictions lacking operational AI laws, the legal basis question has become the primary regulatory chokepoint for AI model development globally. Marketing and advertising technology companies building or deploying AI systems on personal data face direct compliance obligations shaped by these divergent standards - and current enforcement, according to the authors, is not resolving the ambiguity.
