The UK's House of Lords Communications and Digital Committee last week published a wide-ranging report calling on the government to reject a broad commercial text and data mining exception for artificial intelligence, mandate statutory transparency obligations on AI developers, and create the conditions for a functioning licensing market - warning that failure to act risks ceding the country's creative economy to opaque, overseas AI systems.
The report, the 4th Report of Session 2024-26, was ordered to be printed on 24 February 2026 and published on 6 March 2026 under the title AI, Copyright and the Creative Industries (HL Paper 267). Chaired by Baroness Keeley, the Communications and Digital Committee comprises thirteen members and drew evidence from 21 witnesses across seven oral evidence sessions, supplemented by 29 pieces of written evidence.
The document sets out a stark binary. According to the committee, the UK must choose between becoming "a world-leading home for responsible, licensing-based artificial intelligence development" or drifting toward "tacit acceptance of large-scale, unlicensed use of creative content and long-term dependence on opaque models trained overseas, with most benefits accruing to a small number of US-based firms."
A £124 billion economy at stake
The economic backdrop matters. According to the report, the UK's creative industries contributed £124 billion in gross value added in 2023 and employed 2.4 million people. The government has identified this sector as having high-growth potential, with GVA forecast to reach £141 billion by 2030. Against this, the UK's dedicated AI firms generated £2.2 billion in GVA in 2024, while the AI sector as a whole contributed £11.8 billion and employed around 86,000 people.
These figures are not simply background context. They define the stakes of the policy choice the committee believes the government is currently failing to make. Several witnesses described the UK's copyright framework as the international "gold standard." The committee accepts that characterisation and argues the existing legal structure does not need to be weakened - it needs to be enforced, and complemented by mandatory transparency requirements that currently do not exist.
What the government's consultation found - and didn't resolve
The government launched a consultation on AI and copyright in December 2024, framed around four options: maintaining the status quo, strengthening copyright by requiring licences in all cases, creating a broad commercial text and data mining (TDM) exception, or introducing a commercial TDM exception with a rights-reservation opt-out mechanism. The consultation, which ran until February 2025, received more than 11,500 responses, including 10,112 submitted through the Citizen Space online survey.
The numbers were unambiguous. According to the committee, 88% of Citizen Space respondents supported licensing in all cases. Only 3% favoured a new commercial TDM exception with opt-out and transparency measures. Just 0.5% supported a broader commercial exception with no rights-reservation at all.
Despite these results, the government still has not published a final decision. The committee is direct about the cost of this delay: "Mixed public messaging and an extended consultation period have undermined trust and stalled licensing and investment." The committee calls on the government to publish its final position within the next year and, in the meantime, to state clearly that it will not introduce a TDM exception with an opt-out mechanism.
How generative AI intersects with copyright - technically
The report dedicates considerable space to explaining the mechanics of the generative AI lifecycle, distinguishing four copyright-relevant stages identified by Dr Hayleigh Bosher of Brunel University and Dr Andres Guadamuz of the University of Sussex.
Data collection and pre-processing involves assembling large volumes of material through automated "bots" that crawl publicly accessible websites, scrape content into separate databases, and convert it into machine-readable or tokenised formats. Multiple copies of protected works are typically created during indexing, cleaning, deduplication and tokenisation - steps that may no longer leave the original expression recognisable.
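The copying the committee describes at this stage can be sketched in a few lines. The following is a purely illustrative toy pipeline, not any developer's actual system: the `tokenise` function stands in for real subword tokenisers, and the document names are invented.

```python
# Hypothetical sketch of the pre-processing stage: scraped text is
# cleaned, deduplicated and tokenised. Each step produces a further
# copy or transformation of the original expression.

def tokenise(text: str) -> list[int]:
    # Toy byte-level tokenisation; real systems use subword vocabularies
    # (e.g. byte-pair encoding), but the effect is the same: text becomes
    # sequences of integers in which the original wording may no longer
    # be directly recognisable.
    return list(text.encode("utf-8"))

def preprocess(documents: list[str]) -> list[list[int]]:
    seen = set()
    corpus = []
    for doc in documents:
        cleaned = " ".join(doc.split())   # normalise whitespace
        if cleaned in seen:               # exact-match deduplication
            continue
        seen.add(cleaned)
        corpus.append(tokenise(cleaned))  # a copy is made at this step
    return corpus

corpus = preprocess(["A protected  work.", "A protected work.", "Another work."])
print(len(corpus))  # duplicates collapse: 2 documents survive
```

The point of the sketch is legal rather than engineering: each of these routine steps involves making a copy of the work, which is why Dr Guadamuz concludes the process engages the reproduction right.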
Model training feeds these prepared datasets iteratively to machine learning models. The internal parameters, or "weights," are repeatedly adjusted so the model captures statistical patterns. The result is numerical parameters rather than accessible copies of individual works - though models can in some cases "memorise" and later reproduce particular passages or images.
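A deliberately minimal illustration of what "adjusting weights" means, under the assumption of a one-parameter linear model (real models have billions of parameters, but the principle is the same): what survives training is a number that captures a statistical pattern, not a copy of any individual work.

```python
# Gradient-descent sketch of the training stage: the model's "weight"
# is repeatedly nudged to reduce prediction error on the data.

def train(pairs: list[tuple[float, float]], steps: int = 1000, lr: float = 0.01) -> float:
    w = 0.0  # the model's single weight
    for _ in range(steps):
        for x, y in pairs:
            error = w * x - y   # prediction error on one example
            w -= lr * error * x # gradient step: adjust the weight
    return w

# Data following the pattern y = 2x; training recovers the pattern,
# not the data points themselves.
w = train([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
print(round(w, 3))  # → 2.0
```

Memorisation, in these terms, is what happens when a model has enough parameters to fit individual training examples exactly rather than only the underlying pattern.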
Inference integrates the trained model into products and services such as chatbots and search tools. Many services also use retrieval-augmented generation (RAG), where the system retrieves external, up-to-date information at the point of use - a process that can involve further copying, caching and processing of protected works beyond the training stage.
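The additional copying at the RAG stage can be made concrete with a hedged sketch. The keyword-overlap retriever below is a stand-in assumption for the vector search real systems use, and the function names are invented for illustration.

```python
# Sketch of retrieval-augmented generation: at query time the system
# retrieves relevant documents and copies their text verbatim into the
# model's prompt - copying that happens at the point of use, beyond
# anything done during training.

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    # Naive keyword-overlap scoring stands in for vector search.
    q_terms = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_terms & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    # Retrieved text is copied into the prompt before generation.
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = ["The committee report covers AI and copyright.",
        "Unrelated article about the weather."]
print(build_prompt("What does the report say about copyright?", docs))
```

This is why the committee treats inference-time retrieval as a separate copyright-relevant stage: even a model trained entirely on licensed data can copy protected works afresh each time a user asks a question.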
Output generation produces text, images, audio or other media. The report notes that most outputs are novel compositions synthesising patterns from many sources, but outputs that reproduce a substantial part of a protected work may still engage copyright restrictions, particularly where "memorised" text or images are recreated at a user's request.
Dr Guadamuz told the committee that "the act of gathering, copying, and processing works to build a training dataset will, in almost every case, involve the making of a copy of a work at some point in the process," concluding that "the unauthorised copying of protected works for AI training is capable of constituting infringement" under the Copyright, Designs and Patents Act 1988 (CDPA).
The Independent Society of Musicians put it plainly in written evidence: "In principle, UK copyright law already gives rightsholders the exclusive rights they need to control the use of their works, including reproduction, communication to the public, and adaptation. The problem is not the law's scope, but the lack of effective enforcement in the AI context."
The committee also notes that there is currently no ruling in the UK on whether training generative AI models on copyrighted works without a licence infringes the reproduction right. The UK High Court's November 2025 ruling in the Getty Images v Stability AI case addressed only secondary infringement relating to model weights, not the primary question of whether training itself constitutes infringement - and even that secondary ruling is now subject to a permission to appeal granted in January 2026.
The transparency deadlock - and a proposed way through
One of the report's central arguments concerns transparency. Large AI developers currently provide limited, voluntary disclosure about what training data they use. The committee argues this must change and become a statutory obligation.
The government's December 2025 progress statement - published under Section 137 of the Data (Use and Access) Act 2025 - confirmed strong support for mandatory transparency measures among creative sector consultation respondents. Under the Data (Use and Access) Act, the Secretary of State is required to publish, by 19 March 2026, a report on the use of copyright works in AI development and an economic impact assessment of the four policy options.
The report sets out what granular transparency disclosures might look like. Evidence submitted to the committee included calls for AI developers to disclose: the categories of data used in training; which rights reservation signals were respected; the specific legal basis for each dataset; the weighting or percentage contribution of each data source; the region where training took place; details of any third-party data providers; and the types of automated tools used to acquire content, including third-party crawlers.
Technology companies pushed back hard. Roxanne Carter, Global IP Lead in the Government Affairs and Policy team at Google, told the committee that the scale of such disclosures would be "too vast, even for Google," explaining that URL-level disclosure would amount to "summarising the internet." Meta warned that such requirements "fail to account for the massive scale of data involved in training AI models and the technical realities of this process." OpenAI raised trade secrecy concerns: "AI developers invest heavily in refining their data and training processes, and publicising this knowledge would alter competitive dynamics between AI developers."
The committee's proposed resolution is a confidential disclosure mechanism - modelled in part on the EU's General-Purpose AI Code of Practice, which includes a non-public "model documentation form" allowing developers to share detailed training information with regulators rather than publicly. This approach would give rightsholders a route to query, via a regulator, whether their content has been used in AI training, without requiring full public disclosure of training datasets.
The committee also flagged the territorial problem. Because large-scale AI training does not typically take place in the UK, domestic transparency rules could simply push developers to train models elsewhere. Matthew Sinclair of the Computer and Communications Industry Association put it bluntly: "No one is training here anyway, so no one is filling out our transparency forms." Antony Walker of techUK asked rhetorically what legal counsel would advise if a developer had to disclose fundamental IP in the UK but not in the US.
The committee's answer is that transparency obligations should apply to any model placed on the UK market, regardless of where it was trained - but that requirements must be designed carefully to avoid incentivising developers to withhold new model releases from UK users.
Rights reservation, technical tools and the limits of opt-out
Chapter 4 of the report examines the technical tools that could allow rightsholders to signal preferences about how their content is used. The committee identifies two broad categories.
Site-level or location-level controls include robots.txt files, which implement the Robots Exclusion Protocol and allow website operators to instruct web crawlers on which content may be accessed. The committee notes, however, that compliance by AI developers is inconsistent and that some developers have been documented using "stealth, undeclared crawlers" that ignore such signals. Publishers and media executives have been seeking technical standards to enforce consent from AI companies, with IAB Tech Lab hosting more than 80 executives in New York in July 2025 to develop a framework. IAB Europe has separately published technical standards requiring AI platforms to compensate publishers for content ingestion, including specifications for content access controls, discovery protocols and monetisation APIs.
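The mechanics of the Robots Exclusion Protocol can be shown with Python's standard `urllib.robotparser` module. The robots.txt content and the "ExampleAIBot" user agent below are hypothetical; the key caveat, as the committee stresses, is that checking these rules is entirely voluntary on the crawler's side.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is barred from /articles/,
# everything else is allowed for all other agents.
robots_txt = """\
User-agent: ExampleAIBot
Disallow: /articles/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant crawler checks before fetching; a "stealth, undeclared
# crawler" simply skips this step - the protocol has no enforcement.
print(parser.can_fetch("ExampleAIBot", "https://example.com/articles/story"))  # False
print(parser.can_fetch("OtherBot", "https://example.com/articles/story"))      # True
```

This voluntariness is precisely the gap the IAB initiatives and the committee's call for legislation aim to close.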
Unit- or asset-level controls such as C2PA (Coalition for Content Provenance and Authenticity) standards allow individual files to carry embedded metadata indicating their provenance and rights status. The committee notes these standards remain in development and adoption is uneven. The report calls on the government to champion the creation of open, interoperable and globally aligned standards for rights reservation, data provenance and the labelling of AI-generated content - and to be "prepared to legislate where necessary."
The report also addresses the existing opt-out mechanism proposed under the EU model, which the government had initially favoured. The committee is sceptical: a broad TDM exception with an opt-out places the burden on rightsholders to actively remove their content from training datasets rather than requiring developers to obtain permission. Given that most rightsholders - particularly individual creators - lack the technical means or commercial leverage to deploy effective opt-outs at scale, the committee concludes this approach would predictably harm the creative sector. The US Senate introduced the TRAIN Act in July 2025, creating a subpoena mechanism for copyright owners to identify their works in AI training datasets - a different approach to the same transparency problem.
The emerging licensing market and the Creative Content Exchange
The committee's fifth chapter turns to the developing market for licensing creative content for AI use. Several AI companies have already concluded licensing agreements with publishers and content owners. The Financial Times announced a content licensing deal with Google in February 2026, covering a series of AI pilot projects. The UK's Competition and Markets Authority proposed conduct requirements in January 2026 targeting Google's handling of publisher content in AI features, including the ability for publishers to opt out of content usage in AI Overviews or AI model training.
The committee acknowledges the government's Creative Content Exchange (CCE) - a pilot marketplace initiative designed to facilitate licensing between rightsholders and AI developers. But the committee is cautious about relying on a single platform solution. According to the report, different rightsholders will pursue different licensing models: some will negotiate direct commercial agreements, others will prefer collective licensing through collective management organisations (CMOs) similar to those operating in the music industry. The government's role should be to create conditions that allow multiple models to coexist, rather than funnelling activity through one marketplace.
The committee also raises a practical question about who ultimately receives licensing revenue. Even where collective licensing structures exist, ensuring that remuneration reaches individual creators rather than being absorbed at an institutional level requires explicit design. The report recommends that the government explore mechanisms, in collaboration with existing CMOs, to ensure individual creators are paid.
Identity, style and the "personality right" gap
One of the more novel elements of the report concerns protection for digital identity and style imitation. The committee found that the UK currently has no "personality right" or specific protection for digital likeness. This means that creators and performers cannot challenge AI-generated outputs that imitate their distinctive style, voice or persona - even where those outputs directly substitute for their human-made work in commercial contexts.
The report calls on the government to introduce protections against unauthorised digital replicas and harmful "in the style of" AI outputs, giving creators and performers clear control over commercial exploitation of their identity. This recommendation goes beyond copyright reform - the committee explicitly notes that copyright does not protect style, only expression - and would require new legislation.
Sovereign AI as a fallback
The committee also endorses a strategy of building domestically governed AI capability as a hedge against overreliance on opaquely trained US models. The government has committed £2 billion to this end, including the establishment of a Sovereign AI Unit. The committee recommends that the government's sovereign AI efforts should foster models that deliver "enhanced transparency and respect for copyright" - essentially creating a domestic alternative that demonstrates licensing-first development is viable.
This matters for the broader policy debate. If the UK government can demonstrate that responsibly trained AI models are commercially successful, it undermines the central argument from technology companies that weakening copyright law is necessary for AI competitiveness.
What this means for the marketing and advertising sector
For marketing professionals, the House of Lords report has direct and indirect implications. The intersection of AI and copyright has been tracked closely on PPC Land since the US Copyright Office began its own multi-part examination of the issue in 2024, and the UK committee's recommendations represent a significant attempt to resolve legal uncertainty that has paralysed licensing investment on both sides of the Atlantic.
Advertising agencies and brands that use AI-generated creative assets face ongoing uncertainty about the legal standing of content produced by models trained on unlicensed materials. The committee's emphasis on output labelling - requiring AI-generated content to be identifiable as such - has direct implications for creative production workflows, disclosure requirements and audience trust.
Publishers whose content feeds training datasets stand to benefit from mandatory licensing frameworks if they are properly enforced. Declining organic search traffic from AI features has already reshaped publisher economics. A functioning licensing market could create a new revenue stream, but only if transparency requirements are strong enough to establish what content was used and by whom.
Stanford researchers published findings in January 2026 showing that major AI models can reproduce substantial portions of copyrighted books when prompted, including 95.8% of a Harry Potter novel by Claude 3.7 Sonnet. That finding, combined with Anthropic's $1.5 billion settlement for unauthorised use of pirated books, illustrates that the risks identified by the House of Lords committee are not theoretical.
Whether the UK government acts swiftly on the committee's six core recommendations - or continues what the report describes as a policy drift characterised by "mixed public messaging" - will shape the terms on which AI operates in one of the world's most significant creative economies.
Timeline
- December 2024: UK government launches consultation on AI and copyright, initially presenting a commercial TDM exception with opt-out as its "preferred" option.
- December 2024 - February 2025: Government consultation runs; over 11,500 responses received, with 88% of Citizen Space respondents supporting licensing in all cases (option 1).
- January 29, 2025: US Copyright Office releases Part 2 of its AI report, addressing copyrightability of AI-generated outputs.
- February 12, 2025: US Copyright Office publishes economic implications report on AI and copyright policy.
- March 11, 2025: Minister for Creative Industries Sir Chris Bryant tells the House of Commons that the Data (Use and Access) Bill is not the right vehicle for AI transparency obligations.
- May 2025: US Copyright Office releases Part 3 of its AI report, addressing whether AI training on copyrighted works constitutes fair use.
- May 12, 2025: Over 400 creative industries stakeholders write to the Prime Minister urging enforcement of copyright law in the context of AI training.
- July 24, 2025: US Senator Peter Welch introduces the TRAIN Act (S.2455), creating an administrative subpoena mechanism for copyright owners to identify their works in AI training datasets.
- July 30-August 2025: Over 80 media executives gather under IAB Tech Lab to develop technical standards for AI content access and publisher consent.
- August 10, 2025: UK Law Commission publishes discussion paper examining AI legal challenges, including copyright and data protection.
- September 10, 2025: Encyclopædia Britannica and Merriam-Webster file federal lawsuit against Perplexity AI for copyright infringement.
- September 14, 2025: IAB Europe publishes technical standards requiring AI platforms to compensate publishers for content ingestion.
- November 4, 2025: UK High Court dismisses Getty Images' secondary copyright infringement claims against Stability AI; primary infringement claims had already been dropped.
- December 15, 2025: UK government publishes progress update under the Data (Use and Access) Act, confirming over 11,500 consultation responses and noting strong support for mandatory transparency among creative sector respondents.
- January 6, 2026: Stanford researchers publish findings showing major AI models can reproduce substantial portions of copyrighted books, with Claude 3.7 Sonnet reproducing 95.8% of Harry Potter and the Sorcerer's Stone in a single run.
- January 28, 2026: UK Competition and Markets Authority proposes conduct requirements targeting Google's handling of publisher content in AI features, including publisher opt-out rights over AI Overviews and model training.
- February 11, 2026: Financial Times announces a content licensing deal with Google covering a series of AI pilot projects, at the FT Strategies conference in London.
- 24 February 2026: House of Lords Communications and Digital Committee orders the AI, Copyright and the Creative Industries report to be printed.
- 6 March 2026: Report published as HL Paper 267, 4th Report of Session 2024-26.
Summary
Who: The House of Lords Communications and Digital Committee, chaired by Baroness Keeley and comprising thirteen members, produced this report. Evidence was gathered from 21 witnesses including representatives from Google, Meta, OpenAI, the News Media Association, the Independent Society of Musicians, and several academic experts in intellectual property law.
What: The committee published HL Paper 267, AI, Copyright and the Creative Industries, recommending that the government reject a broad commercial text and data mining exception, mandate statutory transparency obligations for AI developers, close legal gaps around digital identity and style imitation, create conditions for a fair and inclusive licensing market, back technical standards for rights reservation and AI content labelling, and prioritise sovereign AI development.
When: The report was ordered to be printed on 24 February 2026 and published on 6 March 2026. It feeds into the government's forthcoming publications under the Data (Use and Access) Act 2025, including an economic impact assessment and a report on copyright and AI training due by 19 March 2026.
Where: The report was produced by the House of Lords, Westminster, and addresses policy for the United Kingdom, with significant reference to developments in the European Union, United States, Japan and Singapore.
Why: The committee acted because the UK government's consultation on AI and copyright, launched in December 2024, produced clear public support for licensing-based solutions but has not yet resulted in a policy decision. The committee found that this delay is undermining trust, stalling investment in licensing infrastructure, and leaving individual creators without meaningful enforcement tools as generative AI systems that depend on human-created content become more powerful and commercially dominant.