Four Canadian privacy regulators today published the findings of a joint investigation into OpenAI OpCo, LLC, concluding that the company violated federal and provincial privacy laws when it developed and deployed its ChatGPT service. The 128-page report, released on May 6, 2026, under citation 2026 BCIPC 41, covers seven distinct legal issues spanning data collection, consent, accuracy, transparency, and data retention - areas that sit at the heart of how AI companies build their products.

The investigation was jointly conducted by the Office of the Privacy Commissioner of Canada (OPC), the Commission d'accès à l'information du Québec (CAI), the Office of the Information and Privacy Commissioner for British Columbia (OIPC-BC), and the Office of the Information and Privacy Commissioner of Alberta (OIPC-AB). The four offices examined whether OpenAI's practices, specifically relating to the GPT-3.5 and GPT-4 models that powered ChatGPT at the time the investigation launched, were consistent with Canada's Personal Information Protection and Electronic Documents Act (PIPEDA), Quebec's Act Respecting the Protection of Personal Information in the Private Sector, British Columbia's Personal Information Protection Act (PIPA-BC), and Alberta's Personal Information Protection Act (PIPA-AB).

What set off the investigation

The process began in April 2023, when the OPC launched an inquiry following a complaint that alleged OpenAI had collected, used, and disclosed a complainant's personal information without consent. A month later, in May 2023, the OPC decided to close that initial complaint and instead initiated a broader investigation under section 11(2) of PIPEDA. The three provincial offices - the CAI, the OIPC-BC, and the OIPC-AB - joined in parallel, opening investigations under their respective statutes. The offices chose to work jointly to avoid duplicating effort and to pool expertise.

According to the report, investigators conducted extensive written exchanges with OpenAI through its legal counsel, held interviews in person at the company's San Francisco headquarters, virtually, and at OPC offices in Gatineau, and ran their own hands-on testing of ChatGPT versions 3.5 and 4 inside the OPC's technical laboratory between November 2023 and May 2024. That testing included examining the account creation process, the data export tool, accuracy notifications, and general user interactions.

The scale of OpenAI's operation

The report provides detailed financial context. According to the report, as of March 31, 2025, the OpenAI group of companies was valued at US $300 billion. In October 2025, several media outlets reported a valuation of US $500 billion following a secondary share sale. More recently, OpenAI announced US $110 billion in new investment at a US $730 billion pre-money valuation. The company also completed the recapitalization of its for-profit entity on October 28, 2025, converting it into a public benefit corporation called OpenAI Group PBC. According to the report, as of January 2024, there were several million monthly active ChatGPT users in Canada, with several hundred thousand paid subscribers.

How the training data was assembled

The technical section of the report explains how OpenAI built the datasets that trained GPT-3.5 and GPT-4. According to the report, the pre-training datasets include trillions of tokens drawn from two main sources: publicly accessible websites crawled by OpenAI's GPTBot tool, and public repositories such as Common Crawl. The report notes that GPTBot was launched only in August 2023. That timing matters: when OpenAI collected the majority of the publicly accessible information used to train GPT-3.5 and GPT-4, the opt-out mechanism via robots.txt was not yet available.
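
The robots.txt opt-out the report refers to works by listing GPTBot's user-agent token in a site's robots.txt file. As a minimal sketch - the directives shown are an illustrative configuration a site owner might publish, not content from the report - Python's standard library can verify how such a file is interpreted:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that blocks OpenAI's GPTBot crawler
# while allowing all other crawlers.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# GPTBot is barred from the whole site; other crawlers are not.
print(parser.can_fetch("GPTBot", "https://example.com/article"))        # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/article"))  # True
```

Because this mechanism only instructs crawlers going forward, it could not retroactively affect data already collected before GPTBot's August 2023 launch - the gap the report highlights.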

Training proceeds in two broad stages. Pre-training exposes the model to the vast unstructured dataset to develop general language capabilities. Fine-tuning then refines the model using a smaller, curated subset of data including human trainer-generated conversations and a portion of user interactions. According to the report, OpenAI confirmed that it did not screen out social media websites, websites aimed at children, or websites that may contain sensitive information from its pre-training data. Investigators found that the resulting datasets inevitably contained personal information of varying sensitivity levels - health conditions, political views, and information about children among them.

To reduce risk, OpenAI described several mitigation measures applied during pre-training: tokenization, which converts text into numerical representations; deduplication, which removes repeated content; and a masking process that detects personal identifiers such as phone numbers, email addresses, home addresses, and private individuals' names and social media handles. According to the report, OpenAI developed an internal tool to detect and mask additional categories of personal information prior to training. During fine-tuning, user interactions are disassociated from user accounts, filtered with a third-party tool, and reviewed by contractors instructed to exclude personal content.
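
OpenAI's actual masking tool is internal and undisclosed, but the general pattern-based approach the report describes can be sketched. Everything below - the regular expressions, the `[MASKED_*]` placeholder format, and the function name - is an assumption for illustration only:

```python
import re

# Illustrative detection patterns for two identifier categories the
# report mentions (phone numbers and email addresses). Real systems
# would use far more robust detection than these simple regexes.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def mask_identifiers(text: str) -> str:
    """Replace detected personal identifiers with category placeholders."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[MASKED_{label}]", text)
    return text

sample = "Contact Jane at jane.doe@example.com or +1 604 555 0199."
print(mask_identifiers(sample))
# → Contact Jane at [MASKED_EMAIL] or [MASKED_PHONE].
```

The trade-off such masking implies, and which the report's findings on residual personal information illustrate, is that pattern-based detection inevitably misses identifiers that do not match known formats.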

The consent findings

The central legal finding on consent splits into two streams: data collected from publicly accessible websites and licensed datasets, and data collected from users' direct interactions with ChatGPT.

On the first stream, investigators found that OpenAI did not have implied consent for its collection and use of personal information from public websites and licensed third-party sources to train GPT-3.5 and GPT-4. According to the report, OpenAI argued it relied on individuals' implied consent. The OPC, OIPC-AB, and OIPC-BC rejected this position. The OIPC-BC and OIPC-AB went further, concluding that OpenAI's models are based on scraped data for which OpenAI has not obtained, and cannot obtain, consent under their respective statutes.

The second stream concerns users' interactions with ChatGPT. Investigators found that OpenAI should have obtained express consent - not just implied consent - for using personal information from user conversations to fine-tune GPT-3.5 and GPT-4. The reasoning: the practice involved sensitive personal information and was outside what users could reasonably expect. According to the report, OpenAI displayed a notice only once, at account creation, stating: "Don't share sensitive info. Chats may be reviewed and used to train our models." The report found this one-time notification insufficient.

A specific design issue attracted separate attention. According to the report, until April 2024, users who wished to opt out of having their interaction data used for training had to simultaneously disable their chat history. The report describes this as a "deceptive design pattern" of "forced action" - requiring users to accept data training use in order to keep access to their chat history. OpenAI later decoupled these controls.

There was also an issue with mobile users who accessed ChatGPT without an account. According to the report, the opt-out option did not consistently appear in signed-out settings on mobile web for some time, and OpenAI confirmed this problem was fixed on November 4, 2025.

Accuracy and hallucinations

Issue 4 of the investigation examined whether OpenAI took reasonable steps to ensure that personal information generated by ChatGPT was accurate, complete, and up to date. The report concluded that OpenAI did not take such steps during the development and deployment of GPT-3.5 and GPT-4.

According to the report, investigators quoted a statement attributed to an OpenAI co-founder regarding ChatGPT's launch in November 2022: "Our biggest concern was around factuality, because the model likes to fabricate things. But (...) other large language models are already out there, so we thought that as long as ChatGPT is better than those in terms of factuality and other issues of safety, it should be good to go. Before launch we confirmed that the models did seem a bit more factual and safe than other models, according to our limited evaluations, so we decided to go ahead with the release." The report notes that OpenAI did not dispute this statement was made.

ChatGPT responses are not drawn from a database of facts that can be easily corrected. According to the report, they are based on a vast number of statistical correlations between words, which can in certain circumstances yield inaccurate personal information. The report calls for a qualified independent third-party assessment to establish the general level of accuracy of personal information in ChatGPT outputs, along with more prominent and explicit disclaimers within the interface. Investigators also recommend that OpenAI systematically provide source links for personal information in model outputs, and highlight facts for which no source is available.

In a development since the investigation began, OpenAI described a new measure: using web search capabilities in real time to retrieve up-to-date publicly accessible information before responding to prompts about individuals. According to the report, the OPC accepted this reduces the risk of inaccurate personal information appearing in outputs, and found the complaint on this point conditionally resolved under PIPEDA.

Access, deletion, and data retention

Investigators found OpenAI did not provide individuals with adequate mechanisms to access, correct, and delete their personal information. When someone requests deletion of their data, OpenAI explained, reversing a model's training is not technically feasible. According to the report, models are trained through repeated adjustments of billions of weights over successive runs, and the influence of any given data point is not preserved in isolation but diffused across the model's weights and compounded by subsequent adjustments. As a practical alternative, OpenAI uses blocklists to prevent verified personal information from appearing in outputs, and filters the same information from future training runs.
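
The blocklist approach described in the report - suppressing verified personal information from outputs rather than removing it from model weights - can be sketched as a simple post-processing filter. The function name, the redaction placeholder, and the exact-string matching are all illustrative assumptions; a production system would match far more loosely and likely regenerate the response:

```python
import re

def filter_output(text: str, blocklist: set[str]) -> str:
    """Redact any blocklisted string from a model response (sketch)."""
    for entry in blocklist:
        # Case-insensitive literal match; real systems would handle
        # variants, misspellings, and partial identifiers.
        text = re.sub(re.escape(entry), "[REDACTED]", text, flags=re.IGNORECASE)
    return text

print(filter_output("Jane Doe lives in Vancouver.", {"Jane Doe"}))
# → [REDACTED] lives in Vancouver.
```

The limitation this sketch makes visible is the one the report implies: the underlying information remains encoded in the model; only its appearance in outputs is suppressed.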

On retention, the investigation found that OpenAI did not have a formal retention and deletion policy when it developed and deployed GPT-3.5 and GPT-4. According to the report, while OpenAI had specific retention periods for some data categories - such as deleting user account data within 30 days of an account deletion request, and storing de-identified conversation data used for training for up to three years - it had no retention schedule for the raw, unstructured data collected from publicly accessible websites. That data is stored "as long as necessary to train successive iterations of OpenAI's models," in OpenAI's description. The report notes that raw datasets may include web pages removed from the internet, pirated content, content from adult websites, and other material that would subsequently be filtered out.

Accountability failures

Issue 7 of the report addresses accountability more broadly. The four offices found that OpenAI launched ChatGPT without having established the level of accuracy of personal information in its outputs, without having developed a retention policy for personal information, and without having fully addressed known privacy risks. According to the report, this failure exposed individuals to risks including privacy breaches, inaccuracy of information, and discrimination.

The report acknowledges that OpenAI has since created a Governance Risk and Compliance team, designated an external Privacy Officer, established security procedures, and conducted red-teaming using its internal and external expert network - the Red Teaming Network, announced in September 2023. These are recognized as positive steps, but the report concludes they came after, not before, deployment.

OpenAI's response and commitments

OpenAI disagreed with the findings. According to the report, the company asserted that it was compliant with the Acts in most respects through existing practices and communications. Nonetheless, it engaged with the process and committed to several additional measures.

Concurrently with today's publication, OpenAI committed to publishing a Canadian blog post explaining its privacy practices and taking steps to promote it. Within three months, it will expand a key article on model development to include a plain-language explanation of training data sources, and will add a notice to the signed-out ChatGPT web experience - before the user inputs their first prompt - stating that chats may be reviewed and used for training. Within six months, it will provide personal information in a more accessible format through its data export tool, finalize its retention controls for deprecated datasets, test protections for minor family members of public figures, and provide quarterly reports to the four offices.

According to the report, OpenAI also confirmed that GPT-3.5 and GPT-4 have been deprecated - retired - and that the new mitigation measures, including the internal filtering tool, were applied throughout the development of the current models powering ChatGPT.

Different outcomes across jurisdictions

The four authorities reached different formal conclusions. The OPC found the complaint well-founded and conditionally resolved under PIPEDA, accepting OpenAI's commitments as sufficient to address the identified contraventions. The OIPC-BC and OIPC-AB took a stricter position: their statutes contain more specific requirements on consent, and they found that OpenAI's models are based on scraped data for which consent was not, and cannot be, obtained under those laws. Both offices joined the joint recommendations and will monitor implementation, but did not resolve the matter as completely as the OPC did. The CAI considered issues related to appropriate purposes, individual rights, and accountability as well-founded and conditionally resolved, but found the consent and retention issues under Quebec law well-founded and unresolved, and issued additional province-specific recommendations.

Context for the marketing and ad tech sector

The report lands at a moment when OpenAI is simultaneously facing pressure from multiple directions on privacy. OpenAI updated its U.S. privacy policy on April 30, 2026 to formalize advertiser data-sharing and user targeting arrangements - a set of structural changes that arrived as the company prepares for a public offering. The GDPR's AI training legal battle, documented by PPC Land in March 2026, maps 19 regulatory guidelines and enforcement actions across jurisdictions, revealing that surface consensus among data protection authorities masks deep disagreement on legal bases. European regulators clarified privacy rules for AI models in December 2024, establishing that AI models trained on personal data cannot automatically be considered anonymous. And Italy's Garante imposed a 15 million euro fine on OpenAI at the close of 2024, alongside mandated public awareness campaigns.

For marketing professionals relying on ChatGPT for campaign planning, competitive research, or content production, the Canadian report underscores two specific risks. First, studies have already documented that one in five AI responses for PPC strategy contain inaccuracies, with ChatGPT generating incorrect answers 22% of the time in one WordStream test. The Canadian investigation now adds formal regulatory weight to the accuracy concern, finding that OpenAI deployed ChatGPT without having assessed the accuracy of its personal information outputs. Second, the finding on consent for data use in model training is directly relevant to enterprise users whose conversations with ChatGPT may contain client data, campaign strategies, or other sensitive business information.

OpenAI must also still turn over 20 million anonymized ChatGPT conversations to The New York Times under a federal court order from November 2025, adding to a picture of mounting legal exposure for the company's data practices.

The Offices stated their expectation that OpenAI will employ a privacy-by-design approach to refining its current product line and developing future services. According to the report, "while this report aimed at addressing and mitigating the risk to privacy associated with the development and deployment of LLMs, we recognize that this technology raises many other questions and challenges, including societal and ethical ones, which regulators, academics and courts around the world are currently trying to assess and address."

Timeline

  • November 2022 - OpenAI releases ChatGPT, powered by GPT-3.5, without a formal retention policy for personal data and without an accuracy assessment for model outputs
  • April 2023 - The Office of the Privacy Commissioner of Canada launches an investigation into OpenAI following a complaint about the use of personal information without consent
  • May 2023 - The OPC closes the initial complaint and initiates a broader investigation under PIPEDA; the CAI, OIPC-BC, and OIPC-AB join to form a joint investigation
  • August 2023 - OpenAI launches GPTBot, its publicly disclosed web crawler, giving website owners the first opportunity to opt out of scraping - after most GPT-3.5 and GPT-4 training data had already been collected
  • September 2023 - OpenAI announces the global Red Teaming Network, bringing in external experts across fields including privacy
  • November 2023 to May 2024 - The OPC conducts in-house testing of ChatGPT versions 3.5 and 4 in its technical laboratory
  • April 2024 - OpenAI decouples the training opt-out from the chat history setting, allowing users to opt out without losing their conversation history; also makes GPT-3.5 available without an account
  • December 2024 - European Data Protection Board clarifies privacy rules for AI models, finding they cannot automatically be considered anonymous
  • December 2024 - Italy's Garante issues final decision against OpenAI imposing a 15 million euro fine, as covered in the GDPR's AI training legal battle report on PPC Land
  • October 28, 2025 - OpenAI completes recapitalization, converting its for-profit entity into a public benefit corporation called OpenAI Group PBC
  • November 4, 2025 - OpenAI confirms it fixed the issue that prevented signed-out mobile web users from accessing the training opt-out
  • November 13, 2025 - Federal judge denies OpenAI's request to pause production of 20 million ChatGPT conversation logs to The New York Times
  • February 27, 2026 - OpenAI announces US $110 billion in new investment at a US $730 billion pre-money valuation
  • April 30, 2026 - OpenAI updates its U.S. privacy policy to formalize advertiser data sharing and user targeting through marketing partners
  • May 6, 2026 - Four Canadian privacy regulators publish Investigation Report 26-03 on OpenAI OpCo, LLC, finding breaches of federal and provincial privacy laws across consent, accuracy, transparency, access, retention, and accountability; OpenAI commits to a series of corrective measures

Summary

Who: OpenAI OpCo, LLC, the operating subsidiary of OpenAI Inc. headquartered in San Francisco, California; and four Canadian privacy regulators: the Office of the Privacy Commissioner of Canada, the Commission d'accès à l'information du Québec, the Office of the Information and Privacy Commissioner for British Columbia, and the Office of the Information and Privacy Commissioner of Alberta.

What: A joint investigation finding that OpenAI violated federal and provincial Canadian privacy laws across seven issue areas in connection with the development and deployment of ChatGPT, specifically its GPT-3.5 and GPT-4 models. Violations cover consent for data collected from public websites and user interactions, transparency about model practices, accuracy of personal information in outputs, individual rights to access and correct information, data retention policy gaps, and accountability failures. OpenAI committed to a range of corrective measures including transparency improvements, access format upgrades, retention controls, and quarterly compliance reporting. GPT-3.5 and GPT-4 have been deprecated.

When: The investigation was initiated in May 2023 and the final report was published on May 6, 2026, under citation 2026 BCIPC 41. The investigation covered practices relating to ChatGPT at the time of the inquiry's launch, focusing on GPT-3.5 and GPT-4.

Where: OpenAI's operations under investigation relate to its treatment of Canadian users across all provinces. The investigation was conducted by regulators in Ottawa, Gatineau, Quebec City, Victoria (British Columbia), and Edmonton (Alberta), with evidence gathering including interviews at OpenAI's San Francisco headquarters.

Why: The complaint originated from a Canadian user alleging OpenAI collected personal information without consent. Regulators expanded the inquiry because of the significant privacy impact of generative AI on all Canadians. The findings matter beyond Canada because they document, for the first time through a formal multi-jurisdictional investigation in North America, the specific technical and legal gaps that existed when a major AI company built one of the most widely used AI products in history - and provide a compliance benchmark against which future AI deployments will be measured.
