European Data Protection Board scrutinizes data practices of ChatGPT
The European Data Protection Board (EDPB) this week published a report by its ChatGPT taskforce. This taskforce was established to investigate the data collection and processing practices of ChatGPT, a large language model (LLM) chatbot developed by OpenAI.
LLMs are a type of artificial intelligence (AI) trained on massive amounts of text data. This training allows them to generate text, translate languages, write various kinds of creative content, and answer questions in an informative way. However, the vast amount of data required to train these models raises concerns about data privacy and user rights.
The EDPB report focuses on several key areas regarding data protection and ChatGPT:
Data Collection Practices: The report examines the legality of how OpenAI collects data to train ChatGPT, specifically focusing on a technique called "web scraping." Web scraping involves automatically extracting data from websites. The EDPB is concerned about whether this practice complies with data protection regulations, especially when it involves personal data.
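To make the technique concrete, the sketch below shows the core of web scraping: parsing fetched HTML and extracting its visible text. This is a minimal illustration using Python's standard library only, not a depiction of OpenAI's actual data pipeline; the class and function names are hypothetical.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML page, skipping script/style content.
    This is the essence of web scraping: turning markup into extractable data."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # nesting depth inside <script>/<style> tags

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def scrape_text(html: str) -> list[str]:
    """Return the visible text fragments of an HTML document."""
    parser = TextExtractor()
    parser.feed(html)
    return parser.chunks

page = "<html><body><p>Hello</p><script>x=1;</script><p>World</p></body></html>"
print(scrape_text(page))  # → ['Hello', 'World']
```

A real scraper would fetch pages over HTTP and crawl links at scale; the privacy question the EDPB raises is precisely that text extracted this way may contain personal data published for an entirely different purpose.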
Transparency and User Rights: The EDPB emphasizes the importance of transparency for users interacting with ChatGPT. The report highlights the need for users to understand the probabilistic nature of the chatbot's responses and the potential for bias or factual inaccuracies in the generated text.
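The "probabilistic nature" of an LLM's responses can be illustrated with the standard decoding step: the model produces raw scores (logits) for each candidate token, and the next token is sampled from a temperature-scaled softmax distribution rather than chosen deterministically. The sketch below, with assumed toy values, shows why two identical prompts can yield different answers; it is a simplified model of the mechanism, not OpenAI's implementation.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, rng=None):
    """Sample a token index from raw model scores using temperature-scaled
    softmax -- the probabilistic step behind varying chatbot outputs."""
    rng = rng or random.Random()
    # Scale logits by temperature: lower values sharpen the distribution.
    scaled = [score / temperature for score in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw from the resulting distribution.
    draw = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if draw < cumulative:
            return index
    return len(probs) - 1

# Toy logits for three candidate tokens (assumed values for illustration).
logits = [2.0, 1.0, 0.1]
samples = [sample_next_token(logits, rng=random.Random(seed)) for seed in range(10)]
```

Because the output is a sample rather than a lookup, fluent text can still be biased or factually wrong, which is exactly why the EDPB stresses that users must understand this behavior.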
Data Security and Accuracy: The EDPB stresses the importance of data security measures to protect the data used to train ChatGPT from unauthorized access or breaches. Additionally, the report emphasizes the need for measures to ensure the accuracy of the data used to train the model, as this can directly impact the quality and reliability of the outputs generated by ChatGPT.
Exercising User Rights: The EDPB underscores the importance of users being able to effectively exercise their data protection rights regarding the data used to train ChatGPT. This includes rights to access, rectify, erase, and restrict processing of their data. The report includes a sample questionnaire developed by the taskforce as a potential tool for data protection authorities (DPAs) to gather information from OpenAI.
It's important to note that the EDPB report is part of an ongoing investigation into ChatGPT. The report itself does not make any conclusive determinations about legal compliance. Instead, it highlights potential areas of concern and suggests avenues for further investigation by individual DPAs within the European Union.
Impact on the development and use of LLMs
The EDPB's report on ChatGPT is a significant development for the field of large language models. It signals the growing focus of data protection authorities on the data practices of AI developers.
The findings of the report could affect the development and use of LLMs in several ways. Developers may need to adopt more transparent data collection practices and ensure compliance with data protection regulations. Additionally, stronger data security measures and tools for users to exercise their data rights may become standard practice in the LLM industry.
The EDPB report is just one example of how data protection authorities are grappling with the challenges posed by new and emerging technologies like LLMs. As AI continues to develop, it is likely that data protection regulations will need to evolve to address the specific risks associated with these technologies.