Large Language Model sparks privacy concerns in the EU

The privacy rights advocacy group noyb has filed a complaint against OpenAI, the company behind the large language model chatbot ChatGPT, with the Austrian Data Protection Authority (DPA). The complaint centers on ChatGPT's tendency to generate inaccurate information about individuals and on OpenAI's apparent inability to correct it.

ChatGPT, launched in November 2022, quickly gained traction as a chatbot that responds to user prompts. noyb points to a key concern: by OpenAI's own account, ChatGPT generates responses largely by predicting the statistically most likely next words based on its training data. This raises doubts about whether the model can reliably provide factually accurate information. noyb therefore argues that generative AI such as ChatGPT is prone to "hallucination": it can fabricate answers without any verification.

Inaccurate information about individuals is particularly problematic under the European Union's General Data Protection Regulation (GDPR). The GDPR requires personal data to be accurate (Article 5(1)(d)) and gives individuals the right to have inaccurate data rectified (Article 16) or erased (Article 17). In addition, Article 15 grants individuals the right to access the data a company holds about them, including any available information about its source.

Maartje de Graaf, a data protection lawyer at noyb, stresses how serious it is for ChatGPT to generate false information about individuals and points to a conflict between the limits of current technology and legal requirements. De Graaf argues that technology must follow legal requirements, not the other way around, and that systems unable to produce accurate and transparent results should not be used to process personal data.

The complaint is built around the case of the complainant, a public figure. When asked for the complainant's date of birth, ChatGPT consistently gave incorrect answers instead of stating that it lacked the information. OpenAI reportedly refused to rectify or erase the inaccurate data, saying it was unable to correct information within the model. OpenAI does offer to filter or block data for specific prompts, such as the complainant's name, but this approach could prevent ChatGPT from showing any information about the individual at all.

The complaint also criticizes OpenAI's handling of access requests under the GDPR. According to noyb, when the complainant submitted an access request, OpenAI failed to disclose what data it processed about the complainant, where that data came from, or who received it. De Graaf stresses that companies are legally required to comply with access requests and should keep records of their training data so that its sources can be identified.

The rise of ChatGPT has drawn increasing scrutiny from European privacy watchdogs. In March 2023, the Italian DPA temporarily restricted ChatGPT's processing of personal data, citing concerns that included the accuracy of its output. The European Data Protection Board (EDPB) also set up a task force dedicated to ChatGPT to coordinate efforts among national authorities.

noyb's complaint asks the Austrian DPA to investigate OpenAI's data processing and the measures it takes to ensure that personal data in its large language models is accurate. It further asks the DPA to order OpenAI to comply with the complainant's access request and to bring its processing into line with the GDPR. Finally, noyb requests that the DPA impose a fine to ensure future compliance. Given the EU's framework for cooperation between data protection authorities, the case is likely to involve several national DPAs.

This case raises significant questions regarding the development and deployment of large language models within the legal landscape of the EU. The outcome of the complaint could influence how developers address data privacy concerns and ensure the responsible use of generative AI technologies.