FTC Warns: Hashed data not anonymous, companies risk deceptive practice claims

The Federal Trade Commission (FTC) published a blog post on July 24, 2024, titled "No, hashing still doesn't make your data anonymous," reaffirming its stance on data anonymization and hashing practices. The post is a stark warning to companies that claim hashed personal information is anonymized. The FTC's message is clear: such claims may be considered deceptive, and the commission will take action against companies that misrepresent their data handling practices.

According to the FTC's blog post, the commission routinely evaluates companies' privacy representations against their actual data handling practices. When discrepancies arise, incorrect assertions about data identification are often to blame. The FTC emphasizes that data is only truly anonymous when it can never be associated back to an individual. If data can be used to uniquely identify or target a user, it can still cause harm to that person, regardless of the form it takes.

Hashing, which is at the center of this warning, is a mathematical process that transforms data of any length into a fixed-size string of characters. For example, a phone number like "123-456-7890" might be hashed into "2813448ce6316cb70b38fa29c8c64130". While the hashed version appears meaningless and difficult to reverse, the FTC argues that it still serves as a unique identifier that can be used to track individuals over time.
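As a rough illustration of the fixed-size property, the sketch below hashes two identifiers of different lengths with Python's standard hashlib module. The choice of SHA-256 and the helper name hash_identifier are assumptions made for this example; the FTC post does not say which algorithm produced the digest quoted above, and the sample email address is invented.

```python
import hashlib


def hash_identifier(value: str) -> str:
    """Return the SHA-256 hex digest of a UTF-8 encoded identifier.

    SHA-256 is an assumption for illustration, not the algorithm
    behind the digest quoted in the article.
    """
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


phone_digest = hash_identifier("123-456-7890")
email_digest = hash_identifier("a.much.longer.address@example.com")  # invented sample

# Inputs of different lengths yield digests of the same length (64 hex characters).
print(len(phone_digest), len(email_digest))

# Hashing the same input again yields the same digest, which is what
# lets the digest stand in for the phone number as an identifier.
print(phone_digest == hash_identifier("123-456-7890"))
```

The digest looks nothing like the phone number, but it is just as stable: every party that hashes the same number the same way ends up holding the same value.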

This warning is not new. The FTC references a 2012 blog post by former Chief Technologist Ed Felten, which first addressed the misconception that hashing renders data anonymous. Despite this long-standing guidance, some companies have continued to rely on hashing as a means of data protection, leading to several FTC enforcement actions over the years.

The timing of this renewed warning is significant, as it comes amid growing concerns about data privacy and the increasing sophistication of data tracking and analysis techniques. The advertising technology industry, in particular, has been using hashed identifiers as part of various user ID modules and audience matching processes. Popular platforms like Google and Amazon have their own hashing schemes for customer matching and audience targeting.

One of the key issues highlighted by the FTC is the persistent nature of hashed identifiers. Even though a hashed email address or phone number may not be immediately recognizable, it still creates a unique signature that can be used to track a person or device over time. This capability undermines claims of anonymity and raises significant privacy concerns.

The FTC's blog post cites several past enforcement actions to illustrate its point. In 2015, the commission brought a case against Nomi Technologies for surveilling consumers within stores using hashed MAC addresses. The complaint emphasized that while hashing obfuscated the MAC address, the result was still a persistent unique identifier.

Another case mentioned is the 2022 action against BetterHelp, an online counseling service. The FTC alleged that BetterHelp shared consumers' sensitive health data, including hashed email addresses, with Facebook. The complaint noted that Facebook could "undo the hashing" to reveal the email addresses, effectively negating any privacy protection the hashing was supposed to provide.

More recent cases have focused on other forms of persistent identifiers. In 2023, the FTC filed a complaint against Premom for collecting and sharing users' unique advertising and device identifiers with third parties. Similarly, in January 2024, the commission took action against InMarket for allegedly collecting data associated with unique mobile device identifiers without proper consent.

These cases underscore the FTC's position that the method of identification is less important than the fact that users can be identified and tracked. Whether through hashed email addresses, advertising IDs, or device identifiers, the ability to persistently recognize and track individuals raises significant privacy concerns.

The technical aspects of hashing are worth exploring to understand why the FTC takes this stance. A cryptographic hash is designed as a one-way function: given only the output, it should be computationally infeasible to work backward to the original input. However, this irreversibility doesn't prevent the hash from being used as an identifier. Hashing is deterministic, so the same piece of information (like an email address) always produces the same hash. This consistency allows user activity to be tracked and correlated across time and platforms.
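A minimal sketch of that correlation follows. It assumes two hypothetical data holders (a store and an app) that each hash email addresses with SHA-256 after trimming and lowercasing them; the records, function names, and normalization step are all invented for illustration and do not describe any particular company's scheme.

```python
import hashlib


def hash_email(email: str) -> str:
    """Hash an email after trimming and lowercasing (assumed normalization)."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def hashed_log(records):
    """Replace the raw email in each (email, event) pair with its hash."""
    return [(hash_email(email), event) for email, event in records]


def link_by_hash(*logs):
    """Group events from any number of hashed logs by their shared hash."""
    profiles = {}
    for log in logs:
        for hashed_id, event in log:
            profiles.setdefault(hashed_id, []).append(event)
    return profiles


# Invented records held by two separate, hypothetical parties.
store_visits = [("Alice@example.com", "bought running shoes"),
                ("bob@example.com", "returned a jacket")]
app_events = [(" alice@example.com ", "opened fitness app"),
              ("carol@example.com", "created account")]

profiles = link_by_hash(hashed_log(store_visits), hashed_log(app_events))

# The two parties never exchange a raw email, yet Alice's activity
# from both sources lands under a single hashed key.
print(profiles[hash_email("alice@example.com")])
```

Neither log contains a readable email address, but joining them on the hash still produces a combined profile per person, which is the behavior the FTC describes as persistent identification.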

Moreover, while a single hash cannot easily be reversed by computation alone, the input can often be deduced by guessing, especially when it comes from a small or predictable space, as email addresses and phone numbers do. Anyone who knows the hashing scheme can hash every candidate input in advance and store the results in a lookup table; a "rainbow table" is a space-efficient variant of this kind of precomputation. For example, a table could be built for all email addresses following common patterns, allowing a hash to be looked up quickly to find the corresponding email address.
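The precomputation idea can be sketched with a plain dictionary. This is a simple lookup table, not a true rainbow table, which uses chains of hashes to trade lookup time for storage. The example assumes unsalted SHA-256 and restricts the candidate space to the 10,000 numbers sharing the made-up prefix "123-456-" so that it runs instantly; both choices are assumptions for illustration.

```python
import hashlib


def sha256_hex(value: str) -> str:
    """Return the unsalted SHA-256 hex digest of a string (assumed scheme)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def build_lookup_table(prefix: str) -> dict:
    """Precompute hash -> phone number for every 4-digit suffix under a prefix."""
    table = {}
    for suffix in range(10_000):
        number = f"{prefix}{suffix:04d}"
        table[sha256_hex(number)] = number
    return table


# Built once, reusable against any number of leaked or shared hashes.
table = build_lookup_table("123-456-")

# A hash received from some third party; the raw number is not shared.
observed_hash = sha256_hex("123-456-7890")

# A single dictionary lookup recovers the original input.
recovered = table.get(observed_hash)
print(recovered)  # prints "123-456-7890"
```

The full space of ten-digit phone numbers is ten billion candidates, which is large for a toy script but well within reach of ordinary hardware, so a small input space offers little protection once the hashing scheme is known.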

The advertising technology industry has been particularly reliant on hashed identifiers as a way to balance personalized advertising with privacy concerns. Many ad tech companies have implemented user ID modules that use hashed email addresses or other personal information as a basis for tracking and targeting. The FTC's warning suggests that these practices may come under increased scrutiny.

For consumers, the FTC's stance highlights the complexity of data privacy in the digital age. Many people might assume that if their personal information is "hashed" or otherwise transformed, it's no longer linked to them. The FTC's message is clear: this is not the case. Consumers should be aware that even data that doesn't look like personal information can still be used to track and target them.

The implications of this warning for the tech industry are significant. Companies that have relied on hashing as a privacy-preserving measure may need to reevaluate their practices and the claims they make about data anonymization. This could lead to substantial changes in how user data is handled, stored, and shared across the digital ecosystem.

The FTC's blog post also raises questions about the future of targeted advertising and personalized services. If hashed identifiers are not considered anonymous, what alternatives are available for companies that want to provide personalized experiences while respecting user privacy? This challenge may spur innovation in privacy-preserving technologies and could potentially accelerate the development of truly anonymous data handling methods.

From a regulatory perspective, the FTC's warning signals a continued focus on data privacy and the potential for increased enforcement actions. Companies that handle user data will need to be increasingly cautious about the claims they make regarding data anonymization and privacy protection.

The issue of data anonymization extends beyond just the tech industry. As more sectors of the economy become data-driven, from healthcare to finance to retail, the ability to properly anonymize data while maintaining its utility becomes increasingly important. The FTC's stance on hashing may have ripple effects across various industries that rely on data analysis and sharing.

In conclusion, the FTC's warning about hashed data and anonymity is a reminder of how complex data privacy has become. As technology evolves, so too must our understanding of what constitutes truly anonymous data. Companies handling user information must be vigilant in their practices and honest in their representations, or risk facing regulatory action. For consumers, the warning underscores the importance of understanding how their data is being used and the limitations of current privacy-preserving techniques. The balance between data utility and privacy protection remains a critical challenge for technologists, policymakers, and consumers alike.

Key facts

The FTC issued a warning on July 24, 2024, stating that hashing does not make data anonymous.

The warning reaffirms a stance first taken by the FTC in 2012.

Hashing is a mathematical process that transforms data into a fixed-size string of characters.

Even hashed data can serve as a unique identifier to track individuals over time.

The FTC has taken enforcement actions against companies for misrepresenting hashed data as anonymous.

Recent cases have focused on various forms of persistent identifiers, including advertising IDs and device identifiers.

The advertising technology industry has been particularly reliant on hashed identifiers for user tracking and targeting.

The FTC's warning signals potential increased scrutiny of data handling practices across industries.

Companies may need to reevaluate their data anonymization claims and practices in light of this warning.

The issue of data anonymization extends beyond the tech industry and affects various sectors of the economy.