Businesses understand that they must embed data protection at every stage of the data processing journey to comply with data protection laws. As you have explored how to implement this in your business, you may have encountered data hashing and wondered whether it could deliver the holy grail: anonymized data.
Privacy laws do not usually cover anonymized data, making it an appealing way to process data without being affected by privacy legislation. The question is, does data hashing truly deliver anonymized data?
To answer that, we need to explore how data hashing works and how it fits into the complex world of privacy law-compliant data processing.
- 1. Defining Data Hashing
- 1.1. How hashing is used
- 1.2. Is hashed data truly anonymous?
- 2. How Data Hashing Works
- 2.1. Defining salting and peppering
- 2.2. The major limitation of data hashing
- 2.3. False sense of security
- 2.4. How hashing compares with data encryption
- 3. Hashing and Data Privacy Laws
- 3.1. GDPR Perspective
- 3.2. CCPA Perspective
- 3.3. FTC Perspective
- 4. Complying with Data Privacy Regulations
- 4.1. Implement robust data protection measures
- 4.2. Security assessments
- 4.3. Privacy Policy
- 5. Summary
Defining Data Hashing
At its most basic level, data hashing involves taking a piece of information, such as a name, phone number, or Social Security number, and processing it using a hash function. A hash function is a mathematical function that takes a piece of data of any length and turns it into a fixed-length string of numbers and letters.
A key feature of data hashing is that the same input will always result in the same output. Conversely, even a small variation in the original data, such as a slightly different spelling, results in a completely different string of characters.
Because every hash has the same fixed length, hashed values are efficient to store and easy to verify. As the Cybersecurity & Infrastructure Security Agency (CISA) defines it, hashing is also a one-way function: the hash cannot be reversed to produce the original data. That does not mean hashed data is uncrackable, however.
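These properties can be checked with a minimal Python sketch built on the standard library's hashlib module. SHA-256 is an assumed choice for illustration, and the sample names are hypothetical.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical sample inputs, used only for illustration.
first = sha256_hex("Jon Smith")
repeat = sha256_hex("Jon Smith")
variant = sha256_hex("John Smith")           # one extra letter
long_input = sha256_hex("Jon Smith" * 1000)  # a much longer input

print(first == repeat)               # True: same input, same output
print(first == variant)              # False: a small change gives a different hash
print(len(first), len(long_input))   # 64 64: the length never depends on the input
```

Note that nothing in the sketch offers a way to turn a hash back into the name. The only operation available is to hash a candidate input and compare.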
How hashing is used
Hashing can be used whenever you need to encode sensitive data and store it in a form that is very difficult to decode. Two common use cases that impact compliance with data protection laws include:
- Password storage: Passwords should never be stored in plain text. Because hashing always produces the same output for the same input, a password entered by a user can be hashed and compared with the hash stored in the database. If the two match, the password is validated and the user gains access to the restricted area.
- Digital signatures: According to CISA, data hashing can also be used to create digital signatures, which validate the authenticity of documents, messages, and digital transactions. The original message is hashed, and the hash is encrypted using the sender's private key. The recipient decrypts the digital signature with the sender's public key and compares the resulting hash with a hash of the message received. Because an altered message would not return the same hash value, a match proves the message has not been changed.
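Both use cases rest on the same recompute-and-compare pattern, which can be sketched as follows. This is a simplified illustration, not a signature scheme: it compares hashes only and leaves out the private-key step that a real digital signature adds. The messages and function names are hypothetical.

```python
import hashlib
import hmac


def digest(message: bytes) -> bytes:
    """Hash the message with SHA-256 (an assumed choice of algorithm)."""
    return hashlib.sha256(message).digest()


def is_unaltered(message: bytes, expected_digest: bytes) -> bool:
    """Recompute the hash and compare it with the one that was sent."""
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(digest(message), expected_digest)


original = b"Pay 100 USD to account 12345"   # hypothetical message
sent_digest = digest(original)

tampered = b"Pay 900 USD to account 12345"   # one character changed

print(is_unaltered(original, sent_digest))   # True
print(is_unaltered(tampered, sent_digest))   # False
```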
Is hashed data truly anonymous?
In a word, no. And this has big implications for data privacy.
The Federal Trade Commission (FTC) has long maintained a firm stance that hashing data does not make it anonymous. While the FTC acknowledges that it is very hard to guess the original data from a hash, it considers hashed data to be obscured, not anonymous.
In a July 2024 post, the FTC specifically called out companies that "claim that hashing allows them to preserve user privacy." While hashing can be part of the solution, it is not a silver bullet.
How Data Hashing Works
As mentioned earlier, data hashing uses an algorithm to transform meaningful data into a string of meaningless characters. According to Science Direct, several hashing algorithms are in common use, each suited to different use cases.
Flaws have been discovered in older algorithms such as LANMAN and MD5, so they are no longer as common. SHA-2, the suite of algorithms that succeeded SHA-1, is now widely used, and its SHA-256 variant is one of the most commonly used data-hashing algorithms in the world. As NordVPN notes, that's because it is so difficult to reverse engineer the original input from the hashed output.
Defining salting and peppering
Salting takes hashing to the next level. It involves adding random data (salt) to the input before it is hashed to make it harder to crack. When data is salted, even if many users have the same password, each hash would be unique.
Peppering takes this a step further. A secret value (pepper) is added to the original data before it is hashed. The pepper is stored separately, acting as a key needed to rehash and verify the data the next time it is inputted.
Salting and peppering can strengthen hashed data against attack, but they do not eliminate risk entirely.
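A minimal sketch of salting and peppering for password storage, using only the Python standard library. Several choices here are assumptions, not something the sources above prescribe: PBKDF2 as the slow hash, HMAC as the way to mix in the pepper, the iteration count, and every name in the code. In a real system the pepper would be loaded from a secrets manager or similar store kept apart from the database.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumption: a placeholder pepper. Never hard-code a real one in source code.
PEPPER = b"example-pepper-kept-outside-the-database"
ITERATIONS = 100_000  # illustrative work factor; tune for your own hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, hash). A fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    # Mix the secret pepper into the password before the salted hash.
    peppered = hmac.new(PEPPER, password.encode("utf-8"), hashlib.sha256).digest()
    derived = hashlib.pbkdf2_hmac("sha256", peppered, salt, ITERATIONS)
    return salt, derived


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Rehash the attempt with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored)


# Two users pick the same (hypothetical) password.
salt_a, hash_a = hash_password("correct horse")
salt_b, hash_b = hash_password("correct horse")

print(hash_a == hash_b)                                   # False: the salts differ
print(verify_password("correct horse", salt_a, hash_a))   # True
print(verify_password("wrong guess", salt_a, hash_a))     # False
```

The salt is stored next to the hash because it is needed for verification. The pepper is not, so a stolen database alone is not enough to start guessing.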
The major limitation of data hashing
The fundamental limitation of data hashing is that it does not make data anonymous. Truly anonymized data cannot be traced back to an individual. Hashed data can: because the same input always produces the same output, anyone who has access to the original inputs can hash them and match the results against the hashed data.
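To illustrate, here is a hedged sketch of how a party that already holds a list of email addresses could match it against a supposedly anonymous hashed dataset. All addresses, records, and names are made up, and the normalization step (trimming and lowercasing) is an assumption about how the data was prepared.

```python
import hashlib


def hash_email(email: str) -> str:
    """Normalize an email address and hash it with SHA-256."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


# Hypothetical dataset shared by one company, keyed by hashed email.
shared_records = {
    hash_email("alice@example.com"): {"condition": "asthma"},
    hash_email("bob@example.com"): {"condition": "diabetes"},
}

# Hypothetical customer list held by a different company.
known_emails = ["carol@example.com", "Alice@Example.com "]

# Hash each known address and look for it in the shared data.
reidentified = {
    email.strip().lower(): shared_records[hash_email(email)]
    for email in known_emails
    if hash_email(email) in shared_records
}

print(reidentified)  # {'alice@example.com': {'condition': 'asthma'}}
```

No hash was reversed here. The second company simply hashed what it already knew and found a match.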
The FTC highlights the example of Social Security numbers (SSNs). Each person's SSN is unique, but because SSNs are made up of nine digits, there is a finite number of them: 1 billion possible combinations, to be precise. This means that hashed SSNs are susceptible to a brute-force attack.
In this context, a brute-force attack is an attempt by a bad actor to decode hashed information by running all the possible inputs through a hashing algorithm and looking for matches. Rainbow tables, pre-made sets of hashes that hackers use to look up the passwords behind them, are another way hashed data can be compromised.
Returning to the SSN example, someone could run all 1 billion potential SSNs through a hashing algorithm and look for matches to decode your hashed list. Think that would take a long time? The FTC warned that it would take your laptop less time to crack the list than it takes you to make a cup of coffee. And that was back in 2012.
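The attack can be sketched in a few lines of Python. To keep the demo quick, this version uses hypothetical six-digit identifiers instead of nine-digit SSNs, and it assumes unsalted SHA-256. The real case differs only in the size of the loop.

```python
import hashlib
from typing import Dict, Iterable


def sha256_hex(value: str) -> str:
    """Hash a string of digits with SHA-256."""
    return hashlib.sha256(value.encode("ascii")).hexdigest()


def brute_force(target_hashes: Iterable[str], digits: int) -> Dict[str, str]:
    """Hash every number with the given digit count; return {hash: original} for matches."""
    remaining = set(target_hashes)
    found: Dict[str, str] = {}
    for n in range(10 ** digits):
        candidate = f"{n:0{digits}d}"   # zero-padded, e.g. "000042"
        h = sha256_hex(candidate)
        if h in remaining:
            found[h] = candidate
            remaining.discard(h)
            if not remaining:           # stop once everything is decoded
                break
    return found


DIGITS = 6  # scaled down from the nine digits of a real SSN
leaked = [sha256_hex("042917"), sha256_hex("880031")]  # hypothetical leaked hashes

recovered = brute_force(leaked, DIGITS)
print(sorted(recovered.values()))  # ['042917', '880031']
print(10 ** 9)                     # 1000000000 candidates for a nine-digit SSN
```

A per-record salt would force the attacker to repeat the whole loop for each record, which raises the cost but, as noted earlier, does not remove the risk.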
So, data hashing is not the solution to data privacy. Any business claiming it is could find itself in serious trouble.
False sense of security
One of the biggest dangers of data hashing is that many people misunderstand it. Many believe that hashing anonymizes data and makes it impossible to trace back to individuals. It does not, and whenever data can be linked to a private individual, it could be used for criminal purposes.
This false sense of security could cause businesses to become lax in other aspects of data protection, even feeling that certain data privacy laws do not apply to them.
How hashing compares with data encryption
As we have seen, hashing scrambles data in a way that cannot be unscrambled to find the original input. The goal is not to decode the data but to prove that a new input matches the hashed data and has not been changed. It allows businesses to store large amounts of personal data without being able to read the original values themselves.
Encryption also scrambles data, but unlike hashing, it can be decoded with a key. The end user uses the key to convert the data back to its original form. So, while data hashing is useful for protecting stored data, encryption is more useful for protecting data that is on the move and must be readable at its destination.
Let's take a password as an example. If you want to send a password and allow a user at the other end to see it, it will need to be encrypted. But if you just want to store a password so you can check it matches the one your customer just entered into your website, hashing is the way to go.
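The difference can be shown with a toy Python sketch. The "encryption" here is a simple XOR with a random one-time key, chosen only because it fits in the standard library and makes the round trip visible. It is an assumption for illustration, not a recommendation; real systems should rely on a vetted encryption library. The password value is hypothetical.

```python
import hashlib
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a key of the same length (a toy one-time pad)."""
    if len(data) != len(key):
        raise ValueError("key must be as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


password = b"s3cret-passw0rd"  # hypothetical value

# Encryption: reversible by anyone who holds the key.
key = secrets.token_bytes(len(password))
ciphertext = xor_bytes(password, key)
decrypted = xor_bytes(ciphertext, key)
print(decrypted == password)  # True: the original value comes back

# Hashing: no key and no inverse. You can only hash a new input and compare.
stored_hash = hashlib.sha256(password).hexdigest()
print(hashlib.sha256(b"s3cret-passw0rd").hexdigest() == stored_hash)  # True
print(hashlib.sha256(b"wrong-guess").hexdigest() == stored_hash)      # False
```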
Both data hashing and encryption are crucial for compliance with data privacy laws. However, neither is a foolproof solution in its own right.
Hashing and Data Privacy Laws
Hashing can be a strong protective measure for personal data, but it does not fully anonymize it. However, different data privacy laws may interpret hashed data differently. Let's explore how key data privacy laws address hashing.
GDPR Perspective
The GDPR is a comprehensive data privacy law that applies to EU countries, plus Norway, Iceland, and Liechtenstein. The UK has retained its own version, the UK GDPR, since its exit from the European Union. Crucially, these requirements apply to any business that holds data about citizens or residents in any of these countries, regardless of where the company is based.
Does the GDPR view hashed data as personal data? The EU's Article 29 Working Party's "Opinion 05/2014 on Anonymisation Techniques" states that hashing pseudonymizes data but does not anonymize it.
Pseudonymizing is the process of replacing identifying data with artificial values, such as hashes. While this makes the data more secure, pseudonymized data is still vulnerable to attack and compromise. This is true even when the hashed data is salted.
Therefore, under the GDPR, hashed personal data is still considered personal data. This makes it subject to the same laws as any other personal data.
However, this does not mean that data hashing is unimportant for GDPR compliance.
The takeaway? Data hashing can help your business comply with the GDPR, but it does not reduce your legal obligations.
CCPA Perspective
The California Consumer Privacy Act (CCPA) is a state privacy law that was amended by the California Privacy Rights Act (CPRA), which took effect in 2023.
According to the Office of the California Attorney General, the CCPA applies to any business that collects personal information about California residents, no matter where the business is based, as long as it meets at least one of the law's applicability criteria.
Under the CCPA, data is no longer considered personal information if it has been deidentified, as the law defines that term.
Can hashed data be classified as deidentified under the CCPA? Possibly, if your business takes the following steps to ensure it is not reidentified:
- Technical safeguards: Measures could include salting or choosing the strongest possible algorithms to make the data more difficult to unscramble.
- Internal policies: Prohibit reidentifying hashed data and prevent employees from disclosing hashed values or attempting to reidentify them.
So, if your business targets California customers, you must ensure your data collection and processing meet the standards set out in the CCPA.
FTC Perspective
The Federal Trade Commission (FTC) is responsible for enforcing data privacy regulations in the United States. It takes action against businesses that do not adequately protect consumers' data or are deceptive about their use or handling of it.
The FTC's long-held stance is that hashed data is not anonymous. It states, "Companies should not act or claim as if hashing personal information renders it anonymized." The FTC is committed to taking action against companies that make deceptive claims about the privacy of user data.
This has resulted in high-profile cases, such as the 2023 case against BetterHelp, an online counseling service. The FTC alleged that BetterHelp had disclosed customers' personal data, including email addresses and health questionnaire information, to Facebook and other companies for advertising purposes.
This proved to be an expensive mistake, as the FTC's order shows.
Clearly, no business wants to suffer the financial and reputational damage that comes from breaking data privacy laws. Let's look at how your business can safeguard itself.
Complying with Data Privacy Regulations
Data hashing can form an important element of your business's compliance with data privacy regulations, but it is just one piece of the puzzle. Here are a few steps every business must take.
Implement robust data protection measures
Most data privacy regulations require that businesses collect the minimum data needed to provide their services. From the moment you collect and store personal data, it must be protected. In addition to hashing, options include:
- End-to-end encryption
- Tokenization
- Access controls, such as multi-factor authentication
Security assessments
No business can afford to be complacent. Privacy laws are constantly evolving, and hackers are using more sophisticated techniques to attempt to access personal data. As threats emerge, ensure your security protocols are updated to respond to the challenge.
Privacy Policy
Every business needs a robust Privacy Policy, which sets out the data it will collect and how it will process and store it. It must also clearly show how customers can access their data, withdraw consent for your business to use it, and delete or anonymize such data.
It is crucial your Privacy Policy does not make inaccurate claims about data security. Create a transparent policy that outlines the data security protocols you use and the administrative and physical safeguards you have in place to protect personal data.
Summary
Hashing, especially when combined with salting and peppering, is a useful tool businesses can use to keep personal data private. However, it does not anonymize data or make it immune to attack.
According to many data privacy laws, hashed data is still identifiable, personal data. Therefore, every business needs to implement appropriate safeguards to protect hashed data, such as robust Privacy Policies and additional technical measures, to meet its legal obligations.