Last updated on 01 July 2022 by William Blesch (Legal and data protection research writer at TermsFeed)
Anonymization under Europe's General Data Protection Regulation (GDPR) and de-identification under the California Consumer Privacy Act (CCPA) are both ways to protect the privacy of data subjects.
De-identification is a process that businesses can use in the U.S. to comply with the CCPA. In contrast, anonymization is the comparable process used in Europe to comply with the GDPR.
The two processes are comparable: both protect personal information from disclosure by limiting access to and use of identified or identifiable information. However, they differ in some crucial ways.
The CCPA defines "de-identified information" as:
"Data, which cannot reasonably identify, relate to, describe, be capable of being associated with, or be linked, directly or indirectly, to a particular consumer."
This means the personal identifiers have been removed with the intent that they will not be associated with a specific individual again. If a business uses de-identified information, it must take four organizational and operational steps to ensure that data is neither reidentified nor distributed.
On the other hand, the GDPR's concept of anonymization is stricter than the CCPA's de-identification requirement, since the GDPR demands that identification of an individual be "irreversibly prevent[ed]."
The CCPA, by contrast, only compels businesses to "reasonably" remove identifying data.
Another clear difference is that under the CCPA, aggregated data likewise only has to be such that it cannot "reasonably" be linked to an individual or small group, while the GDPR also addresses "pseudonymization." Pseudonymized data is still treated as personal data under the GDPR, which results in a longer list of information that businesses must irreversibly prevent from being connected with specific individuals.
The article below will discuss both the GDPR's anonymization requirements and the CCPA's de-identification demands. It will then go into what business owners must do to satisfy both.
Recital 26 of the GDPR describes anonymous information as:
"...information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable."
The process of removing indirect and direct personal identifiers that could lead to someone being identified is called "anonymization."
The following kinds of information are all considered direct identifiers under the GDPR:
Businesses and third parties can use indirect identifiers under the GDPR together with other sources of information to identify an individual. They can be (but aren't limited to) things like:
Once the information has been completely anonymized, it no longer falls under the requirements of the GDPR. It, therefore, becomes data that is much easier for businesses to use.
However, some feel that anonymized data is not as valuable and is no longer useful for specific purposes. If your business operates in the EEA or EU, you should carefully consider what you'll be using data for and whether anonymization is worth your organization's time and effort.
You can find more information and guidance on anonymization techniques through the U.K.'s Information Commissioner's Office (ICO).
The CCPA regards information that cannot reasonably identify a specific consumer as de-identified information. The caveat is that the organization must have implemented business processes and technical safeguards that prevent its re-identification.
Additionally, the business must have implemented processes to prevent re-identified data from being disseminated. Finally, employees of the organization are prohibited from attempting to re-identify that information.
Your business must do the following to de-identify data:
Data in which individuals can be identified is extremely valuable. Yet, it can be easily misused and abused if it falls into the wrong hands.
Therefore, protecting that data through anonymization or de-identification is a must in today's world. This is even more true because data breaches are more common than ever.
Companies that do business in the European Union (EU) or European Economic Area (EEA) and the United States of America must have a process wherein data is rendered useless to information thieves if a breach occurs.
Moreover, business owners should consider that there are heavy financial penalties if a regulatory body finds that they were grossly negligent during a breach. Therefore, removing data points or adding noise to a dataset so that it cannot be associated with an individual is vital.
Methods for doing so include deletion, generalization, encryption, data masking, pseudonymization, and others. Keep in mind, however, that ways of re-identification already exist. Much anonymized data can be unraveled and decoded given the right skill set, time, and tools.
In fact, as time goes on, anonymity and de-identification requirements under the GDPR and CCPA will likely have to be updated as technologies for re-identification improve.
For example, back in 2007, there was significant controversy over Google Maps' ability to identify faces from street-level images, even if Google blurred out individuals' bodies. Actions on the part of tech giants like Google are just one reason requirements for data anonymization and de-identification were written into law in the first place.
However, since then, re-identification technology has vastly improved. Many are now fearful of the effects artificial intelligence (A.I.) could have on the ability of some to re-identify anonymized data. Indeed, with the aid of A.I., it's thought that re-identification could occur with relatively little effort.
First, anonymization helps protect your business against a potential loss of trust and market share. Consumers want to know that their data is safe in your hands. By anonymizing and de-identifying it, you can assure them that your company understands and enforces its duty to secure highly sensitive, confidential information against theft and misuse.
Secondly, de-identification and anonymization can help prevent insider exploitation of data. Again, this goes to the heart of maintaining the public's trust in your organization.
To prevent all parties from singling out a specific person within a dataset, inferring any information from that dataset, or linking two or more records within that set, you need an effective anonymization solution.
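One rough way to check for singling out is to count how many records share each combination of indirect identifiers. The sketch below is only an illustration under assumed field names (`age_band`, `postcode_area`, `course` are hypothetical); it flags records whose combination of values is unique in the dataset, which is where singling out is easiest.

```python
from collections import Counter

def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group; 1 means at least one record can be singled out."""
    sizes = group_sizes(records, quasi_identifiers)
    return min(sizes.values(), default=0)

def singled_out(records, quasi_identifiers):
    """Return the records whose quasi-identifier combination is unique."""
    sizes = group_sizes(records, quasi_identifiers)
    return [r for r in records
            if sizes[tuple(r[q] for q in quasi_identifiers)] == 1]

# Hypothetical sample data, invented for illustration.
records = [
    {"age_band": "20-29", "postcode_area": "SW1", "course": "Law"},
    {"age_band": "20-29", "postcode_area": "SW1", "course": "Physics"},
    {"age_band": "30-39", "postcode_area": "SW1", "course": "Law"},
]
QUASI_IDENTIFIERS = ["age_band", "postcode_area"]

unique_records = singled_out(records, QUASI_IDENTIFIERS)
```

Here the third record is the only one in its age band and postcode area, so it is the one most exposed. A check like this does not prove a dataset is anonymous; it only highlights obvious weak spots.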
As suggested previously, a few of these methods are encryption, generalization, data masking, and deletion.
Below are six anonymization solutions you should be aware of and understand.
Data masking is the process of manipulating a dataset so that all personal identifiers are replaced with general values. These should be unique for every record, but they shouldn't have any specific significance to an individual or group within your population set.
An example will help illustrate: If you were looking at data about college students (name, gender, address, telephone number, etc.) and wanted to protect the privacy of your participants, you would replace all those personal identifiers with generic examples.
In this case:
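As a rough sketch of the masking idea in code, the example below replaces chosen identifier fields with generic placeholders that are unique per record but carry no meaning. The field names and the placeholder format are assumptions made for illustration, not a prescribed scheme.

```python
def mask_records(records, fields_to_mask):
    """Return copies of the records with the given fields replaced by placeholders.

    Each placeholder is unique per record (e.g. NAME_0001) but has no
    significance to the individual it replaces.
    """
    masked = []
    for number, record in enumerate(records, start=1):
        copy = dict(record)  # leave the original record untouched
        for field in fields_to_mask:
            copy[field] = f"{field.upper()}_{number:04d}"
        masked.append(copy)
    return masked

# Hypothetical student records, invented for illustration.
students = [
    {"name": "Alice Example", "phone": "555-0100", "major": "Biology"},
    {"name": "Bob Example", "phone": "555-0101", "major": "History"},
]
masked_students = mask_records(students, ["name", "phone"])
```

Note that sequential placeholders follow the original record order, so if that order is itself revealing (for example, sorted by surname), the records should be shuffled first.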
Pseudonymization is an approach to data protection, similar to masking, that replaces private identifiers with pseudonyms (a.k.a. fake names). It's most often used in conjunction with techniques like encryption, hashing, or tokenization of the original content so it can't be identified by its true name.
A pseudonymized dataset might contain fields such as Gender (m/f), Admissions Date (MM-DD-YYYY), Admitted Credential Level ('Undergraduate,' 'Graduate,' etc.), and Admitted Degree Plan Type ('Bachelors,' 'Masters,' etc.).
If desired, a third party could create a database to match records belonging together based on overlapping attributes.
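A minimal sketch of one common pseudonymization approach, keyed hashing, is shown below. It assumes a secret key that is stored separately from the dataset; the key value and the email addresses are placeholders invented for this example. The same input always maps to the same pseudonym, so records can still be linked to each other, but the pseudonym cannot be recomputed without the key.

```python
import hashlib
import hmac

def pseudonymize(value, secret_key):
    """Derive a stable pseudonym from a value using HMAC-SHA256.

    Truncated to 16 hex characters for readability; a longer digest
    lowers the chance of two values sharing a pseudonym.
    """
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

# Placeholder key for illustration only; a real key would be randomly
# generated and kept apart from the pseudonymized data.
SECRET_KEY = b"replace-with-a-randomly-generated-key"

first = pseudonymize("alice@example.com", SECRET_KEY)
again = pseudonymize("alice@example.com", SECRET_KEY)
other = pseudonymize("bob@example.com", SECRET_KEY)
```

Because anyone holding the key can recompute the mapping, data treated this way is pseudonymized rather than anonymized, and it remains personal data under the GDPR.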
There are many different types of generalization techniques. Generalization is the process by which personal identifiers, including names and other information that someone could use to identify a person (e.g., Social Security Number), are removed from data or replaced with broader, less specific values (for example, an exact age replaced with an age range) while still preserving the data's relevance for an analysis or research purpose.
It can also refer to aggregation techniques such as summarizing or averaging values in a dataset. The term anonymizer may sometimes be used specifically with reference to algorithms intended for use on large datasets where some records might contain sensitive personal information, but not all do.
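To make generalization concrete, here is a small sketch that replaces an exact age with a ten-year band and truncates a postal code. The field names (`age`, `zip`, `plan`) and the band width are assumptions chosen for illustration; suitable granularity depends on the dataset.

```python
def generalize_age(age, width=10):
    """Replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def generalize_postcode(postcode, keep=3):
    """Keep the first `keep` characters and star out the rest."""
    return postcode[:keep] + "*" * (len(postcode) - keep)

def generalize_record(record):
    """Return a copy of the record with age and zip generalized."""
    out = dict(record)
    out["age"] = generalize_age(record["age"])
    out["zip"] = generalize_postcode(record["zip"])
    return out

# Hypothetical record, invented for illustration.
generalized = generalize_record({"age": 34, "zip": "94105", "plan": "Masters"})
```

The result keeps the degree plan intact for analysis while the age becomes "30-39" and the postal code becomes "941**".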
Data swapping is a technique that exchanges sensitive values, such as names and Social Security numbers, between records, or replaces them with pseudonyms or randomized values. It is a central component of anonymization techniques that help protect against inadvertent disclosure without sacrificing accuracy.
Data swaps can also be used for other purposes, including fraud detection, risk management, compliance enforcement, and more.
For example: If your company has two databases, one containing information about customers who have recently applied for credit cards and another containing information on all employees in the organization (including salary details), you might want to use data swapping when accessing both sets of data so there's no chance an employee could inadvertently pull up their own personnel records while looking at customer profiles.
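One simple form of swapping is to shuffle the values of a single sensitive column across records, so each value still appears in the dataset but is no longer attached to its original row. The sketch below assumes hypothetical `department` and `salary` fields and uses a seed only to make the example repeatable.

```python
import random

def swap_column(records, field, seed=None):
    """Return copies of the records with one field's values shuffled across rows.

    Column totals and averages are unchanged, but the link between a
    row and its original value is broken for most rows.
    """
    rng = random.Random(seed)
    values = [r[field] for r in records]
    rng.shuffle(values)
    return [{**r, field: v} for r, v in zip(records, values)]

# Hypothetical employee data, invented for illustration.
employees = [
    {"department": "Sales", "salary": 52000},
    {"department": "Legal", "salary": 91000},
    {"department": "IT", "salary": 78000},
    {"department": "HR", "salary": 60000},
]
swapped = swap_column(employees, "salary", seed=1)
```

A random shuffle can leave some values in their original rows, so this alone is not a guarantee for any individual record.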
Data perturbation is the process of adding noise or other data to a dataset so that it can't be uniquely identified.
This way, even if your company was breached and an intruder got their hands on all the information you store about customers (names, addresses, income levels) as well as employees' salaries and phone numbers, there would still be no easy way for them to find out who anyone is.
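As an illustration of perturbation, the sketch below adds random Gaussian noise to a list of numeric values. The income figures and the noise scale are invented for the example; choosing the scale is a trade-off between privacy and the accuracy of later analysis, and this simple approach is not a formal privacy guarantee.

```python
import random

def perturb(values, scale, seed=None):
    """Add zero-mean Gaussian noise with standard deviation `scale` to each value."""
    rng = random.Random(seed)
    return [v + rng.gauss(0, scale) for v in values]

# Hypothetical income levels, invented for illustration.
incomes = [42000, 58000, 61000, 75000, 120000]
noisy_incomes = perturb(incomes, scale=2000, seed=7)

# Because the noise has zero mean, averages over many records tend to
# stay close to the original, even though single values are altered.
original_mean = sum(incomes) / len(incomes)
noisy_mean = sum(noisy_incomes) / len(noisy_incomes)
```

Note that noise on numeric fields does nothing for direct identifiers such as names or phone numbers, which would still need masking or removal.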
Synthetic data is created from a range of different variables such as age, income levels, gender, and other characteristics. This can be combined with the real-world data about your customers to create combinations that don't exist in reality.
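A very simple way to sketch synthetic data generation is to sample each attribute independently from the values seen in the real data, which produces combinations that need not belong to any real person. The customer fields below are hypothetical, and this naive approach is an assumption for illustration only: it discards relationships between attributes and can still reproduce a real record by chance.

```python
import random

def synthesize(records, n, seed=None):
    """Build n synthetic records by sampling each field independently.

    Each field's value is drawn from that field's values in the real
    data, so per-field distributions are roughly preserved while the
    combinations are new.
    """
    rng = random.Random(seed)
    fields = list(records[0].keys())
    columns = {f: [r[f] for r in records] for f in fields}
    return [{f: rng.choice(columns[f]) for f in fields} for _ in range(n)]

# Hypothetical customer data, invented for illustration.
customers = [
    {"age": 23, "income": 31000, "gender": "f"},
    {"age": 41, "income": 67000, "gender": "m"},
    {"age": 58, "income": 89000, "gender": "f"},
]
synthetic_customers = synthesize(customers, n=5, seed=3)
```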
As part of the re-identification risk assessment, you should conduct a motivated intruder test. This test asks whether a person who sets out to re-identify individuals from the anonymized data, using resources that are reasonably available to them, would be likely to succeed.
When setting up the test, ask yourself if a motivated intruder might successfully reidentify anonymized information. For example, would it be possible for an unauthorized person to gain access to data by finding out which pieces have been anonymized and then matching those with public records?
When conducting the test, ensure that the tester (the person simulating the actions of a hacker or data thief) is someone who:
The idea is to simulate a typical breach attempt by someone motivated to re-identify information, but who is not a professional in that regard.
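Part of such a test can be approximated in code by trying to match anonymized records against public records on shared attributes. The sketch below is a simplified illustration with invented data and hypothetical field names (`age_band`, `zip3`, `salary_band`); it reports an anonymized record as re-identified when exactly one public record matches it.

```python
def linkage_attack(anonymized, public, keys):
    """Match anonymized records to named public records on shared keys.

    Returns (record, name) pairs for records that match exactly one
    person in the public data.
    """
    index = {}
    for person in public:
        index.setdefault(tuple(person[k] for k in keys), []).append(person)

    matches = []
    for record in anonymized:
        candidates = index.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:  # unambiguous match: re-identified
            matches.append((record, candidates[0]["name"]))
    return matches

# Invented data for illustration.
anonymized = [
    {"age_band": "30-39", "zip3": "941", "salary_band": "high"},
    {"age_band": "20-29", "zip3": "100", "salary_band": "low"},
]
public = [
    {"name": "Pat Example", "age_band": "30-39", "zip3": "941"},
    {"name": "Sam Example", "age_band": "20-29", "zip3": "100"},
    {"name": "Lee Example", "age_band": "20-29", "zip3": "100"},
]
reidentified = linkage_attack(anonymized, public, ["age_band", "zip3"])
```

In this example the first record is linked to a single named person, while the second matches two people and stays ambiguous. A human tester would also draw on sources and inferences that a script like this does not capture.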
You will need to create boundary rules around what can be accessed through your API in order to prevent others from gaining too much insight into your business. You should also perform motivated intruder tests at regular intervals (e.g., every six months) if you plan on using synthetic data rather than real-world data in training models.
Adopting a governance structure will help you ensure your organization has the appropriate policies and procedures to protect data privacy at all times, including de-identifying sensitive customer records before storing them in databases or transmitting them over networks.
To maintain awareness of any changes made as part of the anonymization process, it is advisable to assign responsibility for reviewing such changes within your business.
You may accomplish this through the following steps:
The standard for anonymization under the GDPR is not quite the same as the standards for the CCPA's de-identification. However, the two are remarkably similar.
As with most privacy requirements, the GDPR is stricter and demands that any anonymization of information be irreversible.
In contrast, the CCPA demands that data only be "reasonably" de-identified.
With that said, advantages of anonymizing or de-identifying data include protection against the loss of market share or trust, safeguarding against misuse by insiders as well as outside parties, and increasing compliance with standards to ensure data security.
This article is not a substitute for professional legal advice. This article does not create an attorney-client relationship, nor is it a solicitation to offer legal advice.