Principles of the GDPR for Developers

We've mentioned the principles of the GDPR a few times throughout the preceding chapters. It's important to recognize that these principles aren't abstract philosophical notions - they are directly applicable to your operations as a developer.

Illustration: Principles of the GDPR for Developers

Lawfulness, Fairness, and Transparency

The principle of "lawfulness, fairness, and transparency" requires that you:

  • Always comply with the GDPR and any other applicable laws
  • Process personal data in a way that people would reasonably expect
  • Always be honest about your activities, and provide as much information as people need

Some practical steps you can take towards complying with this principle include:

  • Creating a Privacy Policy
  • Identifying and demonstrating your legal basis for every act of data processing
  • Making sure you're complying with specific rules around particular types of data and activities

Creating a Privacy Policy

Creating a Privacy Policy is essential wherever you're acting as a data controller.

If you've developed an app or website, you're almost certainly going to be acting as a data controller in respect of anyone using that app or website, even if your company's primary role is as a data processor for other companies.

Your Privacy Policy will be totally unique to your circumstances, but there are some mandatory sections that every Privacy Policy must cover.

As we work through this chapter, we'll be seeing how these mandatory Privacy Policy requirements tie in with the principles of the GDPR.

We've looked closely at the two legal bases that are probably going to be most commonly relied upon for developers: Consent and legitimate interests.

Consent and legitimate interests are the most relevant legal bases for operations that depend on advertising. If your app, software or service operates under a "paid" model (either a one-off payment or a subscription), you may need to rely on the legal basis of "contract."

In any case, complying with the principle of fairness, lawfulness and transparency means determining your legal basis for every act of processing of personal data that you do, and demonstrating that you've done this in your Privacy Policy.

There are two approaches to providing this information in your Privacy Policy.

Firstly, there is a broad approach of simply listing the various legal bases on which you rely.

Here's an example of an approach you can take:

Generic Privacy Policy: Legal basis clause

Secondly, there is a more detailed approach, wherein the ways in which you process personal data are listed alongside the legal basis that underpins this.

Here's an example from Shelter:

Shelter Privacy Policy: Legal basis for processing personal data clause excerpt

Shelter then goes into more specific detail about its legitimate interests in processing personal data:

Shelter Privacy Policy: Legal basis for processing personal data clause excerpt-2

Obeying Specific Rules and Other Data Protection Laws

The principle of lawfulness, fairness, and transparency requires that you obey all relevant rules and laws when processing personal data.

This book is all about obeying the law. The provisions of the GDPR are all important, and failing to comply with any one of them could result in serious problems. And you must also be aware of other relevant laws that may impose rules over and above those set out in the GDPR.

For example, under the GDPR, you may only process health data in accordance with special rules set out in Article 9 of the GDPR.

When processing health data, you may also need to obey other laws, such as the Health Insurance Portability and Accountability Act (HIPAA) in the United States.

And under Article 8 of the GDPR, there are special rules around processing the personal data of children in order to offer or provide them with online services. Where you rely on consent and the child is under 16 (Member States can lower this threshold to as low as 13), you must get the consent of their parent or guardian, and you must also make "reasonable efforts" to confirm it was their parent or guardian that consented.

And in this context, you may also need to obey other laws, such as the ones noted earlier in the book:

  • The Colorado Privacy Act (CPA)
  • The Virginia Consumer Data Protection Act (VCDPA)
  • The California Online Privacy Protection Act (CalOPPA)
  • The California Consumer Privacy Act (CCPA) and its amendments known as the California Privacy Rights Act (CPRA)
  • Personal Information Protection and Electronic Documents Act (PIPEDA) in Canada
  • The Privacy Act 1988 (as amended by the Privacy Amendment (Enhancing Privacy Protection) Act 2012) in Australia
  • Data protection laws in several Southeast Asian countries
  • The Children's Online Privacy Protection Act (COPPA) in the United States

These are just some of the many rules that may or may not apply to your project. Reading this book means that you've made a great start towards legal compliance. But you should also read the GDPR itself, and be aware of other local laws.

Purpose Limitation

The principle of "purpose limitation" means that you only process personal data when you have a specific purpose for doing so.

Some practical steps you can take towards complying with this principle include:

  • Reviewing all your data processing methods and determining your purpose for each one
  • Demonstrating your purposes to your users wherever you collect their personal data
  • Ensuring that you don't collect personal data for one purpose and then use it for incompatible further purposes

Reviewing Your Purposes

An essential part of GDPR compliance involves becoming aware of what personal data you're processing. But in addition to what you're processing, you also need to consider why you're processing it.

You can apply the principle of purpose limitation to each method of personal data processing you engage in.

Consider:

  • Why do you need to process personal data in this way?
  • Is this a good enough purpose to justify the processing?
  • How do your users know what your purposes are?
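One way to make this review systematic is to keep an inventory of processing activities and flag any that lack a documented purpose or legal basis. The following is a minimal sketch only: the `ProcessingActivity` structure, its field names, and the sample entries are hypothetical and not taken from any standard.

```python
from dataclasses import dataclass


@dataclass
class ProcessingActivity:
    name: str               # hypothetical label for a processing method
    data_categories: list   # which personal data is involved
    purpose: str = ""       # why the data is processed
    legal_basis: str = ""   # e.g. "consent", "contract", "legitimate interests"


def find_undocumented(activities):
    """Return the names of activities missing a purpose or a legal basis."""
    return [
        a.name
        for a in activities
        if not a.purpose.strip() or not a.legal_basis.strip()
    ]


# Illustrative inventory: the second entry has no purpose recorded yet.
activities = [
    ProcessingActivity("newsletter", ["email"], "send monthly newsletter", "consent"),
    ProcessingActivity("server_logs", ["ip_address"]),
]

undocumented = find_undocumented(activities)
print(undocumented)  # ['server_logs']
```

An inventory like this can also feed your Privacy Policy and your data processing records, so the purposes are stated consistently in each place.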

Demonstrating Your Purposes

There are three key ways you can go about demonstrating your purposes to your users, and your Data Protection Authority (if you ever need to do so).

You should disclose your purposes in:

  1. Your Privacy Policy
  2. Information you provide when collecting personal data
  3. Your data processing records

When collecting personal data, it's important to tell your users about the purpose for which you require it. If people understand why you need to collect their personal data, they'll be more likely to provide it.

This is evident in the context of mobile app development. Users are more likely to grant permission for an app to access their personal data if they know that the app requires such access for a good reason.

This was confirmed in a 2014 study by Lin et al., which revealed that:

"a user's willingness to grant a given permission to a given mobile app is strongly influenced by the purpose associated with such a permission."

In iOS, developers can add a purpose string to accompany their permission requests. This is explained by Apple in its guidance for developers:

Apple Developer Guide: Requesting Access to Protected Resources - Provide a purpose string section

Avoiding "Purpose Creep"

Where personal data is collected for a specific purpose, it must not be used for any further purposes that are "incompatible" with the purposes for which you collected it.

There are some types of "further processing" that will be considered compatible by default:

  • Archiving in the public interest
  • Scientific or historical research
  • Certain statistical purposes

Recital 50 of the GDPR suggests that the following factors should be taken into account when assessing whether further processing will be compatible with your original purpose:

  • The nature of the personal data
  • The data subject's reasonable expectations
  • Any safeguards you can take to protect privacy

Data Minimization

The principle of "data minimization" requires that you only collect the personal data that you actually need - no more, no less.

Web and software developers are at the forefront of applying this principle. Systems developed to collect and process personal data must have data minimization built into their functioning.

Developers who are integrating such systems into their websites and software must ensure that the optional collection of personal data is "off" by default.

Some practical steps you can take towards complying with this principle include:

  • Never asking your users to provide personal data unless you need it
  • Ensuring that you are not collecting personal data in server logs (unless you need to)
  • Ensuring that you are not unnecessarily harvesting personal data via web analytics

Minimizing Data Collection in Web Forms

Most of the personal data you collect is probably going to be provided directly by your users. You must only ask for what you need. This ties in closely with the principle of purpose limitation. "Necessary" can be interpreted quite broadly, to mean necessary to achieve a specific purpose.

Let's look at an example. Ecommerce company The Glass Box Co provides a monthly newsletter. Here's the sign-up form:

The Glass Box Co Newsletter sign up form

Users are asked to provide not only their email address, but also their first and last name, and date of birth.

This seems a little unnecessary at first. Is a date of birth, or even a name, really required to provide someone with a newsletter?

Helpfully, The Glass Box Co does provide accompanying information about why it needs this personal data, directly below where the information is requested:

The Glass Box Co Newsletter sign up form - Your Data section

A date of birth, then, is used to offer newsletter subscribers special deals around their birthday. This could, in theory, represent a compelling reason to request this data, if appropriate safeguards are employed to keep it safe.
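On the server side, the same idea can be enforced by storing only the fields you need, and keeping optional fields only when the user has actively opted in. This is a rough sketch under assumed names: `REQUIRED_FIELDS`, `OPTIONAL_FIELDS`, and `minimize_submission` are hypothetical and would need adapting to your own form.

```python
REQUIRED_FIELDS = {"email"}  # needed to deliver the newsletter
OPTIONAL_FIELDS = {"first_name", "last_name", "date_of_birth"}


def minimize_submission(form, opted_in=frozenset()):
    """Keep required fields, plus optional fields the user opted in to.

    Anything else in the submission is dropped before storage.
    """
    missing = REQUIRED_FIELDS - form.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    allowed = REQUIRED_FIELDS | (OPTIONAL_FIELDS & set(opted_in))
    return {k: v for k, v in form.items() if k in allowed and v not in ("", None)}


submission = {
    "email": "user@example.com",
    "date_of_birth": "1990-01-01",
    "phone": "555-0100",  # never requested, so never stored
}

default_record = minimize_submission(submission)
birthday_record = minimize_submission(submission, opted_in={"date_of_birth"})
print(default_record)   # {'email': 'user@example.com'}
print(birthday_record)  # {'email': 'user@example.com', 'date_of_birth': '1990-01-01'}
```

Because `opted_in` is empty by default, optional collection is "off" unless the user chooses otherwise.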

Minimizing Data Collection in Server Logs

Your server logs must not be a repository for personal data.

In theory, log files can contain just about anything, including personal data. However, any personal data you hold should be limited, well-organized and secure. This means that server logs are not a good means of collecting or storing it.

One way to end up with personal data in your logs is by collecting data such as IP addresses by default. There is often little need to collect the IP address of everyone who visits your site.

If you do need to collect your visitors' IP addresses, you probably don't need to collect the whole thing.

Writing for the Internet Engineering Task Force, Amelia Andersdotter recommends that providers of Internet-facing servers should:

"keep only the first two octets (of an IPv4 address) or the first three octets (of an IPv6 address) with remaining octets set to zero, when logging."

This would result in the following:

             IPv4 address    IPv6 address
Original     69.29.31.236    A624:71D3:2C80:EE02:0029:EC2A:002B:EB73
Truncated    69.29.0.0       A624:71D3:2C80:0000:0000:0000:0000:0000
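The truncation shown above can be sketched with Python's standard `ipaddress` module. The function name `truncate_ip` and its defaults are assumptions made for illustration. Note that the IPv6 example in the table keeps the first three 16-bit groups (48 bits), which is more than three octets, so the defaults below follow the table and can be tightened if you prefer.

```python
import ipaddress


def truncate_ip(address, v4_prefix=16, v6_prefix=48):
    """Zero everything after the first `prefix` bits of an IP address.

    Defaults keep the first two octets of IPv4 and the first three
    16-bit groups of IPv6, matching the table above.
    """
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us pass a host address and get its enclosing network
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


v4_truncated = truncate_ip("69.29.31.236")
v6_truncated = truncate_ip("A624:71D3:2C80:EE02:0029:EC2A:002B:EB73")
print(v4_truncated)  # 69.29.0.0
print(v6_truncated)  # a624:71d3:2c80::
```

Python prints IPv6 addresses in compressed lowercase form, so `a624:71d3:2c80::` is the same address as the truncated IPv6 value in the table.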

Andersdotter also recommends that you should not:

"log unnecessary identifiers, such as source port number, time stamps, transport protocol numbers or destination port numbers."

All of these types of data could qualify as personal data in certain circumstances. Unless you have a good reason with a suitable legal basis, do not log them.

Minimizing Data Collection in Analytics

It's important to understand that personal data is collected by web analytics. Any information about a person's activities on a website could constitute personal data if it can theoretically be linked to them.

As with log data, one way to minimize the amount of personal data collected by analytics software is to anonymize IP addresses.

IP address anonymization is a feature of many analytics suites. For example, in Google Analytics, the last octet of IPv4 addresses and the last 80 bits of IPv6 addresses can be set to "zero."

Here's an extract from Google's guidance on how to anonymize IP addresses in Google Analytics:

Google Analytics IP Anonymization: For all hits and for single hit sections

Google's method of IP anonymization still allows you to gain meaningful insights into the use of your website, while going some way towards protecting your users' identities. It is worth noting, however, that Google does not remove as much of the IP address as Andersdotter recommends (above).

Matomo Analytics allows you to go further, and mask the last two or even three octets of an IPv4 address:

Matomo Anonymize Visitors IP Addresses options screen

Like many aspects of data protection, this is an exercise in balancing your own interests against the risks to your users' privacy.
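To get a feel for what each masking level leaves behind, the options described above can be compared side by side. This is an illustrative sketch only: `zero_trailing_bits` is a hypothetical helper, not part of Google Analytics or Matomo, and simply zeroes the last N bits of an address.

```python
import ipaddress


def zero_trailing_bits(address, bits):
    """Return the address with its last `bits` bits set to zero."""
    ip = ipaddress.ip_address(address)
    masked = int(ip) & ~((1 << bits) - 1)
    return str(type(ip)(masked))  # rebuild an address of the same IP version


visitor = "69.29.31.236"
one_octet = zero_trailing_bits(visitor, 8)      # last octet masked
two_octets = zero_trailing_bits(visitor, 16)    # last two octets masked
three_octets = zero_trailing_bits(visitor, 24)  # last three octets masked
v6_masked = zero_trailing_bits("a624:71d3:2c80:ee02:29:ec2a:2b:eb73", 80)

print(one_octet)     # 69.29.31.0
print(two_octets)    # 69.29.0.0
print(three_octets)  # 69.0.0.0
print(v6_masked)     # a624:71d3:2c80::
```

The more bits you zero, the coarser any location or network inference becomes, which is the trade-off to weigh against your analytics needs.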

Accuracy

The principle of "accuracy" requires you to keep all the personal data you process accurate and up-to-date.

Some practical steps you can take towards complying with this principle include:

  • Keeping any personal data in your possession up-to-date where practical
  • Complying with requests under the right to rectification
  • Allowing your users the right to maintain their own personal data through account controls

Whilst it is important for all data controllers to ensure the accuracy of their personal data, the extent to which this is relevant will depend on the nature of the product that you're developing.

Inaccurate personal data can cause big problems. False information recorded about a person can cause reputational damage. Inaccurate contact details can mean that the wrong person is targeted with direct marketing, or sent correspondence containing another individual's personal data.

Storage Limitation

The principle of "storage limitation" requires that you do not keep personal data for longer than you need it.

Some practical steps you can take towards complying with this principle include:

  • Automatically scheduling the erasure of personal data in your server logs
  • Automatically scheduling the erasure of other personal data collected by analytics
  • Drawing up a Retention Schedule to demonstrate your data storage practices

Scheduling Erasure of Log Data

As we have seen, you must avoid collecting personal data in your log files wherever possible. Where it is necessary to collect, for example, IP addresses, you should ensure that it is automatically erased at regular intervals.

The Internet Engineering Task Force suggests that IP addresses in server logs, even if they have been subject to anonymization techniques as described above, should not be retained for longer than three days.

Your web server provider may provide a function for automating log data deletion. Let's take the example of Amazon Web Services (AWS) which offers a centralized logging solution.

Log data is stored as objects in a centralized "bucket." Object lifecycle management allows users to set automated expiry periods for particular groups of objects (e.g. all log files stored under a given prefix).

A lifecycle rule can be created via the "Management" tab in the AWS console:

Amazon Web Services Console: Management tab - Add lifecycle rule option highlighted

After defining the name and scope of the object, users can set its expiration period in days:

Amazon Web Services Console: Lifecycle rule screen
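The same kind of rule can also be written down as a configuration document rather than clicked together in the console. The sketch below assumes the logs live in an Amazon S3 bucket and builds a dictionary in the shape of an S3 lifecycle configuration. The rule ID, the `logs/` prefix, and the three-day period are hypothetical values chosen for illustration.

```python
import json


def expiry_rule(rule_id, prefix, days):
    """Build a lifecycle configuration that expires objects under `prefix`."""
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},  # scope: objects under this prefix
                "Status": "Enabled",
                "Expiration": {"Days": days},  # delete this many days after creation
            }
        ]
    }


config = expiry_rule("expire-access-logs", "logs/", 3)
print(json.dumps(config, indent=2))
```

A dictionary like this could, for example, be passed as the `LifecycleConfiguration` argument of the `put_bucket_lifecycle_configuration` call on a boto3 S3 client. That step is not run here, and you should check the result against your own bucket layout.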

Alternatively, the logrotate utility can be used to automatically delete old log files in Linux. You can also enable its shred option so that deleted log files are overwritten and harder to recover.
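If neither option suits your setup, a scheduled script can do a similar job. The following is a minimal sketch, not a hardened tool: `delete_old_logs`, the `*.log` pattern, and the three-day default are assumptions. Unlike shred, it only unlinks files and does not overwrite their contents.

```python
import os
import tempfile
import time
from pathlib import Path


def delete_old_logs(log_dir, max_age_days=3, pattern="*.log", now=None):
    """Delete files matching `pattern` last modified over `max_age_days` ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    removed = []
    for path in Path(log_dir).glob(pattern):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)


# Demonstration in a throwaway directory with one stale and one fresh file.
with tempfile.TemporaryDirectory() as tmp:
    old, new = Path(tmp, "old.log"), Path(tmp, "new.log")
    old.write_text("stale entry")
    new.write_text("fresh entry")
    five_days_ago = time.time() - 5 * 86400
    os.utime(old, (five_days_ago, five_days_ago))  # backdate the stale file
    removed = delete_old_logs(tmp)
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(removed)    # ['old.log']
print(remaining)  # ['new.log']
```

Run from cron or a similar scheduler, a script like this keeps the retention period enforced without manual intervention.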

Scheduling Erasure of Analytics Data

We've looked at how you can minimize the amount of personal data you collect via analytics. Any data you do collect should be kept only as long as you need it.

If you have your users' consent for analytics, and you have explained the implications in your Privacy Policy, you might be justified in retaining analytics data for longer than you keep log data.

Analytics providers will generally allow their users to set custom retention periods. The focus is often extending retention periods, sometimes for an additional fee. However, good data protection practice would obviously advocate reducing the retention period.

Adobe Analytics provides an FAQ about data retention periods:

Adobe Analytics Data Retention Policy: FAQ section excerpt

Google Analytics allows you to set your retention period for all identifiers (e.g. user IDs, advertising IDs, DoubleClick cookies) at 14, 26, 38 or 50 months.

Here are some instructions from Google on how to do this:

Google Analytics Help: Data Retention - Set the options section

Creating a Retention Schedule

The GDPR requires that you provide details on your storage period as part of your Privacy Policy. This will be relevant for any data controller. Even where a data processor ultimately carries out the deletion of personal data, the data controller should be establishing the retention period.

Creating a Retention Schedule is a good way to demonstrate that you are taking the necessary steps to comply with the principle of storage limitation. It also serves to ensure that you take a systematic approach to managing the personal data stored in your systems.

A Retention Schedule can be set out in a table. It can provide information about:

  • The categories of personal data you collect
  • The retention period
  • The purpose and rationale for keeping the data for this period
  • The action taken at the end of that period (e.g. erasure, anonymization)

You may also need to include the "trigger" that starts the clock ticking on a retention period. For example, you may need to retain a user's account data for a particular period after they have closed their account. In this case, the trigger would be "account closure."
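A Retention Schedule of this kind can also be kept in machine-readable form, so that erasure jobs read from the same source as your documentation. This is only a sketch: the `RetentionRule` fields mirror the columns listed above, and the categories, periods, and purposes are hypothetical examples, not recommendations.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionRule:
    category: str        # category of personal data
    retention_days: int  # how long to keep it after the trigger
    trigger: str         # event that starts the clock
    purpose: str         # rationale for keeping it this long
    end_action: str      # e.g. "erasure" or "anonymization"


SCHEDULE = [
    RetentionRule("server_log_ip", 3, "collection", "security monitoring", "erasure"),
    RetentionRule("account_data", 365, "account closure", "handling disputes", "erasure"),
]


def due_date(rule, trigger_date):
    """Date on which the end-of-retention action becomes due."""
    return trigger_date + timedelta(days=rule.retention_days)


def is_due(rule, trigger_date, today):
    return today >= due_date(rule, trigger_date)


account_due = due_date(SCHEDULE[1], date(2023, 1, 31))
log_due_now = is_due(SCHEDULE[0], date(2023, 3, 1), today=date(2023, 3, 4))
print(account_due)   # 2024-01-31
print(log_due_now)   # True
```

Here the account was closed on 31 January 2023, so the trigger "account closure" puts the erasure date 365 days later.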

Integrity and Confidentiality (Security)

The principle of "integrity and confidentiality" requires that you take appropriate technical and organizational measures to ensure security and prevent data breaches at every stage of processing personal data.

Some practical steps you can take towards complying with this principle include:

  • Creating procedures for assessing and responding to risk
  • Pseudonymization of personal data
  • Encryption of personal data

Assessing and Responding to Risk

In order to determine what would be an appropriate level of security for a given project, it's important to take a systematic approach to assessing risk.

In some cases, it is legally mandatory to conduct an in-depth assessment. This is known as a Data Protection Impact Assessment (DPIA).

A DPIA is a process that:

  • Describes a data processing project in detail
  • Assesses whether the project is necessary, and the methods are proportionate
  • Identifies the associated risks
  • Considers how to mitigate those risks

Under Article 35 of the GDPR, a DPIA is mandatory if you're undertaking a project which:

  • Uses new technology to process personal data, or applies existing technology in a novel way
  • Presents a high risk to people's right to privacy

When considering the need for a DPIA, you can take into account such factors as:

  • The nature of the personal data you're processing (for example, how sensitive it is)
  • The scope of the project (for example, how many people it will affect)
  • The context of the project (for example, whether your users would expect you to process their personal data in this way)
  • The purpose of the project (for example, the benefits that it might produce)

Bear in mind that even if you're not legally required to conduct a DPIA, it is a good way to protect against data breaches in any particularly complex project involving personal data.

The minimum requirements for a DPIA are set out in Article 35(7) of the GDPR and in guidance by the Article 29 Working Party.

A DPIA should document:

  • A description of your project
  • A description of the reasons why you need to carry out the processing and why the methods you have chosen are appropriate
  • An assessment of the risks
  • Details of the safeguards and other measures you have taken to mitigate those risks

You may need to consult with your Data Protection Authority if you find that there are risks that you can't adequately address.

Pseudonymization

Pseudonymization is a measure that replaces identifiers within a dataset with non-identifying alternatives. The remaining non-personal data can remain intelligible.

The pseudonymized data can be rendered identifiable again with reference to additional information. This means that it can be fairly easy to work with pseudonymized personal data. But equally, certain methods of pseudonymization may not be very secure.

Here's a very basic example of pseudonymization. This is just for context - please do not consider this a secure example.

Below is the original data set, which includes categories, identifiers and non-identifiers.

Name             Username    Mailing Address  Payment Status  Due Date
Thom Yorke       singer198   9 Bends Street   Paid            -
Jonny Greenwood  lead885     5 Pablo Honey    Unpaid          20/11/23
Ed O'Brien       rhythm992   9 Palo Alto      Paid            -
Colin Greenwood  bassist555  29 Rainbow Road  Incomplete      25/03/23
Philip Selway    drummer692  5 King Limb      Unpaid          25/12/23

Here is the same data set after pseudonymization:

Name             Username    Mailing Address   Payment Status  Due Date
}^+_ ;+{(€       [&-%€{@/?   / "€-$[ [}{€€}    Paid            -
*+--; %{€€-:++$  )€!$??<     > =!")+ ^+-€;     Unpaid          20/11/23
€$ +'"{&€-       {^;}^_//'   / =!)+ !)}+       Paid            -
£+)&- %{€€-:++$  "![[&[}>>>  '/ {!&-"+:[ {+!$  Incomplete      25/03/23
=^&)&= [€):!;    ${]__€{,/'  > (&-% )&_"       Unpaid          25/12/23

Each letter and digit in the identifying information has been replaced by a special character. The identifying data is no longer intelligible without reference to additional information (i.e. a reference key). But the non-identifying information (the categories of data, payment status, and due date) remains intact.

Pseudonymization can result in a data set that is both secure and usable if the method used is sophisticated enough. However, such data should still be treated as personal data. It must be stored securely, with access limited to those who need it. Any additional information used to interpret the data should be kept separately (and securely).
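As a somewhat more robust alternative to the character substitution shown above, identifiers can be replaced with keyed hashes. The sketch below swaps in HMAC-SHA256 from Python's standard library, which is a different technique from the one illustrated in the tables. The function names, the demo key, and the 16-character token length are assumptions for illustration, and this is not a vetted security design.

```python
import hashlib
import hmac


def pseudonymize(value, key):
    """Derive a stable token from an identifier using a secret key."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability; an assumption, not a rule


def pseudonymize_record(record, identifier_fields, key):
    """Return (pseudonymized copy, lookup table mapping tokens to originals)."""
    safe = dict(record)
    lookup = {}
    for name in identifier_fields:
        token = pseudonymize(record[name], key)
        lookup[token] = record[name]
        safe[name] = token
    return safe, lookup


# Hypothetical key: in practice it must be generated and stored securely.
key = b"demo-key-keep-this-somewhere-else"
record = {"name": "Thom Yorke", "username": "singer198", "payment_status": "Paid"}

safe, lookup = pseudonymize_record(record, ["name", "username"], key)
print(safe["payment_status"])  # Paid
print(lookup[safe["name"]])    # Thom Yorke
```

The key and the lookup table are the "additional information" that makes re-identification possible, so both belong in a separate, access-controlled location from the pseudonymized data.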

Encryption

Encryption encodes an entire data set. It turns a given set of "plaintext" into "ciphertext" without discriminating between personal data and non-personal data. A key is required to decode the data.

There are several options for encryption of personal data:

  • Application level - The encryption is performed on any data controlled by a given application, for example, a database program.
  • Individual file level - Specific files can be individually encrypted and then stored or transferred.
  • Full disk - All data on a given disk is encrypted. Operating systems often offer such functionality: for example, Windows features BitLocker and macOS features FileVault.

It is possible that all three methods will apply to your operations in different contexts.

For example, individual files can be encrypted before they are emailed as attachments. The email message itself can be encrypted by an application running OpenPGP, or protected in transit via Transport Layer Security (TLS, the successor to SSL).

Accountability

Alongside the six data processing principles set out above, the GDPR provides a separate, seventh principle of "accountability." This requires that you are accountable for your compliance with all of the GDPR's principles and requirements.

Some practical steps you can take towards demonstrating your compliance with the principles of the GDPR include:

  • Producing internal policies such as a:

    • Data Protection Policy
    • Data Retention Schedule
    • Data Breach Notification Policy
  • Maintaining data processing records
  • Appointing a Data Protection Officer
  • Ensuring you have Data Processing Agreements in place, where required

Key Takeaways From This Chapter

The principles of the GDPR must permeate all aspects of your data processing operations:

  • Always process personal data in a lawful, fair and transparent way
  • Only ever process personal data in connection with a specified purpose
  • Only collect the personal data that you actually need
  • Ensure you keep personal data accurate and up-to-date
  • Only store personal data for as long as you need it
  • Ensure that personal data is processed securely

Lastly, you need to be accountable, and able to demonstrate your compliance with these principles.