August 21, 2017

03/07/2017 Forget-me-not

As many of us know, the right to erasure of ‘personal data’ is one of the key elements within GDPR.

But that leaves a big question mark over exactly what ‘personal data’ is, and hence what data is to be erased.

Back in 2007 the Article 29 Data Protection Working Party published its Opinion 4/2007 on the concept of personal data, which works from the definition of personal data as ‘any information relating to an identified or identifiable natural person’.

They went on to say that ‘a natural person can be identified when, within a group of persons, he or she is distinguished from other members of the group’.

The ICO recently wrote ‘the more expansive [GDPR] definition provides for a wide range of personal identifiers to constitute personal data’. Here I assume that they are not changing the definition itself, but rather just adding in new forms of identifier, like a cookie ID.

So, are we being asked to deduce that if a data item is not an identifier relating to a natural person, it is not personal data?

Assuming that this is the case, two important questions follow and need to be answered. First, when does an item of data definitely not relate to a natural person, and hence not need to be erased? Second, within the remaining population of possible identifiers, how uniquely does an identifier need to point to a single individual to be classed as a true identifier?

To relate to an individual, personal data also needs to be about that individual. The value of a house is not personal data until you relate it to an owner, at which point it tells us something about how wealthy that owner is.

Many ordinary transactions, for instance, would not on their own relate to or identify an individual, but certain ones might.

To take an extreme example, a sale record in the Land Registry for Buckingham Palace would certainly identify the transaction as being associated with the Queen. But less extreme examples like a pattern of phone calls made at certain times of day could also point to a unique individual.

And although a number on its own is clearly not an identifier, when you put it in the context of a list of customers, each with a customer number, then it definitely can become one.

So the question of whether an item of data does or does not point to an individual will depend on its context.

However, I suspect that GDPR is not expecting us marketers to use GCHQ-level intercepts to trace links to an individual, which would make almost every item of data a potential identifier, but rather expects us to rely on simpler and more straightforward identifiers like an IP address, name or email.

But at this point a further problem arises: many of these ‘straightforward’ identifiers, like forename and surname, can point to more than one person. When I Googled 192.com I found that there are 50 people called Julian Berry in the UK, so Julian Berry in itself is not a unique identifier until it is associated with other information like his address.

Another experiment we ran was to look at whether, knowing date of birth, outbound postcode and gender, you could uniquely identify an individual from a UK-wide lifestyle database. And the unexpected answer was that in more than 70% of cases you can.

This shows that although none of these data items on its own uniquely identifies an individual, in combination they usually do.
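For anyone wanting to run a similar check on their own data, here is a minimal sketch of how the uniqueness of a combination of fields could be measured. It is not the method used in the experiment above: the function name `unique_fraction`, the field names and the four toy records are all invented for illustration.

```python
from collections import Counter


def unique_fraction(records, fields):
    """Return the share of records whose combination of `fields` occurs only once."""
    if not records:
        return 0.0
    # Build one tuple per record from the chosen fields, e.g. (dob, outcode, gender).
    combos = [tuple(record[field] for field in fields) for record in records]
    counts = Counter(combos)
    unique = sum(1 for combo in combos if counts[combo] == 1)
    return unique / len(records)


# Invented toy records (hypothetical field names), for illustration only.
people = [
    {"dob": "1970-01-01", "outcode": "SW1A", "gender": "M"},
    {"dob": "1970-01-01", "outcode": "SW1A", "gender": "F"},
    {"dob": "1985-06-15", "outcode": "M1", "gender": "F"},
    {"dob": "1985-06-15", "outcode": "M1", "gender": "F"},
]

gender_only = unique_fraction(people, ["gender"])
combined = unique_fraction(people, ["dob", "outcode", "gender"])
print(gender_only, combined)
```

On the toy data, gender alone singles out one record in four, while the three fields together single out two in four. The same pattern, on a real database, is what makes a combination of otherwise harmless fields behave like an identifier.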

So where is this taking us?

I would suggest that:

– data items on their own may neither relate to an individual nor act as unique identifiers, and hence may not be personal information, but they may become so when associated with other data items. A forename and surname plus an address becomes a unique personal identifier. This means that, when deciding what is personal data, we will need to identify either individual data items that point uniquely to an individual, like an email, or groups of data items that in combination may do so.

– data items that, whether on their own or in association with other data items on a database, definitely do not relate to or point to an individual are not personal information. Broadly speaking this will mean that when erasing personal data, we can leave areas like ‘transactions’ or ‘donations’ untouched.

This implies that we will need to do a review of all the data held by an organisation to define what could be, and what could not be, personal data, before setting up the technology to erase that personal data when requested to do so.

And when erasing it, we will need not just to erase the personal data held in downstream systems like a single customer view, but also upstream in source data systems. So this implies knowing where all the personal data about an individual has originated, and where it is currently being stored.
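As a rough illustration of what this could look like in practice, the sketch below assumes the review has already produced a list of fields classed as personal data, and that each system can be looked up by a shared person ID. The system names, field names and the `erase_person` function are all hypothetical, and real systems would need their own erasure routines.

```python
# Fields the (hypothetical) data review has classed as personal data.
PERSONAL_FIELDS = {"forename", "surname", "address", "email"}


def erase_person(systems, person_id):
    """Blank the personal fields for one person in every system that holds them.

    `systems` maps a system name to a dict of person_id -> record.
    Returns the names of the systems in which a record was found.
    """
    touched = []
    for name, table in systems.items():
        record = table.get(person_id)
        if record is None:
            continue
        for field in PERSONAL_FIELDS:
            if field in record:
                record[field] = None  # non-personal fields are left untouched
        touched.append(name)
    return touched


# Invented example: one upstream source system and one downstream view.
systems = {
    "crm_source": {
        42: {"forename": "Ann", "surname": "Example", "email": "ann@example.com"},
    },
    "single_customer_view": {
        42: {"forename": "Ann", "address": "1 Example Street", "donation_total": 120},
        43: {"forename": "Bob", "address": "2 Example Street", "donation_total": 15},
    },
}

touched = erase_person(systems, 42)
print(touched)
```

The point of returning the list of systems touched is that it gives a simple record of where the person's data was found, which is the same knowledge the erasure process depends on in the first place.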