Personal Data and 33 bits of Entropy

The concept of personal data revisited – a mathematical approach to identifying a person

As a lawyer working with data protection and the GDPR, I find that one of the most fundamental and profound questions is: what is personal data? If you are a company or an organisation that has started laying the necessary groundwork for GDPR compliance, you have surely pondered the same question.

Which information constitutes personal data and how much information is required to identify a person? As it turns out, there is an objective way to approach this question.

Personal data

First, let us look at the basics. According to the GDPR, “personal data means any information relating to an identified or identifiable natural person […]”. While this provides some guidance on whether specific identifiers, alone or jointly, constitute information from which the identity of a unique individual can be correctly deduced, the definition clearly still leaves ample room for interpretation. Whether a given set of identifiers is enough to identify a single person depends heavily on the context in which it is used. For example, one of the most common identifiers is a person's name. A relatively common name such as Oscar is not always enough to identify a specific individual. However, if you add further identifiers such as an address or a telephone number, the likelihood that the information refers to anyone other than a single person quickly decreases. It is also possible to identify an individual using other kinds of identifiers. If I say that a certain person works at Amazon, this in itself is not sufficient to identify anyone, as Amazon has over 500 000 employees. If I tell you that a certain person is a CEO, this is not sufficient either, as there are literally millions of companies with CEOs in the world. However, combine the two facts, that the person works at Amazon and is also its CEO, and it is easy to deduce that I am talking about Jeff Bezos.

In practice, situations are rarely this clear-cut, and there are many complex cases open to interpretation where bits and pieces of information are only sporadically available. While each piece of information might partially reveal something about a person, one might wonder whether it is possible to measure exactly how much information is needed to identify someone. Determining this, one could argue, would be like determining how many grains of sand you need to build a sandcastle.

Well, there does seem to be a way to measure how much information you need, and the answer hides behind 33 bits of entropy.

33 bits of entropy

There is a mathematical quantity called entropy, which is measured in bits (if you are a lawyer, like me, you might be squirming uncomfortably in your seat right now). Entropy can be thought of as a measure of how many possibilities a random variable can take. With two equally likely possibilities the entropy is one bit, with four it is two bits, and each additional bit doubles the number of possibilities, so the number of possibilities grows exponentially with the number of bits. As there are around seven billion people on this planet, the entropy is around 33 bits (2 to the power of 33 is roughly 8.6 billion, which is enough to give every person a distinct value). In plain language, this means that you need about 33 bits of entropy (footnote 1) to objectively and definitively single out one specific individual.

In the same way, identifiers such as name, address and birthday each carry bits of information that may partially reveal a person's identity. Using a mathematical formula (footnote 2), you can calculate how many bits of information a given fact reveals. Knowing someone's birthday is worth about 8.51 bits of information, while a ZIP code might be worth 10 to 20 bits depending on how many people live in the area it covers. According to the theory, if the facts are independent of one another (each one genuinely adds new information), their bits can simply be added together, and once the total reaches about 33 bits, the combination is in principle enough to identify a specific person.
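To make the arithmetic concrete, here is a minimal Python sketch of the figures above. It assumes, as the text does, that all people and all birthdays are equally likely and that the facts are independent. The function names are illustrative only, and the 15-bit ZIP code value is a hypothetical pick from the 10 to 20 bit range mentioned above.

```python
import math

def surprisal_bits(probability: float) -> float:
    """Bits of information revealed by a fact: ΔS = -log2 Pr(X = x)."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return math.log2(1 / probability)  # equal to -log2(p), but never prints as -0.0

def bits_to_single_out(population: float) -> float:
    """Bits needed to pick one person out of `population` equally likely people."""
    return math.log2(population)

def expected_matches(population: float, revealed_bits: float) -> float:
    """Expected number of people still matching after `revealed_bits` of
    independent information (each bit halves the candidate pool)."""
    return population / 2 ** revealed_bits

WORLD_POPULATION = 7.7e9            # figure used in footnote 1
needed = bits_to_single_out(WORLD_POPULATION)
birthday = surprisal_bits(1 / 365)  # assumes birthdays are spread evenly over 365 days
zip_code = 15.0                     # hypothetical value within the 10-20 bit range

print(f"bits needed:     {needed:.2f}")    # bits needed:     32.84
print(f"2**33:           {2**33:,}")       # 2**33:           8,589,934,592
print(f"birthday bits:   {birthday:.2f}")  # birthday bits:   8.51
total = birthday + zip_code                # adding bits assumes the facts are independent
print(f"revealed so far: {total:.2f}")     # revealed so far: 23.51
print(f"people matching: {expected_matches(WORLD_POPULATION, total):.0f}")  # people matching: 644
```

Under these assumptions, a birthday plus a ZIP code still leaves roughly 644 people who match, so about nine more bits of independent information would be needed to reach the 33-bit threshold.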

That said, information that adds nothing new does not count. For example, if you already know that someone lives in Stockholm, the information that they live in Sweden is not new, and hence does not count towards the bits of entropy.
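The same arithmetic shows why redundant facts must not be counted twice. The sketch below uses rough, assumed population figures (about 1 million people in Stockholm and 10 million in Sweden, chosen for illustration rather than taken from official statistics) and measures the second fact given the first, instead of on its own.

```python
import math

def bits(probability: float) -> float:
    """Bits of information revealed by a fact with the given probability."""
    return math.log2(1 / probability)

# Rough, illustrative figures (assumptions, not official statistics).
WORLD = 7.7e9
SWEDEN = 10e6
STOCKHOLM = 1e6

stockholm_bits = bits(STOCKHOLM / WORLD)
sweden_bits = bits(SWEDEN / WORLD)

# Naive: treat the two facts as independent and simply add their bits.
naive_total = stockholm_bits + sweden_bits

# Correct: once Stockholm is known, "lives in Sweden" is certain,
# so its conditional probability is 1 and it adds 0 bits.
p_sweden_given_stockholm = 1.0
correct_total = stockholm_bits + bits(p_sweden_given_stockholm)

print(f"naive total:   {naive_total:.2f} bits")    # naive total:   22.50 bits
print(f"correct total: {correct_total:.2f} bits")  # correct total: 12.91 bits
```

The naive sum overstates the revealed information by the full value of the Sweden fact (about 9.59 bits under these assumptions), which is exactly the double counting the Stockholm example warns against.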

Is it really that simple?

Based on the above, it seems possible to simply gather different pieces of information, insert them into a mathematical formula and get an answer as to whether the accumulated information is enough to identify an individual. It turns out, however, that it is not that simple. In theory, there is no dispute that this is how you can identify someone. In practice, however, there are several concerns to address. First, it is difficult to know how much new information a particular identifier actually provides. The example above, that Stockholm is in Sweden and that "lives in Sweden" therefore adds nothing new, presumes that you already know this. In general, it is not easy to distinguish information that is already known from genuinely new information, and the same information may end up being counted twice, which leads to an incorrect estimate of how much has actually been revealed.

It is also necessary to place this approach in a legal context and distinguish it from a purely mathematical one. Under the GDPR, when determining whether an individual is identifiable, account should be taken of all the means reasonably likely to be used, by the controller or by any other person, to identify that individual. Relevant factors include the cost and the amount of time required and the available technology, amongst other things. The objective (mathematical) way of identifying a person and this relative (legal) way can therefore lead to different results. While blood, fingerprints and other unique biological samples may contain all the bits of entropy required to objectively identify a person, in most contexts there is simply no way of identifying the specific person behind the sample. Hence, although identifiers may contain all the information needed, on an objective level, to identify a specific individual, in most cases they would not, legally speaking, count as personal data.

That said, it seems that privacy lawyers will be needed for a while longer to strike the right balance in determining what does and does not constitute personal data, at least within a legal context.

[1] In reality, the number is closer to 32.84, as the world population today is about 7.7 billion, but for simplicity's sake we round it up to 33.
[2] ΔS = −log2 Pr(X = x), where ΔS is the reduction in entropy (in bits) and Pr(X = x) is the probability that the fact is true of a randomly chosen person; for a given birthday, for example, it is 1/365.

This blog post was written by Kenny Chung, lawyer at Synch. Kenny is passionate about privacy issues beyond the ordinary.
