Google’s policies mandate that no data be passed to Google that it could recognize as personally identifiable. This post aims to provide an easy-to-follow, structured approach to identifying Personally Identifiable Information (PII) that might exist in your or your client’s Google Analytics account, as well as different methods for preventing further collection of such information. In this post I will outline what constitutes PII, and how to avoid passing this information to Google when implementing Analytics on a property.
The approaches outlined below aim to alert you when PII is being captured. Ultimately, however, Google requires that:
“You will not and will not assist or permit any third party to, pass information to Google that Google could use or recognize as personally identifiable information.”
This means that if you find PII in your data collection, simply filtering the data out of your Google Analytics property is only half the battle. No PII should make it into Google Analytics in the first place.
PII includes any name, email address, billing information, social security number, or other data which can be reasonably linked to such information by Google, as well as data that permanently identifies a particular device (such as a mobile phone’s unique device identifier), even in hashed form.
“The Google Analytics terms of service, which all Google Analytics customers must adhere to, prohibits sending personally identifiable information (PII) to Google Analytics … Your Google Analytics account could be terminated and your data destroyed if you use any of this information.”
So you suspect that you might be collecting PII, but are not sure of where to look or what to look for? Then this post is for you! Below are some of the major areas where users can run into trouble with PII within their Google Analytics Data. Oftentimes, the inclusion of PII in any of these different areas is unintentional, which is why performing a PII audit is so important.
Looking for PII during the setup and testing phase of your Google Analytics implementation is recommended as a best practice in order to avoid running into any PII collection issues further down the line.
So, now we know where and what to look for in our Google Analytics reporting interface. But before we dive into the various auditing methods, I wanted to take a moment to highlight one of the techniques we will use to assist us in our task. According to Jan Goyvaerts over at www.regular-expressions.info:
“A regular expression (regex or regexp for short) is a special text string for describing a search pattern. You can think of regular expressions as wildcards on steroids.”
Below is an assortment of regular expressions for matching some of the common types of PII. Many other variations of these expressions would do essentially the same job, but these are some of the more common ones:
*Caveat: not every type of PII can be searched for in this way due to the complexity of the text (e.g. a physical home address, or first/last name).
PII Type | RegEx |
---|---|
Email address | ([a-zA-Z0-9_\.-]+)@([\da-zA-Z\.-]+)\.([a-zA-Z\.]{2,6}) |
Social security number | ^\d{3}-?\d{2}-?\d{4}$ |
IP address | ^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$ |
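To get a feel for how these patterns behave before pasting them into a report filter, here is a minimal Python sketch that runs them against a few made-up values. The sample strings and the `find_pii` helper are hypothetical and only for illustration. One thing it shows: the social security number and IP address patterns in the table are anchored with `^` and `$`, so they only match when the entire value is the number. The unanchored variants are my own assumption for searching inside longer strings such as URLs.

```python
import re

# Patterns copied from the table above, exactly as written.
TABLE_PATTERNS = {
    "email": r"([a-zA-Z0-9_\.-]+)@([\da-zA-Z\.-]+)\.([a-zA-Z\.]{2,6})",
    "ssn": r"^\d{3}-?\d{2}-?\d{4}$",
    "ip": r"^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$",
}

# Assumption (not from the table): the same patterns without the ^ and $
# anchors, so they can match in the middle of a longer value such as a URL.
UNANCHORED_PATTERNS = {
    "email": r"([a-zA-Z0-9_\.-]+)@([\da-zA-Z\.-]+)\.([a-zA-Z\.]{2,6})",
    "ssn": r"\d{3}-?\d{2}-?\d{4}",
    "ip": r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}",
}


def find_pii(value, patterns):
    """Return the sorted names of the PII types whose pattern matches value."""
    return sorted(
        name for name, pattern in patterns.items() if re.search(pattern, value)
    )


if __name__ == "__main__":
    # Hypothetical sample values.
    samples = [
        "/thanks?email=jane.doe@example.com",
        "123-45-6789",
        "/account?ssn=123-45-6789",
        "/login?ip=192.168.0.1",
    ]
    for value in samples:
        print(value)
        print("  as in table:", find_pii(value, TABLE_PATTERNS))
        print("  unanchored: ", find_pii(value, UNANCHORED_PATTERNS))
```

Dropping the anchors widens the net: the unanchored social security pattern will also flag any run of nine digits, such as an order ID, so expect to review the matches by hand.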
This is an overview of the two main methods you will be using to identify potential PII within the common trouble areas, and their limitations. Here you can use the regular expressions listed above, as well as your own personal sleuthing skills to look for PII. Since regular expressions won’t help you when it comes to things like physical address or first/last name combinations, you will need to manually scan the different reports for those types of PII.
The inline filter method will be your first, and likely best, approach for identifying PII in your data. It allows you to quickly scan your standard reports for the most common types of PII. As previously mentioned, some of the most common places where PII lives include query strings and event parameters. The most common reports where this auditing technique can be used are:
The process is simple, and consists of four easy steps:
Your chosen report will now be filtered to show only the rows that match the regular expression you have chosen. If you don’t see any records, this is GREAT NEWS! It means that the report does not contain the type of PII you are searching for, at least in a form the expression can match. If you do see results, your data contains PII and you will need to take action to address the issue (more on this later).
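The same check can be run outside the interface on an exported report. The sketch below is a minimal illustration, assuming a hypothetical export with one row per page and a pageview count; `filter_rows` and `SAMPLE_ROWS` are made-up names. It also percent-decodes each value first, because an `@` in a URL is often stored as `%40`, which the email pattern would otherwise miss.

```python
import re
from urllib.parse import unquote

# Email pattern from the table earlier in this post.
EMAIL_PATTERN = re.compile(
    r"([a-zA-Z0-9_\.-]+)@([\da-zA-Z\.-]+)\.([a-zA-Z\.]{2,6})"
)


def filter_rows(rows, pattern, column="page"):
    """Keep only the rows whose column value matches the pattern.

    Values are percent-decoded before matching (an assumption about how the
    export stores URLs), but the original rows are returned unchanged.
    """
    matching = []
    for row in rows:
        decoded = unquote(row[column])
        if pattern.search(decoded):
            matching.append(row)
    return matching


# Hypothetical rows standing in for an exported pages report.
SAMPLE_ROWS = [
    {"page": "/home", "pageviews": 1200},
    {"page": "/thanks?email=jane.doe@example.com", "pageviews": 3},
    {"page": "/reset?user=sam%40example.org", "pageviews": 1},
    {"page": "/products?id=42", "pageviews": 310},
]

if __name__ == "__main__":
    for row in filter_rows(SAMPLE_ROWS, EMAIL_PATTERN):
        print(row["page"], row["pageviews"])
```

An empty result here is the offline equivalent of seeing no records in the filtered report.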
Figure 1.0
The advanced segment method is similar to the inline filter method, with the major difference being that the segment applies to all reports automatically once it is created. We will use the regular expressions listed above to create a segment that identifies any sessions containing different types of PII.
The example segment setup below (Figure 2.0) looks for sessions with pageviews that contained PII in the URL. However, this approach could also be applied to event parameters (event category, event action, event label), as well as custom dimensions, site search terms, or social events.
Using this approach also displays the number of users and the number of sessions (Figure 2.1) as a percentage of the total.
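To make the arithmetic behind those percentages concrete, here is a small Python sketch of what such a segment computes. It assumes a hypothetical list of sessions, each with a user ID and the page URLs viewed; `segment_summary` and `SAMPLE_SESSIONS` are illustrative names, not part of Google Analytics.

```python
import re

# Email pattern from the table earlier in this post.
EMAIL_PATTERN = re.compile(
    r"([a-zA-Z0-9_\.-]+)@([\da-zA-Z\.-]+)\.([a-zA-Z\.]{2,6})"
)


def segment_summary(sessions, pattern):
    """Count sessions and users with at least one matching page URL,
    and express each count as a percentage of the total."""
    flagged = [
        s for s in sessions if any(pattern.search(page) for page in s["pages"])
    ]
    flagged_users = {s["user_id"] for s in flagged}
    all_users = {s["user_id"] for s in sessions}
    return {
        "sessions": len(flagged),
        "sessions_pct": 100 * len(flagged) / len(sessions) if sessions else 0.0,
        "users": len(flagged_users),
        "users_pct": 100 * len(flagged_users) / len(all_users) if all_users else 0.0,
    }


# Hypothetical sessions: five sessions from four users.
SAMPLE_SESSIONS = [
    {"user_id": "u1", "pages": ["/home", "/thanks?email=jane.doe@example.com"]},
    {"user_id": "u1", "pages": ["/home", "/products?id=42"]},
    {"user_id": "u2", "pages": ["/signup?email=sam@example.org"]},
    {"user_id": "u3", "pages": ["/home"]},
    {"user_id": "u4", "pages": ["/products?id=7", "/cart"]},
]

if __name__ == "__main__":
    print(segment_summary(SAMPLE_SESSIONS, EMAIL_PATTERN))
```

A user counts as flagged if any one of their sessions matches, which is why the user percentage can be higher than the session percentage.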
As with the inline filter approach, the most common reports where your newly created segment will identify PII are:
Figure 2.0
Figure 2.1
If you’ve gone through and checked for PII and haven’t found anything, then congratulations, you can stop reading here!
If you have found some form of PII, don’t panic. You will just need to take the following steps: