As the use of enterprise Google Analytics 360 data expands to include machine learning, predictive modeling, and multi-touchpoint attribution, data quality has never been more important.
In this post, we’re going to bring it back to the basics and discuss perhaps the most essential go-to filter for your Google Analytics data: filtering out your organization’s own activity by IP address.
We’ll explore what an IP address is, why pulling certain IP addresses out of your data is important, and how to do this within Google Analytics. We’ll also review how to filter out more complex IP addresses that go beyond the traditional v4 format.
What is an IP address?
IP stands for “Internet Protocol”. An IP address identifies your device (computer, tablet, phone, etc.) to a network (your cable company at home, the corporate network at work, the Wi-Fi network at Starbucks, etc.). Every request that you make online, be it browsing a website, sending an email, or placing an order, gets sent out through your network to the Internet. Your device’s current IP address is used to ensure that the responses to those requests get sent back to the correct person (you!). Since an IP address is assigned to you by your network or ISP, it is not permanent: you will have one IP address when using your laptop at home, and another when you take that same laptop to a public café with an open Wi-Fi hotspot.
Why should you filter IP addresses out of your data?
The reason that most organizations implement analytics on their website is to get an accurate assessment of real user behavior. However, it is not uncommon for companies (particularly larger organizations) to set their employees’ homepages to default to the company website. Depending on the number of employees and the number of times those employees open a new browser session over the course of a day, that could be an enormous amount of unqualified traffic.
Regardless of whether the corporate website is set as the browser default, employees of a company are likely to interact with the website much differently than the average user would. Collecting all of those internal behavior patterns will only serve to distract from the real user behavior that you actually want to assess.
There are plenty of other good reasons to filter out certain IP addresses, but they all generally boil down to making sure that the data you are collecting about your site is from real external users behaving naturally.
Filtering out IP addresses with Google Analytics
Google Analytics makes it pretty simple to filter out a single IP address or a range of IP addresses from your analytics views. However, there are two main types of IP addresses that you may encounter – IPv4 and IPv6 – and they each need to be handled slightly differently. We will cover how to filter each type of IP address below.
IPv4
An IPv4 address has the format x . x . x . x where each x is a decimal value from 0 to 255. The first step in filtering most IP addresses is to convert the IP address or range into a regular expression (or “regex”).
In a standard IPv4 address as described above (or a standard range of several IPv4 addresses), the easiest way to generate a valid regex for use with the Google Analytics filter is to use E-Nor’s own IP Range Regular Expression Builder online. Once you have your regex, apply it to a Custom filter, as shown in the examples below.
Example 1:
SINGLE IP: 123.45.67.89
REGEX: ^123\.45\.67\.89$
Example 2:
RANGE: 123.45.67.89 – 123.45.67.255
REGEX: ^123\.45\.67\.(89|9[0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))$
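The two regexes above can be sanity-checked locally before they go into a filter. The sketch below uses Python's re module purely as a test harness; it assumes the patterns behave the same way in Google Analytics, which holds for the simple constructs used here but is worth confirming in your own view. The helper name single_ip_regex is made up for illustration.

```python
import re

def single_ip_regex(ip: str) -> str:
    """Build an anchored regex that matches exactly one literal IP address."""
    # re.escape turns each "." into "\." so it no longer matches any character.
    return "^" + re.escape(ip) + "$"

# Range regex copied from Example 2 above.
RANGE_REGEX = r"^123\.45\.67\.(89|9[0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))$"

single = single_ip_regex("123.45.67.89")

# Last-octet values (0-255) that the range regex accepts.
matched = [n for n in range(256) if re.match(RANGE_REGEX, f"123.45.67.{n}")]

# An unescaped dot matches any character, so this sloppy pattern over-matches.
sloppy_hit = re.match(r"^123.45.67.89$", "123145167189") is not None

print(single)
print(matched[0], matched[-1], len(matched))
print(sloppy_hit)
```

The range check confirms that the pattern accepts last octets 89 through 255 and nothing else, and the last line shows why the backslashes before each dot matter.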
In some cases, you may encounter an IPv4 address with a subnet range (often referred to as CIDR notation, for Classless Inter-Domain Routing). This is indicated by a / at the end of the IP address, followed by a number from 0 to 32. In these cases, before using the IP Range Regular Expression Builder tool referenced above, you must first convert the IPv4 address to a valid range. For this, you could use the MX Toolbox Subnet Calculator.
To use this tool, first enter the IPv4 address, then select the subnet value (the /XX), and click View Subnet.
This will output a standard IPv4 address range that you can use to build your regular expression:
Example:
PROVIDED: 123.45.67.89/20
CALCULATED RANGE: 123.45.64.0 – 123.45.79.255
REGEX: ^123\.45\.(6[4-9]|7[0-9])\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))$
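If you would rather not rely on an online calculator, the same range can be worked out with Python's standard ipaddress module. This is a minimal sketch, not part of the original workflow: it computes the first and last address of the /20 block from the example and then checks the example regex against every address in that block, plus the two addresses just outside it.

```python
import ipaddress
import re

# strict=False lets us pass a host address (123.45.67.89) instead of the network base.
net = ipaddress.ip_network("123.45.67.89/20", strict=False)
first, last = net[0], net[-1]

# Regex copied from the example above.
CIDR_REGEX = (
    r"^123\.45\.(6[4-9]|7[0-9])\."
    r"([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))$"
)

# Every address inside the block should match the regex.
inside_ok = all(re.match(CIDR_REGEX, str(addr)) for addr in net)

# The addresses immediately before and after the block should not match.
outside_ok = not any(
    re.match(CIDR_REGEX, str(addr)) for addr in (first - 1, last + 1)
)

print(first, last, net.num_addresses)
print(inside_ok, outside_ok)
```

A /20 leaves 12 host bits, so the block holds 4,096 addresses, which matches the 123.45.64.0 to 123.45.79.255 range shown above.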
IPv6
An IPv4 address is made up of four groups, each holding a value from 0 to 255, which gives just over 4 billion possible IP addresses. That sounds like a lot, but with new wireless and networked devices popping up all the time, an enhanced protocol was created to handle the possibility of running out of IP addresses: enter IPv6. IPv6 was created in the late 1990s and, at the time of writing, has roughly 22% adoption worldwide.
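For a sense of scale, the address counts follow directly from the number of bits in each format: an IPv4 address is 32 bits (four groups of 8 bits) and an IPv6 address is 128 bits (eight segments of 16 bits). A quick sketch of the arithmetic:

```python
IPV4_BITS = 32   # four groups of 8 bits each
IPV6_BITS = 128  # eight segments of 16 bits each

ipv4_total = 2 ** IPV4_BITS
ipv6_total = 2 ** IPV6_BITS

print(f"{ipv4_total:,}")      # exact IPv4 address count
print(f"{ipv6_total:.3e}")    # IPv6 address count in scientific notation
```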
A typical IPv6 address has the following format: x : x : x : x : x : x : x : x where each x (called a “segment”) can be any hexadecimal value between 0 and FFFF. A run of consecutive zero segments can be collapsed to :: in short-form notation (only once per address), so it is not uncommon to see an IPv6 address formatted as something like x : x : x : x :: (indicating the last four segments were zero).
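To see how the short form relates to the full eight-segment form, you can expand and compress an address with Python's standard ipaddress module. This is only an illustration of the notation; note that Python prints hex digits in lowercase and drops leading zeros in the compressed form, which may differ from how an address is written elsewhere.

```python
import ipaddress

short_form = ipaddress.ip_address("2600:0C02:1020:2111::")
long_form = ipaddress.ip_address("2600:0C02:1020:2111:0:0:0:0")

# Both spellings describe the same 128-bit address.
same_address = short_form == long_form

# Full eight-segment form, with every segment padded to four hex digits.
expanded = short_form.exploded

# Shortest form: leading zeros dropped and the zero run collapsed to "::".
compressed = short_form.compressed

print(same_address)
print(expanded)
print(compressed)
```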
With IPv6’s construction, there are over 340 undecillion possible unique IP addresses (an undecillion is a 1 followed by 36 zeros!), so we shouldn’t be running out of these any time soon. To filter out an IPv6 address, simply use a Predefined filter that matches IP addresses that are equal to, or that begin with, the value provided.
Example 1:
IP ADDRESS: 2600:0C02:1020:2111:FFFF:FFFF:FFFF:FFFF
Example 2:
IP ADDRESS: 2600:0C02:1020:2111::
You may encounter IPv6 addresses that have subnets appended to them (/XX) much like the IPv4 example noted earlier in this post. Unlike with IPv4 addresses that include subnets, however, for IPv6 addresses you can simply leave off the subnet entirely.
Example:
IP ADDRESS: 2600:0C02:1020:2111::/64
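To check which addresses a given IPv6 subnet actually covers, Python's ipaddress module can test membership directly. The sketch below also derives the leading segments shared by every address in the block, which is the kind of value a "begins with" filter would need. The helper begins_with_prefix is hypothetical, it only handles prefix lengths that fall on a segment boundary, and it assumes the filter sees addresses written with uppercase, zero-padded segments as in the examples above; verify how addresses appear in your own setup before relying on it.

```python
import ipaddress

net = ipaddress.ip_network("2600:0C02:1020:2111::/64")

def begins_with_prefix(network: ipaddress.IPv6Network) -> str:
    """Hypothetical helper: leading segments shared by every address in the block."""
    if network.prefixlen % 16 != 0 or not 16 <= network.prefixlen <= 112:
        raise ValueError("prefix length must fall on a segment boundary (16..112)")
    # Assumption: uppercase, zero-padded segments, matching the examples above.
    segments = network.network_address.exploded.upper().split(":")
    return ":".join(segments[: network.prefixlen // 16]) + ":"

prefix = begins_with_prefix(net)

inside = ipaddress.ip_address("2600:0C02:1020:2111:FFFF:FFFF:FFFF:FFFF")
outside = ipaddress.ip_address("2600:0C02:1020:2112::1")

print(prefix)
print(inside in net, outside in net)
```

For the /64 in the example, every address in the block shares its first four segments, so only the last four vary.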
Additional Resource
Creating a Google Analytics View Filter for Internal Traffic with regexip