Practical solution: extracting a list of Query Strings from URLs

When working on complex websites, we often end up with tons of URLs that are, in reality, the same page with different query-string arguments.

Why is this important?

  1. Filter out arguments that are creating unnecessary duplicates
  2. Reduce the number of distinct entries in reports and the risks of breaking the number of uniques imposed by most tools
  3. Increase the accuracy of Page Views
  4. Identify rogue query string parameters and wrong arguments (as well as those with mixed upper/lowercase which are likely also considered as distinct ones)

The solution

I love Analytics Canvas, so I thought – why not try to do a Query String analysis using it! The original idea comes from Peter O’Neill, who offers an Excel spreadsheet with macros to do this.

The algorithm looks like this:

  • Fetch data from GA: only paths with URLs, sorted by descending unique page views
  • Split the URL to get only the Query String part
  • Further split the query string to extract value-pairs (up to 5 arguments in this case) and put them in a new column named “arg”
  • Combine everything back but keep only the “arg” column
  • Filter out rows where “arg” is empty (i.e. there were less than 5 arguments)
  • Split out the “arg” column into “var=value” pairs and keep only the “var” column
  • Combine duplicate “var” and create a new column with a total of occurences of each one
  • Sort in alphabetical order
  • Store into Excel
  • Voilà!

 

It worked on a pretty complex website which had nearly 30k URLs with query string parameters and ended up with 195 distinct arguments!

Here’s how the results looks like:

Parameter Count
transit 9960
page 1463
sLang 1454

Once equipped with this information, we can adequately configure Google Analytics by filtering out unnecessary parameters and improve the overall quality of our analytics.

Stéphane Hamel

Director of Strategic Services - Create innovative company-wide solutions and digital measurement strategies to deliver maximum value and insight for Cardinal Path clients.

Share
Published by
Stéphane Hamel

Recent Posts

GA4 and BigQuery: why might data not match?

One of the most common questions we get about GA4 isn’t really about GA4, itself.…

2 weeks ago

GA4 + Optimizely: Integration Overview

Using engagement data to improve website performance is a near-universal use-case for users of Google…

2 months ago

Google Delays Third-Party Cookie Deprecation to 2025

Google announced on April 23 that it will again delay third-party cookie deprecation (3PCD) in…

3 months ago

This website uses cookies.