I was never planning for there to be a part 2 to the Regex and GA post, but following the release of Google Instant, the internet was awash with people presenting new ways to analyze keywords using regex-based filters. These provide some great RegEx samples to help you understand how RegEx works. So today, I am going to walk you guys through these filters and help the regex/GA newbies out there understand what each one is doing.
How NOT to track Google Instant
A post by Semetrical aimed to extract a partial Google Instant query via GA filters (that's account filters, not result filters). Of course, it isn't pulling the correct query parameters out (as he admits in the edit), but it's still a pretty awesome example of what you can do with regex and GA.
In this particular case, he has us create a new account with the following filter:
What’s going on here?
He has instructed GA to take the referral and match [?|&]oq=([^&]*). Take a look back at my last post and see if you can figure this out.
Got that list handy? Good, here we go:
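This takes the referring URL and matches a "?" or an "&" followed by "oq=". Strictly speaking, the "|" inside the square brackets is a literal character, so a "|" would match too. After "oq=" comes a character class that matches any character except "&", repeated zero or more times. Also note that the last part is wrapped in parentheses, which makes it a capture group that can be recalled later.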
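To see what Field A actually pulls out, here is a minimal Python sketch that runs the same pattern against a few referrer URLs. The URLs are made up for illustration. The `partial_query` helper is hypothetical. Using `re.search` assumes GA matches the pattern anywhere in the field rather than against the whole string.

```python
import re

# Field A pattern quoted in the post. Inside [...] the "|" is a literal
# character, so this matches "?", "|" or "&", then "oq=", then captures
# zero or more characters that are not "&".
FIELD_A = re.compile(r"[?|&]oq=([^&]*)")

def partial_query(referrer):
    """Hypothetical helper: return the captured oq= value, or None."""
    match = FIELD_A.search(referrer)
    return match.group(1) if match else None

# Made-up referrers for illustration only.
print(partial_query("http://www.google.com/search?q=regex+tutorial&oq=regex+tu&aq=0"))  # regex+tu
print(partial_query("http://www.google.com/search?oq=regex&q=regex+tutorial"))         # regex
print(partial_query("http://www.google.com/search?q=regex+tutorial"))                  # None
print(partial_query("http://example.com/page|oq=oops"))                                # oops
```

The last line shows the side effect of putting "|" inside the brackets. It does not act as "or" there, so a literal bar before "oq=" also counts as a match.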
Next, Field B takes the campaign term and matches (.*). That's any character repeated zero or more times (so, anything at all). The parentheses, again, capture it so it can be recalled.
Now here's the interesting part: it outputs $B1|$A1 to the campaign term. In other words, it outputs the first captured group from Field B (the campaign term), a bar, and then the first captured group from Field A (the partial oq= query). So when you look at your keywords, you get the keyword that Google is reporting plus the incomplete term the searcher had typed before clicking on the completed keyword.
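The whole advanced filter can be mimicked in a few lines of Python, which makes the $B1|$A1 output easier to picture. This is a sketch, not GA's implementation. The `build_output` helper is hypothetical. It assumes both fields are required, so that a hit where either pattern fails is left alone. The sample referrer and keyword are invented.

```python
import re

FIELD_A = re.compile(r"[?|&]oq=([^&]*)")  # applied to the referral
FIELD_B = re.compile(r"(.*)")              # applied to the campaign term

def build_output(referral, campaign_term):
    """Hypothetical sketch of writing $B1|$A1 into the campaign term.

    Returns None when either field fails to match (assumes both fields
    are required, so the filter would not fire for that hit).
    """
    a = FIELD_A.search(referral)
    b = FIELD_B.search(campaign_term)
    if a is None or b is None:
        return None
    # $B1 = capture group 1 of Field B, $A1 = capture group 1 of Field A.
    return b.group(1) + "|" + a.group(1)

print(build_output("http://www.google.com/search?q=regex+tutorial&oq=regex+tu",
                   "regex tutorial"))  # regex tutorial|regex+tu
```

Because Field B is just (.*), $B1 is always the untouched campaign term. All of the new information comes from $A1.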
Neat, eh? Bet a bunch of you didn’t know you could do that with GA filters.
Michael Whitaker takes another approach with:
Michael is more concerned with the length of keywords than with tracking the exact string typed into Google. This is an interesting idea, and useful for more than just Google Instant (in fact, very little about it is Instant-specific).
He provides links to a set of custom segments, each consisting of "keyword" "matches regular expression" and a regular expression. These include:
matches one keyword:
How do we read this? The line starts with one or more characters that are not a ".", a whitespace character, or a "-", and then the line ends.
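The segment's exact expression isn't reproduced here, so the pattern below is a reconstruction from the description above: ^[^.\s-]+$. Treat it as an assumption rather than Michael's original. It is tested with Python's `re` module against a few invented keywords.

```python
import re

# Reconstructed from the description (an assumption, not the original
# segment): one or more characters that are not ".", whitespace or "-",
# anchored at both the start and the end of the keyword.
ONE_KEYWORD = re.compile(r"^[^.\s-]+$")

for keyword in ["analytics", "google analytics", "e-commerce", "ga.js"]:
    print(keyword, bool(ONE_KEYWORD.search(keyword)))
# analytics True
# google analytics False
# e-commerce False
# ga.js False
```

Note that inside the brackets the "." is literal and the trailing "-" is literal too, so no escaping is needed.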
matches two keywords:
This one starts the same as before, but follows it up with a string made up of ".", whitespace, or "-", and then another string that contains none of those. So essentially it's saying: a word, followed by a space, dash, or period, followed by another word. Here a "word" is any run of characters other than those separators, not just alphanumeric characters.
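As a rough check, here is a reconstruction of the two-keyword pattern as ^[^.\s-]+[.\s-]+[^.\s-]+$. The exact original isn't shown, so this form is an assumption based on the description. It also assumes the separator run may be one or more characters long.

```python
import re

# Reconstruction (assumption): a word, one or more separators
# (".", whitespace or "-"), then a second word, and nothing else.
TWO_KEYWORDS = re.compile(r"^[^.\s-]+[.\s-]+[^.\s-]+$")

for keyword in ["analytics", "google analytics", "e-commerce",
                "google analytics filters", "c++ regex"]:
    print(keyword, bool(TWO_KEYWORDS.search(keyword)))
# analytics False
# google analytics True
# e-commerce True
# google analytics filters False
# c++ regex True
```

The "c++ regex" case shows that the "words" don't have to be alphanumeric. Anything that isn't a separator counts.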
Food for thought: why not just use \w? You'd write it ^[\w]+, much simpler, no? I'm going to guess that it's a logic thing. Defining it with \w means "only match word characters (letters, digits, and the underscore)", while a ^ at the start of a bracketed set means "match any character except these". So his way might produce more false positives, but it won't block any potentially useful data.
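The whitelist-versus-blacklist trade-off is easiest to see side by side. This sketch adds a closing $ to the \w version so that both patterns check the whole keyword. The negated class is the same reconstruction used for the one-keyword segment, which is an assumption. Keywords with symbols like "+" or "&" pass the negated class but fail \w.

```python
import re

WORD_CHARS = re.compile(r"^\w+$")           # whitelist: letters, digits, underscore
NOT_SEPARATORS = re.compile(r"^[^.\s-]+$")  # blacklist: anything but ".", whitespace, "-"

def compare(keyword):
    """Return (matches with \\w, matches with the negated class)."""
    return bool(WORD_CHARS.search(keyword)), bool(NOT_SEPARATORS.search(keyword))

for keyword in ["analytics", "c++", "at&t", "google analytics"]:
    print(keyword, compare(keyword))
# analytics (True, True)
# c++ (False, True)
# at&t (False, True)
# google analytics (False, False)
```

Which behaviour you want depends on whether keywords like "c++" should count as one-word searches or be dropped.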
matches 3 or more keywords:
This one is a little different. It starts the same, then repeats the separator-plus-word pair two or more times. Pretty easy, eh?
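Putting the three segments together gives a simple keyword-length bucketer. All three patterns are reconstructions from the descriptions above, not the original segments. The three-or-more version is written as ^[^.\s-]+([.\s-]+[^.\s-]+){2,} with no closing $, on the assumption that anything after the third word still counts. The `bucket` function is hypothetical.

```python
import re

# Reconstructions (assumptions) of the three segment patterns.
ONE = re.compile(r"^[^.\s-]+$")
TWO = re.compile(r"^[^.\s-]+[.\s-]+[^.\s-]+$")
THREE_PLUS = re.compile(r"^[^.\s-]+([.\s-]+[^.\s-]+){2,}")

def bucket(keyword):
    """Hypothetical helper: classify a keyword by word count."""
    if ONE.search(keyword):
        return "one"
    if TWO.search(keyword):
        return "two"
    if THREE_PLUS.search(keyword):
        return "three or more"
    return "unmatched"

for keyword in ["analytics", "google analytics",
                "how to track google instant", " leading space"]:
    print(repr(keyword), bucket(keyword))
# 'analytics' one
# 'google analytics' two
# 'how to track google instant' three or more
# ' leading space' unmatched
```

The last case is worth noticing. A keyword that starts with a separator falls through all three segments, because each one requires a non-separator character first.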
I'm sure there are more expressions out there that people are having trouble with. If anyone is struggling with a regular expression, send it in and I'll see if I can help (provided you don't mind me tossing it into a blog post).