October 21, 2011
kclark

Sprechen Du RegEx? A Beginner’s Guide to RegEx

One of my favorite things to write about, back in the VKI days, was RegEx. An incredibly useful tool for people doing anything from simple find and replace scripts in Notepad++ to server admins redirecting pages, RegEx is one of those tools that you really should be familiar with if you work in our industry.

Sprechen Du Regex?

Regex commands can range from simple to brain-meltingly complex depending on how much “language” (and more importantly: logic) you use with them. The following is a hefty (but not complete) selection of regex terms:

. : The period is a wild card. It can represent any character whatsoever (in most engines, any character except a line break).

+ : Matches the previous character (or group) 1 or more times.

* : Matches the previous character (or group) 0 or more times.

() : Parentheses represent a set of “tokens” or rule elements. For instance, (.+) would match any set of characters. This allows you to apply an operator to an entire group. So for instance, if you wanted to match the word “what” you would type “what”, but if you wanted it to also catch “whatwhat” then you could use “(what)+”.

Parentheses also create a “back reference”, which can be recalled with a special symbol in many regex engines (in Google Analytics, for instance, you would use $).

[] : Square brackets represent a “character class”, and are often used for ranges. For instance [a-t] would match any lower case letter between a and t. You can also have multiple items within a bracket, such as [a-zA-Z0-9\s\-#"=] which would match any single letter, number, whitespace character, hyphen, number sign, quotation mark, or equals sign. (Yes, this could be shortened to [\w\s\-#"=], which would also match underscores, but I was making a point about ranges.)

{} : Curly brackets are odd. They define repetition. So (what){2} would only match two repetitions of what (whatwhat). Alternatively, (what){2,7} would match between two and seven repetitions of what (including 3, 4, 5, or 6 repetitions).

\d : Represents any digit

\s : Represents any whitespace element (space, tab, etc.)

\w : Represents any alphanumeric character or underscore

\D \S \W : Negation of the above, so not a digit, not a whitespace, etc.

$ : Dollar sign matches the end of a string. In htaccess it can also be used to recall sets that have been previously defined by parentheses.

^ : The caret has two purposes. It can match the start of a string, and it can also negate the characters in a character set. So ^[a-z]$ will only match a string that consists of exactly one lower case letter, while [^a-z] will match any single character that is not a lower case letter. So aaa will not match, aAa will match (on the A), and AAA will match.

- : A hyphen creates a range inside square brackets. For instance, [a-z] would match any character from a to z (though not any uppercase characters).

| : The bar stands for “or”. So a|b will match a or b.

\ : The backslash means “literally”. So while “.” would match any character, “\.” would only match periods. Similarly, while “?” is normally an operator (see below), “\?” would match a literal question mark. In certain implementations of regex (e.g. Notepad++) the backslash can also be used with numbers (\1, \2, etc.) to recall areas that have previously been defined by parentheses (same as $1, $2, etc. in htaccess).

? : Question marks have a lot of uses. Following an expression, it makes that expression optional, so the match succeeds whether or not the expression is present. So for example “^(1080 )?Howe st” would match “1080 Howe st.” or “Howe st.” but not “64 Howe st.”, while “64?” would match “6” or “64”. The question mark also has the dual purpose of making an expression “lazy” (normally regex is greedy). Greed and laziness make my head hurt so I’ll just leave this one to LunaMetrics (good greed and bad greed).

(?i) : I said question marks have a lot of uses. This command turns on case insensitivity. So, oh (?i)my gosh will match “oh my gosh” and “oh MY GOSH”.

(?-i) : Yep, a negative sign. Reverses what (?i) does, turning off case insensitivity (yay double negatives). Think of (?i) and (?-i) as HTML’s <> and </> and you’ll have the idea.

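(?=) : A lookahead. It matches the preceding expression only when it is immediately followed by the expression after the equals sign, without including that following text in the match. So in “oh my g… OH MY GOSH”, G(?=O) would match the G in GOSH.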

Got all of that remembered? No? I doubt anyone does.
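If it helps to see a few of these in action, here is a rough sketch using Python's built-in re module. The sample strings are made up for illustration, and other engines (Google Analytics, htaccess, Notepad++) can differ in the details, such as how back references are written. Python's scoped form (?i:...) is used below in place of the bare (?i) toggle.

```python
import re

# Quantifiers and groups: "+" is one or more, "*" is zero or more.
one_or_more = bool(re.fullmatch(r"(what)+", "whatwhat"))    # True
zero_or_more = bool(re.fullmatch(r"(what)*", ""))           # True
exactly_two = bool(re.fullmatch(r"(what){2}", "what"))      # False: needs "whatwhat"
two_to_seven = bool(re.fullmatch(r"(what){2,7}", "what" * 5))  # True

# Character classes, shorthand classes and negation.
digits = re.findall(r"\d", "1080 Howe st")       # every digit in the string
not_lower = re.findall(r"[^a-z]", "aAa")         # characters that are not a-z

# Escaping: "." is a wildcard, "\." is a literal period.
periods = re.findall(r"\.", "a.b.c")

# Optional group: the street number may or may not be there.
optional = re.compile(r"^(1080 )?Howe st")
with_number = bool(optional.search("1080 Howe st."))     # True
without_number = bool(optional.search("Howe st."))       # True
other_number = bool(optional.search("64 Howe st."))      # False

# Back reference: Python writes recalled groups as \1, \2 in a replacement.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "gosh my")

# Greedy versus lazy: ".+" grabs as much as it can, ".+?" as little.
greedy = re.search(r"<.+>", "<b>bold</b>").group()
lazy = re.search(r"<.+?>", "<b>bold</b>").group()

# Case insensitivity applied to only part of the pattern.
case_insensitive = bool(re.search(r"oh (?i:my gosh)", "oh MY GOSH"))

# Lookahead: match a "G" only when an "O" comes right after it.
lookahead = re.search(r"G(?=O)", "oh my g... OH MY GOSH")

print(digits, not_lower, periods, swapped, greedy, lazy, lookahead.start())
```

The lookahead finds only the G in GOSH: the lower case g is a different character, and the match itself is just the single G, not the O that follows it.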

Sprechen Sie Regex?

So how can we use this? Here’s a neat trick.

Say you want to know if there is a behavioral difference between people using longer keyphrases and people using shorter ones. One might assume that longer keyphrases would convert more, since they are more specific, and there is a greater chance that a user is finding exactly what they searched for. But why on earth would you assume when you have analytics?

Fortunately, a commenter on Avinash Kaushik’s blog has a neat trick for doing this using regular expressions.

Make a new advanced segment with ‘keyword’ ‘matching RegEx’ and input one of the following:

^\s*[^\s]+\s*$ – one keyword
^\s*[^\s]+(\s+[^\s]+){1}\s*$ – two keywords
^\s*[^\s]+(\s+[^\s]+){2}\s*$ – three keywords
^\s*[^\s]+(\s+[^\s]+){3}\s*$ – four keywords

So this reads as:

Start of line (^), followed by whitespace repeated zero or more times (\s*), followed by not-a-whitespace ([^\s]) one or more times, followed by whitespace zero or more times, then end of line ($). Then if you want more than one keyword, you add a group that matches whitespace followed by another word, (\s+[^\s]+), and put a repeat ({number}) on it. Repeat once for two keywords, twice for three, etc.
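To check that reading outside of Google Analytics, here is a small Python sketch that builds the same kind of pattern for any word count. The helper names word_count_pattern and has_n_words are made up for this example, and Python's re module stands in for the Analytics regex engine, which may behave differently in edge cases.

```python
import re


def word_count_pattern(n: int) -> str:
    """Build a pattern for a keyphrase of exactly n words (hypothetical helper)."""
    if n < 1:
        raise ValueError("a keyphrase needs at least one word")
    # The first word is matched by [^\s]+ and every extra word by the
    # repeated group, so n words means n - 1 repetitions.
    return r"^\s*[^\s]+(\s+[^\s]+){%d}\s*$" % (n - 1)


def has_n_words(keyphrase: str, n: int) -> bool:
    """Return True when the keyphrase is made of exactly n words."""
    return re.search(word_count_pattern(n), keyphrase) is not None


print(word_count_pattern(3))
print(has_n_words("  google analytics ", 2))
print(has_n_words("google analytics", 3))
```

Leading and trailing spaces do not change the count, because the \s* at each end soaks them up.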

You can also do ranges such as:

^\s*[^\s]+(\s+[^\s]+){1,4}\s*$ – two to five keywords
^\s*[^\s]+(\s+[^\s]+){5,}\s*$ – six or more keywords
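As a rough illustration of how the ranges divide keyphrases into segments, the Python sketch below sorts a few made-up phrases into buckets. The SEGMENTS labels and the classify function are assumptions for this example only; in Google Analytics each pattern would go into its own advanced segment.

```python
import re

# Hypothetical segment labels paired with the length patterns.
SEGMENTS = {
    "one word": r"^\s*[^\s]+\s*$",
    "two to five words": r"^\s*[^\s]+(\s+[^\s]+){1,4}\s*$",
    "six or more words": r"^\s*[^\s]+(\s+[^\s]+){5,}\s*$",
}


def classify(keyphrase):
    """Return the label of the first segment whose pattern matches, else None."""
    for label, pattern in SEGMENTS.items():
        if re.search(pattern, keyphrase):
            return label
    return None


for phrase in ["regex", "how to use regex", "how to use regex in google analytics"]:
    print(phrase, "->", classify(phrase))
```

Because every pattern is anchored at both ends, a phrase can only land in one bucket, so the order of the segments does not matter.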

There you go. Try those out and let us know how you find longer phrases affect site metrics.

Keep an eye on the blog over the next couple of weeks as we post more Regex tips and tricks that you can use both within Google Analytics and other Regex engines.
