Last month, while we were moving our site from our old ColdFusion/flat-HTML website to our new Drupal hotness, I took it upon myself to learn how to do some htaccess work. In the past we had written about htaccess, redirects, and rewrites, but in playing around a little more I realized that I really didn’t understand a lot of it (hell, I still don’t). So today I’m going to walk through some more intermediate htaccess rules you can use.
You can do a lot with htaccess, from custom error pages to banning IPs to password protection to setting MIME types. It’s an incredibly versatile tool. What I am interested in, however, is rewrites.
A 301 redirect seamlessly sends users from one page to another while telling the browser that the page has moved permanently. For SEOs, a 301 passes link juice from the old page to the target. It won’t always pass 100% of it, but it is usually the simplest and cleanest option.
What we’re allowed to use in htaccess
Regular Expressions: htaccess supports a great deal of regular expressions, allowing us to do a lot of different stuff. Here are a few useful ones.
.: The period is a wildcard. It can represent any single character whatsoever.
+: The plus repeats the preceding character or group 1 or more times. Its sibling, *, repeats it 0 or more times.
(): Parentheses represent a group of “tokens” or rule elements. For instance, (.+) would match any run of one or more characters. Grouping also lets you apply an operator to the entire group. So for instance, if you wanted to match the word “what” you would type “what”, but if you wanted it to also catch “whatwhat” then you could use “(what)+”.
Parentheses also create a “back reference”, which a RewriteRule can recall with a “$” followed by the group’s number ($1, $2, and so on).
\w \s \d: \w matches any alphanumeric character and underscores, \s matches white-space characters (including linebreaks), and \d matches a digit.
^: The caret matches the start of a string.
$: The dollar sign matches the end of a string. In htaccess it can also be used to recall sets that have been previously defined by parentheses.
-: A hyphen inside square brackets creates a range. For instance, [a-z] would match any character from a to z (though not any uppercase characters).
|: The bar stands for “or”. So a|b will match a or b.
\: The backslash means “literally”. So while “.” would match any character, “\.” would only match periods.
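To see how these tokens fit together, here is a minimal sketch of a few rules. The paths (old-page.html, article, node, news, press, media) are made up for illustration and are not from our site.

```apache
RewriteEngine On

# ^ and $ anchor the whole path; \. matches a literal period.
RewriteRule ^old-page\.html$ /new-page [R=301,L]

# (\d+) captures one or more digits; $1 recalls them in the target.
RewriteRule ^article/(\d+)$ /node/$1 [R=301,L]

# (news|press) matches either word; [a-z]+ matches one or more
# lowercase letters. $2 recalls the second set of parentheses.
RewriteRule ^(news|press)/([a-z]+)$ /media/$2 [R=301,L]
```

Note that inside an htaccess file the pattern is matched against the path without its leading slash, which is why none of the patterns start with “/”.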
Rewrites
To start, you’ll have to enable the rewrite engine with the code “RewriteEngine on”. Hard eh? You might also want to add “RewriteBase /” to simplify things.
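As a rough sketch, the top of the file could look like this. The IfModule wrapper is my own optional addition: it just keeps the site from erroring out if mod_rewrite isn’t loaded.

```apache
<IfModule mod_rewrite.c>
    # Turn the rewrite engine on for this directory.
    RewriteEngine On
    # Treat the site root as the base for relative targets.
    RewriteBase /
    # RewriteRule lines go here.
</IfModule>
```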
The Simple Rewrite
The basic rewrite syntax is as follows:
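In general terms a rule has three parts: a pattern, a target, and optional flags. The sketch below uses placeholder paths (old-path and new-path are hypothetical) just to show the shape.

```apache
# General shape (placeholders, not literal values):
#   RewriteRule  PATTERN  SUBSTITUTION  [FLAGS]
#
# PATTERN is a regular expression matched against the requested path,
# SUBSTITUTION is where the request goes instead, and FLAGS tweak the
# behavior: R=301 sends a permanent redirect, L stops processing
# further rules for this request.
RewriteRule ^old-path$ /new-path [R=301,L]
```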
So, if we want to rewrite, say, vkistudios.com/google-analytics-training.html to vkistudios.com/training-google-analytics, then we would write:
://www.vkistudios.com/training-google-analytics [R=301,L]
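Written out in full, a rule for this redirect might look like the sketch below. The pattern on the left and the https scheme are my assumptions; only the target path and the flags are certain.

```apache
# Assumed pattern: match exactly google-analytics-training.html
# (the period is escaped so it only matches a literal period).
# Assumed scheme: https.
RewriteRule ^google-analytics-training\.html$ https://www.vkistudios.com/training-google-analytics [R=301,L]
```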
Rewrite a folder to a subdomain
We use a subdomain for our blog, blog.vkistudios.com. However, people are logically going to write www.vkistudios.com/blog when they want to reach the blog. For this we do the following:
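One way to write this, as a sketch rather than the exact rule we use, is below. The pattern and the https scheme are assumptions on my part.

```apache
# Match /blog, /blog/ and /blog/anything, but not /blogging.
# The inner parentheses capture whatever follows "blog/", and $2
# carries it over to the subdomain (it is empty for plain /blog).
RewriteRule ^blog(/(.*))?$ https://blog.vkistudios.com/$2 [R=301,L]
```

With this rule a request for www.vkistudios.com/blog/some-post would be sent to blog.vkistudios.com/some-post (some-post being a made-up path).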
Rewrite one folder to another
Another problem when moving from one site to another is that you often want to relocate whole folders (say, folders of JavaScript files) without writing a rewrite rule for every file. In our case we had a bunch of files in a folder called “files” and moved them within the “vki_subtheme” folder.
://www.vkistudios.com/sites/all/themes/vki_subtheme/files/$1 [R=301,L]
To clarify this rule: There are regular expressions in here! (.+) means a set of characters (the parentheses) that contains any character (the dot) one or more times (the plus), running to the end of the line (the dollar sign). The target then calls on $1, which references the first “back reference” (the area in parentheses).
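Putting that explanation into a complete rule gives something like the sketch below. The ^files/ prefix and the https scheme are assumptions; the target path and flags are the ones shown above.

```apache
# Anything requested under files/ is captured by (.+) and
# re-attached to the new folder through $1.
RewriteRule ^files/(.+)$ https://www.vkistudios.com/sites/all/themes/vki_subtheme/files/$1 [R=301,L]
```

So a request for files/logo.png (a made-up file name) would be redirected to sites/all/themes/vki_subtheme/files/logo.png.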
Rewrite all traffic to the www version of the site
RewriteRule ^(.*)$ https://www.vkistudios.com/$1 [R=301,L]
To explain: RewriteCond sets a condition that must be true for the rule right after it to run. %{HTTP_HOST} is the host name the visitor requested, ! means “does not match”, and the pattern being tested is “www.vkistudios.com” or vkistudios.net. [NC] makes the comparison case-insensitive.
The rewrite rule then redirects the request to “www.vkistudios.com”, keeping everything after the slash.
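A minimal sketch of the condition and rule together is below. The RewriteCond line is my assumption of what such a condition could look like, and it only checks for www.vkistudios.com.

```apache
# Assumed condition: the requested host is anything other than
# www.vkistudios.com (NC = ignore case).
RewriteCond %{HTTP_HOST} !^www\.vkistudios\.com$ [NC]
# Send the same path to the www host with a permanent redirect.
RewriteRule ^(.*)$ https://www.vkistudios.com/$1 [R=301,L]
```

One thing to watch: if a subdomain such as the blog is served from the same htaccess file, a condition this broad would redirect it too, so you may need an extra RewriteCond to exclude it.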