Last week I received an interesting book listing the Top 500 largest retail websites ranked by annual sales. It was fascinating to see how these huge online retailers use SEO to generate revenue, especially the brick-and-mortar stores that use the web as an extension of their standard business. Owners of these companies usually don’t focus on SEO as a revenue source, and it’s quite sad to see how these online retailers miss a great opportunity to increase sales through SEO.
Here is a good example: SkyMall Inc (www.skymall.com) is a direct marketer with about $75M in sales in 2007 and more than 450K unique visitors per month.
Let’s take a look at their SEO. Some quick research shows obvious problems.
Duplicate Content
Mirrored Subdomains
By googling a quote from their main page in quotation marks (to force an exact match), I found that they have two identical copies of the website under two different subdomains:
- https://v5stage.skymall.com
- https://www.skymall.com
The pages in both subdomains are indexed in Google. Just use the following search queries in Google to see the list of indexed pages for each subdomain:
- site:v5stage.skymall.com
- site:www.skymall.com
This creates a duplicate content problem. It is hard to say why the website has two copies of its content under different subdomains; probably v5stage.skymall.com is a development copy of the live website. In any case, all pages under the v5stage.skymall.com subdomain should be blocked from indexing (rules in robots.txt are the easiest way to do that) to avoid duplicate content problems.
Internal URL Structure
The internal URL structure creates pages with identical content under different URLs. This usually happens when the same product is listed under several product categories.
For example:
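The same product page might be reachable under two different category paths like these (hypothetical URLs, shown only for illustration since the original example is not reproduced here):
- https://www.skymall.com/category/electronics/product/12345
- https://www.skymall.com/category/gifts/product/12345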
This is one of the typical problems for any e-commerce website. Using the canonical tag (a rel="canonical" link element) would help solve it.
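A minimal sketch of that fix, assuming the first of the hypothetical URLs above is chosen as the preferred version: each duplicate page would carry a canonical link element in its <head> pointing to that URL.

    <!-- Placed in the <head> of every duplicate URL for this product (hypothetical URL) -->
    <link rel="canonical" href="https://www.skymall.com/category/electronics/product/12345" />

Search engines treat this as a strong hint to consolidate the duplicates under the preferred URL.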
Robots.txt mishaps
The robots.txt file is the best place to block pages on a website from indexing, point to the location of the XML sitemap, block unwanted search engine bots, and so on. Let’s take a look at https://www.skymall.com/robots.txt
Improper subdomain blocking
The file contains attempts to block some subdomains from indexing.
Unfortunately, these rules are not correct. Each subdomain should have its own robots.txt file with rules specific to that subdomain. For example, blocking the v5stage.skymall.com subdomain from indexing would require the file https://v5stage.skymall.com/robots.txt with the following lines:
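    # robots.txt served from v5stage.skymall.com: keep all crawlers out of the staging copy
    User-agent: *
    Disallow: /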
Too many lines!
There are more than 13,000 lines of rules like this sample:
It’s possible to use two lines of code and pattern matching to get the same result. For example:
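Since the original rules aren’t reproduced here, the pattern below is only a hypothetical illustration; Google and other major crawlers support the * wildcard in Disallow rules, so thousands of similar lines can usually be collapsed into a single pattern:

    # Hypothetical pattern: replaces thousands of individual Disallow lines
    # that share a common URL fragment
    User-agent: *
    Disallow: /*commonFragment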
Linking to your XML Sitemap from Robots.txt
It makes sense to put a link to the XML sitemap in your robots.txt file. For example:
- Sitemap: https://www.skymall.com/sitemap.xml
Sitemap.xml Mishaps
An XML sitemap is a great way to provide search engines with a list of pages for indexing. Unfortunately, SkyMall’s sitemap has some problems.
There are about 3,000 listed URLs that point to internal search results. For example:
As a result, Google has about 18,400 indexed pages with search results. It doesn’t make sense to submit these search-result URLs for indexing; it would be better to include in the XML sitemap a list of category and product pages instead.
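A sketch of what such an entry would look like, following the standard sitemap protocol (the URL below is hypothetical):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- One <url> entry per category or product page; hypothetical URL for illustration -->
      <url>
        <loc>https://www.skymall.com/category/electronics</loc>
      </url>
    </urlset>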
Conclusion
Easy steps like fixing the internal URL structure, using correct robots.txt files, creating a useful XML sitemap, and others would make this website more spiderable for the search engines. As a result, the search engines will provide better results to visitors, and the website will get more organic traffic and revenue.