So, how serious has the keyword (not provided) issue become? Answer: Increasingly serious, and some project the “end of data” in September 2014 (see image below).
In part 1 of this blog post, we looked at the long tail distribution of keywords, audience segmentation, and “hot tips” to help you manage keyword (not provided). Now we will:
Credit: http://www.notprovidedcount.com/
Ironically, when we had all of our organic keywords we took this data for granted (partly because long tail keyword analysis is difficult). Currently most sites have about half of this keyword data available (50% of organic keywords are “not provided”), and in the future we may need to mine our older historical data to get SEO insights about our Web presence and our competitive position. In this post we also examine the reality of this issue and demonstrate that, even when a large proportion of organic keywords are (not provided), you can still get high quality keyword insights that are statistically valid.
After the “end of data” we will need to use 1) Webmaster Tools and 2) AdWords data to get this “visitor intent” data, and both integrate with Google Analytics, so we are all good on that front. A post by Claire Broadley also references a custom Google Analytics filter that replaces the “not provided” keywords with the landing page URL (URI). This is not a great idea unless you apply the filter in an alternate Google Analytics profile, because replacing “not provided” keywords with landing page URIs can “muddy the waters” for the remaining keywords you have (e.g. keywords you bought in PPC/CPC AdWords, and the remaining organic keywords). A better alternative is to use a secondary dimension in Google Analytics, so you can see the landing pages where the keyword is not provided (without corrupting the source data).
Landing-page based keyword analysis requires that you first perfect your SEO (technical and semantic) so that search engines algorithmically match visitor queries to each of your landing pages as appropriately as possible in context (minimizing misaligned traffic). It is a good idea to use “non-bounced” visits in conjunction with the secondary dimension shown above in order to strip out the “misaligned traffic” (when search engines direct the wrong visitors to our content by mistake; there will always be a percentage of traffic like this).
At the core of SEO analysis we will need to:
Your historical data (from before keyword “not provided”) is still a valid “mine” of insights into your business and your customers, but the value of these insights will fade with time as your site changes, as the competitive landscape changes, and as your customer/audience needs evolve, so you need to act now on the data you have.
What about the “partial” keyword data that we have now from October 2011 through to September 2014 (the projected end of organic Keyword data)? Read on…
So how can we get “a read” on all of our organic keywords that are (not provided) by leveraging the data we do have? Option 1: use the bulleted list above, or Option 2: use our remaining organic keyword data as the basis for estimating the distribution of keywords within the (not provided) set.
WARNING: THE FOLLOWING CONTAINS STATISTICS AND MATH (in case it wasn’t your favorite subject in school :)…
First, a reality check: even in a perfect world without the keyword (not provided) issue, the keyword distribution of your entire organic search traffic is still an “estimate”. Below is the reason why. (Note that here we assume that visitors behave in the same way whether they are not logged in to Google, so that keywords propagate in the referral data, or are logged in to Google (or on an Android mobile device), so that the keywords are (not provided). This assumption simplifies the math, but we recognize that keyword patterns may vary from channel to channel, e.g. mobile keywords are generally shorter or may have a different bent. Thanks to Charlotte Bourne for pressing this point home.)
For the statistical calculation of the keyword distribution we need to think in terms of a multi-Bernoulli distribution. Under this model:
Using the multi-Bernoulli distribution and the law of large numbers, the variance of a keyword’s percentage within the entire organic search visit keyword set is:
Credit: Yi Jiang (Nelson), for creating this analytical approach.
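As a rough sketch only, and not necessarily the exact expression used in this analysis, the standard result for a proportion estimated from independent Bernoulli trials can be written as follows. The symbols are our own assumptions for illustration: N is the total number of organic search visits, q is the share of those visits with keyword (not provided), n is the number of visits with a known keyword, and k is the number of known-keyword visits for one particular keyword.

$$
n = N(1 - q), \qquad \hat{p} = \frac{k}{n}, \qquad \operatorname{Var}(\hat{p}) \approx \frac{\hat{p}\,(1 - \hat{p})}{n}
$$

$$
\text{95\% confidence interval:} \quad \hat{p} \pm 1.96 \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}
$$

In this form the variance depends on the number of visits with known keywords (n), not directly on the (not provided) share, so a large n can compensate for a large q.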
Our objective is to minimize the variance and uncertainty above. If we apply L’Hôpital’s rule for this formula, we find that there are several ways to reduce the variance.
Having keywords (not provided) reduces the number of visits with known keywords, and this is fine as long as we still have a sufficiently large number of visits with known keywords. Conversely, even if we know all the keywords (0% not provided) but there are only a small number of visits, we could still have a large variance and a limited ability to extract statistically relevant insights from this data.
Illustration:
Example Scenario 1
Organic search visits (Trials): 10,000
Keyword (not provided): 0%
Number of Visits for one particular keyword (Event): 1,000
At the 95% confidence interval this keyword as a percentage of all organic keywords is

Example Scenario 2
Organic search visits (Trials): 1,000,000
Keyword (not provided): 50%
Number of Visits for one particular keyword (Event): 50,000
At the 95% confidence interval this keyword as a percentage of all organic keywords is
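The two scenarios can be explored with a short Python sketch. It uses the standard normal approximation for a proportion, which may differ from the exact method behind the original illustration. The function name and its parameters are hypothetical, and we assume that the “Event” count refers to visits among those with a known keyword.

```python
import math


def keyword_share_interval(total_visits, not_provided_share, keyword_visits, z=1.96):
    """Return (share, margin) for one keyword among visits with a known keyword.

    Assumption: keyword_visits is counted among known-keyword visits only,
    and z=1.96 corresponds to a 95% confidence level (normal approximation).
    """
    known_visits = total_visits * (1.0 - not_provided_share)
    share = keyword_visits / known_visits
    std_error = math.sqrt(share * (1.0 - share) / known_visits)
    return share, z * std_error


# (total organic visits, share not provided, visits for the keyword)
scenarios = {
    "Scenario 1": (10_000, 0.0, 1_000),
    "Scenario 2": (1_000_000, 0.5, 50_000),
}

for name, args in scenarios.items():
    share, margin = keyword_share_interval(*args)
    print(f"{name}: {share:.2%} +/- {margin:.3%}")
```

Under these assumptions both scenarios give the keyword a 10% share, but Scenario 2 has the narrower margin because it still has 500,000 visits with known keywords, compared with 10,000 in Scenario 1.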
So how does this apply to your keyword data? The above example shows that even when a large volume of keyword data is “not provided”, we may still have a statistically significant data set that supports analysis to the same degree as having all our keywords. The spreadsheet tool shown below will help you to understand the statistical significance of your keyword data.
Use the spreadsheet tool below with your own keyword data by following these seven steps.
You can now use the calculator “phrases” in cells L3 and L4 in conjunction with the values in columns “L” or “M” for a specific keyword row to estimate the total number of organic keywords, including those hidden in the (not provided) keyword set!
If you want to analyze a group of keywords that match a certain pattern (e.g. all keywords that contain “cat”), then use the inline filter in Google Analytics (the box just above arrow #5 in the above image) to export only this data, as described above (up to 500 rows). As before, you can use the calculator cells L3 and L4 in conjunction with the values at the bottom of the calculator (cells L510 or M510) to estimate the total number of organic keywords (e.g. that contain “cat”), including those hidden in the (not provided) keyword set. Here we assume that there is no correlation between the keywords (an assumption that makes the math easier, but one that is still important to keep in mind).
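The idea of extrapolating a keyword group into the (not provided) set can be sketched in Python. This is not the spreadsheet's actual logic, only an illustration under the same assumption that hidden keywords follow the same distribution as known ones. The function name, parameters, and sample numbers are all hypothetical.

```python
import math


def estimate_group_visits(keyword_visits, pattern, total_organic_visits,
                          not_provided_visits, z=1.96):
    """Estimate total visits for keywords containing `pattern`,
    including those hidden in the (not provided) set.

    Returns (estimate, low, high), where low/high come from a 95%
    normal-approximation interval on the group's share.
    """
    known_visits = total_organic_visits - not_provided_visits
    if known_visits <= 0:
        raise ValueError("need at least one visit with a known keyword")
    # Visits for known keywords that match the pattern (case-insensitive).
    matched = sum(v for kw, v in keyword_visits.items() if pattern in kw.lower())
    share = matched / known_visits
    margin = z * math.sqrt(share * (1.0 - share) / known_visits)
    low = max(0.0, share - margin) * total_organic_visits
    high = min(1.0, share + margin) * total_organic_visits
    return share * total_organic_visits, low, high


# Hypothetical export: only a few of the known keywords are listed here.
exported = {"cat food": 300, "black cat toys": 200, "dog leash": 700}
estimate, low, high = estimate_group_visits(exported, "cat", 20_000, 10_000)
print(f"estimated visits: {estimate:.0f} (95% range {low:.0f} to {high:.0f})")
```

The design choice is to scale the group's share of known-keyword visits up to all organic visits. If logged-in or mobile visitors search differently, as noted earlier, the estimate will be biased.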
Whenever people use “secure search” with Google, the keyword is (not provided). What we are seeing now is that more and more searches are defaulting to secure search which increases the number of keyword (not provided) instances…
Some search engines do not pass the keyword, and so their traffic appears under “referrals” in Google Analytics, instead of under “search engines”. This is easy to fix by updating your Google Analytics page tag to include the _addOrganic function (the introduction of analytics.js will move this functionality into the Google Analytics administration interface, and out of the “on page” JavaScript). (https://developers.google.com/analytics/devguides/collection/gajs/methods/gaJSApiSearchEngines#_gat.GA_Tracker_._addIgnoredOrganic)
The big news: Apple’s iOS 6 and Android 4+ no longer provide the referrer at all, so these organic visits now show up in Google Analytics as direct traffic (and you thought the keyword “not provided” issue was bad!). With iOS 6 and Android 4+ sending a null value in the document.referrer field for organic searches, coupled with the growing number of these mobile devices (the mobile Internet is expected to be three times the size of the traditional PC/laptop Internet), we are moving into a dark period in analytics, where not only is the keyword (not provided), but the medium itself is no longer known for a growing number of organic visits that are now labelled “direct”…
This is ironic because Google has invested heavily in multi-channel funnels and attribution modelling for Google Analytics Premium clients. As we lose the medium data, this modelling capability becomes less accurate and less valuable. It is conceivable that a large brand might be several million dollars off in terms of evaluating the contribution of organic traffic to its ecommerce and goal conversions. Of course paid search and display advertising should still track faithfully in Google Analytics, and perhaps this is the main concern for advertisers and Google.
Social media traffic also has this “Medium (not provided)” problem: links without utm tags that are posted to Facebook, Twitter, and other social networks are then shared via IM, email, apps etc., which results in a larger volume of traffic that is labelled “direct” in Google Analytics. The real lesson here is that as more of our data becomes “lost”, the importance of campaign tagging grows!
Read more about the so-called “dark social” phenomenon here:
https://www.theatlantic.com/technology/archive/2012/10/dark-social-we-have-the-whole-history-of-the-web-wrong/263523/
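Campaign tagging, as mentioned above, can be sketched in Python using the standard utm_source, utm_medium, and utm_campaign query parameters. The helper name, the example URL, and the campaign values are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_campaign_tags(url, source, medium, campaign):
    """Return `url` with utm_* campaign parameters appended to its query string."""
    tags = {
        "utm_source": source,
        "utm_medium": medium,
        "utm_campaign": campaign,
    }
    parts = urlsplit(url)
    # Keep existing parameters, but drop any old utm_* values we are replacing.
    pairs = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k not in tags]
    pairs.extend(tags.items())
    return urlunsplit(parts._replace(query=urlencode(pairs)))


tagged = add_campaign_tags(
    "https://www.example.com/blog/post?id=7", "twitter", "social", "spring_launch"
)
print(tagged)
```

A link tagged this way carries its source and medium in the URL itself, so the visit can still be attributed when the referrer is missing.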
Inflating the proportion of your online traffic categorized as “direct” also has secondary effects:
As Web Analysts and SEOs we need to innovate in the way we do analysis as the data landscape changes. With the keyword (not provided) issue we lost some data we were used to having, but this post shows that we have other methods. It has been noted that those who lose their sense of sight, for example, get better at using their other senses to return to a high functioning state. Adaptability is resilience.
In all our reporting it is critical to provide a clear qualification of how good the data actually is, and what assumptions we are making in our analysis. Solid statistical analysis will allow us to use confidence intervals to improve our “sensory perception” of the data environment. This approach helps decision makers improve the quality of their understanding and leads to better informed decisions that are more likely to be “on target”. This in turn allows our clients to realize greater value from the insights and recommendations we provide.