Of the several dozen reports built into the Google Analytics navigation, some are more specialized and rarely used, while others – such as Landing Pages – remain one of the core reports accessed most regularly by the whole spectrum of Google Analytics users. Why is landing pages a perpetual favorite?
For these reasons, and perhaps others, the Landing Pages report remains a go-to resource.
Google Analytics has done a great job optimizing reports for web and mobile analytics. With Google exposing its internal technologies as a service, namely Dremel as BigQuery, everyone can build their own analytics platform, around this core component, and customize it as they wish.
Because Landing Pages is so central to Google Analytics, it may cause you some dismay when you link your GA property to BigQuery and start building the equivalent of a Landing Pages report, and find it not straightforward. You won’t even find the Landing Page dimension.
Google made it clear that Google Analytics data in BigQuery will not include all dimensions and metrics available in GA web interface and APIs.
Landing Page is a derived dimension and the best practice in big data world is to get data in “raw” format into the data lake first, then apply transformations. This provides better support for EDA, i.e. exploratory data analysis.
Never fear: we’ll be able to reproduce the Landing Pages report in BigQuery, with the following additional benefits:
Within E-Nor, we had discussions around the correct way to describe Google Analytics data in BigQuery: raw or processed.
The argument for using “raw” was based on the missing dimensions and metrics. In addition, that’s how data scientists call data as it enters the data lake.
Also tipping the scales towards “raw” is the fact that BigQuery data is completely unsampled. While applying a secondary dimension or a segment may cause sampling in the GA UI, no query in BigQuery will ever cause the skewing that sampling may cause in other environments.
The argument for using “processed” was based on hits being grouped into sessions in GA before exporting the data to BigQuery as well as Google Analytics view filters and settings being applied to hits in GA before the export, such as excluding internal traffic or consolidation of page variations.
Google doesn’t use the word “raw,” but many consultants and practitioners do.
With this discussion as background, let’s focus on producing the landing page report.
The following query will retrieve a simplified Landing Pages report:
First, replace the term XXX_VIEWID_YYY with your actual view id, which is also the BigQuery dataset. Second, change the date range as you see fit, then run the query.
SELECT LandingPage, COUNT(sessionId) AS Sessions, 100 * SUM(totals.bounces)/COUNT(sessionId) AS BounceRate, AVG(totals.pageviews) AS AvgPageviews, SUM(totals.timeOnSite)/COUNT(sessionId) AS AvgTimeOnSite, from( SELECT CONCAT(fullVisitorId,STRING(visitId)) AS sessionID, totals.bounces, totals.pageviews, totals.timeOnSite, hits.page.pagePath AS landingPage FROM ( SELECT fullVisitorId, visitId, MIN(hits.hitNumber) WITHIN RECORD AS firstHit, hits.hitNumber AS hitNumber, hits.type, hits.page.pagePath, totals.bounces, totals.pageviews, totals.timeOnSite, FROM (TABLE_DATE_RANGE ([XXX_VIEWID_YYY.ga_sessions_],TIMESTAMP('2016-08-01'), TIMESTAMP ('2016-08-31')))) WHERE hitNumber = firstHit AND hits.type = 'PAGE' AND hits.page.pagePath != '') GROUP BY LandingPage ORDER BY Sessions DESC, LandingPage
We can change the query a little bit and add a secondary dimension, say, source and medium. In GA, we have a limit of 2 dimensions in standard reports and 5 in custom reports, though in BigQuery, we have virtually no limits.
SELECT LandingPage, Source_Medium, COUNT(sessionId) AS Sessions, 100 * SUM(totals.bounces)/COUNT(sessionId) AS BounceRate, AVG(totals.pageviews) AS AvgPageviews, SUM(totals.timeOnSite)/COUNT(sessionId) AS AvgTimeOnSite, from( SELECT CONCAT(fullVisitorId,STRING(visitId)) AS sessionID, Source_Medium, totals.bounces, totals.pageviews, totals.timeOnSite, hits.page.pagePath AS landingPage FROM ( SELECT fullVisitorId, visitId, trafficSource.source +'/'+ trafficSource.medium as Source_Medium, MIN(hits.hitNumber) WITHIN RECORD AS firstHit, hits.hitNumber AS hitNumber, hits.type, hits.page.pagePath, totals.bounces, totals.pageviews, totals.timeOnSite, FROM (TABLE_DATE_RANGE ([XXX_VIEWID_YYY.ga_sessions_],TIMESTAMP('2016-08-01'), TIMESTAMP ('2016-08-31')))) WHERE hitNumber = firstHit AND hits.type = 'PAGE' AND hits.page.pagePath != '') GROUP BY LandingPage, Source_Medium ORDER BY Sessions DESC, LandingPage
Finally, here is something that you can only do in BigQuery and not in GA. In BigQuery you have better granularity than GA and you can examine individual hits.
What if you get a lot of landing pages with value = ‘(not set)’?
BigQuery is better suited to examine the cause of this unexpected behavior. Here is the query to catch the hits that may be sent before the first pageview hits.
SELECT * FROM ( SELECT fullVisitorId, visitId, MIN(hits.hitNumber) WITHIN RECORD AS firstHit, hits.hitNumber AS hitNumber, hits.type, hits.isInteraction, hits.eventInfo.eventCategory, hits.eventInfo.eventAction, hits.eventInfo.eventLabel, hits.eventInfo.eventValue, hits.page.pagePath, FROM (TABLE_DATE_RANGE ([XXX_VIEWID_YYY.ga_sessions_],TIMESTAMP('2016-05-01'), TIMESTAMP ('2016-05-31')))) WHERE hitNumber = firstHit AND hits.type != 'PAGE' AND hits.isInteraction = TRUE
This hit-level analysis in BigQuery solves the (not set) issue that can in some instances appear in the Google Analytics UI or API.
As you will notice from the query, we are looking for the first hit in a session that is not a PAGEVIEW. We also needed to make sure it is not a non-interactive hit, since non-interactive EVENT hits don’t corrupt the landing page dimension.
In an investigation for a client here at E-Nor, we found out that many first hits were video EVENTS. We deduced the user was watching a video and no interaction was happening on the site until the GA session timed out. With the next hit, which was a video EVENT, a new session was started and, naturally, without a landing page.
In another case, we found a custom script that was firing EVENT hits before the pageview, causing the session to lose their landing pages.
But in exchange, we got a lot of flexibility, power, and access to very granular data we didn’t have in Google Analytics.
Each platform has its particular strengths, we can take advantage of both. As with many things in life, it all starts with choosing the right tool for the job.
As consumers become increasingly digitally savvy, and more and more brand touchpoints take place online,…
Marketers are on a constant journey to optimize the efficiency of paid search advertising. In…
Unassigned traffic in Google Analytics 4 (GA4) can be frustrating for data analysts to deal…
This website uses cookies.