I am really fascinated by BigQuery. The ease and speed with which it handles very large amounts of data are just amazing. In this post, I’ll highlight the operational benefit of BigQuery over on-premise Big Data platforms. We’ll then begin filling our data lake with Google Analytics 360 data as just one of the many data sets for storage in BigQuery, as a first step toward integration, visualization, and advanced reporting.
As an editorial note, Google Analytics Premium is now called Analytics 360 within the Google Analytics 360 Suite. We’ll alternatively refer to it as Analytics 360 or Google Analytics as the context warrants.
As a Big Data platform, BigQuery offers benefits for multiple stages and roles in the Big Data process:
For marketers and analysts, you can run ad hoc queries and get the results within minutes or seconds. The elusive quest for understanding online and offline attribution, user funnels, and long-term customer value comes within reach.
For data engineers, BigQuery offers a tremendous operational benefit, as outlined in the next section.
With BigQuery, there’s no need to worry about all the infrastructure preparation and the dedicated resources that on-premise solutions would require. In contrast to on-premise platforms such as Hadoop and Spark, BigQuery is a truly no-ops cloud-based solution: it requires no resources for maintaining hardware infrastructure or applying software updates, and it automatically adds and removes servers according to usage.
The no-ops benefit of BigQuery by itself makes BigQuery a great choice as a proof of concept for your Big Data platform that can likely evolve into a solution for all your long-term Big Data needs.
For E-Nor, the export of Google Analytics data into BigQuery was a game changer.
Google Analytics was already supporting imported data from other sources, but the inverse approach, exporting Google Analytics data into BigQuery, offers great advantages.
Once you have incorporated all your enterprise data in one repository and you begin using unified tools for exploratory data analysis, the correlations and the insights begin to emerge. If you have a business that could be affected by weather, you could integrate publicly available meteorological data sets with your Google Analytics data sets to correlate weather with purchases and thereby identify the most opportune times to activate marketing campaigns and special offers.
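As a sketch of that kind of integration, the legacy-SQL query below joins a month of exported sessions against a hypothetical weather table (both the mydataset.daily_weather table and its avg_temp_f field are assumptions for illustration, not part of the export). The export’s date field is a YYYYMMDD string, so the weather table would need matching keys:

```sql
SELECT
  weather.date AS date,
  weather.avg_temp_f AS avg_temp_f,
  SUM(ga.totals.transactions) AS transactions
FROM (TABLE_DATE_RANGE([XXXYYYZZZ.ga_sessions_],
      TIMESTAMP('2016-05-01'), TIMESTAMP('2016-05-31'))) AS ga
JOIN [mydataset.daily_weather] AS weather
  ON ga.date = weather.date
GROUP BY date, avg_temp_f
ORDER BY date
```

Plotting transactions against temperature by date would then show whether purchase volume moves with the weather for your business.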
BigQuery is not an add-on to Analytics 360; it’s a separate Google product. More accurately, it’s a component of Google Cloud Platform (GCP). That’s why we start by creating a GCP project and enabling the BigQuery API and billing. BigQuery uses a different billing model from Analytics 360. All GCP components follow the pay-as-you-go model: your consumption is measured and billed at the end of every month. BigQuery costs depend on three factors: storage, queries, and streaming inserts (direct insertion of data into BigQuery).
Google has provided a great tool to estimate and plan your usage and cost: https://cloud.google.com/products/calculator/
As an Analytics 360 user, you get $500 credit per month to use against BigQuery usage. Since BigQuery storage is priced so competitively, this $500 can accommodate a great deal of your enterprise data.
Before we can analyze Google Analytics data in BigQuery, we must get the data into the platform. Fortunately, Analytics 360 provides a very straightforward set-it-and-forget-it nightly export from Google Analytics to BigQuery. (Linking between standard Analytics and BigQuery is not supported.) As BigQuery adoption grows in the analytics community, we expect more Google and non-Google products will support linking to BigQuery. Google’s Firebase Analytics for mobile app measurement, as one example, has already taken this step of automated export to BigQuery.
For a step-by-step guide to configuring the Analytics 360 export to BigQuery, see: https://support.google.com/analytics/answer/3437618
Once you have completed the linking procedure, you should start seeing Google Analytics data in your BigQuery project within 24 hours. Google will also provide a historical export of analytics data. As of the publication of this post, the historical export is limited to the lesser of 10 billion hits or 13 months of data. The historical export happens in parallel with the daily export.
Once you start seeing GA data in BigQuery, you will notice that it is the raw analytics data, at the hit level (that is, at pageview, event, and e-commerce transaction level rather than session or user level).
This granularity is a big advantage if you’re investigating an issue or if you want to build a report that’s totally different than Google’s standard or custom reports.
On the flip side, it also means you have to write queries to calculate some session- and user-scope metrics that are available directly in the Google Analytics interface.
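For instance, a sessions-per-user figure that the interface gives you for free has to be computed from the raw export. A minimal legacy-SQL sketch over one day’s export table (replace XXXYYYZZZ with your own dataset ID, and the date suffix with a day you have data for):

```sql
SELECT
  fullVisitorId,
  COUNT(DISTINCT visitId) AS sessions
FROM [XXXYYYZZZ.ga_sessions_20160501]
GROUP BY fullVisitorId
ORDER BY sessions DESC
LIMIT 10
```

The same pattern, aggregating hit-level rows up to the session or user scope, underlies most of the standard Google Analytics metrics you would rebuild in BigQuery.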
As an example of issue investigation, BigQuery allowed us to quickly get to the bottom of an issue we observed in a particular Google Analytics property: many sessions with (not set) as the Landing Page. Experimentation with a few queries revealed that video events were being sent to Google Analytics after the session had timed out, without a new pageview, because of long-duration videos and the session reset at midnight.
In addition to the benefits outlined in previous sections, another reason for exporting Analytics 360 data to BigQuery is to avoid sampling altogether. All hits will be grouped into sessions and exported to BigQuery, so you can be confident that all analysis and visualizations, for any combination of dimensions and metrics, will be based on your complete, unsampled Google Analytics data set.
First, the schema is always evolving and improving, with new fields being added and old fields being deprecated, so be sure to bookmark the link below and keep checking the schema for updates.
Here is a reference to the schema for exported Google Analytics data within BigQuery: https://support.google.com/analytics/answer/3437719
Second, we should note the differences between the data in the Google Analytics interface and in BigQuery: some of these differences are minor, such as the hit type being called PAGE in BigQuery rather than pageview as in Google Analytics, while others are bigger than expected, as described below.
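To see which hit types a property actually records, you can group the raw hits by type. This legacy-SQL sketch assumes one day’s export table (XXXYYYZZZ is your dataset ID); legacy SQL implicitly flattens the repeated hits record here:

```sql
SELECT
  hits.type AS hit_type,
  COUNT(*) AS hit_count
FROM [XXXYYYZZZ.ga_sessions_20160501]
GROUP BY hit_type
ORDER BY hit_count DESC
```

You should see values such as PAGE, EVENT, and TRANSACTION rather than the names used in the Google Analytics interface.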
It may be helpful to start by highlighting some key differences between BigQuery and traditional databases, and why some core concepts and best practices apply to RDBMSs but not to BigQuery:
SELECT
  CONCAT(trafficSource.source, ' / ', trafficSource.medium) AS source_medium,
  COUNT(DISTINCT CONCAT(fullVisitorId, STRING(visitId)), 100000) AS sessions,
  SUM(totals.bounces) AS bounces,
  100 * SUM(totals.bounces) / COUNT(DISTINCT CONCAT(fullVisitorId, STRING(visitId)), 100000) AS bounce_rate
FROM (TABLE_DATE_RANGE([XXXYYYZZZ.ga_sessions_],
      TIMESTAMP('2016-05-01'), TIMESTAMP('2016-05-31')))
GROUP BY source_medium
ORDER BY sessions DESC
In the query above, replace XXXYYYZZZ with the ID of the Google Analytics view that you are exporting to BigQuery.
A few other points to note about the query syntax:
Here are two useful references for the query language:
What results does the query return?
In this post, we’ve only dipped our toe into the BigQuery data lake. We’ve set up a BigQuery project, configured the export of Analytics 360 data to BigQuery, reviewed the Google Analytics schema within BigQuery, and run a basic query on our data.
There are so many other ways to enjoy the BigQuery data lake. In our next posts, we’ll go in up to our knees with more advanced analysis of Google Analytics content data within BigQuery, and then we’ll begin filling up the data lake with additional data sets and really start splashing around.