GMP Hub

Dynamic Event Precedence in New Google Analytics (GA4) with BigQuery and Data Studio

One of the questions that has arisen most frequently in my years of training and consulting for Google Analytics has been this:

What are users doing before they convert?

Or variations such as:

What’s the sequence of behaviors?

What’s happening before what?

It is of course critical that we, as analysts striving to understand and optimize the experience of our end users and to maximize value for our organizations, can answer these questions.


As a happy coincidence with the publication of this blog post, the sample export of New Google Analytics data to BigQuery for the Google Merchandise Store became available just yesterday. Data Studio dashboards based on this dataset appear below.

Behavioral Attribution/Precedence

I sometimes consider these to be questions of behavioral attribution.  While attribution in digital analytics normally refers to traffic sources, behavioral attribution could be considered as the quest to understand which behaviors are supporting other behaviors. It relates more closely to content attribution as methodology for determining which types of digital content are supporting conversions.

Attribution as a term, however, is quite fraught, since chronology does not necessarily indicate causality, so let’s back away a bit and say that we’re trying to solve for objective, observable behavioral sequencing or precedence.

Behavioral Precedence for New Google Analytics

This blog post considers behavioral precedence with New Google Analytics (i.e., Google Analytics 4) data.  Specifically, we’ll generate table-formatted (as opposed to graphical) reporting that demonstrates events and pageviews preceding a target event, whether or not you have designated the target event as a conversion in your Google Analytics configuration.

The solution will take advantage of the following key capabilities:

First, let’s briefly consider behavioral precedence in Google Universal Analytics.

Google Universal Analytics

Google Universal Analytics provides a variety of reporting for behavioral precedence.

Flows and Funnels

Google Universal Analytics reports for understanding the sequence of behaviors include Behavior Flow and Events Flow, Funnel Visualization and Goal flow if you have configured goal funnels, and Ecommerce Shopping Behavior and Ecommerce Checkout Behavior if you are tracking the corresponding ecommerce actions. Google Universal Analytics 360 also allows you to configure Custom Funnels and Funnel Analyses.

These reports serve to illustrate specific behavioral sequences. The funnel reports are based on steps that you configure. The flow reports, while more open-ended and exploratory, also tend to be more useful when you drill down into specific behavioral paths.

Google Universal Analytics provides many visualizations that illustrate the sequence of user actions.

The Reverse Goal Path report, for its part, provides insights into pages viewed before conversion.

The strength of these reports is perhaps also their limitation.  While they allow you to map specific flows and, in some cases, isolate drop-off points, they don’t provide an at-a-glance overview of event precedence expressed numerically.

Page Value

The Page Value metric in Google Universal Analytics does serve as a numeric indicator of behavioral precedence.

Page Value in Google Universal Analytics can indicate the support that pages are providing to conversions, or at least precedence.

While perhaps seeming complex at first, Page Value proposes a simple and quite elegant formula for calculating which pages are providing the greatest conversion benefit:

Page Value thus demonstrates, as a metric, the degree of support that pages are providing to conversions, or at least the degree to which the pages are preceding conversions.  Low page value indicates that the page is viewed repeatedly but conversions don’t follow, which may mean that the page is a dud.  High page value, conversely, likely indicates a winner.

Page Value does have some limitations:

  • It’s session-scope, so a page viewed in session A doesn’t get a Page Value boost for a conversion in a later session by the same user.
  • If you don’t have a value assigned to your goals, the goals aren’t considered in the Page Value calculation.
  • There is no equivalent of Page Value for events. (Event Value is unrelated.)
  • If you have defined multiple conversion types with Revenue or Goal Value enabled in the same Google Universal Analytics view, Page Value will be useful in indicating overall conversion support, but isolating Page Value relative to Ecommerce only or a specific goal will be challenging in the reporting UI.

The solution outlined below for event precedence in New Google Analytics builds upon the formula for Page Value in New Google Analytics.

Event Precedence in New Google Analytics (GA4)

The New Google Analytics also offers a variety of very beneficial funnel and pathing reports, but we can take advantage of the event-level timestamps stored in BigQuery to create some of our own sequencing calculations and visualizations.

Let’s check out one of the Data Studio dashboards and circle back to the logic and setup.

 

SQL Parameters – Test them Out!

You can dynamically pass SQL parameters to the query underlying the dashboard above by entering a different Event Name as the Target Event, changing the Lookback, or selecting a different date range.

Note that you can’t populate a Data Studio filter control through a query; you can either specify each value selection when you define the SQL parameter in the data source, or you can leave the filter open-ended, as is the case here.

Observations, Insights, and Actions

One of the main questions that I was first trying to answer with the dataset was:

Which other events are preceding the sign_up event?

Viewing this dashboard, I can make a couple of quick observations:

  • A significant number of users are trying to log in before signing up.  Since this represents some friction, it would probably make sense to try to improve the user experience.
  • The view_search_results event showed a high PrecedencePercent, but there were very few occurrences of the event overall.  A recommendation would be to make sure that the search function is easily accessible and usable, make UI updates as warranted, and continue to analyze the potential impact of the search function on signups.

These types of observations can help us to generate hypotheses for usability evaluations and A/B testing so we can correlate behaviors with even greater confidence and take steps for optimization.

On the implementation end, it would be better to break unsuccessful logins out as a separate event.  Now that we can observe event sequences more easily, we can identify opportunities to improve the data capture itself.

Steps in BigQuery

While this solution does not require the query to actually reside within BigQuery, it’s much easier to work the query out in the BigQuery UI and then copy and parameterize in Data Studio.

–Need CTE to calculate EventPrecedence, since you can’t do calculation directly on the aliases
WITH
 calc_wrapper AS(
 WITH
   target_event_timestamps AS (
   SELECT
     event_timestamp,
     user_pseudo_id
   FROM
     `tribal-bonito-298022.analytics_257380306.events_*`
   WHERE
     _TABLE_SUFFIX BETWEEN ‘20210101’
     AND FORMAT_DATETIME(“%Y%m%d”,
       CURRENT_DATE())
     AND event_name = ‘sign_up’),
   distinct_event_user AS (
   SELECT
     DISTINCT event_name,
     user_pseudo_id
   FROM
     `tribal-bonito-298022.analytics_257380306.events_*`
   WHERE
     _TABLE_SUFFIX BETWEEN ‘20210101’
     AND FORMAT_DATETIME(“%Y%m%d”,
       CURRENT_DATE()) )
 SELECT
   event_name AS EventName,
   COUNT(*) AS EventsBeforeSignup,
   –subquery for total events by event name, deduped by user
   (
   SELECT
     COUNT(*)
   FROM
     distinct_event_user d
   WHERE
     t2.event_name = d.event_name) AS TotalEvents
     –end subquery
 FROM (
     –nested query for events before signup by event name, deduped by user
   SELECT
     DISTINCT event_name,
     t.user_pseudo_id
   FROM
     `tribal-bonito-298022.analytics_257380306.events_*` t
   JOIN
     target_event_timestamps s
     –only returning events that occurred less than 30 minutes before signup
   ON
     t.user_pseudo_id = s.user_pseudo_id
     AND t.event_name != ‘sign_up’
     AND s.event_timestamp > t.event_timestamp
     AND s.event_timestamp – t.event_timestamp < 30 * (1000000 * 60)
   WHERE
     _TABLE_SUFFIX BETWEEN ‘20210101’
     AND FORMAT_DATETIME(“%Y%m%d”,
       CURRENT_DATE())) AS t2
 GROUP BY
   event_name)
SELECT
 EventName,
 EventsBeforeSignup,
 TotalEvents,
 ROUND(EventsBeforeSignup/TotalEvents,4) AS PrecedencePercent
FROM
 calc_wrapper
ORDER BY
  EventName

This query is designed to generate a Precedence Percent calculation:

Note that this query is based on users rather than sessions.  The lookback window is variable and can encompass one or more sessions.

Timestamps of Batched GA4 Events in BigQuery

The timestamps associated with Google Analytics 4 events exported to BigQuery are based on the batch in which the events were dispatched rather than the more specific times at which the events occurred. The queries in this blog post may thus not be as inclusive as if the timestamps were based on the actual time of the event occurrence. Each typical batch seems to include 1-4 events, all appearing with the same timestamp in BigQuery.
Since some degree of precedence remains obscured, the queries in this blog post indicate event precedence only across different batches.  In reality, there is certainly some additional precedence that cannot be queried.
Learn more about Google Analytics 4 event batching in this help article.

Steps in Data Studio

To visualize the results as a heatmap and to allow dynamic SQL parameters for target event, lookback, and data selection, I took the following steps in Data Studio:

  1. Add the query as a BigQuery > Custom Query data source.
  2. Enable date range SQL parameters in the data source.
  3. Add @event_name and @lookback as SQL parameters.
  4. Save the data source and add it to the following dashboard elements:
    1. heatmap table
    2. input control for target event (specify @target_event as the Control Field)
    3. input control for lookback (specify @lookback as the Control Field)

In a BigQuery custom query data source within Data Studio, you can enable date range parameters and define additional parameters to pass dynamically to BigQuery.

For the SQL parameters to function, you need to replace the corresponding hard values in the query:

–Need CTE to calculate EventPrecedence, since you can’t do calculation directly on the aliases
WITH
 calc_wrapper AS(
 WITH
   –target_event_timestamps now corresponds to any target event selected
   target_event_timestamps AS (
   SELECT
     event_timestamp,
     user_pseudo_id
   FROM
     `tribal-bonito-298022.analytics_257380306.events_*`
   WHERE
      _TABLE_SUFFIX BETWEEN FORMAT_DATETIME(“%Y%m%d”, PARSE_DATE(“%Y%m%d”,@DS_START_DATE))
      AND FORMAT_DATETIME(“%Y%m%d”, PARSE_DATE(“%Y%m%d”,@DS_END_DATE))
      AND event_name = @event_name),
   distinct_event_user AS (
   SELECT
     DISTINCT event_name,
     user_pseudo_id
   FROM
     `tribal-bonito-298022.analytics_257380306.events_*`
   WHERE
      _TABLE_SUFFIX BETWEEN FORMAT_DATETIME(“%Y%m%d”, PARSE_DATE(“%Y%m%d”,@DS_START_DATE))
      AND FORMAT_DATETIME(“%Y%m%d”, PARSE_DATE(“%Y%m%d”,@DS_END_DATE)) )
 SELECT
   event_name AS EventName,
   COUNT(*) AS EventsBeforeSignup,
   –subquery for total events by event name, deduped by user
   (
   SELECT
     COUNT(*)
   FROM
     distinct_event_user d
   WHERE
     t2.event_name = d.event_name) AS TotalEvents
     –end subquery
 FROM (
     –nested query for events before signup by event name, deduped by user
   SELECT
     DISTINCT event_name,
     t.user_pseudo_id
   FROM
     `tribal-bonito-298022.analytics_257380306.events_*` t
   JOIN
     target_event_timestamps s
     –only returning events that occurred less than x minutes before signup
   ON
     t.user_pseudo_id = s.user_pseudo_id
      AND t.event_name != @event_name
     AND s.event_timestamp > t.event_timestamp
      AND TIMESTAMP_DIFF(DATETIME(TIMESTAMP_MICROS(s.event_timestamp)), DATETIME(TIMESTAMP_MICROS(t.event_timestamp)),MINUTE)<@lookback
   WHERE
      _TABLE_SUFFIX BETWEEN FORMAT_DATETIME(“%Y%m%d”, PARSE_DATE(“%Y%m%d”, @DS_START_DATE))
      AND FORMAT_DATETIME(“%Y%m%d”, PARSE_DATE(“%Y%m%d”, @DS_END_DATE))) AS t2
 GROUP BY
   event_name)
SELECT
 EventName,
 EventsBeforeSignup,
 TotalEvents,
 ROUND(EventsBeforeSignup/TotalEvents,4) AS PrecedencePercent
FROM
 calc_wrapper
ORDER BY
  EventName

SQL Parameter Naming in the Custom BigQuery Data Source

When you’re first creating a SQL parameter and specifying a Parameter Name, make it simple.  The name will be mirrored as the Parameter ID that you can use as the placeholder within your custom query.

Once you have Connected/Reconnected to the data source, you can provide a more descriptive or user-friendly Parameter Name to appear in an associated filter control, as illustrated by Target Event and Lookback (Minutes) in the dashboard embedded above, without changing the Parameter ID.

You can also specify a default value.

When you’re creating a SQL parameter, specify the ID (such as event_name) that you’re going to use within the query.  After you’ve connected to the data source, you can specify a more descriptive or user-friendly parameter name (such as Target Event) that will appear in a filter control for which you specify the parameter as the Control Field.

Pageview Precedence

As a variation of the event precedence example, the following example is based on pages viewed before a target event. (The pages that appear in the dashboard below reside on googleanalyticstest.com, so the page titles themselves are about Google Analytics.)

You can change the SQL parameter values as in the dashboard above to refresh the display.

 

Note that the query variation for pageview precedence required me to unnest page_title as a parameter associated with a page_view event.  If you’re used to querying flattened SQL tables and are not yet familiar with the unnesting process that’s required for many BigQuery queries of Google Analytics data, the extra steps may seem a bit unintuitive at first, but they are pretty easy to incorporate, and certainly essential, given the central role of parameters in the New Google Analytics.

–Need CTE to calculate EventPrecedence, since you can’t do calculation directly on the aliases
WITH
 calc_wrapper AS(
 WITH
   target_event_timestamps AS (
   SELECT
     event_timestamp,
     user_pseudo_id
   FROM
     `tribal-bonito-298022.analytics_257380306.events_*`
   WHERE
      _TABLE_SUFFIX BETWEEN FORMAT_DATETIME(“%Y%m%d”, PARSE_DATE(“%Y%m%d”,@DS_START_DATE))
      AND FORMAT_DATETIME(“%Y%m%d”, PARSE_DATE(“%Y%m%d”,@DS_END_DATE))
      AND event_name = @event_name),
   distinct_page_user AS (
   SELECT
     paramsDistinct.value.string_value AS Page,
     user_pseudo_id,
     MIN(event_timestamp) AS MinTimestamp
   FROM
     `tribal-bonito-298022.analytics_257380306.events_*`
   CROSS JOIN
     UNNEST (event_params) AS paramsDistinct
   WHERE
      _TABLE_SUFFIX BETWEEN FORMAT_DATETIME(“%Y%m%d”, PARSE_DATE(“%Y%m%d”,@DS_START_DATE))
      AND FORMAT_DATETIME(“%Y%m%d”, PARSE_DATE(“%Y%m%d”,@DS_END_DATE))
     AND event_name = ‘page_view’ AND paramsDistinct.key = ‘page_title’
   GROUP BY
     Page,
     user_pseudo_id)
 SELECT
   Page,
   COUNT(*) AS PageviewsBeforeSignup,
   –subquery for total events by event name, deduped by user
   (
   SELECT
     COUNT(*)
   FROM
     distinct_page_user d
   WHERE
     d2.Page = d.Page) AS TotalPageviews
   –end subquery
 FROM
   distinct_Page_User d2
 JOIN
   target_event_timestamps s
   –only returning events that occurred less than 30 minutes before signup
 ON
   d2.user_pseudo_id = s.user_pseudo_id
   AND s.event_timestamp > d2.MinTimestamp
    AND s.event_timestamp – d2.MinTimestamp < @lookback * (1000000 * 60)
 GROUP BY
   Page)
SELECT
 Page,
 PageviewsBeforeSignup,
 TotalPageviews,
 ROUND(PageviewsBeforeSignup/TotalPageviews,4) AS PrecedencePercent
FROM
 calc_wrapper
WHERE
 Page IS NOT NULL
AND
 TotalPageViews > 10
ORDER BY
  Page

BigQuery Export for New Google Analytics (GA4) Data

The event and pageview precedence dashboards below are nearly identical in logic and configuration to those above, but as their data source, they use the sample export of New Google Analytics data to BigQuery for the Google Merchandise Store.

Note that the available data range in the dataset is Nov. 1, 2020 through Jan. 31, 2021, as displayed in the dashboards

 

Do you note any data points in the dashboards that would prompt you to investigate further?

Different Approaches, Next Steps

Note that there is usually more than one way to approach a query and always more than one way to present the results.  My colleagues on the Data Science and Analysis & Insights teams at Cardinal Path may have approached this challenge from a different angle and drawn different kinds of insights, as may you and your teams.

I hope, in any case, that my own approach has provided some actionable tips for leveraging BigQuery and Data Studio as you analyze and visualize your data capture in New Google Analytics. 

Please feel free to reuse and adapt the resources below, and be sure to share your creations, learnings, and insights.

Resources

Event precedence query in BigQuery

Pageview precedence query in BigQuery

Event precedence dashboard in Data Studio

Pageview precedence dashboard in Data Studio

Event precedence dashboard in Data Studio – Google Merchandise Store

Pageview precedence dashboard in Data Studio – Google Merchandise Store

Data Studio Playbook

CP Marketing

Share
Published by
CP Marketing

Recent Posts

Optimizing user experiences with Digital Experience Analytics (DXA) platforms

As consumers become increasingly digitally savvy, and more and more brand touchpoints take place online,…

1 month ago

Enabling Value-Based Bidding with Google Tightlock

Marketers are on a constant journey to optimize the efficiency of paid search advertising. In…

1 month ago

Resolving “Unassigned” Traffic in GA4

Unassigned traffic in Google Analytics 4 (GA4) can be frustrating for data analysts to deal…

2 months ago

This website uses cookies.