One of the questions that has arisen most frequently in my years of training and consulting for Google Analytics has been this:
What are users doing before they convert?
Or variations such as:
What’s the sequence of behaviors?
What’s happening before what?
It is of course critical that we, as analysts striving to understand and optimize the experience of our end users and to maximize value for our organizations, can answer these questions.
As a happy coincidence with the publication of this blog post, the sample export of New Google Analytics data to BigQuery for the Google Merchandise Store became available just yesterday. Data Studio dashboards based on this dataset appear below.
Behavioral Attribution/Precedence
I sometimes consider these to be questions of behavioral attribution. While attribution in digital analytics normally refers to traffic sources, behavioral attribution could be considered as the quest to understand which behaviors are supporting other behaviors. It relates more closely to content attribution as methodology for determining which types of digital content are supporting conversions.
Attribution as a term, however, is quite fraught, since chronology does not necessarily indicate causality, so let’s back away a bit and say that we’re trying to solve for objective, observable behavioral sequencing or precedence.
This blog post considers behavioral precedence with New Google Analytics (i.e., Google Analytics 4) data. Specifically, we’ll generate table-formatted (as opposed to graphical) reporting that demonstrates events and pageviews preceding a target event, whether or not you have designated the target event as a conversion in your Google Analytics configuration.
The solution will take advantage of the following key capabilities:
First, let’s briefly consider behavioral precedence in Google Universal Analytics.
Google Universal Analytics provides a variety of reporting for behavioral precedence.
Google Universal Analytics reports for understanding the sequence of behaviors include Behavior Flow and Events Flow, Funnel Visualization and Goal flow if you have configured goal funnels, and Ecommerce Shopping Behavior and Ecommerce Checkout Behavior if you are tracking the corresponding ecommerce actions. Google Universal Analytics 360 also allows you to configure Custom Funnels and Funnel Analyses.
These reports serve to illustrate specific behavioral sequences. The funnel reports are based on steps that you configure. The flow reports, while more open-ended and exploratory, also tend to be more useful when you drill down into specific behavioral paths.
Google Universal Analytics provides many visualizations that illustrate the sequence of user actions.
The Reverse Goal Path report, for its part, provides insights into pages viewed before conversion.
The strength of these reports is perhaps also their limitation. While they allow you to map specific flows and, in some cases, isolate drop-off points, they don’t provide an at-a-glance overview of event precedence expressed numerically.
The Page Value metric in Google Universal Analytics does serve as a numeric indicator of behavioral precedence.
Page Value in Google Universal Analytics can indicate the support that pages are providing to conversions, or at least precedence.
While perhaps seeming complex at first, Page Value proposes a simple and quite elegant formula for calculating which pages are providing the greatest conversion benefit:
Page Value thus demonstrates, as a metric, the degree of support that pages are providing to conversions, or at least the degree to which the pages are preceding conversions. Low page value indicates that the page is viewed repeatedly but conversions don’t follow, which may mean that the page is a dud. High page value, conversely, likely indicates a winner.
Page Value does have some limitations:
The solution outlined below for event precedence in New Google Analytics builds upon the formula for Page Value in New Google Analytics.
The New Google Analytics also offers a variety of very beneficial funnel and pathing reports, but we can take advantage of the event-level timestamps stored in BigQuery to create some of our own sequencing calculations and visualizations.
Let’s check out one of the Data Studio dashboards and circle back to the logic and setup.
You can dynamically pass SQL parameters to the query underlying the dashboard above by entering a different Event Name as the Target Event, changing the Lookback, or selecting a different date range.
Note that you can’t populate a Data Studio filter control through a query; you can either specify each value selection when you define the SQL parameter in the data source, or you can leave the filter open-ended, as is the case here.
One of the main questions that I was first trying to answer with the dataset was:
Which other events are preceding the sign_up event?
Viewing this dashboard, I can make a couple of quick observations:
These types of observations can help us to generate hypotheses for usability evaluations and A/B testing so we can correlate behaviors with even greater confidence and take steps for optimization.
On the implementation end, it would be better to break unsuccessful logins out as a separate event. Now that we can observe event sequences more easily, we can identify opportunities to improve the data capture itself.
While this solution does not require the query to actually reside within BigQuery, it’s much easier to work the query out in the BigQuery UI and then copy and parameterize in Data Studio.
–Need CTE to calculate EventPrecedence, since you can’t do calculation directly on the aliases
WITH
calc_wrapper AS(
WITH
target_event_timestamps AS (
SELECT
event_timestamp,
user_pseudo_id
FROM
`tribal-bonito-298022.analytics_257380306.events_*`
WHERE
_TABLE_SUFFIX BETWEEN ‘20210101’
AND FORMAT_DATETIME(“%Y%m%d”,
CURRENT_DATE())
AND event_name = ‘sign_up’),
distinct_event_user AS (
SELECT
DISTINCT event_name,
user_pseudo_id
FROM
`tribal-bonito-298022.analytics_257380306.events_*`
WHERE
_TABLE_SUFFIX BETWEEN ‘20210101’
AND FORMAT_DATETIME(“%Y%m%d”,
CURRENT_DATE()) )
SELECT
event_name AS EventName,
COUNT(*) AS EventsBeforeSignup,
–subquery for total events by event name, deduped by user
(
SELECT
COUNT(*)
FROM
distinct_event_user d
WHERE
t2.event_name = d.event_name) AS TotalEvents
–end subquery
FROM (
–nested query for events before signup by event name, deduped by user
SELECT
DISTINCT event_name,
t.user_pseudo_id
FROM
`tribal-bonito-298022.analytics_257380306.events_*` t
JOIN
target_event_timestamps s
–only returning events that occurred less than 30 minutes before signup
ON
t.user_pseudo_id = s.user_pseudo_id
AND t.event_name != ‘sign_up’
AND s.event_timestamp > t.event_timestamp
AND s.event_timestamp – t.event_timestamp < 30 * (1000000 * 60)
WHERE
_TABLE_SUFFIX BETWEEN ‘20210101’
AND FORMAT_DATETIME(“%Y%m%d”,
CURRENT_DATE())) AS t2
GROUP BY
event_name)
SELECT
EventName,
EventsBeforeSignup,
TotalEvents,
ROUND(EventsBeforeSignup/TotalEvents,4) AS PrecedencePercent
FROM
calc_wrapper
ORDER BY
EventName
This query is designed to generate a Precedence Percent calculation:
Note that this query is based on users rather than sessions. The lookback window is variable and can encompass one or more sessions.
To visualize the results as a heatmap and to allow dynamic SQL parameters for target event, lookback, and data selection, I took the following steps in Data Studio:
In a BigQuery custom query data source within Data Studio, you can enable date range parameters and define additional parameters to pass dynamically to BigQuery.
For the SQL parameters to function, you need to replace the corresponding hard values in the query:
–Need CTE to calculate EventPrecedence, since you can’t do calculation directly on the aliases
WITH
calc_wrapper AS(
WITH
–target_event_timestamps now corresponds to any target event selected
target_event_timestamps AS (
SELECT
event_timestamp,
user_pseudo_id
FROM
`tribal-bonito-298022.analytics_257380306.events_*`
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATETIME(“%Y%m%d”, PARSE_DATE(“%Y%m%d”,@DS_START_DATE))
AND FORMAT_DATETIME(“%Y%m%d”, PARSE_DATE(“%Y%m%d”,@DS_END_DATE))
AND event_name = @event_name),
distinct_event_user AS (
SELECT
DISTINCT event_name,
user_pseudo_id
FROM
`tribal-bonito-298022.analytics_257380306.events_*`
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATETIME(“%Y%m%d”, PARSE_DATE(“%Y%m%d”,@DS_START_DATE))
AND FORMAT_DATETIME(“%Y%m%d”, PARSE_DATE(“%Y%m%d”,@DS_END_DATE)) )
SELECT
event_name AS EventName,
COUNT(*) AS EventsBeforeSignup,
–subquery for total events by event name, deduped by user
(
SELECT
COUNT(*)
FROM
distinct_event_user d
WHERE
t2.event_name = d.event_name) AS TotalEvents
–end subquery
FROM (
–nested query for events before signup by event name, deduped by user
SELECT
DISTINCT event_name,
t.user_pseudo_id
FROM
`tribal-bonito-298022.analytics_257380306.events_*` t
JOIN
target_event_timestamps s
–only returning events that occurred less than x minutes before signup
ON
t.user_pseudo_id = s.user_pseudo_id
AND t.event_name != @event_name
AND s.event_timestamp > t.event_timestamp
AND TIMESTAMP_DIFF(DATETIME(TIMESTAMP_MICROS(s.event_timestamp)), DATETIME(TIMESTAMP_MICROS(t.event_timestamp)),MINUTE)<@lookback
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATETIME(“%Y%m%d”, PARSE_DATE(“%Y%m%d”, @DS_START_DATE))
AND FORMAT_DATETIME(“%Y%m%d”, PARSE_DATE(“%Y%m%d”, @DS_END_DATE))) AS t2
GROUP BY
event_name)
SELECT
EventName,
EventsBeforeSignup,
TotalEvents,
ROUND(EventsBeforeSignup/TotalEvents,4) AS PrecedencePercent
FROM
calc_wrapper
ORDER BY
EventName
SQL Parameter Naming in the Custom BigQuery Data Source
When you’re first creating a SQL parameter and specifying a Parameter Name, make it simple. The name will be mirrored as the Parameter ID that you can use as the placeholder within your custom query.
Once you have Connected/Reconnected to the data source, you can provide a more descriptive or user-friendly Parameter Name to appear in an associated filter control, as illustrated by Target Event and Lookback (Minutes) in the dashboard embedded above, without changing the Parameter ID.
You can also specify a default value.
When you’re creating a SQL parameter, specify the ID (such as event_name) that you’re going to use within the query. After you’ve connected to the data source, you can specify a more descriptive or user-friendly parameter name (such as Target Event) that will appear in a filter control for which you specify the parameter as the Control Field.
As a variation of the event precedence example, the following example is based on pages viewed before a target event. (The pages that appear in the dashboard below reside on googleanalyticstest.com, so the page titles themselves are about Google Analytics.)
You can change the SQL parameter values as in the dashboard above to refresh the display.
Note that the query variation for pageview precedence required me to unnest page_title as a parameter associated with a page_view event. If you’re used to querying flattened SQL tables and are not yet familiar with the unnesting process that’s required for many BigQuery queries of Google Analytics data, the extra steps may seem a bit unintuitive at first, but they are pretty easy to incorporate, and certainly essential, given the central role of parameters in the New Google Analytics.
–Need CTE to calculate EventPrecedence, since you can’t do calculation directly on the aliases
WITH
calc_wrapper AS(
WITH
target_event_timestamps AS (
SELECT
event_timestamp,
user_pseudo_id
FROM
`tribal-bonito-298022.analytics_257380306.events_*`
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATETIME(“%Y%m%d”, PARSE_DATE(“%Y%m%d”,@DS_START_DATE))
AND FORMAT_DATETIME(“%Y%m%d”, PARSE_DATE(“%Y%m%d”,@DS_END_DATE))
AND event_name = @event_name),
distinct_page_user AS (
SELECT
paramsDistinct.value.string_value AS Page,
user_pseudo_id,
MIN(event_timestamp) AS MinTimestamp
FROM
`tribal-bonito-298022.analytics_257380306.events_*`
CROSS JOIN
UNNEST (event_params) AS paramsDistinct
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATETIME(“%Y%m%d”, PARSE_DATE(“%Y%m%d”,@DS_START_DATE))
AND FORMAT_DATETIME(“%Y%m%d”, PARSE_DATE(“%Y%m%d”,@DS_END_DATE))
AND event_name = ‘page_view’ AND paramsDistinct.key = ‘page_title’
GROUP BY
Page,
user_pseudo_id)
SELECT
Page,
COUNT(*) AS PageviewsBeforeSignup,
–subquery for total events by event name, deduped by user
(
SELECT
COUNT(*)
FROM
distinct_page_user d
WHERE
d2.Page = d.Page) AS TotalPageviews
–end subquery
FROM
distinct_Page_User d2
JOIN
target_event_timestamps s
–only returning events that occurred less than 30 minutes before signup
ON
d2.user_pseudo_id = s.user_pseudo_id
AND s.event_timestamp > d2.MinTimestamp
AND s.event_timestamp – d2.MinTimestamp < @lookback * (1000000 * 60)
GROUP BY
Page)
SELECT
Page,
PageviewsBeforeSignup,
TotalPageviews,
ROUND(PageviewsBeforeSignup/TotalPageviews,4) AS PrecedencePercent
FROM
calc_wrapper
WHERE
Page IS NOT NULL
AND
TotalPageViews > 10
ORDER BY
Page
The event and pageview precedence dashboards below are nearly identical in logic and configuration to those above, but as their data source, they use the sample export of New Google Analytics data to BigQuery for the Google Merchandise Store.
Note that the available data range in the dataset is Nov. 1, 2020 through Jan. 31, 2021, as displayed in the dashboards
Different Approaches, Next Steps
Note that there is usually more than one way to approach a query and always more than one way to present the results. My colleagues on the Data Science and Analysis & Insights teams at Cardinal Path may have approached this challenge from a different angle and drawn different kinds of insights, as may you and your teams.
I hope, in any case, that my own approach has provided some actionable tips for leveraging BigQuery and Data Studio as you analyze and visualize your data capture in New Google Analytics.
Please feel free to reuse and adapt the resources below, and be sure to share your creations, learnings, and insights.
Event precedence query in BigQuery
Pageview precedence query in BigQuery
Event precedence dashboard in Data Studio
Pageview precedence dashboard in Data Studio
Event precedence dashboard in Data Studio – Google Merchandise Store
Pageview precedence dashboard in Data Studio – Google Merchandise Store
As consumers become increasingly digitally savvy, and more and more brand touchpoints take place online,…
Marketers are on a constant journey to optimize the efficiency of paid search advertising. In…
Unassigned traffic in Google Analytics 4 (GA4) can be frustrating for data analysts to deal…
This website uses cookies.