Click to call

In a world where data shapes decisions, Google Analytics 4 (GA4) has emerged from the ashes of UA as the phoenix of website data analytics — a critical tool for understanding user behavior and optimizing online experiences. You may have also heard whispers about the magic that happens when GA4 joins forces with BigQuery, an exciting prospect for those well-versed in these tools. But if phrases like “data integration” and “SQL” sound more like cryptic codes than comprehensible concepts, you’re not alone — and Odd Dog is here to help.

What unfolds is a tale of unexpected assistance from an AI named ChatGPT. This is not a story about a tech expert; it’s a journey anyone can take. So, if you’ve ever wondered what the fuss around GA4 and BigQuery is all about, join us as we venture into the realm of data integration, with ChatGPT as our guide, proving that you don’t need to speak SQL to make sense of your digital insights.

Integrate GA4 Data in BigQuery Without Knowing SQL

Since the launch of Google Analytics 4 (GA4), analytics experts have highlighted the value of linking GA4 with BigQuery. If you’re anything like me, the thought of integrating BigQuery with GA4 can be daunting, especially if you’re unfamiliar with SQL.

In my constant pursuit of biting off more than I can chew, I decided that 4 p.m. on a Thursday afternoon was the perfect time to delve into linking GA4 and BigQuery. However, I quickly realized a minor oversight: I don’t know SQL, not in the slightest. Regretting my decision to take four darkroom photography classes in college and not a single computer science course, I turned to ChatGPT. After some trial and error, I discovered that by working with ChatGPT, you can query GA4 data in BigQuery without understanding any SQL. In this article, I’ll guide you through setting up a connection between GA4 and BigQuery and demonstrate how to leverage ChatGPT to craft effective SQL statements.

What is BigQuery?

Before we can start querying GA4 data with BigQuery, we must understand what BigQuery is. In short, it is Google’s data warehouse. It allows users to store and manage data from multiple sources. Using SQL, this data can be manipulated for reporting and analysis.

Why Use BigQuery With GA4?

BigQuery provides raw, unsampled data. In contrast, GA4 employs data modeling and machine learning in its reporting, aligning with Google’s privacy-centric approach. However, a notable drawback is that GA4 data often doesn’t become available for 24–48 hours, preventing analysts from drawing insights from the most recent data. The key benefits of integrating GA4 and BigQuery include:

  • Preserving User-Specific Data: GA4 retains data only for 2 or 14 months, depending on your data retention settings.
  • Access to Unsampled Events and User-Level Data: These data points can be subject to sampling in GA4.
  • Data Ownership: You maintain complete control and ownership of your BigQuery data.
  • Advanced Data Manipulation: Craft custom SQL queries to retrieve specific data points.
  • Bypass Looker Studio Quota: Data pulled from BigQuery isn’t subject to the quotas that apply to Google Analytics 4.

How to Integrate GA4 and BigQuery

Before we get to stare blankly at the BigQuery SQL workspace, hoping to magically learn SQL, we have to set up the integration between GA4 and BigQuery.

Set Up BigQuery Project:

  • Before linking, ensure you have a Google Cloud project and BigQuery enabled. If not, go to the Google Cloud Console and create a project. Then, navigate to BigQuery and ensure it’s set up.

Link GA4 to BigQuery:

  1. Navigate to your GA4 property.
  2. Click on Admin (gear icon) at the bottom left.
  3. Under the Property column, click on BigQuery Linking.
  4. Click on the Link to start the linking process.
  5. Choose the appropriate Google Cloud project and click Continue.
  6. Choose the BigQuery dataset where you want to store the GA4 data. You can create a new dataset directly from this interface if required.
  7. Configure data stream and data retention settings as per your requirements.
  8. Click Submit to finalize the link.

Exporting Data to BigQuery:

  • Once the link is set up, GA4 will start sending daily export batches of event data to the linked BigQuery dataset.
  • Additionally, for the first link, GA4 exports a historical backlog of data (limited to the duration you specify or available data, whichever is less).

Accessing the Data in BigQuery:

  1. Navigate to the BigQuery Console.
  2. You should find your dataset under the linked project.
  3. Inside the dataset, you’ll find tables that represent your GA4 data. Typically, they are named with the format events_YYYYMMDD where YYYYMMDD is the date of the data.

Query GA4 BigQuery Data

Now comes the million-dollar question: “I’ve got my data; what’s next?” This situation reminds me of the fish in the final scene of Pixar’s “Finding Nemo”. They executed their escape plan flawlessly, only to end up floating in the ocean inside plastic bags. We’ve established a link between BigQuery and GA4, but manipulating the data requires knowledge of SQL.

The logical step might be to revisit this dataset after a year, having spent that time immersing myself in the linguistic intricacies of SQL. However, as I’ve often said, “When you don’t know something, don’t learn it; ask ChatGPT.” By describing my data needs in my own imperfect English, ChatGPT translated my words into SQL queries. Without any prior experience with SQL, armed with the right questions, and backed by my trusty digital ally, I was crafting coherent SQL statements in no time.

Working With ChatGPT to Craft Effective SQL Queries

The first step to a productive conversation with ChatGPT is to clearly outline your end goal. This involves succinctly and accurately describing what you seek from the dataset. Also, it doesn’t hurt to say ‘please’ and ‘thank you’ — best to stay on the good side of robots, just in case of an AI uprising.

My goal was simple: I wanted to create a table that showed “source/medium” data for “phone_call” events, especially since the CallRail GA4 integration wasn’t attributing this data correctly. With that in mind, I asked my first question:

Hi! Can you write a SQL query to pull all source medium data from the events containing 'phone_call' in its name. This is the data table: client-395400.analytics_290292296.events_20230815?

GPT Responded:

WITH FlattenedData AS ( SELECT event_name, param.key AS event_param_key, param.value.string_value AS event_param_value FROM `client-395400.analytics_290292296.events_20230815`, UNNEST(event_params) AS param WHERE event_name LIKE '%phone_call%' ) SELECT event_name, CASE WHEN event_param_key = 'source' THEN event_param_value ELSE NULL END AS source, CASE WHEN event_param_key = 'medium' THEN event_param_value ELSE NULL END AS medium FROM FlattenedData WHERE event_param_key IN ('source', 'medium');

Unfortunately, the resulting data covered only a single day. The query specifically points to the date “client-395400.analytics_290292296.events_20230815“. I now needed to retrieve data from additional days. With that in mind, I posed my next question:

How do I select from datasets from multiple date ranges? I’m looking at table IDs from client-395400.analytics_290292296.events_20230808 to client-395400.analytics_290292296.events_20230816.

With that prompt, GPT produced another SQL statement:

WITH FlattenedData AS ( SELECT event_date, event_name, param.key AS event_param_key, param.value.string_value AS event_param_value FROM `datatable_id20230808*`, UNNEST(event_params) AS param WHERE event_name LIKE '%phone_call%' AND _TABLE_SUFFIX BETWEEN '08' AND '16' ) SELECT event_date, event_name, CASE WHEN event_param_key = 'source' THEN event_param_value ELSE NULL END AS source, CASE WHEN event_param_key = 'medium' THEN event_param_value ELSE NULL END AS medium FROM FlattenedData WHERE event_param_key IN ('source', 'medium');

ChatGPT clued me in on the wildcard (*) table feature, which, to be honest, is probably something someone who actually knows SQL would know.

Then, there was another challenge. I had to create a custom “source/medium” dimension because the event parameters only gave me “source” and “medium” separately. To accomplish this took a lot of human/robot synergy. To craft an SQL statement that tackled this issue, I asked:

Can you make a new parameter that combines 'source' & 'medium' into a field called 'source / medium'?

However, the SQL that ChatGPT generated gave me an error. So, I responded with:

Hey, we got an error here. It says 'Invalid field name "source / medium". Fields must have the right characters and be at most 300 characters long.

ChatGPT tried again, but we encountered another error.

Feeling overwhelmed by the SQL intricacies and clueless about the root of the error, I simply pleaded:

HELP, HOW DO I FIX THIS!!!!!!!!!!

In return, ChatGPT supplied yet another SQL query with errors.

Changing tactics, I pulled up the JSON schema of the event parameters (a language I’m thankfully familiar with) and shared it with ChatGPT. Armed with that information, I posed my question:

Can you give me a new parameter that brings together 'source' & 'medium' into a 'source/medium' field?

Given the additional context, ChatGPT crafted another SQL statement:

WITH FlattenedData AS ( SELECT event_date, event_name, param.key AS event_param_key, param.value.string_value AS event_param_value FROM `datatable_id20230808*`, UNNEST(event_params) AS param WHERE event_name LIKE '%phone_call%' AND _TABLE_SUFFIX BETWEEN '08' AND '16' ) SELECT event_date, event_name, CASE WHEN event_param_key = 'source' THEN event_param_value ELSE NULL END AS source, CASE WHEN event_param_key = 'medium' THEN event_param_value ELSE NULL END AS medium FROM FlattenedData WHERE event_param_key IN ('source', 'medium'); Query Results

And there it was! Exactly what I needed. Using the “explore data” feature in BigQuery, I loaded my newly created data table into Looker Studio and started experimenting with various charts and graphs.

Final Thoughts

Diving into the world of BigQuery without a shred of SQL knowledge might seem like a bold endeavor if not a foolhardy one. Yet, as I found out, in the digital age of ever-evolving technologies and platforms, you don’t always need traditional training to make significant strides. With the right tools, a dash of determination, and a little help from our (AI) friends we’re equipped to tackle most technical challenges.

Embarking on your own Google Analytics Journey? We Can Help

At Odd Dog, we blend data-driven insights with innovative marketing strategies. Understanding your data doesn’t have to feel far-fetched, schedule a free data consultation to get started.  Take control of your data and find actionable insights that have a real impact on your bottom line.

×
Prefer Email? Say Hello!
[email protected]
Talk With Us!
206-705-4605
Book a Meeting!

Please use the calendar to the right to book an initial meeting with our sales team.

Fill Out the Form Below!

Have a specific service in mind or looking to share more about your project goals? Fill out the form on this page and one of our sales specialist will get back to you ASAP.