In a world where data shapes decisions, Google Analytics 4 (GA4) has emerged from the ashes of UA as the phoenix of website data analytics — a critical tool for understanding user behavior and optimizing online experiences. You may have also heard whispers about the magic that happens when GA4 joins forces with BigQuery, an exciting prospect for those well-versed in these tools. But if phrases like “data integration” and “SQL” sound more like cryptic codes than comprehensible concepts, you’re not alone — and Odd Dog is here to help.
What unfolds is a tale of unexpected assistance from an AI named ChatGPT. This is not a story about a tech expert; it’s a journey anyone can take. So, if you’ve ever wondered what the fuss around GA4 and BigQuery is all about, join us as we venture into the realm of data integration, with ChatGPT as our guide, proving that you don’t need to speak SQL to make sense of your digital insights.
Integrate GA4 Data in BigQuery Without Knowing SQL
Since the launch of Google Analytics 4 (GA4), analytics experts have highlighted the value of linking GA4 with BigQuery. If you’re anything like me, the thought of integrating BigQuery with GA4 can be daunting, especially if you’re unfamiliar with SQL.
In my constant pursuit of biting off more than I can chew, I decided that 4 p.m. on a Thursday afternoon was the perfect time to delve into linking GA4 and BigQuery. However, I quickly realized a minor oversight: I don’t know SQL, not in the slightest. Regretting my decision to take four darkroom photography classes in college and not a single computer science course, I turned to ChatGPT. After some trial and error, I discovered that by working with ChatGPT, you can query GA4 data in BigQuery without understanding any SQL. In this article, I’ll guide you through setting up a connection between GA4 and BigQuery and demonstrate how to leverage ChatGPT to craft effective SQL statements.
What is BigQuery?
Before we can start querying GA4 data with BigQuery, we must understand what BigQuery is. In short, it is Google’s data warehouse. It allows users to store and manage data from multiple sources. Using SQL, this data can be manipulated for reporting and analysis.
Why Use BigQuery With GA4?
BigQuery provides raw, unsampled data. In contrast, GA4 employs data modeling and machine learning in its reporting, aligning with Google’s privacy-centric approach. However, a notable drawback is that GA4 data often doesn’t become available for 24–48 hours, preventing analysts from drawing insights from the most recent data. The key benefits of integrating GA4 and BigQuery include:
- Preserving User-Specific Data: GA4 retains data only for 2 or 14 months, depending on your data retention settings.
- Access to Unsampled Events and User-Level Data: These data points can be subject to sampling in GA4.
- Data Ownership: You maintain complete control and ownership of your BigQuery data.
- Advanced Data Manipulation: Craft custom SQL queries to retrieve specific data points.
- Bypass Looker Studio Quota: Data pulled from BigQuery isn’t subject to the quotas that apply to Google Analytics 4.
How to Integrate GA4 and BigQuery
Before we get to stare blankly at the BigQuery SQL workspace, hoping to magically learn SQL, we have to set up the integration between GA4 and BigQuery.
Set Up BigQuery Project:
- Before linking, ensure you have a Google Cloud project and BigQuery enabled. If not, go to the Google Cloud Console and create a project. Then, navigate to BigQuery and ensure it’s set up.
Link GA4 to BigQuery:
- Navigate to your GA4 property.
- Click on Admin (gear icon) at the bottom left.
- Under the Property column, click on BigQuery Linking.
- Click on the Link to start the linking process.
- Choose the appropriate Google Cloud project and click Continue.
- Choose the BigQuery dataset where you want to store the GA4 data. You can create a new dataset directly from this interface if required.
- Configure data stream and data retention settings as per your requirements.
- Click Submit to finalize the link.
Exporting Data to BigQuery:
- Once the link is set up, GA4 will start sending daily export batches of event data to the linked BigQuery dataset.
- Additionally, for the first link, GA4 exports a historical backlog of data (limited to the duration you specify or available data, whichever is less).
Accessing the Data in BigQuery:
- Navigate to the BigQuery Console.
- You should find your dataset under the linked project.
- Inside the dataset, you’ll find tables that represent your GA4 data. Typically, they are named with the format events_YYYYMMDD where YYYYMMDD is the date of the data.
Query GA4 BigQuery Data
Now comes the million-dollar question: “I’ve got my data; what’s next?” This situation reminds me of the fish in the final scene of Pixar’s “Finding Nemo”. They executed their escape plan flawlessly, only to end up floating in the ocean inside plastic bags. We’ve established a link between BigQuery and GA4, but manipulating the data requires knowledge of SQL.
The logical step might be to revisit this dataset after a year, having spent that time immersing myself in the linguistic intricacies of SQL. However, as I’ve often said, “When you don’t know something, don’t learn it; ask ChatGPT.” By describing my data needs in my own imperfect English, ChatGPT translated my words into SQL queries. Without any prior experience with SQL, armed with the right questions, and backed by my trusty digital ally, I was crafting coherent SQL statements in no time.
Working With ChatGPT to Craft Effective SQL Queries
The first step to a productive conversation with ChatGPT is to clearly outline your end goal. This involves succinctly and accurately describing what you seek from the dataset. Also, it doesn’t hurt to say ‘please’ and ‘thank you’ — best to stay on the good side of robots, just in case of an AI uprising.
My goal was simple: I wanted to create a table that showed “source/medium” data for “phone_call” events, especially since the CallRail GA4 integration wasn’t attributing this data correctly. With that in mind, I asked my first question: