Skip to main content
GUIDE · BIGQUERY10 min read

Export TrackLayer events to BigQuery for warehouse attribution

A production guide for sending TrackLayer events into BigQuery, with the dataset setup, service account permissions, canonical schema, SQL starting points, partitioning, cost controls, data joins, and operational checks needed for reliable warehouse attribution.

Context

Why warehouse export

Platform-native attribution is useful for daily channel operations, but it is not enough for multi-touch attribution across paid media, lifecycle, affiliates, organic search, direct, and sales-assisted paths. Each platform gives itself a particular view of credit, usually inside its own attribution window and identity graph. Exporting TrackLayer events to BigQuery gives the data team an independent event log for MTA, incrementality analysis, and finance-facing revenue reconciliation.

Warehouse export also lets you join events with data that ad and email platforms will never fully understand. CRM stages, customer tiers, subscription status, refunds, product margins, inventory availability, sales ownership, and support outcomes can all change the meaning of a conversion. When TrackLayer events sit next to CRM and product data, attribution can answer whether a campaign created profitable customers, not just whether it touched a purchase.

Custom cohort analysis becomes much easier once the event stream is queryable in SQL. You can compare signup cohorts by first source, purchase cohorts by first product viewed, repeat purchase timing by region, or median journey length by lifecycle segment. Those questions are hard to answer inside destination dashboards because the model needs your business definitions, not just the destination's reporting primitives.

Requirements

Prerequisites

01

A Google Cloud project with BigQuery enabled and permission to create datasets, tables, service accounts, IAM bindings, and JSON keys.

02

A GCP service account dedicated to TrackLayer exports, not a shared personal credential or broad application runtime identity.

03

A BigQuery dataset already created for production warehouse data, or permission to create a new dataset named tracklayer_events.

04

A TrackLayer Growth+ workspace with access to scheduled reports, export destinations, and field mapping controls.

Build

Step-by-step setup

Step 1

Create BigQuery dataset tracklayer_events

Create a dataset in the same region as the warehouse tables you expect to join most often. If your CRM mirror and spend tables live in EU, keep TrackLayer events in EU as well. Cross-region queries are awkward, slower to operate, and easier to misconfigure. Use a clear dataset name such as tracklayer_events so analysts can discover it without hunting through generic exports.

Dataset name
tracklayer_events

Location
EU

Project
acme-prod
Step 2

Create a GCP service account with BigQuery Data Editor + Job User roles

Create a dedicated service account for TrackLayer and grant only the roles needed to load rows and run export verification jobs. At minimum, assign BigQuery Data Editor on the target dataset and BigQuery Job User on the project. Avoid Owner, Editor, or broad organization-level roles. The account should be easy to rotate and easy to disable without affecting other pipelines.

Service account
tracklayer-bigquery-export@acme-prod.iam.gserviceaccount.com

Roles
roles/bigquery.dataEditor on dataset tracklayer_events
roles/bigquery.jobUser on project acme-prod
Step 3

Download service account JSON key

Create one JSON key for the service account and store it directly in TrackLayer. Treat the file like a production secret. Do not commit it to the repo, paste it into issue trackers, or reuse it in local scripts. If your security team requires key rotation, create the new key first, update TrackLayer, verify a sync, then revoke the old key.

{
  "type": "service_account",
  "project_id": "acme-prod",
  "client_email": "tracklayer-bigquery-export@acme-prod.iam.gserviceaccount.com",
  "private_key": "redacted_private_key"
}
Step 4

Add export destination in TrackLayer /reports → Schedule a BigQuery export

In TrackLayer, open /reports, choose Schedule a BigQuery export, and add a new destination. Paste the service account JSON, set the project ID, dataset, and table prefix, then choose the production environment. Keep staging and production destinations separate so test events do not mix into attribution models.

Destination
Project: acme-prod
Dataset: tracklayer_events
Table: events
Environment: production
Step 5

Choose frequency + field mapping

Most teams start with hourly exports for attribution and daily exports for heavy back-office reporting. Map canonical fields first: event_id, event_name, event_time, merchant_id, user_data, custom_data, platform_deliveries, and created_at. Keep field names stable once analysts build dashboards on top of them.

Frequency: hourly
Primary timestamp: event_time
Identity fields: em_hash, ph_hash, fbp, fbc
Raw properties: custom_data
Step 6

Verify first sync — rows appear within 5 minutes of schedule

After the first scheduled run, query the destination table for recent rows. You should see new events within five minutes of the schedule time. Validate row count, event_time, merchant_id, and the nested user_data object before connecting the table to dashboards or downstream models.

SELECT
  event_name,
  COUNT(*) AS events,
  MAX(created_at) AS latest_exported_at
FROM `acme-prod.tracklayer_events.events`
WHERE event_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
GROUP BY event_name
ORDER BY events DESC;
Schema

Canonical BigQuery fields

Keep the raw export schema compact and stable. BigQuery is best used as the durable source table, with views or marts layered on top for specific teams. The fields below are enough to support attribution, identity checks, delivery diagnostics, and downstream modeling without forcing every event-specific property into a top-level column.

FieldTypePurpose
event_idSTRINGStable unique ID for the business event, used for deduplication and traceability.
event_nameSTRINGCanonical TrackLayer event name such as view_content, add_to_cart, purchase, lead, or refund.
event_timeTIMESTAMPThe original event timestamp, not the time the export job wrote the row.
merchant_idSTRINGTrackLayer merchant or workspace identifier for multi-store reporting and access control.
user_dataSTRUCTIdentity and browser fields: em_hash, ph_hash, fbp, fbc, client_ip, and client_user_agent.
custom_dataJSONEvent-specific payload such as value, currency, order_id, product IDs, coupon, or content metadata.
platform_deliveriesARRAY<STRUCT>Destination delivery records for Meta, Google, TikTok, Klaviyo, and other connected platforms.
created_atTIMESTAMPTrackLayer creation or export timestamp for operational checks and freshness monitors.
partitioned_byevent_time_dateDaily partition derived from event_time so attribution queries can scan only relevant dates.
SQL

Sample queries

These examples assume a single events table at acme-prod.tracklayer_events.events. Replace project and dataset names with your own, then adapt JSON paths to match the field mapping you selected in TrackLayer.

Revenue by platform in last 30 days

SELECT
  delivery.platform AS platform,
  SUM(SAFE_CAST(JSON_VALUE(custom_data, "$.value") AS NUMERIC)) AS revenue,
  COUNT(DISTINCT event_id) AS purchases
FROM `acme-prod.tracklayer_events.events`,
UNNEST(platform_deliveries) AS delivery
WHERE event_name = "purchase"
  AND event_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY platform
ORDER BY revenue DESC;

Funnel: view_content → add_to_cart → purchase conversion rate by utm_source

WITH user_steps AS (
  SELECT
    JSON_VALUE(custom_data, "$.utm_source") AS utm_source,
    COALESCE(JSON_VALUE(custom_data, "$.external_id"), user_data.em_hash) AS user_key,
    MAX(event_name = "view_content") AS viewed,
    MAX(event_name = "add_to_cart") AS added,
    MAX(event_name = "purchase") AS purchased
  FROM `acme-prod.tracklayer_events.events`
  WHERE event_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
    AND event_name IN ("view_content", "add_to_cart", "purchase")
  GROUP BY utm_source, user_key
)
SELECT
  utm_source,
  COUNTIF(viewed) AS viewers,
  COUNTIF(added) AS adders,
  COUNTIF(purchased) AS purchasers,
  SAFE_DIVIDE(COUNTIF(purchased), COUNTIF(viewed)) AS view_to_purchase_rate
FROM user_steps
GROUP BY utm_source
ORDER BY purchasers DESC;

Median journey length by signup cohort

WITH journeys AS (
  SELECT
    DATE_TRUNC(DATE(MIN(IF(event_name = "sign_up", event_time, NULL))), MONTH) AS signup_cohort,
    COALESCE(JSON_VALUE(custom_data, "$.external_id"), user_data.em_hash) AS user_key,
    TIMESTAMP_DIFF(
      MIN(IF(event_name = "purchase", event_time, NULL)),
      MIN(IF(event_name = "sign_up", event_time, NULL)),
      HOUR
    ) AS hours_to_purchase
  FROM `acme-prod.tracklayer_events.events`
  WHERE event_name IN ("sign_up", "purchase")
  GROUP BY user_key
)
SELECT
  signup_cohort,
  APPROX_QUANTILES(hours_to_purchase, 100)[OFFSET(50)] AS median_hours_to_purchase,
  COUNT(*) AS converted_users
FROM journeys
WHERE signup_cohort IS NOT NULL
  AND hours_to_purchase IS NOT NULL
GROUP BY signup_cohort
ORDER BY signup_cohort DESC;
Performance

Partitioning + clustering

Partition the events table by day using event_time, exposed as an event_time_date partition column. Most attribution questions have a time range, so this one choice prevents queries from scanning years of events when an analyst only needs the last 30 days. Use event_time rather than created_at because attribution follows the moment the user acted, not the moment an export worker wrote the row.

Cluster by merchant_id and event_name. Merchant clustering helps multi-store and agency workspaces isolate each account's rows. Event clustering helps common filters such as purchase, add_to_cart, lead, and refund. Together, partitioning and clustering keep dashboards responsive while preserving a single canonical table.

Spend

Cost optimization

Always query on partition columns. A WHERE clause on event_time or event_time_date should be part of every scheduled dashboard, dbt model, notebook, and exploratory query. Avoid SELECT * on the raw export table because nested JSON and delivery arrays can make scans more expensive than expected.

For large aggregates, use approximate functions where exactness is not materially useful. APPROX_COUNT_DISTINCT and APPROX_QUANTILES are often good enough for funnel exploration, median journey length, audience sizing, and early campaign reads. Save exact joins for finance reconciliation, invoicing, and critical executive reporting.

If you have very high event volume, compare continuous BigQuery sync with a raw archive in Google Cloud Storage plus on-demand external querying or batch loads. Continuous sync is simpler for analysts. Raw GCS storage can be cheaper for cold data, replay, audit, and infrequent historical analysis.

Modeling

Joining with other data

The warehouse export becomes most valuable when TrackLayer events are joined to the business tables that define customer quality, product economics, and marketing cost. Keep the raw event table neutral, then build governed views for the joins analysts run every week.

CRM mirror table

Join TrackLayer events to a CRM mirror on external_id, email hash, or your internal customer ID. This makes attribution useful for sales lifecycle reporting: first purchase after lead creation, expansion after account status changes, and retention by lifecycle stage.

Product catalog

Join order and content events to a product catalog on SKU or product_id. That unlocks margin-aware attribution, category-level funnel reporting, price-band analysis, and cohorts based on the first product a customer viewed or bought.

Marketing spend table

Join utm_source, campaign_id, ad_id, or platform delivery IDs to a spend table. Once cost and revenue live in the same warehouse query, you can calculate blended ROAS, CAC, payback, and contribution margin without waiting for ad platform reports.

Diagnostics

Troubleshooting

permissionDenied when writing rows

Confirm the service account has BigQuery Data Editor on the dataset and BigQuery Job User on the project. Dataset-level access alone is not enough for every load job path.

Dataset location mismatch

Create the dataset in the same location as the query jobs and joined tables. BigQuery will not quietly move EU data into a US dataset for you.

Invalid JSON key

Paste the full service account JSON object, including private_key line breaks. If the key was edited by a password manager or document editor, create a fresh key.

Rows missing after schedule time

Check the TrackLayer export run log, verify the destination is enabled for the right environment, and query a wider event_time window before assuming data loss.

Query costs spike

Add event_time partition filters, avoid SELECT * on wide JSON columns, pre-aggregate common dashboards, and use approximate functions for large exploratory aggregates.

FAQ

Common questions

Do I need one table per event name?

Usually no. A single canonical events table is easier to query, partition, govern, and join. Create derived marts for purchase, funnel, or campaign reporting only after the raw table is stable.

Should TrackLayer export raw PII?

The recommended warehouse schema uses hashed identifiers such as em_hash and ph_hash, plus platform browser identifiers where permitted. Keep raw PII in systems designed for customer records and join through governed keys.

How often should the export run?

Hourly is a good default for attribution and campaign monitoring. Daily can work for finance and executive reporting. High-frequency exports are useful only when downstream workflows truly need fresher data.

Can I backfill historical events?

Yes. Run a controlled backfill into the same schema, preserve original event_time, tag the rows with a migration_id in custom_data, and validate counts against the source system before dashboards consume them.

Can analysts change the schema?

They should request schema changes through the data owner. Analysts can build views and marts freely, but the raw export table should remain a stable contract between TrackLayer and the warehouse.

Next reads

Related implementation guides

We use essential cookies to keep the site secure and functional. Analytics and third-party tags run only with your consent. See our Cookie Policy.

We use essential cookies to keep the site secure and functional. Analytics and third-party tags run only with your consent. See our Cookie Policy.