The dedup problem space
Deduplication starts as a distributed identity problem. A single conversion can appear from a browser pixel, a backend webhook, a checkout system, a queue retry, and an analytics export. Every one of those records can be truthful. None of them is useful if the system cannot decide whether they are copies of one conversion or genuinely separate economic events. The more systems involved, the less safe it becomes to rely on implicit clues such as arrival time, event name, or customer email alone.
Eventual consistency makes this worse. Browser events are usually fast but fragile. Server events are durable but often delayed by queueing, anti-fraud checks, payment confirmation, or warehouse staging. A platform may receive the browser copy first, the server copy first, or only one of them for a while. Dedup therefore cannot depend on delivery order. It has to depend on a stable contract that remains true even when the same conversion arrives through multiple transports minutes or hours apart.
Retry behavior turns dedup into an idempotency problem. A retry is not supposed to create a second commercial fact. It is only supposed to complete the delivery of the first one. The moment a retry path generates a new event_id, the transport layer stops being idempotent and the platform receives what looks like a brand new conversion. Good dedup architecture therefore lives at the seam between distributed systems design and measurement logic.
Per-platform dedup mechanisms
The exact vocabulary differs by network, but the structure is similar across all major ad platforms: one shared conversion identity plus supporting browser, click, or hashed match context. The table below is a practical implementation map rather than a claim that every destination exposes the same public diagnostics.
| Platform | Core dedup keys | Technical notes |
|---|---|---|
| Meta | event_id + fbp OR hashed PII | Browser and server events collapse when the pair shares the same event identity and plausible browser or match-key context. |
| Google Ads | gclid OR order_id OR GOOG_KEY_EVENT_FLAG | Google leans on click identifiers, transaction or order identifiers, and enhanced conversion signals to prevent replay and page-refresh duplicates. |
| TikTok | event_id + user_data fingerprint | The identical event and event_id are the hard requirement, while user data improves the system's confidence that the browser and API copies describe the same action. |
| Pinterest | event_id + epik | Pinterest's dedup surface emphasizes event_id coverage and benefits from carrying the Pinterest click context whenever it exists. |
| Snapchat | event_id + sc_click_id | Snap conversion quality improves when Purchase carries a durable transaction identity and click context instead of relying on page timing alone. |
| Reddit | conversion_id + rdt_uuid | Reddit Pixel and CAPI need one shared conversion identity. rdt_uuid helps tie that identity back to the browser session that produced the ad interaction. |
The event_id contract
A correct event_id has three requirements. First, it must be unique per conversion event. Not per user, not per cart, and not per session. Second, it must persist across retries and recovery flows so that a failed delivery can be replayed without producing a second identity. Third, it must be deterministic from the point of view of the system that owns the conversion. Even if the ID is random, the platform-facing value must be fixed once assigned.
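The three requirements can be sketched as an assign-once lookup. This is a minimal illustration, assuming a hypothetical `EventIdRegistry` backed by an in-memory map (standing in for durable storage) and an injected `generate` function (standing in for a UUID v4 source). None of these names come from a platform SDK.

```typescript
// Hypothetical assign-once registry: the first caller mints the ID,
// every later caller (retry, recovery flow) reads the stored value.
type ConversionKey = string; // assumed: an internal conversion record ID

class EventIdRegistry {
  private readonly ids = new Map<ConversionKey, string>();
  private readonly generate: () => string;

  constructor(generate: () => string) {
    this.generate = generate;
  }

  // Returns the stored event_id, minting one only if none exists yet.
  getOrAssign(conversionKey: ConversionKey): string {
    const existing = this.ids.get(conversionKey);
    if (existing !== undefined) return existing;
    const fresh = this.generate();
    this.ids.set(conversionKey, fresh);
    return fresh;
  }
}

// Stand-in generator for illustration only; real code would use a UUID v4 source.
let counter = 0;
const registry = new EventIdRegistry(() => `evt_${++counter}`);

const firstAttempt = registry.getOrAssign("conversion-1");
const retryAttempt = registry.getOrAssign("conversion-1");
const otherConversion = registry.getOrAssign("conversion-2");
```

Here `firstAttempt` and `retryAttempt` hold the same value even though the generator is not deterministic, while `otherConversion` gets its own identity.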
There are two sane generation strategies. UUID v4 is the cleanest when the event producer has authoritative storage and can persist the result at creation time. Deterministic hashing is better when multiple systems may need to derive the same identity without a central write step, for example when purchase events can originate from both checkout completion and payment confirmation workers. In that case, a canonical input such as order_id, merchant_id, and an event-type suffix or timestamp bucket can generate the same ID every time, provided any timestamp is read from the stored order or payment record rather than from the worker's own clock.
Timestamp alone is never enough. It is not unique under load, it is vulnerable to clock skew, and it makes independent retries or partial replays look like distinct conversions if the timestamp is regenerated. Two different orders confirmed within the same second can collide, while one delayed retry a minute later can fork the identity of the same purchase. Timestamps are useful as part of a canonical payload. They are dangerous as the whole key.
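Both failure modes are easy to reproduce. The sketch below compares two hypothetical key builders, `timestampOnlyKey` and `orderAnchoredKey`. The names and the `Attempt` shape are illustrative assumptions, not part of any platform API.

```typescript
// Illustrative shape: one delivery attempt for an order.
type Attempt = { orderId: string; sentAtMs: number };

// Anti-pattern: identity taken from the send time, truncated to whole seconds.
function timestampOnlyKey(a: Attempt): string {
  return String(Math.floor(a.sentAtMs / 1000));
}

// Identity anchored to the order; the send time plays no part.
function orderAnchoredKey(a: Attempt): string {
  return `purchase|${a.orderId}`;
}

const base = Date.UTC(2026, 3, 23, 14, 35, 0); // 2026-04-23T14:35:00Z

// Two different orders confirmed within the same second.
const orderA: Attempt = { orderId: "ord_1", sentAtMs: base + 100 };
const orderB: Attempt = { orderId: "ord_2", sentAtMs: base + 900 };

// One order: the first attempt and a retry a minute later.
const firstTry: Attempt = { orderId: "ord_1", sentAtMs: base + 100 };
const lateRetry: Attempt = { orderId: "ord_1", sentAtMs: base + 60100 };

// Collision: distinct orders share a timestamp-only key.
const collides = timestampOnlyKey(orderA) === timestampOnlyKey(orderB);
// Fork: the same order gets two timestamp-only keys.
const forks = timestampOnlyKey(firstTry) !== timestampOnlyKey(lateRetry);
// The order-anchored key stays stable across the retry.
const stable = orderAnchoredKey(firstTry) === orderAnchoredKey(lateRetry);
```

All three flags evaluate to `true`: the timestamp-only key merges two orders and splits one, while the order-anchored key does neither.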
Dedup windows
Window behavior is where teams often get surprised. Platforms care not only about whether the identifiers match, but also about when the second copy arrives relative to the first and whether the surrounding browser or click context is still valid. The table below should be read as an operational reference for implementation and debugging, not a substitute for platform docs on specific campaign types.
| Platform | Window | Edge cases beyond window |
|---|---|---|
| Meta | Typically 48h for browser ↔ server overlap | A server retry outside the practical merge window may still be accepted as a standalone conversion, especially if the platform no longer sees the browser context as adjacent. |
| Google Ads | Operationally bound by tag replay and conversion import timing | Late uploads can still attribute if click data is valid, but dedup against an on-page tag depends on preserving the same order or transaction identifier. |
| TikTok | 48h from the first event | TikTok documents a 48-hour event_id window. A later retry can stop deduplicating even if the payload is otherwise identical. |
| Pinterest | Treat as short-latency and same-conversion-window sensitive | If the API copy arrives well after the tag copy, Pinterest may record it as a separate ingestion rather than removing it as a duplicate. |
| Snapchat | Treat as click-window sensitive and transport-order sensitive | If the click identifier has expired or the transaction identity changed between attempts, a replay can look like a second Purchase instead of a repaired delivery. |
| Reddit | Treat as attribution-window sensitive with shared conversion identity required | Late server copies can drift out of the practical reporting context even if the same conversion_id is reused, especially when browser UUID coverage is weak. |
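A retry scheduler can use the table above as a guard before replaying an old delivery. The following is a rough sketch under stated assumptions: the `dedupWindowMs` map, the `Platform` names, and `checkRetryWindow` are hypothetical, and only the two 48-hour figures the table states are encoded. Platforms without a stated number are left undefined rather than guessed.

```typescript
type Platform = "meta" | "google_ads" | "tiktok" | "pinterest" | "snapchat" | "reddit";

const HOUR_MS = 60 * 60 * 1000;

// Only windows the table gives as a number are filled in.
const dedupWindowMs: Record<Platform, number | undefined> = {
  meta: 48 * HOUR_MS,
  google_ads: undefined,
  tiktok: 48 * HOUR_MS,
  pinterest: undefined,
  snapchat: undefined,
  reddit: undefined,
};

type WindowCheck = "inside_window" | "outside_window" | "unknown_window";

// Classifies a retry by how long after the first send it would arrive.
function checkRetryWindow(platform: Platform, firstSentAtMs: number, retryAtMs: number): WindowCheck {
  const windowMs = dedupWindowMs[platform];
  if (windowMs === undefined) return "unknown_window";
  return retryAtMs - firstSentAtMs <= windowMs ? "inside_window" : "outside_window";
}

const firstSent = Date.UTC(2026, 3, 23, 14, 35, 0);
const earlyRetry = checkRetryWindow("tiktok", firstSent, firstSent + 47 * HOUR_MS);
const lateRetryCheck = checkRetryWindow("tiktok", firstSent, firstSent + 49 * HOUR_MS);
const noStatedWindow = checkRetryWindow("pinterest", firstSent, firstSent + HOUR_MS);
```

An `outside_window` or `unknown_window` result is a signal to route the replay through review or alerting instead of sending it blindly, since the platform may count it as a standalone conversion.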
Retry + idempotency interaction
The safe mental model is that retries operate on a delivery record while dedup operates on a conversion record. TrackLayer keeps those concerns separate. When a network request to Meta, TikTok, or another destination fails, the retry worker does not rebuild the event from scratch with a fresh identity. It reuses the existing conversion envelope, the same event_id, and the same platform-scoped payload inputs.
This matters even when the failure happens after the remote platform received the request but before TrackLayer observed the response. In distributed systems, that is the classic ambiguous success case. The platform might have accepted the event, while the sender believes it failed. The only defense is idempotent replay. If the retry carries the same event_id, the platform has a chance to recognize it as the same conversion. If the retry uses a new ID, the ambiguity becomes a guaranteed duplicate.
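The ambiguous success case can be simulated in a few lines. This is a toy model, not TrackLayer's implementation: `FakePlatform` is a hypothetical stand-in for a destination that deduplicates on event_id, and `deliverWithRetry` is an assumed helper that retries once after a lost response.

```typescript
// Stand-in destination that keeps one conversion per distinct event_id.
class FakePlatform {
  readonly conversions = new Set<string>();
  received = 0;

  // Stores the event first, then optionally "loses" the response to mimic a timeout.
  ingest(eventId: string, dropResponse: boolean): void {
    this.received += 1;
    this.conversions.add(eventId);
    if (dropResponse) throw new Error("timeout: response lost after ingest");
  }
}

// Sends one conversion; the first response is always lost, so a retry follows.
function deliverWithRetry(platform: FakePlatform, idForAttempt: (attempt: number) => string): void {
  for (let attempt = 1; attempt <= 2; attempt++) {
    try {
      platform.ingest(idForAttempt(attempt), attempt === 1);
      return;
    } catch {
      // Ambiguous outcome: the sender cannot tell whether the event was stored.
    }
  }
}

// Retry reuses the same event_id.
const stablePlatform = new FakePlatform();
deliverWithRetry(stablePlatform, () => "evt_order_104928");

// Retry mints a fresh event_id per attempt.
const freshIdPlatform = new FakePlatform();
deliverWithRetry(freshIdPlatform, (attempt) => `evt_order_104928_try${attempt}`);

const stableCount = stablePlatform.conversions.size;
const freshCount = freshIdPlatform.conversions.size;
```

Both platforms receive two requests, but the stable-ID path ends with one conversion while the fresh-ID path ends with two.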
Deterministic event_id generation
Deterministic IDs are useful when multiple services need to agree on the same conversion identity without asking a central event registry in real time. The key is to hash a canonical form, not a loosely concatenated string assembled ad hoc by each service.
import { createHash } from "node:crypto";

type PurchaseIdentityInput = {
  merchantId: string;
  orderId: string;
  eventName: "Purchase" | "Refund";
  timestampBucket: string; // for example: 2026-04-23T14:35Z
};

function canonicalize(input: PurchaseIdentityInput) {
  return [
    input.merchantId.trim().toLowerCase(),
    input.orderId.trim().toLowerCase(),
    input.eventName,
    input.timestampBucket,
  ].join("|");
}

export function buildDeterministicEventId(input: PurchaseIdentityInput) {
  const canonical = canonicalize(input);
  return createHash("sha256").update(canonical).digest("hex").slice(0, 32);
}

const eventId = buildDeterministicEventId({
  merchantId: "acme-eu",
  orderId: "ord_104928",
  eventName: "Purchase",
  timestampBucket: "2026-04-23T14:35Z",
});
// eventId → 7d5e8d4ff1c96e6caa4e5c7d1ad23942

The important detail is canonicalization. Trim whitespace, normalize case where appropriate, pin separators, and define the exact timestamp granularity. If one worker hashes raw order IDs while another lowercases them, deterministic generation stops being deterministic.
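Timestamp granularity is the easiest part of canonicalization to get wrong. The helper below is a minimal sketch, assuming a hypothetical `toMinuteBucket` function and assuming the bucket is derived from a stored source timestamp (such as the order's creation time) rather than from the worker's clock. It produces the same minute-level format used in the example input above.

```typescript
// Hypothetical helper: derives a minute-level UTC bucket from a stored source timestamp.
function toMinuteBucket(sourceTimestamp: Date): string {
  // toISOString always renders UTC, e.g. "2026-04-23T14:35:12.000Z";
  // the first 16 characters cover the date, hour, and minute.
  return sourceTimestamp.toISOString().slice(0, 16) + "Z";
}

// The same stored minute, expressed by two services in different offsets.
const orderCreatedAt = new Date("2026-04-23T14:35:12.000Z");
const sameMinuteOtherOffset = new Date("2026-04-23T16:35:48+02:00");

const bucketA = toMinuteBucket(orderCreatedAt);
const bucketB = toMinuteBucket(sameMinuteOtherOffset);
```

Both calls yield `2026-04-23T14:35Z`, so two workers that read the same stored timestamp agree on the bucket regardless of local time zone. Two workers that each call `new Date()` at send time would not.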
Diagnostic queries
Warehouse checks are where dedup bugs become obvious. Platform dashboards often show the symptom after a delay, while your warehouse can usually show the root cause immediately.
Count duplicate event_id values across platform deliveries
This query shows where the same delivery identity was emitted multiple times to the same platform and event type.
select
platform,
event_name,
event_id,
count(*) as delivery_count
from platform_deliveries
where delivered_at >= current_date - interval '30 days'
group by 1, 2, 3
having count(*) > 1
order by delivery_count desc, platform, event_name;
Find events with the same order_id but different event_id
This catches the common bug where retries or multiple services mint fresh event identities for one order.
select
merchant_id,
order_id,
count(distinct event_id) as distinct_event_ids,
min(created_at) as first_seen_at,
max(created_at) as last_seen_at
from tracklayer_events
where event_name = 'Purchase'
and created_at >= current_date - interval '60 days'
group by 1, 2
having count(distinct event_id) > 1
order by last_seen_at desc;
Detect dedup_ratio trend by week
Use this to see whether deployment changes are lowering the share of deliveries that platforms mark as deduplicated or merged.
select
date_trunc('week', delivered_at) as week,
platform,
sum(case when dedup_status in ('deduplicated', 'merged') then 1 else 0 end)::decimal
/ nullif(count(*), 0) as dedup_ratio
from platform_deliveries
where delivered_at >= current_date - interval '180 days'
group by 1, 2
order by week asc, platform;
Identify ghost conversions present in Meta but missing in TrackLayer
This flags conversions reported by Meta receipts or exports that do not map back to a TrackLayer source record, which usually means an unmanaged browser tag or a broken ingestion path.
select
m.event_id,
m.event_name,
m.received_at,
m.value,
m.currency
from meta_receipts m
left join tracklayer_events t
on t.event_id = m.event_id
and t.event_name = m.event_name
where m.received_at >= current_date - interval '30 days'
and t.event_id is null
order by m.received_at desc;
TrackLayer's internal dedup pipeline
Internally, TrackLayer treats dedup as a system invariant rather than a destination option. The platform adapter is allowed to transform field names, but it is not allowed to reinterpret conversion identity.
Canonicalize the conversion envelope
TrackLayer first turns raw source events into a normalized conversion envelope. Currency, value, event_name, source timestamps, merchant scope, consent state, and available browser or click IDs are normalized before any destination payload exists.
Assign the event_id once
The system assigns a stable event identity before fan-out. That identity is stored with the conversion record and reused by every downstream dispatcher, retry worker, and warehouse export job.
Hydrate platform match context
Destination adapters then attach platform-specific identifiers such as fbp, gclid, epik, sc_click_id, rdt_uuid, or hashed customer fields. This step does not create identity. It enriches the same identity for the rules of each network.
Deliver with idempotent write paths
Outgoing deliveries are recorded under a unique delivery key that references the shared event_id. If a network call fails, TrackLayer schedules a retry that points back to the same conversion record rather than producing a new outbound event row.
Observe, reconcile, and alert
Platform responses, retry outcomes, and warehouse diagnostics feed a reconciliation layer that can flag duplicate event_id reuse, context loss, or drift between platform receipts and TrackLayer's own ledger.
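The "deliver with idempotent write paths" step can be pictured as an upsert keyed by platform and event_id. The sketch below is an illustration only: `DeliveryLedger`, `DeliveryRow`, and the `platform:eventId` key format are assumptions made for the example, not TrackLayer's actual schema.

```typescript
type DeliveryStatus = "pending" | "delivered" | "failed";

type DeliveryRow = {
  deliveryKey: string;
  eventId: string;
  platform: string;
  attempts: number;
  status: DeliveryStatus;
};

// In-memory stand-in for a table with a unique constraint on deliveryKey.
class DeliveryLedger {
  private readonly rows = new Map<string, DeliveryRow>();

  // Upserts the delivery row: a retry updates the existing row instead of adding one.
  recordAttempt(platform: string, eventId: string, succeeded: boolean): DeliveryRow {
    const deliveryKey = `${platform}:${eventId}`;
    const row: DeliveryRow =
      this.rows.get(deliveryKey) ?? { deliveryKey, eventId, platform, attempts: 0, status: "pending" };
    row.attempts += 1;
    row.status = succeeded ? "delivered" : "failed";
    this.rows.set(deliveryKey, row);
    return row;
  }

  rowCount(): number {
    return this.rows.size;
  }
}

const ledger = new DeliveryLedger();
ledger.recordAttempt("meta", "evt_1", false); // first call fails
const afterRetry = ledger.recordAttempt("meta", "evt_1", true); // retry reuses the row
ledger.recordAttempt("tiktok", "evt_1", true); // same conversion, different destination
```

After these three calls the ledger holds two rows, one per destination, and the Meta row records two attempts against a single event_id.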
Common questions
How should Stripe refund events behave?
Refunds should not reuse the purchase event_id. They are distinct economic events and need their own identity, usually derived from refund_id plus the parent charge or order. Reusing the purchase identifier makes the refund look like a replay of the original conversion.
Do partial refunds create new conversions?
They should create new refund events, not new purchase conversions. The purchase remains the original conversion. Each partial refund needs its own event_id so finance reconciliation can distinguish one refund action from another.
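One way to keep refund identities separate is to build them from a different canonical input than the purchase. The sketch below assumes hypothetical `canonicalPurchaseKey` and `canonicalRefundKey` helpers. It only trims the identifiers, on the assumption that refund IDs from the payment provider may be case-sensitive, and the resulting string would still need to be hashed or stored before being used as an event_id.

```typescript
type PurchaseRef = { merchantId: string; orderId: string };
type RefundRef = PurchaseRef & { refundId: string };

// Canonical identity input for the original purchase.
function canonicalPurchaseKey(ref: PurchaseRef): string {
  return ["Purchase", ref.merchantId.trim(), ref.orderId.trim()].join("|");
}

// Canonical identity input for one refund action; the parent order is kept
// for auditability, and refundId makes each partial refund distinct.
function canonicalRefundKey(ref: RefundRef): string {
  return ["Refund", ref.merchantId.trim(), ref.orderId.trim(), ref.refundId.trim()].join("|");
}

const order = { merchantId: "acme-eu", orderId: "ord_104928" };

const purchaseKey = canonicalPurchaseKey(order);
const firstPartialRefund = canonicalRefundKey({ ...order, refundId: "refund_a" });
const secondPartialRefund = canonicalRefundKey({ ...order, refundId: "refund_b" });
const replayedFirstRefund = canonicalRefundKey({ ...order, refundId: " refund_a " });
```

The two partial refunds get different keys, a replay of the first refund reproduces its key exactly, and neither refund key can match the purchase key.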
How should subscription renewals dedup with Meta?
Each renewal should be a new conversion identity because each renewal is a new billable event. The browser and server copies of that renewal should share one event_id, but month-one and month-two renewals should never share one.
Can deterministic IDs work for anonymous users?
Yes, as long as the canonical input is about the conversion, not the person. A cart confirmation number, checkout session ID, or payment intent ID can generate a deterministic event_id without relying on email or customer account state.
Can dedup work across browsers or devices?
Not as browser-session dedup in the strict sense. If the browser copy comes from one device and the server copy arrives from another context, the stable conversion identity still helps, but platform confidence falls when browser or click context is no longer shared.
What should happen when an order is edited after purchase?
Do not mutate the original purchase event_id. If you need to send an adjustment, use the platform's adjustment or refund semantics with a new event identity linked back to the original order reference for auditability.
Related implementation guides
Dedup explained
Start with the shallow version if you need the core browser plus server mental model before the deeper platform details.
Event Match Quality
See how hashed identifiers, browser IDs, and click context improve the odds that deduplicated events still attribute cleanly.
Identity resolution deep dive
Go deeper on cross-session identity, merge policy, graph design, and why dedup is one specialized form of identity resolution.