Why identity matters for attribution
Attribution systems do not observe a single neat person moving through a funnel. They observe fragments: a paid click on mobile, a product view on desktop, an abandoned cart in a browser with strict tracking protection, a checkout in a hosted payment flow, and a later repeat purchase from an email campaign. Without identity resolution, each fragment can look like a separate user or a separate source of truth.
A typical e-commerce visitor generates 3.2 anonymous sessions before signing up. That means a large share of high-value intent arrives before a clean account identifier exists. If your server pipeline only enriches events after login, it will understate the role of the early visits, paid clicks, product discovery, and cart activity that made the signup or purchase possible.
Identity resolution closes that gap by connecting anonymous identifiers to later deterministic identifiers. The goal is not to guess aggressively. The goal is to preserve enough browser and session context that, when the customer finally gives you a strong first-party signal, prior eligible events can be joined to the same person with an auditable explanation.
The identifier zoo
Every identifier has a different job. Platform cookies help a destination recognize browser context. User-level identifiers bridge devices and sessions. Session identifiers order events. Fingerprints can support investigation, but they should not become your primary customer key.
| Identifier | Type | Persistence | Used by | Privacy status |
|---|---|---|---|---|
| fbp | Pixel | First-party _fbp cookie, typically up to 90 days | Meta Pixel, Meta CAPI | Pseudonymous browser identifier; consent-gated in advertising use cases |
| fbc | Pixel | Derived from fbclid or _fbc cookie, often 90 days | Meta attribution and click matching | Click-derived advertising identifier; handle under ad storage rules |
| _ga | Pixel | Google Analytics client ID cookie, commonly up to 2 years | GA4, Google tags, server-side GTM | Pseudonymous analytics identifier; consent depends on region and purpose |
| _gid | Pixel | Google Analytics short-lived cookie, usually 24 hours | Universal Analytics (legacy) visitor separation; GA4 relies on _ga instead | Pseudonymous analytics identifier; short retention but still personal data in many regimes |
| _gcl_aw | Pixel | Google Ads click cookie, commonly 90 days | Google Ads conversion attribution | Advertising click identifier; consent-gated for ad measurement |
| _pin_unauth | Pixel | Pinterest unauthenticated browser cookie | Pinterest tag and Conversions API | Pseudonymous ad platform identifier; requires purpose-aware handling |
| _ttp | Pixel | TikTok browser identifier cookie, commonly 13 months | TikTok Pixel and Events API | Pseudonymous advertising identifier; forward only when policy allows |
| _scid | Pixel | Snap browser identifier cookie, commonly 13 months | Snap Pixel and Conversions API | Pseudonymous advertising identifier; tied to ad measurement consent |
| _rdt_uuid | Pixel | Reddit browser identifier cookie | Reddit Pixel and Conversions API | Pseudonymous advertising identifier; region and consent sensitive |
| _reddit_uuid2 | Pixel | Reddit browser identifier cookie | Reddit measurement and conversion matching | Pseudonymous advertising identifier; do not treat as a user account key |
| email_hash (SHA-256) | User-level | Stable while the email remains on the account | Meta, Google, TikTok, Pinterest, Snap, internal identity graph | Pseudonymous but strongly linkable; normalize, hash, consent-check, and protect |
| phone_hash | User-level | Stable while the phone number remains valid | Ad platforms and CRM matching | Pseudonymous sensitive contact signal; normalize to E.164 before hashing |
| external_id (your user_id) | User-level | Stable for the life of the account unless rotated | TrackLayer, Meta CAPI, CDPs, warehouses | Pseudonymous internal account key; delete or detach on erasure requests |
| advertising_id (iOS IDFA / Android AAID) | User-level | Device-level and resettable by the user | Mobile apps, MMPs, ad networks | Regulated device advertising ID; requires platform permission and consent |
| session_id | Session | One visit, cart, checkout, or configured inactivity window | TrackLayer, analytics, fraud checks, event ordering | Pseudonymous session key; weak identity signal by itself |
| device fingerprint (user_agent + IP + accept-language + screen) | Session | Volatile; changes with network, browser, and device settings | Fallback matching, fraud checks, bot detection | High-risk fingerprinting signal; use sparingly, disclose clearly, and avoid as sole proof |
The union-find algorithm
TrackLayer models identity as a graph of nodes and edges. A node can be an fbp cookie, a session_id, an external_id, an email_hash, or another identifier. An edge says two nodes were seen together on a trusted event. When a signup event contains session_id, external_id, and email_hash, those nodes become part of one connected component.
The plain-English version of union-find is simple: every identifier starts in its own small group. When an event proves two identifiers belong together, TrackLayer unions those groups. When later events ask who this browser or account belongs to, TrackLayer finds the group leader and returns the set of eligible identifiers connected to it. The algorithm is fast enough for streaming events, but the business rules around it matter more than the data structure.
A responsible implementation also stores why a merge happened. Deterministic edges such as email_hash and external_id can safely create durable unions. Weak edges such as IP address and user agent should have weights, short lookback windows, and deletion behavior that can unwind a mistaken association.
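The grouping and merge-log ideas above can be sketched in a few lines of Python. This is a minimal illustration, not TrackLayer's implementation: the `IdentityGraph` class, its method names, and the `type:value` identifier strings are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class IdentityGraph:
    """Union-find over identifier strings, plus a log of why each merge happened."""

    parent: dict[str, str] = field(default_factory=dict)
    merge_log: list[tuple[str, str, str]] = field(default_factory=list)

    def find(self, node: str) -> str:
        # An identifier seen for the first time starts in its own group.
        self.parent.setdefault(node, node)
        root = node
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the leader.
        while self.parent[node] != root:
            self.parent[node], node = root, self.parent[node]
        return root

    def union(self, a: str, b: str, reason: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a
            # Record the evidence so the merge can be explained or rebuilt later.
            self.merge_log.append((a, b, reason))

    def members(self, node: str) -> set[str]:
        root = self.find(node)
        return {n for n in list(self.parent) if self.find(n) == root}


# Hypothetical identifiers from one browsing session that ends in a signup.
graph = IdentityGraph()
graph.union("session_id:s1", "fbp:fb.1.123", reason="page_view")
graph.union("session_id:s1", "email_hash:ab12", reason="signup")
graph.union("email_hash:ab12", "external_id:u42", reason="signup")
```

Plain union-find cannot split a group once it has been merged. In this sketch the merge log is what makes an unwind possible: drop the mistaken entry and rebuild the groups from the remaining entries.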
Cross-device matching
Cross-device matching works best when it is honest about signal reliability. Rank your signals before writing merge rules, and make sure weak signals cannot override strong account facts.
email_hash 100%
Deterministic when normalized correctly and collected from the same person. It is the best cross-device bridge because the same customer often uses one email across desktop, mobile, and checkout.
phone_hash 95%
Very strong when formatted consistently, but household sharing, recycled numbers, country-code mistakes, and checkout autofill can create edge cases.
login_user_id 100% within the same auth system
Authoritative inside your own account system. It is perfect for stitching events produced after login, but it cannot identify pre-login visitors until a later event connects the account to earlier anonymous identifiers.
device fingerprint 60-80%
Useful as a hint when cookie or login data is missing, but it is not deterministic. Treat it as a candidate edge with strict time windows, consent checks, and collision monitoring.
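One way to encode the ranking above is a small rule that decides whether a proposed edge becomes a durable merge, stays a candidate, or is rejected. The sketch below is illustrative only: the weights mirror the percentages listed above (with the fingerprint at the low end of its range), while `MERGE_THRESHOLD`, `WEAK_LOOKBACK_SECONDS`, and `classify_edge` are hypothetical names and values, not TrackLayer settings.

```python
# Illustrative weights taken from the ranking above.
SIGNAL_WEIGHTS = {
    "email_hash": 1.00,
    "login_user_id": 1.00,
    "phone_hash": 0.95,
    "device_fingerprint": 0.60,
}
MERGE_THRESHOLD = 0.90             # hypothetical cut-off for a durable union
WEAK_LOOKBACK_SECONDS = 24 * 3600  # hypothetical time window for weak signals


def classify_edge(signal: str, age_seconds: int, consented: bool) -> str:
    """Return 'merge', 'candidate', or 'reject' for a proposed identity edge."""
    if not consented or signal not in SIGNAL_WEIGHTS:
        return "reject"
    if SIGNAL_WEIGHTS[signal] >= MERGE_THRESHOLD:
        return "merge"
    # Weak signals never merge on their own; they only survive as hints
    # inside a short lookback window.
    if age_seconds <= WEAK_LOOKBACK_SECONDS:
        return "candidate"
    return "reject"
```

Because a fingerprint can only ever return `candidate` here, it cannot override an account fact; a stronger identifier has to confirm the relationship before anything is merged.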
The anonymous → signed-up stitch
The most common identity resolution win is not exotic cross-device magic. It is the everyday journey where a user browses anonymously, shows intent, then signs up or buys later. The stitch lets your server-side events tell that story without rewriting history.
Visitor lands anonymously
The browser receives a session_id and platform cookies such as fbp or _ga. TrackLayer stores the event under an anonymous node because no account-level identifier exists yet.
Adds to cart
The cart event carries the same session_id and fbp cookie. TrackLayer now knows the anonymous node has commercial intent, but still avoids pretending it knows the person.
Later signs up
The signup event includes a normalized SHA-256 email_hash and, often, an external_id from your user table. Those identifiers create a deterministic bridge to the prior anonymous node.
Back-fill prior events
TrackLayer back-fills prior fbp-cookie events with the new email_hash for eligible destinations and time windows, preserving original event timestamps and consent state.
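The four steps above can be sketched as a pure function that returns enriched copies of earlier events. This is a simplified model under stated assumptions: the `Event` fields, the `backfill` function, and the 30-day default window are hypothetical, and a real pipeline would also check eligibility per destination.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Event:
    name: str
    timestamp: datetime
    fbp: str
    ad_consent: bool
    email_hash: Optional[str] = None


def backfill(events, signup, window=timedelta(days=30)):
    """Return copies of prior events enriched with the signup's email_hash."""
    enriched = []
    for event in events:
        eligible = (
            event.fbp == signup.fbp          # same browser cookie as the signup
            and event.email_hash is None     # still anonymous
            and event.ad_consent             # consent recorded on the original event
            and timedelta(0) <= signup.timestamp - event.timestamp <= window
        )
        # replace() builds a new Event; the timestamp and consent state are untouched.
        enriched.append(replace(event, email_hash=signup.email_hash) if eligible else event)
    return enriched


signup_time = datetime(2024, 5, 20, tzinfo=timezone.utc)
# "ab12" stands in for a real 64-character SHA-256 hex digest.
signup = Event("signup", signup_time, fbp="fb.1.111", ad_consent=True, email_hash="ab12")
history = [
    Event("add_to_cart", signup_time - timedelta(days=3), "fb.1.111", True),
    Event("page_view", signup_time - timedelta(days=45), "fb.1.111", True),
]
stitched = backfill(history, signup)
```

In this example only the add-to-cart event gains the email_hash; the 45-day-old page view falls outside the window and stays anonymous. The frozen dataclass keeps the source events immutable, which matches the goal of enriching without rewriting history.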
What TrackLayer does vs DIY
Many teams can build the first version of identity stitching in a warehouse. The hard part is keeping it operational when consent, deletion, event replay, destination payloads, and bad identifiers all interact at once.
| Area | TrackLayer | DIY |
|---|---|---|
| Identifier capture | Collects browser, session, click, hashed contact, and account identifiers into one event envelope. | Requires tag work, backend schema design, cookie handling, and platform-specific adapters. |
| Graph merging | Uses deterministic edges first, weighted weak edges second, and keeps merge history auditable. | Easy to start with SQL joins, harder to keep reversible, explainable, and fast under volume. |
| Back-fill | Replays eligible identity improvements onto prior anonymous events without changing source facts. | Usually becomes a queue, warehouse job, or custom event mutation service. |
| Consent rules | Evaluates consent per event and destination before enriching or forwarding identifiers. | Teams must wire CMP state into every event path and maintain policy logic per platform. |
| Deletion | Supports subject erasure by detaching identifiers and propagating deletes through connected nodes. | Requires graph traversal, warehouse cleanup, destination suppression, and audit logging. |
| Operations | Ships collision alerts, missing-identifier diagnostics, match-rate checks, and replay visibility. | Monitoring is often added after attribution breaks, which makes root cause analysis slower. |
Privacy + consent
Under GDPR and similar laws, identity resolution should be treated as personal data processing even when identifiers are hashed or pseudonymous. A hash makes a value harder to read; it does not remove the ability to single out, link, or match a person. The practical implication is that your graph needs purpose limitation, retention controls, access controls, and a lawful basis for each class of identifier.
TrackLayer separates pseudonymous and identified states. An anonymous browser node can exist for measurement with limited context. A signed-up state can attach email_hash, phone_hash, and external_id, but only for events and destinations whose consent and policy checks allow that enrichment. If ad personalization is denied, the graph can still support internal analytics while suppressing identifiers from advertising destinations.
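A per-destination consent check along these lines might look like the sketch below. The destination names, the consent keys, and `payload_for` are assumptions for illustration. This version strips only the signed-up identifiers; a stricter policy could also withhold browser cookies or skip the event entirely.

```python
SIGNED_UP_IDENTIFIERS = {"email_hash", "phone_hash", "external_id"}
ADVERTISING_DESTINATIONS = {"meta_capi", "tiktok_events_api"}  # hypothetical names


def payload_for(event: dict, destination: str) -> dict:
    """Build a destination payload, dropping identifiers whose purpose was denied."""
    consent = event.get("consent", {})
    if destination in ADVERTISING_DESTINATIONS:
        allowed = consent.get("ad_personalization", False)
    else:
        allowed = consent.get("analytics", False)
    # The consent record itself is never forwarded.
    payload = {k: v for k, v in event.items() if k != "consent"}
    if not allowed:
        payload = {k: v for k, v in payload.items() if k not in SIGNED_UP_IDENTIFIERS}
    return payload


event = {
    "name": "purchase",
    "email_hash": "ab12",  # placeholder for a real SHA-256 digest
    "external_id": "u42",
    "consent": {"analytics": True, "ad_personalization": False},
}
ads_payload = payload_for(event, "meta_capi")
internal_payload = payload_for(event, "warehouse")
```

Here `ads_payload` keeps only the event name, while `internal_payload` retains the hashed identifiers for internal analytics. Missing consent keys default to denied, so an event with no recorded consent is never enriched.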
Right-to-delete handling should work at graph level, not only row level. When a deletion request arrives for an email_hash or external_id, the system must find connected nodes, detach or erase eligible identifiers, prevent future rehydration from imports, and record what was removed. This is where a merge log matters: you cannot reliably delete a connected identity if you cannot explain how it was connected.
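A graph-level erasure can be sketched as a breadth-first traversal over an adjacency map. The assumptions here are loud ones: `edges`, `suppressed`, and `erase_subject` are hypothetical, and the sketch erases the whole connected component, whereas a production policy would decide which connected identifiers are actually eligible for erasure.

```python
from collections import deque
from datetime import datetime, timezone


def erase_subject(edges: dict[str, set[str]], suppressed: set[str], start: str) -> dict:
    """Erase every node connected to `start` and return an audit record."""
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first walk of the connected component
        node = queue.popleft()
        for neighbour in edges.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    for node in seen:
        edges.pop(node, None)
    # Remember erased identifiers so later imports cannot re-create them.
    suppressed.update(seen)
    return {
        "requested_for": start,
        "removed": sorted(seen),
        "erased_at": datetime.now(timezone.utc).isoformat(),
    }


edges = {
    "email_hash:a": {"session_id:s1"},
    "session_id:s1": {"email_hash:a", "fbp:x"},
    "fbp:x": {"session_id:s1"},
    "email_hash:b": {"session_id:s2"},
    "session_id:s2": {"email_hash:b"},
}
suppressed: set[str] = set()
audit = erase_subject(edges, suppressed, "email_hash:a")
```

After the call, only the unrelated `email_hash:b` component remains in `edges`, and the audit record lists exactly what was removed. An import path would consult `suppressed` before adding any identifier back.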
Troubleshooting
Collision rate too high
Check whether weak signals are being promoted to deterministic merges. Lower fingerprint weight, shorten lookback windows, and require email_hash, phone_hash, or login_user_id before permanent union.
Signup not back-filling
Confirm the signup event includes the same session_id or browser cookie seen on the anonymous events. Also check consent state, the allowed back-fill window, and whether the earlier events were already exported.
Email not being hashed client-side
Normalize and hash before the value leaves the browser when your policy requires it. Lowercase, trim, remove invisible characters, then SHA-256 the normalized value. Do not send raw email through debugging logs.
Different hashes for the same customer
Normalize consistently across client, server, checkout, and CRM import paths. Most mismatches come from uppercase letters, trailing spaces, Gmail dots, phone separators, or missing country codes.
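As a rough illustration of one shared normalization path for email, the sketch below trims, lowercases, and strips invisible characters before hashing. `normalize_email` and `sha256_hex` are hypothetical helper names. The sketch deliberately leaves Gmail dots alone, because that rule varies by destination; whichever rule you choose has to be applied identically on every path.

```python
import hashlib
import unicodedata


def normalize_email(raw: str) -> str:
    """Apply the same normalization on every path before hashing."""
    # NFKC folds compatibility forms, e.g. a non-breaking space becomes a plain space.
    text = unicodedata.normalize("NFKC", raw)
    # Drop control and format characters (Unicode categories starting with "C"),
    # which covers zero-width spaces and stray tabs or newlines.
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("C"))
    return text.strip().lower()


def sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# A checkout value with padding, capitals, and a trailing zero-width space
# hashes to the same digest as the clean CRM value.
checkout_hash = sha256_hex(normalize_email("  Jane.Doe@Example.com\u200b "))
crm_hash = sha256_hex(normalize_email("jane.doe@example.com"))
```

Running the same function in the client, server, checkout, and CRM import paths is what keeps the hashes aligned; the specific rules matter less than applying them everywhere.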
Too many anonymous users after login
Make sure login, signup, checkout, and account update events all send external_id and current session identifiers. A user ID that arrives only on Purchase leaves earlier steps disconnected.
Platform match quality drops after a consent (CMP) release
Compare consent purpose mappings before and after the CMP change. The graph may be healthy while destination forwarding now suppresses hashed identifiers for a larger share of traffic.
FAQ
Is hashed email anonymous?
No. A SHA-256-hashed email is pseudonymous, not anonymous. It is difficult to read directly, but it remains stable, linkable, and useful for matching. Treat it as personal data in privacy programs.
Should we hash fbp, fbc, _ga, or click IDs?
No. Platforms expect those identifiers in their native format. Hash contact fields like email and phone after normalization, but keep platform cookie and click identifiers unchanged unless a destination specifically says otherwise.
Can identity resolution replace the browser pixel?
Usually not. Browser tags still create click and cookie context that server-side events need later. The most reliable setup captures browser identifiers first, then enriches server events with first-party data as the user becomes known.
How far back should anonymous events be stitched?
Use a bounded window that matches your buying cycle and consent policy. Many e-commerce teams start with 7 to 30 days, then adjust based on cart latency, return visits, and collision monitoring.
What happens when two accounts share one device?
Deterministic account identifiers should split the graph. Shared devices are why fingerprints and browser cookies should not permanently merge people unless a stronger identifier confirms the relationship.
Do we need consent before building the identity graph?
You need a lawful basis and purpose-specific controls. In practice, that means storing consent with each event, limiting identifiers by purpose, and suppressing enrichment or forwarding when the user has denied the relevant purpose.