Why anomaly detection for tracking
Tracking failures rarely announce themselves in a clean, obvious way. A checkout extension can stop forwarding event IDs. A consent banner can stop attaching anonymous identifiers. A destination API key can expire. A release can remove an email field from server payloads. Orders still complete, revenue still appears in the commerce platform, and the marketing dashboard may simply show a lower number tomorrow. That delay is expensive because the team loses the moment when the failure was easiest to isolate.
Ad platforms and analytics tools do surface diagnostics, but they usually see only their own endpoint and they often report downtime indirectly. They may flag low match quality, delayed ingestion, or rejected events after the issue has already affected campaigns, attribution, and lifecycle automation. TrackLayer sits closer to the event stream. It can compare what the store generated, what was resolved, what was sent, what was acknowledged, and what was deduplicated. That makes missed conversions visible while there is still time to protect reporting and performance.
The 6 signals TrackLayer watches
The detector focuses on symptoms that map to real tracking failures. A single metric can be misleading, so TrackLayer keeps the raw signal, the baseline expectation, the anomaly score, and the surrounding event context together. That lets the alert say what changed, where it changed, and why the system thinks it matters.
| Signal | Detection method | Typical threshold | Example alert |
|---|---|---|---|
| event_volume | Compare observed event count per event type and destination against the seasonal baseline for the same day-of-week and hour window. | warning at abs(z) >= 2.5, critical at abs(z) >= 3.5 | purchase volume is 48 percent below expected for Meta CAPI during the last 30 minutes. |
| match_quality_avg | Track the rolling average of destination match quality scores and identity fields included in outbound payloads. | drop of 15 percent or z <= -3 | Meta match quality fell from 8.1 to 6.4 after email hashing changed. |
| delivery_failure_rate | Measure rejected, throttled, timed out, and permanently failed delivery attempts as a share of total attempts. | warning above 2 percent, critical above 8 percent | TikTok Events API failures reached 11.7 percent with repeated 401 responses. |
| dedup_ratio | Compare the events a destination accepts as deduplicated against browser plus server event pairs matched by event_id within order scope. | change of 25 percent from baseline | purchase dedup ratio collapsed after the browser event_id format changed. |
| identity_resolution_rate | Measure how often raw events resolve to a customer, visitor, cart, order, email, phone, or durable external ID. | drop of 10 percentage points or z <= -3 | checkout identity resolution dropped when the consent banner stopped passing anonymous_id. |
| platform_p95_latency | Calculate destination delivery latency from TrackLayer receipt time to platform acknowledgement at the 95th percentile. | warning above 60 seconds, critical above 180 seconds | Meta CAPI p95 latency spiked to 4.6 minutes during peak sale traffic. |
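The three signals with separate warning and critical defaults can be encoded as a small lookup. The sketch below is illustrative only: the `THRESHOLDS` table, the `TwoTier` type, and the `severity` function are hypothetical names, and it checks one value against fixed defaults, whereas the detector described here also weighs baselines, confidence, and smoothing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TwoTier:
    warning: float
    critical: float


# Hypothetical encoding of the two-tier defaults from the table above.
# event_volume is compared on abs(z); failure rate is a percentage; latency is seconds.
THRESHOLDS = {
    "event_volume": TwoTier(warning=2.5, critical=3.5),             # inclusive (>=)
    "delivery_failure_rate": TwoTier(warning=2.0, critical=8.0),    # strict (above)
    "platform_p95_latency": TwoTier(warning=60.0, critical=180.0),  # strict (above)
}


def severity(signal: str, value: float) -> Optional[str]:
    """Return 'critical', 'warning', or None for one observed value."""
    t = THRESHOLDS[signal]
    if signal == "event_volume":
        # Volume can break in either direction, so compare the magnitude of z.
        magnitude = abs(value)
        if magnitude >= t.critical:
            return "critical"
        if magnitude >= t.warning:
            return "warning"
        return None
    # Failure rate and latency thresholds read "above", so use strict comparisons.
    if value > t.critical:
        return "critical"
    if value > t.warning:
        return "warning"
    return None
```

With these defaults, the TikTok example (11.7 percent failures) and the Meta latency example (276 seconds) both classify as critical, while a volume z-score of -3.0 is only a warning.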
Z-score + exponential smoothing explained
A z-score is a plain way to ask how unusual the current value is compared with normal behavior. If TrackLayer expected 1,000 purchase events in a window, usually sees a standard deviation of 80, and observes only 720, the current window is 3.5 standard deviations below expected. That does not automatically prove a tracking failure, but it is strong evidence that the event stream no longer looks normal for that merchant, event, destination, and time of week.
z = (observed_value - expected_baseline) / baseline_standard_deviation
smoothed_score_t = alpha * z_t + (1 - alpha) * smoothed_score_t_minus_1
Exponential smoothing keeps that score from overreacting to one noisy bucket. The newest z-score gets weight alpha, while the previous smoothed score carries the remaining weight of (1 - alpha). In practice, this means a sustained failure escalates quickly, while a brief platform wobble can settle back to normal without waking the team.
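The two formulas above translate directly into code. This is a minimal sketch: the function names are hypothetical, and alpha = 0.4 and a starting score of 0 are assumed values chosen for illustration, since the document does not state which alpha TrackLayer uses.

```python
from typing import Iterable, List


def z_score(observed: float, expected: float, std_dev: float) -> float:
    """How many baseline standard deviations the observation sits from expected."""
    if std_dev <= 0:
        raise ValueError("baseline standard deviation must be positive")
    return (observed - expected) / std_dev


def smooth(z_values: Iterable[float], alpha: float = 0.4, initial: float = 0.0) -> List[float]:
    """Apply smoothed_score_t = alpha * z_t + (1 - alpha) * smoothed_score_t_minus_1."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    scores = []
    s = initial  # assumed starting point: no prior evidence of an anomaly
    for z in z_values:
        s = alpha * z + (1 - alpha) * s
        scores.append(s)
    return scores


# The worked example from the text: 720 observed vs 1,000 expected, sd 80.
z = z_score(720, 1000, 80)          # -3.5
one_bad_bucket = smooth([z, 0.0, 0.0])
sustained_drop = smooth([z, z, z])
```

With these assumptions a single bad bucket peaks at a smoothed score of -1.4 and decays toward zero, while the same z-score held for three windows reaches about -2.74, past a 2.5 warning magnitude.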
Baseline construction
TrackLayer builds a 28-day rolling baseline for each monitored combination of merchant, event type, destination, and signal. The rolling window is long enough to learn normal variation, but short enough to adapt when a store grows, changes traffic mix, or shifts campaign strategy. Recent data is not accepted blindly. Windows marked as incident periods, back-fills, known platform outages, or deployment tests are excluded so the baseline does not learn bad behavior as normal.
Seasonality is built into the comparison. A Sunday at 23:00 is compared with prior Sundays at 23:00, and a weekday lunch window is compared with similar weekday lunch windows. That day-of-week plus hour-of-day awareness matters because ecommerce traffic is not evenly distributed. Some stores have payroll-week spikes, weekend browsing patterns, morning checkout behavior, or nightly batch jobs that would look anomalous under a naive daily average. The baseline stores the expected value, variance, sample count, and confidence level, then the detector chooses severity based on both deviation and confidence.
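A seasonal baseline of this kind can be sketched as grouping clean windows by (weekday, hour) over the trailing 28 days. The sketch below is an assumption-laden illustration: the `Window` and `Baseline` types, the exclusion tag names, and `build_baselines` are hypothetical, and it stores expected value, variance, and sample count but leaves out the confidence level mentioned above.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, variance

# Hypothetical labels for windows that must not be learned as normal behavior.
EXCLUDED_TAGS = {"incident", "backfill", "platform_outage", "deployment_test"}


@dataclass
class Window:
    start: datetime
    value: float
    tags: frozenset = frozenset()


@dataclass
class Baseline:
    expected: float
    variance: float
    sample_count: int


def build_baselines(windows, now: datetime, days: int = 28):
    """Group clean windows from the last `days` days by (weekday, hour)."""
    cutoff = now - timedelta(days=days)
    buckets = defaultdict(list)
    for w in windows:
        if w.start < cutoff or w.start >= now:
            continue  # outside the rolling window
        if w.tags & EXCLUDED_TAGS:
            continue  # skip incidents, back-fills, outages, deployment tests
        # Sunday 23:00 is only ever compared with other Sundays at 23:00.
        buckets[(w.start.weekday(), w.start.hour)].append(w.value)
    return {
        key: Baseline(mean(vals), variance(vals) if len(vals) > 1 else 0.0, len(vals))
        for key, vals in buckets.items()
    }
```

Excluding flagged windows before aggregation, rather than down-weighting them afterward, keeps an incident week from widening the variance and hiding the next real failure.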
Example: a real Black Friday anomaly
During a Black Friday campaign, one merchant's Meta CAPI platform_p95_latency, which normally sat between 18 and 35 seconds, climbed to more than 4 minutes. Raw order volume was high but healthy. Delivery failures were low. The anomaly came from acknowledgement delay: TrackLayer received the purchase events on time, queued them normally, sent them to Meta, and then watched the platform take much longer than expected to confirm receipt. Because the event stream itself was intact, the alert focused on delivery latency instead of claiming purchase tracking was down.
The alert included the affected destination, the signal name, the observed p95, the expected p95, the z-score, the first detected window, and links to sample payloads. It read: critical latency anomaly for Meta CAPI purchase events, p95 276 seconds versus 29 seconds expected, sustained for 20 minutes. The team first checked whether checkout was blocked. It was not. They then reduced nonessential replay traffic, paused a historical back-fill job, and watched live purchase delivery return to normal. The incident auto-resolved once the smoothed score stayed below the warning threshold for three consecutive windows.
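The auto-resolve rule in this example, three consecutive windows below the warning threshold, amounts to a small state machine. This is a hedged sketch: `IncidentTracker` is a hypothetical name, and the default warning magnitude of 2.5 is borrowed from the event_volume row of the signal table as an assumption.

```python
class IncidentTracker:
    """Opens on a warning-level smoothed score; auto-resolves after N calm windows."""

    def __init__(self, warning_threshold: float = 2.5, resolve_after: int = 3):
        self.warning_threshold = warning_threshold  # assumed default, see lead-in
        self.resolve_after = resolve_after
        self.is_open = False
        self.calm_streak = 0

    def observe(self, smoothed_score: float) -> bool:
        """Feed one window's smoothed score; return whether the incident is open."""
        if abs(smoothed_score) >= self.warning_threshold:
            self.is_open = True
            self.calm_streak = 0  # any new breach restarts the countdown
        elif self.is_open:
            self.calm_streak += 1
            if self.calm_streak >= self.resolve_after:
                self.is_open = False  # auto-resolved; stays in the timeline
                self.calm_streak = 0
        return self.is_open


tracker = IncidentTracker()
states = [tracker.observe(s) for s in [5.0, 4.0, 2.0, 3.0, 2.0, 1.0, 0.5]]
```

The fourth window (3.0) breaches again and resets the streak, so the incident only resolves on the seventh window, after three uninterrupted calm windows.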
False positive mitigation
Minimum sample size
TrackLayer waits until a window has enough observations for the metric being evaluated. A 50 percent drop from two events to one is not the same as a 50 percent drop from 2,000 events to 1,000. Low-volume stores and low-volume event types can still be monitored, but their alerts use longer windows and less aggressive severity.
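One way to give low-volume stores longer windows and softer severity is to pick the shortest window whose expected event count clears a minimum sample size. The window ladder, the 30-event minimum, and both function names below are hypothetical values chosen for illustration, not documented TrackLayer settings.

```python
from typing import Optional, Sequence

# Hypothetical ladder of evaluation windows, in minutes.
CANDIDATE_WINDOWS_MIN = (5, 15, 30, 60, 120, 240)


def pick_window(expected_per_hour: float, min_samples: int = 30,
                candidates: Sequence[int] = CANDIDATE_WINDOWS_MIN) -> Optional[int]:
    """Shortest window whose expected event count meets min_samples, else None."""
    for minutes in candidates:
        if expected_per_hour * minutes / 60 >= min_samples:
            return minutes
    return None  # too sparse to evaluate statistically with this ladder


def max_severity(window_minutes: int) -> str:
    """Long windows imply sparse data, so cap them at warning (assumed policy)."""
    return "critical" if window_minutes <= 30 else "warning"
```

A store expecting 2,000 purchases an hour is evaluated every 5 minutes, while one expecting 20 an hour waits for a 120-minute window and can at most raise a warning.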
Seasonality adjustment
The model compares Monday at 09:00 with prior Mondays at 09:00, not with Saturday night or yesterday afternoon. That matters for B2B stores, weekend-heavy brands, and merchants with predictable daily fulfillment cycles.
Double-threshold
Warning and critical thresholds are evaluated separately. A warning gives the team a visible signal when tracking is drifting. A critical alert is reserved for a failure that is large, sustained, or likely to affect reporting, attribution, or automations quickly.
Correlated signal requirement
For noisy signals, TrackLayer looks for a second supporting symptom before escalating. A volume drop plus rising failures is stronger evidence than volume alone. A latency spike plus delayed platform acknowledgements is stronger than one slow batch.
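The correlated signal requirement can be sketched as a gate that holds a noisy critical at warning until a related signal is also anomalous. The `SUPPORTING` mapping and `escalate` function are hypothetical; which signals corroborate which is an assumption here, and only event_volume is treated as noisy in this sketch.

```python
from typing import Dict, Set

# Hypothetical map from a noisy signal to the signals that corroborate it.
SUPPORTING: Dict[str, Set[str]] = {
    "event_volume": {"delivery_failure_rate", "dedup_ratio", "identity_resolution_rate"},
}


def escalate(signal: str, proposed: str, active_anomalies: Set[str]) -> str:
    """Keep a noisy signal at warning until a supporting symptom is also active."""
    if proposed != "critical" or signal not in SUPPORTING:
        return proposed  # non-noisy signals and lower severities pass through
    corroborating = SUPPORTING[signal] & (active_anomalies - {signal})
    return "critical" if corroborating else "warning"
```

Holding the alert at warning instead of dropping it keeps the drift visible while the detector waits for corroborating evidence.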
Custom rules
Built-in detection covers the common failure modes, but merchants often know which signals deserve stricter handling. In /alerts-rules, a team can define custom rules for a destination, event type, severity, comparison window, sample size, and notification channel. A merchant might create a critical rule when purchase delivery_failure_rate exceeds 3 percent for Meta, a warning when match_quality_avg falls below 7.0 for new customers, or an info alert when platform_p95_latency stays above 90 seconds during a launch.
Custom rules are evaluated alongside the statistical detector. They are useful for contractual requirements, internal reporting deadlines, agency workflows, and campaign-specific monitoring. The rule editor shows the recent baseline and estimated alert frequency before saving, so teams can see whether a rule is likely to be useful or noisy.
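A custom rule and a rough alert-frequency estimate might look like the sketch below. The `CustomRule` fields mirror the options listed for /alerts-rules, but the field names, the `fires` method, and `estimated_alerts_per_day` are hypothetical; this is not the rule editor's actual data model.

```python
import operator
from dataclasses import dataclass
from typing import Sequence, Tuple

OPS = {">": operator.gt, "<": operator.lt}


@dataclass(frozen=True)
class CustomRule:
    destination: str
    event_type: str
    signal: str
    op: str              # ">" or "<"
    threshold: float
    severity: str
    window_minutes: int
    min_samples: int
    channel: str

    def fires(self, value: float, sample_count: int) -> bool:
        if sample_count < self.min_samples:
            return False  # too few observations to trust this window
        return OPS[self.op](value, self.threshold)


def estimated_alerts_per_day(rule: CustomRule, history: Sequence[Tuple[float, int]]) -> float:
    """history: (value, sample_count) for consecutive windows of rule.window_minutes."""
    fired = sum(1 for value, n in history if rule.fires(value, n))
    days = len(history) * rule.window_minutes / (60 * 24)
    return fired / days if days else 0.0


rule = CustomRule("meta", "purchase", "delivery_failure_rate", ">", 3.0,
                  "critical", 5, 50, "pager")
# One day of 5-minute windows; the last window is too sparse to count.
history = [(0.5, 200)] * 285 + [(4.0, 200), (5.0, 200), (9.0, 10)]
```

Here two of the 288 windows fire, so the estimate is 2 alerts per day. Counting every firing window overstates noise when adjacent windows belong to one incident, so a real estimate would likely group consecutive firings.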
Alert fatigue
More alerts do not create better tracking. They usually create slower response, because the team learns that most notifications are not worth immediate attention. TrackLayer is designed around fewer, higher-confidence alerts with clear severity. The system should explain what changed, why it matters, what evidence supports the conclusion, and whether the issue is still active.
Severity levels, from least to most urgent: auto-resolved, info, warning, critical, page.
Auto-resolved alerts stay in the incident timeline without interrupting anyone. Info alerts document unusual but low-risk movement. Warning alerts ask for investigation during working time. Critical alerts indicate likely data loss, attribution damage, automation impact, or destination outage. Page is reserved for severe live failures where delay would cost meaningful conversion data.
Common questions
Does TrackLayer require a fixed threshold for every store?
No. Defaults exist so merchants get coverage quickly, but the baseline is store-specific and destination-specific. A high-volume apparel store and a niche B2B catalog should not share the same event count threshold.
Can anomaly detection catch a broken checkout pixel?
Yes, if the break changes a watched signal. The most common pattern is a purchase volume drop, dedup ratio shift, or identity resolution drop shortly after a frontend release, checkout app update, or consent banner change.
How fast do alerts fire?
Most production alerts evaluate on rolling windows between 5 and 30 minutes. Critical delivery failures can surface faster, while low-volume stores may need longer windows to avoid noisy conclusions.
What happens when traffic legitimately changes?
TrackLayer does not assume every deviation is bad. Seasonality, sample size, campaign metadata, and correlated signals help distinguish a real tracking issue from a product launch, sale, holiday, or paid media budget change.
Can custom rules override built-in detection?
Custom rules can add stricter or merchant-specific checks, but they do not remove core health monitoring. A merchant might page on purchase delivery failures above 3 percent while leaving the standard TrackLayer critical rule at 8 percent.
Related implementation guides
Deduplication explained
How TrackLayer keeps one user action as one business event across browser tracking, server events, retries, and platform APIs.
Identity resolution guide
A technical walkthrough of visitor, customer, cart, order, email, phone, and external ID resolution for ecommerce tracking.
Meta CAPI setup guide
Configure Meta Conversions API with event IDs, user_data, consent handling, diagnostics, and server-side delivery checks.