§ 05 · Concepts
Alerts
What each alert type means, when it fires, and what to do about it.
TrackLayer watches your pipeline and fires an alert when one of four specific things goes wrong. You get an email and an in-dashboard notification. Here's what each type means.
Alerts are meant to answer a practical question fast: is this a platform credential issue, a temporary queue slowdown, a tracking outage, or a delivery-quality regression? Each alert is tied to a concrete threshold so you are not guessing whether a warning matters. When one fires, start with the alert type, then inspect the specific surface it points to.
An alert does not mean every event is broken. It means TrackLayer detected a pattern that is unusual enough to need human attention. In many cases the fastest path is to confirm whether the issue is isolated to one destination, one environment, or one recent deploy.
Alert types
platform_down
Meta / TikTok / Google Ads is rejecting your deliveries.
Fires when: 5 consecutive failed deliveries within 60 minutes on a single platform.
Usually caused by: expired access token, rate limit, ad account disabled, platform outage.
What to do: Go to Dashboard /platforms, click the affected platform, then run Test send. Read the error message before changing anything. The most common fix is regenerating the access token, but do not assume that is always the issue. If the message mentions account state, permissions, or rate limiting, fix that specific platform-side problem before retrying deliveries.
This alert is platform-scoped on purpose. A Meta token can expire while TikTok and Google Ads continue working normally. When you see platform_down, focus on the failing connector instead of treating it like a full-pipeline outage.
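The firing rule can be sketched in a few lines. This is an illustration only, not TrackLayer's implementation: the function name, the (timestamp, succeeded) input shape, and the reading of "consecutive" as an unbroken run of failures ending at the most recent delivery are all assumptions.

```python
from datetime import datetime, timedelta

def platform_down(deliveries, now, streak=5, window=timedelta(minutes=60)):
    """Hypothetical check for ONE platform.

    deliveries: list of (timestamp, succeeded) tuples, oldest first.
    """
    # Keep only the deliveries that fall inside the 60-minute window.
    recent = [ok for ts, ok in deliveries if now - ts <= window]
    # Count the unbroken run of failures at the end of the window.
    run = 0
    for ok in reversed(recent):
        if ok:
            break
        run += 1
    return run >= streak
```

Because the input is per platform, a run of Meta failures never counts toward a TikTok or Google Ads alert, which matches the platform-scoped behavior described above.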
queue_backlog
Events are piling up in the queue.
Fires when: 100+ events in queued status older than 10 minutes.
Usually caused by: a downstream platform throttling us, our consumer worker being slow.
What to do: This usually self-recovers within 15 minutes, especially if the slowdown came from a temporary platform throttle. If it does not clear, check /dlq for patterns. Look for repeated failures on the same platform, event type, or validation issue. A backlog is often a symptom rather than the root cause, so the question is whether one failing destination is causing retries to stack up.
Do not panic if you see this briefly during a traffic spike. The signal that matters is persistence. A short queue backlog during a product launch or campaign burst can be normal; a backlog that stays flat or grows is not.
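As a rough sketch of the threshold, assuming each event carries a status and a queued-at timestamp (the field names `status` and `queued_at` are hypothetical, not TrackLayer's schema):

```python
from datetime import datetime, timedelta

def queue_backlog(events, now, min_count=100, min_age=timedelta(minutes=10)):
    """Hypothetical check: are 100+ events queued for longer than 10 minutes?"""
    stale = sum(
        1
        for e in events
        # Only events still in the queue AND older than the age cutoff count.
        if e["status"] == "queued" and now - e["queued_at"] > min_age
    )
    return stale >= min_count
```

The age cutoff is what separates a brief spike from a persistent backlog: a burst of freshly queued events does not count until those events have waited more than 10 minutes.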
tracking_stopped
You haven't sent events in 24 hours.
Fires when: zero events in the last 24 hours after at least one event in the 24 hours before that. Requiring prior activity keeps new installs from false-firing.
Usually caused by: your backend deploy stopped calling our SDK, API key expired or was revoked, consent mode blocking.
What to do: Check /audit-log for recent API key changes first. Then run /tools/tracking-check on your main domain and verify your backend SDK is still initialized in the runtime that should send events. This alert usually means something changed recently: a deploy, an environment variable rotation, a key revoke, or a consent implementation update that now blocks collection in paths that used to send.
This alert is designed to catch silent failures. It is common for tracking to stop after a backend refactor, worker redeploy, or secret rotation because the rest of the application still looks healthy. If no events are arriving at all, start upstream and verify the SDK is still being called before you spend time debugging downstream deliveries.
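The two-window condition can be sketched as follows. The function name and the input (a plain list of event timestamps) are assumptions made for illustration.

```python
from datetime import datetime, timedelta

def tracking_stopped(event_times, now, window=timedelta(hours=24)):
    """Hypothetical check: silence now, but activity in the window before it."""
    # Events received in the most recent 24 hours.
    last = sum(1 for t in event_times if now - window < t <= now)
    # Events received in the 24 hours before that.
    prior = sum(1 for t in event_times if now - 2 * window < t <= now - window)
    # A merchant with no events in either window never fires.
    return last == 0 and prior >= 1
```

An account whose last event arrived 30 hours ago fires, while a brand-new account with no events in either window does not.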
delivery_rate_drop
Success rate dropped below 95%.
Fires when: 1h rolling delivery success rate drops more than 10 percentage points versus the prior 24h baseline.
Usually caused by: platform-specific regressions, bad user_data quality dragging down specific platforms.
What to do: Go to /events, filter by status=failed, and look for a pattern in user_data. Missing email, missing fbp, malformed phone numbers, or a sudden field-format change are common causes. Fix the issue at the source rather than replaying broken events over and over.
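If you export the failed events, a small tally can make the pattern easier to see. This is a sketch under assumptions: the export is taken to be a list of dicts with a `user_data` object, and the `phone` key is a hypothetical field name (`email` and `fbp` are the fields named above).

```python
from collections import Counter

def missing_field_counts(failed_events, fields=("email", "fbp", "phone")):
    """Count how many failed events lack each user_data field."""
    counts = Counter()
    for event in failed_events:
        user_data = event.get("user_data") or {}
        for field in fields:
            # Treat absent, None, and empty-string values as missing.
            if not user_data.get(field):
                counts[field] += 1
    return counts
```

If one field dominates the counts, the regression is probably at the point where that field is collected or mapped, which is where the fix belongs.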
This alert is different from platform_down. The platform may still be up, but your payload quality or mapping may have regressed enough that acceptance dropped. That is why the comparison is against your own recent baseline instead of a fixed global threshold alone.
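The baseline comparison is measured in percentage points, not percent. A minimal sketch, with hypothetical parameter names and success and total counts as inputs:

```python
def delivery_rate_drop(recent_ok, recent_total, base_ok, base_total, max_drop_pts=10.0):
    """Hypothetical check: 1h success rate vs. the prior 24h baseline."""
    if recent_total == 0 or base_total == 0:
        # No deliveries means there is no rate to compare.
        return False
    recent = 100.0 * recent_ok / recent_total
    baseline = 100.0 * base_ok / base_total
    # Compare the absolute gap in percentage points.
    return baseline - recent > max_drop_pts
```

With a 98% baseline, a last-hour rate of 85% is a 13-point drop and fires, while 90% is an 8-point drop and does not.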
De-duplication
TrackLayer sends one alert per type per merchant in a 12-hour window. If the issue keeps happening, you get re-alerted after 12 hours pass.
This keeps noisy incidents from generating dozens of identical notifications while still making sure long-running problems come back to the surface. If you fixed the issue and the alert fires again after the window resets, treat that as a fresh signal that the underlying condition returned.
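One way to picture the window is a last-sent timestamp kept per merchant and alert type. The class and method names below are hypothetical, and the sketch assumes the window is measured from the last alert that was actually sent.

```python
from datetime import datetime, timedelta

class AlertDeduper:
    """Hypothetical de-duplicator: one alert per (merchant, type) per window."""

    def __init__(self, window=timedelta(hours=12)):
        self.window = window
        self._last_sent = {}  # (merchant_id, alert_type) -> datetime of last send

    def should_send(self, merchant_id, alert_type, now):
        key = (merchant_id, alert_type)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window:
            # Still inside the window: suppress the repeat.
            return False
        self._last_sent[key] = now
        return True
```

Keying on both merchant and alert type means a suppressed queue_backlog repeat never hides a new platform_down for the same merchant.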
Delivery
Alerts are delivered to:
- Email, sent to merchant.email and any admin team members
- Dashboard /notifications
- Optional Slack webhook, configured in /settings > Webhooks
Email is meant for visibility when no one is actively watching the dashboard. The dashboard notification center is the system of record for current alert history. Slack is useful when a merchant or agency team already works out of a shared incident channel and wants TrackLayer issues in the same stream as deploy or billing notifications.
Turning them off
Go to /settings > Notifications, open Alert preferences, and toggle each alert type individually.
You cannot disable platform_down and still qualify for the support SLA. That alert is how we detect that you likely need intervention before a platform outage turns into a longer attribution gap.
If you want less noise, tune preferences around informational alerts first and keep the ones that detect hard delivery failures or total tracking loss. The goal is not zero alerts. The goal is getting the right alert early enough to fix the problem before revenue reporting drifts.
Alert still confusing? -> File a ticket