In all four cases, the HTTP 200s look identical to last week. The ad platform sees degraded signal and optimizes against it. By the time ops notices, the learning phase has absorbed three days of budget against a broken contract.
Impression waste. When Meta CAPI returns 429 or 500, the pixel in the checkout still fires. The user completes the purchase, the browser sends the event, TrackLayer queues it, and the destination rejects it. But the ad that drove that click already won the auction and served the impression. The budget is spent on a conversion the platform never records.
Over a 6-hour API outage on a $50K/month Meta account, that's roughly $1,200 of wasted spend on broken signal. Multiply across seven platforms and it adds up fast.
Optimization decay. Ad platforms learn from the signal they receive. If half of your purchase events fail silently, the model assumes your conversion rate is half of reality. It widens reach, lowers bid confidence, and your CPA drifts up.
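As a rough illustration of that decay, the sketch below compares the true conversion rate and CPA with what the platform can observe when only a fraction of purchase events is delivered. The inputs (spend, clicks, purchases) are made-up numbers, and observed_metrics is a hypothetical helper, not part of TrackLayer.

```python
def observed_metrics(spend, clicks, purchases, delivery_success_rate):
    """Compare true conversion metrics with what the ad platform can observe."""
    # Only successfully delivered events ever reach the platform's model.
    delivered = purchases * delivery_success_rate
    return {
        "true_cvr": purchases / clicks,
        "observed_cvr": delivered / clicks,
        "true_cpa": spend / purchases,
        "observed_cpa": spend / delivered if delivered else float("inf"),
    }


# Hypothetical account: half of the purchase events fail silently.
metrics = observed_metrics(
    spend=10_000, clicks=5_000, purchases=200, delivery_success_rate=0.5
)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these inputs the true CPA is $50, but the platform sees $100, because it only ever counts the 100 events that arrived.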
The fix is not increasing budget. The fix is a system that stops sending when the pipe breaks and resumes when it heals.
id      | merchant_id | platform | action        | error_rate | reason                  | created_at
--------+-------------+----------+---------------+------------+-------------------------+--------------------
ap_8f2c | northfield  | tiktok   | pause         | 31.2%      | rate_threshold_exceeded | 2026-04-23 14:01:58
ap_8f2d | northfield  | tiktok   | auto_resume   | 2.8%       | auto_clear_4h_window    | 2026-04-23 18:01:58
ap_8f3e | glasshouse  | meta     | pause         | 28.7%      | rate_threshold_exceeded | 2026-04-23 16:42:11
ap_8f3f | glasshouse  | meta     | manual_resume | —          | ops_intervention        | 2026-04-23 16:45:02
ap_8f4a | halcyon     | klaviyo  | pause         | 41.3%      | rate_threshold_exceeded | 2026-04-22 09:14:33
# resume a paused destination manually
curl -X POST \
  -H "Authorization: Bearer $TRACKLAYER_API_KEY" \
  -H "Content-Type: application/json" \
  https://tracklayer-api.sublime.workers.dev/v1/destinations/tiktok_abc123/resume \
  -d '{
    "reason": "Platform confirmed healthy via support ticket #8921"
  }'

# response
{
  "destination_id": "tiktok_abc123",
  "state": "active",
  "resumed_at": "2026-04-23T14:45:02Z",
  "resumed_by": "api",
  "previous_pause_reason": "error_rate_31.2%",
  "paused_at": "2026-04-23T14:01:58Z"
}

The resume action is logged in the auto_pause_events table with action = 'manual_resume' and reason captured. Dashboard shows the full pause/resume timeline per platform.
event_id | event_type         | resource_id      | actor         | prev_hash    | hash                            | created_at
---------+--------------------+------------------+---------------+--------------+---------------------------------+--------------------
ae_9a2f1 | destination_pause  | tiktok_abc123    | system_worker | h8f3a2c...d1 | h9b4e3d...f2 (sha256 prev||mat) | 2026-04-23 14:01:58
ae_9a2f2 | alert_sent         | northfield_merch | system_worker | h9b4e3d...f2 | ha1c5e9...b3                    | 2026-04-23 14:02:01
ae_9b3e1 | destination_resume | tiktok_abc123    | api_user      | ha1c5e9...b3 | hc2d6f0...a4                    | 2026-04-23 14:45:02
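The prev_hash and hash columns suggest a standard hash chain: each row's hash covers the previous row's hash plus the row's own contents, so editing any past row invalidates every hash after it. A minimal sketch of that idea follows. The canonical JSON serialization, the GENESIS value, and the function names are assumptions made for illustration; the page only states that the hash is a SHA-256 over the previous hash concatenated with the row material.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first row's prev_hash


def chain_hash(prev_hash: str, material: dict) -> str:
    """sha256(prev_hash || material), with material serialized as canonical JSON."""
    body = json.dumps(material, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(prev_hash.encode() + body.encode()).hexdigest()


def append_event(log: list, material: dict) -> dict:
    """Append an event, linking it to the hash of the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    entry = {**material, "prev_hash": prev, "hash": chain_hash(prev, material)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash in order; any edited or reordered row breaks the chain."""
    prev = GENESIS
    for entry in log:
        material = {k: v for k, v in entry.items() if k not in ("prev_hash", "hash")}
        if entry["prev_hash"] != prev or entry["hash"] != chain_hash(prev, material):
            return False
        prev = entry["hash"]
    return True


log = []
append_event(log, {"event_type": "destination_pause", "resource_id": "tiktok_abc123", "actor": "system_worker"})
append_event(log, {"event_type": "alert_sent", "resource_id": "northfield_merch", "actor": "system_worker"})
append_event(log, {"event_type": "destination_resume", "resource_id": "tiktok_abc123", "actor": "api_user"})
print(verify_chain(log))
```

Verification only needs the rows themselves, so an auditor can replay the chain without trusting the service that wrote it.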
Samples last 1,000 delivery attempts per platform every 5 minutes.
Each platform has a default error-rate threshold that can be tuned. State flips on a sustained breach.
Resumes automatically when the error rate drops below the threshold, or manually through ops intervention.
Every pause/resume written to audit_events with hash chain.
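The four behaviors above can be sketched as a small state machine over a rolling window of delivery attempts. This is an illustrative sketch, not TrackLayer's implementation: the class name, the 25% default threshold, the reading of "sustained" as two consecutive samples over the threshold, and the rate_below_threshold reason string are all assumptions. It also assumes something keeps recording attempts (for example, probe deliveries) while a destination is paused, so the error rate has a way to recover.

```python
from collections import deque


class DestinationMonitor:
    """Rolling-window error monitor that pauses and resumes one destination."""

    def __init__(self, threshold=0.25, window_size=1000, breaches_to_pause=2):
        self.threshold = threshold                 # assumed default, tunable per platform
        self.attempts = deque(maxlen=window_size)  # True = delivered, False = failed
        self.breaches_to_pause = breaches_to_pause
        self.consecutive_breaches = 0
        self.state = "active"
        self.events = []                           # stand-in for the audit table

    def record(self, ok: bool) -> None:
        self.attempts.append(ok)

    def error_rate(self) -> float:
        if not self.attempts:
            return 0.0
        return self.attempts.count(False) / len(self.attempts)

    def _transition(self, state, action, rate, reason):
        self.state = state
        self.consecutive_breaches = 0
        self.events.append({"action": action, "error_rate": rate, "reason": reason})

    def sample(self) -> str:
        """Run once per sampling interval (every 5 minutes in the text above)."""
        rate = self.error_rate()
        if self.state == "active":
            self.consecutive_breaches = self.consecutive_breaches + 1 if rate > self.threshold else 0
            if self.consecutive_breaches >= self.breaches_to_pause:
                self._transition("paused", "pause", rate, "rate_threshold_exceeded")
        elif rate < self.threshold:
            self._transition("active", "auto_resume", rate, "rate_below_threshold")
        return self.state

    def manual_resume(self, reason: str) -> None:
        if self.state == "paused":
            self._transition("active", "manual_resume", None, reason)


monitor = DestinationMonitor(threshold=0.25, window_size=10)
for ok in [True] * 7 + [False] * 3:
    monitor.record(ok)
monitor.sample()          # first breach: still active
print(monitor.sample())   # second consecutive breach: pauses
```

Requiring more than one breached sample before pausing trades a few minutes of detection latency for fewer false pauses on short error spikes.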