Engineering | 28 Apr 2026 | 7 min read | Alex Rivera, Principal Engineer

Why we chose Cloudflare Workers for ingestion

cloudflare, edge, architecture

Pick an ingestion primitive and you inherit a culture: deploy cadence, how you think about cold starts, and what "regional" really means for your customers. We evaluated managed Kubernetes, single-region containers, and Cloudflare Workers for the first hop that accepts browser beacons and server posts. Workers won—not because they are trendy, but because they matched how we want operators to experience TrackLayer.

Latency and fan-out shape the product

Ad platforms reward fast, honest signals. Our edge tier validates signatures, normalizes payloads, and enqueues work with single-digit-millisecond overhead in the regions we care about. Workers let us pin logic next to eyeball networks without pretending every customer wants to manage their own ingress cluster.
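To make the "normalizes payloads" step concrete, here is a minimal sketch of what normalization at the edge might look like. The `CanonicalEvent` shape and the input field names (`event`, `ts`, `props`) are assumptions for illustration, not TrackLayer's actual schema.

```typescript
// Hypothetical canonical shape; field names are illustrative only.
interface CanonicalEvent {
  name: string;
  timestampMs: number;
  properties: Record<string, string>;
}

// Normalize a loosely typed beacon payload into the canonical shape.
// Returns null when the required event name is missing, so the caller can reject early.
function normalizePayload(raw: unknown, nowMs: number): CanonicalEvent | null {
  if (typeof raw !== "object" || raw === null) return null;
  const input = raw as Record<string, unknown>;

  const rawName = input["event"];
  const name = typeof rawName === "string" ? rawName.trim().toLowerCase() : "";
  if (name === "") return null;

  // Accept a numeric epoch in milliseconds; otherwise fall back to the edge receive time.
  const rawTs = input["ts"];
  const timestampMs = typeof rawTs === "number" && isFinite(rawTs) ? rawTs : nowMs;

  // Keep only primitive properties, stringified, so downstream consumers see one value type.
  const properties: Record<string, string> = {};
  const props = input["props"];
  if (typeof props === "object" && props !== null) {
    const bag = props as Record<string, unknown>;
    for (const key of Object.keys(bag)) {
      const value = bag[key];
      if (typeof value === "string" || typeof value === "number" || typeof value === "boolean") {
        properties[key] = String(value);
      }
    }
  }
  return { name, timestampMs, properties };
}
```

Keeping the function pure (the clock is passed in as `nowMs`) makes it cheap to unit test and easy to review, which matters when the hot path has a tight CPU budget.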

Fan-out (one canonical event becoming many guarded deliveries) is orchestrated asynchronously, but acceptance has to be snappy. A Worker that ACKs quickly and hands durable work to queues keeps browsers from timing out during flaky Wi-Fi sessions.
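The accept-then-enqueue pattern can be sketched as follows. This is a simplified illustration, not our production handler: `EventSink` is a hypothetical stand-in for a durable queue producer (a real queue binding would be asynchronous), and `MAX_BODY_CHARS` is an assumed cap.

```typescript
// Hypothetical stand-in for a durable queue producer. A real Worker would use an
// asynchronous queue binding; a synchronous interface keeps the sketch runnable anywhere.
interface EventSink {
  enqueue(message: string): void;
}

interface AckResult {
  status: number; // HTTP status to return to the caller
  accepted: boolean;
}

// Assumed cap for the fast path, measured in string length for simplicity.
const MAX_BODY_CHARS = 64 * 1024;

// Do the minimum on the request path: cheap checks, enqueue, acknowledge.
// Fan-out to destinations happens later, in whatever consumes the sink.
function acceptBeacon(body: string, sink: EventSink): AckResult {
  if (body.length === 0) return { status: 400, accepted: false };
  if (body.length > MAX_BODY_CHARS) return { status: 413, accepted: false };
  try {
    sink.enqueue(body);
  } catch {
    // If the queue is unavailable, say so rather than pretend the event is safe.
    return { status: 503, accepted: false };
  }
  return { status: 202, accepted: true };
}
```

Returning 202 rather than 200 signals that the event was accepted for processing, not that every downstream delivery has completed.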

Isolation beats noisy neighbours

Traditional multi-tenant servers share memory pools and jitter in ways that haunt on-call rotations. Workers give us per-invocation isolation with tight CPU limits; a pathological regex in one tenant's transform cannot starve unrelated traffic. Limits force us to keep hot paths small, which improves reviewability—a hidden compliance win.

Residency without copy-pasting regions

European customers demanded EU-West ingress with an explicit boundary before data touches US analytics subsystems. Cloudflare's regional deployment model mapped cleanly to how we explain sub-processors on security reviews. Containers could do the same, but would have multiplied patching and image promotion work for each residency slice.
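One way to picture the "explicit boundary" is a guard that every forwarding step must pass. The sketch below is illustrative only: the `Region` values, the `Tenant` shape, and the rule itself are assumptions, and real residency enforcement also involves where the code runs, not just a check inside it.

```typescript
type Region = "eu" | "us";

// Hypothetical tenant record; only the residency field matters here.
interface Tenant {
  id: string;
  residency: Region; // where this tenant's raw events must stay
}

// Decide whether an event accepted for `tenant` may be forwarded
// to a subsystem hosted in `destination`.
function mayForward(tenant: Tenant, destination: Region): boolean {
  // EU-resident data stays in the EU; other tenants are unrestricted in this sketch.
  if (tenant.residency === "eu") return destination === "eu";
  return true;
}

// Filter a list of candidate destinations down to the permitted ones.
function permittedDestinations(tenant: Tenant, candidates: Region[]): Region[] {
  return candidates.filter((region) => mayForward(tenant, region));
}
```

Centralizing the rule in one small function gives reviewers and auditors a single place to check.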

We still run stateful systems where they belong—queues, warehousing, heavy joins—but the first mile is Workers: fast, isolated, and region-conscious. Trade-offs remain (debugging async flows, WASM bundle discipline). For ingestion, the fit has been overwhelmingly positive—measured less in novelty and more in stable SLO graphs.

Debugging and observability

Workers force discipline in logging: we cannot attach strace and wander. Structured logs with trace IDs that carry through to downstream queues saved us. That investment pays off whenever a merchant asks why a burst of events vanished: we correlate edge acceptance timestamps with queue depths and transformer outcomes instead of guessing at packet loss.
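As a rough sketch of that correlation, assume every stage emits one JSON line per event carrying the same trace ID. The `LogRecord` fields and stage names below are hypothetical, chosen only to illustrate the join.

```typescript
// Hypothetical log record; field and stage names are illustrative.
interface LogRecord {
  traceId: string;
  stage: "edge_accept" | "queue_enqueue" | "transform";
  tenantId: string;
  outcome: "ok" | "rejected" | "error";
  atMs: number;
}

// Serialize one record as a single JSON line so log tooling can parse it.
function formatLog(record: LogRecord): string {
  return JSON.stringify(record);
}

// Return trace IDs that were accepted at the edge but never reached the
// transform stage: the candidates for "events that vanished".
function findVanished(lines: string[]): string[] {
  const stagesByTrace: Record<string, string[]> = {};
  for (const line of lines) {
    const record = JSON.parse(line) as LogRecord;
    const stages = stagesByTrace[record.traceId] || [];
    stages.push(record.stage);
    stagesByTrace[record.traceId] = stages;
  }
  return Object.keys(stagesByTrace).filter((traceId) => {
    const stages = stagesByTrace[traceId] || [];
    return stages.indexOf("edge_accept") !== -1 && stages.indexOf("transform") === -1;
  });
}
```

In practice this join would run in a log store rather than in application code, but the shape of the question is the same.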

Local emulation has improved dramatically over the past few years, yet staging still validates against Workers preview URLs with synthetic load shaped like production spikes (launch traffic, clumsy crawler loops). Surprise findings go into regression harnesses, especially around maximum header sizes and chunked uploads from older mobile SDK versions.

When Workers would not be enough

Huge binary payloads—batch historical replays exceeding edge payload limits—bypass Workers and upload directly to resilient object storage APIs with signed URLs. We are opinionated about not forcing square pegs through the Workers path.
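The routing decision itself is simple enough to sketch. The threshold below is a placeholder, not a real platform limit, and the path names are hypothetical.

```typescript
type IngestPath = "worker" | "signed_url_upload";

// Placeholder threshold; the real edge payload limit depends on plan and configuration.
const EDGE_PAYLOAD_LIMIT_BYTES = 10 * 1024 * 1024;

// Pick the ingestion path from the declared payload size.
function chooseIngestPath(
  contentLengthBytes: number,
  limitBytes: number = EDGE_PAYLOAD_LIMIT_BYTES
): IngestPath {
  if (!isFinite(contentLengthBytes) || contentLengthBytes < 0) {
    throw new RangeError("contentLengthBytes must be a non-negative finite number");
  }
  // Anything over the limit skips the Worker and goes straight to object storage.
  return contentLengthBytes > limitBytes ? "signed_url_upload" : "worker";
}
```

A client would call something like this before sending, then request a signed URL only when the answer is `signed_url_upload`.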

Long-running aggregates—anything needing minutes of CPU—belong offline. Keeping that boundary crisp prevents organisational myths that “everything is realtime” simply because ingestion ACKs fast.

Choosing Cloudflare Workers was pragmatic: match product promises (regional ingestion, dependable latency) with an execution model junior engineers can reason about. If your platform skews OLAP-heavy, maybe Workers only front a bigger fleet—and that composition is healthy too.

Cost, caps, and capacity planning

Workers billing rewards tight code paths: we watch invocations, CPU time, and subrequest counts per route. Spikes correlate with merchant misconfiguration more often than with attacks—still, rate limits protect everyone. Forecasting combines historical event shapes with projected campaign launches so we widen quotas before catalogs drop, not during.
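A back-of-the-envelope version of that forecast might look like the sketch below. The unit prices are parameters precisely because the numbers are placeholders, not Cloudflare's actual rates, and subrequests are tracked but not priced here.

```typescript
// Usage observed for one route over a billing period.
interface RouteUsage {
  route: string;
  invocations: number;
  cpuMs: number;
  subrequests: number;
}

// Placeholder unit prices supplied by the caller; NOT real published rates.
interface UnitPrices {
  perMillionInvocations: number;
  perMillionCpuMs: number;
}

// Estimate cost from invocations and CPU time only.
function estimateCost(usage: RouteUsage, prices: UnitPrices): number {
  return (
    (usage.invocations / 1e6) * prices.perMillionInvocations +
    (usage.cpuMs / 1e6) * prices.perMillionCpuMs
  );
}

// Scale historical usage by an expected campaign multiplier to get a forecast.
function forecastUsage(usage: RouteUsage, multiplier: number): RouteUsage {
  return {
    route: usage.route,
    invocations: usage.invocations * multiplier,
    cpuMs: usage.cpuMs * multiplier,
    subrequests: usage.subrequests * multiplier,
  };
}
```

Running `estimateCost(forecastUsage(lastMonth, 3), prices)` before a launch gives a number to compare against current quotas while there is still time to widen them.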

Runbooks distinguish customer-visible degradation from internal back-pressure signals. Alerts page on-call when acceptance latency crosses SLO budgets even if error rates look flat—silent slowdowns wreck trust.
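The "page on latency even when errors look flat" rule can be expressed as a small predicate. This is a sketch under assumptions: the choice of p99, the nearest-rank percentile method, and the policy fields are illustrative, not our actual alerting configuration.

```typescript
// Nearest-rank percentile over a window of latency samples (milliseconds).
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new RangeError("no samples");
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index] as number;
}

// Hypothetical shapes for one evaluation window and its alert policy.
interface WindowStats {
  latenciesMs: number[];
  errorRate: number; // fraction of requests that failed, 0..1
}

interface AlertPolicy {
  p99BudgetMs: number;
  maxErrorRate: number;
}

// Page when either signal breaches its budget; a slow window pages
// even if the error rate is zero.
function shouldPage(stats: WindowStats, policy: AlertPolicy): boolean {
  const slow = percentile(stats.latenciesMs, 99) > policy.p99BudgetMs;
  const failing = stats.errorRate > policy.maxErrorRate;
  return slow || failing;
}
```

Evaluating latency and error rate independently is the point: a combined score would let a healthy error rate mask a silent slowdown.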

That operational clarity—knowing what spikes cost and why—helps us price fairly and justify EU-only footprints without mystical margin math. Workers are not free, but they are predictable in the dimensions we actually control.
