GUIDE · A/B TESTING · 9 min read

Server-side experiment tracking + A/B tests: the practical guide

A field guide for teams that want reliable experiment exposure and conversion data when the browser cannot be trusted as the source of truth. Use TrackLayer to bridge assignment systems, deduplicate users, and produce experiment readouts your product and analytics teams can defend.

Problem

Why client-side experiments miss 30% of conversions

Client-side experimentation feels simple because the browser sees the variant first. The problem is that the browser rarely sees the full customer journey anymore. iOS privacy changes reduced durable browser identity, in-app browsers strip context, Safari limits storage windows, and ad blockers routinely suppress analytics or experiment callbacks. A user can be cleanly assigned into variant B, continue to checkout, and still disappear from the experiment log because the exposure ping, identity cookie, or conversion callback never makes it through. The product team sees the UI change. The measurement layer sees fragments.

That gap widens when the conversion itself is server-confirmed. Orders, subscriptions, approvals, and qualified leads often finalize in backend systems long after the frontend session has drifted away. If your experiment relies on client-only event capture, you will undercount the very conversions that matter most, and the missing users are not random. Privacy-heavy devices and traffic sources become systematically underrepresented. That is how teams end up with experiment dashboards that report a clean lift on visible traffic while missing 30% of final conversions in the backend truth set.

Schema

TrackLayer experiment events

TrackLayer keeps the experiment contract intentionally small. Exposure tells the system who entered which variant. Conversion tells it which business outcome should be attributed back to that assignment. Early-stop tells it which records belong to a broken or intentionally aborted run.

| event_name | purpose | required fields |
| --- | --- | --- |
| experiment_exposed | Records that a user was assigned and actually saw or qualified for a variant. | experiment_id, variant, user_id or anonymous_id, exposed_at |
| experiment_converted | Records a success event tied back to the prior exposure at user level. | experiment_id, variant, conversion_event, user_id, value or metric_value |
| experiment_ended_early | Captures aborted runs so paused tests are excluded from clean win-rate reporting. | experiment_id, variant, user_id, ended_reason, ended_at |

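
As a rough illustration of the contract above, the required fields can be checked before an event is sent. This is a minimal sketch, not TrackLayer's own validation: `REQUIRED_FIELDS` and `missingFields` are hypothetical names, and reading "user_id or anonymous_id" and "value or metric_value" as either-or requirements is an assumption drawn from the table wording.

```javascript
// Required fields per event, copied from the schema table.
// A nested array means "at least one of these fields" (assumed reading of "or").
const REQUIRED_FIELDS = {
  experiment_exposed: ["experiment_id", "variant", ["user_id", "anonymous_id"], "exposed_at"],
  experiment_converted: ["experiment_id", "variant", "conversion_event", "user_id", ["value", "metric_value"]],
  experiment_ended_early: ["experiment_id", "variant", "user_id", "ended_reason", "ended_at"],
};

// Hypothetical helper: returns the names of missing fields (empty array when valid).
function missingFields(eventName, payload) {
  const spec = REQUIRED_FIELDS[eventName];
  if (!spec) throw new Error(`Unknown event_name: ${eventName}`);
  const present = (key) => payload[key] !== undefined && payload[key] !== null;
  return spec
    .filter((field) => (Array.isArray(field) ? !field.some(present) : !present(field)))
    .map((field) => (Array.isArray(field) ? field.join(" or ") : field));
}
```

Running a check like this at the edge of your own service keeps malformed events out of the experiment log instead of discovering them during the readout.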
Build

4-step setup

Step 1

Define variants in /experiments

Create the experiment record first so TrackLayer has a canonical ID, readable variant labels, expected primary metric, and ownership metadata. This prevents the all-too-common state where Growth or Product runs a test under one slug while analytics exports use another slug a week later.

Step 2

Call client.trackExposure({ experiment_id, variant })

Your app should emit exposure only when the user truly qualifies for the test and the assigned experience is rendered or activated. Do not fire exposure on every page load before the variant decision is stable. Exposure is the denominator, so inflated counts destroy the validity of the readout.

Step 3

Let TrackLayer handle dedup + user-level attribution

TrackLayer joins anonymous sessions to known users, removes repeated exposures from refreshes or retries, and ties later conversion events back to the correct experiment assignment. That matters most when the conversion happens on another device, in a delayed backend job, or after the browser identity has changed.
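
To make the idea in Step 3 concrete, here is a minimal sketch of identity resolution, exposure dedup, and user-level attribution. It is not TrackLayer's internal logic, which this guide does not describe; the input shapes, the `identityMap` lookup, and every function name are assumptions made for illustration.

```javascript
// Assumed input shapes (hypothetical, not TrackLayer's internal model):
//   identityMap: { [anonymous_id]: user_id } learned at login or signup
//   exposures:   [{ experiment_id, variant, user_id?, anonymous_id?, exposed_at }]
//   conversions: [{ experiment_id, user_id, conversion_event, value, converted_at }]
// Timestamps are epoch milliseconds.

// Prefer a known user_id, then a mapped anonymous_id, then the raw anonymous_id.
function resolveUserKey(event, identityMap) {
  return event.user_id ?? identityMap[event.anonymous_id] ?? event.anonymous_id;
}

// Keep the earliest exposure per (experiment, user); refreshes and retries collapse into it.
function dedupeExposures(exposures, identityMap) {
  const first = new Map();
  for (const e of exposures) {
    const key = `${e.experiment_id}::${resolveUserKey(e, identityMap)}`;
    const seen = first.get(key);
    if (!seen || e.exposed_at < seen.exposed_at) first.set(key, e);
  }
  return first;
}

// Attribute a conversion only if the same user was exposed earlier in the same experiment.
function attributeConversions(conversions, firstExposures) {
  const attributed = [];
  for (const c of conversions) {
    const exposure = firstExposures.get(`${c.experiment_id}::${c.user_id}`);
    if (exposure && exposure.exposed_at <= c.converted_at) {
      attributed.push({ ...c, variant: exposure.variant });
    }
  }
  return attributed;
}
```

In this sketch a conversion that arrives before any exposure is dropped rather than guessed at, which keeps the numerator consistent with the deduplicated denominator.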

Step 4

Review results in /experiments/[id]

The experiment detail view should show exposed users, converted users, conversion rate, lift, credible interval or confidence interval, and early-stop markers. Teams need one place where product, growth, and analytics are looking at the same denominator and the same stopping logic.
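
The headline numbers in that view follow from simple arithmetic on deduplicated user counts. The sketch below shows one conventional way to compute conversion rate, relative lift, and a 95% Wald confidence interval for the difference; the function names are hypothetical, and the guide does not state which interval method TrackLayer itself uses.

```javascript
// Hypothetical readout helpers; field names are illustrative.
function summarizeArm({ exposedUsers, convertedUsers }) {
  const rate = exposedUsers > 0 ? convertedUsers / exposedUsers : 0;
  return { exposedUsers, convertedUsers, rate };
}

// Relative lift of the variant over control, e.g. 0.25 means +25%.
function relativeLift(control, variant) {
  if (control.rate === 0) return null; // lift is undefined without a baseline
  return (variant.rate - control.rate) / control.rate;
}

// Wald interval for the absolute difference in conversion rates (z = 1.96 for 95%).
function differenceInterval(control, variant, z = 1.96) {
  const se = Math.sqrt(
    (control.rate * (1 - control.rate)) / control.exposedUsers +
      (variant.rate * (1 - variant.rate)) / variant.exposedUsers
  );
  const diff = variant.rate - control.rate;
  return { diff, lower: diff - z * se, upper: diff + z * se };
}
```

An interval that still spans zero is a signal to keep the test running, whatever the point estimate of lift looks like.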

// Step 2: report exposure once the assigned variant has actually rendered.
client.trackExposure({
  experiment_id: "checkout_headline_test",
  variant: "variant_b"
});
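
The Step 2 rule, emit only once the variant is stable and visible, can be enforced with a thin wrapper around the exposure call. This is a sketch under assumptions: `createExposureGuard` and the `rendered` flag are hypothetical, and the wrapper only relies on the client exposing `trackExposure({ experiment_id, variant })` as shown in this guide.

```javascript
// Wraps any client that exposes trackExposure({ experiment_id, variant }).
// The guard itself is a hypothetical helper, not part of TrackLayer.
function createExposureGuard(client) {
  const sent = new Set(); // experiment_ids already reported in this session

  return function exposeOnce({ experiment_id, variant, rendered }) {
    if (!rendered) return false; // variant assigned but not shown yet
    if (sent.has(experiment_id)) return false; // refresh, re-render, or retry
    sent.add(experiment_id);
    client.trackExposure({ experiment_id, variant });
    return true;
  };
}
```

A guard like this only protects the denominator within a single session; duplicates across sessions and devices still need server-side dedup.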
Bridges

Integration with popular tools

The cleanest architecture keeps assignment where your product team already works and keeps measurement where your data team can trust it. TrackLayer is the bridge layer, not a forced rewrite of your flagging stack.

| tool | experiment API style | TrackLayer bridge approach |
| --- | --- | --- |
| GrowthBook | Feature rule evaluation with explicit tracking callback | Send the GrowthBook assignment callback into TrackLayer as experiment_exposed, then forward purchase or signup events as experiment_converted. |
| Statsig | Gate / experiment evaluation plus event logging | Map Statsig exposure logs to TrackLayer exposures and keep backend success metrics in TrackLayer for cross-channel dedup. |
| LaunchDarkly | Flag evaluation with variation keys and event hooks | Use variation keys as variant labels, emit exposure after evaluation, and join downstream business events through TrackLayer user identity. |
| VWO | Campaign visitor assignment with goal callbacks | Treat VWO as assignment UI, but send canonical exposure and conversion events through TrackLayer for warehouse-grade reporting. |
| Optimizely | Decision APIs with activate / track style calls | Mirror activate into experiment_exposed and route track goal events into experiment_converted with the same experiment_id and variant. |
| PostHog | Feature flag and experiment events inside product analytics | Use PostHog for product-side assignment while TrackLayer receives exposure and server-confirmed conversions for cleaner revenue attribution. |

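
The callback-style bridges in the table share one shape: take whatever the assignment tool reports and translate it into an experiment_exposed event. The sketch below assumes a callback that receives an experiment object with a `key` and a result object with a `key` for the assigned variation; those field names, along with `makeAssignmentBridge`, `send`, and `getUserId`, are assumptions to verify against your SDK version rather than a documented TrackLayer or vendor API.

```javascript
// Hypothetical bridge: turns an assignment callback into an experiment_exposed event.
// `send` stands in for whatever transport delivers events to TrackLayer.
// `now` is injectable so the timestamp can be controlled in tests.
function makeAssignmentBridge(send, getUserId, now = () => new Date().toISOString()) {
  return function onAssignment(experiment, result) {
    send({
      event_name: "experiment_exposed",
      experiment_id: experiment.key, // assumed field on the tool's experiment object
      variant: String(result.key), // assumed field naming the assigned variation
      user_id: getUserId(),
      exposed_at: now(),
    });
  };
}
```

With a tool that accepts a tracking callback at initialization, a function built this way could be passed in directly, so assignment stays in the existing tool while the canonical exposure record lands in TrackLayer.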
Inference

Statistical significance

Teams usually talk about statistical significance as if there were only one valid framework. In practice, there are two common ones. Frequentist testing asks how likely a difference at least as large as the observed one would be if the null hypothesis were true. Bayesian analysis asks how probable each outcome is given the observed data and prior assumptions. Product teams often find Bayesian outputs easier to read because the answer is stated as a direct probability rather than as p-values and rejection thresholds.

The stopping rule matters more than the label. You should stop an experiment when the sample is large enough, the primary metric is stable, and the decision threshold has been met without obvious instrumentation issues. TrackLayer uses a 90% posterior threshold as the practical default: if one variant has at least a 90% posterior probability of beating the control on the primary metric, the result is eligible for decision review. That is not a license to stop the second the chart crosses 90%. It is a prompt to verify data quality, novelty effects, and business impact before rollout.
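
To show what a 90% posterior threshold means in practice, the sketch below estimates the probability that a variant beats control on a conversion metric. It assumes a uniform Beta(1, 1) prior and a normal approximation to each Beta posterior, which is reasonable for large samples but is an assumption of this example; the guide does not specify how TrackLayer computes its posterior, and all function names here are hypothetical.

```javascript
// Abramowitz-Stegun approximation of the error function (absolute error around 1.5e-7).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Mean and variance of the Beta(1 + conversions, 1 + non-conversions) posterior.
function betaPosterior(conversions, exposed) {
  const a = 1 + conversions;
  const b = 1 + exposed - conversions;
  return { mean: a / (a + b), variance: (a * b) / ((a + b) ** 2 * (a + b + 1)) };
}

// Normal approximation to P(variant rate > control rate | data).
function probabilityVariantBeatsControl(control, variant) {
  const c = betaPosterior(control.conversions, control.exposed);
  const v = betaPosterior(variant.conversions, variant.exposed);
  const z = (v.mean - c.mean) / Math.sqrt(c.variance + v.variance);
  return 0.5 * (1 + erf(z / Math.SQRT2));
}

// The 90% default from the text: crossing it makes a result eligible for review, not rollout.
function eligibleForDecisionReview(control, variant, threshold = 0.9) {
  return probabilityVariantBeatsControl(control, variant) >= threshold;
}
```

For example, 100 conversions from 1,000 control users against 130 from 1,000 variant users gives a probability of roughly 0.98 under these assumptions, which clears the threshold and triggers the data-quality review described above.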

Pitfalls

Common mistakes

Peeking at results every few hours

If the team keeps checking until a chart looks good, the false-positive rate climbs fast. Decide the stopping rule first, then keep the experiment live until that rule is satisfied or the test is explicitly invalidated.

Treating correlated metrics as independent wins

Revenue, checkout completion, and average order value often move together. Declaring three separate wins from one shopper behavior change overstates confidence and creates noisy decision logs.

Ignoring user-level dedup

Page refreshes, reconnects, retries, and multiple tabs can emit the same exposure or conversion more than once. Without user-level dedup you are not measuring experimental lift, you are measuring logging noise.

Testing too many variants at once

A seven-variant layout test feels ambitious but usually starves each branch of traffic. If you truly need many competing options, use a multi-armed bandit system for allocation rather than pretending a tiny sample can support classic A/B conclusions.
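
The traffic cost of extra variants is easy to estimate before launch. The sketch below uses the standard two-proportion sample-size approximation with a two-sided 5% significance level and 80% power; the function names and the example traffic numbers are hypothetical, and the calculation ignores any multiple-comparison correction, which would push the requirement higher still.

```javascript
// Standard normal quantiles for a two-sided 5% test and 80% power.
const Z_ALPHA = 1.96;
const Z_BETA = 0.8416;

// Users needed in each arm to detect a given relative lift over the baseline rate.
function sampleSizePerArm(baselineRate, relativeLift) {
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + relativeLift);
  const varianceSum = p1 * (1 - p1) + p2 * (1 - p2);
  return Math.ceil(((Z_ALPHA + Z_BETA) ** 2 * varianceSum) / (p2 - p1) ** 2);
}

// Days required when eligible traffic is split evenly across all arms.
function daysToFinish(arms, dailyEligibleUsers, baselineRate, relativeLift) {
  const perArm = sampleSizePerArm(baselineRate, relativeLift);
  return Math.ceil((perArm * arms) / dailyEligibleUsers);
}
```

With a 10% baseline, a 10% relative lift to detect, and 2,000 eligible users per day, a two-arm test needs about 15 days under these assumptions, while a seven-arm test needs about 52.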

Treating novelty effect as a real win

A new badge, animation, or pricing layout can spike engagement for a few days because it is different, not because it is better. Watch whether the effect persists after the first wave of curiosity fades.

Warehouse

Reporting

Experiment readouts should not stay trapped in one UI. TrackLayer can export experiment exposures, conversions, attribution joins, and decision metadata into BigQuery or Snowflake so your stats team can rerun the math, compare segment-level effects, and check whether the product team's chosen winner still holds under a stricter model. That export layer is also where finance and analytics teams can reconcile experiment results against the canonical orders table instead of relying on front-end-only counters.

The practical pattern is to export one clean row per user-event association: experiment ID, variant, exposure timestamp, conversion timestamp, metric value, user key, dedup status, and destination source. BigQuery is often the fastest route for SQL analysis and Looker-style dashboards. Snowflake is common when the stats team already runs governed models there. In both cases, TrackLayer should be the source that standardizes the experiment log before warehouse review starts.
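
One way to picture that row is as a small mapping from an exposure and an optional conversion to a flat record, serialized as newline-delimited JSON for a warehouse load. This is only a sketch: the column names, `toWarehouseRow`, and `toNdjson` are hypothetical and do not describe TrackLayer's actual export schema.

```javascript
// Hypothetical row builder; column names mirror the fields listed in the text.
function toWarehouseRow(exposure, conversion, { dedupStatus, destination }) {
  return {
    experiment_id: exposure.experiment_id,
    variant: exposure.variant,
    user_key: exposure.user_id ?? exposure.anonymous_id,
    exposed_at: exposure.exposed_at,
    converted_at: conversion ? conversion.converted_at : null, // null when the user never converted
    metric_value: conversion ? conversion.value : null,
    dedup_status: dedupStatus,
    destination_source: destination,
  };
}

// Newline-delimited JSON: one serialized row per line.
function toNdjson(rows) {
  return rows.map((row) => JSON.stringify(row)).join("\n");
}
```

Keeping non-converting users in the export with null conversion columns lets the stats team recompute the denominator instead of trusting a pre-aggregated rate.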

FAQ

Common questions

What counts as an exposure?

An exposure should mean the user was assigned to a variant and the experience was actually rendered or activated for them. Assignment in memory without rendering is usually not enough.

Can I track conversions entirely server-side?

Yes, and for revenue, lead qualification, and subscription events that is usually the better approach. Server-confirmed conversions are harder to block and easier to deduplicate than browser-only goal events.
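
A server-side conversion path might look like the sketch below, where a confirmed order is turned into an experiment_converted event exactly once. Everything named here is an assumption for illustration: `createConversionRecorder`, `send`, `lookupAssignment`, and the order fields are hypothetical, and the in-memory set stands in for the durable idempotency store a real service would need.

```javascript
// Hypothetical server-side recorder. `send` delivers the event to TrackLayer;
// `lookupAssignment` returns { experiment_id, variant } for a user, or null.
function createConversionRecorder(send, lookupAssignment) {
  const processed = new Set(); // idempotency keys already handled (in-memory for the sketch)

  return function recordOrder(order) {
    if (processed.has(order.id)) return false; // webhook retry or duplicate job
    const assignment = lookupAssignment(order.user_id);
    if (!assignment) return false; // user was never exposed to the experiment
    processed.add(order.id);
    send({
      event_name: "experiment_converted",
      experiment_id: assignment.experiment_id,
      variant: assignment.variant,
      conversion_event: "purchase",
      user_id: order.user_id,
      value: order.total,
    });
    return true;
  };
}
```

Keying on the order ID rather than on a browser session is what makes retries harmless: the same backend fact can arrive many times and still count once.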

Why does TrackLayer care about user-level attribution instead of session-level attribution?

Because experiments are decisions about people, not tabs. The same person can refresh, switch devices, or convert hours later. User-level linkage preserves the real treatment effect.

When should I mark experiment_ended_early?

Use it when the run is invalidated by a rollout bug, targeting mistake, pricing incident, or any operational issue that makes the observed data untrustworthy for winner selection.

Does TrackLayer replace my experimentation tool?

Usually no. The experimentation tool still handles assignment and rollout. TrackLayer acts as the measurement layer that standardizes exposure and conversion data across tools and exports.
