Why client-side experiments miss 30% of conversions
Client-side experimentation feels simple because the browser sees the variant first. The problem is that the browser rarely sees the full customer journey anymore. iOS privacy changes reduced durable browser identity, in-app browsers strip context, Safari limits storage windows, and ad blockers routinely suppress analytics or experiment callbacks. A user can be cleanly assigned into variant B, continue to checkout, and still disappear from the experiment log because the exposure ping, identity cookie, or conversion callback never makes it through. The product team sees the UI change. The measurement layer sees fragments.
That gap widens when the conversion itself is server-confirmed. Orders, subscriptions, approvals, and qualified leads often finalize in backend systems long after the frontend session has drifted away. If your experiment relies on client-only event capture, you will undercount the very conversions that matter most, and the missing users are not random. Privacy-heavy devices and traffic sources become systematically underrepresented. That is how teams end up with experiment dashboards that report a clean lift on visible traffic while missing 30% of final conversions in the backend truth set.
TrackLayer experiment events
TrackLayer keeps the experiment contract intentionally small. Exposure tells the system who entered which variant. Conversion tells it which business outcome should be attributed back to that assignment. Early-stop tells it which records belong to a broken or intentionally aborted run.
| event_name | purpose | required fields |
|---|---|---|
| experiment_exposed | Records that a user was assigned and actually saw or qualified for a variant. | experiment_id, variant, user_id or anonymous_id, exposed_at |
| experiment_converted | Records a success event tied back to the prior exposure at user level. | experiment_id, variant, conversion_event, user_id, value or metric_value |
| experiment_ended_early | Marks aborted runs so their records are excluded from clean win-rate reporting. | experiment_id, variant, user_id, ended_reason, ended_at |
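
The three contracts above can be written down as TypeScript types. This is a minimal sketch, assuming the field names in the table map one-to-one onto payload keys and that timestamps are ISO 8601 strings. The `missingFields` validator is a hypothetical helper for illustration, not part of a documented TrackLayer SDK.

```typescript
// Payload shapes mirroring the "required fields" column; types are assumptions.
interface ExperimentExposed {
  event_name: "experiment_exposed";
  experiment_id: string;
  variant: string;
  user_id?: string;
  anonymous_id?: string; // either user_id or anonymous_id must be present
  exposed_at: string; // assumed ISO 8601 timestamp
}

interface ExperimentConverted {
  event_name: "experiment_converted";
  experiment_id: string;
  variant: string;
  conversion_event: string;
  user_id: string;
  value?: number;
  metric_value?: number; // either value or metric_value must be present
}

interface ExperimentEndedEarly {
  event_name: "experiment_ended_early";
  experiment_id: string;
  variant: string;
  user_id: string;
  ended_reason: string;
  ended_at: string;
}

type ExperimentEvent = ExperimentExposed | ExperimentConverted | ExperimentEndedEarly;

// Hypothetical helper: lists the required fields a payload is missing.
function missingFields(event: ExperimentEvent): string[] {
  const missing: string[] = [];
  if (!event.experiment_id) missing.push("experiment_id");
  if (!event.variant) missing.push("variant");
  switch (event.event_name) {
    case "experiment_exposed":
      if (!event.user_id && !event.anonymous_id) missing.push("user_id or anonymous_id");
      if (!event.exposed_at) missing.push("exposed_at");
      break;
    case "experiment_converted":
      if (!event.user_id) missing.push("user_id");
      if (!event.conversion_event) missing.push("conversion_event");
      if (event.value === undefined && event.metric_value === undefined) {
        missing.push("value or metric_value");
      }
      break;
    case "experiment_ended_early":
      if (!event.user_id) missing.push("user_id");
      if (!event.ended_reason) missing.push("ended_reason");
      if (!event.ended_at) missing.push("ended_at");
      break;
  }
  return missing;
}
```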
4-step setup
Define variants in /experiments
Create the experiment record first so TrackLayer has a canonical ID, readable variant labels, expected primary metric, and ownership metadata. This prevents the all-too-common state where Growth or Product runs a test under one slug while analytics exports use another slug a week later.
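A minimal sketch of such a record follows. The field names `variants`, `primary_metric`, and `owner`, the in-memory `registry`, and the `isKnownAssignment` check are illustrative assumptions, not the actual `/experiments` schema; the point is that every event can be checked against one canonical ID and one variant list.

```typescript
// Hypothetical experiment definition; field names are illustrative assumptions.
interface ExperimentDefinition {
  experiment_id: string; // canonical ID reused by every event and export
  variants: Record<string, string>; // variant key -> readable label
  primary_metric: string;
  owner: string;
}

const registry = new Map<string, ExperimentDefinition>();

function registerExperiment(def: ExperimentDefinition): void {
  if (registry.has(def.experiment_id)) {
    throw new Error(`experiment ${def.experiment_id} is already registered`);
  }
  registry.set(def.experiment_id, def);
}

// Reject events whose experiment slug or variant key was never registered.
function isKnownAssignment(experimentId: string, variant: string): boolean {
  const def = registry.get(experimentId);
  return def !== undefined && Object.prototype.hasOwnProperty.call(def.variants, variant);
}

registerExperiment({
  experiment_id: "checkout_headline_test",
  variants: { control: "Current headline", variant_b: "Alternative headline" },
  primary_metric: "purchase",
  owner: "growth-team",
});
```

With a check like this in the instrumentation path, an event tagged with a drifted slug such as `checkout-headline-v2` is rejected instead of quietly forming a second, orphaned experiment in the exports.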
Call client.trackExposure({ experiment_id, variant })
Your app should emit exposure only when the user truly qualifies for the test and the assigned experience is rendered or activated. Do not fire exposure on every page load before the variant decision is stable. Exposure is the denominator, so inflated counts destroy the validity of the readout.
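One way to enforce this is to call exposure from the render path, and only once per experiment. A minimal sketch, assuming a client that exposes the `trackExposure({ experiment_id, variant })` call shown in this guide; the `ExposureGate` class and its `onVariantRendered(experimentId, variant, qualifies)` method are hypothetical.

```typescript
// Minimal client shape, matching the trackExposure call used in this guide.
interface ExposureClient {
  trackExposure(payload: { experiment_id: string; variant: string }): void;
}

// Hypothetical guard: emit exposure only after the variant has rendered,
// and at most once per experiment for the lifetime of this object.
class ExposureGate {
  private readonly sent = new Set<string>();
  private readonly client: ExposureClient;

  constructor(client: ExposureClient) {
    this.client = client;
  }

  onVariantRendered(experimentId: string, variant: string | undefined, qualifies: boolean): boolean {
    // Skip users who do not qualify or whose variant decision is not final yet.
    if (!qualifies || variant === undefined) return false;
    // Re-renders in the same page lifetime do not add to the denominator.
    if (this.sent.has(experimentId)) return false;
    this.sent.add(experimentId);
    this.client.trackExposure({ experiment_id: experimentId, variant });
    return true;
  }
}
```

A full page reload creates a fresh gate, so this only trims client-side noise; it does not remove the need for user-level dedup downstream.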
Let TrackLayer handle dedup + user-level attribution
TrackLayer joins anonymous sessions to known users, removes repeated exposures from refreshes or retries, and ties later conversion events back to the correct experiment assignment. That matters most when the conversion happens on another device, in a delayed backend job, or after the browser identity has changed.
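The join-and-attribute idea can be illustrated with a simplified model. This is not TrackLayer's internal algorithm: it assumes an identity map from `anonymous_id` to `user_id` is already known, keeps each user's earliest exposure per experiment, and credits a conversion to that exposure's variant only if the conversion happened at or after it. `attribute` and `userKey` are hypothetical names, and timestamps are assumed to be epoch milliseconds.

```typescript
interface RawExposure { experiment_id: string; variant: string; user_id?: string; anonymous_id?: string; exposed_at: number; }
interface RawConversion { experiment_id: string; user_id: string; conversion_event: string; converted_at: number; }
interface Attributed { experiment_id: string; user_id: string; variant: string; conversion_event: string; }

// Resolve an event to a stable user key using a known anonymous -> user mapping.
function userKey(e: { user_id?: string; anonymous_id?: string }, identity: Map<string, string>): string | undefined {
  if (e.user_id) return e.user_id;
  if (e.anonymous_id) return identity.get(e.anonymous_id) ?? `anon:${e.anonymous_id}`;
  return undefined;
}

function attribute(exposures: RawExposure[], conversions: RawConversion[], identity: Map<string, string>): Attributed[] {
  // Keep the earliest exposure per (experiment, user); later copies are repeats.
  const first = new Map<string, RawExposure>();
  for (const e of exposures) {
    const key = userKey(e, identity);
    if (key === undefined) continue;
    const id = `${e.experiment_id}|${key}`;
    const seen = first.get(id);
    if (!seen || e.exposed_at < seen.exposed_at) first.set(id, e);
  }
  const out: Attributed[] = [];
  for (const c of conversions) {
    const exposure = first.get(`${c.experiment_id}|${c.user_id}`);
    // Only credit conversions that happened after the user was exposed.
    if (exposure && c.converted_at >= exposure.exposed_at) {
      out.push({ experiment_id: c.experiment_id, user_id: c.user_id, variant: exposure.variant, conversion_event: c.conversion_event });
    }
  }
  return out;
}
```

Under this model, an exposure recorded under an anonymous ID and a later server-side purchase recorded under the known user ID resolve to the same person, so the purchase inherits the exposed variant.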
Review results in /experiments/[id]
The experiment detail view should show exposed users, converted users, conversion rate, lift, credible interval or confidence interval, and early-stop markers. Teams need one place where product, growth, and analytics are looking at the same denominator and the same stopping logic.
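The headline numbers on that view can be reproduced from two counts per arm. A minimal sketch, assuming a frequentist normal-approximation (Wald) 95% confidence interval on the absolute difference in conversion rates; the detail view itself may use a different interval, such as a Bayesian credible interval. `readout` and `ArmCounts` are hypothetical names.

```typescript
interface ArmCounts { exposed: number; converted: number; }

interface Readout {
  controlRate: number;
  variantRate: number;
  relativeLift: number; // (variant - control) / control
  diffLow: number; // lower bound on the absolute rate difference
  diffHigh: number; // upper bound on the absolute rate difference
}

function readout(control: ArmCounts, variant: ArmCounts, z = 1.96): Readout {
  const pC = control.converted / control.exposed;
  const pV = variant.converted / variant.exposed;
  // Standard error of the difference between two independent proportions.
  const se = Math.sqrt((pC * (1 - pC)) / control.exposed + (pV * (1 - pV)) / variant.exposed);
  const diff = pV - pC;
  return {
    controlRate: pC,
    variantRate: pV,
    relativeLift: diff / pC,
    diffLow: diff - z * se,
    diffHigh: diff + z * se,
  };
}
```

For example, `readout({ exposed: 1000, converted: 100 }, { exposed: 1000, converted: 130 })` gives rates of 0.10 and 0.13, a relative lift of 30%, and an interval on the difference of roughly 0.002 to 0.058.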
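```javascript
// `client` is your initialized TrackLayer client; call this only after the variant renders.
client.trackExposure({
  experiment_id: "checkout_headline_test",
  variant: "variant_b",
});
```
Integration with popular tools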
The cleanest architecture keeps assignment where your product team already works and keeps measurement where your data team can trust it. TrackLayer is the bridge layer, not a forced rewrite of your flagging stack.
| tool | experiment API style | TrackLayer bridge approach |
|---|---|---|
| GrowthBook | Feature rule evaluation with explicit tracking callback | Send the GrowthBook assignment callback into TrackLayer as experiment_exposed, then forward purchase or signup events as experiment_converted. |
| Statsig | Gate / experiment evaluation plus event logging | Map Statsig exposure logs to TrackLayer exposures and keep backend success metrics in TrackLayer for cross-channel dedup. |
| LaunchDarkly | Flag evaluation with variation keys and event hooks | Use variation keys as variant labels, emit exposure after evaluation, and join downstream business events through TrackLayer user identity. |
| VWO | Campaign visitor assignment with goal callbacks | Treat VWO as assignment UI, but send canonical exposure and conversion events through TrackLayer for warehouse-grade reporting. |
| Optimizely | Decision APIs with activate / track style calls | Mirror activate into experiment_exposed and route track goal events into experiment_converted with the same experiment_id and variant. |
| PostHog | Feature flag and experiment events inside product analytics | Use PostHog for product-side assignment while TrackLayer receives exposure and server-confirmed conversions for cleaner revenue attribution. |
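
Whatever the tool, the bridge reduces to two mappings: the tool's assignment becomes `experiment_exposed`, and later business events become `experiment_converted` carrying the same `experiment_id` and `variant`. A minimal, tool-agnostic sketch; the `AssignmentBridge` class and `VendorAssignment` shape are hypothetical, and `onAssignment` would be called from whichever assignment or tracking callback your tool provides. No vendor SDK API is shown or implied here.

```typescript
// Hypothetical normalized shape for an assignment reported by any flagging tool.
interface VendorAssignment { experimentKey: string; variantKey: string; userId: string; }

interface TrackLayerEvent {
  event_name: "experiment_exposed" | "experiment_converted";
  experiment_id: string;
  variant: string;
  user_id: string;
  conversion_event?: string;
  value?: number;
}

class AssignmentBridge {
  // Remember each user's variant so conversions reuse exactly the same labels.
  private readonly assignments = new Map<string, string>();
  readonly outbox: TrackLayerEvent[] = [];

  onAssignment(a: VendorAssignment): void {
    this.assignments.set(`${a.experimentKey}|${a.userId}`, a.variantKey);
    this.outbox.push({ event_name: "experiment_exposed", experiment_id: a.experimentKey, variant: a.variantKey, user_id: a.userId });
  }

  onBusinessEvent(experimentId: string, userId: string, conversionEvent: string, value: number): boolean {
    const variant = this.assignments.get(`${experimentId}|${userId}`);
    if (variant === undefined) return false; // never exposed: nothing to attribute
    this.outbox.push({
      event_name: "experiment_converted",
      experiment_id: experimentId,
      variant,
      user_id: userId,
      conversion_event: conversionEvent,
      value,
    });
    return true;
  }
}
```

In production the assignment lookup would live in a durable store, or be left to TrackLayer's identity join, rather than in process memory.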
Statistical significance
Teams usually talk about statistical significance as if there were only one valid framework. In practice, there are two common ones. Frequentist testing asks whether the observed difference would be unlikely under a null hypothesis of no difference. Bayesian analysis asks how probable it is, given the observed data and prior assumptions, that each variant outperforms the others. Product teams often find Bayesian outputs easier to read because the answer is stated in plain probability terms rather than p-values and rejection thresholds.
The stopping rule matters more than the label. You should stop an experiment when the sample is large enough, the primary metric is stable, and the decision threshold has been met without obvious instrumentation issues. TrackLayer uses a 90% posterior threshold as the practical default: if one variant has at least a 90% posterior probability of beating the control on the primary metric, the result is eligible for decision review. That is not a license to stop the second the chart crosses 90%. It is a prompt to verify data quality, novelty effects, and business impact before rollout.
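To make the 90% rule concrete, the posterior probability that a variant beats control can be estimated from conversion counts. A minimal sketch, assuming a Beta-Binomial model with uniform Beta(1, 1) priors and Monte Carlo sampling; TrackLayer's actual model and priors are not specified here, and `probVariantBeatsControl` and `eligibleForReview` are hypothetical names.

```typescript
// Seeded PRNG (mulberry32) so estimates are reproducible between runs.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal draw via Box-Muller; 1 - rand() avoids log(0).
function normal(rand: () => number): number {
  const u1 = 1 - rand();
  const u2 = rand();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Marsaglia-Tsang gamma sampler; valid for shape >= 1, which always holds
// here because the Beta(1, 1) prior adds 1 to both posterior parameters.
function gamma(shape: number, rand: () => number): number {
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    let x = 0;
    let v = 0;
    do {
      x = normal(rand);
      v = 1 + c * x;
    } while (v <= 0);
    v = v * v * v;
    const u = 1 - rand();
    if (u < 1 - 0.0331 * x * x * x * x) return d * v;
    if (Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) return d * v;
  }
}

// Beta(a, b) draw as X / (X + Y) with X ~ Gamma(a), Y ~ Gamma(b).
function beta(a: number, b: number, rand: () => number): number {
  const x = gamma(a, rand);
  const y = gamma(b, rand);
  return x / (x + y);
}

// Monte Carlo estimate of P(variant rate > control rate | data).
function probVariantBeatsControl(
  control: { exposed: number; converted: number },
  variant: { exposed: number; converted: number },
  draws = 20000,
  seed = 1,
): number {
  const rand = mulberry32(seed);
  let wins = 0;
  for (let i = 0; i < draws; i++) {
    const pC = beta(1 + control.converted, 1 + control.exposed - control.converted, rand);
    const pV = beta(1 + variant.converted, 1 + variant.exposed - variant.converted, rand);
    if (pV > pC) wins++;
  }
  return wins / draws;
}

// Crossing the threshold only makes the result eligible for decision review.
function eligibleForReview(probability: number, threshold = 0.9): boolean {
  return probability >= threshold;
}
```

With 100 of 1,000 control users and 130 of 1,000 variant users converting, the estimate lands around 0.98 and would clear the 90% bar for review; with identical counts it hovers near 0.5.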
Common mistakes
Peeking at results every few hours
If the team keeps checking until a chart looks good, the false-positive rate climbs fast. Decide the stopping rule first, then keep the experiment live until that rule is satisfied or the test is explicitly invalidated.
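A quick A/A simulation shows why. Both arms share the same true conversion rate, so every "significant" result is a false positive. This sketch assumes a pooled two-proportion z-test at |z| > 1.96 (nominal 5%) and ten evenly spaced looks; `simulatePeeking` and its parameters are illustrative choices, not TrackLayer settings.

```typescript
// Seeded PRNG (mulberry32) for reproducible simulations.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Pooled two-proportion z statistic; returns 0 when the pooled rate is degenerate.
function zStat(convA: number, nA: number, convB: number, nB: number): number {
  const pooled = (convA + convB) / (nA + nB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB));
  return se === 0 ? 0 : (convB / nB - convA / nA) / se;
}

// Simulate A/A tests and compare false-positive rates with and without peeking.
function simulatePeeking(sims = 2000, looks = 10, usersPerLook = 100, rate = 0.1, seed = 7) {
  const rand = mulberry32(seed);
  let peekingHits = 0;
  let finalOnlyHits = 0;
  for (let s = 0; s < sims; s++) {
    let convA = 0;
    let convB = 0;
    let n = 0;
    let crossedAtAnyLook = false;
    for (let look = 1; look <= looks; look++) {
      for (let i = 0; i < usersPerLook; i++) {
        if (rand() < rate) convA++;
        if (rand() < rate) convB++;
      }
      n += usersPerLook;
      const significant = Math.abs(zStat(convA, n, convB, n)) > 1.96;
      // Peeking: a winner is declared the first time any look crosses the line.
      if (significant) crossedAtAnyLook = true;
      // Fixed horizon: only the final look counts.
      if (look === looks && significant) finalOnlyHits++;
    }
    if (crossedAtAnyLook) peekingHits++;
  }
  return { peekingRate: peekingHits / sims, fixedHorizonRate: finalOnlyHits / sims };
}
```

In runs like this, the fixed-horizon rate stays near the nominal 5%, while the peeking rate typically lands several times higher.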
Treating correlated metrics as independent wins
Revenue, checkout completion, and average order value often move together. Declaring three separate wins from one shopper behavior change overstates confidence and creates noisy decision logs.
Ignoring user-level dedup
Page refreshes, reconnects, retries, and multiple tabs can emit the same exposure or conversion more than once. Without user-level dedup, you are not measuring experimental lift; you are measuring logging noise.
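A minimal sketch of user-level dedup, assuming each raw event already carries a resolved `user_key`. Two exposures count as the same if they share user, experiment, and variant; two conversions count as the same if they share user, experiment, and a hypothetical `dedup_key` such as an order ID. All field names here are illustrative.

```typescript
interface RawEvent {
  event_name: "experiment_exposed" | "experiment_converted";
  experiment_id: string;
  variant: string;
  user_key: string;
  dedup_key?: string; // hypothetical: e.g. an order ID on conversions
  timestamp: number;
}

// Collapse refreshes, retries, and multi-tab copies into one row per logical action.
function dedupe(events: RawEvent[]): RawEvent[] {
  const kept = new Map<string, RawEvent>();
  for (const e of events) {
    const key =
      e.event_name === "experiment_exposed"
        ? `exp|${e.experiment_id}|${e.user_key}|${e.variant}`
        : `conv|${e.experiment_id}|${e.user_key}|${e.dedup_key ?? ""}`;
    const existing = kept.get(key);
    // Keep the earliest copy so the exposure time reflects the first render.
    if (!existing || e.timestamp < existing.timestamp) kept.set(key, e);
  }
  return Array.from(kept.values());
}
```

Conversions without a `dedup_key` collapse to one per user and experiment, which matches a "converted users" count; supplying an order ID keeps genuinely separate purchases apart.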
Testing too many variants at once
A seven-variant layout test feels ambitious but usually starves each branch of traffic. If you truly need many competing options, use a multi-armed bandit system for allocation rather than pretending a tiny sample can support classic A/B conclusions.
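The traffic cost is easy to quantify. A minimal sketch using the standard normal-approximation sample-size formula for comparing two proportions, assuming a two-sided 5% significance level and 80% power, and ignoring the multiple-comparison correction that several variants against one control would add (which only raises the requirement). `usersPerArm` and `totalTraffic` are hypothetical names.

```typescript
// Per-arm sample size to detect a change from baseline rate p1 to p2.
// Defaults: two-sided alpha = 0.05 (z = 1.96), power = 0.80 (z = 0.8416).
function usersPerArm(p1: number, p2: number, zAlpha = 1.96, zBeta = 0.8416): number {
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / (p2 - p1) ** 2);
}

// Total exposures needed when every arm, control included, must reach that size.
function totalTraffic(arms: number, p1: number, p2: number): number {
  return arms * usersPerArm(p1, p2);
}
```

Detecting a lift from a 10% to a 12% conversion rate needs roughly 3,800 users per arm under these assumptions, so a seven-arm test needs 3.5 times the traffic of a two-arm test to reach the same sensitivity.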
Treating novelty effect as a real win
A new badge, animation, or pricing layout can spike engagement for a few days because it is different, not because it is better. Watch whether the effect persists after the first wave of curiosity fades.
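One simple check is to compare the lift in the first few days with the lift afterwards. A minimal sketch, assuming daily exposure and conversion counts per arm are available; the three-day window and the `noveltyCheck` name are arbitrary illustrative choices, not a TrackLayer default.

```typescript
interface DailyArm { exposed: number; converted: number; }
interface DailyRow { control: DailyArm; variant: DailyArm; }

// Relative lift of variant over control, pooled over a slice of days.
function pooledLift(days: DailyRow[]): number {
  let cExposed = 0;
  let cConverted = 0;
  let vExposed = 0;
  let vConverted = 0;
  for (const d of days) {
    cExposed += d.control.exposed;
    cConverted += d.control.converted;
    vExposed += d.variant.exposed;
    vConverted += d.variant.converted;
  }
  const controlRate = cConverted / cExposed;
  return (vConverted / vExposed - controlRate) / controlRate;
}

// A large drop from early lift to later lift suggests a novelty effect.
function noveltyCheck(days: DailyRow[], earlyDays = 3): { earlyLift: number; laterLift: number } {
  return {
    earlyLift: pooledLift(days.slice(0, earlyDays)),
    laterLift: pooledLift(days.slice(earlyDays)),
  };
}
```

The later window needs enough days to carry its own sample; with an empty or tiny tail, the later lift is undefined or too noisy to compare.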
Reporting
Experiment readouts should not stay trapped in one UI. TrackLayer can export experiment exposures, conversions, attribution joins, and decision metadata into BigQuery or Snowflake so your stats team can rerun the math, compare segment-level effects, and check whether the product team's chosen winner still holds under a stricter model. That export layer is also where finance and analytics teams can reconcile experiment results against the canonical orders table instead of relying on front-end-only counters.
The practical pattern is to export one clean row per user-event association: experiment ID, variant, exposure timestamp, conversion timestamp, metric value, user key, dedup status, and destination source. BigQuery is often the fastest route for SQL analysis and Looker-style dashboards. Snowflake is common when the stats team already runs governed models there. In both cases, TrackLayer should be the source that standardizes the experiment log before warehouse review starts.
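The row shape described above can be sketched as a type plus a flattening step. The column names are assumed snake_case renderings of the listed fields, the inputs are assumed to be already deduplicated and attributed, and `toExportRows` is a hypothetical name. Exposures without a conversion keep a row with null conversion columns, so the warehouse retains the full denominator.

```typescript
// Inputs assumed to be deduplicated and attributed upstream.
interface Exposure { experiment_id: string; variant: string; user_key: string; exposed_at: string; dedup_status: string; }
interface Conversion { experiment_id: string; user_key: string; converted_at: string; metric_value: number; }

interface ExportRow {
  experiment_id: string;
  variant: string;
  user_key: string;
  exposure_ts: string;
  conversion_ts: string | null; // null keeps unconverted users in the denominator
  metric_value: number | null;
  dedup_status: string;
  destination_source: string;
}

function toExportRows(exposures: Exposure[], conversions: Conversion[], destinationSource: string): ExportRow[] {
  const rows: ExportRow[] = [];
  for (const e of exposures) {
    const base = {
      experiment_id: e.experiment_id,
      variant: e.variant,
      user_key: e.user_key,
      exposure_ts: e.exposed_at,
      dedup_status: e.dedup_status,
      destination_source: destinationSource,
    };
    const matches = conversions.filter((c) => c.experiment_id === e.experiment_id && c.user_key === e.user_key);
    if (matches.length === 0) {
      rows.push({ ...base, conversion_ts: null, metric_value: null });
    }
    // One row per user-conversion association.
    for (const c of matches) {
      rows.push({ ...base, conversion_ts: c.converted_at, metric_value: c.metric_value });
    }
  }
  return rows;
}
```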
Common questions
What counts as an exposure?
An exposure should mean the user was assigned to a variant and the experience actually became eligible or visible. Assignment in memory without rendering is usually not enough.
Can I track conversions entirely server-side?
Yes, and for revenue, lead qualification, and subscription events that is usually the better approach. Server-confirmed conversions are harder to block and easier to deduplicate than browser-only goal events.
Why does TrackLayer care about user-level attribution instead of session-level attribution?
Because experiments are decisions about people, not tabs. The same person can refresh, switch devices, or convert hours later. User-level linkage preserves the real treatment effect.
When should I mark experiment_ended_early?
Use it when the run is invalidated by a rollout bug, targeting mistake, pricing incident, or any operational issue that makes the observed data untrustworthy for winner selection.
Does TrackLayer replace my experimentation tool?
Usually no. The experimentation tool still handles assignment and rollout. TrackLayer acts as the measurement layer that standardizes exposure and conversion data across tools and exports.
Related implementation guides
Identity resolution
Understand how anonymous sessions, known users, and delayed conversions connect into one reliable customer graph.
Dedup explained
See how TrackLayer keeps one user action from becoming multiple rows across browser events, retries, and destinations.
BigQuery export
Move clean experiment and conversion data into BigQuery for warehouse analysis and statistics-team review.