A/B Test Statistical Significance Calculator

Determine whether the difference in conversion rates between your control and treatment groups is statistically significant, not just random noise. Use it before declaring a winning variant.

About this calculator

This calculator uses a two-proportion z-test to assess whether an observed difference in conversion rates is larger than random sampling variation would plausibly produce. The z-score is: z = (p₂ − p₁) / √(p₁(1−p₁)/n₁ + p₂(1−p₂)/n₂), where p₁ = controlConversions / controlVisitors, p₂ = treatmentConversions / treatmentVisitors, and n₁, n₂ are the respective sample sizes (visitor counts). If the absolute z-score exceeds the two-sided critical value (1.96 for 95% confidence or 2.576 for 99% confidence), the result is statistically significant. A 95% confidence level means that, if the two variants truly converted at the same rate, a difference this large would appear in only about 5% of tests; it is not the probability that the observed difference is due to chance. The critical values come from the standard normal distribution. Higher confidence levels require larger sample sizes to detect the same effect.
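The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual source: the function names (two_proportion_z, critical_value, is_significant) are hypothetical, and the critical value is derived from the standard library's NormalDist rather than hard-coded.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(control_conversions, control_visitors,
                     treatment_conversions, treatment_visitors):
    """Unpooled two-proportion z-score, following the formula in the text."""
    p1 = control_conversions / control_visitors
    p2 = treatment_conversions / treatment_visitors
    # Standard error of the difference between the two observed rates.
    se = sqrt(p1 * (1 - p1) / control_visitors
              + p2 * (1 - p2) / treatment_visitors)
    return (p2 - p1) / se


def critical_value(confidence):
    """Two-sided critical value from the standard normal distribution."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def is_significant(z, confidence=0.95):
    """True when |z| exceeds the two-sided critical value."""
    return abs(z) > critical_value(confidence)
```

Calling critical_value(0.95) and critical_value(0.99) yields values that round to the 1.96 and 2.576 quoted above.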

How to use

Control: 2,000 visitors, 80 conversions → p₁ = 80/2000 = 0.04 (4%). Treatment: 2,000 visitors, 100 conversions → p₂ = 100/2000 = 0.05 (5%). Numerator: 0.05 − 0.04 = 0.01. Denominator: √(0.04×0.96/2000 + 0.05×0.95/2000) = √(0.0000192 + 0.00002375) = √0.00004295 ≈ 0.006554. Z = 0.01 / 0.006554 ≈ 1.526. At 95% confidence the critical value is 1.96. Since 1.526 < 1.96, the result is NOT significant at the 95% level, so you need more data before declaring a winner.
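The same worked example can be reproduced step by step in Python. This sketch also reports a two-sided p-value, which the walkthrough above does not mention; it is added here as an assumption about what a reader might want alongside the z-score. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Inputs from the worked example.
n1, x1 = 2000, 80    # control visitors, conversions
n2, x2 = 2000, 100   # treatment visitors, conversions

p1 = x1 / n1         # 0.04
p2 = x2 / n2         # 0.05

# Unpooled standard error of the difference in rates.
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z = (p2 - p1) / se

# Two-sided p-value: chance of a |z| at least this large if the
# two variants truly converted at the same rate.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.3f}, two-sided p = {p_value:.3f}")
```

The p-value comes out above 0.05, which agrees with the conclusion that the result is not significant at 95% confidence.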

Frequently asked questions

What does statistical significance mean in an A/B test?

Statistical significance means that, if there were truly no difference between the variants, the probability of seeing a difference at least as large as the one you observed is below a threshold you set in advance, called the significance level (α). At 95% confidence, α = 0.05, meaning you accept a 5% chance of a false positive (concluding there is a real difference when there isn't). It does not tell you the size of the effect or whether it is practically meaningful: a statistically significant result can still represent a trivially small conversion lift. Always pair significance with effect size and business impact.

How many visitors do I need for a valid A/B test?

Required sample size depends on four factors: your baseline conversion rate, the minimum detectable effect (MDE) you care about, your desired confidence level, and the statistical power you want (80% is a common choice). As a rough guide, detecting a 10% relative lift (e.g., from 5% to 5.5%) at 95% confidence and 80% power requires roughly 31,000 visitors per variant under the usual normal approximation. Use a sample size calculator before starting your test to avoid under-powered experiments, which are a common cause of misleading A/B test results. Checking results repeatedly and stopping as soon as you see a positive result inflates the false-positive rate well above the nominal α.
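A rough per-variant sample size can be estimated with the standard normal-approximation formula for comparing two proportions. The sketch below is illustrative only: the function name sample_size_per_variant and the default of 80% power are assumptions, and dedicated sample size calculators may apply slightly different formulas or corrections.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_lift,
                            confidence=0.95, power=0.80):
    """Approximate visitors needed per variant for a two-sided test.

    baseline      -- control conversion rate, e.g. 0.05
    relative_lift -- minimum detectable effect as a fraction, e.g. 0.10
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    z_beta = nd.inv_cdf(power)                      # quantile for the power target
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Example from the text: 5% baseline, 10% relative lift, 95% confidence.
print(sample_size_per_variant(0.05, 0.10))  # roughly 31,000 per variant
```

Halving the relative lift roughly quadruples the requirement, because the absolute difference appears squared in the denominator.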

Why should I use 99% confidence instead of 95% confidence for some A/B tests?

Use 99% confidence when the cost of a false positive is high, for example when rolling out a change that is expensive to reverse, affects a core revenue flow, or will be seen by all users permanently. The tradeoff is that 99% confidence requires roughly 50% more traffic than a 95% test to reach the same 80% statistical power (the exact figure depends on the power target). For low-stakes UI tweaks or early-stage exploration, 95% is usually sufficient. Many teams also use 90% confidence for initial screening tests and reserve 99% for final go/no-go decisions.
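The extra traffic needed for a stricter confidence level can be estimated from the same normal approximation used for sample sizing, since the required sample scales with (z_α/2 + z_β)². The helper below is a hypothetical sketch; its name and the 80% default power are assumptions, not part of the calculator.

```python
from statistics import NormalDist


def traffic_multiplier(from_confidence, to_confidence, power=0.80):
    """Ratio of required sample sizes when raising the confidence level,
    holding effect size and power fixed (two-sided test)."""
    nd = NormalDist()
    z_beta = nd.inv_cdf(power)
    z_from = nd.inv_cdf(1 - (1 - from_confidence) / 2)
    z_to = nd.inv_cdf(1 - (1 - to_confidence) / 2)
    return ((z_to + z_beta) / (z_from + z_beta)) ** 2


# Moving from 95% to 99% confidence at 80% power.
print(f"{traffic_multiplier(0.95, 0.99):.2f}x")  # about 1.5x
```

The multiplier shrinks as the power target rises and grows as it falls, so any single percentage should be read together with the power it assumes.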