Statistical Power Calculator
Calculates the statistical power of a hypothesis test or the sample size needed to detect a given effect. Use it when designing experiments to ensure your study can reliably detect meaningful differences.
About this calculator
Statistical power (1 − β) is the probability that a test correctly rejects a false null hypothesis. Four quantities are tied together: effect size (Cohen's d = (μ₁ − μ₂) / σ), significance level α, sample size n (per group), and power. Fixing any three determines the fourth. For a two-sided, two-sample z-test with equal group sizes, the non-centrality parameter is NCP = d × √(n/2), and power ≈ 1 − Φ(z_α − NCP), where Φ is the standard normal CDF and z_α is the two-sided critical value (1.96 for α = 0.05). The approximation ignores the negligible chance of rejecting in the wrong direction. To find the required sample size per group: n = 2 × ((z_α + z_β) / d)², where z_β is the standard normal quantile for the target power (0.84 for 80% power). Cohen's benchmarks classify d = 0.2 as small, 0.5 as medium, and 0.8 as large. Low power (< 0.80) raises the risk of a Type II error, that is, missing a real effect.
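The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation: the function names z_test_power and required_n_per_group are made up for this example, the test is assumed to be two-sided with equal group sizes, and the quantiles come from the standard library's statistics.NormalDist rather than rounded table values.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def z_test_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    Hypothetical helper: implements power = 1 - Phi(z_alpha - NCP),
    which ignores the tiny probability of rejecting in the wrong tail.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    ncp = d * sqrt(n_per_group / 2)              # non-centrality parameter
    return 1 - STD_NORMAL.cdf(z_alpha - ncp)


def required_n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size from n = 2 * ((z_alpha + z_beta) / d)^2, rounded up."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)           # about 0.84 for 80% power
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)


n = required_n_per_group(0.5)
print(n, round(z_test_power(0.5, n), 3))
```

With d = 0.5 and the default α = 0.05, required_n_per_group(0.5) gives 63, and z_test_power(0.5, 63) comes out just above 0.80, while 62 per group falls just short.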
How to use
You are planning a two-group study with effect size d = 0.5 and α = 0.05, and you want to know the required sample size for 80% power.
Step 1: set z_α = 1.96 (for α = 0.05) and z_β = 0.84 (for 80% power).
Step 2: apply the formula: n = 2 × ((1.96 + 0.84) / 0.5)² = 2 × (2.80 / 0.5)² = 2 × (5.6)² = 2 × 31.36 = 62.72.
Step 3: round up to n = 63 participants per group (126 total).
Enter d = 0.5, α = 0.05, select 'sample size', and the calculator returns 63.
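The same three steps can be checked by hand in Python. This is only a sketch of the arithmetic, using the rounded z-values from the example. The variable names are illustrative and say nothing about how the calculator itself is written.

```python
from math import ceil

# Inputs from the worked example (rounded table values, as in the text)
d = 0.5          # Cohen's d
z_alpha = 1.96   # two-sided critical value for alpha = 0.05
z_beta = 0.84    # standard normal quantile for 80% power

# Step 2: apply n = 2 * ((z_alpha + z_beta) / d)^2
ratio = (z_alpha + z_beta) / d   # 2.80 / 0.5 = 5.6
n_exact = 2 * ratio ** 2         # 2 * 31.36 = 62.72

# Step 3: round up, because a fractional participant cannot be recruited
n_per_group = ceil(n_exact)
total = 2 * n_per_group

print(f"n = {n_exact:.2f} -> {n_per_group} per group, {total} total")
# prints: n = 62.72 -> 63 per group, 126 total
```

Rounding up rather than to the nearest integer keeps the achieved power at or above the target.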
Frequently asked questions
What is a good level of statistical power for a research study?
The conventional minimum is 0.80 (80%), meaning the study has an 80% chance of detecting a true effect of the specified size and a 20% chance (β = 0.20) of missing it. Some funding bodies and journals recommend 0.90 or higher to reduce the risk of underpowered null results. Power below 0.80 is generally considered insufficient because it makes Type II errors (false negatives) unacceptably likely. The appropriate target ultimately depends on the costs of missing a real effect in your specific domain.
How does effect size influence the required sample size in a power analysis?
Required sample size scales with 1/d², so smaller effects require much larger samples to detect reliably: halving the effect size quadruples the sample needed. For example, at α = 0.05 (two-sided) and 80% power, detecting a small effect (d = 0.2) requires roughly 393 participants per group under the z-test formula above (394 with an exact t-test), while a large effect (d = 0.8) requires only about 25 (26 with a t-test). This is why pilot studies are valuable: even a rough estimate of effect size can prevent costly over- or under-powered designs. Always base effect size on prior literature or a minimally meaningful difference, not on preliminary data alone.
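To see how quickly the requirement grows as the effect shrinks, the z-test formula n = 2 × ((z_α + z_β) / d)² can be tabulated for Cohen's three benchmarks. The sketch below assumes a two-sided test at α = 0.05 and 80% power. The helper name required_n_per_group is hypothetical. Because it uses the normal approximation, its results can come out a participant or so lower than values computed with an exact t-test.

```python
from math import ceil
from statistics import NormalDist


def required_n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n for a two-sided, two-sample z-test (normal approximation)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = std_normal.inv_cdf(power)           # quantile for the target power
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)


# Cohen's small, medium, and large benchmarks
sizes = {d: required_n_per_group(d) for d in (0.2, 0.5, 0.8)}
for d, n in sizes.items():
    print(f"d = {d}: {n} per group")
```

Under these assumptions the sketch yields 393, 63, and 25 participants per group. The unrounded requirement for d = 0.2 is 16 times that for d = 0.8, as the 1/d² relationship predicts for a fourfold smaller effect.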
What is the difference between Type I and Type II errors in hypothesis testing?
A Type I error (false positive) occurs when you reject a true null hypothesis; its probability is controlled by the significance level α. A Type II error (false negative) occurs when you fail to reject a false null hypothesis; its probability is β = 1 − power. Reducing α (e.g., from 0.05 to 0.01) lowers Type I errors but increases Type II errors unless you also raise the sample size. Balancing both error rates is a central goal of study design, and power analysis makes that trade-off explicit.
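The trade-off can be made concrete with a short calculation. The sketch below reuses the two-sample z-test approximation, power = 1 − Φ(z_α − d × √(n/2)), for an illustrative design of d = 0.5 and 63 participants per group. The function names and the chosen design are assumptions made for this example only.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_test_power(d: float, n_per_group: int, alpha: float) -> float:
    """Approximate power of a two-sided, two-sample z-test."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 1 - STD_NORMAL.cdf(z_alpha - d * sqrt(n_per_group / 2))


def required_n_per_group(d: float, alpha: float, power: float) -> int:
    """Per-group n needed to reach the target power, rounded up."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)


d, n = 0.5, 63                                   # illustrative design
power_05 = z_test_power(d, n, alpha=0.05)        # about 0.80
power_01 = z_test_power(d, n, alpha=0.01)        # drops to about 0.59
n_01 = required_n_per_group(d, alpha=0.01, power=0.80)

print(f"alpha = 0.05: power = {power_05:.2f}, beta = {1 - power_05:.2f}")
print(f"alpha = 0.01: power = {power_01:.2f}, beta = {1 - power_01:.2f}")
print(f"n per group to restore 80% power at alpha = 0.01: {n_01}")
```

In this example, tightening α from 0.05 to 0.01 at a fixed sample size roughly doubles β (from about 0.20 to about 0.41), and restoring 80% power takes 94 participants per group instead of 63.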