Hardy-Weinberg Equilibrium Calculator
Verify the Hardy-Weinberg identity p² + 2pq + q² = 1 from two allele frequencies. The identity expresses a foundational population-genetics result: in the absence of evolutionary forces, allele and genotype frequencies remain constant across generations.
Last updated: May 2026
About this calculator
The Hardy-Weinberg principle (Hardy 1908, Weinberg 1908) states that in a sufficiently large, randomly mating population with no mutation, no migration, no selection, and no genetic drift, both allele frequencies (p, q) and genotype frequencies (p², 2pq, q²) remain constant from one generation to the next. For a single locus with two alleles A and a, where p is the frequency of A and q is the frequency of a, the genotype frequencies are p² for AA (homozygous dominant), 2pq for Aa (heterozygous), and q² for aa (homozygous recessive). These must sum to 1: p² + 2pq + q² = (p + q)² = 1.
This calculator computes exactly that sum. Because the sum equals (p + q)², it is 1 only when p + q = 1, so the result is a sanity check that you have entered a valid pair of allele frequencies. Variables: p and q are allele frequencies, each between 0 and 1, and they must satisfy p + q = 1 for a two-allele locus. Edge case: if p + q ≠ 1, the formula still returns a number, but it won't equal 1, which immediately flags an input error.
Real-world value: Hardy-Weinberg is the null model against which population geneticists test for evolutionary change. Deviations from expected genotype frequencies (for example, an excess of heterozygotes or homozygotes) indicate that at least one of the assumptions is violated, whether through selection against a genotype, inbreeding, population structure, migration, or mutation. Hardy-Weinberg is also the basis for inferring carrier frequencies of recessive genetic diseases from observed disease prevalence: if 1 in 10,000 newborns has phenylketonuria (q² = 0.0001, so q = 0.01), then 2pq = 2 · 0.99 · 0.01 ≈ 0.02, so about 1 in 50 people are heterozygous carriers. The principle generalises to more than two alleles (multinomial expansion) and to multiple loci, though the algebra gets denser.
How to use
Example 1: standard two-allele check. Allele A has frequency 0.6, allele a has frequency 0.4. Enter p = 0.6, q = 0.4. Sum = (0.6)² + 2(0.6)(0.4) + (0.4)² = 0.36 + 0.48 + 0.16 = 1.0. ✓ This confirms the allele frequencies are valid (they sum to 1) and gives the expected genotype distribution: 36% AA homozygous, 48% Aa heterozygous, 16% aa homozygous.
Example 2: predicting carrier frequency for a recessive disease. Cystic fibrosis affects roughly 1 in 2500 people of European descent, so q² = 1/2500 = 0.0004, giving q = 0.02. Then p = 1 − 0.02 = 0.98. Enter p = 0.98, q = 0.02. Sum = (0.98)² + 2(0.98)(0.02) + (0.02)² = 0.9604 + 0.0392 + 0.0004 = 1.0. ✓ The heterozygous carrier frequency (2pq) is 0.0392, or about 1 in 25 people. That is a striking number: it quantifies how common carriers of a recessive disease can be even when the disease itself is rare. The same calculation is used to estimate how many carriers a carrier-testing program should expect to find.
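The two examples above can be reproduced with a few lines of Python. This is a minimal sketch, not the calculator's actual implementation; the function names genotype_frequencies, hw_sum and inputs_valid are illustrative assumptions.

```python
import math

def genotype_frequencies(p, q):
    """Expected frequencies of AA, Aa and aa for allele frequencies p and q."""
    return p * p, 2 * p * q, q * q

def hw_sum(p, q):
    """The quantity the calculator reports: p^2 + 2pq + q^2."""
    return sum(genotype_frequencies(p, q))

def inputs_valid(p, q, tol=1e-9):
    """True when both frequencies lie in [0, 1] and add up to 1."""
    return 0 <= p <= 1 and 0 <= q <= 1 and math.isclose(p + q, 1.0, abs_tol=tol)

# Example 1, Example 2, and a deliberately invalid pair (p + q = 1.2).
for p, q in [(0.6, 0.4), (0.98, 0.02), (0.6, 0.6)]:
    aa_dom, het, aa_rec = genotype_frequencies(p, q)
    print(f"p={p}, q={q}: AA={aa_dom:.4f}, Aa={het:.4f}, aa={aa_rec:.4f}, "
          f"sum={hw_sum(p, q):.4f}, valid={inputs_valid(p, q)}")
```

For the invalid pair the sum comes out as (0.6 + 0.6)² = 1.44 rather than 1, which is the input-error signal the calculator relies on.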
Frequently asked questions
What are the assumptions of Hardy-Weinberg equilibrium?
Five strict assumptions: (1) large population size, so there is no genetic drift (random sampling effects that change allele frequencies in small populations); (2) random mating, with no assortative mating and no inbreeding; (3) no mutation, so allele frequencies don't change through new mutations or back-mutations; (4) no migration, meaning no gene flow into or out of the population; (5) no natural selection, so all genotypes have equal fitness. Real populations always violate at least some of these assumptions to some degree; Hardy-Weinberg is a null hypothesis, not a description of any actual population. Deviations from expected H-W genotype frequencies (tested with chi-square) indicate that at least one assumption is violated. The direction of the deviation (excess homozygotes or excess heterozygotes) narrows down the likely cause, but the test alone cannot identify which assumption failed. The principle is most useful as a baseline expectation against which to detect evolutionary forces or to estimate allele frequencies from genotype data when the assumptions approximately hold.
How do I calculate carrier frequency for a recessive genetic disease?
If the disease occurs in 1 out of N newborns, then q² = 1/N (assuming H-W holds for the population). Take the square root to get q = √(1/N). Then p = 1 − q (for a two-allele system). The carrier frequency is 2pq, which for rare diseases is approximately 2q (since p ≈ 1). Example: cystic fibrosis at 1/2500 → q² = 0.0004 → q = 0.02 → carriers 2 · 0.98 · 0.02 ≈ 0.039 ≈ 1 in 25 people. Tay-Sachs at 1/3500 in Ashkenazi Jews → q ≈ 0.017 → carriers ≈ 3.3% or 1 in 30. Phenylketonuria at 1/10000 → q ≈ 0.01 → carriers ≈ 2% or 1 in 50. The pattern: carrier frequencies are dramatically higher than disease frequencies, which is why pre-conception or prenatal carrier screening is so cost-effective for population health.
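The carrier-frequency recipe above can be sketched as a short Python function. It assumes H-W proportions and a two-allele locus, exactly as the text does; the name carrier_frequency is an illustrative choice, and the prevalence figures are the ones quoted above.

```python
import math

def carrier_frequency(prevalence):
    """Carrier frequency 2pq for a recessive disease with prevalence q^2.

    Assumes Hardy-Weinberg proportions and a two-allele locus.
    """
    if not 0 <= prevalence <= 1:
        raise ValueError("prevalence must be between 0 and 1")
    q = math.sqrt(prevalence)  # recessive allele frequency
    p = 1 - q                  # frequency of the other allele
    return 2 * p * q

# Prevalence figures taken from the examples in the text (1 in N).
for name, one_in in [("cystic fibrosis", 2500),
                     ("Tay-Sachs", 3500),
                     ("phenylketonuria", 10000)]:
    print(f"{name}: 1 in {one_in} affected, "
          f"carriers = {carrier_frequency(1 / one_in):.2%}")
```

The function returns the exact 2pq rather than the 2q shortcut; for rare diseases the two differ only slightly, because p is close to 1.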
How do I test whether a population is in Hardy-Weinberg equilibrium?
Compare observed genotype counts to expected counts using a chi-square goodness-of-fit test. Step 1: calculate p and q from the observed genotype counts by counting alleles, not genotypes: p = (2·AA + Aa) / (2N) and q = 1 − p, where AA and Aa are the observed counts and N = total individuals. Step 2: compute expected genotype counts as p²·N (AA), 2pq·N (Aa), q²·N (aa). Step 3: χ² = Σ (observed − expected)² / expected, summed over the three genotypes. Step 4: compare against the chi-square distribution with degrees of freedom = (number of genotypes − number of alleles) = 3 − 2 = 1 for a biallelic locus; the critical value at the 0.05 level is 3.84. A test P-value below 0.05 (not to be confused with the allele frequency p) suggests the population deviates significantly from H-W expectations. Common causes: assortative mating, inbreeding (excess homozygotes), heterozygote advantage (excess heterozygotes), recent migration, sampling bias.
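The four steps above can be sketched in plain Python. This is an illustrative implementation rather than a reference one: the function name hwe_chi_square and the genotype counts in the usage lines are hypothetical. For one degree of freedom the upper-tail probability of the chi-square distribution equals erfc(√(χ²/2)), so the standard library is enough.

```python
import math

def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square goodness-of-fit test against Hardy-Weinberg proportions.

    Returns (chi2, p_value) for a biallelic locus (1 degree of freedom).
    """
    n = n_AA + n_Aa + n_aa
    # Step 1: allele frequencies by counting alleles (two per individual).
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1 - p
    # Step 2: expected genotype counts under H-W.
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    observed = [n_AA, n_Aa, n_aa]
    if min(expected) == 0:
        raise ValueError("locus is monomorphic; the test is undefined")
    # Step 3: chi-square statistic summed over the three genotypes.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Step 4: upper-tail probability for 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: the first sample matches H-W exactly (p = 0.6),
# the second has the same allele frequency but too few heterozygotes.
print(hwe_chi_square(360, 480, 160))
print(hwe_chi_square(400, 400, 200))
```

The chi-square approximation becomes unreliable when an expected count is small, so this sketch is best read as a companion to the steps rather than a replacement for dedicated statistical software.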
What are the most common mistakes people make with Hardy-Weinberg?
The first is forgetting that p + q must equal 1 for a two-allele locus; entering p = 0.6, q = 0.6 produces nonsense (a sum of 1.44) because the two alleles can't each make up 60% of the gene pool. The second is confusing allele frequency (p, q) with genotype frequency (p², 2pq, q²); 30% of people having the disease genotype (q² = 0.3) is very different from a 30% allele frequency (q = 0.3, q² = 0.09). The third is applying H-W expectations to small populations (fewer than roughly 1000 individuals), where genetic drift can produce substantial deviation even without selection. The fourth is assuming H-W applies across populations; pooling genotype counts from two populations whose allele frequencies differ produces an apparent excess of homozygotes (the Wahlund effect) even if each subpopulation is itself in H-W. The fifth is using H-W for sex-linked loci without adjustment; X-linked traits have different expected frequencies in males (hemizygous) and females (homozygous or heterozygous).
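The Wahlund effect mentioned above is easy to see numerically. The sketch below uses two hypothetical subpopulations of equal size with allele frequencies 0.2 and 0.8 (values chosen only for illustration), each in perfect H-W proportions, and compares the pooled genotype frequencies with what H-W would predict from the pooled allele frequency.

```python
def hw_genotypes(p):
    """Expected (AA, Aa, aa) frequencies for allele frequency p."""
    q = 1 - p
    return p * p, 2 * p * q, q * q

# Hypothetical subpopulations of equal size, each in H-W proportions.
p1, p2 = 0.2, 0.8

# Genotype frequencies actually present in the pooled sample.
pooled_observed = [(a + b) / 2 for a, b in zip(hw_genotypes(p1), hw_genotypes(p2))]

# What H-W would predict from the pooled allele frequency alone.
pooled_p = (p1 + p2) / 2
pooled_expected = hw_genotypes(pooled_p)

for label, obs, exp in zip(("AA", "Aa", "aa"), pooled_observed, pooled_expected):
    print(f"{label}: observed {obs:.2f}, expected {exp:.2f}")
```

With these numbers the pooled sample contains 32% heterozygotes where H-W predicts 50%, so the homozygote classes are inflated even though neither subpopulation deviates on its own.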
When should I not use this calculator?
Skip it for loci with more than two alleles (many real loci have several variants); use the multinomial expansion instead (for three alleles with frequencies p, q, r: p² + q² + r² + 2pq + 2pr + 2qr = (p + q + r)² = 1) or specialised population-genetics software. Don't use it for X-linked or Y-linked traits without adjusting for hemizygous males; the expected frequencies differ between sexes for sex-linked loci. Avoid it for populations known to violate H-W assumptions strongly (small isolated populations, populations with strong recent migration or selection, founder populations); the equilibrium frequencies don't apply meaningfully. It's the wrong tool for analysing multi-locus haplotypes, linkage disequilibrium, or polygenic traits where multiple loci interact. Finally, don't use it to "prove" no selection is occurring; the H-W test has limited power to detect mild selection over a few generations, and absence of evidence isn't evidence of absence.
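For readers who do need the multi-allele case, the expansion mentioned above can be sketched for any number of alleles. The function name multiallele_genotypes and the allele labels and frequencies in the usage lines are hypothetical, chosen only to illustrate the pattern: homozygotes get the squared frequency and each heterozygote gets twice the product.

```python
import math
from itertools import combinations_with_replacement

def multiallele_genotypes(freqs):
    """Expected H-W genotype frequencies for any number of alleles.

    freqs maps allele name -> frequency; the frequencies must sum to 1.
    """
    if not math.isclose(sum(freqs.values()), 1.0, abs_tol=1e-9):
        raise ValueError("allele frequencies must sum to 1")
    result = {}
    # Every unordered pair of alleles, including an allele paired with itself.
    for a, b in combinations_with_replacement(freqs, 2):
        if a == b:
            result[(a, b)] = freqs[a] ** 2           # homozygote
        else:
            result[(a, b)] = 2 * freqs[a] * freqs[b]  # heterozygote
    return result

# Hypothetical three-allele locus.
genos = multiallele_genotypes({"A": 0.5, "B": 0.3, "C": 0.2})
for pair, freq in genos.items():
    print(pair, round(freq, 4))
print("sum =", round(sum(genos.values()), 4))
```

Three alleles give six genotype classes, matching the six terms of the three-allele formula, and the frequencies again sum to 1.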