Question 1

What does the chi-square statistic actually measure?

Accepted Answer

The chi-square statistic measures the discrepancy between observed counts and the counts expected under a null hypothesis. Each cell contributes (O − E)²/E, its squared deviation scaled by its expected count, and the statistic is the sum of these contributions over all cells. A larger chi-square means the observed data deviate more from H₀, which is stronger evidence against the null. The statistic is converted to a p-value using the chi-square distribution with the appropriate degrees of freedom: a small p-value means the observed pattern would be unlikely if H₀ were true, and leads to rejecting H₀. The weighting by 1/E means small expected counts have outsized influence: a cell with E = 2 and O = 5 contributes (3)²/2 = 4.5, while a cell with E = 100 and O = 110 contributes only (10)²/100 = 1.0. This is one reason ‘expected counts ≥ 5’ is the rule-of-thumb requirement for chi-square validity: with smaller expected counts the test becomes sensitive to small absolute deviations that may not be meaningful, and the chi-square approximation to the sampling distribution becomes unreliable.
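To make the 1/E weighting concrete, here is a minimal Python sketch that reproduces the two per-cell contributions quoted in this answer. The function name `chi_square_contributions` is made up for illustration, and the two cells are not a complete table, so the sum is not a usable test statistic on its own.

```python
def chi_square_contributions(observed, expected):
    """Return each cell's contribution (O - E)^2 / E to the chi-square statistic."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


# The two example cells from the answer: (O=5, E=2) and (O=110, E=100)
observed = [5, 110]
expected = [2, 100]

contributions = chi_square_contributions(observed, expected)
print(contributions)  # [4.5, 1.0]
```

The cell with the small expected count dominates even though its absolute deviation (3) is much smaller than the other cell's (10).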

Question 2

What is the difference between goodness-of-fit and tests of independence?

Accepted Answer

A goodness-of-fit test checks whether observed category frequencies match a single expected distribution: for example, whether a die is fair (expected proportion 1/6 for each face), whether colour preferences match a marketer’s hypothesis, or whether genetic-cross outcomes match Mendelian ratios. Degrees of freedom = (number of categories) − 1 − (number of parameters estimated from the data). A test of independence checks whether two categorical variables in a contingency table are associated: for example, whether smoking status is related to lung cancer, or product preference to age group. Degrees of freedom = (rows − 1) × (columns − 1). Both use the same per-cell formula, (O − E)²/E, and the same chi-square reference distribution, but the degrees of freedom and the way expected counts are computed differ. Goodness-of-fit takes its expected counts from a hypothesised distribution (E = n × hypothesised proportion); independence derives them from the row and column marginals (E = row total × column total / n).
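The two ways of computing expected counts and degrees of freedom can be sketched side by side. This is an illustrative Python sketch, not the calculator's own code: the function names and the die-roll and 2×2 counts are invented for the example.

```python
def goodness_of_fit(observed, proportions, estimated_params=0):
    """Chi-square statistic and df against a hypothesised distribution."""
    n = sum(observed)
    expected = [n * p for p in proportions]  # E = n * hypothesised proportion
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - estimated_params
    return stat, df


def independence(table):
    """Chi-square statistic and df for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # E from the marginals
            stat += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical data: 60 die rolls tested against a fair die
die_rolls = [8, 12, 9, 11, 10, 10]
gof_stat, gof_df = goodness_of_fit(die_rolls, [1 / 6] * 6)
print(round(gof_stat, 6), gof_df)  # statistic close to 1.0, df = 5

# Hypothetical 2x2 table: rows are groups, columns are outcomes
table = [[20, 30], [30, 20]]
ind_stat, ind_df = independence(table)
print(round(ind_stat, 6), ind_df)  # statistic 4.0, df = 1
```

Both functions sum the same (O − E)²/E terms; only the source of E and the df formula change.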

Question 3

What are the assumptions of the chi-square test?

Accepted Answer

There are three main assumptions: (1) observations are independent, meaning each subject contributes to exactly one cell; (2) expected counts are large enough, typically Eᵢ ≥ 5 in every cell, for the asymptotic chi-square distribution to apply (some textbooks relax this to allow up to 20% of cells with E below 5, provided no cell has E < 1); (3) the data are a random sample from the population, not biased or self-selected. When these are violated, switch tests: paired or repeated-measures data need McNemar’s test, and sparse contingency tables (many cells with low expected counts) need Fisher’s exact test. A continuity correction (Yates’ correction) is sometimes applied to 2×2 tables to improve the approximation, though modern practice often skips it because it is overly conservative. For very small samples (total n < 20) prefer exact tests; as a rough guide, the chi-square approximation becomes dependable only around n ≥ 30–50.
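The expected-count assumption is easy to check mechanically. The sketch below is one possible Python check, covering both the strict rule (every E ≥ 5) and the relaxed textbook rule (at most 20% of cells below 5, none below 1). The function name, the returned keys, and the sample expected counts are all hypothetical.

```python
def check_expected_counts(expected):
    """Check a flat list of expected counts against the two rules of thumb."""
    n_cells = len(expected)
    n_below_5 = sum(1 for e in expected if e < 5)
    smallest = min(expected)
    return {
        "min_expected": smallest,
        "cells_below_5": n_below_5,
        # Strict rule: every expected count is at least 5
        "strict_ok": n_below_5 == 0,
        # Relaxed rule: at most 20% of cells below 5 and no cell below 1
        # (5 * n_below_5 <= n_cells avoids floating-point percentages)
        "relaxed_ok": smallest >= 1 and 5 * n_below_5 <= n_cells,
    }


# Hypothetical expected counts for a five-category goodness-of-fit test
result = check_expected_counts([20.0, 15.0, 6.0, 5.0, 4.0])
print(result["strict_ok"], result["relaxed_ok"])  # False True
```

For a contingency table, flatten the table of expected counts into a single list before calling the function. When even the relaxed rule fails, an exact test is the safer choice.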

Question 4

What are the most common mistakes people make with chi-square tests?

Accepted Answer

The first is using chi-square on continuous data: chi-square is for categorical (count) data, and continuous variables call for t-tests, ANOVA, or correlations. The second is running the test when expected counts fall below the rule-of-thumb threshold (E < 5), which can inflate the false-positive rate or produce misleading p-values; use Fisher’s exact test in that regime. The third is using chi-square on paired data (the same subject measured twice) instead of McNemar’s test. The fourth is computing the statistic without dividing by the expected count: Σ(O − E)² on its own is not chi-square, which is Σ(O − E)²/E. The fifth is forgetting that significance depends on degrees of freedom: at α = 0.05, a χ² of 5 is significant at df = 1 but not at df = 5. The sixth is interpreting a significant χ² as evidence about which specific cells differ; for that you need post-hoc analysis (standardised residuals or pairwise comparisons with multiple-testing correction). The seventh is treating chi-square as a measure of effect size, which it is not; use Cramér’s V or Cohen’s w for effect size in contingency tables.
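The degrees-of-freedom point can be checked numerically. The sketch below computes the chi-square upper-tail probability using only the Python standard library, via the closed-form expressions that hold for integer df. The function name `chi2_sf` is an assumption for this example; in practice a statistics library would normally be used instead.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for chi-square with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum of y^j / j! for j = 0 .. df/2 - 1
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= y / j
            total += term
        return math.exp(-y) * total
    # Odd df: the df = 1 tail is erfc(sqrt(y)); each step of 2 in df adds a term
    tail = math.erfc(math.sqrt(y))
    for j in range(df // 2):
        tail += math.exp(-y) * y ** (j + 0.5) / math.gamma(j + 1.5)
    return tail


p_df1 = chi2_sf(5.0, 1)
p_df5 = chi2_sf(5.0, 5)
print(round(p_df1, 3))  # about 0.025, below 0.05
print(round(p_df5, 3))  # about 0.416, well above 0.05
```

The same statistic of 5 gives a p-value below 0.05 at df = 1 and far above it at df = 5, so a χ² value should never be reported without its degrees of freedom.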

Question 5

When should I not use this calculator?

Accepted Answer

Skip it for continuous data: chi-square is only for counts of categorical observations. Avoid it when any expected cell count is less than 5, and use Fisher’s exact test or simulation-based p-values instead. It is the wrong tool for paired or repeated-measures categorical data; use McNemar’s test for paired 2×2 tables, or the Cochran-Mantel-Haenszel test for stratified designs. Do not use it for testing differences in means or variances, which require t-tests, ANOVA, or Levene’s test. Skip it for very small total samples (n < 20), where exact tests give more reliable answers. And do not use it as the sole tool for assessing the relationship between two categorical variables: pair it with an effect size (Cramér’s V, Cohen’s w) and a visualisation (mosaic plot, stacked bar chart) so readers can see both significance and magnitude.
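As a rough illustration of pairing the test with an effect size, the sketch below computes Cohen’s w = √(χ²/n) and Cramér’s V = √(χ²/(n × min(rows − 1, columns − 1))) for a contingency table. The function name and the 2×2 counts are hypothetical, and the code assumes every row and column total is non-zero.

```python
import math


def effect_sizes(table):
    """Return (chi-square, Cohen's w, Cramer's V) for a table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected under independence
            chi2 += (o - e) ** 2 / e
    k = min(len(table) - 1, len(table[0]) - 1)
    w = math.sqrt(chi2 / n)
    v = math.sqrt(chi2 / (n * k))
    return chi2, w, v


# Hypothetical 2x2 table with n = 100
chi2, w, v = effect_sizes([[20, 30], [30, 20]])
print(round(chi2, 3), round(w, 3), round(v, 3))  # 4.0 0.2 0.2
```

For a 2×2 table the two measures coincide because min(rows − 1, columns − 1) = 1. Here χ² = 4.0 is significant at α = 0.05 with df = 1, yet the effect size of 0.2 is modest, which is exactly the distinction between significance and magnitude.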

Chi-Square Test Calculator
