Question 1

How strong does a correlation need to be to "matter"?

Accepted Answer

There is no universal threshold; context dominates. A rough convention calls |r| > 0.7 strong, 0.4–0.7 moderate, and below 0.4 weak, but these cut-offs are field-dependent. In physics or engineering, where measurements are precise and the underlying relationship is often close to deterministic, you would typically expect |r| above 0.95 for a real relationship, and anything much lower should make you suspicious. In the social and behavioural sciences, r values of 0.3–0.5 are routinely treated as meaningful because the underlying phenomena are noisy. r² is often the more practical statistic: r = 0.5 gives r² = 0.25, so a linear fit on X accounts for only a quarter of the variance in Y, and the remaining three quarters comes from other factors or noise. Always report the sample size and ideally a confidence interval for r, because small samples can produce dramatic-looking correlations purely by chance.
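
The effect of sample size becomes concrete when you put a confidence interval around the same r at several values of n. The sketch below uses Fisher's z-transform, a standard large-sample approximation; it is not necessarily what this calculator does internally, and the function name `fisher_ci` and the 1.96 critical value (a 95% interval) are our own choices.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for Pearson r via Fisher's z-transform.

    z = atanh(r) is roughly normal with standard error 1 / sqrt(n - 3),
    so the interval is built on the z scale and mapped back with tanh.
    """
    if n <= 3:
        raise ValueError("Fisher's approximation needs n > 3")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


r = 0.5
for n in (10, 30, 100, 1000):
    lo, hi = fisher_ci(r, n)
    print(f"r = {r:.2f} (r^2 = {r * r:.2f}), n = {n:4d}: 95% CI ({lo:+.2f}, {hi:+.2f})")
```

With n = 10 the interval runs from roughly -0.19 to +0.86 and includes zero, so an observed r = 0.5 is compatible with no linear relationship at all; by n = 1000 it narrows to roughly 0.45 to 0.55.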

Question 2

Does correlation imply causation?

Accepted Answer

No, and this is the most repeated warning in statistics for good reason. A non-zero r tells you only that two variables move together linearly in your sample; it does not say that one causes the other. Keep four common alternative explanations in mind. (1) Reverse causation: Y may cause X rather than the other way around. (2) Confounding: a third variable Z drives both X and Y, producing a spurious correlation between them. (3) Selection bias: the way the sample was collected over-represents pairs where X and Y happen to align. (4) Coincidence: if you examine enough variable pairs, especially in small samples, some will show sizeable correlations by chance alone. Establishing causation requires controlled experiments (randomised assignment), natural experiments, instrumental variables, or rigorous causal-inference techniques (DAGs, propensity scores). Correlation is a useful first clue, never a conclusion.
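
Confounding is easy to reproduce in a small simulation. In the sketch below, a hypothetical variable `z` drives both `x` and `y`, and there is no direct link between `x` and `y`; the data are synthetic, not from any real study. Removing the linear effect of `z` from both variables (a partial correlation computed from regression residuals) makes the association vanish.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(v, z):
    """Residuals from an ordinary least-squares fit of v on z (with intercept)."""
    n = len(v)
    mz, mv = sum(z) / n, sum(v) / n
    slope = sum((a - mz) * (b - mv) for a, b in zip(z, v)) / sum((a - mz) ** 2 for a in z)
    intercept = mv - slope * mz
    return [b - (intercept + slope * a) for a, b in zip(z, v)]


rng = random.Random(42)  # fixed seed so the run is reproducible
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical confounder
x = [zi + rng.gauss(0, 1) for zi in z]   # x depends only on z
y = [zi + rng.gauss(0, 1) for zi in z]   # y depends only on z, never on x

r_xy = pearson(x, y)
r_xy_given_z = pearson(residuals(x, z), residuals(y, z))
print(f"r(x, y)     = {r_xy:.3f}")          # sizeable, produced entirely through z
print(f"r(x, y | z) = {r_xy_given_z:.3f}")  # near zero once z is adjusted for
```

By construction r(x, y) lands near 0.5. This adjustment only works because z was measured; an unmeasured confounder cannot be removed this way, which is one reason randomised experiments carry more weight.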

Question 3

When should I use Pearson r vs Spearman ρ vs Kendall τ?

Accepted Answer

Use Pearson r when both variables are continuous and the relationship is approximately linear; approximate normality matters mainly for the usual p-values and confidence intervals, not for computing r itself. Use Spearman's rank correlation (ρ) when the relationship is monotonic but not linear (Y consistently increases with X, but not along a straight line), or when your data contain influential outliers: Spearman operates on ranks rather than raw values and is therefore much less sensitive to extreme values. Use Kendall's τ when the sample is small (n < 20), when there are many tied ranks, or when you want a more conservative measure of association (Kendall's τ is typically smaller in magnitude than Spearman's ρ on the same data). All three measure association, but only Pearson measures linear association; Spearman and Kendall measure monotonic association. If a scatter plot shows a clear curve, Pearson will understate the strength of the relationship. If the curve is monotonic, switch to Spearman or Kendall; if it changes direction (a U or inverted U), all three can be near zero, so fit a non-linear model instead.
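
The difference between linear and monotonic association shows up clearly on two toy data sets. The sketch below computes all three coefficients from scratch (Spearman as Pearson on average ranks, Kendall as the tau-b variant, which adjusts for ties); in practice a statistics library would normally be used, and the helper names here are our own.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b via an O(n^2) scan over all pairs."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: excluded from both factors
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    # Pairs not tied on x, and pairs not tied on y, form the tau-b denominator.
    not_tied_x = concordant + discordant + tied_y_only
    not_tied_y = concordant + discordant + tied_x_only
    return (concordant - discordant) / math.sqrt(not_tied_x * not_tied_y)


xs = [float(v) for v in range(1, 11)]
exp_ys = [math.exp(v) for v in xs]   # monotonic but strongly curved
u_xs = [float(v) for v in range(-5, 6)]
u_ys = [v * v for v in u_xs]         # symmetric U shape: not monotonic

for label, x, y in (("exponential", xs, exp_ys), ("U-shape", u_xs, u_ys)):
    print(f"{label:12s} pearson={pearson(x, y):+.3f} "
          f"spearman={spearman(x, y):+.3f} kendall={kendall_tau_b(x, y):+.3f}")
```

On the exponential data Spearman and Kendall both equal 1 because the ordering is perfectly preserved, while Pearson is noticeably lower (about 0.72). On the U-shape all three are zero, which is why a curve that changes direction calls for a non-linear model rather than a rank correlation.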

Question 4

What are the most common mistakes people make computing or interpreting correlation?

Accepted Answer

The first is reporting r without ever looking at the scatter plot. Anscombe's quartet famously shows four data sets with identical r ≈ 0.82 but completely different shapes: one roughly linear, one smoothly curved, one almost perfectly linear apart from a single outlier, and one in which every x value but one is identical, so a single high-leverage point creates the whole correlation. The second is conflating r with r²: r = 0.4 sounds substantial, but r² = 0.16 means a linear fit on X accounts for only 16% of Y's variance, which is often unimpressive. The third is treating r as causal evidence, covered above. The fourth is failing to spot outlier-driven correlations: a single extreme point can push r from about 0 to 0.6 or higher when there is no real relationship in the bulk of the data, or it can hide a strong relationship in the bulk. The fifth is computing r on truncated data (a restricted range of X) and concluding "no relationship" because r drops sharply; range restriction typically attenuates r even when the underlying relationship is strong.
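
Three of these pitfalls can be checked numerically. The sketch below uses Anscombe's published quartet values, a synthetic 3 × 3 grid with one added extreme point, and synthetic noisy linear data whose x range is then narrowed; apart from Anscombe's data, all numbers are made up for illustration.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# 1. Anscombe's quartet: sets I-III share the same x values; set IV has its own.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
anscombe = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}
for name, (x, y) in anscombe.items():
    print(f"Anscombe {name:3s}: r = {pearson(x, y):.3f}")  # all four about 0.82

# 2. Outlier-driven correlation: every (x, y) combination of 1..3 gives r = 0,
#    then a single extreme point at (20, 20) is appended.
grid_x = [float(a) for a in (1, 2, 3) for _ in range(3)]
grid_y = [float(b) for _ in range(3) for b in (1, 2, 3)]
r_grid = pearson(grid_x, grid_y)
r_outlier = pearson(grid_x + [20.0], grid_y + [20.0])
print(f"grid only: r = {r_grid:.3f}; grid + one outlier: r = {r_outlier:.3f}")

# 3. Range restriction: a strong linear relationship over x in [0, 100),
#    then only the slice 45 <= x < 55 is kept.
rng = random.Random(1)
xs = [i / 10 for i in range(1000)]
ys = [x + rng.gauss(0, 10) for x in xs]  # linear signal plus noise
narrow = [(x, y) for x, y in zip(xs, ys) if 45 <= x < 55]
r_full = pearson(xs, ys)
r_narrow = pearson([x for x, _ in narrow], [y for _, y in narrow])
print(f"full range: r = {r_full:.2f}; restricted range: r = {r_narrow:.2f}")
```

The single point at (20, 20) lifts r from exactly 0 to about 0.98, and the same underlying line gives a much weaker r once only a narrow band of x is kept.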

Question 5

When should I not use this calculator?

Accepted Answer

Skip it for non-linear relationships: Pearson r can substantially understate the strength of curved associations, and for a symmetric U-shaped (quadratic) pattern it can be close to zero. Do not use it on ordinal or rank data; use Spearman ρ or Kendall τ instead. It is the wrong tool when one of your variables is categorical: use point-biserial correlation for a continuous variable paired with a binary one, phi for two binary variables, or Cramér's V for nominal variables with more than two categories. Avoid it for time-series data without first checking for autocorrelation and trend, because both can inflate r without any genuine cross-variable relationship. Do not use it for very small samples (n < 5); confidence intervals on r are extremely wide there, and small-sample r values are essentially noise. Finally, never report a single correlation coefficient as evidence of a relationship without also showing the scatter plot, the sample size, and ideally a confidence interval or p-value.
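
The time-series warning is easy to see with two random walks generated independently of each other: their levels often show large correlations, while their period-to-period changes do not. The sketch below averages |r| over repeated simulated trials; differencing is one common remedy for trends, not the only one, and the seed, trial count and series length are arbitrary choices.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def random_walk(rng, n):
    """Cumulative sum of independent standard-normal steps."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        path.append(level)
    return path


def differences(series):
    """First differences: series[t] - series[t - 1]."""
    return [b - a for a, b in zip(series, series[1:])]


rng = random.Random(7)
trials, n = 50, 500
abs_r_levels, abs_r_diffs = [], []
for _ in range(trials):
    a = random_walk(rng, n)
    b = random_walk(rng, n)  # independent of a by construction
    abs_r_levels.append(abs(pearson(a, b)))
    abs_r_diffs.append(abs(pearson(differences(a), differences(b))))

mean_levels = sum(abs_r_levels) / trials
mean_diffs = sum(abs_r_diffs) / trials
print(f"mean |r| on levels:      {mean_levels:.2f}")  # typically large
print(f"mean |r| on differences: {mean_diffs:.2f}")   # close to zero
```

Because the two walks share no cause, any sizeable r on the levels is spurious; it comes from both series drifting over time, which is exactly what a trend check is meant to catch.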

Correlation Coefficient Calculator
