Correlation Coefficient Calculator

Compute the Pearson correlation coefficient r and coefficient of determination R² from summary statistics of paired data. Use it to measure the linear relationship between two variables in research or data analysis.

About this calculator

The Pearson correlation coefficient r measures the strength and direction of the linear relationship between two variables. It ranges from −1 (perfect negative) to +1 (perfect positive), with 0 indicating no linear association. The formula is: r = (n·ΣXY − ΣX·ΣY) / √[(n·ΣX² − (ΣX)²) · (n·ΣY² − (ΣY)²)]. The inputs are summary statistics computed from the raw data: ΣX (sum of X), ΣY (sum of Y), ΣXY (sum of the products X·Y), ΣX² (sum of squared X), ΣY² (sum of squared Y), and n (number of pairs). If either variable is constant, the denominator is zero and r is undefined. The coefficient of determination R² = r² gives the proportion of the variance in Y that is explained by a straight-line fit on X. For example, r = 0.9 means R² = 0.81, so 81% of the variability in Y is accounted for by the linear relationship with X. Correlations with |r| above 0.7 are generally considered strong, though the threshold varies by field.
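The formula above translates directly into code. The following is a minimal sketch in plain Python, not the calculator's actual implementation; the function name pearson_from_sums and its parameter names are illustrative assumptions.

```python
import math


def pearson_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Return (r, r_squared) computed from summary statistics of paired data."""
    numerator = n * sum_xy - sum_x * sum_y
    x_term = n * sum_x2 - sum_x ** 2  # n^2 times the population variance of X
    y_term = n * sum_y2 - sum_y ** 2  # n^2 times the population variance of Y
    if x_term <= 0 or y_term <= 0:
        # A constant variable makes the denominator zero, so r is undefined.
        raise ValueError("r is undefined when X or Y has zero variance")
    r = numerator / math.sqrt(x_term * y_term)
    return r, r * r


# Hypothetical data where Y = 2X exactly: X = 1..5, Y = 2, 4, 6, 8, 10.
r, r_squared = pearson_from_sums(
    n=5, sum_x=15, sum_y=30, sum_xy=110, sum_x2=55, sum_y2=220
)
print(r, r_squared)
```

Because Y is an exact linear function of X in this made-up data set, both r and R² come out as 1.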

How to use

Take n = 4 pairs: (1,2), (2,4), (3,5), (4,8). Compute the sums: ΣX = 1+2+3+4 = 10, ΣY = 2+4+5+8 = 19, ΣXY = 1·2+2·4+3·5+4·8 = 2+8+15+32 = 57, ΣX² = 1+4+9+16 = 30, ΣY² = 4+16+25+64 = 109. Now apply the formula: numerator = 4·57 − 10·19 = 228 − 190 = 38. Denominator = √[(4·30 − 10²)·(4·109 − 19²)] = √[(120 − 100)·(436 − 361)] = √[20 · 75] = √1500 ≈ 38.73. r = 38 / 38.73 ≈ 0.981. R² = 38² / 1500 = 1444 / 1500 ≈ 0.963, meaning about 96% of the variance in Y is explained by X.
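The arithmetic in this example can be checked with a short script. This is a minimal sketch in plain Python; the variable names are illustrative and are not part of the calculator.

```python
import math

# The four (X, Y) pairs from the worked example.
pairs = [(1, 2), (2, 4), (3, 5), (4, 8)]

n = len(pairs)
sum_x = sum(x for x, _ in pairs)       # 10
sum_y = sum(y for _, y in pairs)       # 19
sum_xy = sum(x * y for x, y in pairs)  # 57
sum_x2 = sum(x * x for x, _ in pairs)  # 30
sum_y2 = sum(y * y for _, y in pairs)  # 109

numerator = n * sum_xy - sum_x * sum_y  # 38
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

r = numerator / denominator
r_squared = r * r

print(f"r = {r:.4f}, R^2 = {r_squared:.4f}")
```

Printing four decimal places makes the effect of rounding visible: squaring an already rounded r gives a slightly different last digit than computing R² from the unrounded value.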

Frequently asked questions

What does a Pearson correlation coefficient of 0.7 mean in practice?

A correlation of r = 0.7 indicates a strong positive linear relationship: as one variable increases, the other tends to increase consistently. The coefficient of determination R² = 0.49 means 49% of the variation in the response variable is explained by the predictor. In social sciences, r = 0.7 is considered high; in physical sciences the bar is often higher. It does not imply causation — a third variable could be driving both.

How is the Pearson r different from Spearman rank correlation?

Pearson r measures linear association between two continuous variables. Computing it does not require normality, but the usual significance tests and confidence intervals for r assume the data are roughly bivariate normal. Spearman's rank correlation measures monotonic (not necessarily linear) association by ranking the values first, making it suitable for ordinal data or non-normal distributions. When the relationship is linear and the data are roughly normal, Pearson r is generally the more statistically powerful of the two. However, a single extreme outlier can drastically distort Pearson r while having far less effect on Spearman's coefficient, making Spearman the safer choice for skewed or outlier-prone data.
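The outlier effect is easy to see on a small made-up data set. The sketch below computes Spearman's coefficient as Pearson r applied to ranks, with tied values given their average rank. The data and the helper names (pearson, average_ranks, spearman) are illustrative assumptions, not output from this calculator.

```python
import math


def pearson(xs, ys):
    """Pearson r from two equal-length lists, using the summary-statistic formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson r computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: Y rises steadily with X, except for one extreme last value.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 3, 4, 5, 6, 100]

print(f"Pearson r  = {pearson(xs, ys):.3f}")
print(f"Spearman   = {spearman(xs, ys):.3f}")
```

The relationship is perfectly monotonic, so the rank-based coefficient is unaffected by how large the last Y value is, while Pearson r drops well below 1 because the points no longer lie near a straight line.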

Why can a high correlation coefficient still indicate a weak relationship for prediction?

R² (the square of r) tells you the proportion of variance in Y explained by X, and it is always smaller in magnitude than r unless r is 0 or ±1. For instance, r = 0.5 gives R² = 0.25, meaning only 25% of the variability is accounted for, leaving 75% unexplained. When judging how useful X is for predicting Y, look at R² rather than r alone. Additionally, correlation only captures linear patterns; a strong curved relationship may show r ≈ 0 even though the two variables are tightly related in a non-linear way.
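A small made-up example illustrates the non-linear case: when Y = X² and the X values are symmetric around zero, Y is completely determined by X, yet the Pearson formula returns zero. The sketch below is plain Python with illustrative variable names.

```python
import math

# Hypothetical data: Y is an exact function of X, but the relationship is curved.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

# Positive and negative X values cancel, so both sum_x and sum_xy are zero.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"r = {r:.3f}, R^2 = {r * r:.3f}")
```

A scatter plot of these points would show a clear parabola, which is why it is worth plotting the data before relying on r.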