
Linear Regression Calculator

Find the best-fit straight line through your data by calculating the slope and intercept of a linear regression model. Use it to predict outcomes, quantify relationships, and assess how well X explains Y.

About this calculator

Simple linear regression models the relationship between two variables as ŷ = a + bx, where b is the slope and a is the intercept. The slope is computed from summary statistics as b = (n·ΣXY − ΣX·ΣY) / (n·ΣX² − (ΣX)²), which minimises the sum of squared residuals (the ordinary least squares criterion). The denominator is zero when every X value is identical, in which case no slope can be fitted. Once b is known, the intercept is a = (ΣY − b·ΣX) / n, or equivalently a = ȳ − b·x̄. The correlation coefficient r measures the strength and direction of the linear relationship, ranging from −1 to +1, while r² (the coefficient of determination) gives the proportion of variance in Y explained by X. The slope and intercept require only five aggregate inputs (n, ΣX, ΣY, ΣXY, and ΣX²), which makes them efficient for large datasets; computing r and r² additionally requires ΣY².
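The slope and intercept formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation: the function name `fit_from_sums` and its parameter names are assumptions, and the three sample points are hypothetical, chosen because they lie exactly on y = 1 + 2x.

```python
def fit_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (slope, intercept) of the least-squares line from five sums."""
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        # Happens when every X value is the same: the line would be vertical.
        raise ValueError("all X values are identical; slope is undefined")
    slope = (n * sum_xy - sum_x * sum_y) / denominator
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


# Hypothetical points (0, 1), (1, 3), (2, 5):
# n = 3, sum X = 3, sum Y = 9, sum XY = 13, sum X^2 = 5
slope, intercept = fit_from_sums(n=3, sum_x=3, sum_y=9, sum_xy=13, sum_x2=5)
print(slope, intercept)  # 2.0 1.0
```

Because the function only sees the sums, it never needs the raw data in memory, which is the efficiency point made above.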

How to use

Suppose you have four data points: (1,2), (2,4), (3,5), (4,4). First compute the inputs: n = 4, ΣX = 1+2+3+4 = 10, ΣY = 2+4+5+4 = 15, ΣXY = 1×2 + 2×4 + 3×5 + 4×4 = 2+8+15+16 = 41, ΣX² = 1+4+9+16 = 30. Enter these values into the calculator. Slope b = (4×41 − 10×15) / (4×30 − 10²) = (164 − 150) / (120 − 100) = 14 / 20 = 0.70. Intercept a = (15 − 0.70×10) / 4 = (15 − 7) / 4 = 2.00. The regression line is ŷ = 2.00 + 0.70x, meaning the predicted Y increases by 0.70 units for each one-unit increase in X.
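If you start from raw data instead of totals, the five inputs can be accumulated directly from the points. The sketch below reproduces the worked example; the variable names and the `predict` helper are illustrative assumptions, not part of the calculator.

```python
# The four data points from the worked example.
points = [(1, 2), (2, 4), (3, 5), (4, 4)]

# Accumulate the five aggregate inputs.
n = len(points)
sum_x = sum(x for x, _ in points)
sum_y = sum(y for _, y in points)
sum_xy = sum(x * y for x, y in points)
sum_x2 = sum(x * x for x, _ in points)

# Apply the slope and intercept formulas.
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n


def predict(x):
    """Predicted Y on the fitted line for a given X (hypothetical helper)."""
    return intercept + slope * x


print(n, sum_x, sum_y, sum_xy, sum_x2)      # 4 10 15 41 30
print(f"{slope:.2f} {intercept:.2f}")       # 0.70 2.00
print(f"{predict(2.5):.2f}")                # 3.75
```

The prediction is made at X = 2.5, inside the observed range of 1 to 4; predictions far outside that range rely on the linear pattern continuing, which the data cannot confirm.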

Frequently asked questions

What do the slope and intercept tell me in a linear regression?

The slope (b) represents the expected change in the dependent variable Y for every one-unit increase in the independent variable X. A positive slope means Y increases as X increases; a negative slope means Y decreases. The intercept (a) is the predicted value of Y when X equals zero. In many real-world contexts the intercept may not be directly interpretable — for example, predicting salary at zero years of experience — but it is still necessary for placing the regression line correctly. Together, they define a unique straight line that best fits your data in a least-squares sense.

How do I interpret R-squared in a linear regression model?

R-squared (r²) is the coefficient of determination and ranges from 0 to 1. It tells you the proportion of total variability in Y that is explained by the linear relationship with X. An r² of 0.85 means 85% of the variation in Y is accounted for by X, with the remaining 15% due to other factors or random noise. Higher r² values indicate a better-fitting model, but context matters — an r² of 0.50 might be excellent in social science research yet poor in engineering. R-squared alone does not confirm that the model is appropriate; always inspect a residual plot for patterns.
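To make the "proportion of variability explained" concrete, the sketch below computes r² two ways for the four points (1,2), (2,4), (3,5), (4,4) and their fitted line ŷ = 2.00 + 0.70x: once as 1 − SS_res/SS_tot, and once by squaring the correlation coefficient r obtained from the sums. The variable names are assumptions made for illustration, and the r formula needs ΣY² in addition to the five sums used for the slope.

```python
import math

points = [(1, 2), (2, 4), (3, 5), (4, 4)]
slope, intercept = 0.70, 2.00  # least-squares fit for these points

# Route 1: share of the total variation in Y not left in the residuals.
ys = [y for _, y in points]
mean_y = sum(ys) / len(ys)
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot

# Route 2: correlation coefficient r from aggregate sums, then squared.
n = len(points)
sum_x = sum(x for x, _ in points)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in points)
sum_x2 = sum(x * x for x, _ in points)
sum_y2 = sum(y * y for y in ys)
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(f"{r_squared:.4f}")  # 0.5158
print(f"{r:.4f}")          # 0.7182
print(f"{r ** 2:.4f}")     # 0.5158
```

Here about 52% of the variation in Y is accounted for by X, so the line captures the upward trend while leaving nearly half of the variation unexplained.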

What assumptions must be met for linear regression results to be valid?

Linear regression rests on four core assumptions: linearity (the relationship between X and Y is actually linear), independence (the errors of different observations are not correlated with each other), homoscedasticity (the variance of residuals is constant across all values of X), and normality of residuals (residuals are approximately normally distributed, which matters mainly for inference in small samples). A violation of linearity can bias the coefficient estimates, while violations of independence, homoscedasticity, or normality mainly distort standard errors, confidence intervals, and p-values. You can check the assumptions visually with scatter plots and residual plots, or formally with tests such as the Breusch-Pagan test for heteroscedasticity and the Shapiro-Wilk test for normality.
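Any residual check starts from the residuals themselves, which are simply observed Y minus predicted Y. The sketch below lists them for the four points (1,2), (2,4), (3,5), (4,4) and the fitted line ŷ = 2.00 + 0.70x; the variable names are illustrative assumptions. With only four points no formal test would be meaningful, so this only shows the values you would plot against X. Formal tests are typically run with third-party libraries (for example SciPy's `scipy.stats.shapiro`), which are not used here.

```python
points = [(1, 2), (2, 4), (3, 5), (4, 4)]
slope, intercept = 0.70, 2.00  # least-squares fit for these points

# Residual = observed Y minus the value predicted by the line.
residuals = [y - (intercept + slope * x) for x, y in points]

for (x, y), e in zip(points, residuals):
    print(f"x={x}  y={y}  residual={e:+.2f}")

# For a least-squares line with an intercept, the residuals sum to zero and
# are uncorrelated with X, up to floating-point rounding. These identities
# verify the fit itself; they do not test any of the four assumptions.
print(abs(sum(residuals)) < 1e-9)                                     # True
print(abs(sum(x * e for (x, _), e in zip(points, residuals))) < 1e-9)  # True
```

In a residual plot you would look for curvature (a sign of non-linearity) or a funnel shape (a sign of non-constant variance) rather than for any single large value.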