Question 1

What does linear regression actually compute?

Accepted Answer

Linear regression finds the straight line y = a + b·x that minimises the sum of squared vertical distances from the observed points to the line. The line is unique as long as the x values are not all identical. The slope b is the average change in y per one-unit increase in x. The intercept a is the predicted y when x = 0; this is often outside the range of the data, in which case it has no direct physical interpretation. Under the standard assumptions (a linear relationship, and errors with mean zero that are uncorrelated and have constant variance), the least-squares estimates of slope and intercept are unbiased. They also have the smallest variance of any linear unbiased estimator (the Gauss–Markov theorem). Normality of the errors is not needed for that result, but it is what makes the usual t-based confidence intervals and tests exact in small samples. Linear regression also provides r² (the proportion of variance in y explained by x), confidence intervals on the slope and intercept, and prediction intervals for individual future observations. It is a workhorse of parametric statistical modelling. Multiple regression, ANOVA, ANCOVA and mixed models all extend this framework, and many machine learning regression methods build on it.
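The closed-form least-squares solution fits in a few lines. This is a minimal Python sketch using only the standard library, not the calculator's own code; the helper name `fit_line` and the sample data are hypothetical. It uses b = Sxy/Sxx and a = ȳ − b·x̄, which is why the fitted line always passes through the point (x̄, ȳ). It does not compute confidence or prediction intervals.

```python
from math import fsum, nan

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x. Returns (a, b, r_squared).

    Hypothetical helper for illustration only; no intervals are computed.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = fsum(xs) / n
    y_bar = fsum(ys) / n
    # Centred sums of squares and cross-products
    s_xx = fsum((x - x_bar) ** 2 for x in xs)
    s_xy = fsum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_yy = fsum((y - y_bar) ** 2 for y in ys)  # SST
    if s_xx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    sse = fsum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # r^2 = 1 - SSE/SST; undefined when y has no variance
    r_squared = 1 - sse / s_yy if s_yy > 0 else nan
    return a, b, r_squared

# Hypothetical example data
a, b, r2 = fit_line([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(f"y = {a:.2f} + {b:.2f}x, r^2 = {r2:.4f}")
```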

Question 2

How do I interpret slope and intercept in real-world terms?

Accepted Answer

The slope is in units of y per unit of x: salary per year of experience ($/yr), price per square foot ($/ft²), exam score per hour studied (points/hr). It describes how much y changes, on average, per unit of x across the data. It is an average pattern, not a deterministic prediction for any individual case. The intercept is in units of y and is the predicted value of y when x = 0. It is interpretable only when x = 0 is meaningful for your data. If you fit weight against height in adults, for example, height = 0 is meaningless and the intercept is just a mathematical artefact. To make the intercept interpretable, researchers sometimes centre x at its mean (using x − x̄ instead of x). The intercept then becomes the predicted y at the average x, which equals ȳ and is often more meaningful. Centring x does not change the slope at all. Rescaling x (for example, measuring experience in months instead of years) divides the slope by the same factor (here 12) and changes its units, but it still describes the same relationship.
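The centring and rescaling claims are easy to check numerically. This is a minimal Python sketch. The salary figures (in thousands of dollars) are hypothetical, and `ols` is an illustrative helper, not part of the calculator.

```python
from math import fsum

def ols(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = fsum(xs) / n, fsum(ys) / n
    s_xx = fsum((x - x_bar) ** 2 for x in xs)
    s_xy = fsum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

# Hypothetical data: years of experience vs salary in $1000s
years = [1, 3, 5, 7, 10]
salary = [42, 50, 57, 66, 78]
a_raw, b_raw = ols(years, salary)

# Centre x at its mean: the slope is unchanged, the intercept becomes mean(salary)
mean_years = fsum(years) / len(years)
a_c, b_c = ols([x - mean_years for x in years], salary)

# Rescale x from years to months: the slope is divided by 12 ($1000s per month)
a_m, b_m = ols([12 * x for x in years], salary)

print(f"raw:      a = {a_raw:.2f}, b = {b_raw:.3f} per year")
print(f"centred:  a = {a_c:.2f}, b = {b_c:.3f} per year")
print(f"months:   a = {a_m:.2f}, b = {b_m:.4f} per month")
```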

Question 3

What is r² and how does it relate to the regression slope?

Accepted Answer

r² (the coefficient of determination) is the fraction of the variance in y explained by x: r² = SSR/SST = 1 − SSE/SST. Here SSR is the sum of squares explained by the regression, SSE is the sum of squared residuals, and SST is the total sum of squares of y about its mean. For a least-squares fit with an intercept, SST = SSR + SSE. r² ranges from 0 (the model explains nothing and the fitted slope is 0) to 1 (the model explains all the variance and every point lies exactly on the line). For simple linear regression, r² equals the square of the Pearson correlation coefficient r. The slope and r are tied together by b = r·s_y/s_x, where s_x and s_y are the standard deviations of x and y. The slope b and r² are therefore linked but distinct. A steep line through a noisy scatter has a large slope and a low r²; a shallow line through a tight cluster has a small slope and a high r². Reporting only one is misleading, because the slope measures the rate of change while r² measures how closely the points follow the line. Always report both, plus a scatter plot.

Question 4

What are the most common mistakes people make with linear regression?

Accepted Answer

The first is interpreting the slope as causation. A statistically significant slope shows association, not cause, and confounders can produce apparent linear relationships. The second is extrapolating outside the observed x range; a model fit to x ∈ [1, 5] may be wildly wrong at x = 100. The third is fitting linear models to nonlinear data. A plot of residuals against x reveals curvature that the model misses, and r² alone cannot detect it. The fourth is ignoring outliers. A single point with an extreme x value (a high-leverage point) can shift the slope and intercept dramatically, even reversing the sign of the slope. The fifth is using regression on autocorrelated time-series data without correcting for temporal dependence. Classical inference assumes independent errors, so the reported standard errors and p-values can be badly misleading. The sixth is reporting only the slope, without its standard error, confidence interval, or a test of whether it differs significantly from zero. The seventh is fitting a single model to pooled data that hides group-level structure. Simpson’s paradox can reverse the sign of the slope when the within-group pattern differs from the pooled pattern.

Question 5

When should I not use this calculator?

Accepted Answer

Skip it for non-linear relationships; try polynomial regression, log-transformations, or non-linear models (exponential, logistic). Avoid it for very small samples (n < 5), where the slope estimate is highly unstable and inference is unreliable. It is also the wrong tool when the ordinary least squares assumptions are violated, and each violation has its own remedy:

- Heteroscedastic errors (variance that depends on x) call for weighted least squares.
- Autocorrelated errors (common in time series) call for time-series models such as ARIMA.
- Heavy-tailed, non-normal errors call for robust regression.

Do not use it for categorical predictors without first dummy-coding them properly. Skip it when you have many predictors; that requires multiple regression, possibly with regularisation (ridge, lasso, elastic net). And treat predictions outside the observed x range with extreme caution. Linear extrapolation can be drastically wrong far from the data, especially when the true relationship is curved or saturates.

Linear Regression Calculator
