Question 1

What do slope, intercept, and r² actually tell me?

Accepted Answer

The slope b is the rate of change: how much the predicted y changes per one-unit increase in x. A slope of 2 means y goes up by 2, on average, for every one-unit increase in x. The intercept a is the predicted y when x = 0; it is where the line crosses the y-axis. Often x = 0 is outside the range of your data and the intercept has no real-world meaning by itself; that is fine, it still anchors the line. r² is the fraction of the total variation in y that the line "explains"; r² = 0.80 means 80% of the variability in y is captured by the linear model on x, and 20% is left unexplained. r² ranges from 0 (no linear association) to 1 (perfect fit). A high r² does not mean the model is correct, only that the line fits these data points well. A low r² does not mean the variables are unrelated: the relationship may be non-linear, or it may be linear but noisy.
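As a rough illustration of the three quantities, here is a minimal Python sketch of a least-squares fit. The function name `fit_line` and the data are made up for this example; it is not the calculator's own code.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x. Returns (a, b, r_squared)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)                       # spread of x
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # co-variation
    syy = sum((y - y_bar) ** 2 for y in ys)                       # total variation in y
    b = sxy / sxx                # slope: change in predicted y per unit x
    a = y_bar - b * x_bar        # intercept: predicted y at x = 0
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))     # unexplained variation
    r_squared = 1 - sse / syy    # fraction of variation explained by the line
    return a, b, r_squared

# Illustrative (invented) data lying close to y = 1 + 2x
xs = [1, 2, 3, 4, 5]
ys = [2.9, 5.2, 6.8, 9.1, 11.0]
a, b, r_squared = fit_line(xs, ys)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}, r^2 = {r_squared:.4f}")
```

Here the slope comes out close to 2 and r² close to 1 because the invented points lie almost on a straight line.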

Question 2

How is linear regression different from correlation?

Accepted Answer

Correlation (Pearson r) measures the strength and direction of the linear relationship between X and Y on a scale from −1 to +1; it is symmetric in X and Y and dimensionless. Linear regression fits a directional model y = a + b·x where x is the predictor and y is the response; switching their roles produces a different line (the y-on-x line is not the same as the x-on-y line). Slope and r are related but not the same: b = r · (sy / sx), where sy and sx are the sample standard deviations of y and x, so they share a sign but have different scales. In simple linear regression, r² is the square of Pearson's r, which gives the variance-explained interpretation that regression cares about. Use correlation when you simply want to quantify association; use regression when you want to predict y from x, quantify how much y changes per unit x, or build a model for further analysis.
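The relationships above can be checked numerically. The following is a small sketch with invented data; the helper names `pearson_r` and `ols_slope` are hypothetical and exist only for this example.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson correlation coefficient (symmetric in its arguments)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (direction matters)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Illustrative (invented) data
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 2.9, 4.5, 4.8, 6.9, 6.5]

r = pearson_r(xs, ys)
b_yx = ols_slope(xs, ys)   # slope of the y-on-x line
b_xy = ols_slope(ys, xs)   # slope of the x-on-y line

print("r is symmetric:       ", abs(r - pearson_r(ys, xs)) < 1e-12)
print("b = r * sy / sx:      ", abs(b_yx - r * stdev(ys) / stdev(xs)) < 1e-12)
print("b_yx * b_xy equals r²:", abs(b_yx * b_xy - r ** 2) < 1e-12)
# Drawn on the same axes, the x-on-y line has slope 1 / b_xy,
# which differs from b_yx unless the points are perfectly collinear.
print("y-on-x slope:", b_yx, " x-on-y line drawn on same axes:", 1 / b_xy)
```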

Question 3

What assumptions does OLS regression rely on, and what happens if they are violated?

Accepted Answer

OLS regression makes four classic assumptions for inference (the line can be fitted without them, but p-values and confidence intervals depend on them): (1) Linearity: the true relationship between X and Y is linear; check with a scatter plot and a residual-vs-fitted plot. (2) Independence: the errors are not correlated with each other; time-series data routinely violates this. (3) Homoscedasticity: residual variance is constant across X; "fan-shaped" residual plots indicate violation. (4) Normality of residuals: needed for inference to work in small samples; check with a Q-Q plot. Outliers and influential points are a separate concern: a single high-leverage point can drag the line dramatically. When assumptions fail, you have options: transform X or Y (log, square root), use robust regression (Huber, LAD), use weighted or generalised least squares for heteroscedasticity, or use time-series models for autocorrelation. When independence, homoscedasticity, or normality fail, the slope and intercept estimates generally remain unbiased; what fails is the uncertainty around them (standard errors, p-values, and confidence intervals). A linearity violation is different: the line itself is the wrong model, so the slope no longer describes the relationship.
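Plots are the primary check, but two of the assumptions can also be screened numerically. The sketch below computes the Durbin-Watson statistic (values near 2 suggest little first-order autocorrelation) and a crude ratio of residual spread between the upper and lower halves of the data. The spread ratio is an informal screen chosen for this example, not a formal test, and both checks assume the observations are already sorted by x (or by time). The data are invented.

```python
from statistics import mean

def residuals_of_fit(xs, ys):
    """Fit y = a + b*x by least squares and return the residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def durbin_watson(res):
    """Durbin-Watson statistic: between 0 and 4, about 2 when residuals are uncorrelated."""
    num = sum((res[i] - res[i - 1]) ** 2 for i in range(1, len(res)))
    return num / sum(e ** 2 for e in res)

def spread_ratio(res):
    """Mean squared residual in the upper half divided by the lower half.
    Values far from 1 hint at non-constant variance (informal screen only)."""
    half = len(res) // 2
    low = sum(e ** 2 for e in res[:half]) / half
    high = sum(e ** 2 for e in res[-half:]) / half
    return high / low

# Invented data, sorted by x, whose scatter grows with x (a "fan" shape)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.8, 11.1, 15.5, 14.6]

res = residuals_of_fit(xs, ys)
print("Durbin-Watson:", round(durbin_watson(res), 2))
print("Spread ratio (upper half / lower half):", round(spread_ratio(res), 1))
```

A spread ratio well above 1, as here, is a prompt to look at the residual plot rather than a conclusion in itself.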

Question 4

What are the most common mistakes people make with linear regression?

Accepted Answer

The first is extrapolating beyond the range of the data: the model only describes behaviour where you have observations, and predictions far outside that range are speculation. The second is treating r² as a verdict on the model; r² near 1 does not mean the line is the right model (it just means it fits these data well), and r² near 0 can hide a strong non-linear relationship. The third is ignoring outliers and influential points; OLS is not robust, and a single high-leverage point can flip the slope from positive to negative. The fourth is confusing correlation with causation: "miles driven" and "engine wear" both rise with vehicle age, so a strong fit between them does not by itself show that driving more causes the wear; a third variable such as age may be the real driver of both. The fifth is fitting a line to data that is obviously curved, which produces a meaningless slope and a useless r²; visualise first, then fit. The sixth is reporting a slope without its standard error; a slope of 2 ± 0.1 is very different from a slope of 2 ± 5.
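Two of these mistakes are easy to demonstrate together. The sketch below computes the slope with its standard error, using the usual simple-regression formula SE(b) = sqrt(SSE / (n − 2) / Sxx), and then refits after adding one invented high-leverage point. The function name `slope_with_se` and all data are assumptions made for this example.

```python
from math import sqrt
from statistics import mean

def slope_with_se(xs, ys):
    """Least-squares slope and its standard error for y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = sqrt(sse / (n - 2) / sxx)   # residual variance divided by spread of x
    return b, se_b

# Invented data lying close to y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b_clean, se_clean = slope_with_se(xs, ys)

# The same data plus a single invented high-leverage point far from the rest
b_out, se_out = slope_with_se(xs + [20], ys + [-30.0])

print(f"clean data:       slope = {b_clean:.2f} ± {se_clean:.2f}")
print(f"with one outlier: slope = {b_out:.2f} ± {se_out:.2f}")
```

With these numbers the single added point turns a positive slope negative, and the much larger standard error is the warning sign that a bare slope would hide.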

Question 5

When should I not use this calculator?

Accepted Answer

Skip it when your data is clearly non-linear (look at the scatter plot first); use polynomial regression, log-transforms, or non-linear fits instead. Do not use it for multiple regression (more than one predictor): this calculator handles simple linear regression only, so for models with several predictors use a statistics package. It is the wrong tool for time-series data unless you have first checked and corrected for autocorrelation; ARIMA, exponential smoothing, or other time-series models are more appropriate. Avoid it for datasets with extreme outliers unless you have first investigated whether to remove or down-weight them; robust regression (Theil-Sen, RANSAC) is better in those cases. Do not use it when you need confidence intervals, prediction intervals, or hypothesis tests on the slope; those require additional formulas and software (or at minimum the residual standard error). Finally, do not interpret slope and intercept causally without the right study design; regression describes association, not causation.
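For a sense of what a robust alternative looks like, here is a minimal sketch of the Theil-Sen estimator named above: the slope is the median of the slopes between all pairs of points, and the intercept is taken as the median of y − b·x. This is an illustration with invented data, not something the calculator provides; a statistics package will offer a tested implementation.

```python
from itertools import combinations
from statistics import median

def theil_sen(xs, ys):
    """Theil-Sen fit: median of pairwise slopes, then median-based intercept."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1                      # skip pairs with identical x (undefined slope)
    ]
    b = median(slopes)
    a = median(y - b * x for x, y in zip(xs, ys))
    return a, b

# Invented data close to y = 2x, plus one extreme outlier at x = 20
xs = [1, 2, 3, 4, 5, 20]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, -30.0]

a_ts, b_ts = theil_sen(xs, ys)
print(f"Theil-Sen: intercept = {a_ts:.2f}, slope = {b_ts:.2f}")
```

Because the median ignores the handful of pairwise slopes that involve the outlier, the estimated slope stays near 2 for these data, whereas an ordinary least-squares fit to the same six points gives a negative slope.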

Linear Regression Calculator
