Linear Regression Calculator
Find the linear relationship between two variables by computing the slope, correlation coefficient, and R² from paired data points. Use it when analyzing trends in sales, scientific experiments, or any dataset where you want to predict one variable from another.
About this calculator
Simple linear regression models the relationship between an independent variable X and a dependent variable Y using the equation Ŷ = a + bX, where b is the slope and a is the intercept. The slope is calculated as b = (n·ΣXY − ΣX·ΣY) / (n·ΣX² − (ΣX)²), and the intercept follows from it as a = (ΣY − b·ΣX) / n. The Pearson correlation coefficient r = (n·ΣXY − ΣX·ΣY) / √[(n·ΣX² − (ΣX)²)(n·ΣY² − (ΣY)²)] measures the strength and direction of the linear association, ranging from −1 to +1. In simple linear regression, the coefficient of determination R² = r² tells you what proportion of the variance in Y is explained by X. For example, R² = 0.81 means 81% of the variation in Y is accounted for by the linear model. Together, the slope gives the direction and size of the effect, while r and R² indicate how closely the data follow the fitted line.
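The formulas above translate almost directly into code. The following is a minimal Python sketch, not the calculator's actual implementation: the function name `linear_regression` and its return order are illustrative choices, the intercept uses the standard least-squares formula a = (ΣY − b·ΣX) / n, and rejecting input where X or Y never varies is an assumption made here because the formula for r would otherwise divide by zero.

```python
from math import sqrt

def linear_regression(xs, ys):
    """Return (slope, intercept, r, r_squared) for paired data.

    Hypothetical helper: uses the raw-sum formulas from the text.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired data points")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y   # shared by b and r
    x_term = n * sum_x2 - sum_x ** 2         # zero when all X are equal
    y_term = n * sum_y2 - sum_y ** 2         # zero when all Y are equal
    if x_term == 0 or y_term == 0:
        # Assumption: treat constant X or constant Y as invalid input.
        raise ValueError("X and Y must each take at least two distinct values")

    slope = numerator / x_term
    intercept = (sum_y - slope * sum_x) / n
    r = numerator / sqrt(x_term * y_term)
    return slope, intercept, r, r * r

slope, intercept, r, r_squared = linear_regression([1, 2, 3], [2, 4, 5])
print(slope, round(intercept, 3), round(r, 3), round(r_squared, 3))
```

The raw-sum form mirrors the formulas on this page; for very large values, a version that first subtracts the means is usually less prone to floating-point cancellation.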
How to use
Suppose you have three data points: X = 1, 2, 3 and Y = 2, 4, 5.
Step 1: n = 3, ΣX = 6, ΣY = 11, ΣXY = (1×2)+(2×4)+(3×5) = 25, ΣX² = 14, ΣY² = 45.
Step 2: Slope b = (3×25 − 6×11) / (3×14 − 6²) = (75 − 66) / (42 − 36) = 9 / 6 = 1.5.
Step 3: r = (3×25 − 6×11) / √[(3×14 − 36)(3×45 − 121)] = 9 / √[6 × 14] = 9 / √84 ≈ 0.982.
Step 4: R² ≈ 0.964, meaning 96.4% of the variance in Y is explained by X.
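To check the hand calculation, the same four steps can be reproduced in a few lines of Python. This is only a sketch for comparison; the variable names are illustrative, and the comments repeat the values from the steps above.

```python
from math import sqrt

xs = [1, 2, 3]
ys = [2, 4, 5]

# Step 1: the basic sums
n = len(xs)                                   # 3
sum_x = sum(xs)                               # 6
sum_y = sum(ys)                               # 11
sum_xy = sum(x * y for x, y in zip(xs, ys))   # 25
sum_x2 = sum(x * x for x in xs)               # 14
sum_y2 = sum(y * y for y in ys)               # 45

# Step 2: slope
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # 9 / 6

# Step 3: correlation coefficient
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)                                                              # 9 / sqrt(84)

# Step 4: coefficient of determination
r_squared = r * r

print(b, round(r, 3), round(r_squared, 3))  # 1.5 0.982 0.964
```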
Frequently asked questions
What does the correlation coefficient r tell me about my data?
The correlation coefficient r measures both the strength and direction of the linear relationship between two variables, ranging from −1 to +1. A value near +1 indicates a strong positive relationship (as X increases, Y increases), while a value near −1 indicates a strong negative relationship (as X increases, Y decreases). Values close to 0 suggest little to no linear association. Note that r only captures linear relationships: two variables can have a strong curved relationship yet show r ≈ 0. Always inspect a scatter plot of your data alongside r.
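A small experiment makes the last point concrete. In the sketch below, `pearson_r` is an illustrative helper built from the formula for r on this page, and the data are made up for the demonstration: Y = X² over a range of X values symmetric around zero is perfectly determined by X, yet its linear correlation is zero.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation from raw sums (hypothetical helper)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    numerator = n * sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y
    denominator = sqrt(
        (n * sum(x * x for x in xs) - sum_x ** 2)
        * (n * sum(y * y for y in ys) - sum_y ** 2)
    )
    return numerator / denominator

xs = [-3, -2, -1, 0, 1, 2, 3]
curved = [x * x for x in xs]        # perfect, but non-linear, dependence
straight = [2 * x + 1 for x in xs]  # perfect linear dependence

print(pearson_r(xs, curved))    # 0.0
print(pearson_r(xs, straight))  # 1.0
```

The positive and negative halves of the parabola cancel in ΣXY, which is why r cannot see the pattern even though a scatter plot shows it immediately.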
What is R-squared and what is a good R-squared value?
R² (the coefficient of determination) equals r² in simple linear regression and represents the proportion of variability in Y that is explained by the linear regression model. An R² of 0.75 means 75% of the variation in Y is accounted for by X. What constitutes a 'good' R² depends heavily on the field: physical sciences often expect R² above 0.99, while social sciences may consider 0.3 meaningful. A high R² does not guarantee the model is appropriate; always check residual plots to verify that the relationship is actually linear.
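"Proportion of variability explained" can be checked directly from the residuals. The sketch below assumes the standard definition R² = 1 − SS_res / SS_tot, which this page does not spell out, and the standard least-squares intercept a = (ΣY − b·ΣX) / n. It uses the data X = 1, 2, 3 and Y = 2, 4, 5 and compares the result with r²; all variable names are illustrative.

```python
from math import sqrt

xs = [1, 2, 3]
ys = [2, 4, 5]
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

# Fit the line with the raw-sum formulas.
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

# Unexplained variation: squared residuals around the fitted line.
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
# Total variation: squared deviations of Y around its mean.
mean_y = sum_y / n
ss_tot = sum((y - mean_y) ** 2 for y in ys)

r2_from_residuals = 1 - ss_res / ss_tot

# Same quantity via the correlation coefficient.
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(round(r2_from_residuals, 4), round(r * r, 4))  # 0.9643 0.9643
```

The two routes agree here because the model is a straight line fitted by least squares with an intercept; for other kinds of models the two quantities generally differ.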
How is the slope of a regression line used for prediction?
The slope b tells you how much Y is expected to change for each one-unit increase in X. Once you have the slope and intercept from the regression, you can substitute any new X value into Ŷ = a + bX to generate a predicted Y. For instance, a slope of 2.5 in a regression of weight (kg) on height (cm) would mean each additional centimeter of height is associated with 2.5 kg more weight on average. Predictions are most reliable when the new X falls within the range of your original data; extrapolating far beyond that range increases the risk of inaccurate predictions.
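As a rough illustration of prediction with a range check, the sketch below plugs new X values into Ŷ = a + bX and flags any that fall outside the observed data. The function `predict` and its parameters are hypothetical. The numbers come from the data X = 1, 2, 3 and Y = 2, 4, 5, for which b = 1.5 and, using the standard least-squares intercept a = (ΣY − b·ΣX) / n, a = 2/3.

```python
def predict(x_new, slope, intercept, x_min, x_max):
    """Return (y_hat, extrapolated) for a fitted line.

    Hypothetical helper: `extrapolated` is True when x_new lies outside
    the range [x_min, x_max] of the X values used to fit the line.
    """
    y_hat = intercept + slope * x_new
    extrapolated = not (x_min <= x_new <= x_max)
    return y_hat, extrapolated

# Line fitted to X = 1, 2, 3 and Y = 2, 4, 5.
slope, intercept = 1.5, 2 / 3

inside = predict(2.5, slope, intercept, x_min=1, x_max=3)
outside = predict(10, slope, intercept, x_min=1, x_max=3)

print(round(inside[0], 3), inside[1])    # 4.417 False
print(round(outside[0], 3), outside[1])  # 15.667 True
```

The flag does not make an extrapolated prediction wrong; it only marks values that the data used for fitting cannot support.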