Simple Linear Regression Calculator
Use this calculator to determine the slope (m) and Y-intercept (b) of a simple linear regression line (Y = mX + b) from five summary statistics of your dataset: n, Σx, Σy, Σxy, and Σx².
Understanding Simple Linear Regression
Simple linear regression is a statistical method that allows us to model the relationship between two continuous variables: a dependent variable (Y) and an independent variable (X). The goal is to find the best-fitting straight line through the data points, which can then be used to predict the value of Y for a given value of X.
The Regression Line Equation
The equation for a simple linear regression line is typically expressed as:
Y = mX + b
- Y: The dependent variable (the one we are trying to predict).
- X: The independent variable (the one used for prediction).
- m: The slope of the regression line. It is the predicted change in Y for each one-unit increase in X.
- b: The Y-intercept. This is the predicted value of Y when X is 0.
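As a minimal sketch of how the equation is used for prediction, the Python snippet below evaluates Y = mX + b for a given X. The function name predict and the values m = 2 and b = 1 are illustrative assumptions, not values taken from any dataset.

```python
def predict(x: float, m: float, b: float) -> float:
    """Return the predicted Y for a given X on the line Y = mX + b."""
    return m * x + b


# Hypothetical line with slope 2 and Y-intercept 1.
print(predict(0, m=2, b=1))  # 1: at X = 0 the prediction equals the intercept b
print(predict(3, m=2, b=1))  # 7
# A one-unit increase in X changes the prediction by exactly m.
print(predict(4, m=2, b=1) - predict(3, m=2, b=1))  # 2
```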
How is it Calculated?
The "best-fitting" line is determined using the method of least squares, which minimizes the sum of the squared differences between the observed Y values and the Y values predicted by the line. The formulas for the slope (m) and y-intercept (b) are derived from this principle:
Slope (m) Formula:
m = [n(Σxy) - (Σx)(Σy)] / [n(Σx²) - (Σx)²]
- n: The number of data points.
- Σx: The sum of all X values.
- Σy: The sum of all Y values.
- Σxy: The sum of the products of each X and Y pair.
- Σx²: The sum of the squared X values.
Y-intercept (b) Formula:
b = ȳ - m * x̄
- ȳ: The mean (average) of the Y values (Σy / n).
- x̄: The mean (average) of the X values (Σx / n).
- m: The calculated slope.
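The two formulas above can be sketched in Python as follows. The function name regression_from_sums and its parameter names are assumptions made for illustration, and this is not necessarily how the calculator itself is implemented. Note that the slope denominator n(Σx²) - (Σx)² is zero when all X values are identical, in which case no unique line exists, so the sketch raises an error.

```python
def regression_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (m, b) for the least-squares line, given summary statistics."""
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        # Happens when every X value is the same (or there are too few points).
        raise ValueError("slope is undefined: the X values do not vary")
    m = (n * sum_xy - sum_x * sum_y) / denominator
    b = sum_y / n - m * (sum_x / n)  # b = y-bar - m * x-bar
    return m, b


# Hypothetical points (1, 3), (2, 5), (3, 7), which lie exactly on Y = 2X + 1:
# n = 3, Σx = 6, Σy = 15, Σxy = 34, Σx² = 14.
m, b = regression_from_sums(n=3, sum_x=6, sum_y=15, sum_xy=34, sum_x2=14)
print(m, b)  # 2.0 1.0
```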
Practical Example
Imagine a marketing team wants to understand the relationship between advertising spend (X) and sales (Y). They collect data over 5 months:
Data Points (X, Y): (10, 25), (20, 45), (30, 60), (40, 80), (50, 100)
From this data, they calculate the following summary statistics:
- n (Number of Data Points): 5
- Σx (Sum of X values): 10 + 20 + 30 + 40 + 50 = 150
- Σy (Sum of Y values): 25 + 45 + 60 + 80 + 100 = 310
- Σxy (Sum of XY products): (10*25) + (20*45) + (30*60) + (40*80) + (50*100) = 250 + 900 + 1800 + 3200 + 5000 = 11150
- Σx² (Sum of X squared values): (10²) + (20²) + (30²) + (40²) + (50²) = 100 + 400 + 900 + 1600 + 2500 = 5500
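If the raw data points are at hand, the same five summary statistics can be computed with a few lines of Python. This is only a sketch; the variable names are illustrative assumptions.

```python
# The five (advertising spend, sales) data points from the example.
points = [(10, 25), (20, 45), (30, 60), (40, 80), (50, 100)]

n = len(points)
sum_x = sum(x for x, _ in points)
sum_y = sum(y for _, y in points)
sum_xy = sum(x * y for x, y in points)
sum_x2 = sum(x * x for x, _ in points)

print(n, sum_x, sum_y, sum_xy, sum_x2)  # 5 150 310 11150 5500
```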
Using these values in the calculator above:
- Slope (m): 1.85
- Y-intercept (b): 6.5
The resulting regression equation is: Y = 1.85X + 6.5
This means that for each additional unit of advertising spend (X), predicted sales (Y) increase by 1.85 units. At zero advertising spend the line predicts sales of 6.5 units, although X = 0 lies outside the observed range of 10 to 50, so that value is an extrapolation.
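To check the result by hand, the summary statistics from the example can be substituted into the slope and Y-intercept formulas step by step. The Python sketch below does this; the variable names and the prediction at X = 35 are illustrative assumptions added here.

```python
# Summary statistics from the advertising example.
n, sum_x, sum_y, sum_xy, sum_x2 = 5, 150, 310, 11150, 5500

numerator = n * sum_xy - sum_x * sum_y   # 55750 - 46500 = 9250
denominator = n * sum_x2 - sum_x ** 2    # 27500 - 22500 = 5000
m = numerator / denominator              # 9250 / 5000

x_bar = sum_x / n                        # mean of X: 30.0
y_bar = sum_y / n                        # mean of Y: 62.0
b = y_bar - m * x_bar                    # 62 - 1.85 * 30

print(f"Y = {m:.2f}X + {b:.2f}")         # Y = 1.85X + 6.50

# Hypothetical prediction for a spend of 35, inside the observed range.
x_new = 35
print(f"{m * x_new + b:.2f}")            # 71.25
```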
Limitations
Simple linear regression assumes that the relationship between X and Y is linear. Visualize your data (e.g., with a scatter plot) to check that this assumption holds before relying on the fitted line. Outliers can also shift the regression line substantially, because the least squares method penalizes each deviation by its square.
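To illustrate the effect of an outlier, the Python sketch below fits the line to the five example points and then again after adding one extra point. The point (60, 10) is a hypothetical outlier invented for this illustration, and fit_line is an assumed helper name.

```python
def fit_line(points):
    """Return (m, b) of the least-squares line through (x, y) points."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_x2 = sum(x * x for x, _ in points)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * (sum_x / n)
    return m, b


clean = [(10, 25), (20, 45), (30, 60), (40, 80), (50, 100)]
with_outlier = clean + [(60, 10)]  # hypothetical outlier, not from the dataset

m_clean, b_clean = fit_line(clean)
m_out, b_out = fit_line(with_outlier)

print(f"without outlier: Y = {m_clean:.2f}X + {b_clean:.2f}")  # Y = 1.85X + 6.50
print(f"with outlier:    Y = {m_out:.2f}X + {b_out:.2f}")      # Y = 0.31X + 42.33
```

In this sketch a single added point reduces the slope from 1.85 to about 0.31, which is why a scatter plot is worth inspecting before interpreting the fitted line.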