Linear Regression Analysis (linear regression.ppt)
TRANSCRIPT
-
Regression Analysis
Module 3
-
Regression
Regression is the attempt to explain the variation in a dependent variable using the variation in independent variables. Regression is thus an explanation of causation.
If the independent variable(s) sufficiently explain the variation in the dependent variable, the model can be used for prediction.
[Figure: scatter plot of a dependent variable (y) against an independent variable (x)]
-
Simple Linear Regression
[Figure: straight line fit through a scatter of dependent variable (y) vs. independent variable (x)]
The output of a regression is a function that predicts the dependent variable based upon values of the independent variables.
Simple regression fits a straight line to the data.
y = b0 + b1x
where b0 is the y-intercept and b1 is the slope: b1 = Δy / Δx
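To make these formulas concrete, here is a minimal Python sketch (the data are hypothetical) that computes b0 and b1 by ordinary least squares:

```python
import numpy as np

# Hypothetical data: five (x, y) observations
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 58.0, 61.0, 68.0, 71.0])

# Least-squares estimates for the line y = b0 + b1*x:
#   b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
#   b0 = y_bar - b1 * x_bar
x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0 = y_bar - b1 * x_bar
print(f"y = {b0:.2f} + {b1:.2f} * x")
```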
-
Simple Linear Regression
[Figure: fitted line through the data; an observed point y and its prediction ŷ on the line]
The function will make a prediction for each observed data point.
The observation is denoted by y and the prediction is denoted by ŷ.
-
Simple Linear Regression
For each observation, the variation can be described as:
y = ŷ + ε
Actual = Explained + Error
[Figure: the vertical distance between an observation y and its prediction ŷ is the prediction error ε]
-
Regression
[Figure: fitted line with vertical prediction errors from each observation to the line]
A least squares regression selects the line with the lowest total sum of squared prediction errors.
This value is called the Sum of Squares of Error, or SSE.
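A small sketch of this idea, assuming the same kind of hypothetical data: the least-squares line achieves a lower SSE than a deliberately perturbed line.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 58.0, 61.0, 68.0, 71.0])

def sse(b0, b1):
    # Total sum of squared prediction errors for the line y = b0 + b1*x
    return np.sum((y - (b0 + b1 * x)) ** 2)

b1, b0 = np.polyfit(x, y, 1)      # least-squares fit (highest degree first)
print(sse(b0, b1))                # minimal SSE
print(sse(b0, b1 + 0.5))          # any other slope gives a larger SSE
```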
-
Calculating SSR
[Figure: fitted line with the distance from each prediction to the population mean ȳ]
The Sum of Squares Regression (SSR) is the sum of the squared differences between the prediction for each observation and the population mean.
Population mean: ȳ
-
Regression Formulas
The Total Sum of Squares (SST) is equal to SSR + SSE.
Mathematically,
SSR = Σ(ŷ − ȳ)²  (measure of explained variation)
SSE = Σ(y − ŷ)²  (measure of unexplained variation)
SST = SSR + SSE = Σ(y − ȳ)²  (measure of total variation in y)
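As a quick check of these identities, a self-contained sketch on hypothetical data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 58.0, 61.0, 68.0, 71.0])
b1, b0 = np.polyfit(x, y, 1)       # least-squares line

y_hat = b0 + b1 * x                # predictions
y_bar = y.mean()                   # mean of y
SSR = np.sum((y_hat - y_bar) ** 2) # explained variation
SSE = np.sum((y - y_hat) ** 2)     # unexplained variation
SST = np.sum((y - y_bar) ** 2)     # total variation
print(np.isclose(SST, SSR + SSE))  # True: SST = SSR + SSE
```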
-
The Coefficient of Determination
The proportion of total variation (SST) that is explained by the regression (SSR) is known as the Coefficient of Determination, and is often referred to as R².
R² = SSR / SST = SSR / (SSR + SSE)
The value of R² can range between 0 and 1, and the higher its value the more accurate the regression model is. It is often referred to as a percentage.
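Continuing the sketch above, R² follows directly from SSR and SST (hypothetical data again):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 58.0, 61.0, 68.0, 71.0])
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

SSR = np.sum((y_hat - y.mean()) ** 2)
SST = np.sum((y - y.mean()) ** 2)
print(f"R^2 = {SSR / SST:.4f}")    # proportion of total variation explained
```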
-
Standard Error of Regression
The Standard Error of a regression is a measure of its variability. It can be used in a similar manner to standard deviation, allowing for prediction intervals.
ŷ ± 2 standard errors will provide approximately 95% accuracy, and ± 3 standard errors will provide a 99% confidence interval.
Standard Error is calculated by taking the square root of the average prediction error.
Standard Error = √(SSE / (n − k))
where n is the number of observations in the sample and k is the total number of variables in the model.
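A minimal sketch of the calculation; the degrees-of-freedom count here assumes k counts the slope plus the intercept for a simple regression (conventions vary):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 58.0, 61.0, 68.0, 71.0])
b1, b0 = np.polyfit(x, y, 1)

SSE = np.sum((y - (b0 + b1 * x)) ** 2)
n, k = len(y), 2                         # assumption: k = slope + intercept
std_error = np.sqrt(SSE / (n - k))
print(f"Standard Error = {std_error:.3f}")
print(f"~95% band: prediction +/- {2 * std_error:.3f}")
```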
-
The output of a simple regression is the coefficient β and the constant A. The equation is then:
y = A + βx + ε
where ε is the residual error.
β is the per-unit change in the dependent variable for each unit change in the independent variable. Mathematically:
β = Δy / Δx
-
Multiple Linear Regression
More than one independent variable can be used to explain variance in the dependent variable, as long as they are not linearly related.
A multiple regression takes the form:
y = A + β1X1 + β2X2 + … + βkXk + ε
where k is the number of variables, or parameters.
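As an illustrative sketch, this form can be fit with NumPy's least-squares solver; the data and variable names here are hypothetical:

```python
import numpy as np

# Hypothetical observations of two independent variables and y
X1 = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
X2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
y  = np.array([25.0, 41.0, 52.0, 75.0, 79.0])

# Design matrix: a column of ones for the constant A, then X1 and X2
X = np.column_stack([np.ones_like(X1), X1, X2])
coefs, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
A, beta1, beta2 = coefs
print(f"y = {A:.2f} + {beta1:.2f}*X1 + {beta2:.2f}*X2")
```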
-
Multicollinearity
Multicollinearity is a condition in which at least two independent variables are highly linearly correlated. It inflates the variance of the coefficient estimates and can make the regression computation numerically unstable.
Example table of correlations:

        Y      X1     X2
Y     1.000
X1    0.802  1.000
X2    0.848  0.578  1.000
A correlations table can suggest which independent variables may be significant. Generally, an independent variable that has more than a .3 correlation with the dependent variable and less than .7 with any other independent variable can be included as a possible predictor.
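A short sketch of that screening rule on a hypothetical sample (np.corrcoef produces the correlation table):

```python
import numpy as np

# Hypothetical sample: columns are Y, X1, X2
data = np.array([
    [25.0, 10.0, 1.0],
    [41.0, 20.0, 3.0],
    [52.0, 30.0, 2.0],
    [75.0, 40.0, 5.0],
    [79.0, 50.0, 4.0],
])
corr = np.corrcoef(data, rowvar=False)   # 3x3 correlation matrix

# Screening rule from the slide: keep X_i if |corr(X_i, Y)| > .3
# and |corr(X_i, X_j)| < .7 for every other independent variable X_j
for i, name in [(1, "X1"), (2, "X2")]:
    with_y = abs(corr[i, 0]) > 0.3
    others = [j for j in (1, 2) if j != i]
    with_x = all(abs(corr[i, j]) < 0.7 for j in others)
    print(name, "possible predictor" if with_y and with_x else "excluded")
```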
-
Nonlinear Regression
Nonlinear functions can also be fit as regressions. Common choices include Power, Logarithmic, Exponential, and Logistic, but any continuous function can be used.
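As one sketch, an exponential model can be fit with SciPy's curve_fit; the functional form and data below are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import curve_fit

def exponential(x, a, b):
    # Illustrative nonlinear model: y = a * exp(b * x)
    return a * np.exp(b * x)

# Hypothetical data that grow roughly exponentially
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 8.2, 16.5, 32.8])

params, _ = curve_fit(exponential, x, y, p0=(1.0, 0.5))
a, b = params
print(f"y = {a:.2f} * exp({b:.2f} * x)")
```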
-
Regression Output in Excel
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.982655
R Square             0.96561
Adjusted R Square    0.959879
Standard Error       26.01378
Observations         15

ANOVA
              df    SS         MS         F          Significance F
Regression     2    228014.6   114007.3   168.4712   1.65E-09
Residual      12    8120.603   676.7169
Total         14    236135.2

              Coefficients   Standard Error   t Stat      P-value    Lower 95%    Upper 95%
Intercept     562.151        21.0931          26.65094    4.78E-12   516.1931     608.1089
Temperature   -5.436581      0.336216         -16.1699    1.64E-09   -6.169133    -4.704029
Insulation    -20.01232      2.342505         -8.543127   1.91E-06   -25.1162     -14.90844
Estimated Heating Oil = 562.15 − 5.436(Temperature) − 20.012(Insulation)
Y = B0 + B1X1 + B2X2 + B3X3 + … ± Error
Total = Estimated/Predicted ± Error
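For example, the fitted equation can be evaluated directly; the coefficients come from the output above, while the input values are hypothetical:

```python
def estimated_heating_oil(temperature, insulation):
    # Coefficients from the Excel regression output above
    return 562.151 - 5.436581 * temperature - 20.01232 * insulation

# Hypothetical inputs: temperature of 30 degrees, insulation rating of 6
print(f"{estimated_heating_oil(30.0, 6.0):.1f} units of heating oil")
```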