Linear Regression Analysis (linear regression.ppt)


TRANSCRIPT

  • Slide 1/15

    Regression Analysis

    Module 3

  • Slide 2/15

    Regression

    Regression is the attempt to explain the variation in a dependent variable using the variation in independent variables. Regression is thus an explanation of causation.

    If the independent variable(s) sufficiently explain the variation in the dependent variable, the model can be used for prediction.

    [Scatter plot: dependent variable (y) vs. independent variable (x)]

  • Slide 3/15

    Simple Linear Regression

    [Scatter plot: dependent variable (y) vs. independent variable (x), with a fitted line]

    The output of a regression is a function that predicts the dependent variable based upon values of the independent variables.

    Simple regression fits a straight line to the data.

    y = b0 + b1x

    where b0 is the y-intercept and b1 is the slope: b1 = Δy / Δx
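    A minimal sketch of estimating these coefficients with ordinary least squares, assuming NumPy is available; the sample data and variable names are illustrative, not from the slides.

        import numpy as np

        # Illustrative sample data (not from the slides)
        x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
        y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

        # Least-squares estimates:
        # b1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
        x_mean, y_mean = x.mean(), y.mean()
        b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
        b0 = y_mean - b1 * x_mean   # the fitted line passes through (x_mean, y_mean)

        print(f"y = {b0:.3f} + {b1:.3f} * x")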

  • Slide 4/15

    Simple Linear Regression

    [Scatter plot: observed points y and the fitted line's predictions ŷ]

    The function will make a prediction for each observed data point.

    The observation is denoted by y and the prediction is denoted by ŷ.

  • Slide 5/15

    Simple Linear Regression

    For each observation, the variation can be described as:

    y = ŷ + ε

    Actual = Explained + Error

    [Plot: observation y, prediction ŷ, and the prediction error ε between them]

  • Slide 6/15

    Regression

    [Scatter plot: dependent variable (y) vs. independent variable (x), with candidate fitted lines]

    A least squares regression selects the line with the lowest total sum of squared prediction errors.

    This value is called the Sum of Squares of Error, or SSE.
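    To make "lowest total SSE" concrete, here is a small sketch (illustrative data; np.polyfit is NumPy's polynomial least-squares fitter) comparing the least-squares line's SSE against a slightly perturbed line:

        import numpy as np

        x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
        y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

        def sse(b0, b1):
            """Sum of squared prediction errors for the line y = b0 + b1*x."""
            residuals = y - (b0 + b1 * x)
            return np.sum(residuals ** 2)

        b1, b0 = np.polyfit(x, y, 1)    # degree-1 fit returns [slope, intercept]

        print(sse(b0, b1))           # SSE of the least-squares line
        print(sse(b0, b1 + 0.2))     # any other line gives a larger SSE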

  • Slide 7/15

    Calculating SSR

    [Scatter plot: predictions ŷ compared with the population mean ȳ]

    The Sum of Squares Regression (SSR) is the sum of the squared differences between the prediction for each observation and the population mean.

    Population mean: ȳ

  • Slide 8/15

    Regression Formulas

    The Total Sum of Squares (SST) is equal to SSR + SSE.

    Mathematically,

    SSR = Σ(ŷ − ȳ)²   (measure of explained variation)

    SSE = Σ(y − ŷ)²   (measure of unexplained variation)

    SST = SSR + SSE = Σ(y − ȳ)²   (measure of total variation in y)
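    A short numerical check of these formulas with illustrative data; for any least-squares line fitted with an intercept, the identity SST = SSR + SSE holds:

        import numpy as np

        x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
        y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

        b1, b0 = np.polyfit(x, y, 1)   # least-squares slope and intercept
        y_hat = b0 + b1 * x            # predictions
        y_bar = y.mean()               # mean of the observations

        SSR = np.sum((y_hat - y_bar) ** 2)   # explained variation
        SSE = np.sum((y - y_hat) ** 2)       # unexplained variation
        SST = np.sum((y - y_bar) ** 2)       # total variation

        print(SSR + SSE, SST)   # the two agree up to floating-point rounding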

  • Slide 9/15

    The Coefficient of Determination

    The proportion of total variation (SST) that is explained by the regression (SSR) is known as the Coefficient of Determination, and is often referred to as R².

    R² = SSR / SST = SSR / (SSR + SSE)

    The value of R² can range between 0 and 1, and the higher its value, the more accurate the regression model is. It is often referred to as a percentage.
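    With the same illustrative data as above, R² follows directly from the sums of squares (scikit-learn's sklearn.metrics.r2_score would give the same number, if that library is available):

        import numpy as np

        x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
        y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

        b1, b0 = np.polyfit(x, y, 1)
        y_hat = b0 + b1 * x

        SSR = np.sum((y_hat - y.mean()) ** 2)
        SST = np.sum((y - y.mean()) ** 2)

        r_squared = SSR / SST
        print(f"R^2 = {r_squared:.4f}")   # close to 1 here: most variation explained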

  • Slide 10/15

    Standard Error of Regression

    The Standard Error of a regression is a measure of its variability. It can be used in a similar manner to standard deviation, allowing for prediction intervals.

    ŷ ± 2 standard errors provides approximately 95% accuracy, and ±3 standard errors provides approximately a 99% confidence interval.

    Standard Error is calculated by taking the square root of the average squared prediction error:

    Standard Error = √( SSE / (n − k) )

    where n is the number of observations in the sample and k is the total number of variables in the model.
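    Plugging in the figures from the Excel output on the last slide (SSE = 8120.603, n = 15, and k = 3 variables counting the intercept) reproduces its reported Standard Error:

        import math

        SSE = 8120.603   # residual sum of squares from the Excel output
        n = 15           # observations
        k = 3            # variables in the model (two predictors plus the intercept)

        standard_error = math.sqrt(SSE / (n - k))
        print(standard_error)   # ≈ 26.014, matching "Standard Error" in the output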

  • Slide 11/15

    The output of a simple regression is the coefficient β and the constant A. The equation is then:

    y = A + βx + ε

    where ε is the residual error.

    β is the per unit change in the dependent variable for each unit change in the independent variable. Mathematically:

    β = Δy / Δx

  • Slide 12/15

    Multiple Linear Regression

    More than one independent variable can be used to explain variance in the dependent variable, as long as they are not linearly related.

    A multiple regression takes the form:

    y = A + β1X1 + β2X2 + … + βkXk + ε

    where k is the number of variables, or parameters.
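    A minimal sketch of a two-variable multiple regression using NumPy's least-squares solver; the data and variable names are illustrative:

        import numpy as np

        # Illustrative data: two independent variables and one dependent variable
        X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
        X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
        y  = np.array([5.2, 5.9, 9.1, 9.8, 13.2, 13.9])

        # Design matrix with a column of ones for the constant A
        X = np.column_stack([np.ones_like(X1), X1, X2])

        # Solve for [A, beta1, beta2] minimizing the sum of squared errors
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        A, beta1, beta2 = coef
        print(f"y = {A:.3f} + {beta1:.3f}*X1 + {beta2:.3f}*X2")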

  • Slide 13/15

    Multicollinearity

    Multicollinearity is a condition in which at least 2 independent variables are highly linearly correlated. It makes the coefficient estimates unstable and can cause numerical failures in the software.

    Example table of correlations:

         Y      X1     X2
    Y    1.000
    X1   0.802  1.000
    X2   0.848  0.578  1.000

    A correlations table can suggest which independent variables may be significant. Generally, an independent variable that has more than a 0.3 correlation with the dependent variable and less than 0.7 with any other independent variable can be included as a possible predictor, as in the sketch below.
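    A sketch of that screening rule with NumPy's correlation matrix; the data is illustrative, not the table above:

        import numpy as np

        # Illustrative data for the dependent variable and two candidates
        Y  = np.array([10.0, 12.0, 15.0, 11.0, 18.0, 20.0])
        X1 = np.array([1.0, 2.0, 4.0, 2.0, 5.0, 6.0])
        X2 = np.array([3.0, 4.0, 5.0, 3.0, 7.0, 8.0])

        corr = np.corrcoef([Y, X1, X2])   # 3x3 correlation matrix; Y is row/col 0

        corr_x1_x2 = abs(corr[1, 2])
        for name, i in [("X1", 1), ("X2", 2)]:
            corr_with_y = abs(corr[0, i])
            # Rule of thumb from the slide: > 0.3 with Y, < 0.7 with other predictors
            include = corr_with_y > 0.3 and corr_x1_x2 < 0.7
            print(name, round(corr_with_y, 3), "include:", include)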

  • Slide 14/15

    Nonlinear Regression

    Nonlinear functions can also be fit as regressions. Common choices include Power, Logarithmic, Exponential, and Logistic, but any continuous function can be used.
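    As one example, an exponential model y = a·e^(bx) can be fitted by nonlinear least squares, assuming SciPy is available; the model choice and data are illustrative:

        import numpy as np
        from scipy.optimize import curve_fit

        def exponential(x, a, b):
            """Exponential model y = a * exp(b * x)."""
            return a * np.exp(b * x)

        x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
        y = np.array([1.1, 2.9, 8.2, 21.5, 60.1])

        # Nonlinear least squares: find a, b minimizing the squared errors
        (a, b), _ = curve_fit(exponential, x, y, p0=(1.0, 1.0))
        print(f"y = {a:.3f} * exp({b:.3f} * x)")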

  • Slide 15/15

    Regression Output in Excel

    SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.982655
    R Square             0.96561
    Adjusted R Square    0.959879
    Standard Error       26.01378
    Observations         15

    ANOVA
                  df    SS         MS         F          Significance F
    Regression    2     228014.6   114007.3   168.4712   1.65E-09
    Residual      12    8120.603   676.7169
    Total         14    236135.2

                  Coefficients   Standard Error   t Stat      P-value    Lower 95%    Upper 95%
    Intercept     562.151        21.0931          26.65094    4.78E-12   516.1931     608.1089
    Temperature   -5.436581      0.336216         -16.1699    1.64E-09   -6.169133    -4.704029
    Insulation    -20.01232      2.342505         -8.543127   1.91E-06   -25.1162     -14.90844

    Estimated Heating Oil = 562.15 − 5.436 (Temperature) − 20.012 (Insulation)

    Y = B0 + B1X1 + B2X2 + B3X3 + … ± Error

    Total = Estimated/Predicted ± Error
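    Predictions then come straight from the estimated equation; a small sketch using the rounded coefficients above (the example inputs are illustrative):

        def estimated_heating_oil(temperature, insulation):
            """Prediction from the fitted model on this slide (rounded coefficients)."""
            return 562.15 - 5.436 * temperature - 20.012 * insulation

        # Illustrative inputs: temperature 40, insulation 6
        print(estimated_heating_oil(40, 6))   # ≈ 224.64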