msb12e ppt ch11

11-1 Copyright © 2014, 2011, and 2008 Pearson Education, Inc.

Upload: subas-nandy

Post on 15-Apr-2017


TRANSCRIPT

Page 1: Msb12e ppt ch11

Page 2: Msb12e ppt ch11

Statistics for Business and Economics

Chapter 11: Simple Linear Regression

Page 3: Msb12e ppt ch11

Contents

1. Probabilistic Models
2. Fitting the Model: The Least Squares Approach
3. Model Assumptions
4. Assessing the Utility of the Model: Making Inferences about the Slope \(\beta_1\)

Page 4: Msb12e ppt ch11

Contents

5. The Coefficients of Correlation and Determination

6. Using the Model for Estimation and Prediction

7. A Complete Example

Page 5: Msb12e ppt ch11

Learning Objectives

• Introduce the straight-line (simple linear regression) model as a means of relating one quantitative variable to another quantitative variable

• Assess how well the simple linear regression model fits the sample data

Page 6: Msb12e ppt ch11

Learning Objectives

• Introduce the correlation coefficient as a means of relating one quantitative variable to another quantitative variable

• Employ the simple linear regression model for predicting the value of one variable from a specified value of another variable

Page 7: Msb12e ppt ch11

11.1 Probabilistic Models

Page 8: Msb12e ppt ch11

Models
• Representation of some phenomenon
• A mathematical model is a mathematical expression of some phenomenon
• Often describe relationships between variables
• Types
– Deterministic models
– Probabilistic models

Page 9: Msb12e ppt ch11

Deterministic Models
• Hypothesize exact relationships
• Suitable when prediction error is negligible
• Example: force is exactly mass times acceleration
– F = m·a

Page 10: Msb12e ppt ch11

Probabilistic Models
• Hypothesize two components
– Deterministic
– Random error
• Example: sales volume (y) is 10 times advertising spending (x) plus random error
– \(y = 10x + \varepsilon\)
– Random error may be due to factors other than advertising

Page 11: Msb12e ppt ch11

General Form of Probabilistic Models

y = Deterministic component + Random error

where y is the variable of interest. We always assume that the mean value of the random error equals 0. This is equivalent to assuming that the mean value of y, E(y), equals the deterministic component of the model; that is,

E(y) = Deterministic component
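
The claim that a mean-zero error makes E(y) equal the deterministic component can be checked with a small simulation. The sketch below uses the advertising model y = 10x + error and assumes, purely for illustration, a normal error with standard deviation 2; the name `simulate_sales`, the parameter `error_sd`, and the seed are hypothetical and not from the slides.

```python
import random
import statistics

def simulate_sales(x, n_draws=10_000, error_sd=2.0, seed=0):
    """Draw n_draws values of y = 10x + error for one fixed x.

    The normal error and error_sd are illustrative assumptions.
    """
    rng = random.Random(seed)
    return [10 * x + rng.gauss(0.0, error_sd) for _ in range(n_draws)]

ys = simulate_sales(x=3)
mean_y = statistics.mean(ys)

# Individual y values scatter around 30, but their average should sit
# close to the deterministic component 10 * 3 = 30.
print(round(mean_y, 2))
```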

Page 12: Msb12e ppt ch11

A First-Order (Straight Line) Probabilistic Model

\[y = \beta_0 + \beta_1 x + \varepsilon\]

where
y = Dependent or response variable (variable to be modeled)
x = Independent or predictor variable (variable used as a predictor of y)
\(E(y) = \beta_0 + \beta_1 x\) = Deterministic component
\(\varepsilon\) (epsilon) = Random error component

Page 13: Msb12e ppt ch11

A First-Order (Straight Line) Probabilistic Model

\[y = \beta_0 + \beta_1 x + \varepsilon\]

\(\beta_0\) (beta zero) = y-intercept of the line, that is, the point at which the line intercepts or cuts through the y-axis

\(\beta_1\) (beta one) = slope of the line, that is, the change (amount of increase or decrease) in the deterministic component of y for every 1-unit increase in x

Page 14: Msb12e ppt ch11

A First-Order (Straight Line) Probabilistic Model

[Note: A positive slope implies that E(y) increases by the amount \(\beta_1\) for each unit increase in x. A negative slope implies that E(y) decreases by the amount \(|\beta_1|\) for each unit increase in x.]

Page 15: Msb12e ppt ch11

Five-Step Procedure

Step 1: Hypothesize the deterministic component of the model that relates the mean, E(y), to the independent variable x.

Step 2: Use the sample data to estimate unknown parameters in the model.

Step 3: Specify the probability distribution of the random error term and estimate the standard deviation of this distribution.

Step 4: Statistically evaluate the usefulness of the model.

Step 5: When satisfied that the model is useful, use it for prediction, estimation, and other purposes.

Page 16: Msb12e ppt ch11

11.2 Fitting the Model: The Least Squares Approach

Page 17: Msb12e ppt ch11

Scatterplot
1. Plot of all \((x_i, y_i)\) pairs
2. Suggests how well model will fit

[Figure: scatterplot of y versus x; both axes run from 0 to 60]

Page 18: Msb12e ppt ch11

Thinking Challenge

[Figure: scatterplot of y versus x; both axes run from 0 to 60]

• How would you draw a line through the points?
• How do you determine which line ‘fits best’?

Page 19: Msb12e ppt ch11

Least Squares Line

The least squares line \(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\) is one that has the following two properties:

1. The sum of the errors equals 0, i.e., mean error = 0.

2. The sum of squared errors (SSE) is smaller than for any other straight-line model, i.e., the error variance is minimum.

Page 20: Msb12e ppt ch11

Formula for the Least Squares Estimates

Slope: \(\hat{\beta}_1 = \dfrac{SS_{xy}}{SS_{xx}}\)

y-intercept: \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\)

where
\[SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}), \qquad SS_{xx} = \sum (x_i - \bar{x})^2\]

n = Sample size
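
The least squares formulas translate directly into a few lines of Python. This is a minimal sketch rather than code from the slides: the function name `least_squares_line` and the four data points are hypothetical, chosen only to exercise the formulas and to check the first property of the least squares line (residuals sum to 0).

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # SS_xy and SS_xx as defined on the slide
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data, not taken from the slides
xs = [0, 1, 2, 3]
ys = [1, 3, 4, 8]
b0, b1 = least_squares_line(xs, ys)

# Residuals (observed minus fitted) should sum to zero up to rounding
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(b0, b1, sum(residuals))
```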

Page 21: Msb12e ppt ch11

Interpreting the Estimates of \(\beta_0\) and \(\beta_1\) in Simple Linear Regression

y-intercept: \(\hat{\beta}_0\) represents the predicted value of y when x = 0 (Caution: This value will not be meaningful if the value x = 0 is nonsensical or outside the range of the sample data.)

slope: \(\hat{\beta}_1\) represents the increase (or decrease) in y for every 1-unit increase in x (Caution: This interpretation is valid only for x-values within the range of the sample data.)

Page 22: Msb12e ppt ch11

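Least Squares Graphically

[Figure: four data points around the fitted line \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\), with vertical deviations \(\hat{\varepsilon}_1, \dots, \hat{\varepsilon}_4\); for example, \(y_2 = \hat{\beta}_0 + \hat{\beta}_1 x_2 + \hat{\varepsilon}_2\)]

LS minimizes \(\sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \hat{\varepsilon}_1^2 + \hat{\varepsilon}_2^2 + \hat{\varepsilon}_3^2 + \hat{\varepsilon}_4^2\)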

Page 23: Msb12e ppt ch11

Least Squares Example

You’re a marketing analyst for Hasbro Toys. You gather the following data:

Ad Expenditure ($100)   Sales (Units)
1                       1
2                       1
3                       2
4                       2
5                       4

Find the least squares line relating sales and advertising.

Page 24: Msb12e ppt ch11

Scatterplot: Sales vs. Advertising

[Figure: scatterplot of Sales (vertical axis, 0 to 4) against Advertising (horizontal axis, 0 to 5)]

Page 25: Msb12e ppt ch11

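Parameter Estimation Solution

\[\bar{x} = \frac{\sum x}{5} = \frac{15}{5} = 3, \qquad \bar{y} = \frac{\sum y}{5} = \frac{10}{5} = 2\]

\[SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum (x - 3)(y - 2) = 7\]

\[SS_{xx} = \sum (x - \bar{x})^2 = \sum (x - 3)^2 = 10\]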

Page 26: Msb12e ppt ch11

Parameter Estimation Solution

The slope of the least squares line is:

\[\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{7}{10} = .7\]

The y-intercept is:

\[\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 2 - (.70)(3) = -.10\]

The least squares line is:

\[\hat{y} = -.1 + .7x\]
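
As a check on the arithmetic, the same estimates can be reproduced from raw sums. The sketch below uses the five Hasbro data points from the example and the algebraically equivalent shortcut forms SSxy = Σxy − (Σx)(Σy)/n and SSxx = Σx² − (Σx)²/n; the variable names are illustrative.

```python
ad = [1, 2, 3, 4, 5]       # advertising expenditure, in $100s
sales = [1, 1, 2, 2, 4]    # sales, in units
n = len(ad)

sum_x, sum_y = sum(ad), sum(sales)            # 15 and 10
sum_xy = sum(x * y for x, y in zip(ad, sales))
sum_x2 = sum(x * x for x in ad)

# Shortcut forms of SS_xy and SS_xx
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n

slope = ss_xy / ss_xx
intercept = sum_y / n - slope * (sum_x / n)
print(ss_xy, ss_xx, round(slope, 2), round(intercept, 2))
```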

Page 27: Msb12e ppt ch11

Parameter Estimation Computer Output

Parameter Estimates

                 Parameter    Standard    T for H0:
Variable    DF    Estimate       Error      Param=0    Prob>|T|
INTERCEP     1     -0.1000      0.6350       -0.157      0.8849
ADVERT       1      0.7000      0.1914        3.656      0.0354

The INTERCEP estimate is \(\hat{\beta}_0\) and the ADVERT estimate is \(\hat{\beta}_1\), giving \(\hat{y} = -.1 + .7x\).

Page 28: Msb12e ppt ch11

Coefficient Interpretation Solution

1. Slope (\(\hat{\beta}_1\))
• Sales volume (y) is expected to increase by .7 units for each $100 increase in advertising (x), over the sampled range of advertising expenditures from $100 to $500

2. y-Intercept (\(\hat{\beta}_0\))
• Since 0 is outside of the range of the sampled values of x, the y-intercept has no meaningful interpretation

Page 29: Msb12e ppt ch11

11.3 Model Assumptions

Page 30: Msb12e ppt ch11

Basic Assumptions of the Probability Distribution

Assumption 1: The mean of the probability distribution of \(\varepsilon\) is 0; that is, the average of the values of \(\varepsilon\) over an infinitely long series of experiments is 0 for each setting of the independent variable x. This assumption implies that the mean value of y, E(y), for a given value of x is \(E(y) = \beta_0 + \beta_1 x\).

Page 31: Msb12e ppt ch11

Basic Assumptions of the Probability Distribution

Assumption 2: The variance of the probability distribution of \(\varepsilon\) is constant for all settings of the independent variable x. For our straight-line model, this assumption means that the variance of \(\varepsilon\) is equal to a constant, say \(\sigma^2\), for all values of x.

Page 32: Msb12e ppt ch11

Basic Assumptions of the Probability Distribution

Assumption 3: The probability distribution of \(\varepsilon\) is normal.

Assumption 4: The values of \(\varepsilon\) associated with any two observed values of y are independent; that is, the value of \(\varepsilon\) associated with one value of y has no effect on the values of \(\varepsilon\) associated with other y values.

Page 33: Msb12e ppt ch11

Basic Assumptions of the Probability Distribution

[Figure not preserved in the transcript]

Page 34: Msb12e ppt ch11

Estimation of \(\sigma^2\) for a (First-Order) Straight-Line Model

\[s^2 = \frac{SSE}{\text{Degrees of freedom for error}} = \frac{SSE}{n - 2}\]

where
\[SSE = \sum (y_i - \hat{y}_i)^2 = SS_{yy} - \hat{\beta}_1 SS_{xy}, \qquad SS_{yy} = \sum (y_i - \bar{y})^2\]

To estimate the standard deviation \(\sigma\) of \(\varepsilon\), we calculate
\[s = \sqrt{s^2} = \sqrt{\frac{SSE}{n - 2}}\]

We will refer to s as the estimated standard error of the regression model.

Page 35: Msb12e ppt ch11

Calculating SSE, \(s^2\), s Example

You’re a marketing analyst for Hasbro Toys. You gather the following data:

Ad Expenditure ($100)   Sales (Units)
1                       1
2                       1
3                       2
4                       2
5                       4

Find SSE, \(s^2\), and s.

Page 36: Msb12e ppt ch11

Calculating \(s^2\) and s Solution

\[s^2 = \frac{SSE}{n - 2} = \frac{1.1}{5 - 2} = .36667\]

\[s = \sqrt{.36667} = .6055\]
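
The solution starts from SSE = 1.1 without showing where it comes from. A short sketch of that step, summing squared residuals around the fitted line ŷ = −.1 + .7x from the earlier Hasbro example (variable names are illustrative):

```python
import math

ad = [1, 2, 3, 4, 5]       # advertising expenditure, in $100s
sales = [1, 1, 2, 2, 4]    # sales, in units
b0_hat, b1_hat = -0.1, 0.7 # least squares estimates from the earlier slides
n = len(ad)

# SSE: sum of squared differences between observed and fitted sales
sse = sum((y - (b0_hat + b1_hat * x)) ** 2 for x, y in zip(ad, sales))

s2 = sse / (n - 2)   # estimate of the error variance
s = math.sqrt(s2)    # estimated standard error of the regression model
print(round(sse, 2), round(s2, 5), round(s, 4))
```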

Page 37: Msb12e ppt ch11

11.4 Assessing the Utility of the Model: Making Inferences about the Slope \(\beta_1\)

Page 38: Msb12e ppt ch11

Sampling Distribution of \(\hat{\beta}_1\)

If we make the four assumptions about \(\varepsilon\), the sampling distribution of the least squares estimator \(\hat{\beta}_1\) of the slope will be normal with mean \(\beta_1\) (the true slope) and standard deviation

\[\sigma_{\hat{\beta}_1} = \frac{\sigma}{\sqrt{SS_{xx}}}\]

Page 39: Msb12e ppt ch11

Sampling Distribution of \(\hat{\beta}_1\)

We estimate \(\sigma_{\hat{\beta}_1}\) by \(s_{\hat{\beta}_1} = \dfrac{s}{\sqrt{SS_{xx}}}\) and refer to this quantity as the estimated standard error of the least squares slope \(\hat{\beta}_1\).

Page 40: Msb12e ppt ch11

A Test of Model Utility: Simple Linear Regression

Page 41: Msb12e ppt ch11

Interpreting p-Values for Coefficients in Regression

Almost all statistical computer software packages report a two-tailed p-value for each of the parameters in the regression model. For example, in simple linear regression, the p-value for the two-tailed test H0: \(\beta_1 = 0\) versus Ha: \(\beta_1 \ne 0\) is given on the printout. If you want to conduct a one-tailed test of hypothesis, you will need to adjust the p-value reported on the printout as follows:

Page 42: Msb12e ppt ch11

Interpreting p-Values for Coefficients in Regression

where p is the p-value reported on the printout and t is the value of the test statistic.
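
The adjustment table from this slide did not survive in the transcript. The sketch below encodes the standard rule for converting a two-tailed p-value to a one-tailed one: halve p when the sign of t agrees with the direction of the alternative, otherwise use 1 − p/2. Treat it as an illustration of that rule, not a reproduction of the slide; the function name and the `alternative` labels are hypothetical.

```python
def one_tailed_p(p_two_tailed, t, alternative):
    """Convert a printout's two-tailed p-value to a one-tailed p-value.

    alternative: "greater" for Ha: beta1 > 0, "less" for Ha: beta1 < 0.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    # Does the sign of the test statistic point in the direction of Ha?
    agrees = t > 0 if alternative == "greater" else t < 0
    return p_two_tailed / 2 if agrees else 1 - p_two_tailed / 2

# Values from the ADVERT row of the Hasbro printout: t = 3.656, p = 0.0354
p_upper = one_tailed_p(0.0354, 3.656, "greater")
p_lower = one_tailed_p(0.0354, 3.656, "less")
print(p_upper, p_lower)
```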

Page 44: Msb12e ppt ch11

Test of Slope Coefficient Example

You’re a marketing analyst for Hasbro Toys. You find \(\hat{\beta}_0 = -.1\), \(\hat{\beta}_1 = .7\) and s = .6055.

Ad Expenditure ($100)   Sales (Units)
1                       1
2                       1
3                       2
4                       2
5                       4

Is the relationship significant at the .05 level of significance?

Page 45: Msb12e ppt ch11

Test of Slope Coefficient Solution

• H0: \(\beta_1 = 0\)
• Ha: \(\beta_1 \ne 0\)
• \(\alpha\) = .05
• df = 5 – 2 = 3
• Critical Value(s): t = ±3.182 (reject H0 if t < –3.182 or t > 3.182; area .025 in each tail)

Page 46: Msb12e ppt ch11

Test Statistic Solution

\[s_{\hat{\beta}_1} = \frac{s}{\sqrt{SS_{xx}}} = \frac{.6055}{\sqrt{55 - \dfrac{(15)^2}{5}}} = .1914\]

\[t = \frac{\hat{\beta}_1}{s_{\hat{\beta}_1}} = \frac{.70}{.1914} = 3.657\]
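
The test statistic can be reproduced in a few lines. This sketch takes s, SSxx, the slope estimate, and the critical value 3.182 directly from the slides; the variable names are illustrative. Working at full precision gives t ≈ 3.656, which matches the computer printout; the slide's 3.657 comes from rounding the standard error to .1914 first.

```python
import math

s = 0.6055        # estimated standard error of the regression model
ss_xx = 10        # SS_xx for the Hasbro data
b1_hat = 0.7      # least squares slope
t_crit = 3.182    # t_.025 with 3 df, as given on the slide

se_b1 = s / math.sqrt(ss_xx)   # estimated standard error of the slope
t_stat = b1_hat / se_b1        # test statistic for H0: beta1 = 0

# Two-tailed test: reject H0 when |t| exceeds the critical value
reject_h0 = abs(t_stat) > t_crit
print(round(se_b1, 4), round(t_stat, 3), reject_h0)
```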

Page 47: Msb12e ppt ch11

Test of Slope Coefficient Solution

• H0: \(\beta_1 = 0\)
• Ha: \(\beta_1 \ne 0\)
• \(\alpha\) = .05
• df = 5 – 2 = 3
• Critical Value(s): t = ±3.182 (reject H0 if t < –3.182 or t > 3.182; area .025 in each tail)

Test Statistic: t = 3.657

Decision: Reject H0 at \(\alpha\) = .05

Conclusion: There is evidence of a relationship

Page 48: Msb12e ppt ch11

Test of Slope Coefficient Computer Output

Parameter Estimates

                 Parameter    Standard    T for H0:
Variable    DF    Estimate       Error      Param=0    Prob>|T|
INTERCEP     1     -0.1000      0.6350       -0.157      0.8849
ADVERT       1      0.7000      0.1914        3.656      0.0354

In the ADVERT row, Estimate is \(\hat{\beta}_1\), Standard Error is \(s_{\hat{\beta}_1}\), T for H0 is \(t = \hat{\beta}_1 / s_{\hat{\beta}_1}\), and Prob>|T| is the P-Value.

Page 49: Msb12e ppt ch11

11.5 The Coefficients of Correlation and Determination

Page 50: Msb12e ppt ch11

Correlation Models

• Answers ‘How strong is the linear relationship between two variables?’
• Coefficient of correlation
– Sample correlation coefficient denoted r
– Values range from –1 to +1
– Measures degree of association
– Does not indicate cause–effect relationship

Page 51: Msb12e ppt ch11

Coefficient of Correlation

\[r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}\]

where
\[SS_{xy} = \sum (x - \bar{x})(y - \bar{y}), \qquad SS_{xx} = \sum (x - \bar{x})^2, \qquad SS_{yy} = \sum (y - \bar{y})^2\]

Page 55: Msb12e ppt ch11

Coefficient of Correlation Example

You’re a marketing analyst for Hasbro Toys.

Ad Expenditure ($100)   Sales (Units)
1                       1
2                       1
3                       2
4                       2
5                       4

Calculate the coefficient of correlation.

Page 56: Msb12e ppt ch11

Coefficient of Correlation Solution

\[SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = 7\]

\[SS_{xx} = \sum (x - \bar{x})^2 = 10\]

\[SS_{yy} = \sum (y - \bar{y})^2 = 6\]

\[r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}} = \frac{7}{\sqrt{10 \cdot 6}} = .904\]
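
The correlation formula can be wrapped in a small helper and checked against the Hasbro numbers. A minimal sketch; the function name `correlation` is illustrative and not part of the slides.

```python
import math

def correlation(xs, ys):
    """Sample correlation coefficient r = SS_xy / sqrt(SS_xx * SS_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Hasbro data: advertising in $100s, sales in units
r = correlation([1, 2, 3, 4, 5], [1, 1, 2, 2, 4])
print(round(r, 3))
```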

Page 57: Msb12e ppt ch11

A Test for Linear Correlation

Page 58: Msb12e ppt ch11

Condition Required for a Valid Test of Correlation

• The sample of (x, y) values is randomly selected from a normal population.

Page 59: Msb12e ppt ch11

Coefficient of Correlation Thinking Challenge

You’re an economist for the county cooperative. You gather the following data:

Fertilizer (lb.)   Yield (lb.)
 4                 3.0
 6                 5.5
10                 6.5
12                 9.0

Find the coefficient of correlation.

Page 60: Msb12e ppt ch11

Coefficient of Correlation Solution

\[SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = 26\]

\[SS_{xx} = \sum (x - \bar{x})^2 = 40\]

\[SS_{yy} = \sum (y - \bar{y})^2 = 18.5\]

\[r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}} = \frac{26}{\sqrt{40 \cdot 18.5}} = .956\]

Page 61: Msb12e ppt ch11

Coefficient of Determination

It represents the proportion of the total sample variability around \(\bar{y}\) that is explained by the linear relationship between y and x.

\[r^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} = \frac{SS_{yy} - SSE}{SS_{yy}} = 1 - \frac{SSE}{SS_{yy}}\]

\(0 \le r^2 \le 1\)

\(r^2\) = (coefficient of correlation)\(^2\)

Page 62: Msb12e ppt ch11

Coefficient of Determination Example

You’re a marketing analyst for Hasbro Toys. You know r = .904.

Ad Expenditure ($100)   Sales (Units)
1                       1
2                       1
3                       2
4                       2
5                       4

Calculate and interpret the coefficient of determination.

Page 63: Msb12e ppt ch11

Coefficient of Determination Solution

\(r^2\) = (coefficient of correlation)\(^2\)

\(r^2 = (.904)^2\)

\(r^2 = .817\)

Interpretation: About 81.7% of the sample variation in Sales (y) can be explained by using Ad $ (x) to predict Sales (y) in the linear model.

Page 64: Msb12e ppt ch11

\(r^2\) Computer Output

Root MSE    0.60553    R-square    0.8167
Dep Mean    2.00000    Adj R-sq    0.7556
C.V.       30.27650

R-square is \(r^2\); Adj R-sq is \(r^2\) adjusted for number of explanatory variables & sample size.
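
Both printout values can be recovered from quantities computed earlier for the Hasbro data (SSE = 1.1, SSyy = 6, n = 5). The adjusted value below uses the common formula 1 − (1 − r²)(n − 1)/(n − k − 1) with k = 1 explanatory variable; that formula is not stated on the slides and is an assumption here, although it does reproduce the printed 0.7556.

```python
sse = 1.1     # sum of squared errors for the Hasbro fit
ss_yy = 6     # total variation in sales around the mean
n = 5         # sample size
k = 1         # number of explanatory variables (assumed: advertising only)

# Coefficient of determination: share of variation explained by the line
r2 = 1 - sse / ss_yy

# Adjusted r-squared (assumed standard formula, not shown on the slides)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(r2, 4), round(adj_r2, 4))
```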

Page 65: Msb12e ppt ch11

11.6 Using the Model for Estimation and Prediction

Page 66: Msb12e ppt ch11

Probabilistic Model
• Used to make inferences
– Estimate the mean value of y, E(y), for a specific x
  Estimate the mean sales for all months during which $400 (x = 4) is expended on advertising
– Predict a new individual y value for given x
  If we expend $400 in advertising next month, we want to predict the sales revenue for that month

Page 68: Msb12e ppt ch11

A 100(1 – \(\alpha\))% Confidence Interval for the Mean Value of y at x = \(x_p\)

\[\hat{y} \pm t_{\alpha/2}\,(\text{Estimated standard error of } \hat{y})\]

or

\[\hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}\]

where \(t_{\alpha/2}\) is based on df = n – 2

Page 69: Msb12e ppt ch11

A 100(1 – \(\alpha\))% Prediction Interval for an Individual New Value of y at x = \(x_p\)

\[\hat{y} \pm t_{\alpha/2}\,(\text{Estimated standard error of prediction})\]

or

\[\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}\]

where \(t_{\alpha/2}\) is based on df = n – 2

Page 70: Msb12e ppt ch11

[Figure: Error of estimating the mean value of y for a given value of x]

Page 71: Msb12e ppt ch11

[Figure: Error of predicting a future value of y for a given value of x]

Page 72: Msb12e ppt ch11

Confidence Interval Example

You’re a marketing analyst for Hasbro Toys. You find \(\hat{\beta}_0 = -.1\), \(\hat{\beta}_1 = .7\) and s = .6055.

Ad Expenditure ($100)   Sales (Units)
1                       1
2                       1
3                       2
4                       2
5                       4

Find a 95% confidence interval for the mean sales when advertising is $400 (x = 4).

Page 73: Msb12e ppt ch11

Confidence Interval Solution

\[\hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}\]

With x to be predicted \(x_p = 4\):

\[\hat{y} = -.1 + (.7)(4) = 2.7\]

\[2.7 \pm (3.182)(.6055)\sqrt{\frac{1}{5} + \frac{(4 - 3)^2}{10}}\]

\[1.645 \le E(Y) \le 3.755\]
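
The interval arithmetic can be verified with a short function. A minimal sketch: the function name and argument names are illustrative, and the critical value 3.182 is taken from the slides rather than computed.

```python
import math

def mean_confidence_interval(x_p, b0, b1, s, n, x_bar, ss_xx, t_crit):
    """Confidence interval for E(y) at x = x_p in simple linear regression."""
    y_hat = b0 + b1 * x_p
    half_width = t_crit * s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / ss_xx)
    return y_hat - half_width, y_hat + half_width

# Hasbro example: 95% interval for mean sales at x_p = 4 ($400 of advertising)
low, high = mean_confidence_interval(
    x_p=4, b0=-0.1, b1=0.7, s=0.6055, n=5, x_bar=3, ss_xx=10, t_crit=3.182
)
print(round(low, 3), round(high, 3))
```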

Page 74: Msb12e ppt ch11

A 100(1 – \(\alpha\))% Prediction Interval for an Individual New Value of y at x = \(x_p\)

\[\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}\]

Note the extra 1 under the square root!

df = n – 2

Page 75: Msb12e ppt ch11

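Why the Extra ‘S’?

[Figure: the fitted line \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\) and the line of means \(E(y) = \beta_0 + \beta_1 x\) plotted against x; at x = \(x_p\) the plot marks the prediction \(\hat{y}\), the expected (mean) y, and the individual y we're trying to predict]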

Page 76: Msb12e ppt ch11

Prediction Interval Example

You’re a marketing analyst for Hasbro Toys. You find \(\hat{\beta}_0 = -.1\), \(\hat{\beta}_1 = .7\) and s = .6055.

Ad Expenditure ($100)   Sales (Units)
1                       1
2                       1
3                       2
4                       2
5                       4

Predict the sales when advertising is $400 (x = 4). Use a 95% prediction interval.

Page 77: Msb12e ppt ch11

Prediction Interval Solution

\[\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}\]

With x to be predicted \(x_p = 4\):

\[\hat{y} = -.1 + (.7)(4) = 2.7\]

\[2.7 \pm (3.182)(.6055)\sqrt{1 + \frac{1}{5} + \frac{(4 - 3)^2}{10}}\]

\[.503 \le y_4 \le 4.897\]
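
A sketch of the same calculation in code makes the effect of the extra 1 under the square root visible: the prediction half-width is compared with the half-width the confidence interval for the mean would have at the same x. Variable names are illustrative, and the critical value 3.182 is taken from the slides.

```python
import math

b0_hat, b1_hat = -0.1, 0.7
s, n, x_bar, ss_xx = 0.6055, 5, 3, 10
t_crit = 3.182   # t_.025 with 3 df, as given on the slides
x_p = 4          # advertising of $400

y_hat = b0_hat + b1_hat * x_p
leverage = 1 / n + (x_p - x_bar) ** 2 / ss_xx

# Half-width for the mean E(y) versus half-width for one new y value
ci_half = t_crit * s * math.sqrt(leverage)
pi_half = t_crit * s * math.sqrt(1 + leverage)

pi_low, pi_high = y_hat - pi_half, y_hat + pi_half
print(round(pi_low, 3), round(pi_high, 3), pi_half > ci_half)
```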

Page 78: Msb12e ppt ch11

Interval Estimate Computer Output

       Dep Var     Pred   Std Err   Low95%   Upp95%   Low95%   Upp95%
Obs      SALES    Value   Predict     Mean     Mean  Predict  Predict
  1      1.000    0.600     0.469   -0.892    2.092   -1.837    3.037
  2      1.000    1.300     0.332    0.244    2.355   -0.897    3.497
  3      2.000    2.000     0.271    1.138    2.861   -0.111    4.111
  4      2.000    2.700     0.332    1.644    3.755    0.502    4.897
  5      4.000    3.400     0.469    1.907    4.892    0.962    5.837

For observation 4 (x = 4): Pred Value is the predicted y, Std Err Predict is \(s_{\hat{y}}\), Low95% Mean and Upp95% Mean give the confidence interval, and Low95% Predict and Upp95% Predict give the prediction interval.

Page 79: Msb12e ppt ch11

[Figure: Confidence intervals for mean values and prediction intervals for new values]

Page 80: Msb12e ppt ch11

11.7 A Complete Example

Page 81: Msb12e ppt ch11

Example

Suppose a fire insurance company wants to relate the amount of fire damage in major residential fires to the distance between the burning house and the nearest fire station. The study is to be conducted in a large suburb of a major city; a sample of 15 recent fires in this suburb is selected. The amount of damage, y, and the distance between the fire and the nearest fire station, x, are recorded for each fire.

Page 82: Msb12e ppt ch11

Example

Page 83: Msb12e ppt ch11

Example

Step 1: First, we hypothesize a model to relate fire damage, y, to the distance from the nearest fire station, x. We hypothesize a straight-line probabilistic model:

\[y = \beta_0 + \beta_1 x + \varepsilon\]

Page 84: Msb12e ppt ch11

Example

Step 2: Use a statistical software package to estimate the unknown parameters in the deterministic component of the hypothesized model. The Excel printout for the simple linear regression analysis is shown on the next slide. The least squares estimates of the slope \(\beta_1\) and intercept \(\beta_0\), highlighted on the printout, are

\[\hat{\beta}_1 = 4.919331, \qquad \hat{\beta}_0 = 10.277929\]

Page 85: Msb12e ppt ch11

Example

Least Squares Equation: \(\hat{y} = 10.278 + 4.919x\)

Page 86: Msb12e ppt ch11

Example

This prediction equation is graphed in the Minitab scatterplot.

Page 87: Msb12e ppt ch11

Example

The least squares estimate of the slope, \(\hat{\beta}_1 = 4.919\), implies that the estimated mean damage increases by $4,919 for each additional mile from the fire station. This interpretation is valid over the range of x, or from .7 to 6.1 miles from the station. The estimated y-intercept, \(\hat{\beta}_0 = 10.278\), has the interpretation that a fire 0 miles from the fire station has an estimated mean damage of $10,278.

Page 88: Msb12e ppt ch11

Example

Step 3: Specify the probability distribution of the random error component \(\varepsilon\). The estimate of the standard deviation \(\sigma\) of \(\varepsilon\), highlighted on the Excel printout, is

s = 2.31635

This implies that most of the observed fire damage (y) values will fall within approximately 2s ≈ 4.63 thousand dollars of their respective predicted values when using the least squares line.

Page 89: Msb12e ppt ch11

Example

Step 4: First, test the null hypothesis that the slope \(\beta_1\) is 0, that is, that there is no linear relationship between fire damage and the distance from the nearest fire station, against the alternative hypothesis that fire damage increases as the distance increases. We test

H0: \(\beta_1 = 0\)
Ha: \(\beta_1 > 0\)

The two-tailed observed significance level for testing is approximately 0. Because Ha is one-tailed and the estimated slope is positive, the p-value for this test is half that value, which is also approximately 0.

Page 90: Msb12e ppt ch11

Example

The 95% confidence interval for the slope yields (4.070, 5.768). We estimate (with 95% confidence) that the interval from $4,070 to $5,768 encloses the mean increase (\(\beta_1\)) in fire damage per additional mile distance from the fire station.

The coefficient of determination is \(r^2 = .9235\), which implies that about 92% of the sample variation in fire damage (y) is explained by the distance (x) between the fire and the fire station.

Page 91: Msb12e ppt ch11

Example

The coefficient of correlation, r, that measures the strength of the linear relationship between y and x is not shown on the Excel printout and must be calculated. We find

\[r = +\sqrt{r^2} = \sqrt{.9235} = .96\]

The high correlation confirms our conclusion that \(\beta_1\) is greater than 0; it appears that fire damage and distance from the fire station are positively correlated. All signs point to a strong linear relationship between y and x.

Page 92: Msb12e ppt ch11

Example

Step 5: We are now prepared to use the least squares model. Suppose the insurance company wants to predict the fire damage if a major residential fire were to occur 3.5 miles from the nearest fire station. A 95% confidence interval for E(y) and prediction interval for y when x = 3.5 are shown on the Minitab printout on the next slide.

Page 94: Msb12e ppt ch11

Example

The predicted value (highlighted on the printout) is \(\hat{y} = 27.496\), while the 95% prediction interval (also highlighted) is (22.3239, 32.6672). Therefore, with 95% confidence we predict fire damage in a major residential fire 3.5 miles from the nearest station to be between $22,324 and $32,667.
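
The predicted value follows directly from the fitted equation, and the prediction interval is centered on it. A minimal sketch using the estimates and interval quoted on these slides; the function name `predict_damage` is illustrative.

```python
B0_HAT = 10.277929   # estimated intercept, thousands of dollars
B1_HAT = 4.919331    # estimated slope, thousands of dollars per mile

def predict_damage(distance_miles):
    """Predicted fire damage (thousands of dollars) at a given distance."""
    return B0_HAT + B1_HAT * distance_miles

y_hat = predict_damage(3.5)

# The 95% prediction interval from the printout should be centered on y_hat
interval = (22.3239, 32.6672)
midpoint = sum(interval) / 2
print(round(y_hat, 3), round(midpoint, 3))
```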

Page 95: Msb12e ppt ch11

Key Ideas

Simple Linear Regression Variables
y = Dependent variable (quantitative)
x = Independent variable (quantitative)

Method of Least Squares Properties
1. average error of prediction = 0
2. sum of squared errors is minimum

Page 96: Msb12e ppt ch11

Key Ideas

Practical Interpretation of y-intercept
Predicted y value when x = 0 (no practical interpretation if x = 0 is either nonsensical or outside range of sample data)

Practical Interpretation of Slope
Increase or decrease in y for every 1-unit increase in x

Page 97: Msb12e ppt ch11

Key Ideas

First-Order (Straight Line) Model
\(E(y) = \beta_0 + \beta_1 x\)

where E(y) = mean of y

\(\beta_0\) = y-intercept of line (point where line intercepts the y-axis)

\(\beta_1\) = slope of line (change in y for every 1-unit change in x)

Page 98: Msb12e ppt ch11

Key Ideas

Coefficient of Correlation, r
1. Ranges between –1 and 1
2. Measures strength of linear relationship between y and x

Coefficient of Determination, \(r^2\)
1. Ranges between 0 and 1
2. Measures proportion of sample variation in y explained by the model

Page 99: Msb12e ppt ch11

Key Ideas

Practical Interpretation of Model Standard Deviation, s
Approximately 95% of y-values fall within 2s of their respective predicted values

Width of confidence interval for E(y) will always be narrower than width of prediction interval for y