10 ch 10 linear regression and correlation

Upload: tilahunthm

Post on 08-Nov-2014

DESCRIPTION

from my lecture notes(MA IN PUBLIC PROCUREMENT AND ASSET MANAGEMENT)

TRANSCRIPT

Page 1: 10 Ch 10 Linear Regression and Correlation

04/08/23 1Huangpu River

Chapter 12

Linear Regression and Correlation

Page 2: 10 Ch 10 Linear Regression and Correlation

Chapter 12: Linear Regression and Correlation

[Figure: Weekly Sales plotted against Aptitude Test Score]

Page 3: 10 Ch 10 Linear Regression and Correlation

Learning Objectives

• To discuss scatter diagrams.
• To discuss the coefficient of correlation.
• To discuss the coefficient of determination.
• To use the least squares method to determine a linear regression equation.
• To interpret the linear regression equation.

Page 4: 10 Ch 10 Linear Regression and Correlation

Learning Objectives (continued)

• To compute the standard error of estimate and explain its use.
• To construct a confidence interval and a prediction interval for the estimates of the dependent variable.
• To understand the limitations, errors, and caveats of using regression and correlation, and to evaluate assumptions using residual analysis.

Page 5: 10 Ch 10 Linear Regression and Correlation

Descriptive Statistics

[Figure: bar chart "GBS221 GRADE DISTRIBUTION"; x-axis: Class (< 59, 60 - 69, 70 - 79, 80 - 89, 90 - 100); y-axis: # OF STUDENTS (0 to 70)]

Page 6: 10 Ch 10 Linear Regression and Correlation

Statistical Inference

Sample statistics are used as estimates of the corresponding population parameters:

Population      Sample
\(\mu\)         \(\bar{X}\)
\(\sigma\)      \(S\)
\(\sigma^2\)    \(S^2\)
\(P\)           \(p_s\)

Page 8: 10 Ch 10 Linear Regression and Correlation

Example 1: Plot the relationship between Test Scores and Weekly Sales:

Salesperson    Test Score (X)    Weekly Sales (Y, $000)
Mike           1                 2
Melissa        2                 4
Jalene         3                 8
Jeff           4                 6
Brian          5                 12
Nicole         6                 10

Page 9: 10 Ch 10 Linear Regression and Correlation

[Figure: scatter diagram "Correlation and Regression: Weekly Sales vs. Test scores"; x-axis: 0 to 6; y-axis: 0 to 12]

Page 10: 10 Ch 10 Linear Regression and Correlation

[Figure: scatter diagram "Correlation and Regression: Weekly Sales vs Test Scores"; x-axis: Aptitude Test Score (0 to 7); y-axis: Weekly Sales ($000) (0 to 14)]

Page 11: 10 Ch 10 Linear Regression and Correlation

[Figure: scatter diagram "Correlation and Regression: Weekly Sales vs Test Scores" with the fitted line y = 1.7714x + 0.8, \(R^2 = 0.7845\); x-axis: Test Score (0 to 7); y-axis: Weekly Sales ($000) (0 to 14)]

Page 12: 10 Ch 10 Linear Regression and Correlation

Demonstrate how to create a scatter diagram and compute the regression equation using Excel.

• Use the directions on pages 70-74.
– Use Insert|Chart for Excel.
• Also see pages 493-495.

(For this demonstration, use X=5)

Page 13: 10 Ch 10 Linear Regression and Correlation

Example 1 continued...

\(\bar{X}\), \(\bar{Y}\), \(b_0\), \(b_1\), \(r\), \(r^2\)

Page 14: 10 Ch 10 Linear Regression and Correlation

Example 1 continued...

\(\bar{X} = 3.5\): Sample mean of “X” values.
\(\bar{Y} = 7\): Sample mean of “Y” values.
\(b_0 = 0.8\): Y-intercept
\(b_1 = 1.7714\): Slope of the regression line.
\(r = .8857\): Coefficient of correlation
\(r^2 = (.8857)^2 = .7845\): Coefficient of determination
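The six quantities in Example 1 can be reproduced from the data table with the usual least squares formulas. The sketch below is only an illustration in Python (the lecture itself uses Excel); the function name `least_squares` and the variable names are assumptions made for this example.

```python
import math

def least_squares(x, y):
    """Return (x_bar, y_bar, b0, b1, r) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squared deviations and cross-products about the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx               # slope
    b0 = y_bar - b1 * x_bar      # Y-intercept
    r = sxy / math.sqrt(sxx * syy)  # coefficient of correlation
    return x_bar, y_bar, b0, b1, r

test_scores = [1, 2, 3, 4, 5, 6]
weekly_sales = [2, 4, 8, 6, 12, 10]  # in $000

x_bar, y_bar, b0, b1, r = least_squares(test_scores, weekly_sales)
print(f"x_bar={x_bar}, y_bar={y_bar}, b0={b0:.4f}, b1={b1:.4f}, "
      f"r={r:.4f}, r2={r * r:.4f}")
```

Run on the Example 1 data, this should agree with the slide values to four decimals.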

Page 15: 10 Ch 10 Linear Regression and Correlation

Slope-Intercept form of a straight line

Y = mX + b

• Y is the dependent variable
• X is the independent variable
• m is the slope of the line
• b is the Y-intercept

But statisticians are peculiar. You might say they have a deviation!!!

Page 16: 10 Ch 10 Linear Regression and Correlation

Simple Linear Regression Model

\[\hat{Y}_i = b_0 + b_1 X_i\]

\(\hat{Y}_i\) = Predicted value of Y for observation i
\(X_i\) = Value of X for observation i
\(b_0\) = Sample Y-intercept, used as an estimate of the population \(\beta_0\)
\(b_1\) = Sample slope, used as an estimate of the population \(\beta_1\)

Page 17: 10 Ch 10 Linear Regression and Correlation

Interpreting the Results

\[\hat{Y}_i = 0.8 + 1.7714 X_i\]

The slope of 1.7714 means that for each increase of one unit in X, Y is estimated to increase by 1.7714 units.

For each increase of 1 unit in the test score, the model predicts that expected weekly sales increase by $1.7714 thousand (about $1,771).
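The slope interpretation follows directly from the fitted equation: subtracting the prediction at X from the prediction at X + 1 leaves only the slope. A short worked check, using the coefficients from the slides:

```latex
\[
\hat{Y}(X+1) - \hat{Y}(X)
  = \bigl[0.8 + 1.7714\,(X+1)\bigr] - \bigl[0.8 + 1.7714\,X\bigr]
  = 1.7714
\]
```

For instance, moving from a test score of 4 to a test score of 5 raises the predicted weekly sales from about 7.886 to about 9.657 ($000), a difference of about 1.771.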

Page 18: 10 Ch 10 Linear Regression and Correlation

PERFECT NEGATIVE CORRELATION

[Figure: Y plotted against X; r = -1]

Page 19: 10 Ch 10 Linear Regression and Correlation

PERFECT POSITIVE CORRELATION

[Figure: Y plotted against X; r = +1]

Page 20: 10 Ch 10 Linear Regression and Correlation

ZERO CORRELATION

[Figure: Y plotted against X; r = 0]

Page 21: 10 Ch 10 Linear Regression and Correlation

STRONG POSITIVE CORRELATION

[Figure: Y plotted against X]

Page 22: 10 Ch 10 Linear Regression and Correlation

Use the following definitions to interpret the results of this example

• r is the coefficient of correlation. This indicates the strength of the relationship between X and Y and whether the relationship is positive (+) or negative (-).

Page 23: 10 Ch 10 Linear Regression and Correlation

Interpretation of the coefficient of correlation

r = 0.8857

There is a strong positive correlation between a salesperson’s weekly sales and his/her score on the aptitude test.

[Figure: number line from -1 through 0 to +1, with +0.8857 marked]

Page 24: 10 Ch 10 Linear Regression and Correlation

Coefficient of determination

\(r^2\) is the coefficient of determination. This indicates the proportion of the variation in Y that is explained by X.

Page 25: 10 Ch 10 Linear Regression and Correlation

Coefficient of determination

\(r^2 = 0.7845\)

About 78% of the variation in weekly sales is explained by the variation in test scores.

or

The variation in test scores explains 78% of the variation in weekly sales.

Page 26: 10 Ch 10 Linear Regression and Correlation

Purpose of Regression and Correlation Analysis

• Regression Analysis is Used Primarily for Prediction

A statistical model used to predict the values of a dependent or response variable based on values of at least one independent or explanatory variable

• Correlation Analysis is Used to Measure Strength of the Association Between Numerical Variables

Page 27: 10 Ch 10 Linear Regression and Correlation

For the Test Score/Weekly Sales Problem

Plot the regression line on your graph using the given values of \(b_0 = 0.8\) and \(b_1 = 1.7714\).

– Hint: The regression line will always go through the Y-intercept \((0, b_0)\) and the point of means \((\bar{X}, \bar{Y})\).
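The hint holds because the least squares intercept is computed as \(b_0 = \bar{Y} - b_1\bar{X}\); substituting \(X = \bar{X}\) into the fitted line then returns \(\bar{Y}\). A brief check, with the Example 1 values plugged in (the small gap from 7 comes from rounding the slope to 1.7714):

```latex
\[
\hat{Y}(\bar{X}) = b_0 + b_1\bar{X}
  = (\bar{Y} - b_1\bar{X}) + b_1\bar{X}
  = \bar{Y},
\qquad
0.8 + 1.7714\,(3.5) = 6.9999 \approx 7 .
\]
```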

Page 28: 10 Ch 10 Linear Regression and Correlation

[Figure: scatter diagram "Correlation and Regression: Weekly Sales vs Test scores" with the regression line \(\hat{Y} = 0.8 + 1.77X\) drawn through the Y-intercept (0, 0.8) and the point (3.5, 7); x-axis: Aptitude Test Score (0 to 6); y-axis: Weekly Sales ($000) (0 to 12)]

Page 29: 10 Ch 10 Linear Regression and Correlation

[Figure: scatter diagram "Correlation and Regression: Weekly Sales vs Test Scores" with the fitted line y = 1.7714x + 0.8, \(R^2 = 0.7845\); x-axis: Test Score (0 to 7); y-axis: Weekly Sales ($000) (0 to 14)]

Page 30: 10 Ch 10 Linear Regression and Correlation

For the Test Score/Weekly Sales Problem

Compute the predicted value of weekly sales (\(\hat{Y}\)) for each of the following test scores (X).

X    \(\hat{Y}\)
1    ?
2
3
4
5

Page 31: 10 Ch 10 Linear Regression and Correlation

For the Test Score/Weekly Sales Problem

Compute the predicted value of weekly sales (\(\hat{Y}\)) for each of the following test scores (X).

X    \(\hat{Y}\)
1    2.571
2    4.343
3    6.114
4    7.886
5    9.657
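The predicted values in the table come from substituting each test score into the fitted equation. A minimal sketch, assuming the rounded coefficients shown on the slides (0.8 and 1.7714); the function name `predict_weekly_sales` is illustrative only.

```python
# Coefficients as given on the slides (slope rounded to four decimals).
B0 = 0.8      # Y-intercept
B1 = 1.7714   # slope

def predict_weekly_sales(test_score):
    """Point prediction of weekly sales ($000) for a given test score."""
    return B0 + B1 * test_score

# Reproduce the table of predicted values for X = 1..5.
for x in range(1, 6):
    print(x, round(predict_weekly_sales(x), 3))
```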

Page 32: 10 Ch 10 Linear Regression and Correlation

Confidence Intervals

OH NO!!!!!!!!

Page 33: 10 Ch 10 Linear Regression and Correlation

For the Test Score/Weekly Sales Problem

Assume that an applicant scored 5 on the aptitude test:
– What do you predict her weekly sales will be?
– Interpret your answer in light of what you know about “point” and “interval” estimates.

We have to find the standard error of the estimate.

Page 34: 10 Ch 10 Linear Regression and Correlation

Predicted mean weekly sales for applicants who scored 5 on the aptitude test

• The predicted weekly sales for applicants who scored 5 on the test is $9,657.
• Since the regression line is an “average” line drawn through the data, this is a point estimate of average weekly sales.
• A confidence interval can be computed, i.e., “We can be 95% confident that mean weekly sales will be between _ and _.”

Page 35: 10 Ch 10 Linear Regression and Correlation

[Figure: scatter diagram "Correlation and Regression: Weekly Sales vs Test Scores" with the fitted line y = 1.7714x + 0.8, \(R^2 = 0.7845\); x-axis: Test Score (0 to 7); y-axis: Weekly Sales ($000) (0 to 14)]

In a REAL PROBLEM there may be many observed values of Y for each value of X.

Page 36: 10 Ch 10 Linear Regression and Correlation

[Figure: scatter diagram "Trade Executions vs. Incoming Phone Calls" with the fitted line y = 0.1415x + 39.351, \(R^2 = 0.3533\); x-axis: # of Incoming Calls (1800 to 2700); y-axis: # of Trade Executions (200 to 500)]

Page 37: 10 Ch 10 Linear Regression and Correlation

[Figure: the same "Trade Executions vs. Incoming Phone Calls" scatter diagram and fitted line, with an annotation for the standard error of the estimate \(S_{YX}\) that reads "23 1 SYX" in the source]

Page 38: 10 Ch 10 Linear Regression and Correlation

Standard Error of the Estimate

• In chapter 3 we measured the dispersion about an “average” called the Mean.
• In chapter 6 we measured the dispersion about an “average” called the Mean of the Means.
• Now we want to measure the dispersion about an “average line” called the Regression Line.

Page 39: 10 Ch 10 Linear Regression and Correlation

Measures of Dispersion

\(\sigma\) or \(S\) = Population or Sample Standard Deviation
\(\sigma_{\bar{X}}\) or \(S_{\bar{X}}\) = Standard Error of the Mean
\(S_{YX}\) = Standard Error of the Estimate

Page 40: 10 Ch 10 Linear Regression and Correlation

Section 12.3, Measures of Variation

• See pages 421 through 427 in the text.
• We will discuss the following topics.
– Obtaining the Sum of Squares
– The Coefficient of Determination
– The Standard Error of the Estimate

Page 41: 10 Ch 10 Linear Regression and Correlation

[Figure: scatter diagram "Correlation and Regression: Weekly Sales vs Test scores" with the regression line \(\hat{Y} = 0.8 + 1.77X\) and the Y-intercept (0, 0.8); x-axis: Aptitude Test Score (0 to 6); y-axis: Weekly Sales ($000) (0 to 12)]

Page 42: 10 Ch 10 Linear Regression and Correlation

[Figure: the same scatter diagram and regression line \(\hat{Y} = 0.8 + 1.77X\), with an additional point labelled Y' marked]

Page 43: 10 Ch 10 Linear Regression and Correlation

[Figure: the same scatter diagram and regression line \(\hat{Y} = 0.8 + 1.77X\), annotated with the Total Error (SST), the Error explained by regression line (SSR), and the Unexplained Error (SSE)]

What proportion of the variation in Y is explained by the variation in X?

Page 44: 10 Ch 10 Linear Regression and Correlation

The Coefficient of Determination

\[r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}\]

Measures the proportion of variation that is explained by the independent variable X in the regression model.

Page 45: 10 Ch 10 Linear Regression and Correlation

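Total Error = \(Y - \bar{Y}\)
Total Sum of Squares: \(SST = \sum (Y - \bar{Y})^2\)

Error explained by regression line = \(\hat{Y} - \bar{Y}\)
Regression Sum of Squares: \(SSR = \sum (\hat{Y} - \bar{Y})^2\)

Unexplained Error = \(Y - \hat{Y}\)
Error Sum of Squares: \(SSE = \sum (Y - \hat{Y})^2\)

SSR + SSE = SST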

Page 46: 10 Ch 10 Linear Regression and Correlation

Computations for SSR, SSE and SST

Test Score (X)   Weekly Sales (Y)   \(\hat{Y}\)   \(\hat{Y}-\bar{Y}\)   \((\hat{Y}-\bar{Y})^2\)   \(Y-\hat{Y}\)   \((Y-\hat{Y})^2\)   \(Y-\bar{Y}\)   \((Y-\bar{Y})^2\)
1                2                  2.571         -4.429                19.6125                   -0.5714         0.3265              -5              25
2                4                  4.343         -2.657                7.0607                    -0.3428         0.1175              -3              9
3                8                  6.114         -0.886                0.7846                    1.8858          3.5562              1               1
4                6                  7.886         0.886                 0.7843                    -1.8856         3.5555              -1              1
5                12                 9.657         2.657                 7.0596                    2.3430          5.4896              5               25
6                10                 11.428        4.428                 19.6107                   -1.4284         2.0403              3               9

\(\bar{Y} = 7\)    SSR = 54.91251    SSE = 15.0857    SST = 70
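The three sums of squares in this table can be recomputed from the raw data. The following is a sketch under the assumption that the slope and intercept are fitted at full precision rather than rounded to 1.7714, so SSR comes out near 54.9143 instead of the table's 54.91251; all variable names are illustrative.

```python
test_scores = [1, 2, 3, 4, 5, 6]
weekly_sales = [2, 4, 8, 6, 12, 10]  # in $000

n = len(test_scores)
x_bar = sum(test_scores) / n
y_bar = sum(weekly_sales) / n

# Least squares slope and intercept (full precision).
sxx = sum((x - x_bar) ** 2 for x in test_scores)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(test_scores, weekly_sales))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * x for x in test_scores]

sst = sum((y - y_bar) ** 2 for y in weekly_sales)             # total
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)                  # explained
sse = sum((y - yh) ** 2 for y, yh in zip(weekly_sales, y_hat))  # unexplained

print(f"SST={sst:.4f}  SSR={ssr:.4f}  SSE={sse:.4f}  r2={ssr / sst:.4f}")
```

With unrounded coefficients the identity SSR + SSE = SST holds up to floating-point error, and SSR / SST gives the coefficient of determination of about 0.7845.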

Page 56: 10 Ch 10 Linear Regression and Correlation

Measures of Variation: The Sum of Squares

• SST = Total Sum of Squares
– Measures the variation of the Y values around their mean \(\bar{Y}\)
• SSR = Regression Sum of Squares
– Explained variation attributable to the relationship between X and Y
• SSE = Error Sum of Squares
– Variation attributable to factors other than the relationship between X and Y

Page 57: 10 Ch 10 Linear Regression and Correlation

Standard Error of the Estimate

\[S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n-2}}\]

Standard error of estimate - measures the scatter, or dispersion, of the observed values around the line of regression.

Compare the sample standard deviation:

\[S = \sqrt{\frac{\sum (X - \bar{X})^2}{n-1}}\]
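As a rough illustration of the formula, the standard error of the estimate for the Test Score/Weekly Sales data can be computed from the observed and predicted values. The predicted values below are copied from the slides' computation table; the function name `standard_error_of_estimate` is an assumption for this sketch.

```python
import math

def standard_error_of_estimate(y, y_hat):
    """S_YX = sqrt(SSE / (n - 2)) for a simple linear regression."""
    n = len(y)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return math.sqrt(sse / (n - 2))

weekly_sales = [2, 4, 8, 6, 12, 10]                       # observed Y ($000)
predicted = [2.571, 4.343, 6.114, 7.886, 9.657, 11.428]   # Y-hat from the table

s_yx = standard_error_of_estimate(weekly_sales, predicted)
print(round(s_yx, 3))
```

The result should be close to the value of 1.942 that the later slides use for \(S_{YX}\).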

Page 58: 10 Ch 10 Linear Regression and Correlation

Predicted weekly sales for an applicant who scored 5 on the aptitude test

• The predicted mean weekly sales for applicants who scored 5 on the test is $9,657.
• Since the regression line is an “average” line drawn through the data, this is a point estimate of average weekly sales.
• A confidence interval can be computed, i.e., “We can be 95% confident that mean weekly sales will be between _ and _.”

Page 59: 10 Ch 10 Linear Regression and Correlation

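[Figure: scatter diagram "Trade Executions vs. Incoming Phone Calls" with the fitted line y = 0.1415x + 39.351, \(R^2 = 0.3533\), and an annotation for the standard error of the estimate \(S_{YX}\) that reads "23 1 SYX" in the source; x-axis: # of Incoming Calls (1800 to 2700); y-axis: # of Trade Executions (200 to 500)]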

Page 60: 10 Ch 10 Linear Regression and Correlation

Confidence Interval - Large Sample

\[\hat{Y} \pm Z\,(S_{YX})\]

Dream on. It can’t be this easy!!!

Page 61: 10 Ch 10 Linear Regression and Correlation

Confidence Interval - Small Sample

\[\hat{Y} \pm t\,(S_{YX})\]

Dream on. It can’t be this easy!!!

Page 62: 10 Ch 10 Linear Regression and Correlation

Estimation of Predicted Values

Confidence Interval Estimate for \(\mu_{YX}\), the Mean of Y given a particular \(X_i\)

\[\hat{Y}_i \pm t_{n-2}\, S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}\]

• \(t_{n-2}\): t value from table with df = n-2
• \(S_{YX}\): standard error of the estimate
• The size of the interval varies according to the distance of \(X_i\) from the mean, \(\bar{X}\).

Use this for the mean weekly sales for a group of applicants who got 5 on the test.

Page 63: 10 Ch 10 Linear Regression and Correlation

Estimation of Predicted Values

Prediction Interval Estimate for Individual Response \(Y_i\) at a Particular \(X_i\)

\[\hat{Y}_i \pm t_{n-2}\, S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}\]

The addition of this 1 makes the interval wider than the interval for the mean of Y.

Use this when you want the estimated weekly sales of one particular applicant (e.g., Jo Cruickshank) who scored 5 on the test.

Page 64: 10 Ch 10 Linear Regression and Correlation

Weekly Sales/Test Scores problem

Compute the interval estimate (for a group of applicants who scored 5 on the test).
– Two tail test
– Alpha error = .05
– df = n-2
– \(\hat{Y}\) = 9.657
– \(S_{YX}\) = 1.942

Page 65: 10 Ch 10 Linear Regression and Correlation

Weekly Sales/Test Scores problem

\[\hat{Y} \pm t\,S_{YX}\sqrt{\frac{1}{n} + \frac{(X - \bar{X})^2}{\sum X^2 - \frac{(\sum X)^2}{n}}}\]

where t is from Appendix F with n - 2 degrees of freedom.

\[9.657 \pm (2.776)(1.942)\sqrt{\frac{1}{6} + \frac{(5 - 3.5)^2}{91 - \frac{441}{6}}}\]

or

\[9.657 \pm (2.776)(1.942)\sqrt{.29524}\]

or

\[9.657 \pm 2.929\]

or

Between $6,728 and $12,586
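The hand calculation above can be retraced step by step. This is a sketch that takes the slide's inputs as given (\(\hat{Y}\) = 9.657, \(S_{YX}\) = 1.942, and t = 2.776 read from a t table for 4 degrees of freedom); the variable names are illustrative.

```python
import math

test_scores = [1, 2, 3, 4, 5, 6]
x0 = 5            # test score of interest
y_hat = 9.657     # predicted weekly sales ($000) at X = 5, from the slides
s_yx = 1.942      # standard error of the estimate, from the slides
t_value = 2.776   # t table value for df = n - 2 = 4, 95% confidence (slides)

n = len(test_scores)
x_bar = sum(test_scores) / n
# Sum of squared deviations via the computational form: 91 - 441/6 = 17.5
sxx = sum(x * x for x in test_scores) - sum(test_scores) ** 2 / n

h = 1 / n + (x0 - x_bar) ** 2 / sxx          # about 0.29524
half_width = t_value * s_yx * math.sqrt(h)   # about 2.929

lower, upper = y_hat - half_width, y_hat + half_width
print(f"h={h:.5f}  half width={half_width:.3f}  interval: {lower:.3f} to {upper:.3f}")
```

Because the inputs are rounded to three decimals, the limits differ slightly in the last digit from a full-precision calculation.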

Page 66: 10 Ch 10 Linear Regression and Correlation

Demonstrate how to compute confidence intervals using “PredInt.”

Page 67: 10 Ch 10 Linear Regression and Correlation

Confidence Interval Estimate

X Value                            5
Confidence Level                   95%
Sample Size                        6
Degrees of Freedom                 4
t Value                            2.776450856
Sample Mean                        3.5
Sum of Squared Difference          17.50
Standard Error of the Estimate     1.942016625
h Statistic                        0.295238095
Average Predicted Y (YHat)         9.657142857

For Average Predicted Y (YHat)
Interval Half Width                2.929740344
Confidence Interval Lower Limit    6.727402513
Confidence Interval Upper Limit    12.5868832

For Individual Response Y
Interval Half Width                6.136457614
Prediction Interval Lower Limit    3.520685243
Prediction Interval Upper Limit    15.79360047

Page 68: 10 Ch 10 Linear Regression and Correlation

Weekly Sales/Test Scores problem

Interpret the results of your interval estimate.

Page 69: 10 Ch 10 Linear Regression and Correlation

Weekly Sales/Test Scores problem: Interpretation of the Interval Estimate

We can say, with 95% confidence, that the mean weekly sales for a group of applicants who scored 5 on the aptitude test will be between $6,728 and $12,586.

Page 71: 10 Ch 10 Linear Regression and Correlation

Weekly Sales/Test Scores problem

Compute the prediction interval (for Jo Cruickshank who scored 5 on the test).
– Two tail test
– Alpha error = .05
– df = n-2
– \(\hat{Y}\) = 9.657
– \(S_{YX}\) = 1.942

Page 72: 10 Ch 10 Linear Regression and Correlation

Weekly Sales/Test Scores problem

\[\hat{Y} \pm t\,S_{YX}\sqrt{1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{\sum X^2 - \frac{(\sum X)^2}{n}}}\]

where t is from Appendix F with n - 2 degrees of freedom.

\[9.657 \pm (2.776)(1.942)\sqrt{1.29524}\]

or

\[9.657 \pm 6.135\]

or

Between $3,522 and $15,792
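The only change from the confidence interval for the mean is the extra 1 under the square root. A minimal sketch of that difference, again assuming the slide's rounded inputs; the helper name `prediction_half_width` is hypothetical.

```python
import math

def prediction_half_width(x0, x_values, s_yx, t_value):
    """Half width of the prediction interval for one individual response at x0."""
    n = len(x_values)
    x_bar = sum(x_values) / n
    sxx = sum((x - x_bar) ** 2 for x in x_values)
    h = 1 / n + (x0 - x_bar) ** 2 / sxx
    # The leading 1 accounts for the variation of a single observation
    # around the mean response, which is what widens the interval.
    return t_value * s_yx * math.sqrt(1 + h)

test_scores = [1, 2, 3, 4, 5, 6]
y_hat = 9.657   # predicted weekly sales ($000) at X = 5, from the slides

half_width = prediction_half_width(5, test_scores, s_yx=1.942, t_value=2.776)
lower, upper = y_hat - half_width, y_hat + half_width
print(f"half width={half_width:.3f}  interval: {lower:.3f} to {upper:.3f}")
```

The half width is roughly twice that of the interval for the mean, which matches the slide's point that predicting one applicant is much less precise than predicting the group average.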

Page 73: 10 Ch 10 Linear Regression and Correlation

Weekly Sales/Test Scores problem

Interpret the results of your prediction interval.

Page 74: 10 Ch 10 Linear Regression and Correlation

Weekly Sales/Test Scores problem: Interpretation of the Prediction Interval

We can say, with 95% confidence, that the weekly sales for applicant Jo Cruickshank, who scored 5 on the aptitude test, will be between approximately $3,520 and $15,790.

Page 75: 10 Ch 10 Linear Regression and Correlation

Common Errors When Using Regression And Correlation Analysis

Page 76: 10 Ch 10 Linear Regression and Correlation

Using Regression and Correlation Analyses: Limitations and Errors

• Extrapolation beyond the range of the observed data
• Cause and effect
• Using past trends to estimate future trends
• Misinterpreting the coefficients of correlation and determination
• Finding relationships when they do not exist
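The first item, extrapolation, can be guarded against mechanically. The sketch below is one possible approach, not something taken from the lecture: a hypothetical helper `predict_in_range` that refuses to predict for test scores outside the observed range of 1 to 6, using the slide's coefficients as defaults.

```python
def predict_in_range(x, x_observed, b0=0.8, b1=1.7714):
    """Predict weekly sales ($000), refusing to extrapolate beyond the data."""
    lo, hi = min(x_observed), max(x_observed)
    if not lo <= x <= hi:
        raise ValueError(f"x={x} is outside the observed range [{lo}, {hi}]")
    return b0 + b1 * x

test_scores = [1, 2, 3, 4, 5, 6]

print(round(predict_in_range(5, test_scores), 3))   # inside the observed range

try:
    predict_in_range(10, test_scores)               # outside the observed range
except ValueError as err:
    print("refused:", err)
```

Nothing in the six observations says whether the linear relationship continues to hold at a score of 10, which is why the helper raises an error rather than returning a number.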

Page 77: 10 Ch 10 Linear Regression and Correlation

Finding relationships when they do not exist

• Nearly all sick people have eaten carrots. Obviously, the effects are cumulative.
• An estimated 99.9% of all people who die from cancer and ruptured appendix have eaten carrots.
• Another 99.9% of people involved in auto accidents ate carrots within 60 days of the incident.
• Some 93.1% of gang members come from homes where carrots were frequently served.

Page 78: 10 Ch 10 Linear Regression and Correlation

MORE… Finding relationships when they do not exist

• Among the people born in 1839, who later dined on carrots, there has been a 100% mortality rate.
• Studies have shown, based on recent laboratory tests, that rats who were fed 500 lbs. of carrots per day died within 3 weeks.
• Many bunnies have been examined post-mortem and were found to have eaten carrots.

Page 79: 10 Ch 10 Linear Regression and Correlation

MORE… Finding relationships when they do not exist

• All surviving carrot eaters born between 1900 and 1910 have wrinkled skin, brittle bones, few if any of their own teeth and failing eyesight.
• Virtually all people who experience depression for at least 45 minutes a week are known to have eaten carrots sometime during their life.

Page 80: 10 Ch 10 Linear Regression and Correlation

Monday, April 21, 1997, The Arizona Republic

Tobacco executives insist smoking isn’t addictive

No worse than carrots, R.J. Reynolds chief says

Page 81: 10 Ch 10 Linear Regression and Correlation

Smoking in high school may indicate teenage suicide risk, study suggests

The Associated Press

LOS ANGELES - High school students who smoked were up to 18 times more likely than nonsmokers to say they had attempted suicide, a government study found.

Cause and effect: Smoking in high school...

The results do not imply that smoking causes suicide, stressed psychologist Kenneth Carter of the Centers for Disease Control and Prevention. Rather, he said, smoking may be an indicator of depression.

Cause and effect: Smoking in high school...

Some depressed youngsters may be using tobacco to gain some relief, or smoking may be common among teenagers who are hopeless and depressed, he said.

Whatever the reason for the link, the findings suggest that if a student who seems depressed also smokes a pack a day, “I’m going to be a lot more worried about that student,” Carter said.

Cause and effect: Smoking in high school...

The results came from 11,243 high school students who filled out questionnaires in a 1991 national survey.

(See following slides for rest of article.)

Cause and effect: Smoking in high school...

Even light smokers were five to six times as likely to say they had tried to kill themselves in the previous year, according to the study, presented by Carter on Monday at the annual meeting of the American Psychological Association.

Boys who were heavy smokers - six or more cigarettes a day for more than six days within the prior month - were 18 times as likely as nonsmoking boys to have attempted suicide in the prior year. They also were 10 times as likely to report making a plan for killing themselves in the prior year.

Cause and effect: Smoking in high school...

Girls who were heavy smokers were five times as likely as nonsmoking girls to have attempted suicide.

This is an old homework problem

You can use it for practice if you’d like.

Data

GMAT    GPI
688     3.72
647     3.44
652     3.21
608     3.29
680     3.91
617     3.28
557     3.02
599     3.13
616     3.45
594     3.33
567     3.07
542     2.86
551     2.91
573     2.79
536     3.00
639     3.55
619     3.47
694     3.60
718     3.88
759     3.76

Scatter Diagram and Regression Line to predict Grade Point Index from GMAT score

[Figure: scatter plot of Grade Point Index (vertical axis, 0 to 4.5) against GMAT Score (horizontal axis, 500 to 800) with the fitted regression line]

y = 0.0049x + 0.3003

R² = 0.7978

For each one-point increase in the GMAT score, the predicted GPI increases by 0.0049 points. Equivalently, for a 100-point increase in GMAT, the predicted GPI increases by 0.49, or about half a grade.

R²: About 80% of the variation in GPI is explained by the variation in the GMAT score.

r = 0.89

There is a very strong positive correlation between GPI and GMAT score (+1.0 is perfect positive correlation and zero is no correlation).
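The slope, intercept, R² and r quoted above can be checked directly from the 20 data pairs. The following is a minimal sketch in plain Python, assuming the data pairs read as listed on the Data slide (with 3 and 3.6 written as 3.00 and 3.60); the names `gmat`, `gpi` and `least_squares` are illustrative and not part of the original slides.

```python
import math

# GMAT score and Grade Point Index (GPI) pairs from the Data slide.
gmat = [688, 647, 652, 608, 680, 617, 557, 599, 616, 594,
        567, 542, 551, 573, 536, 639, 619, 694, 718, 759]
gpi = [3.72, 3.44, 3.21, 3.29, 3.91, 3.28, 3.02, 3.13, 3.45, 3.33,
       3.07, 2.86, 2.91, 2.79, 3.00, 3.55, 3.47, 3.60, 3.88, 3.76]


def least_squares(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means.
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / ssx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(ssx * sst)  # coefficient of correlation
    r_squared = r ** 2              # coefficient of determination
    return slope, intercept, r_squared, r


slope, intercept, r_squared, r = least_squares(gmat, gpi)
print(f"y = {slope:.4f}x + {intercept:.4f}")
print(f"R^2 = {r_squared:.4f}, r = {r:.2f}")
# Predicted change in GPI for a 100-point increase in GMAT.
print(f"100-point increase: {100 * slope:.2f}")
```

The printed figures should agree with the slide's equation and R² up to rounding. Computing r as the signed square root of R² works here only because the slope is positive; the sketch uses the cross-product formula so the sign comes out correctly in general.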

A further discussion of correlation and regression

There is a very high correlation between GPI and GMAT test scores. Does this mean that a higher GMAT score causes a higher GPI?

This does not mean that studying for the GMAT will raise your GPI in graduate school. An increase in GMAT does not cause your GPI to go up. If this were the case, why study finance in graduate school? Why not concentrate on raising your GMAT score?

Confidence Interval Estimate: Problem 13.77 on page 847

X Value                            600
Confidence Level                   95%
Sample Size                        20
Degrees of Freedom                 18
t Value                            2.100923666
Sample Mean                        622.8
Sum of Squared Difference          72757.20
Standard Error of the Estimate     0.155870258
h Statistic                        0.05714486
Average Predicted Y (YHat)         3.222458849

For Average Predicted Y (YHat)
Interval Half Width                0.078282036
Confidence Interval Lower Limit    3.144176813
Confidence Interval Upper Limit    3.300740886

For Individual Response Y
Interval Half Width                0.336698188
Prediction Interval Lower Limit    2.885760661
Prediction Interval Upper Limit    3.559157038

We are 95% confident that the mean GPI, for a group of students who scored 600 on the GMAT, will be between 3.14 and 3.30.

We are 95% confident that the GPI, for one student who scored 600 on the GMAT, will be between 2.89 and 3.56.

Margin of error: the interval half width.
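The two half widths in the output above follow from the standard simple-regression formulas: t × s × √h for the mean response and t × s × √(1 + h) for an individual response, where h = 1/n + (x − x̄)² / SSX. The sketch below recomputes them from the summary values on the slide. The t value is copied from the output rather than looked up, and the variable names are illustrative assumptions.

```python
import math

# Summary values copied from the Confidence Interval Estimate output.
n = 20                 # sample size
x_bar = 622.8          # sample mean of GMAT
ssx = 72757.20         # sum of squared differences of X about its mean
s_yx = 0.155870258     # standard error of the estimate
t_value = 2.100923666  # t for 95% confidence, 18 degrees of freedom
y_hat = 3.222458849    # predicted GPI at X = 600
x0 = 600               # GMAT score at which the intervals are built

# h statistic: grows as x0 moves away from the sample mean.
h = 1 / n + (x0 - x_bar) ** 2 / ssx

# Margin of error for the mean GPI of all students scoring x0.
ci_half = t_value * s_yx * math.sqrt(h)
# Margin of error for a single student's GPI (adds individual scatter).
pi_half = t_value * s_yx * math.sqrt(1 + h)

print(f"h = {h:.8f}")
print(f"Confidence interval: {y_hat - ci_half:.2f} to {y_hat + ci_half:.2f}")
print(f"Prediction interval: {y_hat - pi_half:.2f} to {y_hat + pi_half:.2f}")
```

The extra 1 under the square root is why the prediction interval for one student is much wider than the confidence interval for the group mean.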

[Figure: diagram of the deviations between the observed Y and the predicted Ŷ and their squares, such as (Y − Ŷ)²]