10 ch 10 linear regression and correlation

Post on 08-Nov-2014

DESCRIPTION

from my lecture notes(MA IN PUBLIC PROCUREMENT AND ASSET MANAGEMENT)

TRANSCRIPT

04/08/23 1Huangpu River

Chapter 12

Linear Regression and Correlation

[Figure: Weekly Sales plotted against Aptitude Test Score]

Learning Objectives

To discuss scatter diagrams.
To discuss the coefficient of correlation.
To discuss the coefficient of determination.
To use the least squares method to determine a linear regression equation.
To interpret the linear regression equation.

Learning Objectives (continued)

To compute the standard error of estimate and explain its use.
To construct a confidence interval and a prediction interval for the estimates of the dependent variable.
To understand the limitations, errors, and caveats of using regression and correlation, and to evaluate assumptions using residual analysis.

Descriptive Statistics

[Figure: bar chart "GBS221 Grade Distribution"; x-axis Class (< 59, 60 - 69, 70 - 79, 80 - 89, 90 - 100), y-axis # of Students (0 to 70)]

Statistical Inference

Sample statistics are used as estimates of population parameters:

Population       Sample
\(\mu\)          \(\bar{X}\)
\(\sigma\)       \(S\)
\(\sigma^2\)     \(S^2\)
\(P\)            \(p\)


Example 1: Plot the relationship between Test Scores and Weekly Sales:

Sales Person   Test Score (X)   Weekly Sales (Y, $000)
Mike           1                2
Melissa        2                4
Jalene         3                8
Jeff           4                6
Brian          5                12
Nicole         6                10

Correlation and Regression: Weekly Sales vs. Test Scores

[Figure: blank grid for plotting the six data points, axes 0 to 6 and 0 to 12]

[Figure: scatter diagram of Weekly Sales ($000) against Aptitude Test Score]

[Figure: the same scatter diagram with the fitted line y = 1.7714x + 0.8, R² = 0.7845]

Demonstrate how to create a scatter diagram and compute the regression equation using Excel

Use directions on pages 70-74
– Use Insert|Chart for Excel
Also see pages 493-495
(For this demonstration, use X=5)

Example 1 continued...

\(\bar{X} = 3.5\)   Sample mean of “X” values.
\(\bar{Y} = 7\)   Sample mean of “Y” values.
\(b_0 = 0.8\)   Y-intercept
\(b_1 = 1.7714\)   Slope of the regression line.
\(r = 0.8857\)   Coefficient of correlation
\(r^2 = (0.8857)^2 = 0.7845\)   Coefficient of determination
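The six values above can be reproduced from the Example 1 data with the usual least squares formulas. The following is a minimal sketch in plain Python, not the Excel procedure the slides refer to; the function name `least_squares` and its variable names are illustrative assumptions.

```python
import math

# Test scores (X) and weekly sales in $000 (Y) from Example 1.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 8, 6, 12, 10]


def least_squares(x, y):
    """Return (x_bar, y_bar, b0, b1, r) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squared deviations around the means.
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = ss_xy / ss_xx                     # slope
    b0 = y_bar - b1 * x_bar                # Y-intercept
    r = ss_xy / math.sqrt(ss_xx * ss_yy)   # coefficient of correlation
    return x_bar, y_bar, b0, b1, r


x_bar, y_bar, b0, b1, r = least_squares(x, y)
print(f"x_bar={x_bar}, y_bar={y_bar}, b0={b0:.4f}, b1={b1:.4f}, r={r:.4f}, r2={r * r:.4f}")
```

Squaring r gives the coefficient of determination, so only the five quantities are returned.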

Slope-Intercept form of a straight line

Y = mX + b
Y is the dependent variable
X is the independent variable
m is the slope of the line
b is the Y-intercept

But statisticians are peculiar. You might say they have a deviation!!!

Simple Linear Regression Model

\[ \hat{Y}_i = b_0 + b_1 X_i \]

\(\hat{Y}_i\) = Predicted value of Y for observation i
\(X_i\) = Value of X for observation i
\(b_0\) = Sample Y-intercept, used as an estimate of the population \(\beta_0\)
\(b_1\) = Sample slope, used as an estimate of the population \(\beta_1\)

Interpreting the Results

\[ \hat{Y}_i = 0.8 + 1.7714 X_i \]

The slope of 1.7714 means that for each increase of one unit in X, Y is estimated to increase by 1.7714 units.

For each increase of 1 unit in the test score, the model predicts that expected weekly sales increase by $1.7714 thousand (about $1,771).

PERFECT NEGATIVE CORRELATION
[Figure: Y against X, r = -1]

PERFECT POSITIVE CORRELATION
[Figure: Y against X, r = +1]

ZERO CORRELATION
[Figure: Y against X, r = 0]

STRONG POSITIVE CORRELATION
[Figure: Y against X]

Use the following definitions to interpret the results of this example

• r is the coefficient of correlation. This indicates the strength of the relationship between X and Y and whether the relationship is + or -.

Interpretation of the coefficient of correlation

r = 0.8857
There is strong positive correlation between a salesperson’s weekly sales and his/her score on the aptitude test.

[Figure: number line from -1 through 0 to +1, with +0.8857 marked]

Coefficient of determination

• \(r^2\) is the coefficient of determination. This indicates the proportion of the variation in Y that is explained by X.

\(r^2 = 0.7845\)
About 78% of the variation in weekly sales is explained by the variation in test scores.
or
The variation in test scores explains 78% of the variation in weekly sales.

Purpose of Regression and Correlation Analysis

• Regression Analysis is Used Primarily for Prediction

A statistical model used to predict the values of a dependent or response variable based on values of at least one independent or explanatory variable

• Correlation Analysis is Used to Measure Strength of the Association Between Numerical Variables

For the Test Score/Weekly Sales Problem

Plot the regression line on your graph using the given values of \(b_0 = 0.8\) and \(b_1 = 1.7714\).

– Hint: The regression line will always go through the Y-intercept \((0, b_0)\) and the point of means \((\bar{X}, \bar{Y})\).

Correlation and Regression: Weekly Sales vs. Test Scores

[Figure: Weekly Sales ($000) against Aptitude Test Score, with the regression line \(\hat{Y} = 0.8 + 1.77X\) drawn through the points (0, 0.8) and (3.5, 7)]

[Figure: scatter diagram with fitted line y = 1.7714x + 0.8, R² = 0.7845]

For the Test Score/Weekly Sales Problem

Compute the predicted value of weekly sales (\(\hat{Y}\)) for each of the following test scores (X).

X    \(\hat{Y}\)
1    2.571
2    4.343
3    6.114
4    7.886
5    9.657
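The predicted values in this table follow from plugging each test score into the fitted equation. A small sketch, assuming the rounded coefficients quoted on the slides (0.8 and 1.7714); the function name `predict_weekly_sales` is a hypothetical helper, not something from the course materials.

```python
B0 = 0.8     # Y-intercept quoted on the slides
B1 = 1.7714  # slope quoted on the slides


def predict_weekly_sales(test_score):
    """Predicted weekly sales in $000 for a given aptitude test score."""
    return B0 + B1 * test_score


# Print the prediction for each test score in the table, rounded to 3 decimals.
for score in range(1, 6):
    print(score, round(predict_weekly_sales(score), 3))
```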

Confidence Intervals

OH NO!!!!!!!!

For the Test Score/Weekly Sales Problem

Assume that an applicant scored 5 on the aptitude test:
– What do you predict her weekly sales will be?
– Interpret your answer in light of what you know about “point” and “interval” estimates.

We have to find the standard error of the estimate.

Predicted mean weekly sales for applicants who scored 5 on the aptitude test

The predicted weekly sales for applicants who scored 5 on the test is $9,657.
Since the regression line is an “average” line drawn through the data, this is a point estimate of average weekly sales.
A confidence interval can be computed, i.e., “We can be 95% confident that mean weekly sales will be between _ and _.”

[Figure: Weekly Sales ($000) against Test Score, with fitted line y = 1.7714x + 0.8, R² = 0.7845]

In a REAL PROBLEM there may be many observed values of Y for each value of X.

Trade Executions vs. Incoming Phone Calls

[Figure: scatter diagram of # of Trade Executions (200 to 500) against # of Incoming Calls (1800 to 2700), with fitted line y = 0.1415x + 39.351, R² = 0.3533]

23 1 SYX

Standard Error of the Estimate

In chapter 3 we measured the dispersion about an “average” called the Mean.
In chapter 6 we measured the dispersion about an “average” called the Mean of the Means.
Now we want to measure the dispersion about an “average line” called the Regression Line.

Measures of Dispersion

\(\sigma\) or \(S\) = Population or Sample Standard Deviation
\(\sigma_{\bar{X}}\) or \(S_{\bar{X}}\) = Standard Error of the Mean
\(S_{YX}\) = Standard Error of the Estimate

Section 12.3, Measures of Variation

See pages 421 through 427 in the text.
We will discuss the following topics.
– Obtaining the Sum of Squares
– The Coefficient of Determination
– The Standard Error of the Estimate

Correlation and Regression: Weekly Sales vs. Test Scores

[Figure, built up over three slides: Weekly Sales ($000) against Aptitude Test Score with the regression line \(\hat{Y} = 0.8 + 1.77X\) starting at (0, 0.8). For one observation Y, the total error (SST) measured from \(\bar{Y}\) is split into the error explained by the regression line (SSR) and the unexplained error (SSE).]

What proportion of the variation in Y is explained by the variation in X?

The Coefficient of Determination

\[ r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}} \]

Measures the proportion of variation that is explained by the independent variable X in the regression model

Total error: \(Y - \bar{Y}\)
Total sum of squares: \(SST = \sum (Y - \bar{Y})^2\)

Error explained by regression line: \(\hat{Y} - \bar{Y}\)
Regression sum of squares: \(SSR = \sum (\hat{Y} - \bar{Y})^2\)

Unexplained error: \(Y - \hat{Y}\)
Error sum of squares: \(SSE = \sum (Y - \hat{Y})^2\)

SSR + SSE = SST

Computations for SSR, SSE and SST

Test Score (X)  Weekly Sales (Y)  \(\hat{Y}\)  \(\hat{Y}-\bar{Y}\)  \((\hat{Y}-\bar{Y})^2\)  \(Y-\hat{Y}\)  \((Y-\hat{Y})^2\)  \(Y-\bar{Y}\)  \((Y-\bar{Y})^2\)
1               2                 2.571        -4.429               19.6125                  -0.5714        0.3265             -5             25
2               4                 4.343        -2.657               7.0607                   -0.3428        0.1175             -3             9
3               8                 6.114        -0.886               0.7846                   1.8858         3.5562             1              1
4               6                 7.886        0.886                0.7843                   -1.8856        3.5555             -1             1
5               12                9.657        2.657                7.0596                   2.3430         5.4896             5              25
6               10                11.428       4.428                19.6107                  -1.4284        2.0403             3              9

\(\bar{Y} = 7\)   SSR = 54.91251   SSE = 15.0857   SST = 70
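The three sums of squares can be checked directly from their definitions. This is a minimal sketch in plain Python with illustrative variable names; it uses unrounded fitted values, so SSR comes out near 54.914 rather than the table's figure, which is built from rounded entries.

```python
# Test scores (X) and weekly sales in $000 (Y) from Example 1.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 8, 6, 12, 10]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]  # fitted values

sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained by the line
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained

print(f"SSR={ssr:.4f}  SSE={sse:.4f}  SST={sst:.4f}  r2={ssr / sst:.4f}")
```

The identity SSR + SSE = SST holds up to floating-point rounding, and SSR / SST reproduces the coefficient of determination.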


Measures of Variation: The Sum of Squares

SST = Total Sum of Squares
– Measures the variation of the Y values around their mean \(\bar{Y}\)
SSR = Regression Sum of Squares
– explained variation attributable to the relationship between X and Y.
SSE = Error Sum of Squares
– variation attributable to factors other than the relationship between X and Y

Standard Error of the Estimate

\[ S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n-2}} \]

Standard error of estimate: measures the scatter, or dispersion, of the observed values around the line of regression.

Sample standard deviation:

\[ S = \sqrt{\frac{\sum (X - \bar{X})^2}{n-1}} \]
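As a quick check of the standard error of the estimate for the test score example, the sketch below plugs in SSE = 15.0857 and n = 6 from the sum-of-squares computations. The variable names are illustrative.

```python
import math

sse = 15.0857  # error sum of squares from the Example 1 computations
n = 6          # number of salespeople in the sample

# Standard error of the estimate: divide by n - 2 because two
# coefficients (b0 and b1) were estimated from the data.
s_yx = math.sqrt(sse / (n - 2))
print(round(s_yx, 3))
```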

Predicted weekly sales for an applicant who scored 5 on the aptitude test

The predicted mean weekly sales for applicants who scored 5 on the test is $9,657.
Since the regression line is an “average” line drawn through the data, this is a point estimate of average weekly sales.
A confidence interval can be computed, i.e., “We can be 95% confident that mean weekly sales will be between _ and _.”

[Figure: Trade Executions vs. Incoming Phone Calls, with fitted line y = 0.1415x + 39.351, R² = 0.3533]

23 1 SYX

Confidence Interval - Large Sample

\[ \hat{Y} \pm Z\,(S_{YX}) \]

Dream on. It can’t be this easy!!!

Confidence Interval - Small Sample

\[ \hat{Y} \pm t\,(S_{YX}) \]

Dream on. It can’t be this easy!!!

Estimation of Predicted Values

Confidence Interval Estimate for \(\mu_{Y|X}\), the Mean of Y given a particular \(X_i\)

\[ \hat{Y}_i \pm t_{n-2}\, S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} \]

\(t_{n-2}\): t value from table with df = n-2
\(S_{YX}\): standard error of the estimate

The size of the interval varies according to the distance of \(X_i\) from the mean, \(\bar{X}\).

Use this for the mean weekly sales for a group of applicants who got 5 on the test.

Estimation of Predicted Values

Prediction Interval Estimate for Individual Response \(Y_i\) at a Particular \(X_i\)

\[ \hat{Y}_i \pm t_{n-2}\, S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} \]

The addition of this 1 increases the width of the interval compared with that for the mean of Y.

Use this when you want the estimated weekly sales of one particular applicant (e.g., Jo Cruickshank) who scored 5 on the test.

Weekly Sales/Test Scores problem

Compute the interval estimate (for a group of applicants who scored 5 on the test).
– Two tail test
– Alpha error = .05
– df = n-2
– \(\hat{Y}\) = 9.657
– \(S_{YX}\) = 1.942

Weekly Sales/Test Scores problem

\[ \hat{Y} \pm t\, S_{YX} \sqrt{\frac{1}{n} + \frac{(X - \bar{X})^2}{\sum X^2 - \frac{(\sum X)^2}{n}}} \]

where t is from Appendix F with n - 2 degrees of freedom

\[ 9.657 \pm (2.776)(1.942) \sqrt{\frac{1}{6} + \frac{(5 - 3.5)^2}{91 - \frac{441}{6}}} \]

\[ 9.657 \pm (2.776)(1.942) \sqrt{0.29524} \]

\[ 9.657 \pm 2.929 \]

Between $6,728 and $12,586
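The arithmetic for the confidence interval can be retraced in a few lines. This is a sketch only: the t value 2.776 is taken from the slides rather than computed, since the Python standard library has no t-distribution quantile function, and the name `mean_response_interval` is a hypothetical helper.

```python
import math


def mean_response_interval(x0, y_hat, s_yx, t_value, n, x_bar, ss_xx):
    """Confidence interval for the mean of Y at X = x0.

    ss_xx is the sum of squared deviations of X about its mean.
    """
    h = 1 / n + (x0 - x_bar) ** 2 / ss_xx      # the slides call this the h statistic
    half_width = t_value * s_yx * math.sqrt(h)
    return y_hat - half_width, y_hat + half_width


# Values from the slides: X = 5, Y-hat = 9.657, S_YX = 1.942,
# t = 2.776 (df = 4), n = 6, X-bar = 3.5, sum of squared differences = 17.5.
lower, upper = mean_response_interval(5, 9.657, 1.942, 2.776, 6, 3.5, 17.5)
print(f"{lower:.3f} to {upper:.3f} (in $000)")
```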

Demonstrate how to compute confidence intervals using “PredInt.”

Confidence Interval Estimate

X Value                          5
Confidence Level                 95%
Sample Size                      6
Degrees of Freedom               4
t Value                          2.776450856
Sample Mean                      3.5
Sum of Squared Difference        17.50
Standard Error of the Estimate   1.942016625
h Statistic                      0.295238095
Average Predicted Y (YHat)       9.657142857

For Average Predicted Y (YHat)
Interval Half Width              2.929740344
Confidence Interval Lower Limit  6.727402513
Confidence Interval Upper Limit  12.5868832

For Individual Response Y
Interval Half Width              6.136457614
Prediction Interval Lower Limit  3.520685243
Prediction Interval Upper Limit  15.79360047

Weekly Sales/Test Scores problem

Interpret the results of your interval estimate.

Interpretation of the Interval Estimate

We can say, with 95% confidence, that the mean weekly sales for a group of applicants who scored 5 on the aptitude test will be between $6,728 and $12,586.


Weekly Sales/Test Scores problem

Compute the prediction interval (for Jo Cruickshank who scored 5 on the test).
– Two tail test
– Alpha error = .05
– df = n-2
– \(\hat{Y}\) = 9.657
– \(S_{YX}\) = 1.942

Weekly Sales/Test Scores problem

\[ \hat{Y} \pm t\, S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{\sum X^2 - \frac{(\sum X)^2}{n}}} \]

where t is from Appendix F with n - 2 degrees of freedom

\[ 9.657 \pm (2.776)(1.942) \sqrt{1.29524} \]

\[ 9.657 \pm 6.135 \]

Between $3,522 and $15,792
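The prediction interval differs from the confidence interval for the mean only by the extra 1 under the square root. The sketch below makes that explicit; as before, the t value is copied from the slides and the name `individual_response_interval` is an illustrative assumption.

```python
import math


def individual_response_interval(x0, y_hat, s_yx, t_value, n, x_bar, ss_xx):
    """Prediction interval for one individual Y at X = x0."""
    h = 1 / n + (x0 - x_bar) ** 2 / ss_xx
    # The leading 1 accounts for the scatter of a single observation
    # around the regression line, on top of uncertainty in the line itself.
    half_width = t_value * s_yx * math.sqrt(1 + h)
    return y_hat - half_width, y_hat + half_width


# Values from the slides for an applicant who scored 5 on the test.
lower, upper = individual_response_interval(5, 9.657, 1.942, 2.776, 6, 3.5, 17.5)
print(f"{lower:.3f} to {upper:.3f} (in $000)")
```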

Weekly Sales/Test Scores problem

Interpret the results of your prediction interval.

Interpretation of the Prediction Interval

We can say, with 95% confidence, that the weekly sales for applicant Jo Cruickshank, who scored 5 on the aptitude test, will be between $3,520 and $15,790.

Common Errors When Using Regression And Correlation Analysis

Using Regression and Correlation Analyses: Limitations and Errors

Extrapolation beyond the range of the observed data
Cause and effect
Using past trends to estimate future trends
Misinterpreting the coefficients of correlation and determination
Finding relationships when they do not exist

Finding relationships when they do not exist

Nearly all sick people have eaten carrots. Obviously, the effects are cumulative.
An estimated 99.9% of all people who die from cancer and ruptured appendix have eaten carrots.
Another 99.9% of people involved in auto accidents ate carrots within 60 days of the incident.
Some 93.1% of gang members come from homes where carrots were frequently served.

MORE… Finding relationships when they do not exist

Among the people born in 1839, who later dined on carrots, there has been a 100% mortality rate.
Studies have shown, based on recent laboratory tests, that rats who were fed 500 lbs. of carrots per day died within 3 weeks.
Many bunnies have been examined post-mortem and were found to have eaten carrots.

MORE… Finding relationships when they do not exist

All surviving carrot eaters born between 1900 and 1910 have wrinkled skin, brittle bones, few if any of their own teeth and failing eyesight.
Virtually all people who experience depression for at least 45 minutes a week are known to have eaten carrots sometime during their life.

Monday, April 21, 1997, The Arizona Republic

Tobacco executives insist smoking isn’t addictive
No worse than carrots, R.J. Reynolds chief says

Cause and effect

Smoking in high school may indicate teenage suicide risk, study suggests

The Associated Press

LOS ANGELES - High school students who smoked were up to 18 times as likely as nonsmokers to say they had attempted suicide, a government study found.

The results do not imply that smoking causes suicide, stressed psychologist Kenneth Carter of the Centers for Disease Control and Prevention. Rather, he said, smoking may be an indicator of depression.

Some depressed youngsters may be using tobacco to gain some relief, or smoking may be common among teenagers who are hopeless and depressed, he said.

Whatever the reason for the link, the findings suggest that if a student who seems depressed also smokes a pack a day, “I’m going to be a lot more worried about that student,” Carter said.

The results came from 11,243 high school students who filled out questionnaires in a 1991 national survey.

Even light smokers were five to six times as likely to say they had tried to kill themselves in the previous year, according to the study, presented by Carter on Monday at the annual meeting of the American Psychological Association.

Boys who were heavy smokers - six or more cigarettes a day for more than six days within the prior month - were 18 times as likely as nonsmoking boys to have attempted suicide in the prior year. They also were 10 times as likely to report making a plan for killing themselves in the prior year.

Girls who were heavy smokers were five times as likely as nonsmoking girls to have attempted suicide.

This is an old homework problem

You can use it for practice if you’d like.

Data

GMAT   GPI
688    3.72
647    3.44
652    3.21
608    3.29
680    3.91
617    3.28
557    3.02
599    3.13
616    3.45
594    3.33
567    3.07
542    2.86
551    2.91
573    2.79
536    3
639    3.55
619    3.47
694    3.6
718    3.88
759    3.76

Scatter Diagram and Regression Line to predict Grade Point Index from GMAT score

[Figure: scatter diagram of Grade Point Index (0 to 4.5) against GMAT Score (500 to 800), with fitted line y = 0.0049x + 0.3003, R² = 0.7978]

For each one-unit increase in the GMAT score, GPI is estimated to increase by .0049 points. Or, for a 100 point increase in GMAT, GPI is estimated to increase by .49, or half a grade.

R²: About 80% of the variation in GPI is explained by the variation in the GMAT score.

R = .89

There is very strong positive correlation between the GPI and GMAT score. (+1.0 is perfect positive correlation and zero is no correlation.)
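The fitted line and R² for the practice problem can be reproduced from the 20 GMAT/GPI pairs in the data slide. This is a minimal sketch in plain Python rather than the Excel chart trendline that presumably produced the figure; the variable names are illustrative.

```python
# GMAT scores and Grade Point Index values from the practice data.
gmat = [688, 647, 652, 608, 680, 617, 557, 599, 616, 594,
        567, 542, 551, 573, 536, 639, 619, 694, 718, 759]
gpi = [3.72, 3.44, 3.21, 3.29, 3.91, 3.28, 3.02, 3.13, 3.45, 3.33,
       3.07, 2.86, 2.91, 2.79, 3.00, 3.55, 3.47, 3.60, 3.88, 3.76]

n = len(gmat)
x_bar = sum(gmat) / n
y_bar = sum(gpi) / n

ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(gmat, gpi))
ss_xx = sum((x - x_bar) ** 2 for x in gmat)
ss_yy = sum((y - y_bar) ** 2 for y in gpi)

b1 = ss_xy / ss_xx               # slope: change in GPI per GMAT point
b0 = y_bar - b1 * x_bar          # Y-intercept
r2 = ss_xy ** 2 / (ss_xx * ss_yy)  # coefficient of determination

print(f"GPI-hat = {b0:.4f} + {b1:.4f} * GMAT, R2 = {r2:.4f}")
print(f"Predicted GPI at GMAT 600: {b0 + b1 * 600:.3f}")
```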

A further discussion of correlation and regression

There is very high correlation between GPI and GMAT test scores. Does this mean that GMAT test scores cause GPI?

No. It does not mean that studying for the GMAT will raise your GPI in graduate school. An increase in GMAT does not cause your GPI to go up. If it did, why study finance in graduate school? Why not concentrate on raising your GMAT score instead?

Confidence Interval Estimate, Problem 13.77 on page 847

X Value                          600
Confidence Level                 95%
Sample Size                      20
Degrees of Freedom               18
t Value                          2.100923666
Sample Mean                      622.8
Sum of Squared Difference        72757.20
Standard Error of the Estimate   0.155870258
h Statistic                      0.05714486
Average Predicted Y (YHat)       3.222458849

For Average Predicted Y (YHat)
Interval Half Width              0.078282036   (margin of error)
Confidence Interval Lower Limit  3.144176813
Confidence Interval Upper Limit  3.300740886

For Individual Response Y
Interval Half Width              0.336698188   (margin of error)
Prediction Interval Lower Limit  2.885760661
Prediction Interval Upper Limit  3.559157038

We are 95% confident that the mean GPI, for a group of students who scored 600 on the GMAT, will be between 3.14 and 3.30.

We are 95% confident that the GPI, for one student who scored 600 on the GMAT, will be between 2.89 and 3.56.
