
Chapter 14: Correlation and Regression

PowerPoint Lecture Slides

Essentials of Statistics for the Behavioral Sciences Eighth Edition

by Frederick J. Gravetter and Larry B. Wallnau

Chapter 14 Learning Outcomes

• Understand Pearson r as a measure of the relationship between two variables

• Compute Pearson r using the definitional or computational formula

• Use and interpret Pearson r; understand its assumptions and limitations

• Test hypotheses about a population correlation (ρ) with a sample r

• Understand the concept of a partial correlation

Chapter 14 Learning Outcomes (continued)

• Explain/compute the Spearman correlation coefficient (ranks)

• Explain/compute the point-biserial correlation coefficient (one dichotomous variable)

• Explain/compute the phi-coefficient for two dichotomous variables

• Explain/compute the linear regression equation to predict Y values

• Evaluate the significance of a regression equation

Tools You Will Need

• Sum of squares (SS) (Chapter 4)

– Computational formula

– Definitional formula

• z-Scores (Chapter 5)

• Hypothesis testing (Chapter 8)

• Analysis of Variance (Chapter 12)

– MS values and F-ratios

14.1 Introduction to Correlation

• Measures and describes the relationship between two variables

• Characteristics of relationships

– Direction (negative or positive; indicated by the sign, + or – of the correlation coefficient)

– Form (linear is most common)

– Strength or consistency (varies from 0 to 1)

• These three characteristics are independent of one another

Figure 14.1 Scatterplot for Correlational Data

Figure 14.2 Positive and Negative Relationships

Figure 14.3 Different Linear Relationship Values

14.2 The Pearson Correlation

• Measures the degree and the direction of the linear relationship between two variables

• Perfect linear relationship

– Every change in X has a corresponding change in Y

– Correlation will be –1.00 or +1.00

• Conceptually, the Pearson correlation is a ratio:

r = (covariability of X and Y) / (variability of X and Y separately)

Sum of Products (SP)

• Similar to SS (sum of squared deviations)

• Measures the amount of covariability between two variables

• SP definitional formula:

SP = Σ(X − M_X)(Y − M_Y)

SP – Computational formula

• Definitional formula emphasizes SP as the sum of two difference scores

• Computational formula results in easier calculations

• SP computational formula:

SP = ΣXY − (ΣX)(ΣY)/n
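As a quick check, the definitional and computational SP formulas can be computed side by side. This is a minimal sketch using a small invented dataset, not one of the textbook examples:

```python
# Hypothetical scores (not from the textbook examples)
X = [1, 2, 4, 5]
Y = [3, 6, 4, 7]
n = len(X)
MX = sum(X) / n   # mean of X
MY = sum(Y) / n   # mean of Y

# Definitional formula: SP = sum of (X - MX)(Y - MY)
sp_def = sum((x - MX) * (y - MY) for x, y in zip(X, Y))

# Computational formula: SP = sum(XY) - (sum X)(sum Y)/n
sp_comp = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n

print(sp_def, sp_comp)  # both formulas give the same value
```

The computational form avoids computing a deviation for every score, which is why hand calculation is easier with it.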

Pearson Correlation Calculation

• Ratio comparing the covariability of X and Y (numerator) with the variability of X and Y separately (denominator)

r = SP / √(SS_X · SS_Y)
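The ratio above can be sketched in a few lines; the dataset here is invented for illustration:

```python
import math

# Hypothetical scores (not from the textbook examples)
X = [1, 2, 4, 5]
Y = [3, 6, 4, 7]
n = len(X)
MX, MY = sum(X) / n, sum(Y) / n

SP  = sum((x - MX) * (y - MY) for x, y in zip(X, Y))  # covariability of X and Y
SSX = sum((x - MX) ** 2 for x in X)                   # variability of X
SSY = sum((y - MY) ** 2 for y in Y)                   # variability of Y

r = SP / math.sqrt(SSX * SSY)
print(r)
```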

Figure 14.4 Example 14.3 Scatterplot

Pearson Correlation and z-Scores

• Pearson correlation formula can be expressed as a relationship of z-scores.

Population: r = Σ(z_X z_Y) / N

Sample: r = Σ(z_X z_Y) / (n − 1)
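The z-score form can be verified numerically against the SP/SS form. This sketch uses the sample version (dividing by n − 1, with sample standard deviations) on invented data:

```python
import math

# Hypothetical scores (not from the textbook examples)
X = [1, 2, 4, 5]
Y = [3, 6, 4, 7]
n = len(X)
MX, MY = sum(X) / n, sum(Y) / n

# Sample standard deviations (SS divided by n - 1)
sx = math.sqrt(sum((x - MX) ** 2 for x in X) / (n - 1))
sy = math.sqrt(sum((y - MY) ** 2 for y in Y) / (n - 1))

zX = [(x - MX) / sx for x in X]
zY = [(y - MY) / sy for y in Y]

r = sum(a * b for a, b in zip(zX, zY)) / (n - 1)   # sample form of the z-score formula
print(r)
```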

Learning Check

• A scatterplot shows a set of data points that fit very loosely around a line that slopes down to the right. Which of the following values would be closest to the correlation for these data?

A. 0.75

B. 0.35

C. –0.75

D. –0.35

Learning Check - Answer

• A scatterplot shows a set of data points that fit very loosely around a line that slopes down to the right. Which of the following values would be closest to the correlation for these data?

A. 0.75

B. 0.35

C. –0.75

D. –0.35 [correct: the downward slope makes the correlation negative, and the loose fit makes it weak]

Learning Check

• Decide if each of the following statements is True or False

• A set of n = 10 pairs of X and Y scores has ΣX = ΣY = ΣXY = 20. For this set of scores, SP = –20

T/F

• If the Y variable decreases when the X variable decreases, their correlation is negative

T/F

Learning Check - Answers

• SP = ΣXY − (ΣX)(ΣY)/n = 20 − (20)(20)/10 = 20 − 40 = −20 True

• If Y decreases when X decreases, the two variables change in the same direction, so the correlation is positive, not negative False

14.3 Using and Interpreting the Pearson Correlation

• Correlations used for:

– Prediction

– Validity

– Reliability

– Theory verification

Interpreting Correlations

• Correlation describes a relationship but does not demonstrate causation

• Establishing causation requires an experiment in which one variable is manipulated and others carefully controlled

• Example 14.4 (and Figure 14.5) demonstrates the fallacy of attributing causation after observing a correlation

Figure 14.5 Correlation: Churches and Serious Crimes

Correlations and Restricted Range of Scores

• Correlation coefficient value (size) will be affected by the range of scores in the data

• Severely restricted range may provide a very different correlation than would a broader range of scores

• To be safe, never generalize a correlation beyond the sample range of data

Figure 14.6 Restricted Score Range Influences Correlation

Correlations and Outliers

• An outlier is an extremely deviant individual in the sample

• Characterized by a much larger (or smaller) score than all the others in the sample

• In a scatter plot, the point is clearly different from all the other points

• Outliers produce a disproportionately large impact on the correlation coefficient

Figure 14.7 Outlier Influences Size of Correlation

Correlations and the Strength of the Relationship

• A correlation coefficient measures the degree of relationship on a scale from 0 to 1.00

• It is easy to mistakenly interpret this decimal number as a percent or proportion

• Correlation is not a proportion

• Squared correlation may be interpreted as the proportion of shared variability

• Squared correlation is called the coefficient of determination

Coefficient of Determination

• Coefficient of determination measures the proportion of variability in one variable that can be determined from the relationship with the other variable (shared variability)

Coefficient of Determination = r²
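A one-line illustration (my own numbers, not a textbook example) of why the correlation itself is not a proportion:

```python
r = 0.6
r2 = r ** 2   # coefficient of determination
print(r2)     # about 0.36: only ~36% of the variability is shared, not 60%
```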

Figure 14.8 Three Amounts of Linear Relationship Example

14.4 Hypothesis Tests with the Pearson Correlation

• Pearson correlation is usually computed for sample data, but used to test hypotheses about the relationship in the population

• Population correlation shown by Greek letter rho (ρ)

• Non-directional: H₀: ρ = 0 and H₁: ρ ≠ 0

• Directional: H₀: ρ ≤ 0 and H₁: ρ > 0, or H₀: ρ ≥ 0 and H₁: ρ < 0

Figure 14.9 Correlation in Sample vs. Population

Correlation Hypothesis Test

• Sample correlation r used to test population ρ

• Degrees of freedom (df) = n – 2

• Hypothesis test can be computed using either t or F; only t shown in this chapter

• Use the t table to find the critical value with df = n − 2

t = r / √[(1 − r²) / (n − 2)]
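Plugging in the values from the in-text report below (r = −0.76, n = 48) gives a sense of the arithmetic; the resulting t value is my computation, not a figure from the text:

```python
import math

r, n = -0.76, 48   # values from the in-text reporting example
df = n - 2

# t statistic for testing H0: rho = 0
t = r / math.sqrt((1 - r ** 2) / df)
print(round(t, 2))
```

A t this far into the tail is consistent with the reported p < .01.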

In the Literature

• Report

– Whether it is statistically significant

• Concise test results

– Value of correlation

– Sample size

– p-value or level

– Type of test (one- or two-tailed)

• E.g., r = -0.76, n = 48, p < .01, two tails

Partial Correlation

• A partial correlation measures the relationship between two variables while mathematically controlling the influence of a third variable by holding it constant

r_xy·z = (r_xy − r_xz r_yz) / √[(1 − r_xz²)(1 − r_yz²)]
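A sketch of the formula with hypothetical pairwise correlations (the three r values are invented):

```python
import math

# Hypothetical pairwise correlations among X, Y, and the controlled variable Z
r_xy, r_xz, r_yz = 0.80, 0.60, 0.50

# Partial correlation of X and Y holding Z constant
r_xy_z = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
print(round(r_xy_z, 3))
```

Note that removing Z's influence here shrinks the correlation from .80 to about .72.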

Figure 14.10 Controlling the Impact of a Third Variable

14.5 Alternatives to the Pearson Correlation

• Pearson correlation has been developed

– For data having linear relationships

– With data from interval or ratio measurement scales

• Other correlations have been developed

– For data having non-linear relationships

– With data from nominal or ordinal measurement scales

Spearman Correlation

• Spearman (rs) correlation formula is used with data from an ordinal scale (ranks)

– Used when both variables are measured on an ordinal scale

– May also be used when the measurement scale is interval or ratio and the relationship is consistently directional but not necessarily linear

Figure 14.11 Consistent Nonlinear Positive Relationship

Figure 14.12 Scatterplot Showing Scores and Ranks

Ranking Tied Scores

• Tied scores need ranks for the Spearman correlation

• Method for assigning rank

– List scores in order from smallest to largest

– Assign a rank to each position in the list

– When two (or more) scores are tied, compute the mean of their ranked position, and assign this mean value as the final rank for each score.

Special Formula for the Spearman Correlation

• The ranks for the scores are simply integers

• Calculations can be simplified

– Use D as the difference between the X rank and the Y rank for each individual to compute the r_s statistic

r_s = 1 − 6ΣD² / [n(n² − 1)]
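The ranking procedure and the special formula can be sketched together. The tie-handling helper follows the mean-of-positions rule described above; the special formula is exact only when there are no ties. All data are invented:

```python
def ranks(scores):
    """Rank scores 1..n; tied scores share the mean of their list positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rk = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j across any run of tied scores
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1          # positions i..j are 0-based
        for k in range(i, j + 1):
            rk[order[k]] = mean_rank
        i = j + 1
    return rk

def spearman(X, Y):
    """Special formula: r_s = 1 - 6*sum(D^2) / [n(n^2 - 1)] (assumes no ties)."""
    n = len(X)
    rx, ry = ranks(X), ranks(Y)
    D2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * D2 / (n * (n ** 2 - 1))

print(ranks([2, 5, 5, 9]))                      # the tied 5s share rank 2.5
print(spearman([1, 2, 3, 4], [2, 4, 8, 16]))    # consistently increasing, so r_s = +1
```

The second call shows why Spearman suits consistent nonlinear relationships: Y doubles rather than increasing linearly, yet r_s is a perfect +1.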

Point-Biserial Correlation

• Measures relationship between two variables

– One variable has only two values (called a dichotomous or binomial variable)

• Effect size for the independent-samples t test in Chapter 10 can be measured by r²

– The point-biserial r² has the same value as the r² computed from the t statistic

– t-statistic tests significance of the mean difference

– r statistic measures the correlation size

Point-Biserial Correlation

• Applicable in the same situation as the independent-measures t test in Chapter 10

– Code one group 0 and the other 1 (or any two digits) as the Y score

– t-statistic evaluates the significance of mean difference

– Point-Biserial r measures correlation magnitude

– r2 quantifies effect size
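The relationship between the two statistics can be checked numerically: the point-biserial r is just the Pearson r with 0/1 group codes, and its r² equals t²/(t² + df) from the independent-measures t. A sketch with invented two-group data:

```python
import math

# Hypothetical two-group data: X codes group membership 0/1
X = [0, 0, 0, 1, 1, 1]
Y = [4, 5, 6, 8, 9, 10]
n = len(X)
MX, MY = sum(X) / n, sum(Y) / n

# Point-biserial correlation = regular Pearson formula on the codes
SP  = sum((x - MX) * (y - MY) for x, y in zip(X, Y))
SSX = sum((x - MX) ** 2 for x in X)
SSY = sum((y - MY) ** 2 for y in Y)
r = SP / math.sqrt(SSX * SSY)

# Independent-measures t for the same data (pooled variance)
g0 = [y for x, y in zip(X, Y) if x == 0]
g1 = [y for x, y in zip(X, Y) if x == 1]
ss0 = sum((y - sum(g0) / len(g0)) ** 2 for y in g0)
ss1 = sum((y - sum(g1) / len(g1)) ** 2 for y in g1)
df = n - 2
sp2 = (ss0 + ss1) / df
t = (sum(g1) / len(g1) - sum(g0) / len(g0)) / math.sqrt(sp2 / len(g0) + sp2 / len(g1))

print(round(r ** 2, 4), round(t ** 2 / (t ** 2 + df), 4))  # same value
```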

Phi Coefficient

• Both variables (X and Y) are dichotomous

– Both variables are re-coded to values 0 and 1 (or any two digits)

– The regular Pearson formula is used to calculate r

– r2 (coefficient of determination) measures effect size (proportion of variability in one score predicted by the other)
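Since the phi-coefficient is just the Pearson formula applied to 0/1 codes, the sketch is short; the data here are invented:

```python
import math

# Hypothetical data: both variables re-coded 0/1
X = [0, 0, 0, 1, 1, 1]
Y = [0, 1, 0, 1, 1, 0]
n = len(X)
MX, MY = sum(X) / n, sum(Y) / n

# Regular Pearson formula on the codes gives phi
SP  = sum((x - MX) * (y - MY) for x, y in zip(X, Y))
SSX = sum((x - MX) ** 2 for x in X)
SSY = sum((y - MY) ** 2 for y in Y)
phi = SP / math.sqrt(SSX * SSY)
print(round(phi, 4))
```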

Learning Check

• Participants were classified as “morning people” or “evening people” then measured on a 50-point conscientiousness scale. Which correlation should be used to measure the relationship?

A. Pearson correlation

B. Spearman correlation

C. Point-biserial correlation

D. Phi-coefficient

Learning Check - Answer

• Participants were classified as “morning people” or “evening people” then measured on a 50-point conscientiousness scale. Which correlation should be used to measure the relationship?

A. Pearson correlation

B. Spearman correlation

C. Point-biserial correlation [correct: one variable is dichotomous, the other is a numerical score]

D. Phi-coefficient

Learning Check

• Decide if each of the following statements is True or False

• The Spearman correlation is used with dichotomous data

T/F

• In a non-directional significance test of a correlation, the null hypothesis states that the population correlation is zero

T/F

Learning Check - Answers

• The Spearman correlation uses ordinal (ranked) data

False

• Null hypothesis assumes no relationship; ρ = zero indicates no relationship in the population

True

14.6 Introduction to Linear Equations and Regression

• The Pearson correlation measures a linear relationship between two variables

• Figure 14.13 makes the relationship obvious

• The line through the data

– Makes the relationship easier to see

– Shows the central tendency of the relationship

– Can be used for prediction

• Regression analysis precisely defines the line

Figure 14.13 Regression line

Linear Equations

• General equation for a line

– Equation: Y = bX + a

– X and Y are variables

– a and b are fixed constants

Figure 14.14 Linear Equation Graph

Regression

• Regression is a method of finding an equation describing the best-fitting line for a set of data

• How to define a “best fitting” straight line when there are many possible straight lines?

• The answer: the line that best fits the actual data by minimizing the errors of prediction

Regression

• Ŷ is the value of Y predicted by the regression equation (regression line) for each value of X

• (Y- Ŷ) is the distance each data point is from the regression line: the error of prediction

• The regression procedure produces a line that minimizes total squared error of prediction

• This method is called the least-squared-error solution

Figure 14.15 Y-Ŷ Distance: Actual Data Point Minus Predicted Point

Regression Equations

• Regression line equation: Ŷ = bX + a

• The slope of the line, b, can be calculated

• The line goes through (M_X, M_Y), therefore:

b = SP / SS_X  or  b = r(s_Y / s_X)

a = M_Y − b·M_X
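The slope and intercept formulas can be sketched directly, using a small invented dataset:

```python
# Hypothetical scores (not from the textbook examples)
X = [1, 2, 4, 5]
Y = [3, 6, 4, 7]
n = len(X)
MX, MY = sum(X) / n, sum(Y) / n

SP  = sum((x - MX) * (y - MY) for x, y in zip(X, Y))
SSX = sum((x - MX) ** 2 for x in X)

b = SP / SSX          # slope
a = MY - b * MX       # intercept: forces the line through (MX, MY)

Yhat = [b * x + a for x in X]   # predicted Y for each X
print(b, a, Yhat)
```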

Figure 14.16 Data Points and Regression Line: Example 14.13

Standard Error of Estimate

• Regression equation makes a prediction

• Precision of the estimate is measured by the standard error of estimate (SEoE)

SEoE = √(SS_residual / df) = √[Σ(Y − Ŷ)² / (n − 2)]

Figure 14.17 Regression Lines: Perfectly Fit vs. Example 14.13

Relationship Between Correlation and Standard Error of Estimate

• As r goes from 0 to 1, SEoE decreases to 0

• Predicted variability in Y scores: SS_regression = r²·SS_Y

• Unpredicted variability in Y scores: SS_residual = (1 − r²)·SS_Y

• Standard Error of Estimate based on r:

SEoE = √(SS_residual / df) = √[(1 − r²)·SS_Y / (n − 2)]
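Both routes to SS_residual, summing squared prediction errors directly or using (1 − r²)·SS_Y, give the same standard error of estimate. A sketch on invented data:

```python
import math

# Hypothetical scores (not from the textbook examples)
X = [1, 2, 4, 5]
Y = [3, 6, 4, 7]
n = len(X)
MX, MY = sum(X) / n, sum(Y) / n
SP  = sum((x - MX) * (y - MY) for x, y in zip(X, Y))
SSX = sum((x - MX) ** 2 for x in X)
SSY = sum((y - MY) ** 2 for y in Y)
b = SP / SSX
a = MY - b * MX
r = SP / math.sqrt(SSX * SSY)

# Route 1: sum the squared distances from the regression line
ss_resid_direct = sum((y - (b * x + a)) ** 2 for x, y in zip(X, Y))
# Route 2: unpredicted variability from r
ss_resid_from_r = (1 - r ** 2) * SSY

seoe = math.sqrt(ss_resid_direct / (n - 2))
print(ss_resid_direct, ss_resid_from_r, round(seoe, 4))
```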

Testing Regression Significance

• Analysis of Regression

– Similar to Analysis of Variance

– Uses an F-ratio of two Mean Square values

– Each MS is a SS divided by its df

• H0: the slope of the regression line (b or beta) is zero

Mean Squares and F-ratio

MS_regression = SS_regression / df_regression

MS_residual = SS_residual / df_residual

F = MS_regression / MS_residual
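The analysis of regression can be sketched end to end on the same kind of small invented dataset; with a single predictor, df_regression = 1 and df_residual = n − 2:

```python
# Hypothetical scores (not from the textbook examples)
X = [1, 2, 4, 5]
Y = [3, 6, 4, 7]
n = len(X)
MX, MY = sum(X) / n, sum(Y) / n
SP  = sum((x - MX) * (y - MY) for x, y in zip(X, Y))
SSX = sum((x - MX) ** 2 for x in X)
SSY = sum((y - MY) ** 2 for y in Y)
r2 = SP ** 2 / (SSX * SSY)                # squared Pearson correlation

# Partition SS_Y into predicted and unpredicted parts
ss_regression = r2 * SSY
ss_residual = (1 - r2) * SSY
df_regression, df_residual = 1, n - 2

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
F = ms_regression / ms_residual           # with one predictor, F = t^2
print(round(F, 4))
```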

Figure 14.18 Partitioning SS and df in Regression Analysis

Learning Check

• A linear regression has b = 3 and a = 4. What is the “predicted Y” (Ŷ) for X = 7?

A. 14

B. 25

C. 31

D. Cannot be determined

Learning Check - Answer

• A linear regression has b = 3 and a = 4. What is the predicted Y for X = 7?

A. 14

B. 25 [correct: Ŷ = bX + a = 3(7) + 4 = 25]

C. 31

D. Cannot be determined

Learning Check

• Decide if each of the following statements is True or False

• It is possible for the regression equation to place none of the actual data points on the regression line

T/F

• If r = 0.58, the linear regression equation predicts about one third of the variance in the Y scores

T/F

Learning Check - Answers

• The line estimates where points should be but there are almost always prediction errors

True

• When r = .58, r² = .336 (≈ 1/3) True

Figure 14.19 SPSS Output for Example 14.13

Figure 14.20 SPSS Output for Examples 14.13—14.15

Figure 14.21 Scatter Plot for Data of Demonstration 14.1

Any Questions? Concepts? Equations?
