Chapter 14Correlation and Regression
PowerPoint Lecture Slides
Essentials of Statistics for the Behavioral Sciences Eighth Edition
by Frederick J. Gravetter and Larry B. Wallnau
Chapter 14 Learning Outcomes
• Understand Pearson r as a measure of the relationship between two variables
• Compute Pearson r using the definitional or computational formula
• Use and interpret Pearson r; understand assumptions and limitations
• Test a hypothesis about a population correlation (ρ) with a sample r
• Understand the concept of a partial correlation
Chapter 14 Learning Outcomes (continued)
• Explain/compute the Spearman correlation coefficient (ranks)
• Explain/compute the point-biserial correlation coefficient (one dichotomous variable)
• Explain/compute the phi-coefficient for two dichotomous variables
• Explain/compute the linear regression equation to predict Y values
• Evaluate the significance of a regression equation
Tools You Will Need
• Sum of squares (SS) (Chapter 4)
– Computational formula
– Definitional formula
• z-Scores (Chapter 5)
• Hypothesis testing (Chapter 8)
• Analysis of Variance (Chapter 12)
– MS values and F-ratios
14.1 Introduction to Correlation
• Measures and describes the relationship between two variables
• Characteristics of relationships
– Direction (negative or positive; indicated by the sign, + or – of the correlation coefficient)
– Form (linear is most common)
– Strength or consistency (varies from 0 to 1)
• Characteristics are all independent
Figure 14.1 Scatterplot for Correlational Data
Figure 14.2 Positive and Negative Relationships
Figure 14.3 Different Linear Relationship Values
14.2 The Pearson Correlation
• Measures the degree and the direction of the linear relationship between two variables
• Perfect linear relationship
– Every change in X has a corresponding change in Y
– Correlation will be –1.00 or +1.00
• Conceptual formula: r = (covariability of X and Y) / (variability of X and Y separately)
Sum of Products (SP)
• Similar to SS (sum of squared deviations)
• Measures the amount of covariability between two variables
• SP definitional formula:
SP = Σ(X – M_X)(Y – M_Y)
SP – Computational formula
• Definitional formula emphasizes SP as the sum of products of two difference scores
• Computational formula results in easier calculations
• SP computational formula:
SP = ΣXY – (ΣX)(ΣY) / n
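As a check on the two SP formulas above, here is a minimal Python sketch (the scores are invented for illustration):

```python
# Both SP formulas on a small invented data set; they must agree.

def sp_definitional(x, y):
    """SP = sum of (X - M_X)(Y - M_Y) over all pairs."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))

def sp_computational(x, y):
    """SP = sum(XY) - sum(X) * sum(Y) / n."""
    n = len(x)
    return sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

x = [1, 2, 4, 5]
y = [3, 6, 4, 7]
# Both formulas yield SP = 6 for these scores.
```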
Pearson Correlation Calculation
• Ratio comparing the covariability of X and Y (numerator) with the variability of X and Y separately (denominator)
r = SP / √(SS_X · SS_Y)
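The ratio above can be sketched directly in Python (data invented; perfect linear relationships should return ±1.00):

```python
# Pearson r as SP divided by the square root of SS_X * SS_Y.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / (ss_x * ss_y) ** 0.5

# A perfect positive linear relationship gives r = +1.00;
# a perfect negative one gives r = -1.00.
```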
Figure 14.4 Example 14.3 Scatterplot
Pearson Correlation and z-Scores
• Pearson correlation formula can be expressed as a relationship of z-scores.
• Sample: r = Σ(z_X z_Y) / (n – 1)
• Population: r = Σ(z_X z_Y) / N
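The sample form of the z-score formula can be verified numerically; note that the z-scores here use the sample standard deviation (n – 1 in the denominator). The data are invented:

```python
# r = sum(z_X * z_Y) / (n - 1), using sample z-scores.

def pearson_from_z(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = (sum((a - mx) ** 2 for a in x) / (n - 1)) ** 0.5  # sample SD of X
    sy = (sum((b - my) ** 2 for b in y) / (n - 1)) ** 0.5  # sample SD of Y
    zx = [(a - mx) / sx for a in x]
    zy = [(b - my) / sy for b in y]
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)
```

For the invented scores X = [1, 2, 3, 4, 5], Y = [2, 1, 4, 3, 5], this matches the SP-based computation (r = 0.8).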
Learning Check
• A scatterplot shows a set of data points that fit very loosely around a line that slopes down to the right. Which of the following values would be closest to the correlation for these data?
A. 0.75
B. 0.35
C. –0.75
D. –0.35
Learning Check - Answer
• A scatterplot shows a set of data points that fit very loosely around a line that slopes down to the right. Which of the following values would be closest to the correlation for these data?
A. 0.75
B. 0.35
C. –0.75
D. –0.35 (correct: the loose fit means a weak relationship; the downward slope means it is negative)
Learning Check
• Decide if each of the following statements is True or False
• A set of n = 10 pairs of X and Y scores has ΣX = ΣY = ΣXY = 20. For this set of scores, SP = –20 (T/F)
• If the Y variable decreases when the X variable decreases, their correlation is negative (T/F)
Learning Check - Answers
• SP = ΣXY – (ΣX)(ΣY)/n = 20 – (20)(20)/10 = 20 – 40 = –20
True
• When Y decreases as X decreases, the two variables change in the same direction, which is a positive correlation
False
14.3 Using and Interpreting the Pearson Correlation
• Correlations used for:
– Prediction
– Validity
– Reliability
– Theory verification
Interpreting Correlations
• Correlation describes a relationship but does not demonstrate causation
• Establishing causation requires an experiment in which one variable is manipulated and others carefully controlled
• Example 14.4 (and Figure 14.5) demonstrates the fallacy of attributing causation after observing a correlation
Figure 14.5 Correlation: Churches and Serious Crimes
Correlations and Restricted Range of Scores
• Correlation coefficient value (size) will be affected by the range of scores in the data
• Severely restricted range may provide a very different correlation than would a broader range of scores
• To be safe, never generalize a correlation beyond the sample range of data
Figure 14.6 Restricted Score Range Influences Correlation
Correlations and Outliers
• An outlier is an extremely deviant individual in the sample
• Characterized by a much larger (or smaller) score than all the others in the sample
• In a scatter plot, the point is clearly different from all the other points
• Outliers produce a disproportionately large impact on the correlation coefficient
Figure 14.7 Outlier Influences Size of Correlation
Correlations and the Strength of the Relationship
• A correlation coefficient measures the degree of relationship on a scale from 0 to 1.00
• It is easy to mistakenly interpret this decimal number as a percent or proportion
• Correlation is not a proportion
• Squared correlation may be interpreted as the proportion of shared variability
• Squared correlation is called the coefficient of determination
Coefficient of Determination
• Coefficient of determination measures the proportion of variability in one variable that can be determined from the relationship with the other variable (shared variability)
Coefficient of Determination = r²
Figure 14.8 Three Amounts of Linear Relationship Example
14.4 Hypothesis Tests with the Pearson Correlation
• Pearson correlation is usually computed for sample data, but used to test hypotheses about the relationship in the population
• Population correlation shown by Greek letter rho (ρ)
• Non-directional: H0: ρ = 0 and H1: ρ ≠ 0
• Directional: H0: ρ ≤ 0 and H1: ρ > 0
• or Directional: H0: ρ ≥ 0 and H1: ρ < 0
Figure 14.9 Correlation in Sample vs. Population
Correlation Hypothesis Test
• Sample correlation r used to test population ρ
• Degrees of freedom (df) = n – 2
• Hypothesis test can be computed using either t or F; only t shown in this chapter
• Use t table to find critical value with df = n - 2
t = r / √((1 – r²) / (n – 2))
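The t statistic for the correlation test is a one-line computation; a sketch with invented values:

```python
# t statistic for testing H0: rho = 0, evaluated with df = n - 2.

def correlation_t(r, n):
    return r / (((1 - r ** 2) / (n - 2)) ** 0.5)

# e.g. r = 0.8 with n = 27 gives t = 0.8 / sqrt(0.36 / 25) = 0.8 / 0.12,
# which would be compared against the critical t value with df = 25.
```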
In the Literature
• Report
– Whether it is statistically significant
• Concise test results
– Value of correlation
– Sample size
– p-value or level
– Type of test (one- or two-tailed)
• E.g., r = -0.76, n = 48, p < .01, two tails
Partial Correlation
• A partial correlation measures the relationship between two variables while mathematically controlling the influence of a third variable by holding it constant
r_XY·Z = (r_XY – r_XZ r_YZ) / √((1 – r²_XZ)(1 – r²_YZ))
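The partial-correlation formula translates directly to code; the input correlations below are invented:

```python
# Partial correlation of X and Y while holding Z constant.

def partial_correlation(r_xy, r_xz, r_yz):
    return (r_xy - r_xz * r_yz) / (((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5)
```

Note that when Z is unrelated to both variables (r_XZ = r_YZ = 0), the partial correlation reduces to the ordinary r_XY.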
Figure 14.10 Controlling the Impact of a Third Variable
14.5 Alternatives to the Pearson Correlation
• Pearson correlation has been developed
– For data having linear relationships
– With data from interval or ratio measurement scales
• Other correlations have been developed
– For data having non-linear relationships
– With data from nominal or ordinal measurement scales
Spearman Correlation
• Spearman (rs) correlation formula is used with data from an ordinal scale (ranks)
– Used when both variables are measured on an ordinal scale
– May also be used with interval or ratio data when the relationship is consistently directional but may not be linear
Figure 14.11 Consistent Nonlinear Positive Relationship
Figure 14.12 Scatterplot Showing Scores and Ranks
Ranking Tied Scores
• Tie scores need ranks for Spearman correlation
• Method for assigning rank
– List scores in order from smallest to largest
– Assign a rank to each position in the list
– When two (or more) scores are tied, compute the mean of their ranked position, and assign this mean value as the final rank for each score.
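The ranking method above (including the mean-rank rule for ties) can be sketched as follows; the helper name and test scores are invented:

```python
# Assigns ranks to scores, giving tied scores the mean of their
# ranked positions, as required for the Spearman correlation.

def rank_scores(scores):
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # extend j across any run of tied scores
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + 1 + j + 1) / 2  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

# rank_scores([10, 20, 20, 40]) -> [1.0, 2.5, 2.5, 4.0]
```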
Special Formula for the Spearman Correlation
• The ranks for the scores are simply integers
• Calculations can be simplified
– Use D as the difference between the X rank and the Y rank for each individual to compute the r_s statistic
r_s = 1 – 6ΣD² / (n(n² – 1))
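The special formula can be sketched directly; the rank lists below are invented:

```python
# Spearman r_s from the simplified difference-score formula.

def spearman_rs(x_ranks, y_ranks):
    n = len(x_ranks)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Identical rank orders give r_s = +1; fully reversed orders give r_s = -1.
```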
Point-Biserial Correlation
• Measures relationship between two variables
– One variable has only two values (called a dichotomous or binomial variable)
• Effect size for the independent-samples t test in Chapter 10 can be measured by r²
– Point-biserial r² has the same value as the r² computed from the t statistic
– The t statistic tests the significance of the mean difference
– The r statistic measures the size of the correlation
Point-Biserial Correlation
• Applicable in the same situation as the independent-measures t test in Chapter 10
– Code one group 0 and the other 1 (or any two digits) as the Y score
– t-statistic evaluates the significance of mean difference
– Point-Biserial r measures correlation magnitude
– r2 quantifies effect size
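The coding step described above can be sketched in Python; the groups and scores are invented, and the regular Pearson formula is applied to the coded data:

```python
# Point-biserial r: code group membership as 0/1, then apply the
# regular Pearson formula to the coded data.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / (ss_x * ss_y) ** 0.5

scores = [4, 5, 6, 8, 9, 10]   # X: interval-scale scores for two groups
group = [0, 0, 0, 1, 1, 1]     # Y: dichotomous group codes
r_pb = pearson_r(scores, group)
# r_pb ** 2 gives the effect size r^2
```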
Phi Coefficient
• Both variables (X and Y) are dichotomous
– Both variables are re-coded to values 0 and 1 (or any two digits)
– The regular Pearson formula is used to calculate r
– r2 (coefficient of determination) measures effect size (proportion of variability in one score predicted by the other)
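A sketch of the phi-coefficient, with both variables coded 0/1 (the data are invented) and the regular Pearson formula applied:

```python
# Phi coefficient: both variables re-coded to 0/1, then the regular
# Pearson formula is applied.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / (ss_x * ss_y) ** 0.5

x = [0, 0, 1, 1, 0, 1]  # first dichotomous variable
y = [0, 1, 1, 1, 0, 1]  # second dichotomous variable
phi = pearson_r(x, y)
```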
Learning Check
• Participants were classified as “morning people” or “evening people” then measured on a 50-point conscientiousness scale. Which correlation should be used to measure the relationship?
A. Pearson correlation
B. Spearman correlation
C. Point-biserial correlation
D. Phi-coefficient
Learning Check - Answer
• Participants were classified as “morning people” or “evening people” then measured on a 50-point conscientiousness scale. Which correlation should be used to measure the relationship?
A. Pearson correlation
B. Spearman correlation
C. Point-biserial correlation (correct: one variable is dichotomous, the other is an interval-scale score)
D. Phi-coefficient
Learning Check
• Decide if each of the following statements is True or False
• The Spearman correlation is used with dichotomous data (T/F)
• In a non-directional significance test of a correlation, the null hypothesis states that the population correlation is zero (T/F)
Learning Check - Answers
• The Spearman correlation uses ordinal (ranked) data
False
• The null hypothesis assumes no relationship; ρ = 0 indicates no relationship in the population
True
14.6 Introduction to Linear Equations and Regression
• The Pearson correlation measures a linear relationship between two variables
• Figure 14.13 makes the relationship obvious
• The line through the data
– Makes the relationship easier to see
– Shows the central tendency of the relationship
– Can be used for prediction
• Regression analysis precisely defines the line
Figure 14.13 Regression line
Linear Equations
• General equation for a line
– Equation: Y = bX + a
– X and Y are variables
– a and b are fixed constants
Figure 14.14 Linear Equation Graph
Regression
• Regression is a method of finding an equation describing the best-fitting line for a set of data
• How to define a “best fitting” straight line when there are many possible straight lines?
• The answer: the line that gives the best fit to the actual data by minimizing prediction errors
Regression
• Ŷ is the value of Y predicted by the regression equation (regression line) for each value of X
• (Y- Ŷ) is the distance each data point is from the regression line: the error of prediction
• The regression procedure produces a line that minimizes total squared error of prediction
• This method is called the least-squared-error solution
Figure 14.15 Y-Ŷ Distance: Actual Data Point Minus Predicted Point
Regression Equations
• Regression line equation: Ŷ = bX + a
• The slope of the line, b, can be calculated:
b = SP / SS_X  or  b = r(s_Y / s_X)
• The line goes through (M_X, M_Y), therefore:
a = M_Y – b M_X
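The slope and intercept formulas can be sketched together (the data are invented; a perfect line should be recovered exactly):

```python
# Least-squares slope and intercept from SP and SS_X.

def regression_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    b = sp / ss_x        # slope
    a = my - b * mx      # intercept: line passes through (M_X, M_Y)
    return b, a

# For Y = 2X exactly, the fit recovers b = 2 and a = 0.
```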
Figure 14.16 Data Points and Regression Line: Example 14.13
Standard Error of Estimate
• Regression equation makes a prediction
• Precision of the estimate is measured by the standard error of estimate (SEoE)
SEoE = √(SS_residual / df) = √(Σ(Y – Ŷ)² / (n – 2))
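A sketch that fits the regression line and then computes the standard error of estimate from the residuals (data invented; points that fall exactly on a line should give SEoE = 0):

```python
# Standard error of estimate: sqrt of residual SS over df = n - 2.

def std_error_of_estimate(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    b = sp / ss_x
    a = my - b * mx
    # residual = actual Y minus predicted Y-hat for each X
    ss_residual = sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))
    return (ss_residual / (n - 2)) ** 0.5
```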
Figure 14.17 Regression Lines: Perfectly Fit vs. Example 14.13
Relationship Between Correlation and Standard Error of Estimate
• As r goes from 0 to 1, SEoE decreases to 0
• Predicted variability in Y scores:SSregression = r2 SSY
• Unpredicted variability in Y scores:SSresidual = (1 - r2) SSY
• Standard Error of Estimate based on r:
SEoE = √((1 – r²) SS_Y / (n – 2))
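The r-based shortcut should agree with the residual-based computation; a sketch on invented data:

```python
# SEoE via r: sqrt((1 - r^2) * SS_Y / (n - 2)).

def seoe_from_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    r = sp / (ss_x * ss_y) ** 0.5
    return ((1 - r ** 2) * ss_y / (n - 2)) ** 0.5
```

For X = [1, 2, 3, 4, 5], Y = [2, 1, 4, 3, 5] (r = 0.8, SS_Y = 10), both routes give √1.2.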
Testing Regression Significance
• Analysis of Regression
– Similar to Analysis of Variance
– Uses an F-ratio of two Mean Square values
– Each MS is a SS divided by its df
• H0: the slope of the regression line (b or beta) is zero
Mean Squares and F-ratio
MS_regression = SS_regression / df_regression
MS_residual = SS_residual / df_residual
F = MS_regression / MS_residual
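Using the SS partition from the previous slide (SS_regression = r² SS_Y with df = 1; SS_residual = (1 – r²) SS_Y with df = n – 2), the F-ratio can be sketched as follows (data invented):

```python
# Analysis of regression: F = MS_regression / MS_residual.

def regression_f(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    r2 = sp ** 2 / (ss_x * ss_y)              # coefficient of determination
    ms_regression = (r2 * ss_y) / 1           # df_regression = 1
    ms_residual = ((1 - r2) * ss_y) / (n - 2) # df_residual = n - 2
    return ms_regression / ms_residual
```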
Figure 14.18 Partitioning SS and df in Regression Analysis
Learning Check
• A linear regression has b = 3 and a = 4. What is the “predicted Y” (Ŷ) for X = 7?
A. 14
B. 25
C. 31
D. Cannot be determined
Learning Check - Answer
• A linear regression has b = 3 and a = 4. What is the predicted Y for X = 7?
A. 14
B. 25 (correct: Ŷ = bX + a = 3(7) + 4 = 25)
C. 31
D. Cannot be determined
Learning Check
• Decide if each of the following statements is True or False
• It is possible for the regression equation to place none of the actual data points on the regression line
T/F
• If r = 0.58, the linear regression equation predicts about one third of the variance in the Y scores
T/F
Learning Check - Answers
• The line estimates where points should be but there are almost always prediction errors
True
• When r = .58, r2 = .336 (≈1/3) True
Figure 14.19 SPSS Output for Example 14.13
Figure 14.20 SPSS Output for Examples 14.13—14.15
Figure 14.21 Scatter Plot for Data of Demonstration 14.1
Any Questions?
Concepts?
Equations?