Chapter 14Correlation and Regression
PowerPoint Lecture Slides
Essentials of Statistics for the Behavioral Sciences Eighth Edition
by Frederick J. Gravetter and Larry B. Wallnau
Chapter 14 Learning Outcomes
• Understand Pearson r as a measure of the relationship between two variables
• Compute Pearson r using the definitional or computational formula
• Use and interpret Pearson r; understand assumptions and limitations
• Test a hypothesis about a population correlation (ρ) with a sample r
• Understand the concept of a partial correlation
Chapter 14 Learning Outcomes (continued)
• Explain/compute the Spearman correlation coefficient (ranks)
• Explain/compute the point-biserial correlation coefficient (one dichotomous variable)
• Explain/compute the phi-coefficient for two dichotomous variables
• Explain/compute the linear regression equation to predict Y values
• Evaluate the significance of a regression equation
Tools You Will Need
• Sum of squares (SS) (Chapter 4)
– Computational formula
– Definitional formula
• z-Scores (Chapter 5)
• Hypothesis testing (Chapter 8)
• Analysis of Variance (Chapter 12)
– MS values and F-ratios
14.1 Introduction to Correlation
• Measures and describes the relationship between two variables
• Characteristics of relationships
– Direction (negative or positive; indicated by the sign, + or – of the correlation coefficient)
– Form (linear is most common)
– Strength or consistency (varies from 0 to 1)
• Characteristics are all independent
Figure 14.1 Scatterplot for Correlational Data
Figure 14.2 Positive and Negative Relationships
Figure 14.3 Different Linear Relationship Values
14.2 The Pearson Correlation
• Measures the degree and the direction of the linear relationship between two variables
• Perfect linear relationship
– Every change in X has a corresponding change in Y
– Correlation will be –1.00 or +1.00
• Conceptual formula: r = (covariability of X and Y) / (variability of X and Y separately)
Sum of Products (SP)
• Similar to SS (sum of squared deviations)
• Measures the amount of covariability between two variables
• SP definitional formula:
SP = Σ(X – M_X)(Y – M_Y)
SP – Computational formula
• Definitional formula emphasizes SP as the sum of products of two difference scores
• Computational formula results in easier calculations
• SP computational formula:
SP = ΣXY – (ΣX)(ΣY) / n
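As a check on the two SP formulas above, here is a minimal Python sketch (the scores are invented for illustration):

```python
# Both SP formulas on a small invented data set; they must agree.

def sp_definitional(x, y):
    """SP = sum of (X - M_X)(Y - M_Y) over all pairs."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))

def sp_computational(x, y):
    """SP = sum(XY) - sum(X) * sum(Y) / n."""
    n = len(x)
    return sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

x = [1, 2, 4, 5]
y = [3, 6, 4, 7]
# Both formulas yield SP = 6 for these scores.
```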
Pearson Correlation Calculation
• Ratio comparing the covariability of X and Y (numerator) with the variability of X and Y separately (denominator)
r = SP / √(SS_X · SS_Y)
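The ratio above can be sketched directly in Python (data invented; perfect linear relationships should return ±1.00):

```python
# Pearson r as SP divided by the square root of SS_X * SS_Y.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / (ss_x * ss_y) ** 0.5

# A perfect positive linear relationship gives r = +1.00;
# a perfect negative one gives r = -1.00.
```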
Figure 14.4 Example 14.3 Scatterplot
Pearson Correlation and z-Scores
• Pearson correlation formula can be expressed as a relationship of z-scores.
• Sample: r = Σ(z_X z_Y) / (n – 1)
• Population: r = Σ(z_X z_Y) / N
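The sample form of the z-score formula can be verified numerically; note that the z-scores here use the sample standard deviation (n – 1 in the denominator). The data are invented:

```python
# r = sum(z_X * z_Y) / (n - 1), using sample z-scores.

def pearson_from_z(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = (sum((a - mx) ** 2 for a in x) / (n - 1)) ** 0.5  # sample SD of X
    sy = (sum((b - my) ** 2 for b in y) / (n - 1)) ** 0.5  # sample SD of Y
    zx = [(a - mx) / sx for a in x]
    zy = [(b - my) / sy for b in y]
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)
```

For the invented scores X = [1, 2, 3, 4, 5], Y = [2, 1, 4, 3, 5], this matches the SP-based computation (r = 0.8).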
Learning Check
• A scatterplot shows a set of data points that fit very loosely around a line that slopes down to the right. Which of the following values would be closest to the correlation for these data?
A. 0.75
B. 0.35
C. –0.75
D. –0.35
Learning Check - Answer
• A scatterplot shows a set of data points that fit very loosely around a line that slopes down to the right. Which of the following values would be closest to the correlation for these data?
A. 0.75
B. 0.35
C. –0.75
D. –0.35 (correct: the loose fit means a weak relationship; the downward slope means it is negative)
Learning Check
• Decide if each of the following statements is True or False
• A set of n = 10 pairs of X and Y scores has ΣX = ΣY = ΣXY = 20. For this set of scores, SP = –20 (T/F)
• If the Y variable decreases when the X variable decreases, their correlation is negative (T/F)
Learning Check - Answers
• SP = ΣXY – (ΣX)(ΣY)/n = 20 – (20)(20)/10 = 20 – 40 = –20
True
• When Y decreases as X decreases, the two variables change in the same direction, which is a positive correlation
False
14.3 Using and Interpreting the Pearson Correlation
• Correlations used for:
– Prediction
– Validity
– Reliability
– Theory verification
Interpreting Correlations
• Correlation describes a relationship but does not demonstrate causation
• Establishing causation requires an experiment in which one variable is manipulated and others carefully controlled
• Example 14.4 (and Figure 14.5) demonstrates the fallacy of attributing causation after observing a correlation
Figure 14.5 Correlation: Churches and Serious Crimes
Correlations and Restricted Range of Scores
• Correlation coefficient value (size) will be affected by the range of scores in the data
• Severely restricted range may provide a very different correlation than would a broader range of scores
• To be safe, never generalize a correlation beyond the sample range of data
Figure 14.6 Restricted Score Range Influences Correlation
Correlations and Outliers
• An outlier is an extremely deviant individual in the sample
• Characterized by a much larger (or smaller) score than all the others in the sample
• In a scatter plot, the point is clearly different from all the other points
• Outliers produce a disproportionately large impact on the correlation coefficient
Figure 14.7 Outlier Influences Size of Correlation
Correlations and the Strength of the Relationship
• A correlation coefficient measures the degree of relationship on a scale from 0 to 1.00
• It is easy to mistakenly interpret this decimal number as a percent or proportion
• Correlation is not a proportion
• Squared correlation may be interpreted as the proportion of shared variability
• Squared correlation is called the coefficient of determination
Coefficient of Determination
• Coefficient of determination measures the proportion of variability in one variable that can be determined from the relationship with the other variable (shared variability)
Coefficient of Determination = r²
Figure 14.8 Three Amounts of Linear Relationship Example
14.4 Hypothesis Tests with the Pearson Correlation
• Pearson correlation is usually computed for sample data, but used to test hypotheses about the relationship in the population
• Population correlation shown by Greek letter rho (ρ)
• Non-directional: H0: ρ = 0 and H1: ρ ≠ 0
• Directional: H0: ρ ≤ 0 and H1: ρ > 0
• or Directional: H0: ρ ≥ 0 and H1: ρ < 0
Figure 14.9 Correlation in Sample vs. Population
Correlation Hypothesis Test
• Sample correlation r used to test population ρ
• Degrees of freedom (df) = n – 2
• Hypothesis test can be computed using either t or F; only t shown in this chapter
• Use t table to find critical value with df = n - 2
t = r / √((1 – r²) / (n – 2))
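The t statistic for the correlation test is a one-line computation; a sketch with invented values:

```python
# t statistic for testing H0: rho = 0, evaluated with df = n - 2.

def correlation_t(r, n):
    return r / (((1 - r ** 2) / (n - 2)) ** 0.5)

# e.g. r = 0.8 with n = 27 gives t = 0.8 / sqrt(0.36 / 25) = 0.8 / 0.12,
# which would be compared against the critical t value with df = 25.
```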
In the Literature
• Report
– Whether it is statistically significant
• Concise test results
– Value of correlation
– Sample size
– p-value or level
– Type of test (one- or two-tailed)
• E.g., r = -0.76, n = 48, p < .01, two tails
Partial Correlation
• A partial correlation measures the relationship between two variables while mathematically controlling the influence of a third variable by holding it constant
r_XY·Z = (r_XY – r_XZ r_YZ) / √((1 – r²_XZ)(1 – r²_YZ))
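The partial-correlation formula translates directly to code; the input correlations below are invented:

```python
# Partial correlation of X and Y while holding Z constant.

def partial_correlation(r_xy, r_xz, r_yz):
    return (r_xy - r_xz * r_yz) / (((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5)
```

Note that when Z is unrelated to both variables (r_XZ = r_YZ = 0), the partial correlation reduces to the ordinary r_XY.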
Figure 14.10 Controlling the Impact of a Third Variable
14.5 Alternatives to the Pearson Correlation
• Pearson correlation has been developed
– For data having linear relationships
– With data from interval or ratio measurement scales
• Other correlations have been developed
– For data having non-linear relationships
– With data from nominal or ordinal measurement scales
Spearman Correlation
• Spearman (rs) correlation formula is used with data from an ordinal scale (ranks)
– Used when both variables are measured on an ordinal scale
– May also be used with interval or ratio data when the relationship is consistently directional but may not be linear
Figure 14.11 Consistent Nonlinear Positive Relationship
Figure 14.12 Scatterplot Showing Scores and Ranks
Ranking Tied Scores
• Tie scores need ranks for Spearman correlation
• Method for assigning rank
– List scores in order from smallest to largest
– Assign a rank to each position in the list
– When two (or more) scores are tied, compute the mean of their ranked position, and assign this mean value as the final rank for each score.
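The ranking method above (including the mean-rank rule for ties) can be sketched as follows; the helper name and test scores are invented:

```python
# Assigns ranks to scores, giving tied scores the mean of their
# ranked positions, as required for the Spearman correlation.

def rank_scores(scores):
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # extend j across any run of tied scores
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + 1 + j + 1) / 2  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

# rank_scores([10, 20, 20, 40]) -> [1.0, 2.5, 2.5, 4.0]
```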
Special Formula for the Spearman Correlation
• The ranks for the scores are simply integers
• Calculations can be simplified
– Use D as the difference between the X rank and the Y rank for each individual to compute the r_s statistic
r_s = 1 – 6ΣD² / (n(n² – 1))
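The special formula can be sketched directly; the rank lists below are invented:

```python
# Spearman r_s from the simplified difference-score formula.

def spearman_rs(x_ranks, y_ranks):
    n = len(x_ranks)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Identical rank orders give r_s = +1; fully reversed orders give r_s = -1.
```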
Point-Biserial Correlation
• Measures relationship between two variables
– One variable has only two values (called a dichotomous or binomial variable)
• Effect size for the independent-samples t test in Chapter 10 can be measured by r²
– Point-biserial r² has the same value as the r² computed from the t statistic
– The t statistic tests the significance of the mean difference
– The r statistic measures the size of the correlation
Point-Biserial Correlation
• Applicable in the same situation as the independent-measures t test in Chapter 10
– Code one group 0 and the other 1 (or any two digits) as the Y score
– t-statistic evaluates the significance of mean difference
– Point-Biserial r measures correlation magnitude
– r2 quantifies effect size
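The coding step described above can be sketched in Python; the groups and scores are invented, and the regular Pearson formula is applied to the coded data:

```python
# Point-biserial r: code group membership as 0/1, then apply the
# regular Pearson formula to the coded data.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / (ss_x * ss_y) ** 0.5

scores = [4, 5, 6, 8, 9, 10]   # X: interval-scale scores for two groups
group = [0, 0, 0, 1, 1, 1]     # Y: dichotomous group codes
r_pb = pearson_r(scores, group)
# r_pb ** 2 gives the effect size r^2
```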
Phi Coefficient
• Both variables (X and Y) are dichotomous
– Both variables are re-coded to values 0 and 1 (or any two digits)
– The regular Pearson formula is used to calculate r
– r2 (coefficient of determination) measures effect size (proportion of variability in one score predicted by the other)
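A sketch of the phi-coefficient, with both variables coded 0/1 (the data are invented) and the regular Pearson formula applied:

```python
# Phi coefficient: both variables re-coded to 0/1, then the regular
# Pearson formula is applied.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / (ss_x * ss_y) ** 0.5

x = [0, 0, 1, 1, 0, 1]  # first dichotomous variable
y = [0, 1, 1, 1, 0, 1]  # second dichotomous variable
phi = pearson_r(x, y)
```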
Learning Check
• Participants were classified as “morning people” or “evening people” then measured on a 50-point conscientiousness scale. Which correlation should be used to measure the relationship?
A. Pearson correlation
B. Spearman correlation
C. Point-biserial correlation
D. Phi-coefficient
Learning Check - Answer
• Participants were classified as “morning people” or “evening people” then measured on a 50-point conscientiousness scale. Which correlation should be used to measure the relationship?
A. Pearson correlation
B. Spearman correlation
C. Point-biserial correlation (correct: one variable is dichotomous, the other is an interval-scale score)
D. Phi-coefficient
Learning Check
• Decide if each of the following statements is True or False
• The Spearman correlation is used with dichotomous data (T/F)
• In a non-directional significance test of a correlation, the null hypothesis states that the population correlation is zero (T/F)
Learning Check - Answers
• The Spearman correlation uses ordinal (ranked) data
False
• The null hypothesis assumes no relationship; ρ = 0 indicates no relationship in the population
True
14.6 Introduction to Linear Equations and Regression
• The Pearson correlation measures a linear relationship between two variables
• Figure 14.13 makes the relationship obvious
• The line through the data
– Makes the relationship easier to see
– Shows the central tendency of the relationship
– Can be used for prediction
• Regression analysis precisely defines the line
Figure 14.13 Regression line
Linear Equations
• General equation for a line
– Equation: Y = bX + a
– X and Y are variables
– a and b are fixed constants
Figure 14.14 Linear Equation Graph
Regression
• Regression is a method of finding an equation describing the best-fitting line for a set of data
• How to define a “best fitting” straight line when there are many possible straight lines?
• The answer: the line that gives the best fit to the actual data by minimizing prediction errors
Regression
• Ŷ is the value of Y predicted by the regression equation (regression line) for each value of X
• (Y- Ŷ) is the distance each data point is from the regression line: the error of prediction
• The regression procedure produces a line that minimizes total squared error of prediction
• This method is called the least-squared-error solution
Figure 14.15 Y-Ŷ Distance: Actual Data Point Minus Predicted Point
Regression Equations
• Regression line equation: Ŷ = bX + a
• The slope of the line, b, can be calculated:
b = SP / SS_X  or  b = r(s_Y / s_X)
• The line goes through (M_X, M_Y), therefore:
a = M_Y – b M_X
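The slope and intercept formulas can be sketched together (the data are invented; a perfect line should be recovered exactly):

```python
# Least-squares slope and intercept from SP and SS_X.

def regression_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    b = sp / ss_x        # slope
    a = my - b * mx      # intercept: line passes through (M_X, M_Y)
    return b, a

# For Y = 2X exactly, the fit recovers b = 2 and a = 0.
```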
Figure 14.16 Data Points and Regression Line: Example 14.13
Standard Error of Estimate
• Regression equation makes a prediction
• Precision of the estimate is measured by the standard error of estimate (SEoE)
SEoE = √(SS_residual / df) = √(Σ(Y – Ŷ)² / (n – 2))
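A sketch that fits the regression line and then computes the standard error of estimate from the residuals (data invented; points that fall exactly on a line should give SEoE = 0):

```python
# Standard error of estimate: sqrt of residual SS over df = n - 2.

def std_error_of_estimate(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    b = sp / ss_x
    a = my - b * mx
    # residual = actual Y minus predicted Y-hat for each X
    ss_residual = sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))
    return (ss_residual / (n - 2)) ** 0.5
```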
Figure 14.17 Regression Lines: Perfectly Fit vs. Example 14.13
Relationship Between Correlation and Standard Error of Estimate
• As r goes from 0 to 1, SEoE decreases to 0
• Predicted variability in Y scores:SSregression = r2 SSY
• Unpredicted variability in Y scores:SSresidual = (1 - r2) SSY
• Standard Error of Estimate based on r:
SEoE = √((1 – r²) SS_Y / (n – 2))
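The r-based shortcut should agree with the residual-based computation; a sketch on invented data:

```python
# SEoE via r: sqrt((1 - r^2) * SS_Y / (n - 2)).

def seoe_from_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    r = sp / (ss_x * ss_y) ** 0.5
    return ((1 - r ** 2) * ss_y / (n - 2)) ** 0.5
```

For X = [1, 2, 3, 4, 5], Y = [2, 1, 4, 3, 5] (r = 0.8, SS_Y = 10), both routes give √1.2.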
Testing Regression Significance
• Analysis of Regression
– Similar to Analysis of Variance
– Uses an F-ratio of two Mean Square values
– Each MS is a SS divided by its df
• H0: the slope of the regression line (b or beta) is zero
Mean Squares and F-ratio
MS_regression = SS_regression / df_regression
MS_residual = SS_residual / df_residual
F = MS_regression / MS_residual
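Using the SS partition from the previous slide (SS_regression = r² SS_Y with df = 1; SS_residual = (1 – r²) SS_Y with df = n – 2), the F-ratio can be sketched as follows (data invented):

```python
# Analysis of regression: F = MS_regression / MS_residual.

def regression_f(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    r2 = sp ** 2 / (ss_x * ss_y)              # coefficient of determination
    ms_regression = (r2 * ss_y) / 1           # df_regression = 1
    ms_residual = ((1 - r2) * ss_y) / (n - 2) # df_residual = n - 2
    return ms_regression / ms_residual
```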
Figure 14.18 Partitioning SS and df in Regression Analysis
Learning Check
• A linear regression has b = 3 and a = 4. What is the “predicted Y” (Ŷ) for X = 7?
A. 14
B. 25
C. 31
D. Cannot be determined
Learning Check - Answer
• A linear regression has b = 3 and a = 4. What is the predicted Y for X = 7?
A. 14
B. 25 (correct: Ŷ = bX + a = 3(7) + 4 = 25)
C. 31
D. Cannot be determined
Learning Check
• Decide if each of the following statements is True or False
• It is possible for the regression equation to place none of the actual data points on the regression line
T/F
• If r = 0.58, the linear regression equation predicts about one third of the variance in the Y scores
T/F
Learning Check - Answers
• The line estimates where points should be but there are almost always prediction errors
True
• When r = .58, r2 = .336 (≈1/3) True
Figure 14.19 SPSS Output for Example 14.13
Figure 14.20 SPSS Output for Examples 14.13—14.15
Figure 14.21 Scatter Plot for Data of Demonstration 14.1
Any Questions?
Concepts?
Equations?