SPSS Workshop

Research Support Center

Chongming Yang

Causal Inference

• If A, then B, under condition C

• If A, 95% Probability B, under condition C

Student's t Test (William S. Gosset's pen name = "Student")

• Assumptions
  – Small sample
  – Normally distributed

• t distribution: t = [ x̄ - μ ] / [ s / sqrt( n ) ]

df = degrees of freedom = number of independent observations (n - 1 for a one-sample test, because estimating the mean uses one up)

Types of t Tests

• One sample – test against a specific (population) mean

• Two independent samples – compare means of two independent samples that represent two populations

• Paired – compare means of repeated samples

One Sample T Test

• Conceptually, convert the sample mean to a t score and examine whether t falls within the acceptable region of the t distribution

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
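As a rough illustration of the one-sample formula, the following Python sketch computes t and df by hand. The function name `one_sample_t` and the scores are invented for this example and are not part of the workshop materials; SPSS would report the same statistic through its one-sample t test procedure.

```python
import math
import statistics


def one_sample_t(sample, mu):
    """Return (t, df) for a one-sample t test against the hypothesized mean mu."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    t = (mean - mu) / (s / math.sqrt(n))
    return t, n - 1


# Hypothetical scores, invented purely for illustration
scores = [5.0, 6.0, 7.0, 8.0, 9.0]
t_one, df_one = one_sample_t(scores, mu=5.0)
print(t_one, df_one)
```

The resulting t is then compared with the t distribution on df degrees of freedom to decide whether it falls in the acceptable region.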

Two Independent Samples

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
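A minimal sketch of the pooled-variance t statistic for two independent samples, assuming equal population variances. The helper name `pooled_t` and both groups of numbers are hypothetical.

```python
import math
import statistics


def pooled_t(sample1, sample2):
    """Return (t, df) for two independent samples using the pooled variance."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq = statistics.variance(sample1)  # sample variance of group 1
    s2_sq = statistics.variance(sample2)  # sample variance of group 2
    pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / se
    return t, n1 + n2 - 2


# Hypothetical data, invented for illustration
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [3.0, 4.0, 5.0, 6.0, 7.0]
t_two, df_two = pooled_t(group_a, group_b)
print(t_two, df_two)
```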

Paired Observation Samples

• d = difference between the first and second observations

$$t = \frac{\bar{d}}{S_d / \sqrt{n}}$$
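The paired test reduces to a one-sample test on the differences d. The sketch below shows this under that reading; `paired_t` and the before/after values are made up for the example.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t, df) for paired observations, based on the differences d."""
    d = [a - b for a, b in zip(first, second)]
    n = len(d)
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1


# Hypothetical repeated observations on four cases
before = [10.0, 12.0, 9.0, 11.0]
after = [12.0, 15.0, 10.0, 15.0]
t_paired, df_paired = paired_t(before, after)
print(t_paired, df_paired)
```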

Multiple Group Issues

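• Groups A, B, C require three pairwise comparisons: AB, AC, BC, each at .95 confidence

• Joint probability that all three comparisons are correct (assuming they are independent): .95 × .95 × .95 ≈ .86, so the chance of at least one false difference rises to about .14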
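The arithmetic behind the multiple-comparison problem can be checked in a few lines. This sketch assumes the pairwise comparisons are independent, which is a simplification; the variable names are illustrative only.

```python
from itertools import combinations

groups = ["A", "B", "C"]
pairs = list(combinations(groups, 2))  # AB, AC, BC
confidence = 0.95                      # confidence level of each single comparison

# Probability that every comparison is correct, assuming independence
joint = confidence ** len(pairs)
# Probability of at least one false difference across the family of tests
familywise_error = 1 - joint

print(pairs, round(joint, 3), round(familywise_error, 3))
```

With more groups the number of pairs grows quickly, and the joint probability drops accordingly, which is the motivation for a single overall test.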

Analysis of Variance(ANOVA)

• Completely randomized groups
• Compare group variances to infer group mean differences
• Sources of total variance
  – Within groups
  – Between groups
• F distribution
  – SSB = between-groups sum of squares
  – SSW = within-groups sum of squares

$$F = \frac{SSB / df_1}{SSW / df_2}$$
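A small sketch of how the F ratio is assembled from the two sums of squares in a one-way design, taking df1 as the number of groups minus one and df2 as the number of cases minus the number of groups. The function name `one_way_anova` and the data are hypothetical.

```python
import statistics


def one_way_anova(groups):
    """Return (F, df1, df2) for a one-way, completely randomized design."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    # Between-groups sum of squares: group means around the grand mean
    ssb = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: cases around their own group mean
    ssw = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df1 = len(groups) - 1
    df2 = len(all_values) - len(groups)
    return (ssb / df1) / (ssw / df2), df1, df2


# Hypothetical scores for three groups
data = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_value, df1, df2 = one_way_anova(data)
print(f_value, df1, df2)
```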

Fisher-Snedecor Distribution

F Test

• Null hypothesis: all group means are equal
• Given df1, df2, and the F value,
• Determine whether the corresponding probability falls within the acceptable region of the distribution

Issues of ANOVA

• Indicates that some group difference exists
• Does not reveal which two groups differ
• Needs other tests to identify specific group differences
  – Hypothesized comparisons: contrasts
  – No hypothesized comparisons: post hoc tests
• ANOVA can be replaced by multiple regression, which in turn can be replaced by General Linear Modeling (GLM)

Multiple Linear Regression

• Causes can be continuous or categorical
• Effect is a continuous measure
• Milder causal term: predictors
• Objective: identify important predictors

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_k x_k + \varepsilon$$

Assumptions of Linear Regression

• Y and X have linear relations
• Y is continuous or interval and unbounded
• Expected value (mean) of ε = 0
• ε is normally distributed
• ε is not correlated with predictors
• Predictors should not be highly correlated
• No measurement error in any variable

Least Squares Solution

• Choose β to minimize the sum of squared differences between observed y and model-estimated/predicted ŷ

• Through solving many equations

$$\sum_i (y_i - \hat{y}_i)^2$$
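One way to picture "solving many equations" is through the normal equations (X'X)b = X'y, which is the standard route to the least squares estimates. The sketch below is a plain-Python illustration, not how SPSS computes them internally; `solve`, `least_squares`, and the data are all invented for the example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(rows, y):
    """Return [b0, b1, ..., bk] that minimize the sum of squared residuals."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2
predictors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
outcome = [1.0, 3.0, 4.0, 6.0, 8.0]
coefficients = least_squares(predictors, outcome)
print(coefficients)
```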

Explained Variance in y

$$R^2 = \frac{\sum y_i^2 - \dfrac{(\sum y_i)^2}{n} - \sum (y_i - \hat{y}_i)^2}{\sum y_i^2 - \dfrac{(\sum y_i)^2}{n}}$$
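As a sketch, R² can be computed as the total sum of squares minus the residual sum of squares, divided by the total. The name `r_squared` and the numbers are hypothetical; the predicted values come from a least squares line fitted to x = 1, ..., 5.

```python
def r_squared(y, y_hat):
    """Proportion of variance in y explained by the model predictions y_hat."""
    n = len(y)
    total_ss = sum(v ** 2 for v in y) - sum(y) ** 2 / n
    residual_ss = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))
    return (total_ss - residual_ss) / total_ss


# Hypothetical observed values and their least squares predictions
observed = [2.0, 4.0, 5.0, 4.0, 5.0]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]  # from y-hat = 2.2 + 0.6 * x, x = 1..5
r2 = r_squared(observed, predicted)
print(round(r2, 3))
```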

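Standard Error of β

$$SE_{\beta_j} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - k - 1} \cdot \frac{1}{\sum (x_{ij} - \bar{x}_j)^2 \,(1 - R_j^2)}}$$

where $R_j^2$ is the $R^2$ obtained by regressing predictor $x_j$ on the other predictors.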

t Test of the Significance of β

• t = β / SE
• If |t| exceeds the critical value (p < .05)
• Then β is significantly different from zero

Confidence Intervals of β
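A hedged sketch of the t test and a 95% confidence interval for a slope, using the conventional interval β ± t_critical × SE. It is restricted to a single predictor (k = 1), where the collinearity term (1 − R²) equals 1. The function `slope_inference`, the data, and the constant `T_CRIT_DF3` (a tabled two-tailed .05 critical value for df = 3) are all introduced here for illustration.

```python
import math


def slope_inference(x, y, t_crit):
    """Return (beta, se, t, ci) for the slope of a one-predictor regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - beta * x_bar
    sse = sum((yi - (intercept + beta * xi)) ** 2 for xi, yi in zip(x, y))
    k = 1  # number of predictors in this simplified case
    se = math.sqrt(sse / (n - k - 1) / sxx)
    t = beta / se
    ci = (beta - t_crit * se, beta + t_crit * se)
    return beta, se, t, ci


T_CRIT_DF3 = 3.182  # two-tailed .05 critical t for df = 3, taken from a t table

# Hypothetical data, invented for illustration
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [2.0, 4.0, 5.0, 4.0, 5.0]
beta, se, t_stat, ci = slope_inference(x_values, y_values, T_CRIT_DF3)
print(beta, se, t_stat, ci)
```

In this made-up example |t| stays below the critical value, so the interval includes zero and the slope would not be declared significant.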

Standardized Coefficient (β)

• Makes βs comparable among variables by putting them on the same scale (standardized scores)

$$\text{Beta} = \beta \cdot \frac{std_x}{std_y}$$
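The rescaling can be sketched directly from the standard deviations of x and y. The function name `standardized_coefficient`, the slope of 0.6, and the data are hypothetical.

```python
import statistics


def standardized_coefficient(beta, x, y):
    """Rescale an unstandardized slope to standard-deviation units."""
    return beta * statistics.stdev(x) / statistics.stdev(y)


# Hypothetical predictor, outcome, and unstandardized slope
x_raw = [1.0, 2.0, 3.0, 4.0, 5.0]
y_raw = [2.0, 4.0, 5.0, 4.0, 5.0]
beta_std = standardized_coefficient(0.6, x_raw, y_raw)
print(round(beta_std, 4))
```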

Interpretation of β

• If x increases by one unit, y increases by β units, holding the other X values constant

Model Comparisons

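• Complete model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_k x_k + \varepsilon$

• Reduced model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_g x_g + \varepsilon$

• Test F = MSdrop / MSE
  – MS = mean square
  – MSE = mean square error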
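A sketch of the nested-model F test, under the usual reading that MSdrop is the increase in error sum of squares when predictors are dropped, divided by the number dropped (k − g), and that MSE comes from the complete model with n − k − 1 degrees of freedom. The function `nested_f` and every number below are hypothetical.

```python
def nested_f(sse_reduced, sse_complete, n, k, g):
    """Return (F, df1, df2) comparing a reduced model (g predictors)
    with a complete model (k predictors) fitted to the same n cases."""
    ms_drop = (sse_reduced - sse_complete) / (k - g)
    mse = sse_complete / (n - k - 1)
    return ms_drop / mse, k - g, n - k - 1


# Hypothetical error sums of squares from two fitted models
f_drop, df_num, df_den = nested_f(
    sse_reduced=120.0, sse_complete=80.0, n=30, k=5, g=3
)
print(f_drop, df_num, df_den)
```

A large F suggests the dropped predictors carried explanatory power, so the complete model would be retained.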

Variable Selection

• Select significant predictors from a pool of predictors
• Stepwise is undesirable; see http://en.wikipedia.org/wiki/Stepwise_regression
• Forward
• Backward (preferable)

Dummy-coding of Nominal

• R = Race (1 = White, 2 = Black, 3 = Hispanic, 4 = Others)

R   d1  d2  d3
1   1   0   0
1   1   0   0
2   0   1   0
2   0   1   0
3   0   0   1
3   0   0   1
4   0   0   0
4   0   0   0

• Include all dummy variables in the model, even if not every one is significant.
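The coding scheme in the table can be sketched as follows, with category 4 as the reference group that receives zeros on every dummy. The function name `dummy_code` is hypothetical; in SPSS the same variables would typically be created with recode or compute statements.

```python
def dummy_code(values, reference):
    """Return one row of 0/1 dummies per case, omitting the reference category."""
    levels = sorted(set(values) - {reference})
    return [[1 if v == level else 0 for level in levels] for v in values]


# Race codes for eight cases: 1 = White, 2 = Black, 3 = Hispanic, 4 = Others
race = [1, 1, 2, 2, 3, 3, 4, 4]
dummies = dummy_code(race, reference=4)

for r, row in zip(race, dummies):
    print(r, row)
```

A nominal variable with four categories needs only three dummies; the fourth would be redundant with the intercept.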

Interaction

• Create a product term X2X3

• Include X2 and X3 even if their effects are not significant

• Interpret interaction effect: X2 effect depends on the level of X3.

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_2 x_3 + \dots + \beta_k x_k + \varepsilon$$

Plotting Interaction

• Write out the model with main and interaction effects
• Use standardized coefficients
• Plug in some plausible values of the interacting variables and calculate y
• Use one X for the X dimension and the Y value for the Y dimension
• See examples: http://frank.itlab.us/datamodel/node104.html
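The plotting steps can be sketched by computing predicted y at a few values of X2 for a low and a high level of X3. The coefficients below are hypothetical standardized values chosen for illustration, and other predictors are assumed to sit at their means (zero in standardized units).

```python
def predicted_y(x2, x3, b2, b3, b4, b0=0.0):
    """Predicted y from the main effects of X2 and X3 plus their product term."""
    return b0 + b2 * x2 + b3 * x3 + b4 * x2 * x3


# Hypothetical standardized coefficients: X2, X3, and the X2*X3 interaction
B2, B3, B4 = 0.30, 0.20, 0.25

x2_values = [-1.0, 0.0, 1.0]   # X dimension: one SD below the mean, mean, one SD above
x3_levels = (-1.0, 1.0)        # one line per level of X3: low and high

lines = {
    x3: [predicted_y(x2, x3, B2, B3, B4) for x2 in x2_values]
    for x3 in x3_levels
}

for x3, ys in lines.items():
    print("X3 =", x3, [round(v, 2) for v in ys])
```

Plotting each list against `x2_values` gives one line per level of X3; lines that are not parallel indicate that the X2 effect depends on X3.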



Diagnostic

• Linear relation of predicted and observed (plotting)
• Collinearity
• Outliers
• Normality of residuals (save residuals as a new variable)

Repeated Measures (MANOVA, GLM)

• Measure(s) repeated over time
• Change in individual cases (within)?
• Group differences (between, categorical x)?
• Covariate effects (continuous x)?
• Interaction between within and between variables?

Assumptions

• Normality
• Sphericity: variances of the differences between all pairs of repeated measures are equal, so that
• Total sum of squares can be partitioned more precisely into
  – Within subjects
  – Between subjects
  – Error

Model

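• $\mu$ = grand mean
• $\pi_i$ = constant of individual i
• $\tau_j$ = constant of jth treatment
• $\varepsilon_{ij}$ = error of i under treatment j
• $(\pi\tau)_{ij}$ = interaction

$$y_{ij} = \mu + \pi_i + \tau_j + \varepsilon_{ij} + (\pi\tau)_{ij}$$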

F Test of Effects

• F = MSbetween / MSwithin (simple repeated)

• F = MStreatment / MSerror (with treatment)

• F = MSwithin / MSinteraction (with interaction)

Four Types of Sums of Squares

• Type I: balanced design
• Type II: adjusting for other effects
• Type III: no empty cells, unbalanced design
• Type IV: empty cells

Exercise

• http://www.ats.ucla.edu/stat/spss/seminars/Repeated_Measures/default.htm

• Copy the data to an SPSS syntax window, select, and run

• Run Repeated measures GLM
