7/23/2019 Ch 8 Residual Analysistopost
http://slidepdf.com/reader/full/ch-8-residual-analysistopost 1/20
Residual Analysis
Chapter 8
Model Assumptions
  Independence (the response variables yi are independent); this is a design issue
  Normality (the response variables are normally distributed)
  Homoscedasticity (the response variables have the same variance)
Best way to check assumptions: check the assumptions on the random errors
  They are independent
  They are normally distributed
  They have a constant variance σ² for all settings of the independent variables (homoscedasticity)
  They have a zero mean.
If these assumptions are satisfied, we may use the normal density as the working approximation for the random component. So, the residuals are distributed as:
  εi ~ N(0, σ²)
Plotting Residuals
To check for homoscedasticity (constant variance):
  Produce a scatterplot of the standardized residuals against the fitted values.
  Produce a scatterplot of the standardized residuals against each of the independent variables.
If assumptions are satisfied, residuals should vary randomly around zero and the spread of the residuals should be about the same throughout the plot (no systematic patterns).
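As a minimal sketch of the quantities that go into such a plot, the following computes fitted values and standardized residuals for a one-predictor model. The data are made up for illustration, the helper names are hypothetical, and "standardized" is assumed here to mean the residual divided by s (root MSE); some packages also divide by sqrt(1 - leverage).

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def standardized_residuals(x, y):
    """Return (fitted values, residuals divided by s = sqrt(MSE))."""
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    # Two estimated parameters (intercept and slope), so divide SSE by n - 2.
    s = math.sqrt(sum(e * e for e in resid) / (len(x) - 2))
    return fitted, [e / s for e in resid]

# Made-up illustrative data (not from the slides).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

fitted, z = standardized_residuals(x, y)
for f, zi in zip(fitted, z):
    print(f"fitted = {f:6.2f}   standardized residual = {zi:6.2f}")
```

The (fitted, standardized residual) pairs are what would be passed to any scatterplot routine; the same residuals can be paired with each independent variable instead of the fitted values.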
Homoscedasticity is probably violated if…
  The residuals seem to increase or decrease in average magnitude with the fitted values; this is an indication that the variance of the residuals is not constant.
  The points in the plot lie on a curve around zero, rather than fluctuating randomly.
  A few points in the plot lie a long way from the rest of the points.
Heteroscedasticity
  Not fatal to an analysis; the analysis is weakened, not invalidated.
  Detected with scatterplots and rectified through transformation.
  http://www.pfc.cfs.nrcan.gc.ca/profiles/wulder/mvstats/transform_e.html
  http://www.ruf.rice.edu/lane/stat_sim/transformations/index.html
Cautions with transformations
  Difficulty of interpretation of transformed variables.
  The scale of the data influences the utility of transformations.
    If a scale is arbitrary, a transformation can be more effective
    If a scale is meaningful, the difficulty of interpretation increases
Normality
  The random errors can be regarded as a random sample from a N(0, σ²) distribution, so we can check this assumption by checking whether the residuals might have come from a normal distribution.
  We should look at the standardized residuals
  Options for looking at distribution:
    Histogram, stem and leaf plot, normal plot of residuals
  http://statmaster.sdu.dk/courses/st111/module04/index.html
Histogram or Stem and Leaf Plot
  Does distribution of residuals approximate a normal distribution?
  Regression is robust with respect to nonnormal errors (inferences typically still valid unless the errors come from a highly skewed distribution)
Normal Plot of Residuals
  A normal probability plot is found by plotting the ordered residuals of the observed sample against the corresponding quantiles of a standard normal distribution N(0, 1)
  If the plot shows a straight line, it is reasonable to assume that the observed sample comes from a normal distribution.
  If the points deviate a lot from a straight line, there is evidence against the assumption that the random errors are an independent sample from a normal distribution.
  http://www.skymark.com/resources/tools/normal_test_plot.asp
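The construction of a normal probability plot can be sketched as follows. This is an illustration under stated assumptions: the residuals are made up, the plotting positions (i - 0.5)/n are one common convention among several, and the correlation between the two columns is used only as a rough numerical stand-in for "how straight is the line".

```python
from statistics import NormalDist

def normal_scores(n):
    """Standard normal quantiles at the plotting positions (i - 0.5) / n."""
    nd = NormalDist()  # standard normal: mean 0, standard deviation 1
    return [nd.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5

# Made-up standardized residuals (not from the slides).
residuals = [-1.2, 0.3, 0.8, -0.4, 1.5, -0.9, 0.1, 0.6, -0.2, -0.6]

ordered = sorted(residuals)           # vertical axis of the plot
scores = normal_scores(len(ordered))  # horizontal axis of the plot
for q, e in zip(scores, ordered):
    print(f"normal quantile = {q:6.3f}   ordered residual = {e:5.2f}")

r = pearson_r(scores, ordered)
print(f"correlation of the plotted points: {r:.3f}")
```

A correlation near 1 corresponds to points lying close to a straight line; the plot itself remains the primary diagnostic.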
σ²: Variance of the random error ε
  If this value equals zero
    all random errors equal 0
    Prediction equation (ŷ) will be equal to mean value E(y)
  If this value is large
    Large (absolute values) of ε
    Larger deviations between ŷ and the mean value E(y).
  The larger the value of σ², the greater the error in estimating the model parameters and the error in predicting a value of y for specific values of x.
Variance is estimated with s² (MSE)
  The units of the estimated variance are squared units of the dependent variable y.
  For a more meaningful measure of variability, we use s or Root MSE.
  The interval ±2s provides a rough estimate of the accuracy with which the model will predict future values of y for given values of x.
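A small worked sketch of these quantities, assuming a one-predictor model (so p = 2 estimated parameters) and made-up data: s² is SSE divided by n - p, s is its square root, and ±2s is the rough prediction margin.

```python
import math

# Made-up illustrative data (not from the slides).
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]

n, p = len(x), 2  # p counts the intercept and the slope
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = ybar - b1 * xbar

# Sum of squared residuals, then the variance estimate s^2 = MSE.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - p)
s = math.sqrt(mse)  # Root MSE, in the same units as y

print(f"MSE (s^2) = {mse:.3f}")
print(f"Root MSE (s) = {s:.3f}")
print(f"rough prediction margin: +/- {2 * s:.3f}")
```

Because s is in the units of y, the ±2s margin can be read directly against the scale of the response.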
Why do residual analysis?
  Following any modeling procedure, it is a good idea to assess the validity of your model.
  Residuals and diagnostic statistics allow you to identify patterns that are either poorly fit by the model, have a strong influence upon the estimated parameters, or which have a high leverage. It is helpful to interpret these diagnostics jointly to understand any potential problems with the model.
Outliers
  What:
    An observation with a residual that is larger than 3s or a standardized residual larger than 3 (absolute value)
  Why:
    A data entry or recording error, skewness of the distribution, chance, unassignable causes
  Then what?
    Eliminate? Correct? Analyze them? How much influence do they have?
  How do I know?
    Minitab Storage: Diagnostics (Leverages, Cook's Distance, DFFITS)
Leverages, p. 40
  Identifies observations with unusual or outlying x-values.
  Leverages fall between 0 and 1. A value greater than 2(p/n) is large enough to suggest you should examine the corresponding observation.
    p: number of predictors (including constant)
    n: number of observations
  Minitab identifies observations with leverage over 3(p/n) with an X in the table of unusual observations
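For a model with a single predictor, leverage has a simple closed form, h_i = 1/n + (x_i - x̄)² / Σ(x_j - x̄)², which makes the 2(p/n) rule easy to try out. The sketch below assumes that one-predictor case (p = 2, counting the constant) and uses made-up x-values; with several predictors the leverages are the diagonal of the hat matrix instead.

```python
def leverages(x):
    """Leverage of each observation in a one-predictor regression."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1 / n + (xi - xbar) ** 2 / sxx for xi in x]

# Made-up x-values; the last one sits far from the others.
x = [1, 2, 3, 4, 5, 6, 7, 20]
n, p = len(x), 2  # p counts the predictor plus the constant

h = leverages(x)
cutoff = 2 * p / n
flagged = [i for i, hi in enumerate(h) if hi > cutoff]

for i, hi in enumerate(h):
    mark = "  <-- examine" if hi > cutoff else ""
    print(f"obs {i}: x = {x[i]:2d}  leverage = {hi:.3f}{mark}")
print(f"cutoff 2(p/n) = {cutoff:.3f}, flagged observations: {flagged}")
```

Note that leverage depends only on the x-values, so an observation can be flagged here even if its y-value fits the model well.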
Cook's Distance (D), p. 405
  An overall measure of the combined impact of each observation on the fitted values.
  Calculated using leverage values and standardized residuals.
  Considers whether an observation is unusual with respect to both x- and y-values.
  Observations with large D values may be outliers.
  Compare D to F-distribution with (p, n-p) degrees of freedom. Determine corresponding percentile.
    Less than 20%: little influence
    Greater than 50%: major influence
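For reference, the usual way Cook's distance is written in terms of leverage and standardized residuals is shown below. The notation is an assumption of this note rather than something taken from the slides: e_i is the ordinary residual, h_i the leverage, s² the MSE, and r_i the standardized residual that already accounts for leverage.

```latex
r_i = \frac{e_i}{s\sqrt{1 - h_i}},
\qquad
D_i = \frac{r_i^{2}}{p}\cdot\frac{h_i}{1 - h_i}
    = \frac{\sum_{j=1}^{n}\bigl(\hat{y}_j - \hat{y}_{j(i)}\bigr)^{2}}{p\,s^{2}}
```

Here ŷ_{j(i)} denotes the fitted value for case j when case i is left out, so the second form shows why D measures the combined change in all fitted values.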
DFFITS, p. 40
  The difference between the predicted value when all observations are included and when the ith observation is deleted.
  Combines leverage and Studentized residual (deleted t residuals) into one overall measure of how unusual an observation is
  Represent roughly the number of standard deviations that the fitted value changes when each case is removed from the data set.
  Observations with DFFITS values greater than 2 times the square root of (p/n) are considered large and should be examined.
    p is number of predictors (including the constant)
    n is the number of observations
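The two descriptions above (a change in the fitted value after deletion, and a combination of leverage with the deleted t residual) can be checked against each other numerically. The sketch below assumes a one-predictor model and made-up data; the function names are hypothetical, and the cutoff is applied to the absolute value of DFFITS.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def leverages(x):
    """Leverage of each observation in a one-predictor regression."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1 / n + (xi - xbar) ** 2 / sxx for xi in x]

def dffits(x, y):
    """DFFITS_i = t_i * sqrt(h_i / (1 - h_i)), t_i the deleted t residual."""
    n, p = len(x), 2
    b0, b1 = fit_line(x, y)
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    h = leverages(x)
    sse = sum(ei * ei for ei in e)
    out = []
    for ei, hi in zip(e, h):
        # Variance estimate with case i removed, without refitting.
        s2_del = (sse - ei * ei / (1 - hi)) / (n - p - 1)
        t = ei / math.sqrt(s2_del * (1 - hi))
        out.append(t * math.sqrt(hi / (1 - hi)))
    return out

def dffits_by_deletion(x, y):
    """Same quantity from the definition: refit without case i each time."""
    b0, b1 = fit_line(x, y)
    h = leverages(x)
    out = []
    for i in range(len(x)):
        xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        c0, c1 = fit_line(xs, ys)
        sse_del = sum((yj - (c0 + c1 * xj)) ** 2 for xj, yj in zip(xs, ys))
        s_del = math.sqrt(sse_del / (len(xs) - 2))
        change = (b0 + b1 * x[i]) - (c0 + c1 * x[i])
        out.append(change / (s_del * math.sqrt(h[i])))
    return out

# Made-up data: the last case has an outlying x and pulls the line toward it.
x = [1, 2, 3, 4, 5, 6, 7, 20]
y = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 7.2, 12.0]

d = dffits(x, y)
d_check = dffits_by_deletion(x, y)
cutoff = 2 * math.sqrt(2 / len(x))
flagged = [i for i, di in enumerate(d) if abs(di) > cutoff]
print("DFFITS:", [round(di, 2) for di in d])
print(f"cutoff = {cutoff:.2f}, flagged observations: {flagged}")
```

The shortcut formula and the refit-based version agree up to rounding, which is why software can report DFFITS without actually refitting the model n times.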
Summary of what to look for
  Start with the plot and brush outliers
  Look for values that stand out in diagnostic measures
  Rules of thumb
    Leverages (HI): values greater than 2(p/n)
    Cook's Distance: values greater than 50% of comparable F(p, n-p) distribution
    DFFITS: values greater than 2√(p/n)
Let's do an example
  Returning to the Naval Base data set
Finally, are the residuals correlated?
  Durbin-Watson d statistic (p. 415)
  Range: 0 ≤ d ≤ 4
  Uncorrelated: d is close to 2
  Positively correlated: d is closer to zero
  Negatively correlated: d is closer to 4.
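The d statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. A minimal sketch, using short made-up residual sequences (in time order) chosen only to show which direction d moves:

```python
def durbin_watson(residuals):
    """d = sum of (e_t - e_{t-1})^2 for t >= 2, divided by sum of e_t^2."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

# Made-up residual sequences, listed in time order.
alternating = [1, -1, 1, -1]   # neighbours have opposite signs
clustered = [1, 1, -1, -1]     # neighbours tend to share a sign

print(durbin_watson(alternating))  # 3.0, toward 4: negative correlation
print(durbin_watson(clustered))    # 1.0, toward 0: positive correlation
```

The statistic only makes sense when the residuals have a natural ordering, such as time; shuffling the rows changes d.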