Comparison of Reliability Measures under Factor Analysis and Item
Response Theory
Ying Cheng, Ke-Hai Yuan, and Cheng Liu
Presented by Zhu Jinxin
Outline of the Presentation
• Introduction of four reliability coefficients: α, ω, π, and ρ
• The relationship among them
• Conclusion and discussion
Cronbach’s alpha
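• One of the definitions is
$$\alpha = \frac{K}{K-1}\left(1 - \frac{\sum_{i=1}^{K}\sigma_{Y_i}^2}{\sigma_X^2}\right)$$
• $K$ is the number of components (items or testlets),
• $\sigma_X^2$ is the variance of the observed total test scores,
• $\sigma_{Y_i}^2$ is the variance of component $i$ for the current sample of persons.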
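As a minimal sketch of the definition above, the following computes the sample coefficient from a persons-by-components score table. The function name `cronbach_alpha` and the small data set are hypothetical and only serve as an illustration; sample variances (divisor n − 1) are assumed for both the components and the total.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Sample Cronbach's alpha.

    scores: list of rows, one per person; each row holds K component scores.
    """
    k = len(scores[0])
    # Variance of each component (column) across persons.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Variance of the observed total (raw sum) scores.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 persons, 3 items.
data = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [3, 3, 4],
    [5, 4, 5],
]
alpha = cronbach_alpha(data)
print(round(alpha, 4))
```

Only raw sum scores enter the calculation, which is why the coefficient cannot distinguish response patterns that produce the same total.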
Features of Cronbach’s alpha
• It is the most widely used coefficient.
• The raw sum score is used.
• α may underestimate reliability at the population level when the assumption of essential tau-equivalence is violated.
About tau-equivalence
In this case, the reliability is underestimated by α, which is only a lower-bound estimate of the true reliability of the scale when the measures are congeneric.
ω & ρ for congeneric measures in the single-factor model
Variance of true score
Variance of unweighted composite score
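The two variances named above can be made concrete with a small sketch. It assumes the usual single-factor model x_j = λ_j ξ + e_j with Var(ξ) = 1, unique variances ψ_j, and uncorrelated errors; the symbols λ and ψ, the function names, and the loadings are illustrative assumptions, not taken from the slides. Under these assumptions the variance of the true score of the unweighted sum is (Σλ_j)², and the variance of the unweighted composite is (Σλ_j)² + Σψ_j. The sketch further assumes that ρ denotes the reliability of the optimally weighted composite (often called maximal reliability).

```python
def omega(loadings, uniq):
    """Reliability of the unweighted sum score under a single-factor model."""
    true_var = sum(loadings) ** 2            # variance of the true score
    composite_var = true_var + sum(uniq)     # variance of the unweighted composite
    return true_var / composite_var


def maximal_rho(loadings, uniq):
    """Reliability of the optimally weighted composite (assumed meaning of rho)."""
    s = sum(l * l / p for l, p in zip(loadings, uniq))
    return s / (1 + s)


def population_alpha(loadings, uniq):
    """Cronbach's alpha computed from the model-implied variances."""
    k = len(loadings)
    item_vars = sum(l * l + p for l, p in zip(loadings, uniq))
    total_var = sum(loadings) ** 2 + sum(uniq)
    return k / (k - 1) * (1 - item_vars / total_var)


# Hypothetical congeneric items with unequal loadings and unit item variance.
lam = [0.9, 0.7, 0.5, 0.3]
psi = [1 - l * l for l in lam]

a = population_alpha(lam, psi)
w = omega(lam, psi)
r = maximal_rho(lam, psi)
print(a, w, r)
```

With unequal loadings the three values differ; when all loadings and unique variances are equal, the three functions return the same value.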
Features of ω
1. It neglects that people with the same sum score can have completely different response patterns.
2. ω ≥ α, when
Reliability in IRT
• The variance of the MLE is (approximately) given by the inverse of the information.
• The variance of θ is 1 in MLE, in which
• The study uses information in a broader sense, equating it with the inverse of a variance even when the parameter estimate is not an MLE,
• so
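One way to read the steps above, offered as a hedged sketch rather than the authors' exact derivation: write the estimate as the true value plus error, take Var(θ) = 1 and the error variance as 1/I, and treat I as a single summary value of the information (an assumption; information generally varies with θ). Reliability is then the ratio of true variance to total variance:

```latex
\hat{\theta} = \theta + e, \qquad
\operatorname{Var}(\theta) = 1, \qquad
\operatorname{Var}(e) \approx \frac{1}{I}

\text{reliability}
  = \frac{\operatorname{Var}(\theta)}{\operatorname{Var}(\theta) + \operatorname{Var}(e)}
  = \frac{1}{1 + 1/I}
  = \frac{I}{I + 1}
```

Under this reading, larger information always gives higher reliability, with the value approaching 1 as I grows.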
Reliability in IRT
• With a single parameter θ, the information I is defined as the negative expected value of the second derivative of the log-likelihood function.
• The IRT models directly relate the discrete responses to an underlying latent factor.
• When θ is normally distributed, the normal ogive IRT models are equivalent to the item factor analysis model.
Reliability in IRT
• For binary responses, the information is defined as the negative expected value of the second derivative of the log-likelihood function:
For each item
For the test
Reliability in IRT
• For binary responses, the reliability is
and (the derivation is given in the appendix)
Reliability in IRT
• For responses in ordered categories, suppose the continuous response to item j is discretized by g thresholds.
• The information of the jth item is given by
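As a hedged sketch of the ordered-category case, the code below uses the general form of item information for a categorical response, the sum over categories of [P_k′(θ)]² / P_k(θ), with category probabilities obtained as differences of normal ogive cumulative curves Φ(a(θ − b_k)). The parameterization, function name, and threshold values are assumptions for illustration and may differ from the notation on the slide.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def graded_item_information(theta, a, thresholds):
    """Item information for one item with g ordered thresholds (g + 1 categories)."""
    # Cumulative curves P(X >= k), padded with 1 and 0 at the two ends.
    cum = [1.0] + [Phi(a * (theta - b)) for b in thresholds] + [0.0]
    # Their derivatives with respect to theta (zero at the padded ends).
    dcum = [0.0] + [a * phi(a * (theta - b)) for b in thresholds] + [0.0]
    info = 0.0
    for k in range(len(cum) - 1):
        p = cum[k] - cum[k + 1]       # probability of category k
        dp = dcum[k] - dcum[k + 1]    # its derivative
        if p > 0:
            info += dp * dp / p
    return info


# Hypothetical item: one threshold (binary) versus three thresholds (4 categories).
info_binary = graded_item_information(0.0, 1.0, [0.0])
info_four = graded_item_information(0.0, 1.0, [-1.0, 0.0, 1.0])
print(info_binary, info_four)
```

With a single threshold the function reproduces the binary case, and splitting the same item into more categories at the hypothetical thresholds used here raises the information at θ = 0.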
The relationship
• ρ ≥ ω ≥ α
• It is expected that
• There is no dominant relationship between π(2)
• Simulation demonstrated that, as the number of response categories increases, π can exceed ω in practice.
Conclusion
• Keep as many response categories as possible and use ML factor scores.
• However, after having a certain number of response options, it may not be worth adding more.
Discussion
• Only graded response (ordered-category) models are studied; other types of polytomous IRT models are not considered.
• Only unidimensional models are studied.