Unraveling the Mysteries of Setting Standards and Scaled Scores
Julie Miles, PhD, 10.27.2011
Presentation Title runs here l 00/00/003
Overview of the Session
1. What is Standard Setting?
– Basic Vocabulary
– Definition
– Performance Level Descriptions
– Threshold Descriptions
– When Does It Occur?
– Methods Used in Virginia
2. The Connection to Scaled Scores
– Converting Raw Scores to Scaled Scores
– Example Conversion
3. From Scaled Scores to Equated Forms
– How Are Scaled Scores Connected to Equating?
– The Basics of Equating
– Recap of How It All Comes Together
1. What Is Standard Setting?
What is Standard Setting? Basic Vocabulary
Content Standards: The content and skills that students are expected to know and be able to do.
Performance Levels (Achievement Levels, Performance Categories): Labels for levels of student achievement (e.g., below basic, basic, proficient, and advanced).
Performance Level Descriptors (PLDs): Descriptions of the competencies associated with each level of achievement.
Cut Scores (Performance Standards): Scores on an assessment that separate one level of achievement from another.
What is Standard Setting? Definition
A judgmental process which has a variety of steps and includes relevant stakeholders throughout. Steps in this process typically include:
1. Identifying the relevant knowledge and skills to be taught and assessed at each grade/content area to support the goals of the state
2. Defining the expectations associated with each Performance Level
3. Convening a committee of educators to provide content-based recommendations for cut scores at each grade or subject area
4. Review of cut score recommendations and adoption by the State Board of Education
What is Standard Setting? Performance Level Descriptors (PLDs)
Define the knowledge, skills, and abilities (KSAs) that are expected of the students to gain entry into specific performance levels (e.g., Proficient or Advanced)
• The main goal of standard setting is to quantify or operationalize the Performance Level Descriptors.
EXAMPLE Proficient PLD: Explain the role of geography in the political, cultural, and economic development of Virginia and the United States
What is Standard Setting? Threshold Descriptions (TDs)
Define what students who are “just over the threshold” of a performance level (e.g., a student scoring 400 or 401, or 500 or 501) should be able to demonstrate in terms of KSAs.
• These are the borderline or minimally qualified students in terms of performance
EXAMPLE Proficient PLD: Explain the role of geography in the political, cultural, and economic development of Virginia and the United States
EXAMPLE “Just-Barely” Proficient TD: Identify and explain major geographic features on maps. Interpret charts based on background geographic information.
What is Standard Setting? When Does It Occur?
[Figure: Design and Implementation of Revised SOL Tests, a timeline spanning Year One, Year Two, and Year Three]
Steps shown in the timeline:
• Revision of Content Standards
• Revise Curriculum Frameworks
• Develop Item and Test Specifications
• New Item Development
• New Item Content Review
• Spring 2010 SOL Test Administration (aligned to old curriculum)
• Embedded Field-Testing of New SOL Items
• Field-test Item Analysis and Review
• SOL Test Form Development
• Spring 2011 SOL Administration (first operational assessments aligned to new curriculum)
• Score Operational/Field Test Items
• Standard Setting Meeting
• Report Assessment Results
What is Standard Setting? Methods Used in Virginia
Virginia predominantly uses “Modified Angoff” (SOL and VMAST), “Body of Work” (VAAP), and “Reasoned Judgment” (VGLA) methods. All methods typically have similar components:
1. Overview of standard setting
2. Review of test blueprint and performance level descriptions
3. Creation of the threshold descriptions
4. Overview of actual test administered to students
5. Three rounds of judgments by committee:
• MC Tests: should a ‘just-barely’ student get the item correct 2 out of 3 times?
• VGLA: how many points should a ‘just-barely’ student earn on this SOL?
• VAAP: which performance level does a COE represent?
6. Final round results in cut score recommendations that are provided to the SBOE.
• The number of correct answers needed to gain entry into each performance level.
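As a rough illustration of how the multiple-choice judgments in step 5 could turn into the raw cut score of step 6, the sketch below totals each committee member's yes/no ratings and averages the totals. The ratings are made up, and averaging across judges is an assumption for illustration (a committee could use a median or another rule); it is not presented as Virginia's actual procedure.

```python
from statistics import mean

# Hypothetical final-round judgments: one row per committee member, one entry
# per test item. 1 = a 'just-barely' proficient student should answer the item
# correctly 2 out of 3 times; 0 = should not. All values are invented.
judgments = [
    [1, 1, 0, 1, 0, 1, 1, 0, 1, 1],  # judge A
    [1, 0, 0, 1, 1, 1, 1, 0, 1, 0],  # judge B
    [1, 1, 1, 1, 0, 1, 0, 0, 1, 1],  # judge C
]

def per_judge_totals(judgments):
    """Number of items each judge expects a 'just-barely' student to get right."""
    return [sum(row) for row in judgments]

def recommended_raw_cut(judgments):
    """Average of the per-judge totals (assumed aggregation rule)."""
    return mean(per_judge_totals(judgments))

totals = per_judge_totals(judgments)
cut = recommended_raw_cut(judgments)
print(f"Per-judge totals: {totals}")
print(f"Recommended raw cut: {cut:.2f} of {len(judgments[0])} items")
```

The result is a number of correct answers, which is why the recommendation that goes to the SBOE is expressed in the raw score metric.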
2. The Connection to Scaled Scores
The Connection to Scaled Scores: Converting Raw Scores to Scaled Scores
The recommendations for a cut score from standard setting are in a raw score metric. But raw scores are not comparable from year to year.
• Student ability is different from student to student
• Test forms change from year to year (and within year)
– A raw score of 36 on a slightly easier test does not indicate the same level of achievement as a raw score of 36 on a slightly more difficult test.
Need a metric that is stable from year to year!
• This is where I earn my keep
• The metric is based on item response theory (IRT) and it is called “theta.” This theta value (associated with raw score) is converted to a scaled score that remains stable from year to year so that 400 is comparable to 400 regardless of the student, year, or form.
The Connection to Scaled Scores: Example Conversion to Scaled Scores
Algebra II

$$500 = a\,\theta_a + b$$
$$400 = a\,\theta_p + b$$

where θa is the value of theta (2.6157) corresponding to the raw score (45) at the pass/advanced level and θp is the value of theta (.6416) corresponding to the raw score (30) at the pass/proficient level.

Solving for a yields:

$$a = \frac{100}{\theta_a - \theta_p}$$

And substituting the values of theta corresponding to the raw score cuts gives:

$$a = \frac{100}{2.6157 - .6416} = 50.659$$

Solving for b yields:

$$b = 400 - a\,\theta_p$$

And substituting the values of θp and a gives:

$$b = 400 - 50.659(.6416) = 367.497$$

$$\text{Scaled Score} = 50.659(.6416) + 367.497 = 399.999$$
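The same conversion can be sketched in a few lines of code. This is a minimal illustration, assuming the linear form Scaled Score = a·θ + b anchored at 400 and 500, with the Algebra II theta values (0.6416 and 2.6157) as inputs; the function names are hypothetical. Because the printed theta values are rounded, the slope computed here can differ from the slide's figure in the last decimal places.

```python
def scaling_constants(theta_proficient, theta_advanced,
                      scaled_proficient=400.0, scaled_advanced=500.0):
    """Solve the two anchor equations for the slope a and intercept b."""
    a = (scaled_advanced - scaled_proficient) / (theta_advanced - theta_proficient)
    b = scaled_proficient - a * theta_proficient
    return a, b

def to_scaled(theta, a, b):
    """Linear transformation from theta to the reporting scale."""
    return a * theta + b

# Theta values at the two raw score cuts in the Algebra II example.
theta_p, theta_a = 0.6416, 2.6157
a, b = scaling_constants(theta_p, theta_a)

print(f"a = {a:.3f}, b = {b:.3f}")
print(f"pass/proficient cut -> {to_scaled(theta_p, a, b):.1f}")
print(f"pass/advanced cut   -> {to_scaled(theta_a, a, b):.1f}")
```

By construction, the theta at the pass/proficient cut maps to 400 and the theta at the pass/advanced cut maps to 500; every other theta falls on the same straight line.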
3. From Scaled Scores to Equated Forms
From Scaled Scores to Equated Forms: How Are Scaled Scores Connected to Equating?
• When a test is built, the item difficulties (in the Rasch metric) are known from the field test statistical analyses.
• The tests are built to Rasch difficulty targets for the overall test and all reporting categories based on the standard setting form.
• Even though an attempt is made to construct test forms of equal Rasch-based difficulty from form to form and year to year, there will be small variations in difficulty.
• When building tests, the IRT model makes it possible to estimate the raw score that corresponds to a scaled score of 400.
• Each core form of a test is equated to the established scale so that the scores indicate the same level of achievement regardless of the core form taken.
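One way to picture the estimate described in the bullets above: under the Rasch model, the expected raw score at a given theta is the sum of the item success probabilities, so the theta behind a scaled score of 400 can be turned into an expected raw score for any form whose item difficulties are known. The sketch below assumes the linear scaling Scaled Score = a·θ + b, borrows a and b from the Algebra II example purely for illustration, and uses invented item difficulties; operational conversion tables involve further steps that are not shown.

```python
import math

def rasch_probability(theta, difficulty):
    """Rasch model: probability of a correct response to one item."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def expected_raw_score(theta, difficulties):
    """Test characteristic curve: sum of item probabilities at theta."""
    return sum(rasch_probability(theta, d) for d in difficulties)

def theta_for_scaled(scaled, a, b):
    """Invert the assumed linear scaling: scaled = a * theta + b."""
    return (scaled - b) / a

# Scaling constants borrowed from the Algebra II example (illustrative only).
a, b = 50.659, 367.497

# Hypothetical Rasch difficulties for two short forms; the second form is
# uniformly 0.3 logits harder than the first.
form_easier = [-1.3, -0.9, -0.4, 0.0, 0.2, 0.5, 0.9, 1.3]
form_harder = [d + 0.3 for d in form_easier]

theta_cut = theta_for_scaled(400, a, b)
raw_easier = expected_raw_score(theta_cut, form_easier)
raw_harder = expected_raw_score(theta_cut, form_harder)

print(f"theta at scaled 400: {theta_cut:.4f}")
print(f"expected raw score, easier form: {raw_easier:.2f}")
print(f"expected raw score, harder form: {raw_harder:.2f}")
```

The harder form needs fewer correct answers to reach the same theta, which is the sense in which 400 means the same level of achievement on either form.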
From Scaled Scores to Equated Forms: The Basics of Equating
Common-Item Nonequivalent Groups Design
The common-item set is constructed as a “mini version” of the total test.

                Year 1      Year 2
                Test X      Test Y
Common Items    Item C1     Item C1
                Item …      Item …
                Item C10    Item C10
                Item X1     Item Y1
                Item X2     Item Y2
                Item …      Item …
                Item X50    Item Y50
From Scaled Scores to Equated Forms: The Basics of Equating

                Year 1 (more difficult)    Year 2 (less difficult)
                b       Test X             Test Y      b
Common Items    -1.0    Item C1            Item C1     -1.3
                …       Item …             Item …      …
                0.8     Item C10           Item C10    0.5
                …       Item X1            Item Y1     -1.5
                …       Item X2            Item Y2     -0.6
                …       Item …             Item …      …
                …       Item X50           Item Y50    1.3

Common items, Year 1: Mean b = 0.5
Common items, Year 2: Mean b = 0.2
Difference = 0.5 - 0.2 = 0.3
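The difference between the common-item means can be used as a linking constant. The sketch below assumes a simple mean-shift approach, in which that constant is added to every Year 2 difficulty to express it on the Year 1 scale; the slide itself reports only the difference, so how the constant is applied here is an illustrative assumption. Only the b values shown on the slide are used.

```python
# Mean Rasch difficulty of the common items in each year's calibration
# (values from the slide).
mean_b_common_year1 = 0.5
mean_b_common_year2 = 0.2

# Linking constant: how far the Year 2 scale sits from the Year 1 scale.
shift = mean_b_common_year1 - mean_b_common_year2

# Year 2 (Test Y) difficulties that are visible on the slide.
year2_b = {
    "Item C1": -1.3,
    "Item C10": 0.5,
    "Item Y1": -1.5,
    "Item Y2": -0.6,
    "Item Y50": 1.3,
}

def to_year1_scale(difficulties, shift):
    """Add the linking constant to each difficulty (assumed mean-shift linking)."""
    return {item: round(b + shift, 4) for item, b in difficulties.items()}

year2_on_year1_scale = to_year1_scale(year2_b, shift)
for item, b in year2_on_year1_scale.items():
    print(f"{item}: {b:+.1f}")
```

After the shift, the common items land on their Year 1 values (Item C1 at -1.0 and Item C10 at 0.8), which is a quick check that the constant is being applied in the right direction.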
Recap of How It All Comes Together
1. Test Is Developed
2. Cut Scores Are Recommended
3. Cut Scores Are Adopted by SBOE
4. Test Is Scaled
5. Test Is Equated
6. Scores
Questions?