    Created by Educational Testing Service (ETS) to forward a larger social mission, the Center for K-12 Assessment & Performance Management has been given the directive to serve as an independent catalyst and resource for the improvement of measurement and data systems to enhance student achievement.

    Copyright 2010 Wireless Generation, Inc. and Institute for Learning. All rights reserved. No reproduction, use or distribution of any part of this material without the specific authorization of Educational Testing Service.

    An American Examination System

    Lauren B. Resnick and Larry Berger

    National Conference on Next Generation Assessment Systems. Content Provided by the Center for K-12 Assessment & Performance Management

    The Problem

    Over the past two decades, our country has been trying to build a standards-based accountability system as a foundation for a more equitable and higher-achieving education system. In practice, however, we have created a test-based accountability system that does not reflect the standards we aimed for at the beginning of the 1990s, much less today's fewer, clearer, higher Common Core Standards.

    Several studies, using several different methodologies, have shown that the state tests do not measure the higher-order thinking, problem solving, and creativity needed for students to succeed in the 21st century. These tests, with only a few exceptions, systematically overrepresent basic skills and knowledge and omit the complex knowledge and reasoning we are seeking for college and career readiness.[2]

    The misrepresentation of standards by most current accountability tests has had negative effects on teaching and learning, especially for poor and minority students. The tests carry consequences, and many educators serving poor students aim to raise test scores in the most direct (in some cases, the only) way they know: They provide practice on exercises that substantially match the format and content of their state's end-of-year accountability tests. These exercises often depart substantially from best instructional practice. Some studies have documented a systematic decline from fall to spring in the quality of instruction. In reading, for example, the complexity of texts that students engage with is lower in the same classrooms with the same children in March than in October. And there is less discussion of text and word meaning as teachers direct children through workbook exercises that mimic state test items (Anagnostopoulos, 2003; Koretz & Hamilton, 2006; McNeill, 2002). Principals and district administrators encourage this practice. They introduce interim assessments that largely mirror the end-of-year tests rather than model the kinds of performance intended by the standards. They do this because the tests count, and they are afraid that without practice, students will not do well enough to meet adequate yearly progress (AYP) requirements.

    Calls now abound for even more frequent testing and for focusing teachers' attention early and often on which items their students are having difficulty answering on the interim assessments. But unless the process is guided by a fundamental understanding of what kind of teaching helps children acquire robust competence, we should not be surprised when the most frequent response to weak early test scores is to practice the test. Though no one intended to do so, we have created a testing bind that, as it tightens, drives attention away from the intended standards. The effects are greatest in the poorest schools. The nation's current approach to raising achievement and increasing equity in the education system is having an effect opposite from the intended one. It is trapping poor children in a basic-skills teaching program that gives them little chance to acquire the deeper knowledge and abilities we seek for everyone. And it may be lowering the learning opportunities even for many more privileged children as schools turn their energies to the test-based basic-skills program.

    [2] The problem cannot be fixed by changing cut scores so that states no longer deem as being proficient test performances that barely meet NAEP standards for basic levels of achievement. The tests are fundamentally misaligned with 21st-century expectations. For an analysis, see Resnick, Stein, and Coon (2008).

    Many educators, parents, and citizens have responded by clamoring for an end to test-based accountability. Witness the one-sided reaction to a recent editorial in the New York Times written by Susan Engel (2010) calling for less testing and more play (and by implication, less direct instruction) for children. A stream of supportive commentary by readers ensued, but none expressing concern about how to educate poor children, minority children, or English language learners to college-ready levels of achievement. Most of the children of responders to Engel's article would not be harmed (might even benefit) by a weakened accountability system. But the others, the ones no one spoke for in the New York Times exchange, could lose even the slender chances we now offer them.

    A Solution

    Testing and accountability should remain at the heart of national education policy. Equity and national prosperity depend on a system that will stretch educators, the education system, and communities to work toward high achievement and that will enable clear accountability when achievement goals are missed. But there should be new forms of assessment, functioning in new ways within the education system, to meet the needs. As early as 1992, scholars showed how in many countries of the world, tightly linked examination and curriculum systems kept aspirations high, guided teachers in their work, and sometimes created pathways for young people who did not come from privileged families (Resnick & Resnick, 1992). The secret lay in charging teachers to prepare their students for exams and making sure that the exams were worth studying for. For the system to work, teachers and students needed to have a rough idea of the kinds of questions that would be posed on the exams, although not the specific questions that would appear. The systems also required trust that exam grades would be fair; that is, students would likely receive the same grade no matter who scored their written work (written essays predominated over short-answer and multiple-choice items because the countries valued the kinds of thinking that were displayed in such essays). Systems for checking on grade fairness (and allowing challenges in a few cases) varied among the countries studied, but all found ways of maintaining public trust in the system.

    In this paper, we outline an American Examination System, one that reflects key aspects of the substantive, cognitively demanding European systems, while maintaining standards of psychometric rigor necessary to support America's accountability, comparability, and equity agendas.

    The American Examination System we have in mind:

    - models the kinds of instruction that are valued so that preparing students for assessment works for rather than against high-cognitive-demand instruction;
    - situates exams within the stream of ongoing instruction so that assessments support teaching rather than distract from it;
    - ensures content and instructional validity of all assessments so that the alignment problems that have plagued state testing systems can be resolved;
    - provides reliable and valid accountability measures for student, school, and educator performance;
    - includes diagnostic tools for instruction to meet individual student needs;
    - leverages advanced data collection and computational resources to mass-personalize the formative assessments, improving their precision and usefulness.

    The American Examination System we outline would be educative for those who use it. It would not just tell us how well students, teachers, and schools are performing, but also teach teachers how to teach, teach students how to learn, and teach education organizations how to develop teaching expertise. It would meet this educative goal through a system that combines distributed accountability exams linked to specific topics for instruction with diagnostic, formative assessments designed for teacher use during instruction.

    An online platform will make it possible to deploy and manage all of these elements at scale in a cost-effective way while minimizing additional burdens for teachers, students, and administrators. This online platform would be much more than a system for administering, scoring, and reporting on assessments. It can surround the "what" of assessment outcomes with useful representations of "so what?" (professional development) and "now what?" (more targeted instructional resources) so that everyone focuses on the consequential and instructional validity of assessment results and not just the accountability pressure.

    Distributed Accountability Exams (DAEs)

    Accountability data in this system would be derived from exams that are administered at intervals throughout the school year, occurring after students have completed a unit of study on particular content and skills as identified in the Common Core Standards and state standards. Accountability data would be reported on the basis of individual student, subgroup, class, school, and district, as well as across classes, schools, and districts. The types of tasks on the exams would be largely familiar to students, who would have worked on similar tasks in the course of instruction. But neither teachers nor students would know prior to the DAE exactly what questions would appear. Based on what is required from the new Common Core Standards, we expect three to five DAEs per year in mathematics and literacy at each grade, with each exam assessing material covered through 3 to 7 weeks of instruction, but the specifics of number and timing would need to be worked out with states.

    The DAEs would model the kind of high-cognitive-demand performances intended by the Common Core Standards and rigorous state standards, as well as test basic procedural skills. In literacy, they would include extended written work and other open-ended expressions of student reasoning and thinking; in mathematics, they would include drawings, graphs, mathematical expressions, and explanations. They would assess basic knowledge both within these constructed performances and, where appropriate, in clusters of multiple-choice items. In addition to modeling high-cognitive-demand instruction, the DAEs would reflect what should be taught (specific topics determined by state and Common Core Standards).

    The Common Core Standards provide a foundation for a criterion-referenced examination system that is closely tied to instruction yet meets crucial criteria of technical quality of assessment. The core grade-level standards are organized as a set of trajectories or sequences of learning goals.[3] They are specified at a granular size that can be used to organize meaningful units of instruction and correspondingly meaningful assessments.

    Tasks or items for the DAEs would be pre-tested and calibrated using standard classical and multi-dimensional item response theory (IRT) frameworks. In addition, each DAE would undergo a rigorous process of establishing content validity and instructional validity, processes that test theory often calls for but are not part of standard procedure in most instances of education test design. As the project matures, tasks would be collected into item banks for use in future construction of DAEs. Information on student performance data, instructional targets, and the forms of instruction that result reliably in student learning would be shared with stakeholders including parents and students, teachers, schools, testing administrators, and those responsible for preparing and selecting teachers.
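    To make the idea of IRT calibration concrete, the following Python sketch shows the response function of the two-parameter logistic (2PL) model, one common member of the IRT family. The paper does not say which IRT model would be used, and this sketch is unidimensional rather than multi-dimensional; the function name and the parameter values are illustrative assumptions only.

```python
import math


def prob_correct(theta: float, discrimination: float, difficulty: float) -> float:
    """2PL item response function: chance that a student with proficiency
    `theta` answers an item correctly. Calibration estimates the item's
    `discrimination` and `difficulty` from pre-test data (not shown here)."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


# Hypothetical item: moderately discriminating, slightly hard.
for theta in (-1.0, 0.5, 2.0):
    print(theta, round(prob_correct(theta, discrimination=1.2, difficulty=0.5), 2))
```

    A student whose proficiency equals the item's difficulty has an even chance of answering correctly, and the probability rises with proficiency; that property is what lets calibrated items from an item bank be placed on a common scale.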

    Ideally, every student would take each DAE when he or she is ready and not before. So the long-term goal should be to have sufficient alternate exams that students have more than one chance to take an exam (as they do for New York State Regents).

    At the outset, a more limited set of equivalent exams (two versions of each DAE) would be developed. The two versions, one administered before instruction and one afterwards, would be used by the assessment developers to establish instructional validity of the exams. Availability of multiple forms of the DAEs would allow states and districts to use the content-based exams to plot student growth, along with teacher and school effectiveness. In addition, pre-instruction results could be used by teachers as part of the formative data they use to plan an instructional unit.

    Figure 1, a diagram of how the DAEs might progress through the school year, shows how DAEs interact with formative assessments (described in subsequent sections) that are also integrated into the system.

    Figure 1. Example of How Distributed Accountability Exams (DAEs) Might Progress in a School Year.

    [3] Some of the learning sequences in the standards are based on research conducted by multiple scholars over three decades. Others are based on well-honed intuitive judgments by expert scholars and practitioners. All will require further validation in use over the coming years. What is new and important in the current core standards effort is that the standards are organized into multidimensional sequences of learning that can inform both assessment and instruction.

    The sixth- and seventh-grade Common Core Standards for mathematics specify five content areas:

    - Ratios and Proportional Relationships
    - The Number System
    - Expressions and Equations
    - Geometry (in sixth grade, Properties of Area, Surface Area, and Volume are explicitly named)
    - Statistics and Probability

    The Ratios and Proportional Relationships section for sixth-grade mathematics (see Appendix A) includes two parallel sets of standards, one for Mathematical Understanding and one for Mathematical Skill. In addition, there is a set of standards for Mathematical Practice that the standards writers intend to apply at all grade levels, although it is understood that the student performances representing good mathematical practice will look substantially different at different age/grade levels. Our DAEs would provide a valid and reliable picture of how students are progressing on the Mathematical Practice standards as well as on the specific content standards.

    Figure 2 displays the sixth-grade standards in a visualization we call the honeycomb that specifies our hypotheses about the interdependencies among them. The honeycomb, which we describe more fully below, serves as a visual representation (interactive map) of the instructional and assessment space that needs to be traversed in all grades, including the sixth, and also as a frame for assembling data on student performance in a manner that will support inferences about the progress of individual students, classes of students, schools, and school districts.

    Taken together, the Mathematical Understanding, Mathematical Skill, and Mathematical Practice standards inform and constrain the assessments that would be built for the Distributed Accountability Exams. Assume, for purposes of developing an example, that the sixth- and seventh-grade mathematics teaching will be divided into five units of instruction, one unit for each of the five content areas. One would thus need five content-specific exams in mathematics each year for sixth and seventh grades. The exams (like the instructional units they reference) might not be of equal length, because some of the standards cover more material than others. But we envision exams of 40 to 75 minutes in length, each geared to a teaching unit of 3 to 7 weeks.

    An example of an exam covering the sixth-grade unit on Ratios and Proportional Relationships is included in Appendix A.

    Figure 2. Visualization of Sixth-Grade Standards as a Honeycomb.

    An English Language Arts Example

    The English language arts/literacy standards can similarly be used to specify sequences of instructional units and assessments. The core standards are organized in grade bands rather than grade by grade. As in mathematics, skills and understandings are expected to develop over multiple years. In addition, guidelines exist for choosing texts that use modern quantitative methods to characterize the cognitive and linguistic complexity of writing in several different genres.

    Using all of these resources of the Common Core Standards, we have sketched distributed exams for English language arts; one example of an exam is in Appendix B.

    Validity and Reliability in Distributed Examinations

    The DAEs would be built to strong criteria of content and instructional validity. Each exam would provide a reliable estimate of student knowledge on the content of an instructional unit that is explicitly targeted to a standard, or set of standards, in the Core. The collection of exam scores for a year (e.g., five mathematics exams in each of Grades 6 and 7) would provide a valid estimate of the extent to which a student (class, school) has mastered the content specified by the standards for that year.

    Content Validity

    The exams would match closely, in both content and form, the content that is expected to be taught in each of the instructional units. New instructional units, explicitly linked to the Core standards, would be created to anchor the content validity of the units. Teams of independent content and instructional experts would review the model instructional units to ensure they match with the standards and are of high instructional quality. The same teams would judge the alignment of exams to the model instructional units. This process would largely overcome the problem of weak alignment to standards that now troubles many state assessments. (States would not, however, be required to use the model instructional units in their actual classrooms.)

    Instructional Validity

    Assessments are considered instructionally valid when student performance improves after quality instruction on the content of the assessment. Although instructional validity is part of the gold standard for educational testing, it is almost never established in current assessment practice. We can do better. We will apply strategies of in vivo (live classroom) research developed by the Pittsburgh Science of Learning Center. These scientific (experiment-based) research strategies can be used to establish whether each particular DAE, in fact, responds to good teaching. States and school districts using the DAE system would be able to validate DAEs against best-practice instruction developed by their most effective teachers.

    Reliability

    DAEs would contain a mix of short constructed-response items and more extended written responses, along with sets of multiple-choice items as appropriate to the standard being examined. Short and long constructed-response components would require human scoring. Research has established that when constructed-response tasks are well targeted, scoring rubrics are specific, and graders are trained, a high level of inter-rater reliability can be attained (Mariano & Junker, 2007; Patz, Junker, Johnson, & Mariano, 2002; Rayn & Shepard, 2008).

    Student responses on constructed-response items could be graded locally (within the same school but not by the student's own teacher) or by geographically and socially remote scorers (including teachers elsewhere in the district or state). These grades could be validated using one of a number of methods that have been used in European countries (e.g., cross-school or cross-state grading exercises; re-grading of a sample of student papers at the state level). Teacher participation in grading exams and the related validation exercises (some of which could be face to face) creates a good process for professional learning, one that many countries use.
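    One way to quantify agreement between a local grader and a remote re-grader of the same sample of papers is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The paper does not name a specific statistic, so the Python sketch below is only an illustration; the function name and the rubric scores are hypothetical.

```python
from collections import Counter


def cohens_kappa(scores_a, scores_b):
    """Chance-corrected agreement between two graders who scored the same papers."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("both graders must score the same non-empty set of papers")
    n = len(scores_a)
    # Share of papers on which the two graders gave the same rubric score.
    observed = sum(a == b for a, b in zip(scores_a, scores_b)) / n
    # Agreement expected if each grader assigned scores independently
    # at his or her own observed rates.
    counts_a, counts_b = Counter(scores_a), Counter(scores_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:  # both graders used one identical score throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical rubric scores (1-4) for six papers.
local_scores = [3, 2, 4, 1, 3, 2]
remote_scores = [3, 2, 3, 1, 3, 2]
print(round(cohens_kappa(local_scores, remote_scores), 2))  # 0.76
```

    A value near 1 indicates that the two graders agree far more often than chance alone would produce; a value near 0 suggests the rubric or the grader training needs attention.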

    DAEs open the possibility for increased use of constructed responses because they are distributed over the course of the year, yielding several times more opportunity to collect data than current end-of-year tests. This also brings benefits in terms of increased test reliability.[4] Yet to obtain these more reliable results, students would not have to sit for a 5-hour exam or even take an end-of-year exam, depending on how a particular state system is designed. They just would have to take unit exams as they normally would in the course of teaching, but now with the unit exam contributing to an overall accountability score.
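    The reliability figures given in footnote 4 have the form of the Spearman-Brown prophecy formula, which predicts the reliability of a test lengthened by a factor k from the reliability r of a single part. The short Python sketch below reproduces that arithmetic. The function and parameter names are illustrative, and the sketch assumes, as the formula does, that the unit exams behave like parallel measurements of the same proficiency.

```python
def spearman_brown(single_reliability: float, length_factor: float) -> float:
    """Predicted reliability of a composite that is `length_factor` times
    as long as a component whose reliability is `single_reliability`."""
    r, k = single_reliability, length_factor
    return k * r / (1 + (k - 1) * r)


# Five hour-long DAEs, each with reliability 0.7, counted in full.
print(round(spearman_brown(0.7, 5), 2))    # 0.92
# Only half of each DAE's testing time counted (2.5 exam-equivalents).
print(round(spearman_brown(0.7, 2.5), 2))  # 0.85
```

    The gain flattens as exams are added, which is why counting only half of each exam's time still leaves the composite well above the single-exam figure.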

    In addition, we propose to use earlier assessment data to help produce more precise proficiency estimates for each DAE. This approach, similar to what is used in some online tutoring systems and adaptive testing systems, could make it possible to shorten many of the assessments with no loss in measurement precision (see Appendix C).
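    As a rough illustration of why earlier data can sharpen an estimate, the Python sketch below combines a prior proficiency estimate with a new exam result by weighting each by its precision (the standard normal-normal update). This is not the method of Appendix C, which is not reproduced in this section; the function name, the proficiency scale, and the numbers are all assumptions made for the example.

```python
def combine_estimates(prior_mean, prior_var, exam_mean, exam_var):
    """Precision-weighted combination of an earlier proficiency estimate
    with the estimate from a new exam (normal-normal update)."""
    prior_precision = 1.0 / prior_var
    exam_precision = 1.0 / exam_var
    post_var = 1.0 / (prior_precision + exam_precision)
    post_mean = post_var * (prior_precision * prior_mean + exam_precision * exam_mean)
    return post_mean, post_var


# Hypothetical values on an arbitrary proficiency scale.
mean, var = combine_estimates(prior_mean=0.0, prior_var=1.0, exam_mean=1.0, exam_var=0.5)
print(round(mean, 3), round(var, 3))
```

    The combined variance is always smaller than either input variance, so a shorter (noisier) exam paired with good prior information can match the precision of a longer exam taken cold.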

    Distributed content- and instructionally validated exams are a next logical step in ending the testing bind and developing an assessment system that will detect and reward high-quality, effective teaching. Instead of supporting the use of practice materials that mimic the old end-of-year tests, states can provide high-quality instructional tools that help teachers prepare students for DAE examinations.[5] There will be no need for the current crop of interim tests that simply mirror the end-of-year test, since DAEs and related formative assessments will occur throughout the school year at times that make instructional sense. With this system, we gain the ability to measure a set of higher-order skills that are not otherwise easily tested, including skills essential to college- and career-ready performance in reading, writing, and mathematics, without adding an enormous testing burden.

    Educative Formative Assessments

    The American Examination System will foster a rich environment of formative assessments that are educative in ways that directly resemble the summative system, but with more direct application to daily and weekly instruction.

    - They would be aligned with the learning trajectories derived from the Common Core Standards, and thus aligned with what teachers need to teach.
    - They would model approaches to how to teach, and would, at the request of educators, provide teachers structured opportunities for gaining experience in using those teaching methods.
    - Teachers would make these assessments part of their instructional routine, rather than an addition to it. Data entry/record-keeping burdens will be minimal, and teachers will have easy and quick access to student- and class-level reporting as well as tools to understand the instructional significance of that data. By tracking fidelity in the use of these diagnostic tools, the system will help teachers to use them appropriately.

    [4] For instance, if the reliability of each single DAE hour-long exam were 0.7, the reliability of five DAEs taken together would be 5*(0.7)/(1+4*0.7) = 0.92. If, instead, half of each DAE's testing time were used for a pretest on the next instructional unit or simply for calibrating future test items, the combined reliability would be 2.5*(0.7)/(1+1.5*0.7) = 0.85, still a high rate of reliability.

    [5] For a description of approaches to providing this kind of instructional guidance in forms that do not suppress teacher ingenuity and judgment, see Resnick (in press) and McConachie and Petrosky (2009).

    Formative assessment tasks that cannot be machine scored will be accompanied by simple rubrics for quickly analyzing the student work. Teachers will be able to use digital devices to record these analyses. Through those devices, the teachers will also be provided with samples of answers that correspond to each level on the rubric, to help them calibrate their own analyses. As a form of professional development and to improve the reliability of analyses, teachers could also upload the student work into the system, along with their analyses, to get feedback from other teachers or subject matter experts.

    A Mathematics Example

    Educative formative assessments in mathematics will be designed recognizing that cognitively demanding tasks can typically be solved in many different ways. From a diagnostic perspective, it can be as important to know how a student is attempting to solve a problem as it is to know his or her answer. The problem-solving technique is, in many cases, part of what is specified in the standards. The sequence of how these techniques are used over time will often indicate a student's progress in understanding concepts and moving along a learning trajectory. So the Educative Formative Assessments would include items that capture this information and empower teachers to learn to recognize the different approaches that students take and their significance for differentiated instruction.

    An example of this approach is the Ongoing Assessment Project (OGAP)[6] in mathematics, a framework and system for analyzing mathematical reasoning of elementary and middle school students as they solve problems. Teachers analyze written student work looking for evidence of mathematical reasoning and increasing levels of sophistication as students progress along learning trajectories. The diagnostic and instructional utility of the items is enhanced by examining the thinking and strategies that went into solving them. Fewer items can be used to produce far richer results because the underlying thinking is surfaced and made apparent to the teacher. Figure 3 illustrates how teacher-facing software enables quick analysis and recording of meaningful attributes of student work: correctness of response, sophistication of the reasoning (along a trajectory from additive to transitional to multiplicative strategies), and any errors or misconceptions that emerge. These tools and interfaces can also support remote analysis when student work is digitized and routed.

    In general, it will be essential to ensure that formative assessment results are not included in accountability reporting, to eliminate the incentives for misuse. We envision that the student-, class-, and school-level results would be available to teachers, coaches, and perhaps principals (to inform professional development as well as instruction), but not to district/state administrators.

    [6] OGAP was developed as a part of the Vermont Mathematics Partnership funded by the U.S. Department of Education (Award number S366A020002) and the National Science Foundation (Award number HER0227057).

    Figure 3. Teacher-Facing Software.

    However, metrics of fidelity in implementing the formative assessments (and their associated instructional recommendations) could be used as part of teacher/school performance management/accountability. For instance, are teachers doing progress monitoring with the frequency appropriate for each student, given the longitudinal data about that student? Principals and district/state officials should have access to this type of information in real time, so they can spot where there may be weak instructional capacity and provide timely interventions (including targeted professional development). They will want to spot if teachers are using the formative system the way in which, and as often as, it should be used. (DC public schools is an example of a school system that is already using these types of formative assessment metrics as part of its SchoolStat approach to continuous, districtwide performance management.)

    In addition, the American Examination System platform would provide to researchers longitudinal data including formative assessment data, organized by student/teacher/school/subgroup.[7] In particular, this data would be used as part of the research to support continuous improvement of the system: to fine-tune the learning trajectories, measures of proficiency for each standard, and algorithms for mass customization of assessments.

    [7] All data would be anonymous to protect privacy (and prevent the formative data from being used for accountability). Researchers will be able to see that Student A had Teacher X in School Y and see data available for A, X, and Y, but not the identity of those individuals and institutions.
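    One way the linkage described in footnote 7 could be preserved without exposing identities is to replace each real identifier with a keyed hash before data reaches researchers. The paper does not describe a mechanism, so the Python sketch below is an assumption throughout: the function name, the identifier formats, and the key handling are hypothetical, and keyed pseudonyms alone would not guarantee anonymity against someone who holds other identifying data.

```python
import hashlib
import hmac


def pseudonym(secret_key: bytes, kind: str, real_id: str) -> str:
    """Stable pseudonym for a student, teacher, or school identifier.
    The same input always maps to the same token, so records still link,
    but the token cannot be reversed without the secret key."""
    message = f"{kind}:{real_id}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()[:16]


# Hypothetical key held by the platform operator, never by researchers.
key = b"example-key-kept-by-platform"
record = {
    "student": pseudonym(key, "student", "student-001"),
    "teacher": pseudonym(key, "teacher", "teacher-417"),
    "school": pseudonym(key, "school", "school-09"),
}
print(record)
```

    Because the mapping is deterministic, a researcher can follow the same pseudonymous student across exams and years while seeing only the tokens.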

    We expect that formative assessment fidelity data will be especially useful to researchers. Many instructional innovations, when tested under real classroom circumstances, fail to show impact: researchers wonder whether the lack of results was because of poor design or simply because the teachers did not implement it correctly. In the field of learning research, scholars are pointing to the need for researchers to distinguish between poor design and poor implementation. They make the comparison with pharmaceutical trials, where a prerequisite for testing medical efficacy is knowing which of the trial patients took the correct dosage (Rowan, Correnti, Miller, & Camburn, 2009).

    A New Paradigm for Educational Measurement: Adaptive Mass Personalization

    We believe that an advanced model of educational measurement can be built on a foundation of gathering an order of magnitude more data (both informal and formal) about each student in the course of the year so that each test merely enhances the resolution of a picture that is substantially complete before each test begins. Moreover, by applying the tools of mass personalization already so prevalent in Internet-based commerce and social networking, we will eventually be able to personalize each assessment at the individual level so that the enhanced resolution it provides is targeted to an individual student's current learning level as well as to appropriate standards of reliability and validity. That is, the system can keep asking questions until it knows enough to be instructionally helpful to the student and the teacher and until it knows enough to support relevant policy and accountability decisions.
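    The idea of asking questions only until enough is known can be sketched as a simple stopping rule. The Python example below keeps a running estimate of a student's success rate and stops once the uncertainty of that estimate falls below a target. It is a deliberately simplified illustration, not the system's algorithm: it treats all questions as equally difficult, uses a Beta distribution to track uncertainty, and every name and threshold in it is an assumption.

```python
import math


def posterior_sd(correct: float, wrong: float) -> float:
    """Standard deviation of a Beta(correct, wrong) belief about success rate."""
    total = correct + wrong
    mean = correct / total
    return math.sqrt(mean * (1 - mean) / (total + 1))


def ask_until_precise(responses, prior_correct=1.0, prior_wrong=1.0,
                      target_sd=0.1, max_items=50):
    """Consume True/False responses until the estimate is precise enough.
    `prior_correct` and `prior_wrong` stand in for evidence gathered before
    the test begins; Beta(1, 1) means nothing is known in advance."""
    correct, wrong = prior_correct, prior_wrong
    asked = 0
    for is_correct in responses:
        if asked >= max_items or posterior_sd(correct, wrong) <= target_sd:
            break  # enough is known (or the item budget is spent)
        if is_correct:
            correct += 1
        else:
            wrong += 1
        asked += 1
    return asked, correct / (correct + wrong), posterior_sd(correct, wrong)


answers = [True] * 20  # hypothetical student who answers every item correctly
print(ask_until_precise(answers)[0])                                    # no prior evidence
print(ask_until_precise(answers, prior_correct=8, prior_wrong=2)[0])    # with prior evidence
```

    When prior evidence about the student is supplied, the rule reaches the same precision with fewer questions, which is the sense in which each test only enhances the resolution of a picture that is already largely complete.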

    Standardization Versus Personalization

    Standardization was the engine of the factory model that drove the economy of the 19th and 20th centuries (Resnick & Resnick, 1977, 1980). Now the powerful drivers of the economy are personalization and customization, often applied in direct contradiction to a previously valued standardized offering. Amazon.com, for example, learns what you like to read and offers an increasingly personalized bookstore just for you that becomes more precise over time. The video rental chain Netflix has now hosted several international competitions for improving their personalization engine.

    The statistical engines underlying personalization on the World Wide Web are distinct from those underlying standardized testing, but they are now entirely robust and proven; indeed, they are tested and refined on a daily basis in large-scale commerce, large-scale medical research, and financial market predictions. It is time to bring these ideas to education in ways that will dramatically improve the precision with which our formerly standardized tests fulfilled their standard purposes, while simultaneously expanding their usefulness to inform daily instruction, to diagnose individual patterns in student learning, and to surround students with supports that are personalized to their needs.

    Because the American Examination System aims to administer all types of assessments for a very large number of students over a period of multiple years and across multiple states, and can take account of various other education data, it should be able to serve as an engine for mass personalization of these assessments. Attributes that could be the basis of personalization include past student performance on assessments, teacher and school characteristics, aggregated assessment performance of students in a school, previous effectiveness of the teacher, which curriculum was used, and which assessments have been used. This technology is scalable: computing power is such that there is no practical limit on the amount of education data that could be included, so that as more states and more types of data are included, the customization becomes more precise (and useful).

    The initial goal for mass personalization would be to apply it to formative assessment. There are, already in use, many modalities of formative assessment (diagnostic, progress monitoring, screening), each including a mix of assessment types (multiple choice, constructed response, observation). Some of these are best delivered as part of group activities and some one-on-one between a single student and teacher. Many teachers and districts use a blend of these formative assessments, which makes sense given the diverse needs of particular students at different moments of their academic development; but many other teachers who are not themselves experts in formative assessment methodologies struggle to decide how best to integrate all of these choices into their teaching routines for their particular students.

    So, in addition to providing new educative formative assessments, the American Examination System would mass customize a much wider range of formative assessments at the student and class level. This is adaptive assessment at the level above individual items: it figures out which formative assessment to give and when, enabling teachers to get just the right next piece of information they need about their students, without wasting a lot of classroom or other school time. With this platform, teachers will be blending modes of assessment in individualized ways, varying what data they collect and how, based on what is known so far about each student. To support this, the system will host a bank of formative assessment materials, from open source or commercial sources, to cover the full range of diagnostic options a state or school district wishes to use.
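    As a rough illustration of adaptivity "at the level above individual items", the sketch below picks the next formative assessment from a bank. Everything in it is hypothetical: the `Assessment` record, the per-skill uncertainty numbers, and the selection rule (the most uncertain skill that fits the available time, with shorter assessments preferred on ties) are assumptions made for the example, not a description of the platform's actual logic.

```python
from typing import Dict, List, NamedTuple, Optional


class Assessment(NamedTuple):
    """A formative assessment in the bank (hypothetical record)."""
    name: str
    skill: str
    minutes: int


def next_assessment(
    uncertainty: Dict[str, float],
    bank: List[Assessment],
    minutes_available: int,
) -> Optional[Assessment]:
    """Pick the assessment whose skill is currently least well known.

    `uncertainty` maps a skill to how unsure we are about the student
    (larger means less is known). Assessments that do not fit in the
    available time, or that target an untracked skill, are skipped.
    Ties on uncertainty go to the shorter assessment.
    """
    candidates = [
        a for a in bank
        if a.minutes <= minutes_available and a.skill in uncertainty
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda a: (uncertainty[a.skill], -a.minutes))


# Usage with made-up data
bank = [
    Assessment("Fractions interview", "fractions", 20),
    Assessment("Fractions quick check", "fractions", 5),
    Assessment("Decimals quiz", "decimals", 5),
]
uncertainty = {"fractions": 0.4, "decimals": 0.1}
choice = next_assessment(uncertainty, bank, minutes_available=10)
```

    A real selection rule would presumably weigh many more attributes (assessment modality, group versus one-on-one delivery, past teacher use), but the shape of the decision is the same: rank what could be given next by how much it is expected to add to what is already known.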

    The mass personalization process can also add to the reliability and efficiency of DAEs. Appendix C shows how a standard statistical model can use data from previous DAEs to make the next DAE more efficient, as long as the student is behaving consistently from one unit to the next. If the student seems to be performing unusually well (or poorly), then the model can detect this and suggest a customization of the DAE to further explore what the student knows and can do.

    The Assessment Platform

    The assessment platform manages both parts of the system, the DAEs and the educative formative assessments, to enable assessment delivery, scoring, reporting, and analysis. Based on widespread classroom experience with existing products and on current designs[8] (some of which have been funded by the Gates Foundation), it will be able to handle all of these elements at scale in a cost-effective way, while minimizing additional burdens for teachers, students, and administrators.

    [8] The authors wish to acknowledge the support of the Gates Foundation in conceptualizing a next-generation assessment platform and for more generally advancing the field of aligned units of curriculum and assessment.


    Honeycomb

    The American Examination System will provide a honeycomb: an interactive map of learning trajectories and our hypotheses about the dependencies among them. The honeycomb offers a visual representation of the instructional and assessment space that needs to be traversed in each grade as well as across grades, all the way from pre-K through Grade 12. It provides a frame for assembling data on student performance in a manner that will support inferences about the progress of individual students, classes of students, schools, and school districts. It will also support research to validate and refine the hypotheses about dependencies among the skills (within and across trajectories) in the Common Core Standards and similar state standards, for instance, identifying what level of which specific literacy skills is needed to achieve mastery of which mathematics skills.

    The American Examination System would give educators summative and formative assessments for each skill step along each learning trajectory, starting with mathematics and literacy for Grades 3-10. Other assessment data (for instance, existing formative assessments for pre-K through Grade 3 students or high school exams) can also be mapped onto the learning trajectories. All of these data can be included in the honeycomb so that teachers, parents, and the students themselves can track individual student progress (and the extent to which students are on track) toward college and career readiness.

    The honeycomb builds on one of the intrinsic advantages of the American Examination System, which is that it offers a highly coherent and integrated package of summative and formative assessments. In particular, the system's rapid scoring workflow and reporting interface would enable educators to use the DAE results for diagnostic purposes at the individual student and class level. For example, where students have written an essay, teachers would be able to see whether students can write the sort of complex sentences and can make arguments out of ideas that are appropriate for the grade's learning trajectory. The pre-tests for each exam would be especially useful in this regard because the pre-tests assess the topics and standards that the teacher is about to teach.

    Each hexagon of the honeycomb could also link to instructional resources (including video exemplars and social networking/collaboration). See Figures 4 and 5.

    This tool can be adapted for use in any state whose standards include learning trajectories comparable to those in the Common Core Standards. We envision that there would be two measures of proficiency indicated for each skill/hexagon: the first based on formative (no-stakes) data and the second based on summative (high-stakes) data.
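    One way to picture the honeycomb as data is sketched below: each hexagon is a skill with its hypothesized prerequisites and the two proficiency measures just described. The class name, field names, the 0-to-1 proficiency scale, and the 0.7 mastery threshold are all assumptions made for illustration; the source does not specify a schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Hexagon:
    """One skill step on a learning trajectory (hypothetical structure)."""
    skill_id: str
    grade: int
    prerequisites: List[str] = field(default_factory=list)  # hypothesized dependencies
    formative: Optional[float] = None  # no-stakes proficiency estimate, 0..1
    summative: Optional[float] = None  # high-stakes proficiency estimate, 0..1


def ready_skills(honeycomb: Dict[str, Hexagon], threshold: float = 0.7) -> List[str]:
    """Return skills not yet mastered whose prerequisites are all mastered.

    Mastery is judged on the summative measure when one exists and on the
    formative measure otherwise. Every prerequisite id must be a key of
    `honeycomb`.
    """
    def mastered(h: Hexagon) -> bool:
        score = h.summative if h.summative is not None else h.formative
        return score is not None and score >= threshold

    return sorted(
        h.skill_id
        for h in honeycomb.values()
        if not mastered(h) and all(mastered(honeycomb[p]) for p in h.prerequisites)
    )


# Usage with three made-up sixth- and seventh-grade skills
honeycomb = {
    "ratios": Hexagon("ratios", 6, [], formative=0.9, summative=0.8),
    "rates": Hexagon("rates", 6, ["ratios"], formative=0.5),
    "proportions": Hexagon("proportions", 7, ["rates"]),
}
next_up = ready_skills(honeycomb)
```

    Keeping the two measures as separate fields, rather than blending them into one score, mirrors the idea that formative and summative evidence are shown side by side for each hexagon.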

    Putting Power and Choice in the Hands of Teachers

    The platform will include an assignment builder, so that educators can select formative assessment items as tasks for use by the students in the classroom or as homework. This allows the teachers to focus student work on the particular concepts and skills that they need to develop. So, for instance, a teacher could drill down from a specific honeycomb hexagon (Common Core Standard) to build an assignment for a subset of her students. See Figures 6 and 7.

    Figure 4. Honeycomb for Mathematics Sixth Grade.

    Figure 5. Each Hexagon of the Honeycomb Could Also Link to Instructional Resources.

    Figure 6. Assignment builder.

    Figure 7. Individual assignment.

    Other Platform Tools

    In addition to providing the honeycomb and tools to support mass customization, the American Examination System platform will:

    Enable students to take the assessments online or on paper;

    Enable teachers/schools to scan and upload paper-based assessments and other student work;

    Manage remote scoring workflow and provide a scoring interface for remote raters;

    Provide teachers with a scoring interface (could include ability to mark up student work and record notes) and a reporting (gradebook) interface;

    Provide dashboard tools for tracking and analyzing the progress of particular students and groups of students;

    Provide principals and district/state administrators with a reporting interface that includes aggregate analysis (including cross-class, cross-teacher, cross-school, cross-district, cross-state, and demographic comparisons, with longitudinal dimensions, including value added on end-of-year high-stakes tests, included in each);

    Allow users to generate custom reports in real time on demand with both teacher and principal/administrator interfaces;

    Allow teachers to share formative assessments with each other and with experts to gain instructional advice and create opportunities for professional development; and

    Provide role-based access rights (including to protect student privacy).[9]

    Thus, the system will gather and provide ready access to accountability information, and also help teachers and schools to improve learning measured by rigorous standards and good instructional practices. It would cover the full trajectory from Pre-K through Grade 12.

    The American Examination System would not assume that all assessments will always be conducted with students sitting at computers. Given current school infrastructure, and given the challenge of showing mathematics work via keyboard, it may be more efficient to continue to rely to some extent on paper-and-pencil inputs to an otherwise digital system. The continued value of these primitive recording tools seems especially compelling when one considers that much of the value of the new generation of assessment tasks depends on soliciting open-ended expressions of student reasoning and thinking, and in the case of mathematics this includes drawings, graphs, and explanations.

    [9] To ensure protection of student privacy rights, the system has the capacity to make digitized student work anonymous before routing it to remote scorers.

    So the American Examination System would include a process to enable scanning or digital photographing, uploading, and archiving of very large volumes of paper-based student work, including for Distributed Accountability Exams, to enable remote scoring as well as online student portfolios. The scanning/photographing process, which has already been tested in North Carolina classrooms, puts minimal burdens on teachers or other school staff and does not require large per-school investments in hardware or network infrastructure.

    For the foreseeable future, assessment of open-ended expressions of student reasoning and thinking will require at least some element of human scoring. Doing this rigorously and reliably, especially in a summative context where there are stakes for teachers and schools as well as for students, requires finding a cost-effective and time-effective workflow for directing the work to remote scorers (including cross-school or cross-state grading/validation exercises; regrading of a sample of student papers at the state level).

    The American Examination System platform enables this workflow. It automates delivery of digitized student work (including paper-and-pencil work) to raters and those validating the ratings. Student identity is kept private (the raters do not know whose work it is). The online interface for remote raters presents them with the student work alongside scoring forms based on the rubric appropriate for that type of work. See Figure 8.
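    The privacy step in this workflow can be sketched as follows: before work is routed to a remote rater, the student identifier is swapped for a random token, and the token-to-student key stays behind on the platform. The record types and field names here are hypothetical, and the sketch assumes the scanned image itself carries no name; a real system would also have to mask any identifying marks on the page.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class StudentWork:
    """A digitized piece of work as stored on the platform (hypothetical)."""
    student_id: str
    task_id: str
    image_ref: str  # pointer to the scanned page, not the image itself


@dataclass(frozen=True)
class ScoringPacket:
    """What a remote rater receives: no student identity, only a token."""
    token: str
    task_id: str
    image_ref: str


def anonymize(batch: List[StudentWork]) -> Tuple[List[ScoringPacket], Dict[str, str]]:
    """Replace each student id with a random token.

    Returns the packets to send to raters and the private key that maps
    tokens back to students, which is kept by the platform only.
    """
    packets: List[ScoringPacket] = []
    key: Dict[str, str] = {}
    for work in batch:
        token = secrets.token_hex(8)  # 16 hex characters, unguessable
        key[token] = work.student_id
        packets.append(ScoringPacket(token, work.task_id, work.image_ref))
    return packets, key


# Usage with made-up records
batch = [
    StudentWork("student-001", "essay-unit-3", "scan/0001.png"),
    StudentWork("student-002", "essay-unit-3", "scan/0002.png"),
]
packets, key = anonymize(batch)
```

    When scores come back tagged with the token, the platform uses the retained key to attach them to the right student record.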

    The platform will allow teachers, principals, districts, and potentially parents and the students themselves to generate custom reports in real time on demand. These reports would aggregate longitudinal data from different Distributed Accountability Exams and formative assessments to provide a more complete picture of each student, class, and school.

    Figure 8. The Online Interface for Remote Raters for the American Examination System.

    Development and Costs

    Our vision for the American Examination System is ambitious. What makes it realistic is the substantial amount of work that has already been done in developing the content and tools needed to make it work.

    For instance, IFL has extensive experience in developing model instructional units such as the ones that will be part of the system and in working with school systems to tailor the units to local needs and preferences (McConachie & Petrosky, 2010). IFL units (and accompanying assessments) are built into the curriculum guidance system of several urban school districts and have been shown to produce high levels of teacher engagement and improved instruction when accompanied by appropriate forms of professional training (David & Greene, 2008; Resnick, in press; Talbert & David, 2008).

    Many of the required technologies are already in use in existing assessment and data management applications or are now being developed through Wireless Generation and through various initiatives of the Bill and Melinda Gates Foundation to create aligned systems of curriculum and formative assessment. Thus, for instance, much of the platform for authoring and administering cognitively demanding assessment items at scale will be available for online use in December 2010.

    The system we describe is one that will operate fully about 3 years from the beginning of the process, with mass personalization of summative assessment playing a larger role at the end of that time frame. Much of the system, including the Distributed Accountability Exams, Educative Formative Assessments, and other aspects of the technology platform, will be operational in 2 years.

    Platform

    Based on the direct experience of Wireless Generation in building a system of comparable complexity (ARIS, the education information system for the country's largest public school system), we estimate that a secure and scalable version of the initial platform can be available for use 6 months after work on the project formally begins; additional functionality would be available after 12 months; and a comprehensive system would be in use at scale in 18 months. Additional development, related to the research and rollout of the mass-personalized aspect of the assessments, would take place within a longer time frame (36 months).

    Assessments

    The platform could be used to develop and roll out assessments according to the following three phases: establishment of content validity (for both the DAEs and Educative Formative Assessments); establishment of instructional validity (for the DAEs); and then the use of the system for summative accountability purposes. States should be consulted to determine which grades and what subjects to prioritize.

    We anticipate that, after 12 months, the Distributed Accountability Exams, including the units for model instruction, for use in Grades 3-10, will receive sign-off on content validity by State Departments. The Educative Formative Assessments could begin to be used during this first year after the content is validated.


    During months 13-24, or sooner if possible, we will do the experiments to capture instructional validity, beginning as soon as content validity is established.

    After this 24-month period, the Distributed Accountability Exams would be used for summative accountability purposes.

    Operational Costs

    We estimate that, for a typical state, the ongoing costs of the system will be about the same as for current NCLB tests. Current expenditures are typically $20-$30 per student, and in some cases higher than $80, to cover reading and mathematics (U.S. Department of Education, 2010).

    The Distributed Exams (including the pre-instruction version) will cost more to develop and score than the current high-stakes tests, if for no other reason than their frequency. But the current interim exams, and the expenses associated with them (typically $15-$20 or more per student per year), could be eliminated.

    Teachers within the school district could score the exams from each other's students, but a significant portion of the ongoing cost would be from validation of samples of teacher scoring.

    Apart from provision of tablet/handheld and scanning devices for teachers (off-the-shelf, industry-standard technologies that are coming down in price each year), costs associated with the maintenance of the technology platform would be minimal when considered on a per-student basis.

    Key System Characteristics

    Rigorous Standards and Good Instructional Practices

    The new Common Core Standards provide a foundation for a criterion-referenced examination system that is closely tied to instruction yet meets crucial criteria of technical quality of assessment. The core grade-level standards are organized as a set of trajectories or sequences of learning goals. They are specified at a grain size that can be used to organize meaningful units of instruction and correspondingly meaningful assessments.

    The American Examination System includes Distributed Accountability Exams, for use over the course of the school year, which measure the specific higher-order skills that are articulated in the Common Core Standards and state standards, as well as basic knowledge. The Distributed Accountability Exams will include extended written work and other open-ended expressions of student reasoning and thinking; in mathematics, these would include drawings, graphs, and explanations. They will assess basic knowledge both within these constructed performances and, where appropriate, in clusters of multiple-choice items. After 24 months, these tests will begin to replace current summative tests for accountability purposes.

    The DAEs will reflect what should be taught (specific topics determined by state and Common Core Standards). Distributed Accountability Exams will address each of the skills/topics articulated for each year of the state and common standards. In the first wave, there will be Distributed Accountability Exams in literacy and mathematics for Grades 3-10. After that, sample items would be published and invitations extended for participatory authorship of assessment items that relate to the standards that are being tested and the particular item and assessment types.

    The DAEs would be built to strong criteria of content and instructional validity. Each exam would provide a reliable estimate of student knowledge on the content of an instructional unit that is explicitly targeted to a standard, or set of standards, in the Core. The collection of exam scores for a year (e.g., five mathematics exams in each of Grades 6 and 7) would provide a valid estimate of the extent to which a student (class, school) has mastered the content specified by the standards for that year.

    Content Validity

    The exams would match closely, in both content and form, the content that is expected to be taught in each of the instructional units. New instructional units, explicitly linked to the Core standards, would be created to anchor the content validity of the units. Teams of independent content and instructional experts would review the model instructional units to ensure they match with the standards and are of high instructional quality. The same teams would judge the alignment of exams to the model instructional units. This process would largely overcome the problem of weak alignment to standards that now troubles many state assessments.

    Instructional Validity

    Assessments are considered instructionally valid when student performance improves after quality instruction on the content of the assessment. Our development process would include tests of instructional validity, similar to the experiment-based ones used by the Pittsburgh Science of Learning Center. These tests would involve panels of teachers with good knowledge of an instructional unit's content as well as demonstrably good pedagogical skills (as judged by an expert panel). These teachers would be put into four groups. Two of the groups would teach the instructional unit that corresponds to the Distributed Accountability Exam. In one of these groups, they would get Pretest A for their students before the unit is taught and then the students would take Test B. In the second of these groups, the tests are flipped: Test B is the pretest and Test A is given to students after the unit is taught. In the third and fourth groups, students would not be taught the particular instructional unit at that time, but would still be given the pretests and posttests (one group with A as the pretest and B as the posttest, the other with B as the pretest and A as the posttest). Only tests that, through these experiments, systematically register improvements in student performance as a result of corresponding instruction (and demonstrate equivalence through the pre- and post-test swaps) will be included in our Distributed Accountability Exams. Both pretest and end-of-unit exams in any given year will be drawn from a bank of tasks that will be developed as part of this validation process.
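    The four-group design above can be summarized with a few mean-gain comparisons. The sketch below is only an illustration under simplifying assumptions: the group labels are invented, scores are treated as comparable numbers on both forms, and no significance testing is done, which a real validation study would need. The instruction effect is the taught groups' average gain minus the untaught groups' average gain; the form gaps compare the A-then-B and B-then-A orders, and values near zero are what one would expect if Forms A and B are equivalent.

```python
from statistics import mean
from typing import Dict, List, Tuple

Scores = List[Tuple[float, float]]  # (pretest, posttest) for each student


def mean_gain(scores: Scores) -> float:
    """Average posttest-minus-pretest gain for one group."""
    return mean(post - pre for pre, post in scores)


def instructional_validity_summary(groups: Dict[str, Scores]) -> Dict[str, float]:
    """Summarize the four-group crossover design.

    Expects the hypothetical keys 'taught_AB', 'taught_BA', 'control_AB',
    and 'control_BA', where AB means Form A as pretest and Form B as
    posttest, and BA means the reverse.
    """
    gains = {name: mean_gain(scores) for name, scores in groups.items()}
    taught = (gains["taught_AB"] + gains["taught_BA"]) / 2
    control = (gains["control_AB"] + gains["control_BA"]) / 2
    return {
        # How much more students gained when the unit was actually taught
        "instruction_effect": taught - control,
        # Differences between test orders; large values suggest unequal forms
        "form_gap_taught": gains["taught_AB"] - gains["taught_BA"],
        "form_gap_control": gains["control_AB"] - gains["control_BA"],
    }


# Usage with made-up scores
groups = {
    "taught_AB": [(40.0, 70.0), (50.0, 80.0)],
    "taught_BA": [(45.0, 75.0), (55.0, 85.0)],
    "control_AB": [(40.0, 42.0), (50.0, 52.0)],
    "control_BA": [(45.0, 47.0), (55.0, 57.0)],
}
summary = instructional_validity_summary(groups)
```

    Averaging the two test orders within the taught and untaught conditions is what lets the design separate the effect of instruction from any difference in difficulty between the two forms.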

    Items or tasks for the DAEs will also be pretested and calibrated using standard classical and multidimensional IRT frameworks. Availability of multiple forms of the DAEs will allow states and districts to use the content-based exams to plot student growth, along with teacher and school effectiveness.[10] In addition, pre-instruction results can be used by teachers as part of the formative data they use to plan an instructional unit.

    Reliability

    Distributed Accountability Exams would contain a mix of short constructed-response items and more extended written responses, along with sets of multiple-choice items as appropriate to the standard being examined. Short and long constructed-response components would require human scoring. Research has established that when constructed-response tasks are well targeted, scoring rubrics are specific, and graders are trained, a high level of interrater reliability can be attained (Mariano & Junker, 2007; Patz et al., 2002; Ryan & Shepard, 2008).

    Student responses on constructed-response items could be graded locally (within the same school but not by the student's own teacher), or by geographically and socially remote scorers (including teachers elsewhere in the district or state). These grades could be validated using one of a number of methods that have been used in European countries (e.g., cross-school or cross-state grading exercises; regrading of a sample of student papers at the state level). Teacher participation in grading the exams and in the related validation exercises (some of which could be face-to-face) is a good process for professional learning and is used in most countries. Though the process is more costly in dollars than machine scoring, it is an educative process worth building into our Examination System. Grade validation at scale would be supported by the American Examination System platform, which can enable rapid, cost-effective remote scanning, transmission, grading, validation, and reporting. To ensure protection of student privacy rights, the system has the capacity to anonymize the digitized student work before routing it to the remote scorers and validators and, for limited purposes, to automatic essay-scoring technologies.

    The Distributed Accountability Exams open the possibility for increased use of constructed responses because they are distributed over the course of the year, yielding several times more opportunity to collect data than current end-of-year tests. This also brings benefits in terms of increased test reliability. For instance, if the reliability of each single hour-long DAE were 0.7, the reliability of five DAEs taken together would be 5*(0.7)/(1 + 4*0.7) = 0.92. If instead half of each DAE's testing time were used for a pretest on the next instructional unit or simply for calibrating future test items, the reliability would be 2.5*(0.7)/(1 + 1.5*0.7) = 0.85, still a high level of reliability.
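    Both figures have the form of the Spearman-Brown formula from classical test theory, which predicts the reliability of a test lengthened by a factor k from the reliability r of one part: k*r / (1 + (k - 1)*r). The short sketch below reproduces the two calculations; the function name is ours, and the formula assumes the parts behave as parallel measures of a single proficiency.

```python
def spearman_brown(single_reliability: float, length_factor: float) -> float:
    """Predicted reliability of a test `length_factor` times as long.

    Classical Spearman-Brown formula; assumes the added parts are parallel
    to the original (same construct, same precision).
    """
    r, k = single_reliability, length_factor
    return k * r / (1 + (k - 1) * r)


# Five hour-long DAEs, each with reliability 0.7
five_exams = spearman_brown(0.7, 5)

# Half of each DAE's time used for other purposes: 2.5 exam-hours in total
half_time = spearman_brown(0.7, 2.5)
```

    Rounded to two decimals, `five_exams` is 0.92 and `half_time` is 0.85, matching the values in the text.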

    Yet to obtain these more reliable results, students would not have to sit for a 5-hour exam, or even take an end-of-year exam. They would just have to take unit exams as they normally would in the course of teaching, but now with the unit exam contributing to an overall accountability score. Another advantage is that students would be tested on recently learned material at all times, so that nuisance effects of delayed recall would not influence measures of how well students were learning what the teachers taught; this would probably increase reliability even more.

    [10] If the pre-instruction versions are not long enough to reliably estimate instructional effects on individual students, then those effects will be estimated at some aggregate level.

    Although the illustration above is useful, in reality it likely will not be possible to string together DAEs into a single unidimensional measurement to which classical reliability calculations apply. Instead we believe the DAEs within a subject will be at least mildly multidimensional; if we consider each DAE within a subject within a year as a measure of one proficiency, five DAEs would be measuring five different but substantively related proficiencies. These proficiencies are likely to be statistically related as well. For example, in NAEP, proficiency subscales within the same subject area are typically correlated 0.8 or higher, and seldom lower than 0.5-0.6. We can exploit these correlations by building a multidimensional Bayesian latent variable model to take advantage of proficiency estimates from old DAEs to help produce more precise proficiency estimates for the next DAE, or indeed to shorten the next DAE with no loss in measurement precision.

    For example, suppose[11] we wish to estimate a student's proficiency with a margin of error of 0.2 (SEM = 0.1), and each item contributes roughly one unit of Fisher information to proficiency estimation (here we are borrowing an IRT formulation for specificity); then the student would need to answer roughly 100 items. However, if we could already predict the proficiency on this DAE with a margin of error of 0.4 using past DAE performance, we would need only roughly 20 more items to obtain a margin of error of 0.2 on this DAE.
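    The arithmetic behind this kind of example can be sketched with a simple additive-information model. The sketch assumes that the margin of error is twice the SEM (consistent with 0.2 and SEM = 0.1 in the text), that information equals 1/SEM^2, and that prior information and item information simply add, as in a normal approximation. The function and parameter names are ours. Taken literally under this model, a prior margin of error of 0.4 already supplies 25 units of information, leaving about 75 further items to reach a margin of 0.2; a requirement of roughly 20 items would correspond to a prior margin of error near 0.22.

```python
import math
from typing import Optional


def items_needed(
    target_margin: float,
    info_per_item: float = 1.0,
    prior_margin: Optional[float] = None,
    z: float = 2.0,
) -> int:
    """Items required to reach a target margin of error.

    Assumes margin = z * SEM, information = 1 / SEM**2, and that prior
    information (from past exams) and item information add together.
    """
    target_info = (z / target_margin) ** 2
    prior_info = 0.0 if prior_margin is None else (z / prior_margin) ** 2
    shortfall = (target_info - prior_info) / info_per_item
    # round() guards against floating-point noise pushing ceil() up by one
    return max(0, math.ceil(round(shortfall, 9)))


no_prior = items_needed(0.2)                       # no past information
with_prior = items_needed(0.2, prior_margin=0.4)   # prior margin of 0.4
```

    Here `no_prior` is 100 and `with_prior` is 75. As the footnote to the example notes, such numbers are chosen for convenience, so the sketch is best read as showing the mechanism rather than realistic test lengths.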

    This calculation depends on the student's performance on the new DAE being consistent, in a way that can be made precise using Bayesian modeling, with his or her performance on past DAEs. If the student's responses on the next DAE are inconsistent with his or her older DAE results, we would need to do follow-up testing to get a more precise estimate of the student's proficiency. Thus, for students who learn consistently from one unit to the next, we can exploit past performance to help estimate proficiency on the current unit of instruction. However, for the student who performs unusually well (or poorly) on the current unit, we can use the Bayesian machinery to see the inconsistency, and offer another block of items in order to more precisely assess that student's learning. A similar process is used in online tutoring systems and adaptive testing systems, and is an illustration of the kind of useful customization that is discussed below.
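    A heavily simplified, one-dimensional version of this consistency check is sketched below. It is not the multidimensional model referred to in the text: it assumes a normal prediction from past DAEs, a normal estimate from the current DAE, and an arbitrary cutoff of two standard errors for calling the two inconsistent. All names and the cutoff are assumptions for illustration.

```python
import math
from typing import Dict, Union


def combine_or_flag(
    predicted: float,
    predicted_sem: float,
    observed: float,
    observed_sem: float,
    z_cut: float = 2.0,
) -> Dict[str, Union[bool, float]]:
    """Pool past and current evidence, or flag the student for follow-up.

    `predicted` is the proficiency expected from past DAEs and `observed`
    is the estimate from the current DAE alone, each with its standard
    error. If they differ by more than `z_cut` standard errors of the
    difference, the past is not used and more items are called for.
    """
    z = (observed - predicted) / math.sqrt(predicted_sem**2 + observed_sem**2)
    if abs(z) > z_cut:
        # Inconsistent: rely on current data only and ask for another block
        return {"consistent": False, "z": z,
                "estimate": observed, "sem": observed_sem}
    # Consistent: precision-weighted average (normal-normal posterior)
    w_pred, w_obs = 1 / predicted_sem**2, 1 / observed_sem**2
    estimate = (w_pred * predicted + w_obs * observed) / (w_pred + w_obs)
    return {"consistent": True, "z": z,
            "estimate": estimate, "sem": math.sqrt(1 / (w_pred + w_obs))}


steady = combine_or_flag(0.0, 0.2, 0.1, 0.2)    # close to the prediction
surprise = combine_or_flag(0.0, 0.2, 1.0, 0.2)  # far from the prediction
```

    For the steady student the pooled standard error drops from 0.2 to about 0.14, which is the gain in precision that lets the next exam be shorter; for the surprising student the flag signals that an extra block of items is needed before the estimate can be trusted.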

    Distributed content- and instructionally validated exams are a next logical step in ending the testing bind and developing an assessment system that will detect and reward high-quality, effective teaching. Instead of supporting the use of practice materials that mimic the old end-of-year tests, states can provide high-quality instructional tools that help teachers prepare students for DAE examinations.[12] There will be no need for interim tests, since DAEs and related formative assessments will occur throughout the school year at times that make instructional sense. With this system, we gain the ability to measure a set of higher-order skills that are not easily otherwise tested, including ones essential to college- and career-ready performance in reading, writing, and mathematics, without adding an enormous burden of testing.

    [11] The numbers are chosen here mostly for computational convenience, and may not reflect the actual values obtained from item precalibration, etc.

    [12] For a description of approaches to providing this kind of instructional guidance in forms that do not suppress teacher ingenuity and judgment, see Resnick (in press).

    Another way of validating the Distributed Accountability Exam scores would be to compare them to NAEP scores. States might expand the use of the NAEP test (administering it every year and/or increasing the percentage of students tested).

    The American Examination System will also foster a rich environment of formative assessments that are educative in ways that directly resemble the summative system, but with more direct application to daily and weekly instruction.

    They would be aligned with the learning trajectories derived from the Common Core Standards, and thus aligned with what teachers need to teach.

    They would model approaches to how to teach, and would, at the request of educators, provide teachers structured opportunities for gaining experience in using those teaching methods.

    Teachers would make these assessments part of their instructional routine, rather than an addition to it. Data entry and record-keeping burdens will be minimal, and teachers will have easy and quick access to student- and class-level reporting as well as tools to understand the instructional significance of that data. By tracking fidelity in the use of these diagnostic tools, the system will help teachers to use them appropriately.

    Formative assessment tasks that cannot be machine scored will be accompanied by simple rubrics for quickly analyzing the student work. Teachers will be able to use digital devices to record these analyses. Through those devices, the teachers will also be provided with samples of answers that correspond to each level on the rubric, to help them calibrate their own analyses. As a form of professional development and to improve the reliability of analyses, teachers could also upload the student work into the system, along with their analyses, to get feedback from other teachers or subject-matter experts. The formative assessments would in general not be used for summative purposes, but metrics of teacher fidelity in implementing the formative assessments (and their associated instructional recommendations) could be used as part of teacher/school performance management/accountability.

    To enable teachers to make best use of all of these, the system will provide an online platform which includes: the honeycomb (to track student progress on learning trajectories towards college and career readiness, and to access diagnostic and instructional support for each stage of each trajectory); other dashboard tools for tracking and analyzing the progress of particular students and groups of students; and interfaces for uploading, sharing, scoring, reporting, and analyzing student work.

    Because the system will administer both types of assessments (Distributed Accountability Exams and formative) for a very large number of students over a period of multiple years and potentially across multiple states, and can take account of various other student, teacher, and school data, it would also eventually be able to serve as an engine for the mass personalization of assessments. Mass personalization for formative assessment could be done across many dimensions, including: past student performance on assessments; teacher and school characteristics, including aggregated assessment performance of students and other measures of previous effectiveness; and which curriculum was used. This adaptive or personalized approach to assessment will enable greater precision in the data, closer alignment to the taught curriculum, and less testing.

    The initial goal for mass personalization would be to apply it to formative assessment. There are, already in use, many modalities of formative assessment (diagnostic, progress monitoring, screening), each including a mix of assessment types (multiple choice, constructed response, observation). Some of these are best delivered as part of group activities and some one-on-one between a single student and teacher. Many teachers and districts use a blend of these formative assessments, which makes sense given the diverse needs of particular students at different moments of their academic development; but some teachers who are not themselves experts in formative assessment methodologies struggle to decide how best to integrate all of these choices into their teaching routines for their particular students.

    So, in addition to providing new Educative Formative Assessments, the American Examination System would mass customize a much wider range of formative assessments at the student and class level. This is adaptive assessment at the level above individual items: it figures out which formative assessment to give and when, enabling teachers to get just the right next piece of information they need about their students, without wasting a lot of classroom or other school time. With this platform, teachers will be blending modes of assessment in individualized ways, varying what data they collect and how, based on what is known so far about each student. To support this, the system will host a bank of formative assessment materials, to cover the full range of diagnostic options a state or school district wishes to use, from open source or commercial sources.

    The mass personalization process can also add to the reliability and efficiency of the Distributed Accountability Exams. Above, we showed how a Bayesian model can use data from previous DAEs to make the next DAE more efficient, as long as the student is behaving consistently from one unit to the next. If the student seems to be performing unusually well (or poorly), then the Bayesian machinery can detect this and suggest a customization of the DAE to further explore what the student knows and can do.
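The detection step described above can be illustrated with a minimal sketch. It assumes a simple normal-normal model, which may differ from the Bayesian model the paper presents earlier; the function name, the scale of the scores, and the threshold of 2.0 are hypothetical choices made only for illustration.

```python
import math


def update_and_flag(prior_mean, prior_var, score, score_var, z_threshold=2.0):
    """Update a proficiency estimate with a new DAE score and flag surprises.

    Assumes (for illustration only) a normal prior built from earlier DAEs
    and a normally distributed measurement error on the new score.
    """
    # Before seeing the score, the model predicts it with this spread.
    predictive_sd = math.sqrt(prior_var + score_var)
    z = (score - prior_mean) / predictive_sd

    # Conjugate update: a precision-weighted blend of prior and new score.
    weight = prior_var / (prior_var + score_var)
    post_mean = prior_mean + weight * (score - prior_mean)
    post_var = prior_var * score_var / (prior_var + score_var)

    # A large |z| means the student performed unusually well or poorly.
    return post_mean, post_var, abs(z) > z_threshold
```

A flagged result would not change the student's score by itself; under this sketch it would only signal that a customized follow-up DAE might be worth suggesting.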

    Technology

    Delivery

    Integrated online delivery of all assessments. Both summative (Distributed Accountability) and formative assessments are delivered to teachers and/or students across and within states through a single software platform. The system enables a coherent use of multiple types of assessments (including types that will be administered on paper and then scanned) as part of efforts to have students meet the standards and move along the skill trajectories towards college readiness and career readiness.

    The honeycomb offers an interactive online map of learning trajectories based on the Common Core Standards. It provides an intuitive and accessible way for educators to understand and make use of these trajectories all the way from PreK through 12. It will also enable them to grasp the dependencies among and within the trajectories; for instance, identifying what level of which specific literacy skills are needed to achieve mastery of which mathematics skills. This tool can be adapted for use in any state whose standards include learning trajectories comparable to those that will be in the Common Core.


    The American Examination System will deliver Distributed Accountability Exams, formative assessments, and available instructional options for each step along each learning trajectory, starting with mathematics and literacy for Grades 3-10. The honeycomb allows educators to visualize the sequence of assessments and instructional options aligned with the learning trajectories; they will be displayed for educators at intervals along scales that include the entire range of skills to be taught in PreK-12. Other (non-DAE) formative assessments and instructional options, including for PreK-2 and 11-12, can also be aligned and delivered through the same interface to help educators use them in a coherent way to identify and address the particular learning needs of each student as they move on the paths towards college and career readiness.

    Mass customization of assessments. Because the System will administer all types of assessments for a very large number of students over a period of multiple years and potentially across multiple states, and can take account of various other education data, it will be able to serve as an engine for the mass personalization of assessments. (The dimensions and benefits of mass customization are discussed in the Rigorous Standards and Good Instructional Practices section.) This technology is scalable: computing power is such that there is no practical limit on the amount of education data that could be included, so that as more states and more types of data are included, the customization becomes more precise (and useful).

    Scoring

    Enable teachers/schools to scan and upload student work. The American Examination System does not assume that all assessments will always be conducted with students sitting at computers. Given current school infrastructure, and given the challenge of showing mathematics work via a keyboard, it may be more efficient to continue to rely to some extent on paper-and-pencil inputs to an otherwise digital system. The continued value of these primitive recording tools seems especially compelling when one considers that much of the value of the new generation of assessment tasks depends on soliciting open-ended expressions of student reasoning and thinking, and in the case of mathematics this includes drawings, graphs, and explanations.

    So the American Examination System includes a process to enable scanning or digital photographing, uploading, and archiving of very large volumes of paper-based student work, including for Distributed Accountability Exams, to enable remote scoring as well as online student portfolios. The scanning/photographing process, which has already been tested in North Carolina classrooms, puts minimal burdens on teachers or other school staff and does not require large per-school investments in hardware or network infrastructure.

    Remote scoring workflow and interface. For the foreseeable future, assessment of open-ended expressions of student reasoning and thinking will require at least some element of human scoring. Doing this rigorously and reliably, especially in a summative context where there are stakes for teachers and schools as well as for students, requires finding cost-effective and time-effective workflows for directing the work to remote scorers (including cross-school or cross-state grading/validation exercises and regrading of a sample of student papers at the state level).


    The American Examination System platform enables this workflow. It automates delivery of digitized student work (including paper-and-pencil work) to raters and those validating the ratings. Student identity is kept private (the raters do not know whose work it is). The online interface for remote raters presents them with the student work alongside scoring forms based on the rubric appropriate for that type of work.
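One way to keep student identity private while routing work to raters is to replace each student identifier with a keyed pseudonym before the work leaves the school's records. The sketch below is an assumption about how that could be done, not a description of the platform; the function name and the 16-character label length are hypothetical.

```python
import hashlib
import hmac


def pseudonym(student_id: str, secret_key: bytes) -> str:
    """Return a stable, non-reversible label for a student's work.

    The same student always maps to the same label (so scores can be
    re-linked by whoever holds the key), but a rater who sees only the
    label cannot recover the identifier without the secret key.
    """
    digest = hmac.new(secret_key, student_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]
```

In this design the key would stay with the system operator, so remote scorers and validators see only the label attached to each piece of digitized work.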

    Formative assessment interface. For formative assessment, the platform provides a scoring interface for teachers similar to the one for remote scoring of Distributed Accountability Exams. This interface includes tools to mark up student work and record notes. Teachers can also easily email the marked-up work to students and their parents (so they get feedback on the same day that the assessment was delivered). Electronic essay scoring technologies will be used to add precision and/or to help teachers manage the triage associated with knowing which papers might require special attention.

    Similar to Wireless Generation's mClass platform, the American Examination System platform could also include mobile tools that enable teachers to digitally record what they are observing while they are actively involved with the class. Because formative assessment is part of each teacher's day-to-day instruction, capturing the resulting data provides a way to track instructional fidelity (whether the teachers are using the recommended good instructional practices).

    Reporting

    The platform provides the reports and reporting interfaces described in the Reporting section below.

    Summative Assessments That Measure Growth and That Project Readiness

    The Common Core provides a foundation for a criterion-referenced examination system that is closely tied to instruction yet meets crucial criteria of technical quality of assessment. The core grade-level standards are organized as a set of trajectories or sequences of learning goals.13 They are specified at a grain size that can be used to organize meaningful units of instruction and correspondingly meaningful assessments to judge progress toward college and career readiness.

    Tasks or items for the DAEs would be pretested and calibrated using standard classical and multidimensional IRT frameworks. At the outset, two versions of each DAE would be developed. The two versions, one administered before instruction and one afterwards, would be used by the assessment developers to establish instructional validity of the exams. Availability of multiple forms of the DAEs would allow states and districts to use the content-based exams to plot student growth, along with teacher and school effectiveness.14

    13 Some of the learning sequences in the standards are based on research conducted by multiple scholars over three decades. Others are based on well-honed intuitive judgments by expert scholars and practitioners. All will require further validation in use over the coming years. What is new and important in the current core standards effort is that the standards are organized into multidimensional sequences of learning that can inform both assessment and instruction.

    14 If the pre-instruction versions are not long enough to be reliable for estimating instructional effects on individual students, then those effects would be estimated at some aggregate level.


    Student growth, for purposes of assessing progress toward college and career readiness, can be defined as progress along the Common Core learning trajectories. In this way, the American Examination System measures the extent to which students are on track (and student growth) all the way from PreK through 12.

    This approach not only measures whether students are on track, but also identifies which specific skill deficits are holding each of them back. It allows teachers to answer the question: what should the instructional focus be right now, to move this particular student or group of students forward towards college and career readiness? It also identifies where instructional practices and/or curriculum may need to be reworked (where the measures show that the majority of students have not gained a skill).

    The honeycomb serves as a valuable way to display these measures of student growth for the students and their parents, because it offers an easily comprehensible map of that student's progress, relative to time, to the standards for each grade, and to the ultimate goals of college and career readiness.

    Accessibility

    All parts of our system incorporate the principles of Universal Design for Learning.

    The exams can and should remove barriers for non-native English speakers and for students with special learning needs. For non-native English speakers, the tests should be designed so that language does not unnecessarily obscure the meaning of the questions; these students can then understand the exams and be measured fairly.

    The DAEs would mirror the instruction that students will receive in the classroom; we would carefully design and validate accessibility for students with low-incidence disabilities. Some students may deviate from the learning trajectories, but they should remain focused on academic content. The system should maintain expectations for all students and guide teachers on how all students can master concepts and skills. Assessments would be designed for all students, modifications would allow as many students as possible to be validly assessed within the system, and there would be flexibility in terms of modality of test administration and item type.

    Technical Quality

    The new Common Core Standards provide a foundation for a criterion-referenced examination system that is closely tied to instruction yet meets crucial criteria of technical quality of assessment. The core grade-level standards are organized as a set of trajectories or sequences of learning goals. They are specified at a grain size that can be used to organize meaningful units of instruction and correspondingly meaningful assessments.

    The American Examination System includes Distributed Accountability Exams, for use over the course of the school year, which measure the specific higher-order skills that are articulated in the Common Core Standards and state standards, as well as basic knowledge. The Distributed Accountability Exams will include extended written work and other open-ended expressions of student reasoning and thinking; in mathematics, these would include drawings, graphs, and explanations. They will assess basic knowledge both within these constructed performances and, where appropriate, in clusters of multiple-choice items.

    Distributed Accountability Exams will address each of the skills/topics articulated for each year of the state and common standards. They would be built to strong criteria of content and instructional validity. Each exam would provide a reliable estimate of student knowledge on the content of an instructional unit that is explicitly targeted to a standard, or set of standards, in the Core. The collection of exam scores (for a year) would provide a valid estimate of the extent to which a student (class, school) has mastered the content specified by the standards for that year.

    Content Validity

    The exams would match closely, in both content and form, the content that is expected to be taught in each of the instructional units. New instructional units, explicitly linked to the Core standards, would be created to anchor the content validity of the units. Teams of independent content and instructional experts would review the model instructional units to ensure that they match the standards and are of high instructional quality. The same teams would judge the alignment of exams to the model instructional units. This process would largely overcome the problem of weak alignment to standards that now troubles many state assessments.

    Instructional Validity

    Assessments are considered instructionally valid when student performance improves after quality instruction on the content of the assessment. Our development process would include tests of instructional validity, similar to the experiment-based ones used by the Pittsburgh Science of Learning Center. These tests would involve panels of teachers with good knowledge of an instructional unit's content as well as demonstrably good pedagogical skills (as judged by an expert panel). These teachers would be put into four groups. Two of the groups would teach the instructional unit that corresponds to the Distributed Accountability Exam. In one of these groups, they would get pretest A for their students before the unit is taught and then the students would take Test B. In the second of these groups, the tests are flipped: Test B is the pretest and Test A is given to students after the unit is taught. In the third and fourth groups, students would not be taught the particular instructional unit at that time, but would still be given the pretests and posttests (one group with A as the pretest and B as the posttest, the other with B as the pretest and A as the posttest). Only tests that, through these experiments, systematically register improvements in student performance as a result of corresponding instruction (and demonstrate equivalence through the pre- and posttest swaps) will be included in our Distributed Accountability Exams.
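The logic of the four-group design above can be sketched in a few lines. This is an illustrative summary only: the function name, the data layout, and the tolerance for form equivalence are hypothetical, and a real validation study would use proper significance tests rather than a comparison of raw means.

```python
from statistics import mean


def summarize_four_groups(groups, max_form_gap):
    """Summarize a four-group pretest/posttest crossover.

    `groups` maps (taught, pretest_form) to a list of (pre, post) score
    pairs, where `taught` is True for groups that received the unit and
    `pretest_form` is "A" or "B" (the other form is the posttest).
    """
    # Mean gain per group, then averaged within taught and untaught arms.
    gains = {key: mean(post - pre for pre, post in pairs)
             for key, pairs in groups.items()}
    taught_gain = mean(g for (taught, _), g in gains.items() if taught)
    untaught_gain = mean(g for (taught, _), g in gains.items() if not taught)

    # Pretests are given before any instruction, so comparing forms A and B
    # on pretest scores indicates whether the two forms are equally hard.
    pre_a = mean(pre for (_, form), pairs in groups.items()
                 if form == "A" for pre, _ in pairs)
    pre_b = mean(pre for (_, form), pairs in groups.items()
                 if form == "B" for pre, _ in pairs)

    return {
        "instruction_effect": taught_gain - untaught_gain,
        "forms_equivalent": abs(pre_a - pre_b) <= max_form_gap,
    }
```

Under this sketch, an exam would qualify only if the instruction effect is clearly positive and the two forms look equivalent; the untaught groups guard against gains that come merely from taking a test twice.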

    Both pretest and end-of-unit exams in any given year will be drawn from a bank of tasks that will be developed as part of this validation process. Items or tasks for the DAEs will also be pretested and calibrated using standard classical and multidimensional IRT frameworks.


    Reliability

    Distributed Accountability Exams would contain a mix of short constructed-response items and more extended written responses, along with sets of multiple-choice items as appropriate to the standard being examined. Short and long constructed-response components would require human scoring. Research has established that when constructed-response tasks are well targeted, scoring rubrics are specific, and graders are trained, a high level of inter-rater reliability can be attained (Mariano & Junker, 2007; Patz et al., 2002; Ryan & Shepard, 2008).
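As a simple illustration of what inter-rater reliability means in practice, the sketch below computes Cohen's kappa, a standard chance-corrected agreement statistic for two raters. It is offered only as an example; the cited studies use their own, more elaborate rater models, and the function name here is hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' rubric scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of papers.")
    n = len(rater_a)

    # Proportion of papers on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's score frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        return 1.0  # Both raters used a single identical score throughout.
    return (observed - expected) / (1 - expected)
```

A kappa near 1 indicates that the raters agree far more often than chance alone would produce, which is the kind of evidence a grade-validation exercise would look for.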

    Student responses on constructed-response items could be graded locally (within the same school but not by the student's own teacher), or by geographically and socially remote scorers (including teachers elsewhere in the district or state). These grades could be validated using one of a number of methods that have been used in European countries (e.g., cross-school or cross-state grading exercises; regrading of a sample of student papers at the state level). Teacher participation in grading the exams and in the related validation exercises (some of which could be face-to-face) is a good process for professional learning and is used in most countries. Though the process is more costly in dollars than machine scoring, it is an educative process worth building into our Examination System. Grade validation at scale would be supported by the American Examination System platform, which can enable rapid, cost-effective remote scanning, transmission, grading, validation, and reporting. To ensure protection of student privacy rights, the system has the capacity to anonymize the digitized student work before routing it to the remote scorers and validators, as well as, for limited purposes, automatic essay scoring technologies.

    The Distributed Accountability Exams open the possibility for increased use of constructed responses because they are distributed over the course of the year, yielding several times more opportunity to collect data than current end-of-year tests. This also brings benefits in terms of increased test reliability. For instance, if the reliability of each single hour-long DAE were 0.7, the reliability of five DAEs taken together would be 5 * 0.7 / (1 + 4 * 0.7) = 0.92. If instead, half of each DAE's testing time were used for a pretest on the next instructional unit or simply for calibrating future test items, the improvement would be 2.5 * 0.7 / (1 + 1.5 * 0.7) = 0.85, still a high rate of reliability.
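The arithmetic above has the form of the Spearman-Brown prophecy formula from classical test theory, which predicts the reliability of a test lengthened by a factor k from the reliability of a single part. A minimal sketch follows; the function and parameter names are illustrative.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by `length_factor`.

    `reliability` is the reliability of one unit of testing (for example,
    a single hour-long DAE); `length_factor` is how many such units are
    combined, and need not be a whole number.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# The two cases worked in the text: five full DAEs, and five half-length DAEs.
five_full = spearman_brown(0.7, 5)      # rounds to 0.92
five_half = spearman_brown(0.7, 2.5)    # rounds to 0.85
```

The formula assumes the combined parts measure the same thing equally well, which is why it serves here as an illustration rather than a forecast.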

    Yet to obtain these more reliable results, students would not have to sit for a 5-hour exam, or even take an end-of-year exam. They would just have to take unit exams as they normally would in the course of teaching, but now with the unit exam contributing to an overall accountability score. Another advantage is that students would be tested on recently learned material at all times, so that nuisance effects of delayed recall would not influence measures of how well students were learning what the teachers taught; this would probably increase reliability even more.

    Although the illustration above is useful, in reality it likely will not be possible to string together DAEs into a single unidimensional measurement to which classical reliability calculations apply. Instead we believe the DAEs within a subject will be at least mildly multidimensional; if we consider each DAE within a subject within a year as a measure of one proficiency, five DAEs would be measuring five different but substantively related proficiencies. These proficiencies are likely to be statistically related as well. For example in NAEP, proficiency subscales within the same subject area are typically correlated 0.8 or


    Produce Results That Can Be Aggregated at the Classroom, School, District, and State Levels

    Yes.

    Produce Repor