
Dr. Mark Gierl, Professor and Canada Research Chair
Centre for Research in Applied Measurement and Evaluation

University of Alberta

“How You Can Learn To Love Large-Scale Assessment: Let Me Count the Ways”

An Outline For Our Future At The University of Alberta

Presentation at the Centre for Teaching and Learning (CTL) “Teaching Big” Symposium

University of Alberta—August, 2012

Centre for Research in Applied Measurement and Evaluation

TO BEGIN…

• Educational measurement is a discipline and a profession focused on the use of methodologies for assigning test scores to examinees, typically on a numeric scale, so we can make inferences about their knowledge, skills, and competencies

• Once a static and largely quantitatively-driven field, recent developments in the learning sciences, mathematical statistics, computer technology, educational psychology, and computing science are creating profound changes in educational measurement. As a result, our contemporary assessments barely resemble their predecessors of a decade ago


OVERVIEW

BACKGROUND

• Measurement, Evaluation, and Cognition (MEC) Program in the Department of Educational Psychology

• Centre for Research in Applied Measurement and Evaluation (CRAME)

PRESENTATION

• Four principles of testing in large classrooms

• Two applications for putting principles into practice

• Plea for our collective future

• My presentation today will have four key messages


OVERVIEW

• Measurement, Evaluation, and Cognition (MEC) is 1 of 8 areas in the Department of Educational Psychology

• Graduate students (16 currently) who receive an MEd or PhD in MEC specialize in educational measurement, statistics, research methods, cognition applied to assessment, and/or program evaluation

• Our graduates work in the private sector at testing companies like the Educational Testing Service (ETS) or in the public sector for different agencies (e.g., Alberta Education; Medical Council of Canada)

• MEC has five faculty members: Drs. Mark Gierl, Jacqueline Leighton, Ying Cui, Cheryl Poth, and Sharla King

• The Centre for Research in Applied Measurement and Evaluation (CRAME) is a centre within MEC focused on conducting research in the areas of educational measurement, cognitive psychology, and statistics with the goal of making assessment an integral part of learning and instruction


OVERVIEW

MESSAGE #1: Educational measurement is a specialized discipline where you can earn a graduate degree at both the MEd and PhD levels. This indicates that testing is embedded in a discipline that requires rigorous and comprehensive training

MESSAGE #2: You have colleagues at the University of Alberta who actually love to talk about tests and who train graduate students who also like and excel in our discipline [resources exist on campus]


HOW TO MAKE A GOOD MULTIPLE-CHOICE TEST ITEM

The item measures specific content, as outlined in the test specifications.

The item is based on an important topic in the curriculum and is designed to measure key thinking and problem-solving skills.

The item is carefully edited, formatted, and presented using correct grammar, punctuation, capitalization, and spelling.

The central idea is included in the stem, not the options.

The stem of the item is worded positively, and avoids negatives such as NOT or EXCEPT.

Only one of the options is clearly correct.

The correct option is not cued due to item-writing errors such as presenting a conspicuously correct option or blatantly incorrect options.

All of the distractors are plausible (e.g., basing distractors on typical errors made by students).

Etc., etc., etc., etc., etc., etc….

“TESTING TIPS BY MARK”


OUR FOUR PRINCIPLES

PRINCIPLE #1: We will shift from infrequent summative assessments (e.g., 2 midterms + final) to more frequent formative assessment (e.g., 8-10 exams or more per term)

PRINCIPLE #2: Testing on-demand is required where students can write exams at any time and at any location

PRINCIPLE #3: Assessments will be scored immediately and students will receive both instant and detailed feedback on their overall performance as well as their problem-solving strengths and weaknesses

PRINCIPLE #4: You will spend less time and less effort implementing these principles in your large classes compared to the amount of time you currently spend on assessment-related activities; in fact, much less
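A minimal sketch of what Principle #3 could look like for objectively-scored items, written in Python. The slides do not describe a scoring system, so everything here is an assumption for illustration: the item ids, the skill labels, the answer key, and the `score_exam` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical answer key: item id -> (correct option, skill the item targets).
ANSWER_KEY = {
    "q1": ("b", "diagnosis"),
    "q2": ("d", "diagnosis"),
    "q3": ("a", "management"),
    "q4": ("c", "management"),
}


def score_exam(responses, answer_key):
    """Return the overall proportion correct and a per-skill breakdown."""
    correct_by_skill = defaultdict(int)
    total_by_skill = defaultdict(int)
    for item_id, (key, skill) in answer_key.items():
        total_by_skill[skill] += 1
        # An unanswered item simply counts as incorrect.
        if responses.get(item_id) == key:
            correct_by_skill[skill] += 1
    overall = sum(correct_by_skill.values()) / len(answer_key)
    by_skill = {
        skill: correct_by_skill[skill] / total
        for skill, total in total_by_skill.items()
    }
    return overall, by_skill


overall, by_skill = score_exam(
    {"q1": "b", "q2": "d", "q3": "a", "q4": "b"}, ANSWER_KEY
)
print(overall)   # 0.75
print(by_skill)  # {'diagnosis': 1.0, 'management': 0.5}
```

Because scoring is a lookup against the key, the overall score and the per-skill breakdown are available as soon as the student submits.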


APPLICATION #1: COMPUTER-BASED TESTING

COMPUTER-BASED TESTING

[Diagram: Test Development, Test Administration, Test Reporting]

PAPER-BASED TESTING

COMPUTER-BASED TESTING

[Diagram slides, one labelled AUTOMATED]

• In short, computer-based testing is a very good thing and it is here to stay—computer-based testing either eliminates or automates 2/3 of the testing activities that, currently, you do manually

• Admittedly, we are focusing on examples that use objectively-scored assessment items—but examples can also be cited for automated essay scoring of student-produced assessment tasks

• The architecture for a computer-based testing system is feasible [PAPER-BASED TESTING IS DEAD]

MESSAGE #3: The University of Alberta needs a computer-based testing system because YOU need this system for all of your classes, big and small

COMPUTER-BASED TESTING


COMPUTER-BASED TESTING

[Diagram: Test Development, Test Administration, Test Reporting; labels: *ELIMINATED*, *AUTOMATED*]


APPLICATION #2: AUTOMATIC ITEM GENERATION

AUTOMATIC ITEM GENERATION


ONE WAY TO CREATE TEST ITEMS…

Professor writing test items the day before the midterm exam…


AUTOMATIC ITEM GENERATION

Another way to address this item development challenge is with automatic item generation (AIG)

Automatic item generation is the process of using item models to generate test items with the aid of computer technology—with this approach, hundreds or even thousands of items can be generated with a single item model

While the idea of automatic item generation may sound like a “dream come true,” I am here to tell you that the dream is well within our reach because of developments in modern educational measurement theory

A 54-year-old woman has a laparoscopic cholecystectomy. On post-operative day 3 she has a temperature of 38.5 °C. Physical examination reveals a red and tender wound and calf tenderness. Which one of the following is the best next step?

a. Mobilize
b. Antibiotics
c. Anticoagulation
d. Reopen the wound

AUTOMATIC ITEM GENERATION

[Figure: cognitive model diagram]

• That ugly diagram is a cognitive model highlighting the knowledge, skills, and content required to make a medical diagnosis

The model includes three key outcomes:

1. Identify THE PROBLEM (i.e., Post-Operative Fever);

2. Specify SOURCES OF INFORMATION required to diagnose the problem (e.g., Type of Surgery); and

3. Describe KEY FEATURES within each information source (e.g., Guarding and Rebound) needed to create different instances of the problem

AUTOMATIC ITEM GENERATION


• Next, an item model is created, where an item model is like a template or a mould of the assessment task (i.e., it’s a target where we want to place the content in the test item)

A 54-year-old woman has a <Type of Surgery>. On post-operative day <Timing of Fever> the patient has a temperature of 38.5 °C. Physical examination reveals <Physical Examination>. Which one of the following is the best next step?

Type of Surgery: Gastrectomy, Right Hemicolectomy, Left Hemicolectomy, Appendectomy, Laparoscopic Cholecystectomy

Timing of Fever: 1 to 6 days

Physical Examination: Red and Tender Wound, Guarding and Rebound, Abdominal Tenderness, Calf Tenderness

AUTOMATIC ITEM GENERATION


• Finally, we combine this information systematically to produce new items

• To accomplish this complex combinatoric task, we created software for item generation called IGOR (Item GeneratOR)

• IGOR was programmed using Java
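The combination step can be sketched in a few lines of Python. This is not IGOR and makes no claim about how IGOR works: it is a minimal illustration that fills the post-operative fever item model with every combination of the elements listed on the slide. It assumes one physical examination finding per item and no constraints between elements, and it generates stems only, not options or answer keys. The names `STEM`, `ELEMENTS`, and `generate_stems` are hypothetical.

```python
from itertools import product

# Item model (stem template) and element values, copied from the slide.
STEM = (
    "A 54-year-old woman has a {surgery}. On post-operative day {day} "
    "the patient has a temperature of 38.5 °C. Physical examination "
    "reveals {finding}. Which one of the following is the best next step?"
)

ELEMENTS = {
    "surgery": [
        "gastrectomy",
        "right hemicolectomy",
        "left hemicolectomy",
        "appendectomy",
        "laparoscopic cholecystectomy",
    ],
    "day": list(range(1, 7)),  # timing of fever: 1 to 6 days
    "finding": [
        "a red and tender wound",
        "guarding and rebound",
        "abdominal tenderness",
        "calf tenderness",
    ],
}


def generate_stems(stem, elements):
    """Yield one stem per combination of element values (no constraints)."""
    names = list(elements)
    for values in product(*(elements[name] for name in names)):
        yield stem.format(**dict(zip(names, values)))


stems = list(generate_stems(STEM, ELEMENTS))
print(len(stems))  # 5 surgeries * 6 days * 4 findings = 120
print(stems[0])
```

Even this small model yields 120 distinct stems. Adding elements or allowing combined findings multiplies the count quickly.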

AUTOMATIC ITEM GENERATION


• When we used our method with 5 different item models developed for the MCC QE Part I in surgery, more than 20,000 items were generated:

Item Model 1: Gallstones: 288
Item Model 2: Hernias: 256
Item Model 3: Aneurysm: 5,184
Item Model 4: Post Operation Management: 7,488
Item Model 5: Post Operation Fever: 7,680

• We have also developed item models at the K-12 level in Language Arts, Social Studies, Science, and Math, as well as in AP Biology and Architecture, in addition to 10 different content areas in Medicine, producing millions of test items
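As a quick check on the “more than 20,000” figure, the five per-model counts reported above can be summed. The dictionary name and keys below are just labels for this illustration.

```python
# Items generated per item model, as reported on the slide.
items_per_model = {
    "Item Model 1": 288,    # Gallstones
    "Item Model 2": 256,    # Hernias
    "Item Model 3": 5184,   # Aneurysm
    "Item Model 4": 7488,   # Post Operation Management
    "Item Model 5": 7680,   # Post Operation Fever
}

total_items = sum(items_per_model.values())
print(f"{total_items:,}")  # 20,896
```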

AUTOMATIC ITEM GENERATION


16. A 60-year-old woman has been booked for a laparoscopic cholecystectomy for symptomatic gallstones. Prior to her surgery, she presents to the Emergency Department with a history of feeling faint and unwell. She has had rigors. On physical examination, her temperature is 40 °C. Her white blood count is 22 × 10⁹/L; aspartate aminotransferase 63 U/L; alanine aminotransferase 78 U/L; alkaline phosphatase 450 U/L; amylase level 200 U/L and bilirubin 50 µmol/L. Which one of the following is the most likely diagnosis?

(a) Cholecystitis.
(b) Cholangitis.
(c) Pancreatitis.
(d) Hepatic abscess.
(e) Duodenal ulcer.

39. An obese 61-year-old male collapsed with sudden pain at a shopping center and is brought to hospital by ambulance. He is diaphoretic. His pulse is 96/minute; blood pressure 100/70 mm Hg; he complains of severe pain in his abdomen and left flank. Which one of the following is the most likely diagnosis?

(a) Acute hemorrhagic pancreatitis.
(b) Ruptured aortic aneurysm.
(c) Mesenteric vascular occlusion.
(d) Acute diverticulitis.
(e) Volvulus of sigmoid colon.

AUTOMATIC ITEM GENERATION


• Educational measurement is a specialized discipline requiring advanced graduate training—this implies that assessment contains many complex and thorny issues but please remember that you have colleagues on-campus who can help you deal with these issues

• Our discipline is undergoing profound changes that will yield much better methods for evaluating students while at the same time requiring less time and effort for the examiner because much of the unpleasant work is being automated—computer-based testing and automatic item generation are but two examples from a list of many

CONCLUSION

MESSAGE #4: There is no going back to the “good old days”…therefore, we must work together to structure our future at the University of Alberta by building and implementing these new assessment systems…but also recognize that this work is just getting started


THANK YOU

Dr. Mark J. Gierl ([email protected])

6-110 Education Centre North