Introduction & Rationale

Upload: harper-buchanan

Post on 31-Dec-2015


DESCRIPTION

Introduction & Rationale. Lin Chen-Yung (林陳涌), Department of Biology, National Taiwan Normal University (師大). The two dimensions of tests: Psychometric and Edumetric. Carver, R.P. (1974). Two dimensions of tests: Psychometric and edumetric. American Psychologist, July, pp. 512-518. 1. Purpose: to assess individual differences. 2. Item selection: choose items with a P value of 50% so that the D value is maximal and the score variance is large; only then can reliability and validity be satisfactory. - PowerPoint PPT Presentation

TRANSCRIPT

  • Introduction & Rationale

  • Two dimensions of tests: Psychometric and Edumetric

    Carver, R.P. (1974). Two dimensions of tests: Psychometric and edumetric. American Psychologist, July, pp. 512-518.

  • Dimensions of tests: Psychometric vs. Edumetric

    1. Purpose. Psychometric: to assess individual differences. Edumetric: to assess each learner's gain/growth.

    2. Item selection. Psychometric: choose items with a P value near 50% so that the D value is maximal and the score variance is large; only then can reliability and validity be satisfactory. Edumetric: choose items whose percent correct moves from 0% toward 100% with instruction (sensitive to gain).
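The psychometric item-selection rule can be sketched numerically. A minimal illustration (mine, not from the slides), assuming dichotomous 0/1 item scoring: a 0/1 item's score variance is p(1 - p), which peaks at P = 50%, which is why psychometric item selection favors mid-difficulty items.

```python
# Illustrative sketch (not from the slides): for a 0/1-scored item,
# score variance is p(1 - p), maximized at difficulty P = 0.5.

def item_difficulty(responses):
    """P value: proportion of examinees answering the item correctly."""
    return sum(responses) / len(responses)

def item_variance(p):
    """Variance of a dichotomous item with difficulty p."""
    return p * (1 - p)

# Variance rises toward P = 0.5 and falls off on either side:
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"P = {p}: variance = {item_variance(p):.2f}")
```

An edumetric test, by contrast, deliberately uses items far from P = 50% before instruction, accepting low variance in exchange for sensitivity to gain.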

  • Dimensions of tests: Psychometric vs. Edumetric (continued)

    3. Reliability. Psychometric: consistency of individual differences, summarized by the SEm; dependent on score variances. Edumetric: consistency of measured gain across alternate forms; NOT dependent on variances.

    4. Validity. Psychometric: convergent & discriminant validity. Edumetric: sensitivity to gain.

  • E. L. Thorndike: If a thing exists, it exists in some amount.

    If it exists in some amount, it can be measured.

  • A grade is: an inadequate report of an inaccurate judgment by a biased and variable judge of the extent to which a student has attained an undefined level of mastery of an unknown proportion of an indefinite amount of material. Paul Dressel (1957). Basic College Quarterly.

  • D. L. Stufflebeam: The purpose of evaluation is to improve, NOT to prove.

  • Measurement vs. Evaluation: evaluation is a judgment of merit, usually qualitative; measurement is quantitative.

  • Assessment (or testing)

  • Ability Tests

    Assess the performance or level of skills of individuals in well-defined subject areas. (Satterly, 1990)

  • Aptitude Tests

    Indicate the probability with which new material will be learned. (Satterly, 1990)

  • Cognition

    Includes the processes of perception, thinking, reasoning, understanding, problem solving, and remembering. (Satterly, 1990)

  • Cognitive Style Tests

    Assess a person's typical approach or ways of learning and thinking in a variety of tasks. (Satterly, 1990)

  • Learning Ability Tests

    Seek to measure the ability to respond to instruction and so are measures of potential rather than achievement. (Hegarty, 1990)

  • 1. Evaluation in the Teaching of Science vs. Evaluation of Science Teaching

  • 1. Achievement Test 2. Aptitude Test 3. Intelligence Test

  • 1. Preference Test 2. Belief

  • 1. Aptitude Test 2. Intelligence Test

  • Placement Evaluation, Diagnostic Evaluation, Formative Evaluation, and Summative Evaluation

  • 1. Norm-Referenced Evaluation: scores interpreted relative to a norm group. 2. Criterion-Referenced Evaluation: scores interpreted against a preset criterion.
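The two score interpretations can be contrasted in a small sketch with made-up numbers (illustrative, not from the slides): a norm-referenced score is read as a percentile rank within a norm group, while a criterion-referenced score is read as percent mastery of specified objectives. Note that percentile rank has several common definitions; the one below counts only scores strictly below.

```python
# Illustrative sketch with hypothetical data (not from the slides).

def percentile_rank(score, norm_group):
    """Norm-referenced: percent of the norm group scoring below `score`.
    (One of several common definitions of percentile rank.)"""
    below = sum(1 for s in norm_group if s < score)
    return 100 * below / len(norm_group)

def percent_mastery(objectives_mastered, total_objectives):
    """Criterion-referenced: percent of specified objectives mastered."""
    return 100 * objectives_mastered / total_objectives

norm = [55, 60, 62, 70, 75, 80, 88, 90]   # hypothetical norm group
print(percentile_rank(75, norm))           # 50.0: relative standing
print(percent_mastery(18, 20))             # 90.0: absolute mastery
```

The same raw score thus carries different meanings under the two interpretations: relative standing among peers versus absolute mastery of a domain.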

  • NRE vs. CRE and NRT vs. CRT: see the comparison tables below.

  • 1. Measurement, Test, Assessment, Evaluation 2. Summative Evaluation vs. Formative Evaluation

  • 1. … 2. Construct 3. … 4. …

  • Predicted Trends in Measurement and Evaluation of Science Instruction (From . . . To . . .)

    1. From primarily group-administered tests to a variety of administrative formats including large groups, small groups, and individuals.

    2. From primarily paper-and-pencil tests to a variety of test formats including pictorial and laboratory performance tests.

    3. From primarily end-of-course summative assessment to a variety of pretest, diagnostic, and formative types of measurement.

    4. From primarily measurement of low-level cognitive outcomes to the inclusion of higher-level cognitive outcomes (analysis, evaluation, critical thinking), as well as the measurement of affective (attitudes, interests, and values) and psychomotor outcomes.

    5. From primarily norm-referenced achievement testing to the inclusion of more criterion-referenced assessment, mastery testing, and self and peer evaluation.

    6. From primarily measurement of facts and principles of science to the inclusion of objectives related to the processes of science, the nature of science, and the interrelationship of science, technology, and society.

    7. From primarily measurement of student achievement to the inclusion of measuring the effects of programs, curricula, and teaching techniques.

    8. From primarily teacher-made tests to the combined use of teacher-made tests, standardized tests, research instruments, and items from collections assembled by teachers, projects, and other sources.

    9. From primary concern with total test scores to interest in sub-test performance, item difficulty, and discrimination, all aided by mechanical and computerized facilities.

    10. From a primarily one-dimensional format of evaluation (e.g., a numerical or letter grade) to a multidimensional system of reporting student progress with respect to such variables as concepts, processes, laboratory procedures, classroom discussion, and problem-solving skills.

  • National Science Education Standards

    Assessment Standards

  • Assessment Standards (National Science Education Standards)

    A. Assessments must be consistent with the decisions they are designed to inform: assessments are deliberately designed; assessments have explicitly stated purposes; the relationship between the decisions and the data is clear.

    B. Achievement and opportunity to learn science must be assessed: achievement data collected focus on the science content that is most important for students to learn; opportunity-to-learn data collected focus on the most powerful indicators; equal attention must be given to the assessment of opportunity to learn and to the assessment of student achievement.

    C. The technical quality of the data collected is well matched to the decisions and actions taken on the basis of their interpretation: the feature that is claimed to be measured is actually measured; assessment tasks are authentic; students have adequate opportunity to demonstrate their achievements; assessment tasks and methods of presenting them provide data that are sufficiently stable to lead to the same decisions if used at different times.

    D. Assessment practices must be fair. (Stereotype, language, …)

    E. The inferences from assessments about student achievement and opportunity to learn must be sound.


  • The End!

  • 1. Peer Evaluation 2. Multi-talent Evaluation 3. Evaluation in Learning Through Inquiry: IRA (Inquiry Role Approach)

    4. Laboratory Work Evaluation: 1) Science Process Skills 2) … 3) … 5. Self-Evaluation

  • Programmed Instruction

  • NRT-CRT.doc

    Comparisons Between NRT and CRT

    Attribute: State of the Art
      Norm-Referenced Test (NRT): Highly developed; technically sound.
      Criterion-Referenced Test (CRT): Mixed & variable; technology developing.

    Attribute: Developmental Cost
      NRT: Major.
      CRT: Moderate to major.

    Attribute: Content Validity & Coverage
      NRT: Based on a specified content domain, appropriately sampled, and tending to have fewer items per objective. Tends to be general and broad.
      CRT: Based on a specified content domain, appropriately sampled, and tending to have more items per objective. Tends to be specific and narrow.

    Attribute: Score Interpretation
      NRT: In terms of a specified norm group (e.g., percentile ranks, grade equivalents).
      CRT: In terms of a specified criterion of proficiency (e.g., percent mastery).

  • NRT-CRT.doc (continued)

    Comparisons Between NRT and CRT

    Attribute: Item Development
      Norm-Referenced Test (NRT): Two main considerations: content validity and item discrimination.
      Criterion-Referenced Test (CRT): One main consideration: content validity.

    Attribute: Standardized
      NRT: Yes.
      CRT: Usually.

    Attribute: Sensitivity to Instruction
      NRT: Tends to be low to moderate, because of its general-purpose nature.
      CRT: Tends to be high, when closely matched to a particular instructional situation.

    Attribute: Reliability
      NRT: High.
      CRT: Can be high, but sometimes hard to establish.

    Attribute: Application
      NRT: To assess the effectiveness of given instructional treatments in achieving general instructional objectives.
      CRT: To assess the effectiveness of given instructional treatments in achieving specific instructional objectives.
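The item-discrimination consideration in the Item Development row can be sketched with the classical upper-group/lower-group discrimination index, a standard technique the slides name but do not spell out: D is the difference in the proportion answering the item correctly between high and low scorers on the test as a whole.

```python
# Classical discrimination index D for one item:
# D = p(correct | high-scoring group) - p(correct | low-scoring group).
# Hypothetical 0/1 response data, for illustration only.

def discrimination_index(high_group, low_group):
    """high_group / low_group: lists of 0/1 responses to one item."""
    p_high = sum(high_group) / len(high_group)
    p_low = sum(low_group) / len(low_group)
    return p_high - p_low

print(discrimination_index([1, 1, 1, 0], [1, 0, 0, 0]))  # 0.5
```

A large positive D means high scorers get the item right more often than low scorers, which is exactly the property the NRT, but not necessarily the CRT, selects for.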
