jiptiain iffahmursy 8344 6 mythesis


Upload: alpacino-qoeda

Post on 09-Jul-2016


Page 1: Jiptiain Iffahmursy 8344 6 Mythesis


CHAPTER I

INTRODUCTION

1.1 Background of the study

In teaching and learning activities, testing plays an important role. Teaching without evaluation or testing is of little use, because testing helps to show whether the objectives of education have been achieved. The result of a test shows whether the teaching and learning process has been successful. Both testing and teaching are so closely interrelated that it is virtually impossible to work in either field without being constantly concerned with the other.1

It is clear that the relation between testing and teaching cannot be ignored. Teachers, students, and schools want to know whether their efforts to achieve the educational objectives are successful. They will be satisfied if their efforts are successful; if not, they will change their ways.

A test is used to provide information concerning not only the individual student's performance but also the effectiveness of teaching and learning activities. A test is also one type of measurement used to measure students' behavior as the goal of instruction. For teachers, a test is used to measure the effectiveness of teaching and learning activities.

1 Heaton, J.B. 1988. Writing English Language Tests. New York: Longman. pg. 5

The classroom test is concerned with evaluation for the purpose of enabling teachers to increase their own effectiveness by making appropriate adjustments in their teaching so that the students in the class benefit more.2

A test is also used to select students for the next level; it shows whether a student is competent to enter a higher level.3 The test may be made to determine whether students should enter a program or whether they are ready to continue to the next level.

Besides the purposes above, a test is also used to diagnose the students' areas of strength and weakness.4 If the test is of sufficient quality, the teacher will know the strengths and weaknesses of the students and, knowing their weaknesses, can help to solve their problems. In line with Arikunto, Heaton states that a good classroom test will also help to locate the precise areas of difficulty encountered by the class or by the individual student.5 Therefore, it is necessary for teachers to know their students' weaknesses and difficulties.

2 Heaton, J.B. 1988. Writing English Language Tests. New York: Longman. pg. 6

3 Bachman, Lyle F. 1990. Fundamental Considerations in Language Testing. USA: Oxford University Press. pg. 58

4 Arikunto, Suharsimi. 2003. Prosedur Penelitian Suatu Pendekatan Praktik. Jakarta: PT. Rineka Cipta. pg. 10

Because testing is important in teaching, teachers as test constructors should be able to construct a good test. Teachers who construct a good test contribute to their students' education. On the other hand, teachers who lack skill in constructing a good test contribute less, or might even make their students' education worse. A test fulfills the purpose of testing if it has the characteristics of a good test. There are many ways to judge the quality of a test.

Evaluation experts consistently mention both validity and reliability. It can be said that validity and reliability are the important characteristics of a good test.

Tests can be classified in two ways: according to the role of the test and according to the test maker. A standardized test is made by professional testing services; it is tried out, analyzed, and revised before being used. UAN and SPMB are examples of this kind of test.

5 Heaton, J.B. 1988. Writing English Language Tests. New York: Longman. pg. 6

A standardized test is designed to be used with thousands and sometimes hundreds of thousands of subjects throughout the nation or the world, and is prepared (and perhaps administered, scored, and interpreted) by a team of testing specialists.6

However, a teacher-made test is made by the teacher or by a group of teachers without being tried out, analyzed, and revised first. Arikunto describes it as a test made by the teacher himself or by a group of teachers using test items that have not been tried out, analyzed, or revised.7 Since the test is prepared, administered, and scored by one teacher without being tried out, analyzed, and revised, the reliability of the teacher-made test is considered to be low: it has average or lower reliability than a standardized test. As a result, the test may fall short of expectations. UAS and UTS are examples of teacher-made tests.

Teachers as test constructors should be able to construct a good test for their students. A good test should be valid and reliable. However, the quality of a test made by the teacher is doubtful because the test is not analyzed by anyone else. Whether the test is valid and reliable remains in question, since teachers seldom analyze and revise the tests they make; they tend to use unanalyzed and unrevised test items. This is supported by Arikunto, who states that teachers rarely use analyzed and revised test items. Given this fact, the validity and reliability of the teacher-made test are doubtful; they can be low or even unknown. Teachers should therefore analyze their tests so that they know which items can be used and which items should be revised. Based on the facts above, this study investigates the quality of a teacher-made test.

6 Harris, David P. 1969. Testing English as a Second Language. USA: McGraw-Hill. pg. 2

7 Arikunto, Suharsimi. 2003. Prosedur Penelitian Suatu Pendekatan Praktik. Jakarta: PT. Rineka Cipta. pg. 144

Some previous studies have examined content validity, reliability, index of difficulty, and index of discrimination. This study also analyzes these elements, but it differs from the previous studies in three ways: it uses the English curriculum to analyze the content validity; it analyzes two forms of objective test, multiple choice and completion; and its subjects are first-year students of senior high school.

This study focuses on analyzing the teacher-made English test items in UAS semester 2 2008/2009 for the first-year students of SMA Muhammadiyah 2 Sidoarjo with respect to content validity, reliability, item difficulty, and discrimination index. The test uses the multiple-choice and completion forms. Here, the teacher does not use a standardized test but a teacher-made test, which means that the test is prepared, administered, and scored by the teacher himself or herself. The teacher-made English test items in UAS semester 2 2008/2009 for the first-year students of SMA Muhammadiyah 2 Sidoarjo are therefore analyzed to find out whether they are constructed in the right way, following the right principles.

1.2 Statement of the problem

Based on the background of the study above, the research questions are formulated as follows:

1. How is the content validity of the teacher-made English test?

2. How is the reliability of the teacher-made English test?

3. How is the index of difficulty of the teacher-made English test?

4. How is the index of discrimination of the teacher-made English test?


1.3 Objectives of the study

Based on the statement of the problem above, the objectives of the study are as follows:

1. To find out the content validity of the teacher made English test

items in UAS semester 2 2008/2009 of the first year students of

SMA Muhammadiyah 2 Sidoarjo.

2. To find out the reliability of the teacher made English test items in

UAS semester 2 2008/2009 of the first year students of SMA

Muhammadiyah 2 Sidoarjo.

3. To find out the index of difficulty of the teacher made English test

items in UAS semester 2 2008/2009 of the first year students of

SMA Muhammadiyah 2 Sidoarjo.

4. To find out the index of discrimination of the teacher made English

test items in UAS semester 2 2008/2009 of the first year students of

SMA Muhammadiyah 2 Sidoarjo.


1.4 Significance of the study

This study is expected to be useful for:

1. The English teachers

This study is expected to be useful for the teachers of SMA in Sidoarjo, as the constructors of the test items, in constructing good English test items in the future and in deciding which items should be kept and which should be revised, so that the test becomes valid and reliable.

2. The students

For the students, this kind of test will show their real achievement in learning. The students will also know their ability when they do the test in the right way. Knowing the result of their test, they will know how far they understand the lesson and whether they deserve to enter the next level.

3. Those who are involved in the teaching learning process

The findings of this study can be used to determine the effectiveness of the teaching and learning process at schools and districts by making a comparison with other schools or districts. The findings can also be used as valuable information for constructing a good test and as a comparison between the item analysis in one school and that in another.

1.5 Scope and limitation

The scope of this study is limited to the English final test (UAS) for first-year students of senior high school. In this study, the quality of the multiple-choice items and the completion items is discussed based on the students' answers and scores. The test consists of fifty items: forty-five multiple-choice items and five completion items. The variety of item types is used to get an objective result. Here, the answers and scores of the first-grade students of SMA Muhammadiyah 2 Sidoarjo are observed.

1.6 Definition of key terms

To avoid misunderstanding and misinterpretation of the terms used in this study, the following definitions are given:

1. Content validity

Content validity depends on a careful analysis of the language being tested and of the particular course objectives.


2. Reliability

The reliability of a test is a matter of how consistently it produces similar results on different occasions under similar circumstances.

3. Item analysis

Item analysis is an examination of the test items from the point of view of their difficulty level and their level of discrimination.

4. Item difficulty

The index of difficulty shows how easy or difficult the particular item proved in the test.

5. Discrimination index

The discrimination index of an item indicates the extent to which the item discriminates between the testees, separating the more able testees from the less able.

6. Test

A test is an examination or trial of the quality of a person or thing; to test is to examine and measure the qualities or the knowledge of a person.
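The definitions of item difficulty and discrimination index can be illustrated with a small sketch. The formulas used here are assumptions, since this chapter does not state them: difficulty is taken as the proportion of testees who answer the item correctly, and discrimination as the difference between the number of correct answers in an upper and a lower group (ranked by total score), divided by the group size. The function names and the `fraction` parameter are hypothetical.

```python
from typing import List


def item_difficulty(item_scores: List[int]) -> float:
    """Proportion of testees who answered the item correctly (1 = correct, 0 = wrong)."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(item_scores: List[int],
                         total_scores: List[int],
                         fraction: float = 0.5) -> float:
    """Upper-lower group index: (correct in upper - correct in lower) / group size.

    Testees are ranked by total test score; `fraction` (an assumed parameter)
    sets the share of testees placed in each group.
    """
    n = len(item_scores)
    group_size = max(1, int(n * fraction))
    # Indices of testees ordered from highest to lowest total score.
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = ranked[:group_size], ranked[-group_size:]
    correct_upper = sum(item_scores[i] for i in upper)
    correct_lower = sum(item_scores[i] for i in lower)
    return (correct_upper - correct_lower) / group_size


if __name__ == "__main__":
    # Made-up data for eight testees: answers to one item and their total scores.
    item = [1, 1, 1, 0, 1, 0, 0, 0]
    totals = [48, 45, 40, 38, 30, 25, 20, 15]
    print(item_difficulty(item))               # 0.5
    print(discrimination_index(item, totals))  # 0.5
```

In this made-up example, half of the testees answer correctly, and three of the four higher scorers do so against one of the four lower scorers, so the item separates the more able testees from the less able.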


CHAPTER II

REVIEW OF RELATED LITERATURE

2.1 The Definition of Evaluation, Measurement, Testing and Assessment

We are sometimes confused by the terms evaluation, measurement, and testing because they are often used synonymously. To distinguish their meanings, consider the following definitions. Evaluation refers to the act or process of determining the value of something.8 In addition, Gronlund states that evaluation is a qualitative description of pupil behavior. Bachman defines measurement as the process of quantifying the characteristics of persons according to explicit procedures and rules.9 In line with Bachman, Gronlund states that measurement is a quantitative description of pupil behavior. Measurement means the act or process of ascertaining the extent or quantity of something.10 From those definitions, we know the differences between evaluation and measurement.

8 Nurkancana, Wayan and P.P.N. Sumartana. 1986. Evaluasi Pendidikan. Surabaya: Usaha Nasional. pg. 2

9 Bachman, Lyle F. 1990. Fundamental Considerations in Language Testing. USA: Oxford University Press. pg. 18

In the educational process, measurement refers to the quantitative and evaluation refers to the qualitative. Nurkancana and Sumartana also differentiate the terms: measurement is used to answer the question “how much”, while evaluation is used to answer the question “what value”.

Although evaluation and measurement are different, they are related to each other. Assessment of a program's outcomes or results (evaluation) is facilitated by measurement.11 In addition, Arikunto states that to evaluate something, we measure it first. It means that evaluation should be based on measurement. For example, to evaluate students' reading ability, the teacher has to know the students' comprehension in reading. On the other hand, measurement will be useless if we do not evaluate it: after we measure something, we evaluate it.12 For example, if the students comprehend the reading text well, we can say that their reading ability is good.

10 Nurkancana, Wayan and P.P.N. Sumartana. 1986. Evaluasi Pendidikan. Surabaya: Usaha Nasional. pg. 2

11 Tuckman, Bruce W. 1975. Measuring Educational Outcomes: Fundamentals of Testing. USA: Harcourt Brace Jovanovich. pg. 12

As for testing, a test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual.13 A test can be considered a device typically used to find out something about a person.14 In addition, Arikunto states that a test is a device or procedure used to find out or to measure something. Here, a test is used to measure the change in an individual's behavior as the goal of instruction. By giving a test, the teacher can find out the change in the students' behavior. The objectives of language testing are as follows:15

1. To determine readiness for instructional programs.

2. To classify or place individuals in appropriate language classes.

3. To diagnose the individual's specific strengths and weaknesses.

4. To measure aptitude for learning.

5. To measure the extent of student achievement of the instructional goals.

6. To evaluate the effectiveness of instruction.

12 Arikunto, Suharsimi. 1986. Dasar-dasar Evaluasi Pendidikan. Jakarta: Bumi Aksara. pg. 2

13 Bachman, Lyle F. 1990. Fundamental Considerations in Language Testing. USA: Oxford University Press. pg. 20

14 Tuckman, Bruce W. 1975. Measuring Educational Outcomes: Fundamentals of Testing. USA: Harcourt Brace Jovanovich. pg. 12

15 Harris, David P. 1969. Testing English as a Second Language. USA: McGraw-Hill. pg. 2

Looking at the explanations above, we can conclude that a test is an

instrument to give information about the student’s ability and to decide

something dealing either with the students or the teaching learning process.

We might be tempted to think of testing and assessing as

synonymous terms, but they are not. Tests are prepared administrative

procedures that occur at identifiable times in a curriculum when learners

muster all their faculties to offer peak performance, knowing that their

responses are being measured and evaluated.

Assessment, on the other hand, is an ongoing process that encompasses a much wider domain. Whenever a student responds to a question, offers a comment, or tries out a new word or structure, the teacher subconsciously makes an assessment of the student's performance.16

16 Brown, Douglas H. 2004. Language Assessment: Principles and Classroom Practices. USA: San Francisco State University. pg. 6

2.2 Types of Test

2.2.1 Types of Test According to Its Role

According to their role in teaching, Gronlund categorizes tests into four categories: placement, diagnostic, formative, and summative tests.17

2.2.1.1 Placement Test

A placement test is concerned with the student's entry behavior in a sequence of instruction. Its goal is to determine the position in the instructional sequence and the mode of instruction that are most likely to provide optimum achievement for each student.

2.2.1.2 Diagnostic Test

A diagnostic test is concerned with the student's persistent learning difficulties that are left unresolved by the standard corrective prescriptions of formative evaluation. In other words, a diagnostic test is a test of student learning difficulties during instruction. Its primary aim is to determine the causes of learning problems and to formulate a plan for remedial action.

17 Gronlund, Norman E. 1976. Measurement and Evaluation in Teaching. New York: Macmillan Publishing. pg. 16

2.2.1.3 Formative Test

A formative test is concerned with the student's learning progress during instruction and is used to monitor that progress. Its purpose is to provide continuous feedback to both students and teacher. Feedback to students provides reinforcement of successful learning and identifies the specific learning errors that need correction. Feedback to the teacher provides information for modifying instruction and for prescribing group and individual remedial work. Since a formative test is directed toward improving learning and instruction, the results are typically not used for assigning course grades.

2.2.1.4 Summative Test

A summative test is concerned with the student's achievement at the end of instruction. It is designed to determine the extent to which the instructional objectives have been achieved and is used primarily for assigning course grades or for certifying student mastery of the intended learning outcomes. The goal of this test is not only grading or certifying student mastery but also judging the appropriateness of the course objectives and the effectiveness of the instruction.

In line with Gronlund, Johnson and Johnson state that a summative test is conducted at the end of an instructional unit or semester to judge the final quality and quantity of student achievement and the success of the instructional program.18 In curriculum 2004, the summative test is known as UAS (Ujian Akhir Semester) or final form test.

2.2.2 Types of Test According to The Test Maker

Besides the types according to their role in teaching, there are types of test according to the test maker. Harris categorizes tests according to the test maker into two categories:19 standardized tests and teacher-made tests.

2.2.2.1 Standardized Test

A standardized test is a test made by professional testing services; it is tried out, analyzed, and revised before being used. A standardized test is designed to be used with thousands and sometimes hundreds of thousands of subjects throughout the nation or the world, and is prepared (and perhaps administered, scored, and interpreted) by a team of testing specialists.20

18 Johnson, David W. and Roger T. Johnson. 2002. Meaningful Assessment: A Manageable and Cooperative Process. USA: Allyn and Bacon. pg. 7

19 Harris, David P. 1969. Testing English as a Second Language. USA: McGraw-Hill. pg. 1

In addition, standardized tests are prepared for nationwide use (usually commercial) to provide accurate and meaningful information on students' level of performance relative to others at their age or grade levels.21 Johnson and Johnson also state that such tests are usually constructed by subject matter specialists and experts on testing. To make the test scores comparable, the tests are administered and scored under carefully controlled conditions. It means that a standardized test should be tried out, analyzed, and revised before being used. UAN and SPMB are examples of standardized tests.

According to Arikunto, the characteristics of a standardized test are as follows:22

1. Based on the content and the general goals for all schools in the country.

2. Related to general knowledge or capability.

3. Developed by professors, reviewers, and editors of test items.

4. Using items that are tried out, analyzed, and revised before being used for a test.

5. Having high reliability.

6. Having norms which represent the whole performance of schools in the country.

20 Harris, David P. 1969. Testing English as a Second Language. USA: McGraw-Hill. pg. 1

21 Johnson, David W. and Roger T. Johnson. 2002. Meaningful Assessment: A Manageable and Cooperative Process. USA: Allyn and Bacon. pg. 53

22 Arikunto, Suharsimi. 1986. Dasar-dasar Evaluasi Pendidikan. Jakarta: Bumi Aksara. pg. 144

2.2.2.2 Teacher-Made Test

A teacher-made test is made by the teacher himself or by a group of teachers without being tried out, analyzed, and revised first. Classroom tests are generally prepared, administered, and scored by one teacher.23 In addition, Arikunto states that the teacher-made test is a test made by the teacher himself or by a group of teachers using test items that have not been tried out, analyzed, or revised.24

23 Harris, David P. 1969. Testing English as a Second Language. USA: McGraw-Hill. pg. 1

The teacher-made test is used to measure the students' achievement of the given objectives after the teaching and learning process is finished. It is made by the teacher based on his or her own objectives, and it is not tried out, analyzed, and revised.25 Therefore, Arikunto also states that the teacher-made test has average or lower reliability than the standardized test. UTS (Ujian Tengah Semester) or mid form test and UAS (Ujian Akhir Semester) or final form test are examples of teacher-made tests.

2.3 Forms of Test

There are several forms of test. According to Heaton, there are two forms, subjective and objective, and the terms refer to the scoring of the test.26 Objective tests usually have only one correct answer, so they can be scored mechanically, while subjective tests need a scale for scoring.

24 Arikunto, Suharsimi. 1986. Dasar-dasar Evaluasi Pendidikan. Jakarta: Bumi Aksara. pg. 144

25 Arikunto, Suharsimi. 1986. Dasar-dasar Evaluasi Pendidikan. Jakarta: Bumi Aksara. pg. 144

26 Heaton, J.B. 1988. Writing English Language Tests. New York: Longman. pg. 25

2.3.1 Subjective Test

A subjective test, or essay test, requires students to express their own ideas. In an essay test, candidates must think of what to say and then express their ideas as well as possible.27 In line with Heaton, Johnson and Johnson state that essay items require students to recall, select, organize, and apply what they have learned and express it in their own words.28 It means that in a subjective test the students are expected to think of the answer and then express their ideas in a well-organized way. Essay items provide test takers with the opportunity to construct and compose their own responses within relatively broad limits.29

In the subjective test, the scorer's subjective judgment enters into the scoring. The scores differ from one scorer to another and from one time to another. It means that the scorer's subjectivity influences the result: different scorers may produce different scores. Subjective tests are those that require an opinion, a judgment on the part of the examiner.

27 Heaton, J.B. 1988. Writing English Language Tests. New York: Longman. pg. 25

28 Johnson, David W. and Roger T. Johnson. 2002. Meaningful Assessment: A Manageable and Cooperative Process. USA: Allyn and Bacon. pg. 66

29 Tuckman, Bruce W. 1975. Measuring Educational Outcomes: Fundamentals of Testing. USA: Harcourt Brace Jovanovich. pg. 111

The opinions above lead to the strengths and weaknesses of the subjective test, which are as follows:30

The strengths of the subjective test are:

a. It is easy to construct the items.

b. It encourages the students to express their ideas and construct them in good sentences.

c. It shows how far the students have mastered the material.

The weaknesses of the subjective test are:

a. It has low validity and reliability because it is difficult to know which knowledge has been mastered perfectly.

b. It is not representative of all the materials on which the students will be examined.

c. It takes a long time to score.

d. It is difficult to score because it requires the scorer's judgment.

30 Khoiriyah, Nurul. 2005. An Analysis on the Reading Section of the English Test Items of UAN 2003/2004. Unpublished S-1 Thesis. Surabaya: Universitas Negeri Surabaya.

2.3.2 Objective Test

An objective test requires the students to choose the right answer or give a short answer. Objective tests are scored rather mechanically, without the need to evaluate complex performance on a scale. It means that in objective tests, the students are required to give short answers, or even only to choose from codes representing the available answers. Nurgiyantoro defines an objective test as a short-answer test.31

In addition, Heaton states that the term objective test refers to the scoring of the test, which can be described as objective. In line with Heaton, Arikunto adds that an objective test is a test that can be scored objectively.32 It means that the student will get the same score no matter which examiner marks the test, since each item has only one correct answer.
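Mechanical scoring of this kind can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: the function name `score_objective_test`, the one-point-per-item rule, and the case-insensitive comparison are hypothetical choices, not part of the test analyzed in this study.

```python
from typing import List


def score_objective_test(answers: List[str], key: List[str]) -> int:
    """Count the items whose answer matches the key (one point per item, an assumed rule)."""
    if len(answers) != len(key):
        raise ValueError("answers and key must cover the same number of items")
    # Compare each answer with the key, ignoring case and surrounding spaces.
    return sum(
        1
        for given, correct in zip(answers, key)
        if given.strip().lower() == correct.strip().lower()
    )


if __name__ == "__main__":
    key = ["e", "b", "a", "d"]          # made-up answer key
    student = ["E", "b", "c", "d"]      # made-up answer sheet
    print(score_objective_test(student, key))  # 3
```

Because the score depends only on the answer sheet and the key, any examiner applying the same key obtains the same result.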

The opinions above lead to the strengths and weaknesses of the objective test, which are as follows:33

The strengths of the objective test are:

a. It can represent the materials on which the students will be examined.

b. It has high objectivity because it does not depend on the scorer's judgment.

c. It is easy to score and takes a short time to score.

The weaknesses of the objective test are:

a. It is much more difficult to construct than essay test items.

b. It tends to measure the cognitive aspect only.

c. It enables the students to guess in choosing the correct answer.

d. It enables the students to cooperate in doing the test.

31 Nurgiyantoro, Burhan. 1987. Penilaian dalam Pengajaran Bahasa dan Sastra. Yogyakarta: BPFE. pg. 13

32 Arikunto, Suharsimi. 1986. Dasar-dasar Evaluasi Pendidikan. Jakarta: Bumi Aksara. pg. 163

33 Khoiriyah, Nurul. 2005. An Analysis on the Reading Section of the English Test Items of UAN 2003/2004. Unpublished S-1 Thesis. Surabaya: Universitas Negeri Surabaya. pg. 22

In conclusion, because both subjective and objective test items have strengths and weaknesses, there is no best form of test. Therefore, the teacher should apply both of them in the teaching and learning process.

There are several types of objective test. There are many varieties of these new-type tests, but four kinds are in most common use: true-false, multiple-choice, completion, and matching.34 Only the multiple-choice and completion types are discussed here.

34 Nurkancana, Wayan and P.P.N. Sumartana. 1986. Evaluasi Pendidikan. Surabaya: Usaha Nasional. pg. 29

2.3.2.1 Multiple-choice Test

A multiple-choice test is a test in which a testee has to select one correct answer from the options given. A multiple-choice item is usually set out in such a way that the candidate is required to select the answer from a number of given options, only one of which is correct.35 In addition, Nurkancana and Sumartana state that a multiple-choice item consists of a stem, which presents a problem situation, and several options, which provide possible solutions to the problem.36 The options include the correct answer and several wrong answers, called distracters, which are meant to distract those students who are uncertain of the answer. Briefly, it can be described as follows:

They usually . . . to work by train. (stem)

a. Gone (distracter)

b. Went (distracter)

c. Going (distracter)

d. Goes (distracter)

e. Go (correct option)

35 Weir, Cyril J. 1990. Communicative Language Testing. UK: Prentice Hall International. pg. 43

36 Nurkancana, Wayan and P.P.N. Sumartana. 1986. Evaluasi Pendidikan. Surabaya: Usaha Nasional. pg. 31

In a multiple-choice test, items should be constructed in such a way that students obtain the correct option by direct selection rather than by the elimination of incorrect options. A good distracter attracts more students from the lower group than from the upper group. When a distracter attracts more students from the upper group than from the lower group, it is not a good distracter. When a distracter attracts students from neither the upper nor the lower group, it is a non-functioning distracter.
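The three cases described above can be checked mechanically once the options chosen by the upper and lower groups are known. The following is a minimal sketch; the function name `classify_distracters` and the labels are hypothetical, and the "inconclusive" label for a distracter chosen equally often by both groups is an assumption, since the text does not cover that case.

```python
from collections import Counter
from typing import Dict, List


def classify_distracters(upper_choices: List[str],
                         lower_choices: List[str],
                         options: List[str],
                         correct: str) -> Dict[str, str]:
    """Label each wrong option by how it attracts the upper and lower groups."""
    upper = Counter(upper_choices)
    lower = Counter(lower_choices)
    labels = {}
    for option in options:
        if option == correct:
            continue  # the key is not a distracter
        u, l = upper[option], lower[option]
        if u == 0 and l == 0:
            labels[option] = "non-functioning"  # attracts neither group
        elif l > u:
            labels[option] = "good"             # attracts more lower-group students
        elif u > l:
            labels[option] = "poor"             # attracts more upper-group students
        else:
            labels[option] = "inconclusive"     # equal counts: not covered by the text
    return labels


if __name__ == "__main__":
    # Made-up choices of five upper-group and five lower-group students on one item.
    upper_group = ["e", "e", "c", "c", "e"]
    lower_group = ["a", "b", "b", "e", "c"]
    print(classify_distracters(upper_group, lower_group, list("abcde"), "e"))
    # {'a': 'good', 'b': 'good', 'c': 'poor', 'd': 'non-functioning'}
```

In this made-up example, option c would need revision because it draws more of the upper group, and option d could be replaced because nobody chooses it.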

The characteristics of a well-constructed multiple-choice test are as follows:37

1. Each multiple-choice item should have only one answer. This answer must be absolutely correct, unless the instruction specifies choosing the best option (as in vocabulary tests).

2. Only one feature at a time should be tested. This has long been standard practice; it is usually less confusing for the testee and it helps to reinforce a particular teaching point.

3. Each option should be grammatically correct when placed in the stem, except of course in the case of specific grammar test items.

4. All multiple-choice items should be at a level appropriate to the linguistic ability of the testees. The context itself should be at a lower level than the actual problem which the item is testing.

5. Multiple-choice items should be as brief and clear as possible (though it is often desirable to provide short contexts for grammar items).

6. In many tests, items are arranged in rough order of increasing difficulty. It is generally considered important to have one or two simple items to lead in the testees, especially if they are not familiar with the kind of test being administered.

37 Heaton, J.B. 1988. Writing English Language Tests. New York: Longman. pg. 28

2.3.2.2 Completion Test

A completion test is a test in which the students have to fill in or complete a sentence or statement. In a completion test, the students must construct their own response rather than choose from among given choices.38 They fill in or complete a sentence from which a word or phrase has been omitted. Therefore, answering a completion test means filling in a proper answer or completing a sentence or statement.

38 Tuckman, Bruce W. 1975. Measuring Educational Outcomes: Fundamental of Testing. USA: Harcourt Brace Jovanovich. pg. 79

In constructing the completion test, the keys are as follows:39

1. Strike a balance between leaving out so much that the item becomes ambiguous and leaving out so little (or otherwise providing so many clues) that the item becomes too easy.

2. Avoid instances where the grammar of the sentence helps determine the answer.

3. Completion items should have a single correct answer, preferably a

word or short phrase.

2.3.2.3 True and False

Usually there are more true answers than false ones on most tests. If there is no guessing penalty, the testee should guess, since there is a 50% chance of getting the right answer. Even so, the testee should read through each statement carefully and pay attention to the qualifiers and keywords. If any part of the statement is false, then the entire statement is false, but the fact that part of a statement is true does not necessarily make the entire statement true.

Ideal test items:

39 Tuckman, Bruce W. 1975. Measuring Educational Outcomes: Fundamental of Testing. USA: Harcourt Brace Jovanovich. pg. 81

- Critical content should be readily apparent and identified for

analysis, avoiding cleverness, trickery, and verbal complexity

- Use simple, direct language in declarative sentences

- Present the correct part of the statement first, and vary the truth or

falsity of the second part if the statement expresses a relationship

(cause, effect--if, then)

- Statements must be absolute without qualification, subject to the

true/false dichotomy without exceptions

- Every part of a true sentence must be "true"

- If any one part of the sentence is false, the whole sentence is false

despite many other true statements.

Limitations of using true-false items

True-false items:

- incorporate an extremely high guessing factor

- can often lead an instructor to write ambiguous statements due to

the difficulty of writing statements which are unequivocally true or

false

- do not discriminate between students of varying ability as well as

other item types

- can often lead an instructor to favor testing of trivial knowledge


True-False Test Items

A true-false item can be written in one of three forms: simple,

complex, or compound. Answers can consist of only two choices

(simple), more than two choices (complex), or two choices plus a

conditional completion response (compound).

Sample true-false items:

- Simple

Conflict is essential in a play.    True    False

- Complex

Conflict is essential in a play.    True    False    Opinion

- Compound

Conflict is essential in a play.    True    False

If this statement is true, what makes it true?

2.3.2.4 Matching Test Items

In general, matching items consist of a column of stimuli presented on

the left side of the exam page and a column of responses placed on the

right side of the page. Students are required to match the response

associated with a given stimulus.

Advantages in using matching items


- require short periods of reading and response time, allowing you to

cover more content

- provide objective measurement of student knowledge

- provide highly reliable test scores

- provide scoring efficiency and accuracy

Limitations in using matching items

- have difficulty measuring learning objectives requiring more than

simple recall of information

- are difficult to construct due to the problem of selecting a common

set of stimuli and responses

Suggestions for writing matching test items

1. Include directions which clearly state the basis for matching the

stimuli with the responses. Explain whether or not a response can be

used more than once and indicate where to write the answer.

2. Use only homogeneous material in matching items.

Undesirable

Directions: Match the following.

1.____ Impressionist       a. blue, red, yellow
2.____ Pop Art             b. Claude Monet
3.____ primary colors      c. Andy Warhol
                           d. Claude Debussy


Desirable

Directions: On the line to the left of each art style in Column I, write the letter of a representative artist from Column II. Use each name only once.

Column I                          Column II
1.____ Impressionist              a. Jackson Pollock
2.____ Pop Artist                 b. Claude Monet
3.____ Abstract Expressionist     c. Andy Warhol
                                  d. Claude Debussy

3. Arrange the list of responses in some systematic order if possible

(e.g. chronological, alphabetical)

4. Avoid grammatical or other clues to the correct response, e.g. avoid

sentence completion due to grammatical clues.

5. Keep matching items brief, limiting the list of stimuli to under 10.

6. Include more responses than stimuli to help prevent answering

through the process of elimination.

7. When possible, reduce the amount of reading time by including only

short phrases or single words in the response list.

2.4 Characteristics of a Good Test

To make a good test, a test maker should know the characteristics of a good test, so that the test is qualified enough to be given and can represent the degree of the students’ mastery over the language teaching materials that have been taught.

All good tests possess three qualities, namely validity, reliability, and practicality.40 In this study, validity and reliability will be discussed because they are the most important characteristics of a good test.

A teacher who wishes to use a good test to make an important decision about an individual or group must be sure that the test possesses two absolutely essential characteristics: validity and reliability.41

2.4.1 Validity

Validity refers to the extent to which the results of an evaluation procedure serve the particular uses for which they are intended.42 It means that a valid test measures what it is supposed to measure. If the test is able to measure what it is intended to measure, then the test has high validity. There are three types of validity: content validity, criterion-related validity, and construct validity.43 However, only content validity will be discussed.

40 Harris, David P. 1969. Testing Language as a Second Language. USA: McGraw-Hill. pg. 13
41 Bloom, Benjamin S. et al. 1981. Evaluation to Improve Learning. USA: McGraw-Hill. pg. 72
42 Gronlund, Norman E. 1976. Measurement and Evaluation in Teaching. New York: McMillan Publishing. pg. 79

Content validity depends on a careful analysis of the language being tested and of the particular course objectives.44 The test should be so constructed as to contain a representative sample of the course, the relationship between the test items and the course objectives always being apparent. The test has content validity if it reflects the objectives stated in the curriculum. The sample of activities to be included in a test is as representative of the target domain as is possible.45 To know whether the test has content validity or not, the test should be compared with the materials stated in the curriculum. The test has high content validity if the test items cover the materials stated in the curriculum.

2.4.2 Reliability

Reliability refers to the consistency of measurement. It shows how consistent the test scores or other evaluation results are from one measurement to another. The reliability of a test is a matter of how consistently it produces similar results; if a test produces consistent results, it can be said that the test has reliability.46 Published tests usually require a reliability of 0.85 or above, while teacher-built tests are usually considered adequate with reliabilities of 0.60 or above.47

43 Gronlund, Norman E. 1976. Measurement and Evaluation in Teaching. New York: McMillan Publishing. pg. 81
44 Heaton, J.B. 1988. Writing English Language Test. New York: Longman. pg. 160
45 Weir, Cyril J. 1990. Communicative Language Testing. UK: Prentice Hall International. pg. 24

There are some factors affecting the reliability of a test. Those are:48

a. The extent of the sample of material selected for testing. It means that a test which has a larger number of items will be more reliable than a test which has a small number of items.

b. The administration of the test. It means that the conditions under which the test is administered will affect the reliability of the test.

c. The instruction. The clarity of the instruction will affect the students’ comprehension in answering the test.

d. Personal factors, such as motivation and illness.

e. Scoring the test. It means that an objective test is more reliable than a subjective test.

46 Oller, John W. 1979. Language Test at School. USA: Longman. pg. 4
47 Tuckman, Bruce W. 1975. Measuring Educational Outcomes: Fundamental of Testing. USA: Harcourt Brace Jovanovich. pg. 256
48 Heaton, J.B. 1988. Writing English Language Test. New York: Longman. pg. 162

There are several methods of estimating reliability. Here, the following formula is used, since it avoids troublesome correlations and involves only the test mean and standard deviation,49 both of which are normally calculated anyhow as a matter of routine.

The formula is:

$$ r = \frac{N}{N-1}\left[1 - \frac{m(N-m)}{N x^{2}}\right] $$

Where:
r = the reliability
N = the number of items in the test
m = the mean score on the test for all the testees
x = the standard deviation of all the testees’ scores
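As an illustration, the reliability estimate can be computed with a few lines of Python. The sketch assumes the formula r = N/(N-1) × (1 - m(N-m)/(N·x²)), a form commonly known as Kuder-Richardson formula 21, with N the number of items, m the mean score, and x the standard deviation. The function names, the use of the population standard deviation, and the sample figures are assumptions made for the example, not values from this study.

```python
from statistics import mean, pstdev


def reliability(n_items: int, mean_score: float, sd: float) -> float:
    """Estimate reliability as N/(N-1) * (1 - m(N-m) / (N * x^2))."""
    if n_items < 2:
        raise ValueError("the test needs at least two items")
    if sd <= 0:
        raise ValueError("the standard deviation must be positive")
    return (n_items / (n_items - 1)) * (
        1 - (mean_score * (n_items - mean_score)) / (n_items * sd ** 2)
    )


def reliability_from_scores(scores, n_items: int) -> float:
    """Compute the estimate directly from a list of raw scores."""
    # Assumption: the population standard deviation is used for x.
    return reliability(n_items, mean(scores), pstdev(scores))


# Hypothetical test: 40 items, mean score 27, standard deviation 5.
print(round(reliability(40, 27, 5), 2))
```

For these hypothetical figures the estimate is about 0.67, which would pass the 0.60 threshold quoted for teacher-built tests but not the 0.85 threshold for published tests.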

2.5 Item Analysis

The items should be analyzed to determine their effectiveness. It means that the work on the test is not finished once the raw marks have been obtained.50

49 Heaton, J.B. 1988. Writing English Language Test. New York: Longman. pg. 164
50 Harris, David P. 1969. Testing Language as a Second Language. USA: McGraw-Hill. pg. 105

The results need to be analyzed further in order to get information concerning (1) the performance of the students as a group, thus informing the teacher about the effectiveness of the teaching, (2) the performance of individual students, and (3) the performance of each of the items comprising the test.51

Concerning the performance of the students as a group and as individuals, item analysis shows not only the types of errors most frequently made, but also the actual reasons for the errors being made. It helps the teachers to know how effective the teaching learning activities are. For the items themselves, item analysis shows which items can be used and which items should be rewritten or replaced, since it tells us whether an item is too difficult or too easy, whether all the distracters function as intended, and how well the item discriminates between high and low scorers on the test.

In item analysis, all items should be examined from the point of view of (1) their difficulty level and (2) their level of discrimination.52

2.5.1 Index of Difficulty

The index of difficulty shows how easy or difficult the particular item proved in the test.53 It expresses the percentage of the students who answer the item correctly. In addition, Oller points out that item difficulty is about how difficult or how easy a test item is for the students being investigated.54 A good test item must not be too difficult or too easy for the students.

51 Heaton, J.B. 1988. Writing English Language Test. New York: Longman. pg. 178
52 Heaton, J.B. 1988. Writing English Language Test. New York: Longman. pg. 178

The students’ scores must be analyzed in order to know exactly the index of difficulty of the test items. The index of difficulty is calculated by using the formula below:55

$$ \mathrm{F.V} = \frac{\text{Correct U} + \text{Correct L}}{2n} $$

Where:
F.V = the index of difficulty
Correct U = the number of students in the upper group who answer the item correctly
Correct L = the number of students in the lower group who answer the item correctly
n = the number of students in each group

53 Heaton, J.B. 1988. Writing English Language Test. New York: Longman. pg. 178
54 Oller, John W. 1979. Language Test at School. USA: Longman. pg. 246
55 Heaton, J.B. 1988. Writing English Language Test. New York: Longman. pg. 182

The criteria to interpret the result above are as follows:56

0.71 – 1.00 = easy

0.31 – 0.70 = moderate

0.00 – 0.30 = difficult

The criteria above show that if the index of difficulty is 1.00, the test is too easy, since the students can answer all the items. It is not good to be given to the students. Moreover, if the index of difficulty is 0.00, the test is too difficult, since the students cannot answer any of the items. This test is also not good to be given. The test which is good to be given to the students is the test with an index between 0.31 and 0.70.57

56 Arikunto, Suharsimi. 1986. Dasar-dasar Evaluasi Pendidikan. Jakarta: Bumi Aksara. pg. 212
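The calculation of the index of difficulty and its interpretation can be sketched in Python as follows. The sketch assumes the formula F.V = (Correct U + Correct L) / 2n and the three bands quoted in the text (0.00 to 0.30 difficult, 0.31 to 0.70 moderate, 0.71 to 1.00 easy). The function names and the sample counts are hypothetical, and values falling between two bands (for example 0.305) are assigned to the lower band as an assumption.

```python
def facility_value(correct_upper: int, correct_lower: int, group_size: int) -> float:
    """Index of difficulty: (Correct U + Correct L) / 2n."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    for count in (correct_upper, correct_lower):
        if not 0 <= count <= group_size:
            raise ValueError("counts must lie between 0 and group_size")
    return (correct_upper + correct_lower) / (2 * group_size)


def interpret_difficulty(fv: float) -> str:
    """Map an index of difficulty onto the three bands used in the text."""
    if not 0.0 <= fv <= 1.0:
        raise ValueError("the index of difficulty must lie between 0 and 1")
    if fv > 0.70:
        return "easy"
    if fv > 0.30:
        return "moderate"
    return "difficult"


# Hypothetical item: 8 of 10 upper-group and 4 of 10 lower-group students are correct.
fv = facility_value(8, 4, 10)
print(fv, interpret_difficulty(fv))
```

In this hypothetical case 12 of the 20 students answer correctly, so the item falls in the moderate band.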

It is important to recognize that an item which half of the students answer correctly has the highest possible discriminating potential. Consider an item which 80% of the upper group and 20% of the lower group answer correctly. According to the rule of thumb for items answered correctly by half or less of the students, the maximum discriminating ability of the item is 80 plus 20, or 100. Since the index of discrimination of the item is 80 minus 20, or 60, the discriminating efficiency is 60%. As the difficulty of an item varies so that more than half of the combined upper and lower groups answer the item correctly, the maximum discriminating ability decreases from 100. The lower limit of the maximum discriminating ability is zero, which is reached when all of the combined upper and lower groups, or none of them, answer an item correctly.

A useful rule of thumb in interpreting the index of discrimination is

to compare it with the maximum possible discrimination for an item. The

maximum possible discrimination is a function of item difficulty. When

half or less of the sum of the upper group plus the lower group answered

the item correctly, the maximum possible discrimination is the sum of the

57 Arikunto, Suharsimi. 1986. Dasar-dasar Evaluasi Pendidikan. Jakarta: Bumi Aksara. pg. 212

proportions of the upper and lower groups who answered the item

correctly. For example, if 30% of the upper group and 10% of the lower

group answered the item correctly, the maximum possible discrimination is

30 plus 10, or 40.
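The rule of thumb can be expressed as a short Python sketch working with percentages, as in the examples above. The text states the rule only for items answered correctly by half or less of the combined groups. For easier items, the sketch uses 200 minus the sum of the two percentages, which follows from equal group sizes and reaches zero when everyone answers correctly, but it is an inference and not a rule quoted from the source. The function names are hypothetical.

```python
def max_discrimination(pct_upper: float, pct_lower: float) -> float:
    """Maximum possible discrimination (in percent) for a given item difficulty."""
    for pct in (pct_upper, pct_lower):
        if not 0 <= pct <= 100:
            raise ValueError("percentages must lie between 0 and 100")
    total = pct_upper + pct_lower
    # Half or less of the combined groups correct: the maximum is the sum.
    # Assumption for easier items: the maximum shrinks symmetrically to zero.
    return total if total <= 100 else 200 - total


def discriminating_efficiency(pct_upper: float, pct_lower: float) -> float:
    """Observed discrimination as a percentage of the maximum possible."""
    ceiling = max_discrimination(pct_upper, pct_lower)
    if ceiling == 0:
        raise ValueError("efficiency is undefined when the maximum is zero")
    return 100 * (pct_upper - pct_lower) / ceiling


print(max_discrimination(30, 10))         # 30 plus 10
print(discriminating_efficiency(80, 20))  # 60 out of a maximum of 100
```

The two calls reproduce the worked examples in the text: a maximum of 40 for the 30%/10% item, and an efficiency of 60% for the 80%/20% item.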


2.5.2 Index of Discrimination

The discrimination index of an item indicates the extent to which the item discriminates between the testees, separating the more able testees from the less able (Heaton, 1988: 179). In other words, the index of discrimination is the ability of an item to differentiate between students who achieve well (upper group) and those who achieve poorly (lower group). The index of discrimination is estimated by comparing the number of students in the upper group and in the lower group who answer the item correctly.

The index of discrimination can be calculated by using the formula below:58

$$ D = \frac{\text{Correct U} - \text{Correct L}}{n} $$

Where:
D = the index of discrimination
Correct U = the number of students in the upper group who answer the item correctly
Correct L = the number of students in the lower group who answer the item correctly
n = the number of students in each group

58 Heaton, J.B. 1988. Writing English Language Test. New York: Longman. pg. 182

The criteria used to interpret the result above are as follows:59

0.00 – 0.20 = poor

0.20 – 0.40 = satisfactory

0.40 – 0.70 = good

0.70 – 1.00 = excellent

Discrimination indices can range from +1 (= an item which discriminates perfectly) through 0 (= an item which does not discriminate in any way at all) to -1 (= an item which discriminates in entirely the wrong way).60 It means that if an item discriminates perfectly, with an index of discrimination of +1, all the students in the upper group answer the item correctly, while none of the students in the lower group do. On the other hand, if the index of discrimination is -1, none of the students in the upper group answer the item correctly, but all the students in the lower group do. This kind of item is entirely wrong and must be replaced. However, if the same number of students in the upper group and in the lower group answer the item correctly, the index of discrimination is 0. This kind of item does not discriminate in any way at all.

59 Arikunto, Suharsimi. 1986. Dasar-dasar Evaluasi Pendidikan. Jakarta: Bumi Aksara. pg. 223
60 Heaton, J.B. 1988. Writing English Language Test. New York: Longman. pg. 180
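A minimal Python sketch of the index of discrimination and its interpretation is given below. It assumes the formula D = (Correct U - Correct L) / n and the four bands quoted in the text. Because the quoted bands share their boundary values (0.20, 0.40, 0.70), the sketch assigns a boundary value to the lower band, and it reports any negative index separately; both choices, like the function names and sample counts, are assumptions made for the example.

```python
def discrimination_index(correct_upper: int, correct_lower: int, group_size: int) -> float:
    """Index of discrimination: (Correct U - Correct L) / n."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    for count in (correct_upper, correct_lower):
        if not 0 <= count <= group_size:
            raise ValueError("counts must lie between 0 and group_size")
    return (correct_upper - correct_lower) / group_size


def interpret_discrimination(d: float) -> str:
    """Map an index of discrimination onto the bands used in the text."""
    if not -1.0 <= d <= 1.0:
        raise ValueError("the index of discrimination must lie between -1 and +1")
    if d < 0:
        # The item favours the lower group and discriminates the wrong way.
        return "negative"
    if d <= 0.20:  # Assumption: shared boundaries go to the lower band.
        return "poor"
    if d <= 0.40:
        return "satisfactory"
    if d <= 0.70:
        return "good"
    return "excellent"


# Hypothetical item: 9 of 10 upper-group and 3 of 10 lower-group students are correct.
d = discrimination_index(9, 3, 10)
print(d, interpret_discrimination(d))
```

For the hypothetical counts above, the difference of six students out of ten places the item in the good band.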

2.6 Review of the Previous Studies

Concerning this study, there are four previous studies. Those studies analyze the quality of teacher-made English test items concerning their content validity, reliability, index of difficulty, and index of discrimination. Those studies are:

1. An analysis of the English test items of the first term of the local summative test for the second year students of junior high schools in Mojokerto, done by Suharman. He finds that the test does not have adequate content validity, has adequate reliability and an acceptable facility value, does not have an adequate discrimination index, and has effective distracters.

2. An analysis of the reading section of the English test items of UAN 2003/2004, done by Nurul Khoiriyah. She finds that the test has high content validity and acceptable reliability, does not have an acceptable index of difficulty, has a poor discrimination index, and has effective distracters.