FUNDAMENTAL CONCEPTS AND PRINCIPLES IN LANGUAGE TESTING

Subject: Language Testing

Instructor: Nguyễn Thanh Tùng, Ph.D.

Class: TESOL 2014B

1. Phạm Phúc Khánh Minh
2. Nguyễn Trần Hoài Phương
3. Nguyễn Ngọc Phương Thành
4. Võ Thị Thanh Thư
5. Đỗ Thị Bạch Vân
6. Ngô Thảo Vy

CONTENTS

1. The importance of testing
2. Distinctions among test, evaluation and measurement
3. Qualities of a language test

1. The importance of testing

1.1. The relationship of testing and teaching

“Testing and teaching are so closely interrelated that it is impossible to work in either field without being constantly concerned with the other.” (Heaton, 1988)

Good tests of grammar, translation or language manipulation

Good communicative tests of language


1.2. The elements of a good classroom test

A good test should:

- enable teachers to increase their effectiveness by making adjustments in their teaching
- help to locate the precise areas of difficulty encountered by the class or by the individual student
- enable the teacher to ascertain which parts of the language programme have been found difficult by the class
- provide the students with an opportunity to show their ability to perform a certain task

1.3. Aspects to be tested

What should be tested?

Four skills in communicating: listening, speaking, reading, and writing

The language areas learnt: grammar and usage, vocabulary, and phonology

Language elements: nouns, verbs, adjectives, and so on

1.4. Testing the language skills

It is important to concentrate on types of test items which are relevant to the ability to use language for real-life communication, especially in oral interaction.

Ways of assessing performance in the four major skills may take the form of tests of:

- listening (auditory) comprehension (short utterances, dialogues, talks and lectures are given to the learners);
- speaking ability, usually in the form of an interview, a picture description, role play, or a problem-solving task involving pair work or group work;
- reading comprehension (questions are set to test the students' ability to understand the gist of a text and to extract key information on specific points in the text); and
- writing ability, usually in the form of letters, reports, memos, messages, instructions, accounts of past events, etc.

It is the test constructor's task to assess the relative importance of these skills at the various levels and to devise an accurate means of measuring the student's success in developing these skills.

1.5. Testing language areas

In an attempt to isolate the language areas learnt, a considerable number of tests include sections on:

- Grammar and usage
- Vocabulary (concerned with word meanings, word formation and collocations)
- Phonology (concerned with phonemes, stress and intonation)

Grammar and usage
• to measure students' ability to recognize appropriate grammatical forms and to manipulate structures

Vocabulary
• to measure students' knowledge of the meaning of words and the patterns and collocations in which they occur
• may test their active or their passive vocabulary

Phonology
• might attempt to assess three sub-skills: the ability to recognise and pronounce the significant sound contrasts, the ability to recognise and use the stress patterns, and the ability to hear and produce the melody or patterns of the tunes (i.e. the rise and fall of the voice)

1.6. Language skills and language elements

Whether to test students' ability to handle the elements of the language or to test the integrated skills depends on both the level and the purpose of the test.

At all levels but the most elementary, it is generally advisable to include test items which measure the ability to communicate in the target language.

1.7. Main item types of tests

Recognition: to test the recognition of correct words and forms

Example: Choose the correct answer and write A, B, C or D.

I've been standing here ___ half an hour.

A. since B. during C. while D. for

Production: to test if students can produce the correct answer

Example: Complete each blank with the correct word.

I've been standing here ___ half an hour.
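The difference between the two item types also shows up in how they are scored. The following is a minimal sketch, not a prescribed marking scheme: the function names, the one-point-per-item scoring and the set of accepted answers are all assumptions made for illustration.

```python
def score_recognition(chosen_option: str, answer_key: str) -> int:
    """Score a multiple-choice (recognition) item: 1 if the chosen letter matches the key."""
    return int(chosen_option.strip().upper() == answer_key.strip().upper())


def score_production(response: str, accepted_answers: set[str]) -> int:
    """Score a gap-fill (production) item: 1 if the supplied word is an accepted answer."""
    return int(response.strip().lower() in accepted_answers)


# Item: "I've been standing here ___ half an hour."
# Recognition: the student picks a letter; option D is "for".
recognition_score = score_recognition("d", answer_key="D")

# Production: the student must supply the word without any options to choose from.
production_score = score_production(" For ", accepted_answers={"for"})

print(recognition_score, production_score)  # prints "1 1"
```

A recognition item can be marked against a single key, whereas a production item needs the marker to decide in advance which responses count as correct.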

1.8. Sampling problems and avoiding traps

The test must cover an adequate and representative section of those areas and skills which it is desired to test.

A good test should never be constructed in such a way as to trap the students into giving an incorrect answer.

2. Distinctions among test, evaluation and measurement

- The three terms are often used synonymously
- For example: giving a test to evaluate students' language proficiency
- Distinguishing among them is essential to the development and use of language tests

2.1. Measurement

The process of quantifying the characteristics of persons according to explicit procedures and rules

Features:
- Quantification
- Characteristics
- Rules and procedures

2.1.1. Quantification

- Assigning numbers
- Differs from qualitative descriptions such as visual representations and verbal or non-verbal accounts, etc.

• Scales of measurement:
+ Assigning numbers
+ Non-numerical categories, etc.

2.1.2. Characteristics

Whatever attributes or abilities we measure, it is these attributes or abilities and not the people themselves that we are measuring

- Both physical and mental characteristics

- Mental attributes: aptitude, intelligence, motivation, attitude, fluency in speaking, etc.

- Mental abilities: being able to do something; performance on a set of mental tasks
- The higher the degree of a given ability, the higher the probability of correct performance on tasks of lower difficulty or complexity

2.1.3. Rules and procedures

Quantification must be done according to explicit rules and procedures

The observation of an attribute must be replicable for other observers, in other contexts and with other individuals

Many types of measures: rankings, rating scales and tests

2.2. Test

A psychological or educational test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual. (Carroll, 1968: 46)

For example: The Interagency Language Roundtable (ILR) oral interview – a speaking test:

+ A set of elicitation procedures (activities, questions & topics)

+ A measurement scale of language proficiency (0 to 5)


Designed to obtain a specific sample of behavior

Provides the means for focusing more closely on the specific language abilities that are of interest

Viewed as supplemental to other methods of measurement

The best means of assuring the sufficiency of the sample of language obtained

For example: the ILR oral interview, the TOEFL, etc.

2.3. Evaluation

The systematic gathering of information for the purpose of making decisions

Evaluation requires:
- The ability of the decision maker
- The quality of the information: reliable and relevant

For example:
+ Educational decisions based on rumor rest on unreliable information
+ Sex and motivation are relevant to learning strategies


- Need not be based exclusively on quantitative information (verbal descriptions, overall impressions, ratings, test scores, etc.)
- Does not necessarily entail testing
- Tests can be used for purely descriptive purposes, not evaluative ones

It is important to distinguish the information-providing function of measurement from the decision-making function of evaluation.

2.4. Relationship among measurement, tests, and evaluation

1. Evaluation that involves neither tests nor measures
Ex: Qualitative descriptions of student performance

2. A non-test measure used for evaluation
Ex: Teacher ranking used for assigning grades

3. A test used for purposes of evaluation
Ex: Using an achievement test to determine student progress

4. A test not used for evaluation
Ex: Using a proficiency test as a criterion in SLA research

5. A non-test measure not used for evaluation
Ex: Assigning code numbers to subjects in second language research according to their native language
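The five cases can be read as combinations of two questions: what kind of instrument is involved (none, a non-test measure, or a test), and whether the result feeds a decision. The sketch below encodes that reading; the function name `case_number` and the string labels are hypothetical, introduced only for illustration.

```python
from typing import Optional


def case_number(instrument: str, used_for_evaluation: bool) -> Optional[int]:
    """Return the case (1-5) from the list above, or None if no case applies.

    instrument: "none" (no measure involved), "measure" (a non-test measure), or "test".
    """
    if instrument not in {"none", "measure", "test"}:
        raise ValueError(f"unknown instrument: {instrument!r}")
    if instrument == "none":
        # Qualitative information only: it counts as a case only when used for a decision.
        return 1 if used_for_evaluation else None
    if instrument == "measure":
        return 2 if used_for_evaluation else 5
    # Remaining possibility: a test, which is one type of measure.
    return 3 if used_for_evaluation else 4


print(case_number("test", used_for_evaluation=True))      # achievement test for progress -> 3
print(case_number("measure", used_for_evaluation=False))  # code numbers in research -> 5
```

Treating a test as one kind of measure, rather than as a separate category, is what keeps the number of cases at five.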

3. Qualities of a language test

Usefulness of the test:
- Reliability
- Construct validity
- Authenticity
- Interactiveness
- Impact
- Practicality

3.1. Reliability

Consistency of measurement

A reliable test score is consistent across different characteristics of the testing situation


Example:

Suppose three examiners rate the same student performance. If one examiner awards 10/10 while another awards just 2/10, the scores are not consistent and would be considered unreliable indicators of the ability we want to measure.
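A rough check on consistency between examiners is to compare the scores they give to the same performance. The sketch below uses the range (highest minus lowest score) as a crude indicator; it is not a formal reliability coefficient. The student labels, the scores and the 2-point tolerance are all invented for illustration.

```python
def score_range(scores: list[int]) -> int:
    """Difference between the highest and lowest score given to one performance."""
    return max(scores) - min(scores)


def inconsistent_ratings(ratings: dict[str, list[int]], tolerance: int = 2) -> list[str]:
    """Names of students whose examiner scores differ by more than the tolerance."""
    return [name for name, scores in ratings.items() if score_range(scores) > tolerance]


# Hypothetical scores out of 10 from three examiners rating the same performance.
ratings = {
    "Student A": [10, 9, 10],  # examiners largely agree (range 1)
    "Student B": [10, 2, 6],   # examiners disagree sharply (range 8)
}

print(inconsistent_ratings(ratings))  # ['Student B']
```

Scores flagged in this way tell us more about the examiners than about the student's ability.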

3.2. Validity

The extent to which the test measures what it is supposed to measure

- Content validity
- Construct validity
- Face validity

3.2.1. Content validity

The extent to which a test represents all facets of the tasks within the domain being tested

Example: A teacher gives students a final test, but the test covers only the material from the last 3 weeks.

Result: low content validity
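One informal way to see the problem in the final-test example is to compare what was taught with what the test samples. The sketch below computes a simple coverage proportion. It is only an illustrative proxy, not a standard index of content validity, and the 15-week course length is an assumption that does not come from the example.

```python
def coverage(taught_units: set[int], tested_units: set[int]) -> float:
    """Proportion of taught units that are represented in the test."""
    if not taught_units:
        raise ValueError("taught_units must not be empty")
    return len(taught_units & tested_units) / len(taught_units)


# Assumed course of 15 weeks; the final test draws only on the last 3 weeks.
taught_weeks = set(range(1, 16))
tested_weeks = {13, 14, 15}

print(coverage(taught_weeks, tested_weeks))  # 0.2
```

A proportion this low signals that most of the course content never had a chance to appear in the test.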

3.2.2. Construct validity

Construct validity pertains to the meaningfulness and appropriateness of the interpretations that we make on the basis of test scores.

Two things need to be considered:
- the construct definition
- the characteristics of the test task

3.2.3. Face validity

The extent to which a test is subjectively viewed as covering the concept it purports to measure

Example: After a group of students sat a test, the teacher asked for their feedback, in particular whether they thought the test was a good one.

3.3. Authenticity

- The degree of correspondence of the characteristics of a given language test task to the characteristics of a target language use (TLU) task
- Provides a link between test performance and the TLU tasks and domain to which we want to generalize
- The way test takers perceive the relative authenticity of test tasks can facilitate their test performance

3.4. Interactiveness

- Interactiveness is the extent and type of involvement of the test taker's individual characteristics in accomplishing a test task
- Interactiveness is at the heart of many current views of language teaching and language learning
- Interactiveness is a function of the extent and type of involvement of the test taker's language ability and affective schemata

3.5. Impact

- Micro level: individual
- Macro level: society, education system

Washback (backwash): the influence of testing on teaching and learning.

3.6. Practicality

Practicality is the relationship between the resources that will be required in the design, development, and use of the test and the resources that will be available for these activities.

A practical test is one whose design, development, and use do not require more resources than are available.

Types of resources: human resources, material resources, and time.
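The definition amounts to a comparison of required and available resources, which can be sketched as a simple check. The resource names, units and figures below are invented for illustration; real planning would use whatever categories of human resources, material resources and time apply locally.

```python
def is_practical(required: dict[str, float], available: dict[str, float]) -> bool:
    """True if no resource is required in a larger amount than is available."""
    return all(available.get(name, 0) >= amount for name, amount in required.items())


def shortfalls(required: dict[str, float], available: dict[str, float]) -> dict[str, float]:
    """Resources for which the requirement exceeds availability, with the missing amount."""
    return {
        name: amount - available.get(name, 0)
        for name, amount in required.items()
        if amount > available.get(name, 0)
    }


# Hypothetical figures: rater hours (human), rooms (material), development days (time).
required = {"rater_hours": 40, "rooms": 2, "development_days": 10}
available = {"rater_hours": 30, "rooms": 3, "development_days": 10}

print(is_practical(required, available))  # False
print(shortfalls(required, available))    # {'rater_hours': 10}
```

Listing the shortfalls shows where a test design would have to be cut back, or resources added, before it becomes practical.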

References

Heaton, J. B. (1988). Writing English language tests (New ed.). London: Longman.

Bachman, L. F. (1997). Fundamental considerations in language testing. Oxford: Oxford University Press.

Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice: Designing and developing useful language tests. Oxford: Oxford University Press.

Thank you!