A Framework for Automatic Generation of Grammar and Vocabulary Questions

Ayako Hishino (星野綾子), Lunan Huang, Hiroshi Nakagawa (中川裕志)

University of Tokyo (東京大学)

WorldCALL 2008

Outline

• Introduction
• Related Work
• The Data Structure
• Preprocess
• Extension to the Japanese Language
• Summary

2009/3/16 2

Introduction (1/5)

• With the Internet, the latest information spreads throughout the world with almost no time lag.

• One of the notable phenomena in the "flat" world is the outsourcing of highly specialized work throughout the world.

• This raises the need for ESP (English for Specific Purposes) education, through which professionals in the non-English-speaking world master the English of their own specialties.

Introduction (2/5)

• While there are abundant resources for learning a language online, materials and teachers that can support language learning in specialized areas are scarce.

– For example, the latest news on the Internet would be the perfect reading material, as there are online news websites specialized in many areas.

• Also, learning the language while learning about the latest topics would be an exciting experience, which helps keep the learner motivated.

Introduction (3/5)

• An automatic question generator provides independent learners with inexhaustible materials for their practice.

• It also makes it possible for a learner to practice with a variety of materials, from the latest news to a document of their own interest.

• We present two applications of AQG (automatic question generation): Sakumon, a question-making assistance system, and SakumonChallenge, a CAT (Computer Adaptive Testing) system that administers automatically generated questions.

Introduction (4/5)

Introduction (5/5)

• These questions share the same format: multiple-choice fill-in-the-blank.

• We believe that this single question format can test different kinds of knowledge.

• Question A tests vocabulary, B tests grammar, and C is a symmetric combination of the two.
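A minimal Python sketch of how one multiple-choice fill-in-the-blank format can carry both kinds of question. The class, its fields, the example sentence, and the distractors are all hypothetical and are not taken from Sakumon; the grammar example assumes, for illustration only, that the distractors are other forms of the same verb.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class BlankQuestion:
    """A multiple-choice fill-in-the-blank question (hypothetical structure)."""
    tokens: List[str]        # the sentence, one token per entry
    blank_start: int         # index of the first blanked token
    blank_len: int           # number of tokens replaced by the blank
    answer: str              # the original text removed from the sentence
    distractors: List[str]   # wrong choices shown next to the answer

    def stem(self) -> str:
        """Return the sentence with the target span replaced by a blank."""
        before = self.tokens[:self.blank_start]
        after = self.tokens[self.blank_start + self.blank_len:]
        return " ".join(before + ["____"] + after)

    def choices(self, seed: int = 0) -> List[str]:
        """Return the answer and distractors in a reproducible shuffled order."""
        options = [self.answer] + self.distractors
        random.Random(seed).shuffle(options)
        return options


sentence = "The company announced a new product yesterday".split()

# Vocabulary question: the distractors are different words.
vocab_q = BlankQuestion(sentence, 2, 1, "announced",
                        ["borrowed", "repaired", "painted"])

# Grammar question: the distractors are other forms of the same word
# (an assumption made for this illustration).
grammar_q = BlankQuestion(sentence, 2, 1, "announced",
                          ["announce", "announcing", "announces"])

print(vocab_q.stem())
print(grammar_q.choices())
```

Only the distractor list differs between the two questions, which is the sense in which one format can test different kinds of knowledge.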

Related Work (1/3)

• AQG has gained attention only recently as an application of NLP (natural language processing) and there have been only a few studies reported so far.

Related Work (2/3)

• We take advantage of the output of a syntactic parser, a technology that analyzes a sentence into a nested phrase structure according to the language's grammar.

Related Work (3/3)

• The result of sentence parsing is also called a parse tree, because it resembles an upside-down tree with branches.

• The lowest level, next to the words, shows POS (part-of-speech) tags, one assigned to each word.

• In addition to these POS tags, a parse result tells us which adjacent words are grouped into a phrase and which noun phrase goes with which verb.
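The points above can be illustrated with a small Python sketch that reads a bracketed parse and pulls out the POS tags and the phrases. The example sentence and the Penn Treebank style labels (S, NP, VP, DT, NN, VBD, JJ) are assumptions for illustration; the slides do not say which parser or tag set the system uses.

```python
def parse_brackets(text):
    """Parse a bracketed parse such as "(S (NP (DT the) (NN cat)))"
    into nested (label, children) tuples; a leaf is (pos_tag, [word])."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "(", "expected '('"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])  # a word under a POS tag
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return read_node()


def is_leaf(node):
    """A leaf holds exactly one word under its POS tag."""
    _, children = node
    return len(children) == 1 and isinstance(children[0], str)


def pos_tags(node):
    """Collect (word, POS tag) pairs from the lowest level of the tree."""
    label, children = node
    if is_leaf(node):
        return [(children[0], label)]
    pairs = []
    for child in children:
        pairs.extend(pos_tags(child))
    return pairs


def phrases(node, wanted):
    """Return the word sequence of every phrase labelled `wanted`."""
    if is_leaf(node):
        return []
    label, children = node
    found = []
    if label == wanted:
        found.append(" ".join(word for word, _ in pos_tags(node)))
    for child in children:
        found.extend(phrases(child, wanted))
    return found


tree = parse_brackets(
    "(S (NP (DT The) (NN company)) "
    "(VP (VBD announced) (NP (DT a) (JJ new) (NN product))))"
)
print(pos_tags(tree))       # one POS tag per word
print(phrases(tree, "NP"))  # which adjacent words form noun phrases
```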

The Data Structure (1/5)

• In place of an authoring tool for learning objects for general frameworks, our system has an authoring assistance system that allows the user to make questions on an online news article, just by clicking on a word in the text and selecting from the suggestions for alternatives.

• The data structure is designed to hold one article (or any passage that serves the same purpose) on which questions are generated.

The Data Structure (2/5)

The Data Structure (3/5)

• At the beginning of the XML document, the basic information on the article, such as title, news source (which website it is from), and date of publication, is recorded.

• Two main parts come after the heading: 1) article and 2) grammar distractor candidates.
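As a rough sketch of this layout, the Python below builds an empty document with a heading followed by the two main parts. Every element name except cphrases (the name the slides give to the second part) is hypothetical, as are the title, source, and date values, which are placeholders.

```python
import xml.etree.ElementTree as ET

# Root of one document; the name "document" is a hypothetical choice.
doc = ET.Element("document")

# Heading: basic information on the article (placeholder values).
head = ET.SubElement(doc, "head")
ET.SubElement(head, "title").text = "Example headline"
ET.SubElement(head, "source").text = "example-news-site"
ET.SubElement(head, "date").text = "2008-08-05"

# Part 1: the article itself (parsed sentences would go here).
article = ET.SubElement(doc, "article")

# Part 2: grammar distractor candidates.
cphrases = ET.SubElement(doc, "cphrases")

print(ET.tostring(doc, encoding="unicode"))
```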

The Data Structure (4/5)

• The article part contains parsed sentences with up to seven candidate vocabulary alternatives attached to each word.

• All inflectional forms are attached to each candidate vocabulary alternative.

• For example, for a verb, a candidate alternative contains the infinitive, past, past participle, and gerund forms.
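A hedged Python sketch of one word with its vocabulary alternatives and their inflectional forms. The element names (token, alt, form), the attribute names, and the example verbs are hypothetical; only the limit of seven alternatives and the four verb forms come from the text.

```python
import xml.etree.ElementTree as ET

MAX_ALTERNATIVES = 7  # the text says "up to seven" alternatives per word


def token_element(word, pos, alternatives):
    """Build a <token> element carrying candidate vocabulary alternatives.

    `alternatives` maps a lemma to a dict of its inflectional forms.
    """
    token = ET.Element("token", {"pos": pos, "surface": word})
    for lemma, forms in list(alternatives.items())[:MAX_ALTERNATIVES]:
        alt = ET.SubElement(token, "alt", {"lemma": lemma})
        for form_type, surface in forms.items():
            ET.SubElement(alt, "form", {"type": form_type}).text = surface
    return token


def forms_of_type(token, form_type):
    """Return every alternative inflected as `form_type`."""
    return [f.text for f in token.iter("form") if f.get("type") == form_type]


verb_alternatives = {
    "borrow": {"infinitive": "borrow", "past": "borrowed",
               "past_participle": "borrowed", "gerund": "borrowing"},
    "repair": {"infinitive": "repair", "past": "repaired",
               "past_participle": "repaired", "gerund": "repairing"},
}

token = token_element("announced", "VBD", verb_alternatives)
print(forms_of_type(token, "past"))
```

Storing every form up front presumably lets the system offer each alternative in the same inflection as the blanked word without running a morphological generator at question time.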

The Data Structure (5/5)

• The second part is called cphrases, which contains grammar distractor candidates.

• In our methodology, the grammar distractor candidates are generated by converting a phrase in the parse tree.

• Each converted phrase refers to its original phrase by the ID given to that phrase.
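The ID-based reference can be sketched in Python as follows. Apart from cphrases, the element and attribute names (phrase, cphrase, id, ref, label) and the example phrases are hypothetical.

```python
import xml.etree.ElementTree as ET

xml_text = """
<document>
  <article>
    <phrase id="p1" label="VP">has announced</phrase>
    <phrase id="p2" label="NP">a new product</phrase>
  </article>
  <cphrases>
    <cphrase ref="p1">announced</cphrase>
    <cphrase ref="p1">is announcing</cphrase>
    <cphrase ref="p1">have announced</cphrase>
  </cphrases>
</document>
"""


def grammar_choices(doc, phrase_id):
    """Return (original phrase, converted phrases that refer to it)."""
    originals = {p.get("id"): p.text for p in doc.iter("phrase")}
    converted = [c.text for c in doc.iter("cphrase")
                 if c.get("ref") == phrase_id]
    return originals[phrase_id], converted


doc = ET.fromstring(xml_text)
answer, distractors = grammar_choices(doc, "p1")
print(answer, distractors)
```

Keeping the converted phrases in a separate part and linking them by ID leaves the parsed article untouched while still letting each grammar distractor be traced back to the phrase it replaces.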

Preprocess (1/3)

• The data in the framework we have defined are automatically generated.

• During preprocessing, the data pass through a pipeline of steps.

1. HTML Parser: The raw text is extracted from the downloaded HTML files. We retain paragraph tags (<p>).

2. Sentence Splitter: The sentence boundaries are determined.
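A minimal Python sketch of these first two steps, using only the standard library. The class and function names are hypothetical, and the splitting rule is a deliberately naive stand-in: the slides do not describe the actual algorithm, and real English text (abbreviations, quotations) needs something more careful.

```python
import re
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text inside each <p> element, keeping paragraph boundaries."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # list of text chunks while inside <p>, else None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            # Join the chunks and normalize runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def split_sentences(paragraph):
    """Naive rule: split after . ! or ? when followed by whitespace
    and an uppercase letter."""
    return re.split(r"(?<=[.!?])\s+(?=[A-Z])", paragraph)


page = (
    "<html><body><div>Menu</div>"
    "<p>The company announced a <b>new</b> product. Sales begin in May.</p>"
    "<p>Analysts were surprised!</p>"
    "</body></html>"
)

extractor = ParagraphExtractor()
extractor.feed(page)
extractor.close()

sentences = [split_sentences(p) for p in extractor.paragraphs]
print(sentences)
```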

Preprocess (2/3)

3. Sentence Parser: A sentence parser tokenizes sentences and analyzes them into a bracketed structure.

4. POS Tagger: TreeTagger lemmatizes each token and annotates it with a POS tag. Word frequency is also looked up and annotated here.

5. Distractor Selector: By consulting WordNet, the system appends a list of candidate vocabulary alternatives to each document.

Preprocess (3/3)

6. Morphological Generator: A morphological generator is used to generate all possible inflectional forms for each word and each vocabulary alternative.

7. GrammarTarget Annotator: This finds the phrases matching the patterns and appends the converted phrases (grammar alternatives) to the document.

8. Distractor Indexer: The system indexes vocabulary alternatives for each token (to speed up response time).
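The pipeline idea and the final indexing step can be sketched in Python as below. This is a toy under stated assumptions, not the system's implementation: the stage functions, the dictionary-based document, and the ALTERNATIVES table (which stands in for the alternatives selected earlier in the pipeline) are all hypothetical, and only three simplified stages are shown.

```python
from functools import reduce

# Hypothetical table standing in for alternatives chosen by earlier steps.
ALTERNATIVES = {
    "announced": ["borrowed", "repaired"],
    "product": ["river", "ceiling"],
}


def tokenize(doc):
    """Give every whitespace-separated token a numeric id."""
    tokens = [{"id": i, "word": w} for i, w in enumerate(doc["text"].split())]
    return {**doc, "tokens": tokens}


def attach_alternatives(doc):
    """Attach the candidate vocabulary alternatives to each token."""
    tokens = [{**t, "alternatives": ALTERNATIVES.get(t["word"], [])}
              for t in doc["tokens"]]
    return {**doc, "tokens": tokens}


def index_alternatives(doc):
    """Build a token-id lookup so alternatives are found without a scan."""
    index = {t["id"]: t["alternatives"]
             for t in doc["tokens"] if t["alternatives"]}
    return {**doc, "index": index}


def run_pipeline(doc, stages):
    """Feed the output of each stage into the next one."""
    return reduce(lambda current, stage: stage(current), stages, doc)


result = run_pipeline(
    {"text": "The company announced a new product"},
    [tokenize, attach_alternatives, index_alternatives],
)
print(result["index"])
```

With such an index, a click on a token can be answered by a single dictionary lookup on the token's id.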

Extension to the Japanese Language (1/4)

• Before migrating the framework, one should pay attention to the differences between Japanese and English.

– First, in Japanese sentences, all morphemes are joined without spaces.

– Second, Japanese emphasizes dependency structure while English emphasizes phrase structure.

• Therefore, the test points for Japanese ought to focus on katsuyo (inflection) and ko-ou (adverb-predicate agreement), rather than on grammatical structures as is done in English.

Extension to the Japanese Language (2/4)

• Generation of a vocabulary question inherits the method based on frequency, which is language-dependent.

• As mentioned above, there are eight steps to getting the final XML file.

• First, the program should be adjusted for downloading Japanese news from designated websites.

Extension to the Japanese Language (3/4)

• Since Japanese punctuation marks are simpler than those of English, a complex sentence-splitting algorithm is not necessary.

• For the subsequent steps, we need to employ the Japanese processing tool CaboCha, which recognizes the inflectional forms of verbs, tags POS, and analyzes sentences into dependency structures.

• The main manual work is programming the grammar target rules.
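To illustrate the first point on sentence splitting, a one-rule splitter in Python may be enough for plain Japanese news text. The function name and the choice of sentence-final marks (。, ！, ？) are assumptions, and the rule ignores cases such as a full stop inside quotation brackets (「」).

```python
import re


def split_japanese_sentences(text):
    """Cut the text after each run of sentence-final marks (assumed: 。！？)."""
    pieces = re.findall(r"[^。！？]+[。！？]*", text)
    return [p.strip() for p in pieces if p.strip()]


text = "新製品が発表された。発売は五月だ。本当だろうか？"
print(split_japanese_sentences(text))
```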

Extension to the Japanese Language (4/4)

• Once the distractors are obtained and put into XML files, the Sakumon framework will do the rest of the work.

• The main work needed is to tag grammar targets.

• In general, people should analyze the target language using the NLP tools available for that language.

Summary (1/2)

• We have described a framework for automatic generation of grammar and vocabulary questions.

• Currently, we have two applications based on this framework: Sakumon, a question-authoring assistance system, and SakumonChallenge, a computer adaptive testing system with automatically generated questions.

• We have defined the data structure and the method for automatically generating the data.

Summary (2/2)

• We have discussed possible extensions to this framework, using an example of extension to the Japanese language.

• Lastly, we would like to remind the readers that the framework we have shown is only a working example.

• Currently, we are working on improvements for future versions.
