Task and Context Extraction and Representation for Sharing Human Experiences

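KAIST Education for the World, Research for the Future. Task and Context Extraction and Representation for Sharing Human Experiences. 2012. 11. 29. Jihee Ryu, Web Science and Engineering Major, Information Retrieval and Natural Language Processing Lab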


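Page 1: Task and Context Extraction and Representation for Sharing Human Experiences

KAIST Education for the World, Research for the Future

Task and Context Extraction and Representation for Sharing Human Experiences

2012. 11. 29

Jihee Ryu

Web Science and Engineering Major

Information Retrieval and Natural Language Processing Lab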

Page 2

Why Human Experience Sharing?

© 2012 IR&NLP Lab. All rights reserved. 2

Necessity of Experiential Problem Solving Knowledge

Page 3

1. Loosen lug nuts on tire.

2. Install spare tire.

User Context Info

[On U.S. highway]

[1 year driving experience]

[Heading to New York]

[Female]

user

A. Change a Flat Tire When You Are a Woman Alone

1. Call AAA.

2. Be placed on “hold”.

B. Change a Tire like a Real Woman

Page 4

Experience Mining

Building Relational Knowledge about Experiences

Event          People       Place          Time
Play Soccer    Yongho, …    Expo Park      2011-08-10
Play Baseball  Chulsoo, …   Gapchun Park   2009-09-02

Event (Type)   People (Type)   Place (Type)   Time (Type)
(Sport)        (student)       (Park)         (Summer)

Experiential Sentences &

Context

Experiential Knowledge

Web

Experiential Knowledge Distillation

Context-anchored

Automatic extraction

Aggregation & abstraction
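The aggregation and abstraction step can be sketched in Python as follows. This is only an illustration under stated assumptions, not the lab's pipeline: the names `Experience`, `TYPE_OF`, `abstract` and `aggregate` are hypothetical, and the instance-to-type lookup simply restates the example rows on this slide.

```python
from collections import Counter
from typing import NamedTuple


class Experience(NamedTuple):
    event: str
    people: str
    place: str
    time: str


# Hypothetical lookup table (instance -> type) restating the slide's example.
TYPE_OF = {
    "Play Soccer": "Sport", "Play Baseball": "Sport",
    "Yongho": "student", "Chulsoo": "student",
    "Expo Park": "Park", "Gapchun Park": "Park",
    "2011-08-10": "Summer", "2009-09-02": "Summer",
}


def abstract(exp: Experience) -> Experience:
    """Replace each field with its type; unknown values are kept as-is."""
    return Experience(*(TYPE_OF.get(value, value) for value in exp))


def aggregate(experiences) -> Counter:
    """Count how often each abstracted pattern occurs."""
    return Counter(abstract(e) for e in experiences)


records = [
    Experience("Play Soccer", "Yongho", "Expo Park", "2011-08-10"),
    Experience("Play Baseball", "Chulsoo", "Gapchun Park", "2009-09-02"),
]
patterns = aggregate(records)
```

Both example rows collapse into the same abstract pattern (Sport, student, Park, Summer), which is the kind of generalization the type row on the slide shows.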

Page 5

From What?

Various types of open contents on the Web!

How-to articles

Blog posts

Microblog posts

Human Experiential KB

Human Task mining

Event Context mining

Place Semantics mining

Page 6

Human Task Mining

Page 7

Human Task Model

Topic

Goal

Action

Object Time Location

hasTopic

hasNextAction

hasObject hasTime hasLocation

hasAction
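One way to read the model above is as a small set of linked records. The sketch below is an assumption about the direction of the relations (a Goal has a Topic and Actions; an Action has an Object, Time, Location and a next Action); the class names, field names and the `link_sequence` helper are hypothetical, and the topic value in the example is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Action:
    verb: str
    obj: Optional[str] = None               # hasObject
    time: Optional[str] = None              # hasTime
    location: Optional[str] = None          # hasLocation
    next_action: Optional["Action"] = None  # hasNextAction


@dataclass
class Goal:
    name: str
    topic: Optional[str] = None                          # hasTopic
    actions: List[Action] = field(default_factory=list)  # hasAction


def link_sequence(goal: Goal) -> None:
    """Chain the goal's actions in list order through next_action."""
    for current, following in zip(goal.actions, goal.actions[1:]):
        current.next_action = following


# Example built from the tire-changing steps shown earlier in the deck.
goal = Goal(
    name="Change a Tire",
    topic="Cars",  # hypothetical topic label
    actions=[Action("loosen", obj="lug nuts"), Action("install", obj="spare tire")],
)
link_sequence(goal)
```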

Page 8

Human Task Extraction

Title: How to Make Omelet Soup

Step 1: Place the water or canned chicken broth in a large saucepan. Boil the sweet yellow onion for several minutes.

Step 2: Add the powdered chicken broth along with the canned mushrooms. Boil the soup for a few more minutes, and then add the chopped green onion.

Step 3: Drop the eggs into the simmering broth a few minutes before you're ready to serve the omelet soup.

Goal: Make Omelet Soup

Action Sequence: (place, water) (place, broth), (boil, onion), (add, broth), (boil, soup), (add, onion), (drop, egg)

Ingredients: water, broth, onion, soup, egg

Page 9

Hybrid Extraction Method

Input: Sentences

1. Syntactic Patterns: retrieve and apply a rule. Matched? Yes: extract verb and ingredients. No: go to step 2.

2. CRFs Model: select the best label sequence. Prob. > threshold? Yes: extract verb and ingredients.

Examples:

Eat fruit every day. -> (eat, fruit)

Turn off the car. -> (turn off, car)
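The rule-first, model-second control flow can be sketched as below. This is a toy illustration, not the authors' system: the verb lexicon, the article list, the 0.5 threshold and the `labeler` callback (standing in for a trained CRF that returns its best pair and a probability) are all assumptions.

```python
from typing import Callable, Optional, Tuple

Pair = Tuple[str, str]
# A labeler stands in for the CRF: it returns (best pair or None, probability).
Labeler = Callable[[str], Tuple[Optional[Pair], float]]

KNOWN_VERBS = ("eat", "turn off")  # hypothetical toy lexicon
ARTICLES = {"the", "a", "an"}


def apply_rule(sentence: str) -> Optional[Pair]:
    """Toy syntactic pattern: '<known verb> [article] <object> ...'."""
    text = sentence.strip().rstrip(".!?").lower()
    # Try longer verbs first so that phrasal verbs win over their prefixes.
    for verb in sorted(KNOWN_VERBS, key=len, reverse=True):
        if text.startswith(verb + " "):
            rest = [w for w in text[len(verb):].split() if w not in ARTICLES]
            if rest:
                return (verb, rest[0])
    return None


def extract(sentence: str, labeler: Labeler, threshold: float = 0.5) -> Optional[Pair]:
    """Apply the rule first; fall back to the statistical labeler if no rule matches."""
    pair = apply_rule(sentence)
    if pair is not None:                       # Matched? Yes
        return pair
    pair, prob = labeler(sentence)             # Matched? No
    if pair is not None and prob > threshold:  # Prob. > threshold
        return pair
    return None


def no_model(sentence: str) -> Tuple[Optional[Pair], float]:
    """Placeholder labeler that never proposes anything."""
    return (None, 0.0)
```

With the placeholder labeler, `extract("Eat fruit every day.", no_model)` yields `("eat", "fruit")` and `extract("Turn off the car.", no_model)` yields `("turn off", "car")`, matching the two examples on the slide.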

Page 10

Next Challenging Issues

A large fraction of sentences (more than 40%) in how-to instructions are not imperative sentences.

Difficulties arising from variations in writing:

Scoping ambiguity. E.g. Clear or glitter nail polish should go on the nails.

Anaphora. E.g. Make it fun and unique

Condition. E.g. If your computers are only a few years old

Ellipsis. E.g. So why don't you?

Implicit meaning. E.g. Studying improves grades. (Study hard!)

Grammatical mistake. E.g. IM a friend! (Make a friend relationship in an instant messenger)

Percentage of each case among all the clauses in 30 sample documents:

Case                   Percentage
Scoping ambiguity      13.9%
Anaphora               13.1%
Condition              11.9%
Ellipsis               1.9%
Implicit meaning       1.3%
Grammatical mistake    1.3%
Etc.                   56.6%

Page 11

Feature Sets

Syntactic Features

Feature Name     Feature Values
Clause Type      main, subordinate
Person           1st person, 2nd person, 3rd person
Auxiliary Verb   will, shall, can, may, must, able to, …
Voice            active, passive, n/a
Tense            past, present, future
Polarity         negated, non-negated

Modality Features

Feature Name     Example
Obligation       You have to ask about the car.
Permission       You can search for the world weather.
Explanation      The cost for delivery is already included.
Supposition      You will have access to the weather.
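To feed such categorical features to a classifier, each clause is typically turned into a numeric vector. The sketch below shows one common option, one-hot encoding, which the slides do not confirm as the authors' choice. The dictionary keys and the `one_hot` function are hypothetical, the auxiliary-verb feature is left out because its value list on the slide is open-ended, and the feature values for the example clause are assigned by hand.

```python
# Closed-value syntactic features from the slide (auxiliary verb omitted).
SYNTACTIC_FEATURES = {
    "clause_type": ["main", "subordinate"],
    "person": ["1st person", "2nd person", "3rd person"],
    "voice": ["active", "passive", "n/a"],
    "tense": ["past", "present", "future"],
    "polarity": ["negated", "non-negated"],
}


def one_hot(clause: dict) -> list:
    """Encode a clause's feature values as a flat 0/1 vector."""
    vector = []
    for name, values in SYNTACTIC_FEATURES.items():
        vector.extend(1 if clause.get(name) == value else 0 for value in values)
    return vector


# Hand-annotated features for "You have to ask about the car."
clause = {
    "clause_type": "main",
    "person": "2nd person",
    "voice": "active",
    "tense": "present",
    "polarity": "non-negated",
}
vector = one_hot(clause)
```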

Page 12

Result: Actionable Clause Detection

Task: Actionable Clause Detection

Used Feature Sets                     F1(NB)   F1(DT)   F1(SVM)
Syntactic Features (micro only)       0.933    0.942    0.948
+ Modality Features (micro & macro)   0.862    0.963    0.966

NB: Naïve Bayes; DT: Decision Tree; SVM: Support Vector Machines

Page 13

Bridge to Semantic Web

AcTN knowledge representation YAGO knowledge representation

Page 14

Changing Data Representation

Current Form

Refined tabular data records [plain text]

Ultimate Target Form

Well-designed ontology entries [well-formed RDF]
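The move from tabular records to RDF can be illustrated with a small converter that emits N-Triples lines. This is a sketch under assumptions: the `http://example.org/task/` namespace, the `slug` and `record_to_ntriples` helpers and the record layout (a goal plus ordered verb-object pairs) are hypothetical; only the property names hasAction, hasObject and hasNextAction come from the Human Task Model slide.

```python
BASE = "http://example.org/task/"  # hypothetical namespace, not from the slides
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"


def slug(text: str) -> str:
    """Build a simple identifier; assumes plain ASCII words."""
    return "_".join(text.lower().split())


def record_to_ntriples(goal: str, actions: list) -> list:
    """Convert one record (goal, ordered (verb, object) pairs) to N-Triples lines.

    Literals are written without escaping, so values must not contain
    quotes or backslashes in this sketch.
    """
    goal_iri = f"<{BASE}{slug(goal)}>"
    triples = [f'{goal_iri} {RDFS_LABEL} "{goal}" .']
    previous = None
    for index, (verb, obj) in enumerate(actions, start=1):
        action_iri = f"<{BASE}{slug(goal)}/action{index}>"
        triples.append(f"{goal_iri} <{BASE}hasAction> {action_iri} .")
        triples.append(f'{action_iri} {RDFS_LABEL} "{verb}" .')
        triples.append(f'{action_iri} <{BASE}hasObject> "{obj}" .')
        if previous is not None:
            triples.append(f"{previous} <{BASE}hasNextAction> {action_iri} .")
        previous = action_iri
    return triples


triples = record_to_ntriples("Make Omelet Soup", [("boil", "onion"), ("add", "broth")])
```

Printing `"\n".join(triples)` gives a file that a standard RDF parser should accept as N-Triples.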

Page 15

Event Context Mining

Page 16

What is an Event?

Events are defined as situations that happen.

They can be punctual (examples 1-2) or last for a period of time (examples 3-4).

They also include states in which something holds true (example 5).

Examples:

(1) Ferdinand Magellan, a Portuguese explorer, first reached the islands in search of spices.

(2) A fresh flow of lava, gas and debris erupted there Saturday.

(3) 11,024 people were evacuated to 18 disaster relief centers.

(4) “We’re expecting a major eruption,” he said in a telephone interview early today.

(5) Israel has been scrambling to buy more masks abroad, after a shortage of several hundred thousand gas masks.

Page 17

Event Expressions

Events may be expressed in the following forms:

Verb: A fresh flow of lava, gas and debris erupted there Saturday.

Noun: Israel will ask the United States to delay a military strike against Iraq until the Jewish state is fully prepared for a possible Iraqi attack.

Adjective: A Philippine volcano, dormant for six centuries, began exploding with searing gases, thick ash and deadly debris.

Predicative clause: “There is no reason why we would not be prepared,” Mordechai told the Yediot Ahronot daily.

Prepositional phrase: All 75 people on board the Aeroflot Airbus died.

Page 18

Feature Sets

Basic Features

Named entity (NE) tags and an indication of whether the target noun is prenominal or not.

Lexical Semantic Features (LS)

The set of target nouns’ lemmas and their WordNet hypernyms

Dependency-based Features (DF)

Nouns become events if they occur with a certain surrounding context, namely, syntactic dependencies

Dependency-based Features sometimes need to be combined with Lexical Semantic Features

Page 19

Comparing with Previous Work

The proposed method improves on the state of the art by about 0.22 in precision and 0.09 in recall.

Method                  Precision   Recall   F1
Llorens et al. (2010)   0.727       0.483    0.584
Proposed Method         0.95        0.577    0.718
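As a quick consistency check, F1 can be recomputed from the precision and recall reported on this slide, assuming F1 is the usual harmonic mean of the two. The function name `f1` is ours. For the proposed method the result rounds to the reported 0.718; for Llorens et al. (2010) it comes out near 0.580, slightly below the reported 0.584, and the slides do not say how that figure was obtained.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


proposed = f1(0.95, 0.577)   # precision and recall of the proposed method
baseline = f1(0.727, 0.483)  # precision and recall of Llorens et al. (2010)
```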

Page 20

Place Semantics Mining

Page 21

Place Semantics

As GPS-enabled mobile devices have come into wide use, location-based services have gained popularity.

But it is hard to provide appropriate context-aware services when the system uses only the user’s location, i.e. GPS coordinates (latitude, longitude).

In contrast to a location, a place is a space to which people attach meaning.

If we know the meaning of a place, its Place Semantics, we can provide much more suitable services to users.

Page 22

Motivation

Scenario

Recently, Lena moved to Korea from the USA. She doesn’t know Korean culture and geography at all because she has never been outside the USA before.

How about Olympic Bowling Alley?

Are there any places similar to Brooklyn Bowl, which I often visited to relieve stress?

No, thanks! It’s NOT the place I wanted.

Brooklyn Bowl is a bowling alley in New York City. People enjoy bowling, have parties, drink beer and hold music events at Brooklyn Bowl.

Page 23

Place Semantics Mining

People leave texts about why they visit and what they do when they check in at a place on Foursquare.

We can learn how people perceive places from those texts.

We apply LDA to extract Place Semantics. A document is composed of the texts written at a place.

Place

“text”
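The document-construction step (one document per place, made of all texts written there) can be sketched as follows. The sketch stops before LDA itself: it only produces per-place word counts that a topic-modeling library could consume. The function name, the tokenizer and the sample check-in texts are hypothetical.

```python
import re
from collections import Counter, defaultdict


def build_place_documents(checkins):
    """Group check-in texts by place; each place becomes one bag-of-words document."""
    documents = defaultdict(Counter)
    for place, text in checkins:
        # Simple lowercase word tokenizer (an assumption, not the authors' preprocessing).
        documents[place].update(re.findall(r"[a-z']+", text.lower()))
    return dict(documents)


# Made-up check-in texts for illustration only.
checkins = [
    ("Brooklyn Bowl", "Bowling and beer"),
    ("Brooklyn Bowl", "Great music show"),
    ("XL Night Club", "Party after work"),
]
documents = build_place_documents(checkins)
```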

Page 24

Similarity between Two Places

How about XL Night Club?

Are there any places similar to Brooklyn Bowl, which I often visited to relieve stress?

Figure: pie charts of activity shares for Brooklyn Bowl and XL Night Club.

Activities: Have a party & Drink beer; Enjoy a music show; After work; Eat food; Watch sports game; Others

First chart: 32%, 27%, 18%, 11%, 7%, 5%

Second chart: 41%, 26%, 30%, 3%
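Once each place is described by a distribution over activities, two places can be compared numerically. The slides do not state which similarity measure is used; the sketch below uses cosine similarity as one plausible choice, and the distributions in the example are made up for illustration rather than taken from the charts.

```python
import math


def cosine_similarity(p: dict, q: dict) -> float:
    """Cosine similarity between two sparse distributions keyed by activity."""
    dot = sum(p.get(key, 0.0) * q.get(key, 0.0) for key in set(p) | set(q))
    norm_p = math.sqrt(sum(value * value for value in p.values()))
    norm_q = math.sqrt(sum(value * value for value in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


# Hypothetical activity distributions.
place_a = {"party": 0.5, "music": 0.5}
place_b = {"party": 0.5, "music": 0.5}
place_c = {"bowling": 1.0}

same = cosine_similarity(place_a, place_b)
different = cosine_similarity(place_a, place_c)
```

Places with the same activity profile score close to 1, while places that share no activities score 0, regardless of their official category.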

Page 25

Concluding Remarks

Page 26

Application of Our Results

Semantic Annotation

Adds diversity and richness to text processing

Page 27

Thank you!

Page 28

KAIST Education for the World, Research for the Future

Jihee Ryu

http://jihee.kr

IR&NLP Lab

http://ir.kaist.ac.kr

Yoonjae Jeong

Sung-Hyon Myaeng

http://ir.kaist.ac.kr/member/professor/

Eunyoung Kim

Page 29

Reference

1) Jung, Y., Ryu, J., Kim, K., Myaeng, S.H.: Automatic Construction of a Large-Scale Situation Ontology by Mining How-to Instructions from the Web. Web Semantics: Science, Services and Agents on the World Wide Web (2010)

2) Ryu, J., Jung, Y., Kim, K., Myaeng, S.H.: Automatic Extraction of Human Activity Knowledge from Method-Describing Web Articles. 1st Workshop on Automated Knowledge Base Construction (2010)

3) Park, K.C., Jeong, Y., Myaeng, S.H.: Detecting Experiences from Weblogs. 48th Annual Meeting of the Association for Computational Linguistics (2010)

4) Ryu, J., Jung, Y., Myaeng, S.H.: Actionable Clause Detection from Non-imperative Sentences in How-to Instructions: A Step for Actionable Information Extraction. 15th International Conference on Text, Speech and Dialogue (2012)

5) Jeong, Y., Myaeng, S.H.: Using Syntactic Dependencies and WordNet Classes for Noun Event Recognition. Workshop on Detection, Representation, and Exploitation of Events in the Semantic Web in conjunction with the 11th International Semantic Web Conference 2012 (2012)

6) Carter, E., Donald, J.: Space and place: theories of identity and location. Lawrence & Wishart Ltd. (1993)

Page 30

Data Collection: How-to Articles

General How-to Articles

1,850,725 articles from eHow & 109,781 articles from wikiHow

eHow Category Group                        # doc
Computers & Software, Internet             323,289
Home Building & Design & Safety            307,277
Culture, Holidays, Hobbies, Weddings       238,143
Business, Investment, Personal Finance     153,458
Arts, Entertainment, Music                 149,426
Family, Parenting, Pets, Plants            135,909
Cars, Car Repair                           108,386
Healthcare, Fitness, Sports                103,758
Education, Careers, Employment             103,717
Electronics                                101,403
Food, Recipes                              63,553
Fashion, Beauty                            62,406
Total (as of December 2011)                1,850,725

wikiHow Category Group                     # doc
Computers, Electronics                     18,265
Family Life, Home, Pets, Relationships     18,220
Hobbies, Holidays, Travel                  14,514
Health, Sports                             14,161
Youth                                      9,161
Personal Care, Style                       7,031
Education, Communications                  6,775
Finance, Business, Work                    6,729
Food, Entertaining                         6,099
Arts, Entertainment                        5,151
Cars, Vehicles                             2,316
Philosophy, Religion                       1,359
Total (as of December 2011)                109,781