paper reading : learning to compose neural networks for question answering

Learning to Compose Neural Networks for Question Answering

Jacob Andreas et al., NAACL HLT 2016 (Best Paper)

박상현 (ESCA Lab)

Abstract (1/2)

Dynamic Neural Module Network

A dynamically assembled neural network QA model that applies to both images and structured knowledge bases.

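Abstract (2/2)

The question is parsed, and a custom neural network is built dynamically from a collection of modules.

This network is applied to an image or a knowledge base to produce an answer.

The parameters of each module and the network layout parameters are learned jointly through reinforcement learning.

Pipeline: parse the question → generate candidate neural networks from the modules → select a neural network → generate the answer

Modules: Lookup, Find, Relate, Describe, Exists, And. World representations: images, KB.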

1. Introduction (1/3)

This paper presents a compositional, attentional model that performs QA over a variety of world representations (images, knowledge bases).

The model consists of two jointly trained components: 1) a neural module collection, 2) a layout predictor.

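1. Introduction (2/3)

Module-based neural networks for VQA were already proposed in an earlier paper (Andreas et al., 2016).

Compared with that paper, this paper makes two improvements.
1) A learnable predictor of the neural network layout.
2) The visual primitives, previously usable only on images, are extended to support reasoning over knowledge bases as well.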

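1. Introduction (3/3)

The training data for this model consists of three parts: world, question, answer.

Training uses no layout supervision: the network structures are learned without annotated layouts.

The model achieves state-of-the-art performance on QA over natural images (VQA) and US geography (GeoQA).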

2. Deep networks as functional programs (1/4)

The authors' earlier paper proposed a heuristic method for decomposing the VQA task into modular subproblems.

① Parse the question with the Stanford Parser to obtain a universal dependency representation (tree).

② Then filter the set of dependencies attached to the wh-word or copula.
ex) what is standing in the field? → what(stand)
ex) what color is the cat? → color(cat)
ex) is there a circle next to a square? → is(circle, next-to(square))

③ Every leaf becomes a find module, every internal node a transform or combine module, and the root node a describe or measure module.
ex) what color is the cat? → describe[color](find[cat])
ex) where is the truck? → describe[where](find[truck])

In this paper, this process is instead determined through learning.
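The heuristic in step ③ can be sketched in a few lines of Python. This is only an illustration: the `(label, [children])` tree encoding, the `to_layout` helper, and the `MEASURE_ROOTS` set are assumptions made here, not part of the paper, and the combine step is simplified to a plain argument list.

```python
# Hypothetical tree encoding: (label, [children]); a leaf has no children.
MEASURE_ROOTS = {"is", "how many"}  # assumed heads of yes/no and counting questions


def to_layout(node, is_root=True):
    """Render a filtered dependency tree as a module layout string."""
    label, children = node
    if not children:
        return f"find[{label}]"  # every leaf becomes a find module
    args = ", ".join(to_layout(child, is_root=False) for child in children)
    if is_root:
        kind = "measure" if label in MEASURE_ROOTS else "describe"
    else:
        kind = "transform" if len(children) == 1 else "combine"
    return f"{kind}[{label}]({args})"


color_of_cat = ("color", [("cat", [])])
circle_next_to_square = ("is", [("circle", []), ("next-to", [("square", [])])])

print(to_layout(color_of_cat))           # describe[color](find[cat])
print(to_layout(circle_next_to_square))  # measure[is](find[circle], transform[next-to](find[square]))
```

The rules are fixed by hand here, which is exactly the part that the present paper replaces with a learned layout predictor.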

2. Deep networks as functional programs (2/4)

“What color is the bird?”

Attention: “Where is the bird?” (find)

Labeling: “What color is that part of the image?” (describe)

2. Deep networks as functional programs (3/4)

“Are there any states?”

Attention: “Where are the states?” (find)

Labeling: “Does the state exist?” (exists)

2. Deep networks as functional programs (4/4)

Two contributions of this paper:

1) The attention mechanism is extended and generalized so that it can also be applied to knowledge bases.

2) A model that learns to assemble modules structurally: the Dynamic Neural Module Network.

This model parses the question and dynamically builds a neural network from a collection of composable modules.

3. Related work

Database QnA: Wong & Mooney, 2007; Kwiatkowski et al., 2010; Liang et al., 2011; Andreas et al., 2013

Neural models for QnA: Iyyer et al., 2014; Bordes et al., 2014; Yang et al., 2015; Malinowski et al., 2015

Visual QnA: Simonyan & Zisserman, 2014; Xu et al., 2015; Yang et al., 2015

Formal logic and representation learning: Beltagy et al., 2013; Lewis & Steedman, 2013; Malinowski & Fritz, 2014

Fixed tree structure using universal parser: Bottou et al., 1997; Socher et al., 2011; Bottou, 2014

4. Model

The goal: map questions and world representations (images, knowledge bases) to answers.

Layout model: predicts a layout from a question: $p(z \mid x; \theta_\ell)$

Execution model: generates an answer from the world representation: $p_z(y \mid w; \theta_e)$

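4.1. Evaluating Modules

Execution model: $p_z(y \mid w) = ([\![z]\!]_w)_y$

When the substructure of z is referenced explicitly, $[\![z]\!]_w$ can be written as $[\![m(h^1, h^2)]\!]$.

The set of layouts z is restricted by the type constraints of each module, which use the following two types.
Attention: a distribution over pixels or entities.
Labels: a distribution over answers.

$[\![z]\!]_w$: the output of layout z on the input world representation w

m is the root module; $h^1$, $h^2$ are the outputs (attentions) of the submodules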
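The two type constraints can be made concrete with a small checker. This is a sketch under assumptions: the signature table below reflects one reading of the paper's module list (the slides do not spell out the signatures), layouts are encoded as hypothetical `(module, [sub-layouts])` tuples, and parameter arguments are left out.

```python
# Assumed signatures: module -> (input types, output type); "*" marks a variadic input.
SIGNATURES = {
    "lookup": ((), "Attention"),
    "find": ((), "Attention"),
    "relate": (("Attention",), "Attention"),
    "and": (("Attention", "*"), "Attention"),
    "describe": (("Attention",), "Labels"),
    "exists": (("Attention",), "Labels"),
}


def output_type(layout):
    """Return the output type of a layout, raising TypeError if it is ill-typed."""
    module, children = layout
    expected, result = SIGNATURES[module]
    actual = tuple(output_type(child) for child in children)
    if expected[-1:] == ("*",):
        ok = len(actual) >= 1 and all(t == expected[0] for t in actual)
    else:
        ok = actual == expected
    if not ok:
        raise TypeError(f"{module} expects {expected}, got {actual}")
    return result


def is_valid(layout):
    """A usable layout is well-typed and its root yields a distribution over answers."""
    try:
        return output_type(layout) == "Labels"
    except (TypeError, KeyError):
        return False


print(is_valid(("describe", [("find", [])])))  # True
print(is_valid(("find", [])))                  # False: the root returns an attention
```

Only layouts whose root produces Labels define an execution model $p_z(y \mid w)$, which is why the checker tests the root type separately.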

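4.1. Evaluating Modules

Module instances in different networks can share parameters (parameter tying).

Each module takes parameter arguments (in square brackets) and/or ordinary inputs (in parentheses).

Parameter arguments: supplied by the layout and used to specialize a module's behavior for a particular lexical item.
ex) what color is the cat? → describe[color](find[cat]), where color and cat are the parameter arguments

Ordinary inputs: the results computed by the sub-networks.
ex) what color is the cat? → describe[color](find[cat]), where the output of find[cat] is the ordinary input of describe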
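To make the bracket and parenthesis notation concrete, here is a minimal parser sketch that splits a layout string into module name, parameter argument, and ordinary inputs. The `parse_layout` helper and its tuple output are illustrative assumptions, not code from the paper.

```python
import re

# A module name, optionally followed by a bracketed parameter argument.
TOKEN = re.compile(r"\s*([A-Za-z_]+)(?:\[([^\]]*)\])?")


def parse_layout(text, pos=0):
    """Parse `name[arg](input, ...)` into ((name, arg, inputs), end_position)."""
    match = TOKEN.match(text, pos)
    if not match:
        raise ValueError(f"expected a module name at position {pos}")
    name, argument = match.group(1), match.group(2)
    pos = match.end()
    inputs = []
    if pos < len(text) and text[pos] == "(":
        pos += 1
        while True:
            child, pos = parse_layout(text, pos)  # ordinary inputs are sub-layouts
            inputs.append(child)
            while pos < len(text) and text[pos] == " ":
                pos += 1
            if pos < len(text) and text[pos] == ",":
                pos += 1
            elif pos < len(text) and text[pos] == ")":
                pos += 1
                break
            else:
                raise ValueError(f"expected ',' or ')' at position {pos}")
    return (name, argument, inputs), pos


tree, _ = parse_layout("describe[color](find[cat])")
print(tree)  # ('describe', 'color', [('find', 'cat', [])])
```

In the result, the second field is the parameter argument supplied by the layout, and the third field lists the sub-networks whose outputs arrive as ordinary inputs.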

4.1. Evaluating Modules

Notation:

• $w^1, w^2, \ldots$ : world representation
• $W$ : world representation expressed as a matrix
• $\sigma$ : ReLU
• $h$ : attention ($h_k$ is the $k$-th element of $h$)
• $A, a, B, b, \ldots$ : global weights
• $v^i$ : weights associated with the parameter argument $i$
• $i$ : parameter argument
• $\theta_e$ : parameters of the execution model

ex) describe[color](find[cat])

http://sheng-z.github.io/2016/06/17/pr-lcnn4qa/modules.png
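As a rough illustration of how attentions flow between modules, the sketch below implements three of the simpler modules over a toy knowledge base. The formulas are assumptions based on one reading of the paper (lookup as a one-hot attention, and as an elementwise product, exists as a softmax over $(\max_k h_k)\,a + b$), and every weight is hand-picked rather than learned.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def lookup(index, size):
    """One-hot attention on the entity at position `index`."""
    return [1.0 if k == index else 0.0 for k in range(size)]


def and_(*attentions):
    """Elementwise product of the incoming attentions."""
    return [math.prod(values) for values in zip(*attentions)]


def exists(h, a, b):
    """Labels from the peak of the attention: softmax((max_k h_k) * a + b)."""
    peak = max(h)
    return softmax([peak * a_j + b_j for a_j, b_j in zip(a, b)])


entities = ["Atlanta", "Georgia", "Florida"]
states = [0.0, 0.9, 0.8]             # hand-written stand-in for the output of find[state]
a, b = [-4.0, 4.0], [2.0, -2.0]      # hypothetical weights over the labels (no, yes)

h = and_(states, lookup(entities.index("Georgia"), len(entities)))
p_no, p_yes = exists(h, a, b)
print(h, p_yes > p_no)
```

find, relate, and describe follow the same attention-in, attention-or-labels-out pattern but need learned matrices, so they are omitted from this toy version.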



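4.1. Evaluating Modules

Assuming that the top-level module of every network layout is a describe or exists module, the fully assembled network corresponds to a distribution over output labels.

For training with observed layouts z, maximize $\sum_{(w, y, z)} \log p_z(y \mid w; \theta_e)$.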

4.2. Assembling networks

Layout selection process:

1) Generate a set of candidate layouts.
2) Score each candidate and select the top one.

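4.2. Assembling networks

1) Generating the set of candidate layouts

① Represent the input sentence as a dependency tree.
② Collect all nouns, verbs, and prepositional phrases attached to the wh-word or copula.
③ Associate each word or phrase with a layout fragment.
- Common noun (city): find
- Proper noun (Georgia): lookup
- Prepositional phrase (in): relate
④ Form subsets of the set of layout fragments.
- Combine all fragments in a subset with the and module.
- Place a measure or describe module on top.

Note: "measure" appears to be a typo in the paper. The measure module existed in the earlier paper but is not used in this one, so exists should appear in place of measure.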
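Step ④ can be sketched as a subset enumeration. This is an assumption-laden toy: fragments are plain strings, the root modules are describe and exists (following the note above rather than the paper's wording), and root parameter arguments are omitted.

```python
from itertools import combinations


def candidate_layouts(fragments, roots=("describe", "exists")):
    """Every non-empty subset of fragments, joined by `and`, under each root module."""
    layouts = []
    for size in range(1, len(fragments) + 1):
        for subset in combinations(fragments, size):
            body = subset[0] if size == 1 else f"and({', '.join(subset)})"
            for root in roots:
                layouts.append(f"{root}({body})")
    return layouts


# Hypothetical fragments for "What cities are in Georgia?"
fragments = ["find[city]", "relate[in](lookup[Georgia])"]
for layout in candidate_layouts(fragments):
    print(layout)
```

With n fragments and two root modules this produces 2 × (2^n − 1) candidates, which stays small enough for the layout model to score all of them.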

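4.2. Assembling networks

2) Score each candidate and make the final selection.

① Produce an LSTM representation of the question and a feature-based representation of the query (layout).

② Compute a score from the LSTM representation and the feature representation obtained in ①: $s(z_i \mid x) = a^\top \sigma(B h_q(x) + C f(z_i) + d)$

③ Normalize the scores with a softmax to obtain a probability distribution: $p(z_i \mid x; \theta_\ell) = e^{s(z_i \mid x)} / \sum_j e^{s(z_j \mid x)}$

$\theta_\ell = \{a, B, C, d\}$ are the layout parameters.

$h_q(x)$: LSTM encoding of the question x; $f(z_i)$: embedding (feature vector) of the i-th candidate network $z_i$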
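The scoring and normalization steps can be sketched in plain Python. The question encoding and layout features below are hand-written vectors standing in for the LSTM output and the real feature extractor, and the weights a, B, C, d are hypothetical values chosen only to make the example run.

```python
import math


def matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def score(h_q, f_z, a, B, C, d):
    """One-hidden-layer score: a . relu(B h_q + C f_z + d)."""
    pre = zip(matvec(B, h_q), matvec(C, f_z), d)
    hidden = [max(0.0, p + q + r) for p, q, r in pre]
    return sum(w * h for w, h in zip(a, hidden))


def layout_distribution(h_q, features, a, B, C, d):
    """Softmax over the scores of all candidate layouts."""
    scores = [score(h_q, f_z, a, B, C, d) for f_z in features]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


identity = [[1.0, 0.0], [0.0, 1.0]]
h_q = [1.0, 0.0]                      # stand-in for the LSTM encoding of the question
features = [[1.0, 0.0], [0.0, 1.0]]   # stand-in feature vectors for two candidates
probs = layout_distribution(h_q, features, a=[1.0, -1.0], B=identity, C=identity, d=[0.0, 0.0])
print(probs)
```

Scoring touches only small vectors per candidate, which is why evaluating the layout model over every candidate is cheap compared with executing a network.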

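4.2. Assembling networks

The authors use reinforcement learning for the following reason.

Key constraint: evaluations of the computationally expensive execution model must be kept to a minimum, whereas evaluating the layout model (the computation over all z, which includes the scoring) is cheap.

By contrast, in semantic parsing the query execution model is cheap to compute, and the set of parses is too large to score exhaustively. This model's constraint instead resembles the scenario an agent faces in reinforcement learning: scoring an action is cheap, but executing an action and obtaining its reward is expensive.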

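4.2. Assembling networks

The authors express their model as a stochastic policy and model the training process accordingly.

① Sample z from the layout model $p(z \mid x; \theta_\ell)$.
② Apply the sampled z to the knowledge source and obtain a distribution over answers.
③ Once a network z is chosen, the execution model can be trained by maximizing $\log p_z(y \mid w; \theta_e)$.

Because sampling from a probability distribution is not differentiable, a policy gradient method is used to optimize $\theta_\ell$:

$\nabla J(\theta_\ell) = \mathbb{E}[\nabla \log p(z \mid x; \theta_\ell) \cdot r]$, with $r = \log p_z(y \mid w; \theta_e)$

r: reward, computed by the execution model; $p(z \mid x; \theta_\ell)$: layout model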
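The sampling and policy-gradient update can be sketched as a toy REINFORCE loop over a fixed candidate set. Everything here is a simplifying assumption: the policy is parameterized directly by one logit per candidate instead of by $\theta_\ell$, the execution model is replaced by a fixed table of answer probabilities, and the execution model itself is not trained.

```python
import math
import random


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_layout_policy(answer_probs, steps=2000, lr=0.1, seed=0):
    """REINFORCE over a fixed candidate set; returns the final p(z | x)."""
    rng = random.Random(seed)
    logits = [0.0] * len(answer_probs)  # stand-in for the scores s(z_i | x)
    for _ in range(steps):
        probs = softmax(logits)
        z = rng.choices(range(len(logits)), weights=probs)[0]  # sample a layout
        reward = math.log(answer_probs[z])                     # r = log p_z(y | w)
        # For a softmax policy, d log p(z | x) / d logit_k = 1[k == z] - p_k.
        for k, p in enumerate(probs):
            logits[k] += lr * reward * ((1.0 if k == z else 0.0) - p)
    return softmax(logits)


# Hypothetical probabilities that each candidate network assigns to the correct answer.
final = train_layout_policy([0.9, 0.2, 0.1])
print(final)
```

Only one network is executed per step, the sampled one, which matches the constraint that execution is expensive while scoring every candidate is cheap.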

5. Experiment (VQA)

5. Experiment (GeoQA)

6. Conclusion

Dynamic Neural Module Network:

Can perform Q&A over unstructured data (e.g., images) or structured data (e.g., XML data).

Learns the process of assembling modules from only the question, answer, and world representation.

Q&A
