NAIST Big Data Symposium - Information Science, Prof. Matsumoto


Upload: ysuzuki-naist

Post on 13-Apr-2017


TRANSCRIPT

Page 1

Scientific Paper Analysis

Yuji Matsumoto, Computational Linguistics Lab

Graduate School of Information Science

March 6, 2015, Big Data Symposium at NAIST

Page 2

Large-Scale Text Data
- Data on the Web
  - SNS: Twitter, blogs
  - Wikipedia
  - News, …
- Scientific/technical documents
  - Scientific papers
  - Legal documents: law reports, casebooks
  - Patent documents

Page 3

Knowledge Bases
- Constructed manually: WordNet, domain ontologies
- Constructed by a community (Wikipedia): Freebase
- Constructed automatically: NELL (Never-Ending Language Learning), MindNet

Page 4

Applications
- Knowledge Graph (Google): knowledge extracted from Freebase, Wikipedia, …
- Watson (IBM): knowledge extracted from Wikipedia; DeepQA

Page 5

Structures of KB: a linked structure of entities and relations

- Entity: person, country, product, etc.
- Relation: born_in(Barack Obama, Honolulu), locates_in(Honolulu, Hawaii), state_of(Hawaii, USA)
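The linked structure above can be sketched as a small in-memory graph. The Python below is a minimal illustration, assuming each relation is stored as a (relation, subject, object) tuple; `build_graph` and `follow` are hypothetical helpers, not the API of any real knowledge base. Following outgoing links from one entity chains the three relations on this slide together.

```python
from collections import defaultdict

# The relations from this slide, stored as (relation, subject, object).
TRIPLES = [
    ("born_in", "Barack Obama", "Honolulu"),
    ("locates_in", "Honolulu", "Hawaii"),
    ("state_of", "Hawaii", "USA"),
]

def build_graph(triples):
    """Index triples as an adjacency list: subject -> [(relation, object), ...]."""
    graph = defaultdict(list)
    for rel, subj, obj in triples:
        graph[subj].append((rel, obj))
    return graph

def follow(graph, start):
    """Walk outgoing links from `start`, returning the chain of (relation, entity) hops.

    For simplicity this takes the first outgoing link at each step and stops on a cycle.
    """
    path, node, seen = [], start, {start}
    while graph.get(node):
        rel, nxt = graph[node][0]
        if nxt in seen:
            break
        path.append((rel, nxt))
        seen.add(nxt)
        node = nxt
    return path

graph = build_graph(TRIPLES)
print(follow(graph, "Barack Obama"))
# [('born_in', 'Honolulu'), ('locates_in', 'Hawaii'), ('state_of', 'USA')]
```

Chaining born_in, locates_in and state_of in this way is what lets a linked KB answer a question such as "in which country was Barack Obama born?" from three separately stated facts.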

Page 6

Natural Language Analysis
How text is analyzed:
- Word segmentation, part-of-speech tagging
- Named entity recognition
- Syntactic parsing
- Semantic disambiguation
- Semantic parsing
- Discourse analysis
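Word segmentation is needed for languages such as Japanese that do not separate words with spaces. As a toy illustration only, the sketch below uses greedy longest-match ("maximum matching") against a small hypothetical dictionary; this is a textbook baseline swapped in for explanation, not the method used in the speaker's lab, and practical analyzers typically rely on statistical models instead.

```python
def max_match(text, dictionary, max_len=4):
    """Greedy left-to-right longest-match segmentation.

    At each position, try the longest candidate (up to max_len characters)
    first and shrink until a dictionary word is found; an unknown character
    falls back to a single-character token.
    """
    tokens, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens

# Hypothetical toy dictionary, for illustration only.
toy_dict = {"東京", "東京都", "に", "住む"}
print(max_match("東京都に住む", toy_dict))
# ['東京都', 'に', '住む']
```

Greedy matching fails on ambiguous strings where the longest dictionary word is the wrong choice, which is one reason statistical segmenters are preferred.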

Page 7

Linked Knowledge Extraction

- Named entity recognition: extraction of entities and concepts
- Syntactic dependency parsing: direct dependencies between entities
- Semantic parsing: predicate-argument structure analysis (subject-predicate-object), relations between entities
- Discourse analysis:
  - coreference: the same entity referred to by different mentions
  - relations between facts: temporal, causal
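The step from a dependency parse to a subject-predicate-object relation can be sketched in a few lines. This assumes a hand-written parse of the clause "TPA induction inhibits the binding" from the example on the following slide, using Universal Dependencies labels (`nsubj`, `obj`, `compound`, `det`); the `Arc` type and both functions are hypothetical, and a real system would take the arcs from a parser and also handle passives, subordinate clauses and coordination.

```python
from typing import NamedTuple

class Arc(NamedTuple):
    head: int   # index of the head token
    dep: int    # index of the dependent token
    label: str  # Universal Dependencies relation label

def phrase(tokens, arcs, idx):
    """Expand a head word with its compound modifiers (assumed to precede it)."""
    mods = sorted(a.dep for a in arcs if a.head == idx and a.label == "compound")
    return " ".join(tokens[i] for i in mods + [idx])

def extract_spo(tokens, arcs):
    """Read (subject, predicate, object) triples off nsubj/obj arcs of each verb."""
    triples = []
    for i in range(len(tokens)):
        subjects = [a.dep for a in arcs if a.head == i and a.label == "nsubj"]
        objects = [a.dep for a in arcs if a.head == i and a.label == "obj"]
        for s in subjects:
            for o in objects:
                triples.append((phrase(tokens, arcs, s), tokens[i], phrase(tokens, arcs, o)))
    return triples

# Hand-written (hypothetical) parse; token indices start at 0.
tokens = ["TPA", "induction", "inhibits", "the", "binding"]
arcs = [Arc(1, 0, "compound"), Arc(2, 1, "nsubj"), Arc(4, 3, "det"), Arc(2, 4, "obj")]
print(extract_spo(tokens, arcs))
# [('TPA induction', 'inhibits', 'binding')]
```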

Page 8

S1: We analyzed the effect on the binding and the activity of transcription factors at a regulatory element.

S2: TPA induction inhibits the binding of the transcription factor NF-E2 to this transcriptional control element.

S3: TPA induction increases the binding of AP-1 factors to this element.

[Figure: event-argument links over S1 to S3, labeled Cause and Theme]

Semantic Parsing: Example

Katsumasa Yoshikawa, Sebastian Riedel, Tsutomu Hirao, Masayuki Asahara, Yuji Matsumoto, "Coreference Based Event-Argument Relation Extraction on Biomedical Text," Journal of Biomedical Semantics, Volume 2, Supplement 5, S6, October 2011.

Page 9

"this element" in S2 is coreferent with "a regulatory element" in S1.

S1: We analyzed the effect on the binding and the activity of transcription factors at a regulatory element.

S2: TPA induction inhibits the binding of the transcription factor NF-E2 to this transcriptional control element.

S3: TPA induction increases the binding of AP-1 factors to this element.

[Figure: the Cause and Theme links, plus a Corefer link between "this element" and "a regulatory element"]

Co-reference analysis

Page 10

The true argument (Theme) of binding is "a regulatory element", and "this element" is just an anaphor of it. Transitivity lets us conflate the information.

S1: We analyzed the effect on the binding and the activity of transcription factors at a regulatory element.

S2: TPA induction inhibits the binding of the transcription factor NF-E2 to this transcriptional control element.

S3: TPA induction increases the binding of AP-1 factors to this element.

[Figure: (A) a Theme link to "this element", (B) a Corefer link from "this element" to "a regulatory element", and (C) the resulting Theme link to "a regulatory element"; other links labeled Cause and Theme]

(A) Theme & (B) Corefer => (C) Theme

Information conflation
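The conflation rule (A) Theme & (B) Corefer => (C) Theme can be expressed as a small procedure: group mentions into coreference clusters, then re-attach every Theme argument to its cluster's antecedent. The sketch below assumes the earliest mention in a cluster is the antecedent and uses union-find so that chains of Corefer links are followed transitively; the function names and the plain-string encoding of events and mentions are hypothetical simplifications of the annotations in the figure.

```python
def find(parent, x):
    """Return the cluster representative of x (union-find with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def conflate(theme_edges, corefer_edges, mentions):
    """Apply (A) Theme & (B) Corefer => (C) Theme.

    theme_edges:   (event, mention) pairs
    corefer_edges: (anaphor, antecedent) pairs
    mentions:      all mentions, in document order
    """
    parent = {m: m for m in mentions}
    for a, b in corefer_edges:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb  # merge the two clusters
    # Assumption: the earliest mention in each cluster is the true argument.
    antecedent = {}
    for m in mentions:
        antecedent.setdefault(find(parent, m), m)
    return {(event, antecedent[find(parent, m)]) for event, m in theme_edges}

mentions = ["a regulatory element", "this element"]
themes = [("binding", "this element")]               # (A)
corefs = [("this element", "a regulatory element")]  # (B)
print(conflate(themes, corefs, mentions))
# {('binding', 'a regulatory element')}              # (C)
```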

Page 11

S1: We analyzed the effect on the binding and the activity of transcription factors at a regulatory element.

S2: TPA induction inhibits the binding of the transcription factor NF-E2 to this transcriptional control element.

S3: TPA induction increases the binding of AP-1 factors to this element.

[Figure: Cause, Theme, and Corefer links across S1 to S3]

Discourse analysis

Page 12

NLP Technologies for Document Analysis

[Figure: part-of-speech (POS) tagging; NE chunking; syntactic parsing; predicate-argument structure analysis; coreference resolution; relation extraction (semantic/context processing); document structure analysis; machine learning / knowledge acquisition; knowledge bases (domain ontologies)]

Page 13

What we can do with Scientific Papers

- Knowledge extraction (domain knowledge)
- New fact discovery
- Content-aware paper search
- Summarization
  - Automatic generation of abstracts
  - Keyword generation
  - Survey generation
- Recommendation of related papers
- Similar article/case search
  - Structural similarity: papers, law reports, patents
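Content-aware search and related-paper recommendation can be illustrated with a standard bag-of-words baseline: TF-IDF weighting plus cosine similarity. This is a swapped-in textbook technique, not the method described in the talk; the three "abstracts" below are invented, and structural similarity as meant on this slide would need richer features than word overlap.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn whitespace-tokenized documents into sparse TF-IDF dicts."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    return [
        {t: c * math.log(n / df[t]) for t, c in Counter(toks).items()}
        for toks in tokenized
    ]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def most_similar(query_idx, docs):
    """Index of the document most similar to docs[query_idx] (excluding itself)."""
    vecs = tfidf_vectors(docs)
    scores = [(cosine(vecs[query_idx], v), i) for i, v in enumerate(vecs) if i != query_idx]
    return max(scores)[1]

# Invented example abstracts, for illustration only.
docs = [
    "coreference resolution for biomedical event extraction",
    "event extraction from biomedical text",
    "statistical machine translation of patent documents",
]
print(most_similar(0, docs))
# 1
```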

Page 14

Example: Structured Abstract Generation


Page 15

Related Project: Big Mechanism (DARPA, since July 2014)

http://www.darpa.mil/Our_Work/I2O/Programs/Big_Mechanism.aspx

The Big Mechanism program aims to develop technology to read research abstracts and papers to extract pieces of causal mechanisms, assemble these pieces into more complete causal models, and reason over these models to produce explanations. The domain of the program is cancer biology with an emphasis on signaling pathways.
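The three steps in this program description (extract pieces, assemble them into a model, reason over the model to explain) can be caricatured in a few lines. A minimal sketch, assuming each extracted piece is a signed (cause, effect) edge where +1 means "activates" and -1 means "inhibits"; the entity names A to D are placeholders, and the functions are hypothetical rather than anything from the program itself.

```python
from collections import defaultdict, deque

def assemble(pieces):
    """Merge extracted (cause, effect, sign) pieces into one directed graph."""
    graph = defaultdict(list)
    for cause, effect, sign in pieces:
        graph[cause].append((effect, sign))
    return graph

def explain(graph, source, target):
    """Breadth-first search for a causal chain from source to target.

    Returns (chain, net_sign) for the shortest chain found, or None.
    The net sign is the product of edge signs along that chain.
    """
    queue = deque([(source, [source], 1)])
    seen = {source}
    while queue:
        node, path, sign = queue.popleft()
        if node == target:
            return path, sign
        for nxt, s in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt], sign * s))
    return None

# Hypothetical pieces, each imagined as read from a different paper.
pieces = [("A", "B", +1), ("B", "C", -1), ("C", "D", +1)]
model = assemble(pieces)
print(explain(model, "A", "D"))
# (['A', 'B', 'C', 'D'], -1)
```

Reporting only the shortest chain is a simplification: when several chains with conflicting signs connect two entities, a real system would have to weigh them, which is where the confidence/credibility of extracted facts comes in.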

Page 16

Architecture of Big Mechanism

from Paul Cohen, “DARPA’s Big Mechanism Program”

Page 17

Deep Language Analysis
- Complex sentence structure analysis
- Robust semantic parsing
- Discourse analysis
  - Coreference
  - Causal/temporal relations
- Representation and reasoning
  - Explanation/anticipation
- Confidence/credibility (of extracted facts, i.e., of what is written in documents)

Page 18

Language Processing and Document Analysis Layers

[Figure: analysis layers over large-scale text data: POS tags, phrase/NE chunking; syntactic dependency structure; argument structure, coreference; relations (temporal, causal, entailment); rhetorical/document structure; Document Analysis (document understanding, similarity-based search, knowledge discovery/assembling); Knowledge Base / Ontology]

Page 19

We may be able to do more

- Research trend survey
- Research (paper) evaluation
- Content-aware citation analysis
- Innovation foresight, e.g., the Foresight and Understanding from Scientific Exposition (FUSE) project: http://www.iarpa.gov/index.php/research-programs/fuse

Collaboration with people in application areas who need to read and understand documents