overview of patent retrieval task at ntcir-3

33
Overview of Patent Retrieval Task at NTCIR-3 Organizers: Makoto Iwayama (TIT / Hitachi) Atsushi Fujii (Univ. of Tsukuba / JST) Noriko Kando (NII) Akihiko Takano (NII) Collaboration with: JIPA (Japan Intellectual Property Association) Data provider: (2001/4 ~ 2002/10)

Upload: olina

Post on 09-Jan-2016

47 views

Category:

Documents


3 download

DESCRIPTION

Overview of Patent Retrieval Task at NTCIR-3. (2001/4 ~ 2002/10). Organizers: Makoto Iwayama (TIT / Hitachi) Atsushi Fujii (Univ. of Tsukuba / JST) Noriko Kando (NII) Akihiko Takano (NII) Collaboration with: JIPA (Japan Intellectual Property Association) Data provider: PATOLIS Co. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Overview of Patent Retrieval Task at NTCIR-3

Overview of Patent Retrieval Task at NTCIR-3

Organizers:

Makoto Iwayama (TIT / Hitachi)

Atsushi Fujii (Univ. of Tsukuba / JST)

Noriko Kando (NII)

Akihiko Takano (NII)

Collaboration with:

JIPA (Japan Intellectual Property Association)

Data provider:

PATOLIS Co.

(2001/4 ~ 2002/10)

Page 2: Overview of Patent Retrieval Task at NTCIR-3

What is Patent Retrieval Task- Evaluation workshop for Patent Retrieval -

• Evaluating/comparing patent retrieval techniques of participating systems

• Investigating evaluation methods of patent retrieval techniques

• Constructing/providing test collections for patent retrieval

• Providing a forum for research groups of patent retrieval

Page 3: Overview of Patent Retrieval Task at NTCIR-3

How to Construct Test Collection (Basic Idea)

system1

system2pooling

relevancejudgment

evaluation (recall/precision)

assessors(human experts)

searchresults2

searchresults1

pooledresults

relevantdocs.

searchtopic

search target(doc.

collection)

Test Collection

runs

Page 4: Overview of Patent Retrieval Task at NTCIR-3

Why Patent

• Patent information processing is a good test bed of researches on information access– Real task, real user, real information need– Variety of tasks (IR, CLIR, Filtering, Summarization,

Text Mining)

• Unique document collection which is different from other well-studied document collections– Searching patents may be different from searching

news article (in TREC, CLEF) or searching technical abstracts (in NTCIR).

Page 5: Overview of Patent Retrieval Task at NTCIR-3

Characteristics of Patent Documents

• Structured documents– claims, purposes, effects, embodiments,

etc.

• Unusual style for claims (esp. Japanese)– Many sub-topics in single long sentence

• Large variation in document length– The longest patent in our collection has

30,000 unique words !

Page 6: Overview of Patent Retrieval Task at NTCIR-3

Characteristics of Patent Documents (cont)

• Many technical terms• Many new terms (defined by applicants)• General terms in claims

ex. floppy disk → external storage

• Classifications– IPC(International Patent Classification)– FI(File Index) and FT(File Forming Term) for

Japanese patents

Page 7: Overview of Patent Retrieval Task at NTCIR-3

Characteristics of Patent Retrieval

• Various search purposes– High recall is necessary in some situations– Claim interpretation is necessary in some

situations

• Difference between industries– Chemical formulas/materials are important in

chemistry– Images/diagrams are important in machinery– IPC is primal in some industries, but not in some

others (ex. business method patents)

Page 8: Overview of Patent Retrieval Task at NTCIR-3

Main Task (Technical Survey)

non-expert (manager)

expert(patent searcher)

reading news articles

+

clipping

memo

for supplementing, focusing, etc.

• patent survey before product development• patents as technical documents (cf. as legal documents)• cross-DB retrieval

Page 9: Overview of Patent Retrieval Task at NTCIR-3

How to Construct Test Collection (Basic Idea)

system1

system2pooling

relevancejudgment

evaluation (recall/precision)

assessors(human experts)

searchresults2

searchresults1

pooledresults

relevantdocs.

searchtopic

search target(doc.

collection)

runs

Page 10: Overview of Patent Retrieval Task at NTCIR-3

Example of Search Topic (JA)<TOPIC><NUM>0004</NUM><LANG>JA</LANG><PURPOSE> 技術動向調査 </PURPOSE><TITLE> バーコードなどの符号を比較し優劣を判定する装置</TITLE><ARTICLE><A-DOC><A-DOCNO>JA-981031179</A-DOCNO><A-LANG>JA</A-LANG><A-SECTION> 社会 </A-SECTION><A-AE> 無 </A-AE><A-WORDS>189</A-WORDS><A-HEADLINE> エポック社の特許侵害訴訟、バンダイが敗訴--東京地裁</A-HEADLINE><A-DATE>1998-10-31</A-DATE><A-TEXT>  カードゲームの特許を侵害されたとして、玩具(がんぐ)製造会社のエポック社がバンダイに2億6400万円の損害賠償を求めた訴訟で、東京地裁は30日、約1億1400万円の支払いを命じた。 森義之裁判長は、バンダイが1992年7月~93年3月に製造・販売した小型ゲーム機「スーパーバーコードウォーズ」のキー操作などの機能について「エポック社が持つ特許の技術的範囲に属する」と指摘した。</A-TEXT></A-DOC></ARTICLE><SUPPLEMENT> バーコードなどを読み込み、これに基づく数値を比較して勝敗を決定していればよい。</SUPPLEMENT><DESCRIPTION> バーコードなどの符号を複数読み込ませ、これら符号に対応する数値を比較することにより、これらの優劣/勝敗の判定を行うことで対戦を行う装置にはどのようなものがあるか。</DESCRIPTION><NARRATIVE> 「スーパーバーコードウォーズ」とは、小型ゲーム機の一種であり、キャラクターなどが描かれたカードに記録されたバーコードを読み込ませ、プレーヤーが攻撃や防御などのキー操作を行うことで、半リアルタイムに対戦を行うものである。符号の例としては、バーコードや磁気コードなどがあるが、これらに限定するものではない。</NARRATIVE><CONCEPT> 符号 バーコード コード 優劣 勝敗 比較 判定 </CONCEPT><PI>PATENT-KKH-G-H01-333373</PI></TOPIC>

<ARTICLE>

<SUPPLEMENT>

<DESCRIPTION>

<NARRATIVE>

<CONCEPT>

<PI>

<TITLE>

Page 11: Overview of Patent Retrieval Task at NTCIR-3

Search Topics

• 31 Japanese topics created by JIPA

• English, Korean and Chinese (simplified and traditional) topics translated from Japanese

Page 12: Overview of Patent Retrieval Task at NTCIR-3

How to Construct Test Collection (Basic Idea)

system1

system2pooling

relevancejudgment

evaluation (recall/precision)

assessors(human experts)

searchresults2

searchresults1

pooledresults

relevantdocs.

searchtopic

search target(doc.

collection)

runs

Page 13: Overview of Patent Retrieval Task at NTCIR-3

Patent Collections (from PATOLIS Co.)

kkh jsh paj

Type Full text abstract abstract

Format Free SGML-like SGML-like

Language Japanese Japanese English

Years 98,99 95 - 99 95 - 99

Number of documents

697,262 1,706,154 1,701,339

Bytes 18,139M 1,883M 2,711M

kkh: Publication of unexamined patent applicationsjsh: JAPIO Patent Abstractspaj: Patent Abstracts Japan

Page 14: Overview of Patent Retrieval Task at NTCIR-3

Relationships between Patent Collections

Translation(by JAPIO experts)

Modification of the original abstracts(by JAPIO experts)length normalization around 400 words and term normalization

kkh: (98,99)Full texts with author’s abstracts(in Japanese)

jsh: (95-99)Abstracts(in Japanese)

paj: (95-99)Abstracts(in English)

Page 15: Overview of Patent Retrieval Task at NTCIR-3

How to Construct Test Collection (Basic Idea)

system1

system2pooling

relevancejudgment

evaluation (recall/precision)

assessors(human experts)

searchresults2

searchresults1

pooledresults

relevantdocs.

searchtopic

search target(doc.

collection)

runs

Page 16: Overview of Patent Retrieval Task at NTCIR-3

Runs

• Mandatory runs– Search topic fields:

<ARTICLE> + <SUPPLEMENT> only (i.e. cross-DB)– Search target:

full texts (98, 99)– manual or automatic retrieval– mono-lingual or cross-lingual retrieval

• Optional runs– TREC-styled ad-hoc runs recommended

search from <DESCRIPTION> (+ <NARRATIVE>)

Page 17: Overview of Patent Retrieval Task at NTCIR-3

How to Construct Test Collection (Basic Idea)

system1

system2pooling

relevancejudgment

evaluation (recall/precision)

assessors(human experts)

searchresults2

searchresults1

relevantdocs.

searchtopic

search target(doc.

collection)

runs

pooledresults

Page 18: Overview of Patent Retrieval Task at NTCIR-3

Manual Search before Pooling

pooling

relevancejudgments

searchresults2

searchresults1

originalpool

relevantdocs.

relevantdocs.

skimmedpool

skimming

preliminarysearch

assessors(human experts)

by systems and experts

by experts only

by systems only

systems

relevantdocs.

evaluation

JIPA (Japan Intellectual Property Association)

Page 19: Overview of Patent Retrieval Task at NTCIR-3

Evaluation

• Recall/Precision graph

• Average precision

• R-precision

Page 20: Overview of Patent Retrieval Task at NTCIR-3

Submitted Runs

• 36 runs from 8 groups– 6 groups from Japan, 2 groups from overse

as– 5 groups from universities, 3 groups from c

ompanies– 20 mandatory runs, 16 optional runs– 5 cross-lingual runs (EJ) from 2 groups

Page 21: Overview of Patent Retrieval Task at NTCIR-3

A, mandatory

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

recall

prec

isio

n

LAPIN4(A)

DTEC1(M)

DOVE4(M)

brklypat1(A)

daikyo(M)

SRGDU5(A)

IFLAB6(A)

brklypat3(A,E)

A: autoM: manual

E: English topics

Recall/Precision-- mandatory, A --

Page 22: Overview of Patent Retrieval Task at NTCIR-3

A+B, mandatory

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

recall

prec

isio

n

LAPIN4(A)

DOVE4(M)

DTEC1(M)

daikyo(M)

brklypat1(A)

SRGDU3(A)

IFLAB6(A)

brklypat3(A,E)

A: autoM: manual

E: English topics

Recall/Precision-- mandatory, A+B --

Page 23: Overview of Patent Retrieval Task at NTCIR-3

A, optional

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

recall

prec

isio

n

LAPIN1(A,TDNC)

brklypat2(A,DN)

DOVE3(A,DN)

DOVE2(A,D)

SRGDU6(A,DN)

IFLAB2(A,D)

IFLAB3(A,DN)

IFLAB7(A,T)

brklypat4(A,DN,E)

A: autoM: manual

T: TITLED: DESCRIPTIONN: NARRATIVEC: CONCEPT

E: English Topics

Recall/Precision-- optional, A --

Page 24: Overview of Patent Retrieval Task at NTCIR-3

A+B, optional

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

recall

prec

isio

n

LAPIN1(A,TDNC)

DOVE3(A,DN)

brklypat2(A,DN)

DOVE2(A,D)

IFLAB2(A,D)

SRGDU4(A,DN)

IFLAB4(A,DN)

IFLAB7(A,T)

brklypat4(A,DN,E)

A: autoM: manual

T: TITLED: DESCRIPTIONN: NARRATIVEC: CONCEPT

E: English Topics

Recall/Precision-- optional, A+B --

Page 25: Overview of Patent Retrieval Task at NTCIR-3

Comparable Runs from the same group (brkly)

ad-hoc run from Japanese

ad-hoc run from English

cross-DB run from Japanese

cross-DB run from English

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

recall

prec

isio

n

Page 26: Overview of Patent Retrieval Task at NTCIR-3

Some Results

• Cross-DB retrieval is more difficult than ad-hoc retrieval

• Cross-lingual retrieval (EJ) is more difficult than Mono-lingual retrieval (JJ)

• Pseudo relevance feedback may be effective

Refer to our SIGIR2003 paper (to appear) for more detail of glass box evaluations

Page 27: Overview of Patent Retrieval Task at NTCIR-3

Median of Mean Average Precisions

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31topic ID

medi

an o

f av

era

ge p

recis

ions

AA+B

Page 28: Overview of Patent Retrieval Task at NTCIR-3

Human Experts vs. Systems- breakdown of relevant documents -

0

50

100

150

200

250

300

350

400

450

BSASBJAJBUAU

BS 1 34 44 2 15 43 9 29 7 3 6 15 0 6 2 2 15 5 4 0 1 0 1 7 15 17 26 0 0 1 16

AS 0 5 27 2 0 0 55 0 1 0 1 57 6 37 0 2 0 6 9 1 0 1 1 10 49 11 5 35 0 5 15

BJ 10 7 2 2 2 0 9 42 10 3 0 47 22 2 8 4 9 36 0 5 0 0 2 33 3 16 18 3 0 2 19

AJ 4 0 0 0 0 0 23 10 17 11 1 173 12 10 1 1 2 11 0 0 2 4 2 108 12 6 7 7 0 1 16

BU 7 13 4 4 6 0 5 33 7 4 3 29 5 10 3 1 8 18 2 1 1 0 4 16 6 16 38 7 5 7 4

AU 22 13 6 10 15 12 27 23 18 15 4 101 22 17 5 15 5 17 36 4 8 4 4 72 49 12 19 40 6 3 16

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

topic ID

num

ber

of do

cum

ents

Page 29: Overview of Patent Retrieval Task at NTCIR-3

Recall of AJ+AU

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 100 200 300 400 500 600 700 800 900 1000

ranking

recall all

mandatory

mandatory (auto)

pooling (rank = 30)

Human Experts vs. Systems- recall of the “A” relevant documents experts found -

Page 30: Overview of Patent Retrieval Task at NTCIR-3

Results 2

• Medians of Mean Average Precisions vary topic by topic.

• Numbers of relevant documents systems/experts found vary topic by topic.

Page 31: Overview of Patent Retrieval Task at NTCIR-3

Proposal-based task was also supported

• CRL– Using the Diff Command in Patent

Documents

• TIT– Rhetorical Structure Analysis of Japanese

Patent Claims using Cue Phrases

Page 32: Overview of Patent Retrieval Task at NTCIR-3

Summary

• The first evaluation workshop for patent retrieval

• Three kinds of patent collections

• Technical survey task– Cross-DB / Cross-lingual retrieval

• Relevance judgment by expert patent searchers (JIPA)

Page 33: Overview of Patent Retrieval Task at NTCIR-3

Patent Retrieval Task at NTCIR-4(currently on “Dry Run”)

• Joint project with JIPA

• 10 years’ Japanese full texts (5 years’ for search target)

• 10 years’ English Abstracts

• Invalidity search from claims

• Feasibility task for automatically creating “Patent Map”