Information Retrieval Strategies and Heuristics (資訊檢索策略與技巧): Mu-hsuan Huang (黃慕萱), Chap. 6; Harter, Chap. 7


Upload: jessica-berry

Post on 22-Dec-2015



Information Retrieval Strategies and Heuristics

Mu-hsuan Huang (黃慕萱), Chap. 6

Harter, Chap. 7


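Search strategy vs. search heuristics

Both were originally military terms. Views of various authors:

Marcia Bates, 1979, "Information Search Tactics"

Hartly: methods for avoiding the retrieval of irrelevant articles, and possible responses to retrieving too many or too few relevant articles

Palmer: refers to building blocks searching and citation pearl growing

Pao: refers to Boolean logic, citation-based, and probabilistic search strategies

Search strategy: the overall consideration or comprehensive plan for a search problem, e.g., building blocks searching, citation pearl growing, etc.

Search heuristics: actions taken to accomplish a specific purpose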


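Briefsearch (quick search)

The most common way of searching

Fast and inexpensive, but often low recall and low precision

Suitable when:

The topic is clear-cut

You want to learn the descriptors and index terms used by the database producer

You are verifying bibliographic data

The title, author, etc. are already known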


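Building Blocks Search

Also called "block building" or "building block" searching

Break the search problem down into several facets

Determine the relationships among the facets: the relationship is usually AND; OR and NOT occur less often

Find search terms that can represent each facet, and union them with Boolean OR for completeness

The most widely used strategy; early reference interview forms were often designed around it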


Building Blocks Search Strategy--1/4

1. Conduct reference interviews

2. Formulate search objectives: high recall, high precision, or moderate levels of recall and precision

3. Select database(s) and search system

4. Identify major concepts or facets and their logical relationships with one another


Building Blocks Search Strategy--2/4

5. Identify search strings that represent the concepts

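Words, full-text phrases, pieces of words, descriptors, identifiers, codes

Non-semantic bibliographic characteristics: fields unrelated to subject, such as document type, language, and year

Include synonyms, near-synonyms, narrower terms, and related terms

Fields to be searched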


Building Blocks Search Strategy--3/4

6. For each distinct facet of the search, a set of postings will be created for each search string within that facet. The sets are then combined into a single set representing that facet using Boolean OR

7. Following step 6, the facet sets themselves will be combined with Boolean AND and NOT

8. Plan alternatives


Building Blocks Search Strategy--4/4

9. Formulate the initial statements of the search in the command language of the system

10. Log on and put the search to the system

11. Evaluate the intermediate results

12. Iterate

Use the interactive features of the system to carry out search heuristics (tactics, maneuvers, strategies, tricks, devices, approaches) to try to improve search results


Building blocks approach

Facet A: Term A1 OR Term A2 OR ... OR Term Ap

Facet B: Term B1 OR Term B2 OR ... OR Term Bq

Facet C: Term C1 OR Term C2 OR ... OR Term Cr

Answer Set: Boolean combination of facets (AND, OR, NOT)
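The diagram above can be sketched with Python sets standing in for posting lists. This is only an illustration under stated assumptions: the toy index, the term names, and the helper names `facet_set` and `building_blocks` are hypothetical, and the facets are combined with AND only, the most common case.

```python
# Toy inverted index: search term -> set of document IDs (postings).
# All terms and document IDs are made up for illustration.
POSTINGS = {
    "term_a1": {1, 2, 3},
    "term_a2": {3, 4},
    "term_b1": {2, 4, 5},
    "term_b2": {6},
    "term_c1": {2, 3, 4, 7},
}


def facet_set(terms, postings):
    """OR together the postings of every term in one facet."""
    result = set()
    for term in terms:
        result |= postings.get(term, set())
    return result


def building_blocks(facets, postings):
    """AND together the facet sets built by facet_set."""
    sets = [facet_set(terms, postings) for terms in facets]
    if not sets:
        return set()
    return set.intersection(*sets)


facets = [["term_a1", "term_a2"], ["term_b1", "term_b2"], ["term_c1"]]
answer = building_blocks(facets, POSTINGS)
print(sorted(answer))  # [2, 4]
```

Adding a term to a facet can only enlarge that facet's set, while adding a facet can only shrink the answer set, which is the lever the later recall and precision heuristics pull on.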


Building Blocks search sample

Measurement of Risk Tendencies (looking for high recall)

Facet 1, RISK: risk

Facet 2, MEASUREMENT: measurement, assessment, choice, decision, outcome

Facet 3, RISK AVERSION: risk aversion, risk avoidance, risk neutrality, risk prone, risk tendency

Facet 4, BEHAVIORAL DECISION THEORY: behavioral decision theory

Facet 5, INSURANCE: insurance contract, bank, finance, stock, investment, advertisement

Boolean Combination: ((RISK AND MEASUREMENT) OR RISK AVERSION OR BEHAVIORAL DECISION THEORY) NOT INSURANCE


Reviewing the results and searching again

To increase recall:

Find additional concepts or search terms to add to one or more facets

Delete a facet

To increase precision:

Delete some of the broader or more ambiguous terms in the facets

Add an additional facet to be intersected with the others
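The slide does not define recall and precision, so here is a minimal sketch using the standard definitions: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. The document IDs and relevance judgments are made up, and the example shows only one of the adjustments listed above (deleting a facet).

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved, relevant):
    """Fraction of relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


relevant = {1, 2, 3, 4}        # hypothetical relevance judgments
facet_a = {1, 2, 3, 4, 5, 6}   # hypothetical facet result sets
facet_b = {1, 2, 7}

narrow = facet_a & facet_b     # both facets intersected
broad = facet_a                # facet B deleted to raise recall

print(precision(narrow, relevant), recall(narrow, relevant))  # 1.0 0.5
print(recall(broad, relevant))                                # 1.0
```

In this toy case deleting facet B lifts recall from 0.5 to 1.0 while precision falls from 1.0 to about 0.67, which is the usual trade-off between the two goals.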


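Successive facet strategies--1/3

Other names:

Fewest postings first

Most specific concept first

Successive fractions (a successive search that does not start from a subject facet)

Building blocks vs. successive facets: building blocks searching uses all the facets; successive facet searching tries to use as few facets as possible

After deciding the facets of the search problem, set their order of priority, then decide from the results whether to continue searching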


Successive facet strategies--2/3

First Facet, AND Second Facet (optional), AND Other Facet (optional), AND Other Facet (optional), giving the Solution Set

Example 1: "members and activities of 4-H clubs"

Example 2: "the emotional, physical, and intellectual characteristics of children who have studied violin with the Suzuki method"


Successive facet strategies--3/3

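When to use:

When combining all the facets with Boolean operators is likely to yield zero records

When one or two facets of the search problem are quite vague in meaning

When the search problem includes non-subject criteria, such as document type, language, or year of publication, which can be treated as the first search concept

When the searcher would rather tolerate false drops than miss relevant articles

When the time and money spent adding further facets may exceed the cost of simply printing the current results

When relevant literature is scarce and the searcher is willing to look at some less relevant articles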
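A minimal sketch of the successive facet idea, assuming each facet has already been reduced to a set of document IDs. The function name `successive_facets`, the `max_results` threshold, and the rule of stopping before a null set are illustrative assumptions, not something the slides prescribe.

```python
def successive_facets(facet_sets, max_results):
    """Intersect facets one at a time, fewest postings first, and stop
    as soon as the result is small enough to review."""
    if not facet_sets:
        return set(), 0
    ordered = sorted(facet_sets, key=len)   # fewest postings first
    result = set(ordered[0])
    used = 1
    for facet in ordered[1:]:
        if len(result) <= max_results:
            break                           # remaining facets stay unused
        narrowed = result & facet
        if not narrowed:
            break                           # keep the last non-empty set
        result = narrowed
        used += 1
    return result, used


# Hypothetical facet result sets.
facets = [set(range(100)), {1, 2, 3, 4, 5, 6, 7, 8}, {2, 4, 6, 50}]
result, used = successive_facets(facets, max_results=3)
print(sorted(result), used)  # [2, 4, 6] 2
```

Here the third and largest facet is never applied, which is the contrast with building blocks searching drawn on the first slide of this section.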


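Pairwise Facets--1/3

Pair the facets two at a time, intersect each pair, then union the results

When to use:

All facets are equally important

The facets differ little in specificity or vagueness

Combining all the facets would yield zero records

Note: when there are many facets, try to use 3-4 facets as the basic unit for intersection, to avoid confusion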


Pairwise Facets—2/3

Building blocks search: A AND B AND C

Pairwise facets search: (A AND B) OR (A AND C) OR (B AND C)
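The two formulations can be compared directly with Python sets. This is a sketch on made-up facet sets; `pairwise_facets` is a hypothetical helper name.

```python
from itertools import combinations


def pairwise_facets(facet_sets):
    """Union of the intersections of every pair of facets."""
    result = set()
    for left, right in combinations(facet_sets, 2):
        result |= left & right
    return result


a, b, c = {1, 2, 3}, {2, 3, 4}, {3, 5}   # hypothetical facet sets
all_three = a & b & c                     # building blocks: A AND B AND C
pairs = pairwise_facets([a, b, c])        # (A AND B) OR (A AND C) OR (B AND C)

print(sorted(all_three))  # [3]
print(sorted(pairs))      # [2, 3]
```

The pairwise result always contains the three-way intersection, so it can only match or widen the building blocks result.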


Pairwise Facets—3/3

Diagram: Facet #1, Facet #2, and Facet #3 are ANDed in pairs to give Solution Set A, Solution Set B, and Solution Set C

FINAL SOLUTION SET: A OR B OR C

Sample: A doctoral student wants a high recall bibliography prepared on the relationship between facial musculature and the physiological (autonomic) responding of emotions, e.g., fear.


Citation Pearl Growing

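Aims for high precision

Starts from 100% precision (a known relevant article) and works back toward recall

Repeatedly harvests descriptors, identifiers, and words from documents already known to be relevant, then searches again

When to use:

The database has no thesaurus or vocabulary list

Emerging disciplines

Often requires many repeated searches; not suitable for beginners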
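The loop described above can be sketched as follows. Everything here is an illustrative assumption: the toy records, the `pearl_growing` name, and the `is_relevant` callback, which stands in for the searcher's own relevance judgment at each round.

```python
# Hypothetical records: document ID -> set of index terms.
RECORDS = {
    "d1": {"facial expression", "emotion"},
    "d2": {"emotion", "autonomic response"},
    "d3": {"autonomic response", "heart rate"},
    "d4": {"insurance"},
}


def pearl_growing(seed_ids, records, is_relevant, max_rounds=5):
    """Grow a result set outward from documents known to be relevant."""
    pearls = set(seed_ids)
    for _ in range(max_rounds):
        # Harvest the index terms of everything judged relevant so far.
        terms = set().union(*(records[d] for d in pearls))
        # Search again: any record sharing a harvested term is a candidate.
        hits = {d for d, doc_terms in records.items() if doc_terms & terms}
        new = {d for d in hits - pearls if is_relevant(d)}
        if not new:
            break
        pearls |= new
    return pearls


found = pearl_growing({"d1"}, RECORDS, is_relevant=lambda d: d != "d4")
print(sorted(found))  # ['d1', 'd2', 'd3']
```

Each round depends on the judgments made in the previous one, which is why the strategy takes several iterations.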


Other facet strategies

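Multiple Briefsearch: use different databases to obtain as high a recall as possible

Interactive Scanning: the most time-consuming and interactive strategy, e.g., using classification codes or natural language

Implied Concepts: grasp the implied concepts, and choose different terms depending on the subject nature of the database. Example: possible health hazards from foods cooked using microwave ovens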


Citation indexing strategies

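Use the relationships between citing and cited documents to build the search strategy

Offer highly interdisciplinary and multidisciplinary approaches to online searching

Search strategies: cited publication, cited author, cocited authors

Taiwan Humanities Citation Index (THCI), Research Center for Humanities, National Science Council: http://www.hrc.ntu.edu.tw/index.htm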
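A minimal sketch of a cited-publication search and a cocitation count over a toy citation table. The data, the function names, and the choice to key on works rather than authors are assumptions for illustration; an author-level version would have the same shape with author names in the reference sets.

```python
from collections import Counter

# Hypothetical citation data: citing paper -> set of works it cites.
CITES = {
    "paper1": {"work_a", "work_b"},
    "paper2": {"work_a", "work_c"},
    "paper3": {"work_b", "work_c"},
    "paper4": {"work_a", "work_b"},
}


def citing_papers(cited, cites):
    """Cited-publication search: papers whose reference list has `cited`."""
    return {paper for paper, refs in cites.items() if cited in refs}


def cocitation_counts(cited, cites):
    """Count how often other works are cited together with `cited`."""
    counts = Counter()
    for refs in cites.values():
        if cited in refs:
            counts.update(refs - {cited})
    return counts


print(sorted(citing_papers("work_a", CITES)))   # ['paper1', 'paper2', 'paper4']
print(cocitation_counts("work_a", CITES).most_common(1))  # [('work_b', 2)]
```

Neither function looks at subject terms, which is why citation searching can cross disciplinary vocabularies.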


Non-subject, fact, and multiple database searching

Non-subject searching: document type, year of publication, language, author, corporate source; doublelimiting

Fact searching: search for a known item

Multiple database searching: pay attention to which fields each database covers and how its controlled vocabulary is used


Search heuristics

Language Heuristics

Command Language, Database and File Structure Heuristics

Recall and Precision Heuristics: Heuristics for Increasing Recall; Heuristics for Increasing Precision

Personal Heuristics


Language Heuristics—1/2

Use natural language searching when:

One or more of the concepts of interest involves a subtle nuance of meaning

One or more of the concepts of interest is highly specific

One or more of the concepts is relatively new and appropriate terms in the controlled vocabulary do not exist

A highly comprehensive search is desired (high recall)

The literature to be searched is "soft"


Language Heuristics—2/2

Use controlled vocabulary searching when:

The concepts of interest can be expressed precisely and unambiguously in the controlled vocabulary

A limited search retrieving a limited number of highly pertinent items is desired

The literature to be searched is "hard"


Command Language, Database and File Structure Heuristics--1/2

Know the stop words used by the search system

Know the sort order associated with the binary coding system used by the host computer

Know which fields are searched by default, if search fields are not explicitly specified


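Command Language, Database and File Structure Heuristics--2/2

Know the parsing rule used to index each field searched

Know which fields are included in the basic index

Always question null sets

Note the indexing method used for each search field, e.g., word or phrase indexing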

Understand Boolean operations with the null set and make use of this knowledge in reformulating search statements
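How the three Boolean operators behave with a null set can be checked with Python sets. The set contents and the `null_sets` helper are made up for illustration.

```python
hits = {1, 2, 3}   # hypothetical non-empty result set
null = set()       # a search statement that retrieved nothing

and_result = hits & null   # AND with a null set is always null
or_result = hits | null    # OR with a null set changes nothing
not_result = hits - null   # NOT with a null set changes nothing


def null_sets(named_sets):
    """Names of the search statements that retrieved nothing."""
    return [name for name, docs in named_sets.items() if not docs]


print(and_result, or_result, not_result)
print(null_sets({"S1": hits, "S2": null}))  # ['S2']
```

A null set hidden inside an OR group goes unnoticed, while one inside an AND wipes out the whole result, so it is worth checking each intermediate set before reformulating.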


Questions to ask in low recall—1/2

Am I in the correct database?

Have I overspecified the search problem?

Is there anything done on the topic or problem? Is there a literature on this search problem?

Have sufficient search terms been included to properly represent each concept of the search?


Questions to ask in low recall—2/2

Were the proximity specifications placed on the search terms too restrictive?

Was Boolean logic used correctly?

Did I make a technical error, e.g., in spelling or command syntax?

Should I be searching in natural language fields?

Have all word forms of search terms been used? Should truncation be employed?


Heuristics for Increasing Recall --1/2

Use additional synonyms and near synonyms combined with Boolean OR to represent search concepts

Use more generic terms in addition to specific terms to represent search concepts

Use natural language in addition to controlled vocabulary terms

Search additional subject fields


Heuristics for Increasing Recall --2/2

Delete AND and NOT facets from the formulation

Increase term truncation

Use less restrictive proximity operators, e.g., require that terms appear in the same paragraph rather than the same sentence

Remove any restrictions from the formulation, e.g., language, date of publication, type of publication
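To make the truncation heuristic concrete, here is a small sketch of right-hand truncation over a toy word index. The index contents and the `truncation_search` name are hypothetical, and real systems have their own truncation symbols and rules.

```python
# Hypothetical word index: indexed word -> set of document IDs.
INDEX = {
    "librarian": {1},
    "libraries": {2},
    "library": {3},
    "liberty": {4},
}


def truncation_search(stem, index):
    """Right-hand truncation: union the postings of every indexed word
    that starts with `stem`."""
    result = set()
    for word, docs in index.items():
        if word.startswith(stem):
            result |= docs
    return result


print(sorted(truncation_search("library", INDEX)))  # [3]
print(sorted(truncation_search("librar", INDEX)))   # [1, 2, 3]
print(sorted(truncation_search("lib", INDEX)))      # [1, 2, 3, 4]
```

Shortening the stem from "library" to "librar" picks up the other word forms, but cutting it to "lib" also pulls in "liberty", the kind of false drop that the precision checklist asks about under truncating too severely.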


Questions to ask in low precision—1/2

Am I in the correct database?

Have I underspecified the search problem?

Do I need to disambiguate a concept of the problem?

Have I used Boolean logic correctly?

Have I included vague or ambiguous terms, or terms that are too generic?


Questions to ask in low precision—2/2

Should I restrict search terms to elements of a controlled vocabulary?

Were the proximity specifications too loosely placed on the search terms?

Are false drops resulting from concepts having an unintended relationship with one another?

Has a search term been truncated too severely?


Heuristics for Increasing Precision --1/2

Delete near synonyms and potentially ambiguous terms

Use more specific terms to represent concepts

Use controlled vocabulary terms if a concept is precisely represented by them; delete controlled vocabulary terms that do not describe a concept precisely

If multiple meaning does not appear to be a major problem, search natural language terms that represent the concepts of interest precisely


Heuristics for Increasing Precision --2/2

If none of the above conditions applies, search fewer subject fields, deleting fields in the approximate order: full text, abstract, title, identifier, and descriptor

Add additional facets with AND and NOT

Decrease term truncation

Use more restrictive proximity operators

Add restrictions to the formulation, e.g., by date of publication, type of publication, language, etc.
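The effect of tightening a proximity operator can be sketched with a simple word-distance test. This is an assumption-laden illustration: `within` is a hypothetical helper, it splits on whitespace only and ignores punctuation, and real systems define their proximity operators in their own command languages.

```python
def within(text, first, second, max_gap):
    """True if `first` and `second` occur at word positions that differ
    by at most `max_gap` (adjacent words differ by 1)."""
    words = text.lower().split()
    pos_first = [i for i, w in enumerate(words) if w == first]
    pos_second = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= max_gap for i in pos_first for j in pos_second)


# Made-up sample text.
text = "risk of loss drives the measurement of aversion"

print(within(text, "risk", "measurement", 5))  # True
print(within(text, "risk", "measurement", 2))  # False
```

With the looser gap the sentence is retrieved even though "risk" and "measurement" are not really linked in it; the tighter gap rejects it, raising precision at some cost to recall.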


Personal Heuristics—1/2

Be flexible; stay loose; be willing to look at a search in more than one way. Avoid rigidity in thought and action.

Browse samples of retrieved citations to assess relevancy.

Browse samples of retrieved citations to generate additional search terms.

Be heuristic, interactive. Don’t do “fast batch” searching.


Personal Heuristics—2/2

Evaluate one's own work critically.

Always be skeptical of search output.

A mindless faith in controlled vocabularies is not always justified. Be critical of the adequacy of artificial languages for the representation of concepts in documents.