Introduction to Apache Solr (2012-06-29)


Upload: dosang-yoon

Post on 07-Dec-2014


DESCRIPTION

An open-source search server based on the Lucene Java search library

TRANSCRIPT

Page 1: Introduction to Apache Solr (2012-06-29)

Introduction to Apache Solr

2012.06.29
윤도상 (Dosang Yoon) [email protected]

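Page 2: Introduction to Apache Solr (2012-06-29)

Solr Features

• Schema
– Easily define the fields of the documents to index and their field types
– Uses Lucene Analyzers
– Supports Dynamic Fields
– Copy Fields can combine several fields into a single searchable field
– Stop words and similar lists can be configured through external files

• Query
– HTTP interface with configurable response formats such as XML/XSLT, JSON, Python, and Ruby
– Faceted Search based on queries and field values
– The sort order can be defined in the query
– Search scoring is easy to configure
– Boosts (weights) can be assigned to specific fields in the query

• Core
– Query handlers and an extensible XML format
– Duplicate documents are detected through the unique key field

• Caching
– Cache settings for query results, filters, and documents
– User-level cache settings are supported

• Replication
– Efficient index distribution over rsync transport

• Admin Interface
– Reports cache, update, and query status
– Provides a debugger for text analyzers
– Provides a web query interface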

Page 3: Introduction to Apache Solr (2012-06-29)

Architecture

Page 4: Introduction to Apache Solr (2012-06-29)

Overall Architecture

Page 5: Introduction to Apache Solr (2012-06-29)

Component

Page 6: Introduction to Apache Solr (2012-06-29)

High Availability

Page 7: Introduction to Apache Solr (2012-06-29)

Replication

Page 8: Introduction to Apache Solr (2012-06-29)

Configure

Page 9: Introduction to Apache Solr (2012-06-29)

Schema.xml

• Overall

<schema>
  <types> … </types>
  <fields> … </fields>
  <uniqueKey />
  <solrQueryParser />
  <copyField />
  <dynamicField />
</schema>

Page 10: Introduction to Apache Solr (2012-06-29)

Schema.xml

• Type

<types>
  <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
  <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
  <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0"/>
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
</types>

Page 11: Introduction to Apache Solr (2012-06-29)

Schema.xml

• Fields

<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="release_dt" type="date" indexed="true" stored="true"/>
  <field name="title" type="text_general" indexed="true" stored="true"/>
  <field name="content" type="text_general" indexed="true" stored="true"/>
  <field name="text" type="text_general" indexed="true" stored="true"/>
</fields>

• uniqueKey
– <uniqueKey>id</uniqueKey>

• solrQueryParser
– <solrQueryParser defaultOperator="OR"/>

• copyField
– <copyField source="title" dest="text"/>
– <copyField source="content" dest="text"/>

• dynamicField
– <dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
– <dynamicField name="*_text" type="string" indexed="true" stored="true"/>
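A copyField only works if its dest names a declared field (or matches a dynamicField pattern). The following is a minimal sketch of such a check in Python, run against an inline sample rather than a real schema.xml. SAMPLE_SCHEMA and undeclared_copy_targets are illustrative names that do not come from Solr, the "body" destination is a deliberately wrong value added for the demonstration, and dynamic fields are matched with simple glob patterns as an assumption.

```python
import fnmatch
import xml.etree.ElementTree as ET

# Illustrative sample modeled on the slide; not a complete schema.xml.
SAMPLE_SCHEMA = """
<schema>
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="title" type="text_general" indexed="true" stored="true"/>
    <field name="content" type="text_general" indexed="true" stored="true"/>
    <field name="text" type="text_general" indexed="true" stored="true"/>
  </fields>
  <dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
  <copyField source="title" dest="text"/>
  <copyField source="content" dest="body"/>
</schema>
"""


def undeclared_copy_targets(schema_xml):
    """Return copyField destinations that match no field or dynamicField."""
    root = ET.fromstring(schema_xml)
    declared = {field.get("name") for field in root.iter("field")}
    patterns = [dyn.get("name") for dyn in root.iter("dynamicField")]
    missing = []
    for copy in root.iter("copyField"):
        dest = copy.get("dest")
        matches_dynamic = any(fnmatch.fnmatchcase(dest, p) for p in patterns)
        if dest not in declared and not matches_dynamic:
            missing.append(dest)
    return missing


# "body" is reported because the sample declares no such field.
print(undeclared_copy_targets(SAMPLE_SCHEMA))
```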

Page 12: Introduction to Apache Solr (2012-06-29)

Schema.xml

• Example for bigram analyzer

<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
</fieldType>

• Dynamically Reload

$ curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0'

Example: $ curl 'http://localhost:8981/solr/admin/cores?action=RELOAD&core=news'
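To give a feel for what the bigram analyzer on this slide produces, here is a rough Python sketch that lowercases the input and turns each run of CJK characters into overlapping two-character tokens. It is a simplification for illustration, not the actual behavior of StandardTokenizerFactory or CJKBigramFilterFactory; the character ranges in is_cjk and the whitespace splitting are assumptions.

```python
from itertools import groupby


def is_cjk(ch):
    """Rough check for Hangul syllables, kana, and common Han characters."""
    code = ord(ch)
    return (0xAC00 <= code <= 0xD7A3      # Hangul syllables
            or 0x3040 <= code <= 0x30FF   # Hiragana and Katakana
            or 0x4E00 <= code <= 0x9FFF)  # CJK Unified Ideographs


def cjk_bigrams(text):
    """Lowercase, split on whitespace, and emit bigrams for each CJK run."""
    tokens = []
    for word in text.lower().split():
        for cjk, group in groupby(word, key=is_cjk):
            run = "".join(group)
            if not cjk or len(run) == 1:
                tokens.append(run)  # non-CJK runs and single characters stay whole
            else:
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


print(cjk_bigrams("윈도우 XP"))
```

Because a query is split into the same bigrams, a partial word such as 윈도 can match a document containing 윈도우 without a language-specific dictionary.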

Page 13: Introduction to Apache Solr (2012-06-29)

Multi-Core

Page 14: Introduction to Apache Solr (2012-06-29)

Configuration Files

1. Edit the solr.xml configuration file in the solr directory.

<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores" defaultCoreName="core1">
    <core name="core1" instanceDir="core_dir1"/>
    <core name="core2" instanceDir="core_dir2"/>
  </cores>
</solr>

2. Create a home directory for each core under the solr directory.

- solr
  - core_dir1
  - core_dir2

3. Create conf and data directories inside each of those directories. The data path can be set in solrconfig.xml through the following element: <dataDir>${solr.data.dir:}</dataDir>

- solr
  - core_dir1
    - conf
    - data
  - core_dir2
    - conf
    - data
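Steps 2 and 3 can be scripted. The sketch below creates the conf and data directories for each core with Python's pathlib, working inside a temporary directory so that it can be run safely anywhere. create_core_dirs is a hypothetical helper name, and a real setup would still need solrconfig.xml and schema.xml copied into each conf directory.

```python
import tempfile
from pathlib import Path


def create_core_dirs(solr_home, instance_dirs):
    """Create <solr_home>/<instance_dir>/conf and .../data for every core."""
    created = []
    for name in instance_dirs:
        for sub in ("conf", "data"):
            path = Path(solr_home) / name / sub
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created


# Demonstration in a throwaway directory; point solr_home at the real
# solr directory to apply the same layout there.
with tempfile.TemporaryDirectory() as tmp:
    solr_home = Path(tmp) / "solr"
    for path in create_core_dirs(solr_home, ["core_dir1", "core_dir2"]):
        print(path.relative_to(tmp))
```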

Page 15: Introduction to Apache Solr (2012-06-29)

Web Admin Interface

Page 16: Introduction to Apache Solr (2012-06-29)

Web Admin Interface

• View Config, Schema, and Distribution information
• Query Interface
• Statistics
– Caches: lookups, hits, hitratio, inserts, evictions, size
– RequestHandlers: requests, errors
– UpdateHandler: adds, deletes, commits, optimizes
– IndexReader: open-time, index-version, numDocs, maxDocs

• Analysis Debugger
– Shows the output of each analysis stage
– Shows how the query terms match the indexed terms

Page 17: Introduction to Apache Solr (2012-06-29)

Solr Document

Page 18: Introduction to Apache Solr (2012-06-29)

XML

• Document

<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="office">Bridgewater</field>
    <field name="skills">Perl</field>
    <field name="skills">Java</field>
  </doc>
</add>

• Indexing

$ curl 'http://localhost:8983/solr/update?commit=true' -H "Content-Type: text/xml" \
    --data-binary '<add><doc><field name="id">testdoc</field></doc></add>'

• Update

<delete>
  <id>05991</id>
  <id>06000</id>
  <query>office:Bridgewater</query>
  <query>office:Osaka</query>
</delete>

$ curl http://localhost:8983/solr/update -H "Content-Type: text/xml" \
    --data-binary '<add><doc boost="2.5"><field name="employeeId">05991</field>
      <field name="office" boost="2.0">Bridgewater</field></doc></add>'

• Commit

$ curl http://localhost:8983/solr/update -H "Content-Type: text/xml" \
    --data-binary '<commit waitFlush="false" waitSearcher="false"/>'
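The same add message can be built programmatically, which avoids quoting mistakes in hand-written XML. The sketch below uses only the Python standard library; build_add_xml and build_update_request are illustrative helper names, and the base URL is the one used on the slide. The request is constructed but not sent, so the example runs without a Solr server.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Build an <add><doc>...</doc></add> message; list values become repeated fields."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(item)
    return ET.tostring(add, encoding="unicode")


def build_update_request(xml_body, base_url="http://localhost:8983/solr"):
    """Prepare (but do not send) a POST to the update handler with commit=true."""
    return urllib.request.Request(
        base_url + "/update?commit=true",
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
    )


body = build_add_xml(
    {"employeeId": "05991", "office": "Bridgewater", "skills": ["Perl", "Java"]}
)
print(body)
request = build_update_request(body)
# urllib.request.urlopen(request) would send it to a running Solr instance.
```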

Page 19: Introduction to Apache Solr (2012-06-29)

JSON

• Document

[
  {
    "id" : "MyTestDocument",
    "title" : "This is just a test"
  }
]

• Indexing

$ curl http://localhost:8983/solr/update/json -H 'Content-type:application/json' -d \
    '[ { "id" : "MyTestDocument", "title" : "This is just a test" } ]'

• Update/Delete

$ curl http://localhost:8983/solr/update/json -H 'Content-type:application/json' -d '
{
  "add": {"doc": {"id" : "TestDoc1", "title" : "test1"} },
  "add": {"doc": {"id" : "TestDoc2", "title" : "another test"} },
  "delete": {"id" : "TestDoc1"},
  "delete": {"query" : "Test", "commitWithin" : 500}
}'

• Commit

$ curl 'http://localhost:8983/solr/update?commit=true'
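When the payload is generated from code, the json module guarantees balanced braces and straight quotes. A minimal sketch follows; build_json_add and build_json_delete are hypothetical helper names. Note that the combined message on the slide repeats the "add" and "delete" keys, which a Python dict cannot represent, so this sketch produces separate add and delete messages instead.

```python
import json


def build_json_add(docs):
    """Serialize a list of documents in the array form shown under Indexing."""
    return json.dumps(docs, ensure_ascii=False)


def build_json_delete(doc_id=None, query=None):
    """Serialize a delete-by-id or delete-by-query message (exactly one is required)."""
    if (doc_id is None) == (query is None):
        raise ValueError("pass exactly one of doc_id or query")
    target = {"id": doc_id} if doc_id is not None else {"query": query}
    return json.dumps({"delete": target}, ensure_ascii=False)


print(build_json_add([{"id": "MyTestDocument", "title": "This is just a test"}]))
print(build_json_delete(doc_id="TestDoc1"))
print(build_json_delete(query="Test"))
```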

Page 20: Introduction to Apache Solr (2012-06-29)

CSV

• Document

[test.csv]
fieldnames=id,,category
100,"title","This Value is ""food"""

[test.csv]
fieldnames=id,title,category
100,"title","This Value is ""food"""

• Indexing

$ curl http://localhost:8983/solr/update/csv --data-binary @test.csv -H 'Content-type:text/plain; charset=utf-8'

• Example from MySQL Dump

$ curl 'http://localhost:8983/solr/update/csv?commit=true&separator=%09&escape=\&stream.file=/tmp/result.text'
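The data line in test.csv escapes an embedded double quote by doubling it. Python's csv module produces the same form, which is a convenient way to generate such files. In this sketch to_csv_line is an illustrative helper name, and QUOTE_NONNUMERIC is chosen so that text values are quoted while the numeric id is not.

```python
import csv
import io


def to_csv_line(row):
    """Write one row, quoting text values and doubling embedded double quotes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerow(row)
    return buffer.getvalue()


line = to_csv_line([100, "title", 'This Value is "food"'])
print(line, end="")

# Reading the line back restores the original quote characters.
print(next(csv.reader(io.StringIO(line))))
```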

Page 21: Introduction to Apache Solr (2012-06-29)

Data Handler Interface

Page 22: Introduction to Apache Solr (2012-06-29)

Full-import

• Example test DB setup

Create database solr;
Grant alter, select, insert, update, delete on solr.* to solr@localhost identified by 'solr';

Create table maker (
  mid int primary key auto_increment,
  name varchar(30) not null,
  lastmodified datetime
);

Create table product (
  id int primary key auto_increment,
  mid int not null,
  name varchar(30) not null,
  hname varchar(30) not null,
  lastmodified datetime
);

Insert into maker(name, lastmodified) values('apple', '2012-05-11 17:00:00');
Insert into maker(name, lastmodified) values('sony', '2012-05-11 17:00:00');
Insert into maker(name, lastmodified) values('microsoft', '2012-05-11 17:00:00');

Insert into product(mid, name, hname, lastmodified) values(1, 'iphone', '아이폰', '2012-05-11 17:00:00');
Insert into product(mid, name, hname, lastmodified) values(1, 'ipod', '아아팟', '2012-05-11 17:00:00');
Insert into product(mid, name, hname, lastmodified) values(1, 'ipad', '아이패드', '2012-05-11 17:00:00');
Insert into product(mid, name, hname, lastmodified) values(2, 'walkman', '워크맨', '2012-05-11 17:00:00');
Insert into product(mid, name, hname, lastmodified) values(2, 'vaio', '바이오', '2012-05-11 17:00:00');
Insert into product(mid, name, hname, lastmodified) values(3, 'windowsxp', '윈도우 xp', '2012-05-11 17:00:00');
Insert into product(mid, name, hname, lastmodified) values(3, 'windowx7', '윈도우 7', '2012-05-11 17:00:00');

Page 23: Introduction to Apache Solr (2012-06-29)

Full-import

• MySQL connection settings
– Specify the DB configuration file in solrconfig.xml.

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>

– Define the SQL statements that fetch the data in db-data-config.xml.

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/solr" user="solr" password="solr" name="solr"/>
  <document>
    <entity name="product" query="select id, mid, name, hname from product">
      <field column="id" name="pid"/>
      <field column="mid" name="mid"/>
      <field column="name" name="pname"/>
      <field column="hname" name="hname"/>
      <entity name="maker" query="select mid, name from maker where mid = '${product.mid}'">
        <field column="mid" name="mid"/>
        <field column="name" name="mname"/>
      </entity>
    </entity>
  </document>
</dataConfig>
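The nested entity in db-data-config.xml means that, for each product row, a second query looks up the matching maker and merges its columns into the same document. The sketch below imitates that flow with an in-memory SQLite database so that it runs without MySQL or Solr. It is an illustration of the idea only, not DataImportHandler's implementation; full_import is a hypothetical function and the table contents are a shortened version of the test data.

```python
import sqlite3


def full_import(conn):
    """Build one document per product row, enriched by a per-row maker lookup."""
    docs = []
    products = conn.execute("SELECT id, mid, name FROM product ORDER BY id")
    for pid, mid, pname in products:
        doc = {"pid": str(pid), "mid": mid, "pname": pname}
        # Sub-entity: one extra query per parent row, like ${product.mid}.
        makers = conn.execute("SELECT name FROM maker WHERE mid = ?", (mid,))
        for (mname,) in makers:
            doc["mname"] = mname
        docs.append(doc)
    return docs


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE maker (mid INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product (id INTEGER PRIMARY KEY, mid INTEGER NOT NULL, name TEXT NOT NULL);
INSERT INTO maker (name) VALUES ('apple'), ('sony');
INSERT INTO product (mid, name) VALUES (1, 'iphone'), (1, 'ipod'), (2, 'walkman');
""")

for doc in full_import(conn):
    print(doc)
```

One sub-query per parent row is simple to configure but costs one round trip per product, which is worth keeping in mind for large tables.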

Page 24: Introduction to Apache Solr (2012-06-29)

Full-import

• Indexing setup
– Define the search fields in schema.xml

<field name="pid" type="string" indexed="true" stored="true" required="true"/>
<field name="mid" type="int" indexed="true" stored="true" multiValued="false"/>
<field name="pname" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="mname" type="text" indexed="true" stored="true" multiValued="true"/>
……..
<defaultSearchField>pname</defaultSearchField>
<defaultSearchField>mname</defaultSearchField>
……..
<uniqueKey>pid</uniqueKey>
……..
<copyField source="pname" dest="text"/>
<copyField source="mname" dest="text"/>

– Start Solr

java -Dsolr.solr.home="./example-DIH/solr/" -jar start.jar

– Run the indexing

http://localhost:8983/solr/db/dataimport?command=full-import

Page 25: Introduction to Apache Solr (2012-06-29)

Delta-import

• Example test DB setup

Insert into maker(name, lastmodified) values('Samsung', '2012-05-14 14:00:00');
Insert into maker(name, lastmodified) values('LG', '2012-05-14 14:00:00');

Insert into product(mid, name, hname, lastmodified) values(4, 'GalaxyS', '겔럭시 S', '2012-05-14 14:00:00');
Insert into product(mid, name, hname, lastmodified) values(4, 'GalaxyA', '겔럭시 A', '2012-05-14 14:00:00');
Insert into product(mid, name, hname, lastmodified) values(4, 'GalaxyNote', '겔럭시노트', '2012-05-14 14:00:00');
Insert into product(mid, name, hname, lastmodified) values(5, 'OptimusLTE', '옵티머스 LTE', '2012-05-14 14:00:00');
Insert into product(mid, name, hname, lastmodified) values(5, 'VegaLTE', '베가 LTE', '2012-05-14 14:00:00');

Page 26: Introduction to Apache Solr (2012-06-29)

Delta-import

• MySQL connection settings
– Define the SQL statements that fetch the data in db-data-config.xml.

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/solr" user="solr" password="solr" name="solr"/>
  <document>
    <entity name="product" pk="id"
            query="select * from product"
            deltaImportQuery="select * from product where id='${dataimporter.delta.id}'"
            deltaQuery="select id from product where lastmodified > '${dataimporter.last_index_time}'">
      <field column="id" name="pid"/>
      <field column="mid" name="mid"/>
      <field column="name" name="pname"/>
      <entity name="maker" pk="mid"
              query="select mid, name from maker where mid='${product.mid}'">
        <field column="mid" name="mid"/>
        <field column="name" name="mname"/>
      </entity>
    </entity>
  </document>
</dataConfig>

– Run the indexing

http://localhost:8983/solr/db/dataimport?command=delta-import
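The two delta attributes work as a pair: deltaQuery finds the primary keys of rows changed since the last index time, and deltaImportQuery then fetches each of those rows in full. The following sketch reproduces that two-step pattern against an in-memory SQLite database. It only illustrates the idea and is not DataImportHandler's code; delta_import is a hypothetical function, and timestamps are stored as text in 'YYYY-MM-DD HH:MM:SS' form so that string comparison orders them correctly.

```python
import sqlite3


def delta_import(conn, last_index_time):
    """Return the full rows of products modified after last_index_time."""
    # Step 1 (deltaQuery): collect only the ids of changed rows.
    changed = conn.execute(
        "SELECT id FROM product WHERE lastmodified > ? ORDER BY id",
        (last_index_time,),
    ).fetchall()
    rows = []
    # Step 2 (deltaImportQuery): fetch each changed row by primary key.
    for (pid,) in changed:
        rows.extend(
            conn.execute(
                "SELECT id, mid, name FROM product WHERE id = ?", (pid,)
            ).fetchall()
        )
    return rows


db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, mid INTEGER NOT NULL,
                      name TEXT NOT NULL, lastmodified TEXT);
INSERT INTO product (mid, name, lastmodified) VALUES (1, 'iphone', '2012-05-11 17:00:00');
INSERT INTO product (mid, name, lastmodified) VALUES (4, 'GalaxyS', '2012-05-14 14:00:00');
""")

print(delta_import(db, "2012-05-12 00:00:00"))
```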

Page 27: Introduction to Apache Solr (2012-06-29)

Index

Page 28: Introduction to Apache Solr (2012-06-29)

Index

• Delete all existing data

$ java -Durl=http://localhost:$port/solr/update/?commit=true -Ddata=args -jar $dir/post.jar "<delete><query>*:*</query></delete>"

• Index with post.jar as follows

$ java -Durl=http://localhost:8983/solr/core1/update/?commit=true -jar post.jar core1_data.xml
$ java -Durl=http://localhost:8983/solr/core2/update/?commit=true -jar post.jar core1_data.xml

※ Note
– When creating the index for the first time

<doc>
  <field name="id">id1</field>
  <field name="title">title1</field>
</doc>

– When updating the index

<update>
  <doc>
    <field name="id">id1</field>
    <field name="title">title1</field>
  </doc>
</update>

Page 29: Introduction to Apache Solr (2012-06-29)

Search

Page 30: Introduction to Apache Solr (2012-06-29)

Search Parameter

Parameter | Default | Description
q | | Search query. e.g. q=video or q=title:spiderman^10 text:spiderman
start | 0 | Offset into the list of search results
rows | 10 | Number of result documents to return
fl | * | Fields to return (field names separated by commas). e.g. fl=*,score or fl=id,name
qf | | Query fields: the fields the query is searched in. e.g. q=superman&qf=title subject
sort | | Fields to sort by, in ascending or descending order. e.g. sort=inStock asc, price desc or sort=price asc
wt | | Writer type (response format). e.g. wt=json or wt=xml
fq | | Filter query (search within results). e.g. q=video&fq=superman
hl | | Highlighting and the fields to highlight. e.g. hl=true&hl.fl=name,description
facet | | Faceted Search. e.g. facet=true&facet.field=cat&facet.query=price:[0 TO 100]&facet.query=price:[100 TO *]
debugQuery | | Adds debug information to the search results
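Parameter values such as q=title:spiderman^10 text:spiderman contain spaces, colons, and brackets that must be URL-encoded before they are sent. A small sketch using urllib.parse is shown below. build_select_url is a hypothetical helper, and the /select path in the default base URL is an assumption (it does not appear on the slides), so adjust it to the handler and core actually in use.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(params, base_url="http://localhost:8983/solr/select"):
    """Encode search parameters; list values become repeated parameters."""
    return base_url + "?" + urlencode(params, doseq=True)


url = build_select_url({
    "q": "title:spiderman^10 text:spiderman",
    "start": 0,
    "rows": 10,
    "fl": "*,score",
    "wt": "json",
    "facet": "true",
    "facet.query": ["price:[0 TO 100]", "price:[100 TO *]"],
})
print(url)

# Decoding the query string recovers the original values.
print(parse_qs(urlsplit(url).query)["facet.query"])
```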

Page 31: Introduction to Apache Solr (2012-06-29)

Query Examples

• Documents containing mission or impossible, sorted by releaseDate in descending order
– q=mission impossible; releaseDate desc

• Documents that contain mission and whose actor field does not contain cruise
– q=+mission -actor:cruise

• Documents that contain the phrase "mission impossible" and whose actor field does not contain cruise
– q="mission impossible" -actor:cruise

• Boost spiderman in title by a factor of 10 relative to spiderman in description
– q=title:spiderman^10 description:spiderman

• Documents whose description field has spiderman and movie within 10 words of each other
– q=description:"spiderman movie"~10

• Documents that must contain HDTV and whose weight is 40 or more
– q=+HDTV +weight:[40 TO *]

• Wildcard queries
– q=te?t
– q=te*t
– q=test*

Page 32: Introduction to Apache Solr (2012-06-29)

Search Relevancy

Page 33: Introduction to Apache Solr (2012-06-29)

Faceted Browsing

Page 34: Introduction to Apache Solr (2012-06-29)

Autocomplete

Page 35: Introduction to Apache Solr (2012-06-29)

Suggest

• Configuration
– Add the suggest feature to solrconfig.xml.

<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">name_autocomplete</str>
  </lst>
</searchComponent>
<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

Page 36: Introduction to Apache Solr (2012-06-29)

Suggest

• Configuration
– Add the suggest field to schema.xml.

<fieldType name="text_auto" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="name_autocomplete" type="text_auto" indexed="true" stored="true" multiValued="false"/>
<copyField source="name" dest="name_autocomplete"/>

• Run a search (http://localhost:8983/solr/db/suggest?spellcheck.build=true)

http://localhost:8983/solr/db/suggest?q=윈도

http://localhost:8983/solr/db/suggest?q=겔
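Conceptually, the suggest requests above return indexed terms that start with the typed prefix, up to spellcheck.count entries. The sketch below shows that prefix lookup with a sorted list and binary search. It is not how TSTLookup is implemented (that uses a ternary search tree); suggest and the sample terms are illustrative, with the terms borrowed from the test data earlier in the deck.

```python
import bisect


def suggest(sorted_terms, prefix, count=10):
    """Return up to `count` terms from a sorted list that start with `prefix`."""
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix) or len(matches) == count:
            break
        matches.append(term)
    return matches


terms = sorted(["윈도우 xp", "윈도우 7", "겔럭시 S", "겔럭시노트", "아이폰"])
print(suggest(terms, "윈도"))
print(suggest(terms, "겔"))
```

Because the list is sorted, all terms sharing a prefix sit next to each other, so the scan stops at the first term that no longer matches.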

Page 37: Introduction to Apache Solr (2012-06-29)

Basic Dictionary
- Synonym / Stop Word Dictionaries -

Page 38: Introduction to Apache Solr (2012-06-29)

Synonym Dictionary

• Entries (synonyms.txt)

Window => windowxp window7 window8 window 7, door

• Test query [Query: window 7]

Page 39: Introduction to Apache Solr (2012-06-29)

Synonym Dictionary

• Test query [Query: window]

• Test query [Query: door]

Page 40: Introduction to Apache Solr (2012-06-29)

Stop Word Dictionary

• Entries (stopwords.txt)

Window

• Test query [Query: window 7]

• Test query [Query: window]

• Test query [Query: door]
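The effect of the stop word entry on the three test queries can be approximated in a few lines. This sketch mirrors a stop filter configured with ignoreCase="true", as in the text_general field type shown earlier. It is a simplified illustration rather than Solr's StopFilterFactory, and remove_stopwords is a hypothetical helper name.

```python
def remove_stopwords(tokens, stopwords, ignore_case=True):
    """Drop tokens that appear in the stop word list."""
    if ignore_case:
        blocked = {word.lower() for word in stopwords}
        return [token for token in tokens if token.lower() not in blocked]
    blocked = set(stopwords)
    return [token for token in tokens if token not in blocked]


stopwords = ["Window"]  # the single entry from stopwords.txt
for query in ("window 7", "window", "door"):
    print(query, "->", remove_stopwords(query.split(), stopwords))
```

With case-insensitive matching, the query window loses its only term, while door passes through unchanged.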