Introduction to Solr 6 (slides for the 18th Solr Study Group) (June 10, 2016)

Introduction to Solr 6
June 10, 2016, 18th Solr Study Group

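Self-introduction
➤ RONDHUIT Co., Ltd. (株式会社ロンウイット)
➤ 西潟一生
➤ Consultant, working on the following among other things
  ➤ Apache Solr, Apache ManifoldCF
  ➤ Consulting
  ➤ Technical support
  ➤ Training (instructor)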

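Agenda
➤ Changes from Solr 5
  ➤ Supported Java versions
  ➤ Index compatibility
  ➤ How the schema is modified
  ➤ How scores are calculated
  ➤ Changed behavior of the replica and shard delete commands
  ➤ Changed behavior of facet.date.*
➤ New features in Solr 6
  ➤ Parallel SQL
  ➤ Streaming Expressions
  ➤ Cross Data Center Replication
  ➤ Graph Query Parser
  ➤ and more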

Changes from Solr 5

Java 8 is required
➤ Java 8 or later is required
  ➤ This also applies to the SolrJ client library

Index Format Changes
➤ Not compatible with indexes created by Solr 4.x or earlier
  ➤ To keep using a 4.x index, upgrade it with the Lucene IndexUpgrader included in Solr 5.5
➤ Solr 6 might become able to read Solr 4.x indexes directly?
  ➤ https://issues.apache.org/jira/browse/SOLR-9051
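The upgrade step above can be sketched as a shell invocation of Lucene's `org.apache.lucene.index.IndexUpgrader`. The install path, index path, and jar version numbers below are assumptions for illustration, not values from the slides; back up the index first, since the upgrade rewrites it in place.

```shell
# Assumed locations (placeholders): adjust to your Solr 5.5 install and index.
SOLR55_HOME="${SOLR55_HOME:-/opt/solr-5.5.0}"
INDEX_DIR="${INDEX_DIR:-/var/solr/data/collection1/data/index}"
LIB_DIR="$SOLR55_HOME/server/solr-webapp/webapp/WEB-INF/lib"

# lucene-backward-codecs lets the 5.5 code read the older 4.x segment format.
CLASSPATH_ARG="$LIB_DIR/lucene-core-5.5.0.jar:$LIB_DIR/lucene-backward-codecs-5.5.0.jar"

echo "classpath: $CLASSPATH_ARG"
if [ -d "$INDEX_DIR" ] && [ -d "$LIB_DIR" ]; then
  # Rewrites every segment in the current (5.x) format and drops older commits.
  java -cp "$CLASSPATH_ARG" org.apache.lucene.index.IndexUpgrader \
    -delete-prior-commits -verbose "$INDEX_DIR"
else
  echo "Skipping upgrade: $INDEX_DIR or $LIB_DIR not found" >&2
fi
```

Run it with Solr stopped, then point Solr 6 at the upgraded index.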

Managed Schema is now the Default
➤ Managed Schema is the default
  ➤ schema.xml is no longer used; the schema is configured through the Schema API
  ➤ To keep using schema.xml as before, add the following to solrconfig.xml
    <schemaFactory class="ClassicIndexSchemaFactory"/>
➤ Migrating from schema.xml to Managed Schema is easy
  ➤ Delete the managed-schema file in conf, place your existing schema.xml in conf, and start Solr
  ➤ A new managed-schema file is generated, and the schema.xml you placed is renamed to schema.xml.bak

Managed Schema is now the Default (Example)
➤ Add
  curl -X POST -H 'Content-type:application/json' --data-binary '{
    "add-field-type":{
      "name":"myNewTxtField",
      "class":"solr.TextField",
      "positionIncrementGap":"100",
      "analyzer":{
        "charFilters":[{
          "class":"solr.PatternReplaceCharFilterFactory",
          "replacement":"$1$1",
          "pattern":"([a-zA-Z])\\\\1+" }],
        "tokenizer":{
          "class":"solr.WhitespaceTokenizerFactory" },
        "filters":[{
          "class":"solr.WordDelimiterFilterFactory",
          "preserveOriginal":"0" }]}},
    "add-field" : {
      "name":"sell-by",
      "type":"myNewTxtField",
      "stored":true }
  }' http://localhost:8983/solr/gettingstarted/schema
➤ Delete
  curl -X POST -H 'Content-type:application/json' --data-binary '{
    "delete-field-type":{ "name":"myNewTxtField" }
  }' http://localhost:8983/solr/gettingstarted/schema

Default Similarity Changes
➤ The default scoring method changed from TF/IDF to Okapi BM25
  ➤ Improves the ranking quality of search results
➤ Reference
  ➤ https://www.elastic.co/blog/found-bm-vs-lucene-default-similarity
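For reference, the BM25 score can be written as follows. This is a sketch of the formula as commonly stated and, as far as I understand it, as implemented by Lucene's BM25Similarity; it does not appear on the slides. Here f(t, d) is the frequency of term t in document d, |d| is the document length, avgdl is the average document length, N is the number of documents, and n_t is the number of documents containing t. The defaults k_1 = 1.2 and b = 0.75 are Lucene's, not values from the talk.

```latex
\mathrm{score}(q, d) = \sum_{t \in q} \mathrm{IDF}(t)\,
  \frac{f(t, d)\,(k_1 + 1)}
       {f(t, d) + k_1\left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(t) = \ln\!\left(1 + \frac{N - n_t + 0.5}{n_t + 0.5}\right)
```

Unlike classic TF/IDF, the term-frequency part saturates as f(t, d) grows, and b controls how strongly long documents are penalized.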

Replica & Shard Delete Command Changes
➤ The "DELETESHARD" and "DELETEREPLICA" commands now delete the following directories by default
  ➤ Instance directory
  ➤ Data directory
  ➤ Index directory
➤ To keep a directory, set the corresponding parameter to false
  ➤ deleteInstanceDir
  ➤ deleteDataDir
  ➤ deleteIndex
➤ Example
  http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=test2&shard=shard2&replica=core_node3&deleteInstanceDir=false
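The same flags apply to DELETESHARD. A minimal sketch that keeps all three directories on disk; the host, collection, and shard names are placeholders reused from the example above, and the shard is assumed to be in a state that the Collections API allows to be deleted (for example, inactive).

```shell
# Placeholder values: adjust host, collection and shard to your cluster.
SOLR_URL="http://localhost:8983/solr"
COLLECTION="test2"
SHARD="shard2"

# Turn every delete flag off so nothing is removed from disk.
DELETE_URL="$SOLR_URL/admin/collections?action=DELETESHARD&collection=$COLLECTION&shard=$SHARD&deleteInstanceDir=false&deleteDataDir=false&deleteIndex=false"

echo "$DELETE_URL"
curl -s --max-time 5 "$DELETE_URL" || echo "request failed (is Solr running?)" >&2
```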

facet.date.* Parameters Removed
➤ The facet.date parameters, deprecated since Solr 3.x, have been removed completely
  ➤ Use facet.range instead
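A date facet that used to be written with facet.date, facet.date.start, facet.date.end, and facet.date.gap can be expressed with the matching facet.range parameters. A minimal sketch, assuming a collection named gettingstarted and a hypothetical date field manufacturedate_dt:

```shell
SOLR_URL="http://localhost:8983/solr/gettingstarted"
DATE_FIELD="manufacturedate_dt"   # hypothetical date field name

RANGE_START="NOW/YEAR-10YEARS"
RANGE_END="NOW"
RANGE_GAP="+1YEAR"

# -G sends the --data-urlencode pairs as a GET query string, so the "+"
# in the gap is percent-encoded instead of being decoded as a space.
curl -s --max-time 5 -G "$SOLR_URL/select" \
  --data-urlencode "q=*:*" \
  --data-urlencode "rows=0" \
  --data-urlencode "facet=true" \
  --data-urlencode "facet.range=$DATE_FIELD" \
  --data-urlencode "facet.range.start=$RANGE_START" \
  --data-urlencode "facet.range.end=$RANGE_END" \
  --data-urlencode "facet.range.gap=$RANGE_GAP" \
  || echo "request failed (is Solr running?)" >&2
```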

Doc Values
➤ DocValues are enabled by default for non-text fields
  ➤ Reduces memory usage at the cost of a larger index on disk
  ➤ Enable DocValues on the fields you plan to use with Parallel SQL (described later)
➤ References
  ➤ http://blog.johtani.info/blog/2014/10/02/elasticsearch-1-4-0-beta-released-ja/
  ➤ https://lucidworks.com/blog/2013/04/02/fun-with-docvalues-in-solr-4-2/
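When a field needs DocValues explicitly, it can be declared through the Schema API. A minimal sketch; the field name price_sort is hypothetical, and the field type float is assumed to exist in the collection's schema.

```shell
SCHEMA_URL="http://localhost:8983/solr/gettingstarted/schema"

# "price_sort" is a made-up field; "float" is assumed to be a defined field type.
ADD_FIELD_JSON='{
  "add-field": {
    "name": "price_sort",
    "type": "float",
    "stored": true,
    "docValues": true
  }
}'

curl -s --max-time 5 -X POST -H 'Content-type:application/json' \
  --data-binary "$ADD_FIELD_JSON" "$SCHEMA_URL" \
  || echo "request failed (is Solr running?)" >&2
```

Documents indexed before the change do not gain DocValues retroactively, so reindexing is needed after switching an existing field.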

New Features in Solr 6

Parallel SQL
➤ SQL can now be used with Solr
  ➤ Currently available only in SolrCloud

Example
➤ HTTP
  curl --data-urlencode 'stmt=SELECT fieldA, count(*) FROM collection1 GROUP BY fieldA ORDER BY count(*) DESC LIMIT 10' http://localhost:8983/solr/collection1/sql?aggregationMode=facet
➤ JDBC
  // Requires: import java.sql.*;
  // zkHost is the ZooKeeper connection string of the SolrCloud cluster.
  String url = "jdbc:solr://" + zkHost + "?collection=collection1&aggregationMode=facet";
  String sql = "SELECT fieldA, count(*) FROM collection1 GROUP BY fieldA ORDER BY count(*) DESC LIMIT 10";
  // try-with-resources closes rs, stmt and con in reverse order, even if an
  // earlier step throws (a plain finally block would hit a null reference).
  try (Connection con = DriverManager.getConnection(url);
       Statement stmt = con.createStatement();
       ResultSet rs = stmt.executeQuery(sql)) {
    while (rs.next()) {
      String a_s = rs.getString("fieldA");
    }
  }

Parallel SQL Specs
➤ Table name = collection name
➤ Case insensitive
➤ Supported clauses
  ➤ WHERE
  ➤ ORDER BY
  ➤ LIMIT
  ➤ DISTINCT
  ➤ GROUP BY
➤ Solr query syntax can be used in the WHERE clause
  ➤ OR search
    WHERE fieldA = 'term1 term2' → term1 OR term2 (when the default operator is OR)
  ➤ Range search
    WHERE fieldB = '[0 TO 100]'
➤ Requests can be sent through the JDBC driver or over HTTP
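The two WHERE forms above can be combined in one HTTP request. A minimal sketch, assuming the collection1, fieldA, and fieldB names from the slides and that both fields have docValues enabled; joining the predicates with AND is my addition, not something shown in the talk.

```shell
SQL_URL="http://localhost:8983/solr/collection1/sql"

# Solr query syntax goes inside the quoted right-hand side of each predicate.
STMT="SELECT fieldA, fieldB FROM collection1 WHERE fieldA = 'term1 term2' AND fieldB = '[0 TO 100]' ORDER BY fieldB desc LIMIT 10"

echo "$STMT"
curl -s --max-time 5 --data-urlencode "stmt=$STMT" "$SQL_URL" \
  || echo "request failed (is Solr running?)" >&2
```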

Limitations, etc.
➤ Available only in SolrCloud
➤ DELETE, INSERT, and UPDATE are not supported
➤ Selected fields must have docValues=true
➤ aggregationMode=map_reduce is faster when the field has many distinct values (high cardinality); otherwise aggregationMode=facet is faster
➤ Example specifying map_reduce
  curl --data-urlencode 'stmt=SELECT fieldA FROM collection1 GROUP BY fieldA LIMIT 10' http://localhost:8983/solr/collection1/sql?aggregationMode=map_reduce

Streaming Expressions
➤ The results of tasks executed in parallel can be combined
  ➤ Currently available only in SolrCloud
  ➤ Still experimental
➤ Stream Sources
  ➤ search
  ➤ jdbc
  ➤ facet
  ➤ stats
  ➤ topic
➤ Stream Decorators
  ➤ complement
  ➤ daemon
  ➤ innerJoin
  ➤ intersect
  ➤ hashJoin
  ➤ merge
  ➤ leftOuterJoin
  ➤ outerHashJoin
  ➤ parallel
  ➤ reduce
  ➤ rollup
  ➤ select
  ➤ top
  ➤ unique
  ➤ update

Streaming Expressions (Example)
➤ Merging search results from different collections (books.json and hd.xml from exampledocs are already indexed)
  curl --data-urlencode 'expr=merge(
      search(gettingstarted,q="*:*",fl="id,name",sort="id asc",qt="/export"),
      search(gettingstarted2,q="*:*",fl="id,name",sort="id asc",qt="/export"),
      on="id asc")' 'localhost:8983/solr/gettingstarted/stream'
  ...
  {"result-set":{"docs":[
    {"name":["Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300"],"id":"6H500F0"},
    {"name":["The Lightning Thief"],"id":"978-0641723445"},
    {"name":["The Sea of Monsters"],"id":"978-1423103349"},
    {"name":["Sophie's World : The Greek Philosophers"],"id":"978-1857995879"},
    {"name":["Lucene in Action, Second Edition"],"id":"978-1933988177"},
    {"name":["Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133"],"id":"SP2514N"},
    {"EOF":true,"RESPONSE_TIME":17}]}}
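Decorators other than merge follow the same pattern of wrapping one or more source streams. A minimal sketch of innerJoin; the people and pets collections and their fields (personId, name, petName, type) are hypothetical, and both inner streams are assumed to be sorted on the join key with docValues enabled on the exported fields.

```shell
STREAM_URL="http://localhost:8983/solr/people/stream"

# Both sides are sorted by personId so the join can walk them in step.
EXPR='innerJoin(
  search(people, q="*:*", fl="personId,name", sort="personId asc", qt="/export"),
  search(pets, q="type:cat", fl="personId,petName", sort="personId asc", qt="/export"),
  on="personId"
)'

curl -s --max-time 5 --data-urlencode "expr=$EXPR" "$STREAM_URL" \
  || echo "request failed (is Solr running?)" >&2
```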

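Cross Data Center Replication
➤ Supports replication across data centers
  ➤ Still experimental
➤ Runs in active/passive mode
  ➤ Replication is one-way, from the source to the target
  ➤ Changes made on the target are not propagated back to the source
➤ The target supports eventual consistency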

Graph Query Parser
➤ Relationships between Solr documents can be expressed as a tree structure and searched
➤ Possible use cases
  ➤ Access control
    ➤ Traverse the users linked to a document
  ➤ Building a thesaurus
    ➤ Described later

Graph Query Parser (Example)
➤ Indexing
  curl -H 'Content-Type: application/json' 'http://localhost:8983/solr/my_graph/update?commit=true' --data-binary '[
    {"id":"A","foo": 7, "out_edge":["1","9"], "in_edge":["4","2"] },
    {"id":"B","foo": 12, "out_edge":["3","6"], "in_edge":["1"] },
    {"id":"C","foo": 10, "out_edge":["5","9"], "in_edge":["2"] },
    {"id":"D","foo": 20, "out_edge":["4","7"], "in_edge":["3","5"] },
    {"id":"E","foo": 17, "out_edge":[], "in_edge":["6"] },
    {"id":"F","foo": 11, "out_edge":[], "in_edge":["7"] },
    {"id":"G","foo": 7, "out_edge":["8"], "in_edge":[] },
    {"id":"H","foo": 10, "out_edge":[], "in_edge":["8"] }
  ]'
➤ Search
  http://localhost:8983/solr/my_graph/query?fl=id&q={!graph+from=in_edge+to=out_edge}id:A
  ...
  "response":{"numFound":6,"start":0,"docs":[
    { "id":"A" },
    { "id":"B" },
    { "id":"C" },
    { "id":"D" },
    { "id":"E" },
    { "id":"F" } ]}
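The traversal can also be bounded. A minimal sketch that limits the walk to one hop from the root document using the parser's maxDepth parameter; it reuses the my_graph collection from the example above, and the response is not shown because it was not part of the talk.

```shell
GRAPH_URL="http://localhost:8983/solr/my_graph/query"

# maxDepth=1 stops after the documents directly connected to id:A.
GRAPH_Q='{!graph from=in_edge to=out_edge maxDepth=1}id:A'

# -G with --data-urlencode takes care of encoding the braces and spaces.
curl -s --max-time 5 -G "$GRAPH_URL" \
  --data-urlencode "fl=id" \
  --data-urlencode "q=$GRAPH_Q" \
  || echo "request failed (is Solr running?)" >&2
```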

Graph Query Parser (Example)
➤ Indexing
  curl -H 'Content-Type: application/json' 'http://localhost:8983/solr/my_graph/update?commit=true' --data-binary '[
    {"id":"A","name": "fruit", "out_edge":["1","2","3"], "in_edge":[] },
    {"id":"B","name": "apple", "out_edge":[], "in_edge":["1"] },
    {"id":"C","name": "mandarin orange", "out_edge":[], "in_edge":["2"] },
    {"id":"D","name": "grape", "out_edge":[], "in_edge":["3"] },
    {"id":"E","name": "vegetable", "out_edge":["4","5"], "in_edge":[] },
    {"id":"F","name": "strawberry", "out_edge":[], "in_edge":["4"] },
    {"id":"G","name": "watermelon", "out_edge":[], "in_edge":["5"] },
    {"id":"H","name": "rice", "out_edge":[], "in_edge":[] }
  ]'
➤ Search
  http://localhost:8983/solr/my_graph/query?fl=name&q={!graph from=in_edge to=out_edge returnRoot=false}name:fruit
  ...
  "response":{"numFound":3,"start":0,"docs":[
    { "name":"strawberry" },
    { "name":"grape" },
    { "name":"apple" } ]}

[Diagram: fruit → mandarin orange, grape, apple; vegetable → strawberry, watermelon; rice (no edges)]
