Introduction to Elasticsearch

Post on 16-Jul-2015

  • elasticsearch. RESTful

    rueiancsie@gmail.com

    2015/3/10

  • Elasticsearch: The Definitive Guide

    Getting Started

    You know, for search

    life inside a cluster

    Distributed Document Store

    Mapping and Analysis

    Index Management

    inside a shard

    elasticsearch rails gem

  • elasticsearch

    Built on Apache Lucene; documents are JSON

    RESTful API

    Java

  • Github

  • zip

    elasticsearch Marvel web

  • index, type

    Relational DB  => Databases => Tables => Rows      => Columns
    Elasticsearch  => Indices   => Types  => Documents => Fields

    An index is comparable to a database.

    A type is comparable to a table.

    A field is comparable to a column.

  • PUT /megacorp/employee/1
    {
      "first_name": "John",
      "last_name": "Smith",
      "age": 25,
      "about": "I am hero",
      "interests": [ "sports", "music" ]
    }

    1. megacorp is the index.
    2. employee is the type.
    3. 1 is the _id; the JSON body is the document to store.
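
    The request above can be assembled in Python as a minimal sketch. The base URL (a local node on port 9200) and the helper name build_index_request are assumptions for illustration, not part of the slides. The request is only built here, not sent.

```python
import json
from urllib.request import Request

# Assumption: a local node listening on the default HTTP port.
BASE_URL = "http://localhost:9200"


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) the PUT request that stores one document."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


employee = {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I am hero",
    "interests": ["sports", "music"],
}

request = build_index_request("megacorp", "employee", 1, employee)
print(request.get_method(), request.full_url)
```

    Passing the request to urllib.request.urlopen would send it, provided a node is actually running at that address.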

  • GET /megacorp/employee/1

    or

    GET /megacorp/employee/_search?q=music

    or

    GET /megacorp/_search?q=hero

    or

  • Response:
    {
      "took": 4,
      "hits": {
        "total": 1,
        "hits": [
          {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "1",
            "_score": 0.095891505,
            "_source": { "first_name": "John" }
          }
        ]
      }
    }

  • Employee Directory Tutorial

    Enable data to contain multi-value tags, numbers, and full text.

    Retrieve the full details of any employee.

    Allow structured search, such as finding employees over the age of 30.

    Allow simple full-text search and more-complex phrase searches.

    Return highlighted search snippets from the text in the matching documents.

    Enable management to build analytic dashboards over the data.

    http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_finding_your_feet.html

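  • index, shard: what does "index" mean in Elasticsearch?

    index (noun) - comparable to a database in a relational system

    index (verb) - to store a document in an index so that it can be retrieved and queried

    inverted index - the structure that maps each term to the documents containing it

    An Elasticsearch index is a logical namespace that points to one or more shards.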

  • life inside a cluster

  • life inside a cluster: nodes that share the same cluster.name (default: elasticsearch) form one cluster

    cluster

    cluster cluster Index

    cluster

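  • cluster health

    GREEN: all primary and replica shards are active

    YELLOW: all primary shards are active, but not all replica shards are

    RED: not all primary shards are active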

    GET /_cluster/health
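
    The three colours can be expressed as a small rule. The sketch below is a simplification for illustration: the shard list and the function name cluster_status are hypothetical, and a real cluster reports its status through GET /_cluster/health rather than through code like this.

```python
def cluster_status(shards):
    """Derive a health colour from a list of shard descriptions.

    Each shard is a dict with two boolean keys: 'primary' and 'active'.
    """
    primaries_active = all(s["active"] for s in shards if s["primary"])
    replicas_active = all(s["active"] for s in shards if not s["primary"])
    if not primaries_active:
        return "red"
    if not replicas_active:
        return "yellow"
    return "green"


# Hypothetical single-node cluster: 3 active primaries, 3 unassigned replicas.
single_node = (
    [{"primary": True, "active": True} for _ in range(3)]
    + [{"primary": False, "active": False} for _ in range(3)]
)
print(cluster_status(single_node))
```

    A single node cannot host a replica of its own primary shard, which is why a fresh one-node setup with replicas configured typically reports yellow.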

  • Creating an index

    PUT /blogs { "settings" : { "number_of_shards" : 3, "number_of_replicas" : 1 } }

    The blogs index gets 3 primary shards, each with 1 replica shard, for 6 shards in total.

  • cluster shard elasticsearch

  • cluster shard

  • Changing the number of replica shards

    PUT /blogs/_settings { "number_of_replicas" : 2 }

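  • Routing a document to a shard

    When a document is indexed, Elasticsearch has to decide which shard stores it.

    routing is an arbitrary string that defaults to the document's _id; a custom routing value can be supplied instead.

    The shard is computed as:

    shard = hash(routing) % number_of_primary_shards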
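
    A minimal sketch of the routing formula follows. The hash function is a stand-in: CRC32 is used here only because it is stable across runs, and it is not claimed to be the hash Elasticsearch uses internally. The function name pick_shard is hypothetical.

```python
import zlib


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Apply shard = hash(routing) % number_of_primary_shards."""
    # Stand-in hash (assumption): CRC32, chosen for being deterministic.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


for doc_id in ["1", "2", "3", "abc"]:
    print(doc_id, "->", pick_shard(doc_id, 3))
```

    Because the result depends on number_of_primary_shards, changing that number would send existing routing values to different shards. This is why the number of primary shards is fixed when the index is created, while the number of replicas can still be changed later.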

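  • Creating, indexing, or deleting a document

    1. The client sends the request to Node 1.

    2. Node 1 uses the document's _id to determine that it belongs to Shard 0 and forwards the request to Node 3, which holds the primary copy of Shard 0.

    3. Node 3 executes the request on the primary shard, then forwards it in parallel to the replica shards on Node 1 and Node 2.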

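  • Retrieving a document

    1. The client sends a GET request to Node 1.

    2. Node 1 determines that the document belongs to Shard 0; a copy of Shard 0 exists on every node, and this time the request is forwarded to Node 2.

    3. Node 2 returns the document to Node 1, which returns it to the client.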

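  • Partially updating a document

    1. The client sends an update request to Node 1.

    2. Node 1 determines that the document belongs to Shard 0 and forwards the request to Node 3, which holds the primary copy.

    3. Node 3 retrieves the document, changes the JSON in its _source field, and reindexes it on the primary shard. If another process has changed the document in the meantime, this step is retried up to retry_on_conflict times.

    4. Once the update succeeds, Node 3 forwards the new version of the document to the replica shards on Node 1 and Node 2.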

  • The index holds 12 tweets whose date has the form 2014-xx-xx; only one of them is dated 2014-09-15.

    GET /_search?q=2014              # 12 results
    GET /_search?q=2014-09-15        # 12 results !
    GET /_search?q=date:2014-09-15   # 1 result
    GET /_search?q=date:2014         # 0 results !

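  • How Elasticsearch searches across all fields

    By default, the values of every field are concatenated into one string, which is indexed in the inverted index as the _all field.

    { "tweet": "However did I manage before Elasticsearch?", "date": "2014-09-14", "name": "Mary Jones", "user_id": 1 }

    This document is indexed as if it had one extra field:

    "However did I manage before Elasticsearch? 2014-09-14 Mary Jones 1"

    A query that names no field is run against _all.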
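
    The concatenation into _all can be mimicked in a few lines. This is an illustrative sketch only: the helper name build_all_field is hypothetical, and the real _all field is analyzed and indexed rather than stored as a plain string.

```python
def build_all_field(document: dict) -> str:
    """Join every field value into one string, in field order."""
    return " ".join(str(value) for value in document.values())


tweet = {
    "tweet": "However did I manage before Elasticsearch?",
    "date": "2014-09-14",
    "name": "Mary Jones",
    "user_id": 1,
}

print(build_all_field(tweet))
```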

  • mapping

    Elasticsearch keeps a mapping for each type that records the datatype of every field.

    {
      "gb": {
        "mappings": {
          "tweet": {
            "properties": {
              "date":    { "type": "date", "format": "dateOptionalTime" },
              "name":    { "type": "string" },
              "tweet":   { "type": "string" },
              "user_id": { "type": "long" }
            }
          }
        }
      }
    }

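  • exact value vs. full text: Elasticsearch treats the two kinds of data differently

    exact value: must match exactly, so Foo != foo

    full text: matched by relevance, so a search for UK should also find United Kingdom, and a search for jumping may also find leap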

  • inverted Index

    Elasticsearch uses an inverted index for full-text search. Take two documents:

    Doc_1: The quick brown fox jumped over the lazy dog

    Doc_2: Quick brown foxes leap over lazy dogs in summer

  • inverted Index: the inverted index built from the two documents

    Term      Doc_1  Doc_2
    -------------------------
    Quick   |       |  X
    The     |   X   |
    brown   |   X   |  X
    dog     |   X   |
    dogs    |       |  X
    fox     |   X   |
    foxes   |       |  X
    in      |       |  X
    jumped  |   X   |
    lazy    |   X   |  X
    leap    |       |  X
    over    |   X   |  X
    quick   |   X   |
    summer  |       |  X
    the     |   X   |
    ------------------------

    Searching for quick brown:

    Term      Doc_1  Doc_2
    -------------------------
    brown   |   X   |  X
    quick   |   X   |
    ------------------------
    Total   |   2   |  1
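
    The same term table and match count can be reproduced with a short sketch. It splits on whitespace and does no normalization, so Quick and quick stay separate terms. The function names are hypothetical.

```python
from collections import defaultdict

docs = {
    "Doc_1": "The quick brown fox jumped over the lazy dog",
    "Doc_2": "Quick brown foxes leap over lazy dogs in summer",
}


def build_inverted_index(documents):
    """Map each raw term to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def count_matches(index, query):
    """Count, per document, how many query terms it contains."""
    totals = defaultdict(int)
    for term in query.split():
        for doc_id in index.get(term, ()):
            totals[doc_id] += 1
    return dict(totals)


index = build_inverted_index(docs)
print(count_matches(index, "quick brown"))  # {'Doc_1': 2, 'Doc_2': 1}
```

    Doc_1 matches both terms and Doc_2 only one, so a naive similarity measure that counts matching terms ranks Doc_1 first.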

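  • inverted Index: problems with the raw terms

    Quick and quick are separate terms, although users think of them as the same word.

    foxes and dogs share a root with fox and dog.

    jumped and leap are different words, but both mean jump.

    Splitting text into terms (tokenization) and reducing the terms to a standard form (normalization) is called analysis.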

  • inverted Index: the same index after normalization

    Term      Doc_1  Doc_2
    -------------------------
    brown   |   X   |  X
    dog     |   X   |  X
    fox     |   X   |  X
    in      |       |  X
    jump    |   X   |  X
    lazy    |   X   |  X
    over    |   X   |  X
    quick   |   X   |  X
    summer  |       |  X
    the     |   X   |
    ------------------------

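  • analysis and analyzers

    Analysis is performed by an analyzer, which chains three functions:

    1. Character filters: tidy up the string before tokenization, for example by stripping HTML or converting & to and.

    2. Tokenizer: splits the string into individual terms.

    3. Token filters: change, add, or remove terms, for example lowercasing Quick to quick or treating leap and jump as synonyms.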
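
    The three stages can be sketched as three small functions chained together. This is a toy analyzer for illustration, not how Elasticsearch implements analysis: the NORMAL_FORMS table is a hypothetical stand-in for real stemming and synonym filters and covers only the example sentences.

```python
import re
from html import unescape

# Hypothetical stem/synonym table, just large enough for the example sentences.
NORMAL_FORMS = {"foxes": "fox", "dogs": "dog", "jumped": "jump", "leap": "jump"}


def char_filter(text):
    """Stage 1: strip HTML tags, decode entities, spell out '&' as 'and'."""
    text = re.sub(r"<[^>]+>", " ", text)
    return unescape(text).replace("&", " and ")


def tokenize(text):
    """Stage 2: split the string into individual word tokens."""
    return re.findall(r"\w+", text)


def token_filter(tokens):
    """Stage 3: lowercase each token, then map it to its normal form."""
    return [NORMAL_FORMS.get(t.lower(), t.lower()) for t in tokens]


def analyze(text):
    return token_filter(tokenize(char_filter(text)))


print(analyze("<p>Quick brown foxes leap over lazy dogs in summer</p>"))
```

    The same analyzer has to be applied to the query string as well, otherwise a search for Quick would not find the normalized term quick.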

  • Chinese analyzers for Elasticsearch

    ik and mmseg

    https://github.com/medcl/elasticsearch-analysis-ik

    https://github.com/medcl/elasticsearch-analysis-mmseg

    Lucene Smart Chinese analysis

    https://github.com/elasticsearch/elasticsearch-analysis-smartcn

    http://www.sitepoint.com/efficient-chinese-search-elasticsearch/

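  • Back to the 12 tweets: every date has the form 2014-xx-xx and only one is 2014-09-15

    GET /_search?q=2014              # 12 results
    GET /_search?q=2014-09-15        # 12 results !
    GET /_search?q=date:2014-09-15   # 1 result
    GET /_search?q=date:2014         # 0 results !

    1. 2014 is searched in the _all field, which contains the year of every tweet's date: 12 results.

    2. 2014-09-15 is analyzed into the terms 2014, 09, and 15 and searched in _all; every tweet contains 2014: 12 results.

    3. date:2014-09-15 queries the date field, which holds exact values: 1 result.

    4. date:2014 also queries the exact-value date field, and no date equals 2014: 0 results.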
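
    The four result counts can be reproduced with a small simulation. The tweet data is invented for illustration (12 tweets dated 2014-09-13 to 2014-09-24), and the two search functions are simplified stand-ins: full-text matching on _all is modelled as "any query term appears", and the date field as plain equality.

```python
import re

# Hypothetical data: 12 tweets, one per day from 2014-09-13 to 2014-09-24.
tweets = [
    {"tweet": f"tweet number {n}", "date": f"2014-09-{12 + n:02d}"}
    for n in range(1, 13)
]


def terms(text):
    """Split text into lowercase word terms, as a simple analyzer would."""
    return set(re.findall(r"\w+", text.lower()))


def search_all(query):
    """Full-text search on _all: a tweet matches if any query term appears."""
    wanted = terms(query)
    return [
        t for t in tweets
        if wanted & terms(" ".join(str(v) for v in t.values()))
    ]


def search_date(value):
    """Exact-value search on the date field: the whole value must be equal."""
    return [t for t in tweets if t["date"] == value]


print(
    len(search_all("2014")),
    len(search_all("2014-09-15")),
    len(search_date("2014-09-15")),
    len(search_date("2014")),
)  # 12 12 1 0
```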

  • How Elasticsearch executes a distributed search

    A search runs in two phases: query, then fetch.

    GET /_search { "from": 90, "size": 10 }

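  • query phase

    1. The client sends the search request to Node 3, which creates an empty priority queue of size from + size = 100.

    2. Node 3 forwards the request to a copy of every shard (shard 0 and shard 1); each shard runs the query locally and fills its own priority queue of size from + size = 100.

    3. Each shard returns the IDs and sort values of its queued documents to Node 3, which merges them into its own priority queue.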

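  • fetch phase

    1. Node 3 works out which IDs belong to the 10 requested documents and issues a Multi-GET request to the relevant shards.

    2. Each shard loads the documents, enriches them with metadata if required, and returns them to Node 3.

    3. Node 3 returns the results to the client once every document has arrived.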
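
    The two phases can be simulated in memory. Everything here is hypothetical: the shard contents, the integer scores, and the function names. The sketch only shows the flow: each shard returns its local top from + size entries, the coordinating node merges them, and only the final page is fetched in full.

```python
import heapq

# Hypothetical shard contents: doc id -> (_score, _source).
shards = {
    0: {f"a{i}": (2 * i, {"title": f"doc a{i}"}) for i in range(1, 8)},
    1: {f"b{i}": (2 * i + 1, {"title": f"doc b{i}"}) for i in range(1, 8)},
}


def query_phase(from_, size):
    """Collect each shard's top from+size hits, then merge them globally."""
    k = from_ + size
    candidates = []
    for shard_id, docs in shards.items():
        # Local priority queue: only (score, id, shard) travels, not the document.
        local_top = heapq.nlargest(
            k, ((score, doc_id, shard_id) for doc_id, (score, _) in docs.items())
        )
        candidates.extend(local_top)
    merged = heapq.nlargest(k, candidates)
    return merged[from_:from_ + size]


def fetch_phase(hits):
    """Load the full documents for the selected hits from their shards."""
    return [
        {"_id": doc_id, "_score": score, "_source": shards[shard_id][doc_id][1]}
        for score, doc_id, shard_id in hits
    ]


page = fetch_phase(query_phase(from_=2, size=3))
print([hit["_id"] for hit in page])  # ['b6', 'a6', 'b5']
```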

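  • Deep pagination

    By default Elasticsearch returns the top 10 results, sorted by _score.

    The size and from parameters select the page: GET /_search?size=10&from=10000

    With an index of 5 primary shards, requesting roughly page 1000 (results 10001 - 10010) means every shard must produce its top 10010 results, and the coordinating node must sort all 50050 of them only to keep 10.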
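
    The arithmetic behind those numbers, as a short sketch. It assumes 5 primary shards, which is the value implied by 50050 / 10010; the function name is hypothetical.

```python
def deep_paging_cost(from_, size, number_of_primary_shards):
    """Return (results per shard, results sorted by coordinator, results discarded)."""
    per_shard = from_ + size
    sorted_by_coordinator = per_shard * number_of_primary_shards
    discarded = sorted_by_coordinator - size
    return per_shard, sorted_by_coordinator, discarded


print(deep_paging_cost(from_=10000, size=10, number_of_primary_shards=5))
# (10010, 50050, 50040)
```

    The cost grows with from, not with size, which is why deep pages get expensive even when each page is small.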

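  • scan and scroll

    search_type=scan tells Elasticsearch to skip sorting and return documents as they are found.

    scroll=1m keeps the search context open for one minute; each response carries a _scroll_id for the next request. size applies per shard, so each batch holds up to size * number_of_primary_shards documents.

    scan and scroll together are an efficient way to read an entire index.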

    GET /old_index/_search?search_type=scan&scroll=1m { "query": { "match_all": {}}, "size": 1000 }

  • index

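  • Creating an index: settings to decide up front

    Number of shards

    mapping

    Analyzer

    An index template applies settings automatically to newly created indices.

    dynamic_templates in a mapping control how newly detected fields are mapped.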

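  • reindex: rebuilding an index

    Changing an Analyzer or the mapping of an existing field requires a new index, because data that is already indexed is not analyzed again.

    Index aliases make the switch possible with zero downtime.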

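  • reindex with zero downtime: the application only talks to the alias my_index

    my_index is an alias that points to my_index_v1.

    Create my_index_v2, then copy the documents from my_index_v1 to my_index_v2 with scan and scroll.

    Repoint the alias my_index to my_index_v2.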
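
    The final alias switch is a single request to the _aliases endpoint with a remove and an add action. The sketch below only builds and prints the request body; the helper name alias_swap_actions is hypothetical, and nothing is sent to a cluster.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build the body that moves an alias from one index to another."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


body = alias_swap_actions("my_index", "my_index_v1", "my_index_v2")
print("POST /_aliases")
print(json.dumps(body, indent=2))
```

    Sending both actions in one request means the alias never points to neither index or to both, so the application does not notice the switch.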

  • inside a shard

  • inside a shard

    Each shard is a single Lucene index.

    The inverted index is the structure inside a shard that makes text searchable.

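  • An inverted index written to disk is immutable

    Benefits of an immutable inverted index:

    No locking is needed, because multiple processes never compete to change it.

    It stays valid in the filesystem cache, and other caches built on it stay valid too.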

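  • Making an immutable inverted index updatable: Lucene uses per-segment search, where new documents go into new segments

    Each segment is an inverted index of its own and is never changed; deletions are recorded in the segment's .del file and filtered out of the results.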
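
    A toy model of per-segment search, written as a sketch under simplifying assumptions: the Segment and Shard classes are hypothetical, each segment is an in-memory inverted index, and a Python set plays the role of the .del file.

```python
class Segment:
    """A write-once inverted index plus a set of deleted doc ids (the .del role)."""

    def __init__(self, docs):
        self.doc_ids = set(docs)
        self.postings = {}
        for doc_id, text in docs.items():
            for term in text.lower().split():
                self.postings.setdefault(term, set()).add(doc_id)
        self.deleted = set()


class Shard:
    def __init__(self):
        self.segments = []

    def add_segment(self, docs):
        """New documents always go into a new segment."""
        self.segments.append(Segment(docs))

    def delete(self, doc_id):
        """Mark the document as deleted; the postings themselves are untouched."""
        for segment in self.segments:
            if doc_id in segment.doc_ids:
                segment.deleted.add(doc_id)

    def search(self, term):
        """Query every segment in turn and drop documents marked as deleted."""
        hits = set()
        for segment in self.segments:
            hits |= segment.postings.get(term.lower(), set()) - segment.deleted
        return hits


shard = Shard()
shard.add_segment({1: "quick fox", 2: "lazy dog"})
# An update is a delete of the old version followed by a new segment.
shard.delete(1)
shard.add_segment({1: "quick cat"})
print(shard.search("fox"), shard.search("quick"))
```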

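  • Merging segments: many small segments are merged into fewer, larger segments

    Elasticsearch merges segments automatically in the background.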

  • elasticsearch: the definitive guide

    getting started

    searching - the basic tools

    full-body search

    search in depth

    aggregations

  • Logstash, Kibana, Shield, Marvel, Hadoop

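    Tools built around elasticsearch:

    logstash - collects and parses logs and ships them to elasticsearch

    kibana - visualizes the data stored in elasticsearch

    shield - security for elasticsearch

    Marvel - monitoring for elasticsearch

    elasticsearch for Hadoop - integration through the Hadoop API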

  • elasticsearch + rails

    Gems for using elasticsearch from rails:

    chewy

    searchkick

    elasticsearch-rails

  • searchkick: a higher-level gem on top of elasticsearch

    SQL-like queries, so there is no need to learn the query DSL

    zero downtime reindex

    "Did you mean" suggestions

    searchkick Model elasticsearch index Model Roadmap

  • elasticsearch-rails

    elasticsearch rails searchkick elasticsearch index Model Model