Transcript
Page 1: Intro to elasticsearch

Your Data, Your Search!

问志光 2016-06-27

Page 2: Intro to elasticsearch

Outline
● Information retrieval
● Indexing & Searching
● Elasticsearch

Page 3: Intro to elasticsearch

Information retrieval

Information Retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).

A search engine is a software system designed to search for information. It is one kind of implementation of IR.

Page 4: Intro to elasticsearch

What is a search engine?

A search engine is:
● An index engine for documents
● A search engine on indexes

A search engine is more powerful for searching: it's designed for it!

Page 5: Intro to elasticsearch

Search Engine Architecture

Page 6: Intro to elasticsearch
Page 7: Intro to elasticsearch
Page 8: Intro to elasticsearch
Page 9: Intro to elasticsearch

Problems?
● How to store the data?
● How to index the data?
● How to search the data?

Page 10: Intro to elasticsearch

How to store the data?

INVERTED LIST

Page 11: Intro to elasticsearch

How to INDEX the data?

Page 12: Intro to elasticsearch

Suppose you have the following two files:

File1: Students should be allowed to go out with their friends, but not allowed to drink beer.

File2: My friend Jerry went to school to see his students but found them drunk which is not allowed.

Page 13: Intro to elasticsearch

Step 1: Tokenizer
● Split the doc into words
● Remove the punctuation
● Remove stop words (the, a, this, that, etc.)

"Students", "allowed", "go", "their", "friends", "allowed", "drink", "beer", "My", "friend", "Jerry", "went", "school", "see", "his", "students", "found", "them", "drunk", "allowed"
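The tokenizer step can be sketched in a few lines of Python. This is only an illustration: the `STOP_WORDS` set and the `tokenize` helper are hypothetical, and the stop words were picked so that the result matches the word list shown on this slide.

```python
import re

# Hypothetical stop-word list, chosen so the output matches the slide.
STOP_WORDS = {"the", "a", "this", "that", "should", "be", "to", "out",
              "with", "but", "not", "which", "is"}

def tokenize(text):
    """Split text into words, dropping punctuation and stop words."""
    words = re.findall(r"[A-Za-z]+", text)  # letters only, so punctuation is dropped
    return [w for w in words if w.lower() not in STOP_WORDS]

FILE1 = ("Students should be allowed to go out with their friends, "
         "but not allowed to drink beer.")
FILE2 = ("My friend Jerry went to school to see his students "
         "but found them drunk which is not allowed.")

tokens1 = tokenize(FILE1)
tokens2 = tokenize(FILE2)
print(tokens1)
print(tokens2)
```

A production analyzer would use a much larger stop-word list and handle non-ASCII text; the regular expression here only covers plain English letters.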

Page 14: Intro to elasticsearch

Step 2: Linguistic Processor
● Lowercase
● Stemming: cars -> car, etc.
● Lemmatization: drove -> drive, etc.

"student", "allow", "go", "their", "friend", "allow", "drink", "beer", "my", "friend", "jerry", "go", "school", "see", "his", "student", "find", "them", "drink", "allow"

Each resulting word is a Term.
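As a rough sketch of what the linguistic processor does, the snippet below lowercases each token and maps it to a base form. The `BASE_FORMS` table is a hypothetical, hand-written stand-in that covers only the words on the slide; a real processor would apply stemming and lemmatization algorithms rather than a lookup table.

```python
# Hypothetical hand-written lookup table covering only the slide's words;
# a real linguistic processor would use a stemmer or lemmatizer instead.
BASE_FORMS = {
    "students": "student", "friends": "friend", "allowed": "allow",  # stemming
    "went": "go", "found": "find", "drunk": "drink",                 # lemmatization
}

def normalize(tokens):
    """Lowercase each token, then map it to its base form if one is known."""
    return [BASE_FORMS.get(t.lower(), t.lower()) for t in tokens]

tokens = ["Students", "allowed", "go", "their", "friends", "allowed", "drink", "beer",
          "My", "friend", "Jerry", "went", "school", "see", "his", "students",
          "found", "them", "drunk", "allowed"]
terms = normalize(tokens)
print(terms)
```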

Page 15: Intro to elasticsearch

Step 3: Index

Term       Document ID
student    1
allow      1
go         1
their      1
friend     1
allow      1
…          …

Dict → Sort → Posting list
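The dictionary, sort, and posting-list stages can be sketched as follows. This is a minimal in-memory illustration, assuming the terms from Step 2 for the two example files; the `DOC_TERMS` layout and the `build_inverted_index` helper are hypothetical names, not Lucene's implementation.

```python
from collections import defaultdict

# Terms per document ID, taken from the Step 2 output for File1 and File2.
DOC_TERMS = {
    1: ["student", "allow", "go", "their", "friend", "allow", "drink", "beer"],
    2: ["my", "friend", "jerry", "go", "school", "see", "his", "student",
        "find", "them", "drink", "allow"],
}

def build_inverted_index(doc_terms):
    """Return {term: sorted list of document IDs containing the term}."""
    postings = defaultdict(set)
    for doc_id, terms in doc_terms.items():
        for term in terms:
            postings[term].add(doc_id)  # duplicates within a document collapse
    # Sort the dictionary by term and each posting list by document ID.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}

index = build_inverted_index(DOC_TERMS)
for term, ids in index.items():
    print(term, ids)
```

Looking up a term is then a single dictionary access, for example `index["allow"]`, which is what makes the inverted list suitable for search.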

Page 16: Intro to elasticsearch
Page 17: Intro to elasticsearch

How to SEARCH the data?

Page 18: Intro to elasticsearch

Step 1: User search query
• Suppose you have the following query:

lucene AND learned NOT hadoop

Page 19: Intro to elasticsearch

Step 2: Lexical & Syntax Analysis
● Identify words and keywords
  – Words: lucene, learned, hadoop
  – Keywords: AND, NOT
● Build a syntax tree

[Figure: syntax tree over the words lucene, learned, and hadoop, joined by the operators AND and NOT]
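A small sketch of the two analysis phases is given below. It assumes a very simple grammar (a word followed by keyword/word pairs, folded left to right) and treats NOT as "and not"; the `lex` and `parse` helpers are hypothetical, and the tree shape they produce is one reasonable reading that may differ from the slide's figure.

```python
KEYWORDS = {"AND", "NOT"}

def lex(query):
    """Lexical analysis: tag each token as a keyword or a plain word."""
    return [("KEYWORD" if tok in KEYWORDS else "WORD", tok) for tok in query.split()]

def parse(tokens):
    """Syntax analysis: fold 'word (KEYWORD word)*' left to right into a tree."""
    kind, value = tokens[0]
    if kind != "WORD":
        raise ValueError("query must start with a word")
    tree = value
    for i in range(1, len(tokens), 2):
        op_kind, op = tokens[i]
        if op_kind != "KEYWORD" or i + 1 >= len(tokens) or tokens[i + 1][0] != "WORD":
            raise ValueError("expected a keyword followed by a word")
        # Each operator node holds (operator, left subtree, right word).
        tree = (op, tree, tokens[i + 1][1])
    return tree

tokens = lex("lucene AND learned NOT hadoop")
tree = parse(tokens)
print(tokens)
print(tree)
```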

Page 20: Intro to elasticsearch

Step 3: Search
● Search in the inverted list
● Sort, conjunction, disjunction
● Scorer
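To make the conjunction and disjunction operations concrete, here is a sketch that evaluates the example query over sorted posting lists. The posting lists in `INDEX` are hypothetical and not taken from the slides, and scoring is left out entirely.

```python
# Hypothetical posting lists (sorted document IDs); not taken from the slides.
INDEX = {
    "lucene":  [1, 2, 4, 7],
    "learned": [2, 3, 4, 7, 9],
    "hadoop":  [4, 8],
}

def intersect(a, b):
    """Conjunction (AND): IDs present in both sorted lists, via a linear merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def union(a, b):
    """Disjunction (OR): IDs present in either list, kept sorted."""
    return sorted(set(a) | set(b))

def difference(a, b):
    """NOT: IDs in a that do not appear in b."""
    excluded = set(b)
    return [doc_id for doc_id in a if doc_id not in excluded]

# lucene AND learned NOT hadoop
result = difference(intersect(INDEX["lucene"], INDEX["learned"]), INDEX["hadoop"])
print(result)
```

The merge in `intersect` relies on the posting lists being sorted, which is why the indexing step sorts them.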

Page 21: Intro to elasticsearch

Elasticsearch
● full text search
● RESTful API
● real time search and analytics engine
● open source
● high availability
● schema free
● JSON over HTTP
● Lucene based
● distributed

Page 22: Intro to elasticsearch

Elasticsearch

● Distributed and Highly Available Search Engine
  – Each index is fully sharded with a configurable number of shards.
  – Each shard can have one or more replicas.
  – Read / Search operations are performed on any of the replica shards.
● Multi Tenant with Multi Types
  – Support for more than one index.
  – Support for more than one type per index.
  – Index level configuration (number of shards, index storage, ...).
● Document oriented
  – No need for upfront schema definition.
  – Schema can be defined per type for customization of the indexing process.
● Various set of APIs
  – HTTP RESTful API
  – Native Java API
  – All APIs perform automatic node operation rerouting.
● (Near) Real Time Search
● Reliable, Asynchronous Write Behind for long term persistency
● Built on top of Lucene
  – Each shard is a fully functional Lucene index.
  – All the power of Lucene easily exposed through simple configuration / plugins.
● Per operation consistency
  – Single document level operations are atomic, consistent, isolated and durable.
● Open Source under the Apache License, version 2 ("ALv2")

Page 23: Intro to elasticsearch

Terminologies of Elasticsearch
● Cluster
● Node
● Index
● Shard

Page 24: Intro to elasticsearch

Cluster

● A cluster is a collection of one or more nodes (servers) that together hold your entire data and provide federated indexing and search capabilities across all nodes

● A cluster is identified by a unique name, which by default is "elasticsearch"


Page 25: Intro to elasticsearch

Node

● It is an Elasticsearch instance (a Java process)

● A node is created when an Elasticsearch instance is started

● A random Marvel character name is allocated by default


Page 26: Intro to elasticsearch

Index

● An index is a collection of documents that have somewhat similar characteristics, e.g. customer data, product catalog

● The index name is used to refer to the index when performing indexing, search, update, and delete operations against the documents in it

● You can define as many indexes as you want in a single cluster


Page 27: Intro to elasticsearch

Document

● It is the most basic unit of information which can be indexed

● It is expressed in JSON (key: value pairs), e.g. '{"user": "nullcon"}'

● Every Document gets associated with a type and a unique id.


Page 28: Intro to elasticsearch

Shard

● Every index can be split into multiple shards to be able to distribute data.

● The shard is the atomic part of an index, which can be distributed over the cluster if you add more nodes.
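One common way to spread documents over shards is to hash a routing key and take it modulo the number of shards. The sketch below illustrates that idea only: `pick_shard`, the shard count, and the use of `zlib.crc32` as the hash are all assumptions made for this example, since the slides do not describe the routing function Elasticsearch itself uses.

```python
import zlib

def pick_shard(doc_id, num_shards):
    """Map a document ID to a shard number in [0, num_shards)."""
    # crc32 is a stand-in hash for this sketch, an assumption rather than
    # the hash function Elasticsearch uses internally.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

NUM_SHARDS = 5  # hypothetical shard count
shards = {n: [] for n in range(NUM_SHARDS)}
for doc_id in ["1", "2", "3", "4", "5", "6", "7", "8"]:
    shards[pick_shard(doc_id, NUM_SHARDS)].append(doc_id)
print(shards)
```

Because the shard is derived from the document ID alone, the same ID always lands on the same shard, so a later read or delete can find the document without searching every shard.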


Page 29: Intro to elasticsearch
Page 30: Intro to elasticsearch
Page 31: Intro to elasticsearch

A terminology comparison

Relational database     Elasticsearch
Database                Index
Table                   Type
Row                     Document
Column                  Field
Schema                  Mapping
Index                   Everything is indexed
SQL                     Query DSL
SELECT * FROM tb …      GET http://…
UPDATE tb SET …         PUT http://…

Page 32: Intro to elasticsearch

Playing with Elasticsearch

REST API: http://host:port/[index]/[type]/[_action/id]

HTTP Methods: GET, POST, PUT, DELETE
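The URL pattern can be illustrated with a small helper that joins whichever path parts are present. `build_url` and its parameter names are hypothetical, introduced only for this sketch.

```python
def build_url(host, port, index=None, doc_type=None, action_or_id=None):
    """Join the non-empty path parts onto http://host:port/."""
    parts = [p for p in (index, doc_type, action_or_id) if p]
    return "http://{}:{}/{}".format(host, port, "/".join(parts))

# Search within one type, search a whole index, and address a single document.
print(build_url("localhost", 9200, "my_index", "test", "_search"))
print(build_url("localhost", 9200, "my_index", action_or_id="_search"))
print(build_url("localhost", 9200, "my_index", "test", "1"))
```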

Page 33: Intro to elasticsearch

Playing with Elasticsearch

• Search
  – curl -XGET http://localhost:9200/my_index/test/_search
  – curl -XGET http://localhost:9200/my_index/_search
  – curl -XGET http://localhost:9200/_search

• Meta Data
  – curl -XGET http://localhost:9200/my_index/_status

• Documents:
  – curl -XPUT http://localhost:9200/my_index/test/1
  – curl -XGET http://localhost:9200/my_index/test/1
  – curl -XDELETE http://localhost:9200/my_index/test/1

Page 34: Intro to elasticsearch

Example: Index

curl -XPUT http://localhost:9200/my_index/test/1 -d '{ "name": "joeywen", "value": 100 }'
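For readers who prefer a script over curl, the same indexing call can be sketched with Python's standard library. The `build_index_request` helper is hypothetical, and the Content-Type header is an assumption that the curl command above does not set. The request is only built here; sending it requires a node listening on localhost:9200.

```python
import json
import urllib.request

def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that indexes one document."""
    url = "{}/{}/{}/{}".format(base_url, index, doc_type, doc_id)
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT",
        # Assumption: declare the body as JSON; the slide's curl call omits this.
        headers={"Content-Type": "application/json"},
    )

req = build_index_request("http://localhost:9200", "my_index", "test", 1,
                          {"name": "joeywen", "value": 100})
print(req.get_method(), req.full_url)
# To send it against a running node: urllib.request.urlopen(req)
```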

Page 35: Intro to elasticsearch

Example: Search

curl -XGET http://localhost:9200/my_index/_search -d '{ "query": { "match_all": {} } }'

[Figure: search response annotated with search time, total number of docs, max score, and relevance]
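The four annotated parts can be read out of a search response as sketched below. The sample body is hypothetical: its values are made up, and it assumes the response layout in which `hits.total` is a plain number, which may differ in other Elasticsearch versions.

```python
import json

# Hypothetical response body; every value here is made up for illustration.
SAMPLE_RESPONSE = """
{
  "took": 3,
  "timed_out": false,
  "hits": {
    "total": 1,
    "max_score": 1.0,
    "hits": [
      {"_index": "my_index", "_type": "test", "_id": "1", "_score": 1.0,
       "_source": {"name": "joeywen", "value": 100}}
    ]
  }
}
"""

def summarize(response_text):
    """Pull out the four annotated parts of a search response."""
    response = json.loads(response_text)
    hits = response["hits"]
    return {
        "search_time_ms": response["took"],   # search time, in milliseconds
        "total_docs": hits["total"],          # total number of matching docs
        "max_score": hits["max_score"],       # highest relevance score
        "relevance": [(h["_id"], h["_score"]) for h in hits["hits"]],
    }

summary = summarize(SAMPLE_RESPONSE)
print(summary)
```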

Page 36: Intro to elasticsearch

Creating, indexing, or deleting a single document

Page 37: Intro to elasticsearch

Plugins-Kopf

Page 38: Intro to elasticsearch

Plugins-head

Page 39: Intro to elasticsearch

Web

Page 40: Intro to elasticsearch

Q&A

