ElasticSearch with Tire

Post on 10-May-2015

DESCRIPTION

An introduction to how a search engine works, using ElasticSearch and Tire.

TRANSCRIPT

  • 1. ElasticSearch with Tire. @AbookYun, Polydice Inc. Wednesday, February 6, 13

2. It's all about Search
    - How does search work?
    - ElasticSearch
    - Tire

3. How does search work? A collection of articles

    Article.find(1).to_json
    { title: "One", content: "The ruby is a pink to blood-red colored gemstone." }

    Article.find(2).to_json
    { title: "Two", content: "Ruby is a dynamic, reflective, general-purpose object-oriented programming language." }

    Article.find(3).to_json
    { title: "Three", content: "Ruby is a song by English rock band." }

4. How does search work? How do you search?

    Article.where("content like ?", "%ruby%")

5. How does search work? The inverted index

    T0 = "it is what it is"
    T1 = "what is it"
    T2 = "it is a banana"

    a:      {2}
    banana: {2}
    is:     {0, 1, 2}
    it:     {0, 1, 2}
    what:   {0, 1}

    A term search for the terms "what", "is" and "it":
    {0, 1} ∩ {0, 1, 2} ∩ {0, 1, 2} = {0, 1}

6. How does search work? The inverted index

    TOKEN         ARTICLES
    ruby          article_1, article_2, article_3
    pink          article_1
    gemstone      article_1
    dynamic       article_2
    reflective    article_2
    programming   article_2
    song          article_3
    english       article_3
    rock          article_3

7. How does search work? The inverted index

    Article.search("ruby")
    Looks up the "ruby" row of the table above: article_1, article_2, article_3

8. How does search work? The inverted index

    Article.search("song")
    Looks up the "song" row of the table above: article_3

9. How does search work?

    module SimpleSearch
      def index document, content
        tokens = analyze content
        store document, tokens
        puts "Indexed document #{document} with tokens:", tokens.inspect, "\n"
      end

      def analyze content
        # Split content by words into "tokens"
        content.split(/\W/).
        # Downcase every word
        map { |word| word.downcase }.
        # Reject stop words, digits and whitespace
        reject { |word| STOPWORDS.include?(word) || word =~ /^\d+/ || word == '' }
      end

      def store document_id, tokens
        tokens.each do |token|
          ((INDEX[token] ||= []) ...

    ... [article1, article2, article3],
    language => [article1, article4],
    java     => [article1, article4],
    also     => [article1],
    stone    => [article3],
    song     => [article2]
    }

13. How does search work? Search the index

    SimpleSearch.search "ruby"

    Results for token 'ruby':
    * article1
    * article2
    * article3

14. How does search work? Search is ...

    Inverted Index
        { ruby: [1,2,3], language: [1,4] }
    +
    Relevance Scoring
        - How many matching terms does this document contain?
        - How frequently does each term appear in all your documents?
        - ... other complicated algorithms.

15. ElasticSearch

    ElasticSearch is an Open Source (Apache 2), Distributed, RESTful, Search Engine built on top of Apache Lucene.
    http://github.com/elasticsearch/elasticsearch

16. ElasticSearch: Terminology

    Relational DB   ElasticSearch
    Database        Index
    Table           Type
    Row             Document
    Column          Field
    Schema          Mapping
    Index           *Everything
    SQL             Query DSL

17. ElasticSearch: RESTful

    # Add document
    curl -XPUT http://localhost:9200/articles/article/1 -d '{ "title": "One" }'

    # Delete document
    curl -XDELETE http://localhost:9200/articles/article/1

    # Search
    curl -XGET 'http://localhost:9200/articles/_search?q=One'

18. ElasticSearch: JSON in / JSON out

    # Query
    curl -XGET http://localhost:9200/articles/article/_search -d '{
      "query": { "term": { "title": "One" } }
    }'

    # Results
    {
      "_shards": { "total": 5, "success": 5, "failed": 0 },
      "hits": {
        "total": 1,
        "hits": [{
          "_index": "articles",
          "_type": "article",
          "_id": "1",
          "_source": { "title": "One", "content": "Ruby is a pink to blood-red colored gemstone." }
        }]
      }
    }

19. ElasticSearch: Distributed

    The discovery module is responsible for discovering nodes within a cluster, as well as electing a master node.
    The responsibility of the master node is to maintain the global cluster state, and to act when nodes join or leave the cluster by reassigning shards.

    [Diagram: Automatic Discovery Protocol. Node 1, Node 2, Node 3 and Node 4, with one node marked as Master.]

20. ElasticSearch: Distributed

    By default, every index is split into 5 shards, and each shard is duplicated in 1 replica.

    [Diagram: Index A. Shards: A1, A2, A3, A4, A5. Replicas: A1, A2, A3, A4, A5.]

21. ElasticSearch: Query DSL

    Queries           Filters
    - query_string    - term
    - term            - query
    - wildcard        - range
    - boosting        - bool
    - bool            - and
    - filtered        - or
    - fuzzy           - not
    - range           - limit
    - geo_shape       - match_all
    - ...             - ...

22. ElasticSearch: Query DSL

    Queries: with relevance, without cache.
    Filters: with cache, without relevance.

23. ElasticSearch: Facets

    curl -X DELETE "http://localhost:9200/articles"
    curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "One", "tags" : ["foo"]}'
    curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "Two", "tags" : ["foo", "bar"]}'
    curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "Three", "tags" : ["foo", "bar", "baz"]}'

    curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '{
      "query" : { "query_string" : {"query" : "T*"} },
      "facets" : {
        "tags" : { "terms" : {"field" : "tags"} }
      }
    }'

24. ElasticSearch: Facets

    "facets" : {
      "tags" : {
        "_type" : "terms",
        "missing" : 0,
        "total": 5,
        "other": 0,
        "terms" : [
          { "term" : "foo", "count" : 2 },
          { "term" : "bar", "count" : 2 },
          { "term" : "baz", "count" : 1 }
        ]
      }
    }
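To make the facet counts in the response above concrete, here is a minimal plain-Ruby sketch of what a terms facet computes. It does not talk to ElasticSearch at all: the `ARTICLES` array, the `terms_facet` helper, and the prefix check standing in for the `T*` query string are assumptions made for illustration only.

```ruby
# Hypothetical in-memory stand-in for the three documents indexed on the Facets slide.
ARTICLES = [
  { "title" => "One",   "tags" => ["foo"] },
  { "title" => "Two",   "tags" => ["foo", "bar"] },
  { "title" => "Three", "tags" => ["foo", "bar", "baz"] }
].freeze

# Count how often each value of `field` occurs across the given documents.
# Illustrative helper; it is not part of ElasticSearch or Tire.
def terms_facet(documents, field)
  counts = Hash.new(0)
  documents.each do |doc|
    Array(doc[field]).each { |value| counts[value] += 1 }
  end

  # Most frequent term first; ties keep the order in which terms were first seen.
  terms = counts.sort_by.with_index { |(_term, count), i| [-count, i] }
                .map { |term, count| { "term" => term, "count" => count } }

  { "total" => counts.values.sum, "terms" => terms }
end

# Crude stand-in for the query string "T*": keep titles that start with "T".
matching = ARTICLES.select { |doc| doc["title"].start_with?("T") }
facet = terms_facet(matching, "tags")

facet["terms"].each { |t| puts "#{t['term']}: #{t['count']}" }
```

Only "Two" and "Three" match the query, so "foo" and "bar" are each counted twice and "baz" once, for five tag occurrences in total, which is the shape of the facet response on the slide.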
25. ElasticSearch: Mapping

    curl -XPUT http://localhost:9200/articles/article/_mapping -d '{
      "article": {
        "properties": {
          "tags":    { "type": "string", "analyzer": "keyword" },
          "title":   { "type": "string", "analyzer": "snowball", "boost": 10.0 },
          "content": { "type": "string", "analyzer": "snowball" }
        }
      }
    }'

    curl -XGET http://localhost:9200/articles/article/_mapping

26. ElasticSearch: Analyzer

    curl -XPUT http://localhost:9200/articles/article/_mapping -d '{
      "article": {
        "properties": {
          "title": { "type": "string", "analyzer": "trigrams" }
        }
      }
    }'

    curl -XPUT localhost:9200/articles/article -d '{ "title": "cupertino" }'

    [Diagram: "Cupertino" split into overlapping trigrams: Cup, upe, per, ...]

27. Tire

    A rich Ruby API and DSL for the ElasticSearch search engine.
    http://github.com/karmi/tire/

28. Tire: ActiveRecord Integration

    # New rails application
    $ rails new searchapp -m https://raw.github.com/karmi/tire/master/examples/rails-application-template.rb

    # Callback
    class Article < ActiveRecord::Base
      include Tire::Model::Search
      include Tire::Model::Callbacks
    end

    # Create an article
    Article.create :title        => "I Love Elasticsearch",
                   :content      => "...",
                   :author       => "Captain Nemo",
                   :published_on => Time.now

    # Search
    Article.search do
      query             { string 'love' }
      facet('timeline') { date :published_on, :interval => 'month' }
      sort              { by :published_on, 'desc' }
    end

29. Tire: ActiveRecord Integration

    class Article < ActiveRecord::Base
      include Tire::Model::Search
      include Tire::Model::Callbacks

      # Setting
      settings :number_of_shards   => 3,
               :number_of_replicas => 2,
               :analysis => {
                 :analyzer => {
                   :url_analyzer => {
                     'tokenizer' => 'lowercase',
                     'filter'    => ['stop', 'url_ngram']
                   }
                 }
               }

      # Mapping
      mapping do
        indexes :title,   :analyzer => :not_analyzer, :boost => 100
        indexes :content, :analyzer => 'snowball'
      end
    end
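The trigram splitting drawn on the Analyzer slide can be reproduced in a few lines of plain Ruby. This is only a sketch of the idea: the `trigrams` helper is a hypothetical name introduced here, and in ElasticSearch itself n-gram analysis is configured in the index analysis settings rather than written by hand.

```ruby
# Split a word into overlapping character n-grams (trigrams when size is 3).
# Hypothetical helper for illustration; not part of ElasticSearch or Tire.
def trigrams(word, size = 3)
  word.chars.each_cons(size).map(&:join)
end

indexed = trigrams("cupertino")
puts indexed.inspect

# Trigrams that a partial search term shares with the indexed title.
shared = trigrams("pert") & indexed
puts shared.inspect
```

A partial term such as "pert" shares the trigrams "per" and "ert" with the indexed title, which is how a trigram analyzer lets a substring match a document that a whole-word analyzer would miss.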
30. Reference

    # github
    http://github.com/elasticsearch/elasticsearch
    http://github.com/karmi/tire/

    # Slides
    https://speakerdeck.com/kimchy/the-road-to-a-distributed-search-engine
    https://speakerdeck.com/karmi/elasticsearch-your-data-your-search-euruko-2011
    https://speakerdeck.com/clintongormley/to-infinity-and-beyond

31. Thanks