Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

Upload: lucidworks

Post on 14-Jul-2015

TRANSCRIPT

Page 1: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics
Page 2: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

Approaching Join Index

yet another join algorithm

Mikhail Khludnev

principal engineer

Page 3: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

• Grid Dynamics is a Silicon Valley-based leading provider of scalable, next-generation commerce technology solutions

• Record of outperformance with Tier 1 retail clients

• Fortune 1000 client relationships

Page 4: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

About Me

● principal engineer at Grid Dynamics

● spoke at the last few Lucene Revolution conferences

● contributed the BlockJoin query parser for Solr - {!parent}

● blogged about it at http://blog.griddynamics.com/

● tried to fix threading in DataImportHandler

http://google.com/+MikhailKhludnev

Page 5: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

You are expected to know

● how Lucene searches and filters

● how it counts facets

● that there are segments

● what DocValues are

● why to join

● RDBMS joins: nested loop join, sort-merge join and hash join

Page 6: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

I’m expected to know

● query-time join

● index-time join

● yet another join

Page 7: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

Lucene/Solr Is Strong in

● searching

○ filtering

● analytics

○ facets

○ pivots

○ stats

Page 8: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

Lucene/Solr Is Weak in

● robust joins

○ multiple entities

○ relations

Page 9: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

Joins in General

PK=FK

Page 14: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

Joins in General

parents 1:M children

Page 15: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

Executing Join Query

q

Page 18: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

Executing Join Query

q, fq

Page 20: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

Join in General

parents ∩ join-relation ∩ children

● input

● output

● enumeration order
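
The "parents ∩ join-relation ∩ children" view can be illustrated with a minimal Python sketch. The document numbers, the relation and both function names are hypothetical toy data, not taken from the slides. It shows that the same output can be produced by enumerating either side of the relation, which is why the enumeration order becomes a design choice.

```python
from collections import defaultdict

# Hypothetical toy data: (parent doc#, child doc#) pairs of the join relation.
join_relation = [(0, 1), (0, 2), (3, 4), (3, 5), (6, 7), (9, 8)]
matching_parents = {0, 6, 9}    # parents accepted by the parent-side criteria
matching_children = {1, 4, 8}   # children accepted by the child-side criteria

# Two lookup structures over the same relation, one per enumeration order.
parent_of = {child: parent for parent, child in join_relation}
children_of = defaultdict(list)
for parent, child in join_relation:
    children_of[parent].append(child)


def join_from_children(parents, children):
    """Enumerate matching children and map each one to its parent."""
    return sorted({parent_of[c] for c in children
                   if c in parent_of and parent_of[c] in parents})


def join_from_parents(parents, children):
    """Enumerate matching parents and probe their children."""
    return sorted(p for p in parents
                  if any(c in children for c in children_of.get(p, [])))


joined = join_from_children(matching_parents, matching_children)
# Both enumeration orders yield the same parents.
assert joined == join_from_parents(matching_parents, matching_children)
print(joined)  # [0, 9]
```

The cost differs even though the result does not: the first variant does work proportional to the number of matching children, the second to the number of matching parents and their fan-out.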

Page 22: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

JoinUtil

[Diagram, built up over four slides: q selects documents; their values are read from the per-document column FK[doc#] (sample values “17”, “25”, “56”, “4”, “61”); the collected values (“25”, “17”, ...) are looked up in the FK term dictionary (“1” → △ … “17” → △, “25” → △ ...); fq is applied to the resulting documents.]
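
The query-time flow pictured on the JoinUtil slides can be approximated in a few lines of Python. This is a sketch under assumptions, not Lucene's implementation: the dictionaries `fk_by_doc` and `key_postings`, the document numbers and the function name are hypothetical, and which field plays the "from" side and which the "to" side is assumed rather than read off the slides.

```python
# Hypothetical per-document column FK[doc#]: doc# -> join key of that document.
fk_by_doc = {0: "25", 1: "17", 2: "17", 3: "25", 4: "56", 5: "4", 6: "61"}

# Hypothetical term dictionary on the other side: join key -> postings (doc#s).
key_postings = {"1": [10], "4": [11], "17": [12], "25": [13], "56": [14], "61": [15]}


def query_time_join(q_hits, fq_hits):
    # Phase 1: collect the distinct join keys of the documents matched by q.
    collected_terms = {fk_by_doc[doc] for doc in q_hits if doc in fk_by_doc}

    # Phase 2: enumerate the term dictionary and gather the postings of every
    # collected term. This term enumeration is the expensive part.
    joined = set()
    for term in sorted(key_postings):
        if term in collected_terms:
            joined.update(key_postings[term])

    # Phase 3: intersect the joined documents with the filter fq.
    return sorted(joined & fq_hits)


result = query_time_join(q_hits={0, 1, 3}, fq_hits={12, 13, 14})
print(result)  # [12, 13]
```

Note that the join is always driven from the q side: the terms must be collected before anything on the other side can be matched.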

Page 26: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

Block Join

[Diagram, built up over six slides: documents laid out in doc# order with a parent bitmask (1 0 0 1 0 0 1 0 0 1 0); q and fq are evaluated against this layout.]
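
As a rough illustration of why block join needs nothing but a parent bitmask at search time, here is a minimal Python sketch. It assumes the layout used by Lucene's block join, where the children of a block are indexed immediately before their parent; the bitmask below, the hit sets and the function names are hypothetical and differ from the column shown on the slides.

```python
# Hypothetical parent bitmask in doc# order: 1 marks a parent, 0 marks a child.
# Assumed layout: each parent is the last document of its block.
parent_bits = [0, 0, 1, 0, 0, 1, 0, 0, 1]


def next_set_bit(bits, start):
    """Return the first position >= start whose bit is set, or -1 if none."""
    for doc in range(start, len(bits)):
        if bits[doc]:
            return doc
    return -1


def to_parent_block_join(child_hits, parent_filter):
    """Map matching children to their parents, keeping parents accepted by the filter."""
    parents = []
    for child in sorted(child_hits):
        if parent_bits[child]:
            continue  # ignore parent documents that leaked into the child hits
        parent = next_set_bit(parent_bits, child)
        # Children arrive in doc# order, so duplicates are always adjacent.
        if parent != -1 and parent in parent_filter and (not parents or parents[-1] != parent):
            parents.append(parent)
    return parents


result = to_parent_block_join(child_hits={0, 1, 6}, parent_filter={2, 5})
print(result)  # [2]
```

No keys are compared and no term dictionary is touched: the relation is encoded purely in document order, which is also why a block has to be reindexed as a whole.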

Page 32: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

Comparison

searching is fast: JoinUtil < ? < BlockJoin

reindexing is fast: JoinUtil > ? > BlockJoin

[The final builds of this slide add the label doc#.]

Page 38: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

Join Index

[Diagram, built up over six slides: q and fq around a column doc#[doc#], indexed by document number and holding document numbers (sample values 3, 6, 0, 3, 10, 0 on the first builds; 2, 4, 1, 6, 5, 10, 9, 8 on the later ones).]
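
One way to read the doc#[doc#] notation is as a per-document column that stores the document number of the join counterpart, so the join becomes an array lookup instead of a term lookup. The Python sketch below follows that reading; the column contents, the `NO_PARENT` marker, the hit sets and the function name are hypothetical, and how such a column would be stored and kept current is not modelled.

```python
NO_PARENT = -1

# Hypothetical join index doc#[doc#]: position = child doc#, value = parent doc#.
# Documents 0, 3 and 6 are parents themselves, hence NO_PARENT.
parent_doc = [NO_PARENT, 0, 0, NO_PARENT, 3, 3, NO_PARENT, 6]


def join_index_lookup(q_hits, fq_hits):
    """Resolve each document matched by q to its counterpart and keep those accepted by fq."""
    joined = set()
    for child in q_hits:
        parent = parent_doc[child]
        if parent != NO_PARENT and parent in fq_hits:
            joined.add(parent)
    return sorted(joined)


result = join_index_lookup(q_hits={1, 2, 5, 7}, fq_hits={0, 6})
print(result)  # [0, 6]
```

Compared with the query-time join there is no term collection and no term enumeration, and compared with block join the documents do not have to be adjacent. The price is that the column holds document numbers, which have to stay valid as the index changes.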

Page 44: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

Benchmarking 2.9 M docs

https://github.com/m-khl/lucene-solr/tree/dvjoin-benchmark

[Bar chart: latency, ms (the bigger the worse). Bar labels: BlockJoin (index-time), Join Index, JoinUtil (query-time). Bar values: 10, 27, 14.]

Page 45: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

Indexing is still a problem

[Diagram: the doc#[doc#] column from the Join Index slides.]

Page 46: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

Further Plans

● incremental join-index update

● perhaps just calculate and cache it

● join in both directions

● compute an optimal execution plan for enumerating segments

Page 47: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

Summary

● Joins in General

● JoinUtil vs Block-join

● {!scorejoin } - SOLR-6234

● updatable DocValues

● opportunities for improving query-time joins:

○ eliminate term enum

○ choose lower cardinality side for enumeration

Page 48: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

References

● Searching relational content with Lucene's BlockJoinQuery, http://blog.mikemccandless.com/

● Solr Experience: search parent-child relations. Part I; Solr block-join support, http://blog.griddynamics.com/

● https://wiki.apache.org/solr/Join

● http://www.slideshare.net/martijnvg/document-relations

● https://issues.apache.org/jira/browse/SOLR-6234 {!scorejoin }

● Updatable DocValues Under the Hood, http://shaierera.blogspot.com/

● Subject: How to openIfChanged the most recent merge? at: [email protected]

Page 49: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

Thanks for Joining us!

Page 50: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

Off scope

Page 51: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

Joins’ Zoo in Lucene

Workarounds

● term positions/SpanQueries

● FieldCollapsing/Grouping

● term decoration

○ spatial

● multivalue fields

True Joins

● query-time join

○ JoinUtil

○ {!join }

○ {!scorejoin } - SOLR-6234

● index-time join aka block-join {!parent}

Page 54: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

Two phase update problem

Subject: How to openIfChanged the most recent merge? at: [email protected]

Page 55: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics

JoinUtil

● query-time

● indexing is fast

● searching is slow, why?

○ expensive term enum

○ single enumeration order

BlockJoin

● index-time

● reindexing the whole block is expensive, yet mandatory

● searching is darn fast, however

○ can’t reorder child docs
