technische universität münchen massively parallel sort-merge joins (mpsm) in main memory...

29
Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and Thomas Neumann Technische Universität München

Upload: wyatt-mcwilliams

Post on 27-Mar-2015

220 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems

Martina Albutiu, Alfons Kemper, and Thomas Neumann

Technische Universität München

Page 2: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

Hardware trends …

• Huge main memory• Massive processing

parallelism• Non-uniform Memory

Access (NUMA)• Our server:

– 4 CPUs– 32 cores– 1 TB RAM– 4 NUMA partitions

2

CPU 0

Page 3: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

Main memory database systems

• VoltDB, Hana, MonetDB• HyPer: real-time business intelligence queries on

transactional data*

3* http://www-db.in.tum.de/research/projects/HyPer/

Page 4: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

How to exploit these hardware trends?

• Parallelize algorithms• Exploit fast main memory access

Kim, Sedlar, Chhugani: Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs. VLDB‘09

Blanas, Li, Patel: Design and Evaluation of Main Memory Hash Join Algorithms for Multi-core CPUs. SIGMOD‘11

• AND be aware of fast local vs. slow remote NUMA access

4

Page 5: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

NUMA partition 1 NUMA partition 3

NUMA partition 2 NUMA partition 4

hashable

Ignoring NUMA

5

core 1

core 2

core 3

core 4

core 5

core 6

core 7

core 8

Page 6: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

How much difference does NUMA make?

100%

scaled execution

time

sort partitioningmerge join

(sequential read)

2275

6 m

s

4173

44 m

s

1000

ms

7440

ms

837

ms

1294

6 m

s

6

remotelocal

synchronized

sequentialremote

local

Page 7: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

The three NUMA commandments

C1

Thou shalt not write thy neighbor‘s memory randomly -- chunk the data,

redistribute, and then sort/work

on your data locally.

7

C2

Thou shalt read thy neighbor‘s memory only sequentially

-- let the prefetcher hide

the remote access latency.

C3

Thou shalt not wait for thy neighbors -- don‘t use fine-

grained latching or locking and avoid synchronization points of parallel

threads.

Page 8: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

Basic idea of MPSM

R

S

chunk R

chunk S

R chunks

S chunks

8

Page 9: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

Basic idea of MPSM

• C1: Work locally: sort• C3: Work independently: sort and merge join• C2: Access neighbor‘s data only sequentially

MJ

chunk R

chunk S

sort R chunks locally

sort S chunks locally

R chunks

S chunks

merge join chunksMJ MJ MJ

9

Page 10: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

Range partitioning of private input R

• To constrain merge join work• To provide scalability in the number of parallel workers

10

Page 11: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

• To constrain merge join work• To provide scalability in the number of parallel workers

Range partitioning of private input R

R chunks range partition R

range partitioned R chunks

11

Page 12: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

• To constrain merge join work• To provide scalability in the number of parallel workers

S is implicitly partitioned

Range partitioning of private input R

range partitioned R chunks

sort R chunks

sort S chunksS chunks

12

Page 13: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

• To constrain merge join work• To provide scalability in the number of parallel workers

S is implicitly partitioned

Range partitioning of private input R

range partitioned R chunks

sort R chunks

sort S chunksS chunks

MJ MJMJMJ merge join only relevant parts

13

Page 14: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

Range partitioning of private input

• Time efficientbranch-freecomparison-freesynchronization-free

and

• Space efficient densely packed in-place

by using radix-clustering and precomputed target partitions to scatter data to

14

Page 15: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

Range partitioning of private input

919

7

19

3211

17

2234

318

2026

21

17

23

31

2026

chun

k of

wor

ker

W1

chun

k of

wor

ker

W2

histogram of worker W1

7 = 00111 <16≥16

17 = 10001

histogram of worker W2

<16≥16

43

34

prefix sum of

worker W100

prefix sum of

worker W243

W1

W2

W1

W2

1

19

19=10011

15

5

2=00010

2

Page 16: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

Range partitioning of private input

919

73

211

17

2234

318

2026

chun

k of

wor

ker

W1

chun

k of

wor

ker

W2

histogram of worker W1

7 = 00111 <16

17 = 10001

histogram of worker W2

<16

43

34

prefix sum of

worker W100

prefix sum of

worker W243

73

9

1

4

17

31

2

8

21

23

2026

W1

W2

W1

W2

1

19

19=10011

16

≥16

≥165

2=00010

Page 17: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

Real C hacker at work …

Page 18: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

Skew resilience of MPSM

• Location skew is implicitly handled• Distribution skew:

– Dynamically computed partition bounds– Determined based on the global data distributions of R and S– Cost balancing for sorting R and joining R and S

18

Page 19: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

Skew resilience

1. Global S data distribution– Local equi-height histograms (for free) – Combined to CDF

71

1015223166

2121725334278

81 90S1 S2

# tuples

key value

CDF

19

50

16

13

Page 20: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

Skew resilience

2. Global R data distribution– Local equi-width histograms as before– More fine-grained histograms

134

2

31

208

6

histogram

<8[8,16)[16,24)≥24

13

820

31

3211

8 = 01000

2 = 00010

20

R1

Page 21: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

Skew resilience

3. Compute splitters so that overall workloads are balanced*: greedily combine buckets, thereby balancing the costs of each thread for sorting R and joining R and S are balanced

# tuples

key value

CDF

histogram

3211

+ =42

6

13

820

31

21* Ross and Cieslewicz: Optimal Splitters for Database Partitioning with Size Bounds. ICDT‘09

Page 22: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

Performance evaluation

• MPSM performance in a nutshell:– 160 mio tuples joined per second– 27 bio tuples joined in less than 3 minutes– scales linearly with the number of cores

• Platform HyPer1:– Linux server – 1 TB RAM– 4 CPUs with 8 physical cores each

• Benchmark:– Join tables R and S with schema

{[joinkey: 64bit, payload: 64bit]}– Dataset sizes ranging from 50GB to 400GB

22

Page 23: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

Execution time comparison

• MPSM, Vectorwise (VW), and Blanashash join*

• 32 workers• |R| = 1600 mio

(25 GB), varying size of S

* S. Blanas, Y. Li, and J. M. Patel: Design and Evaluation of Main Memory Hash Join Algorithms for Multi-core CPUs. SIGMOD 2011

23

Page 24: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

Scalability in the number of cores

• MPSM andVectorwise (VW)

• |R| = 1600 mio (25 GB),|S|=4*|R|

24

Page 25: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

25

Location skew

• Location skew in Rhas no effect because of repartitioning

• Location skew in S:in the extreme caseall join partners ofRi are found in onlyone Sj (either localor remote)

Page 26: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

Distribution skew: anti-correlated data

without balanced partitioning with balanced partitioning

26

Page 27: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

Distribution skew : anti-correlated data

27

Page 28: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

Conclusions

• MPSM is a sort-based parallel join algorithm• MPSM is NUMA-aware & NUMA-oblivious• MPSM is space efficient (works in-place)• MPSM scales linearly in the number of cores • MPSM is skew resilient• MPSM outperforms Vectorwise (4X) and Blanas et al‘s

hash join (18X)• MPSM is adaptable for disk-based processing

– See details in paper

28

Page 29: Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and

Technische Universität München

Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems

Martina Albutiu, Alfons Kemper, and Thomas Neumann

Technische Universität München

THANK YOU FOR YOUR ATTENTION!