Let’s Talk About Storage & Recovery Methods for Non-Volatile Memory Database Systems Joy Arulraj, Andrew Pavlo, Subramanya R. Dulloor



Non-Volatile Memory (NVM)

                     DRAM    SSD      DISK     NVM
Read latency         1x      500x     10^5x    2-4x
Write latency        1x      5000x    10^5x    2-8x
Persistence          no      yes      yes      yes
Byte-level access    yes     no       no       yes
Write endurance      yes     no       yes      no

Executive Summary

Design a DBMS storage engine for NVM
•  Re-examine traditional assumptions
•  Storage and recovery optimizations

NVM Hardware Emulator
•  Configure DRAM load and store latency
•  Throttle memory bandwidth
•  Two interfaces
    – Filesystem interface (fread/fwrite)
    – Allocator interface (malloc/free)

DBMS Platform
•  Lightweight DBMS
    – NVM-only design
    – No volatile DRAM
    – Runs on the hardware emulator
•  Pluggable backend storage architecture
•  Timestamp-based concurrency control

3 Storage Engines

ENGINE TYPE               TABLE STORAGE   LOGGING   EXAMPLE
In-Place Updates          yes             yes       VoltDB
Copy-on-Write Updates     yes             no        LMDB
Log-Structured Updates    no              yes       LevelDB

#1: In-Place Updates

[Diagram: in-place updates engine. Labels: table, tuple, updated tuple, write-ahead log, tuple delta, snapshots, index. Steps are numbered 1 to 3.]
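A minimal Python sketch of the flow the diagram suggests. It assumes a transaction appends its tuple delta to the write-ahead log before the table is changed in place, and that a snapshot is a full copy of the table taken while no transaction is in flight. The class and method names are hypothetical and are not taken from the talk or from VoltDB.

```python
import copy


class InPlaceEngine:
    """Toy in-place updates engine: log the delta first, then overwrite the tuple."""

    def __init__(self):
        self.table = {}      # key -> tuple, stored as a dict of field -> value
        self.wal = []        # write-ahead log records, in append order
        self.snapshot = {}   # last full copy of the table

    def update(self, txn_id, key, changes):
        # Log the tuple delta before touching the table.
        self.wal.append(("UPDATE", txn_id, key, dict(changes)))
        # Then apply the change in place.
        self.table.setdefault(key, {}).update(changes)

    def commit(self, txn_id):
        self.wal.append(("COMMIT", txn_id))

    def take_snapshot(self):
        # Assumption: called only when no transaction is in flight.
        self.snapshot = copy.deepcopy(self.table)
        self.wal.clear()

    def recover(self):
        # Restore the snapshot, then redo the deltas of committed transactions.
        committed = {rec[1] for rec in self.wal if rec[0] == "COMMIT"}
        table = copy.deepcopy(self.snapshot)
        for rec in self.wal:
            if rec[0] == "UPDATE" and rec[1] in committed:
                _, _, key, changes = rec
                table.setdefault(key, {}).update(changes)
        self.table = table
```

In this sketch every update is written twice (once to the log, once to the table) and a third copy lands in the snapshot, and recovery time grows with the length of the log.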

Optimizing for NVM
•  Non-volatile pointer
    – Non-volatile data structures
    – Valid even after system restarts
•  Exclusively use allocator interface
    – Byte-addressable NVM
    – Not filesystem interface

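#1: NVM In-Place Updates

[Diagram: NVM-optimized in-place updates engine. Labels: write-ahead log with a tuple delta ("only pointers, not actual data"), table with tuple and updated tuple, non-volatile index, and snapshots marked with a cross. Steps are numbered 1 and 2.]

Benefits
•  Reduce data duplication
•  Almost instantaneous recovery
•  No redo log, only an undo log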
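One way to picture "only pointers" and "only an undo log" is the Python sketch below. It is an illustration under stated assumptions, not the authors' log layout: NVM is simulated by a dictionary that plays the role of the allocator interface, a non-volatile pointer is an integer address into it, and each undo record holds the address of the previous tuple version instead of a copy of the data. All names are hypothetical.

```python
class NVMHeap:
    """Stand-in for an NVM allocator; its contents survive a simulated restart."""

    def __init__(self):
        self._cells = {}
        self._next = 0

    def alloc(self, value):
        addr = self._next
        self._next += 1
        self._cells[addr] = value
        return addr

    def load(self, addr):
        return self._cells[addr]

    def free(self, addr):
        del self._cells[addr]


class NVMInPlaceEngine:
    def __init__(self, heap):
        self.heap = heap
        self.slots = {}      # key -> address of the current tuple version
        self.undo_log = []   # (txn_id, key, old_addr): pointers only, no tuple data

    def update(self, txn_id, key, tup):
        new_addr = self.heap.alloc(dict(tup))
        self.undo_log.append((txn_id, key, self.slots.get(key)))
        self.slots[key] = new_addr

    def commit(self, txn_id):
        # The new versions are already in NVM, so nothing has to be redone later.
        for t, _, old_addr in self.undo_log:
            if t == txn_id and old_addr is not None:
                self.heap.free(old_addr)
        self.undo_log = [r for r in self.undo_log if r[0] != txn_id]

    def recover(self):
        # Only uncommitted transactions still have undo records: roll them back.
        for _, key, old_addr in reversed(self.undo_log):
            self.heap.free(self.slots[key])
            if old_addr is None:
                del self.slots[key]
            else:
                self.slots[key] = old_addr
        self.undo_log.clear()

    def read(self, key):
        addr = self.slots.get(key)
        return None if addr is None else self.heap.load(addr)
```

Recovery work here depends on the number of uncommitted updates at the time of the crash, not on how many transactions have ever run, which is the intuition behind "almost instantaneous recovery".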

#2: Copy-on-Write Updates

[Diagram: copy-on-write B+tree. Labels: master record, current directory, dirty directory, leaf 1, leaf 2, updated leaf 1, tuple, updated tuple. Steps are numbered 1 to 3.]
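The copy-on-write scheme in the diagram can be sketched in Python as follows. This is a simplified illustration that assumes a flat directory of leaves instead of a full B+tree, with the master record modeled as a single reference that is switched at commit. The names are hypothetical and do not come from LMDB.

```python
class CowStore:
    """Toy shadow-paging store: updates go to copies, commit switches one pointer."""

    def __init__(self):
        self.master = {}    # master record: refers to the current directory
        self.dirty = None   # dirty directory, present only while a batch is open

    def begin(self):
        # The dirty directory starts as a copy of the current directory.
        # Leaves are shared with the current directory until they are modified.
        self.dirty = dict(self.master)

    def update(self, leaf_id, key, tup):
        # Copy the leaf, change the copy, and install it in the dirty directory.
        leaf = dict(self.dirty.get(leaf_id, {}))
        leaf[key] = tup
        self.dirty[leaf_id] = leaf

    def commit(self):
        # A single pointer switch makes the dirty directory the current one.
        self.master = self.dirty
        self.dirty = None

    def crash(self):
        # Uncommitted copies are dropped; the current directory was never touched.
        self.dirty = None

    def read(self, leaf_id, key):
        return self.master[leaf_id][key]
```

Because the current directory and its leaves are never modified, no log is needed: after a crash the master record still refers to a consistent version.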

#2: NVM Copy-on-Write Updates

[Diagram: NVM-optimized copy-on-write B+tree, allocator-based (not file-based). Labels: master record, current directory, dirty directory, two leaves, updated leaf 1, tuple, updated tuple, and two "only pointers" labels. Steps are numbered 1 to 3.]

Benefits
•  Support smaller B+tree nodes
•  Cheaper dirty directory creation
•  Reduces data duplication

#3: Log-Structured Updates

[Diagram: log-structured updates engine. Labels: MemTable with index, write-ahead log, SSTables with index, tuple deltas, tuple. Steps are numbered 1 to 3.]
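As a rough Python illustration of the components named in the diagram, the sketch below logs each tuple delta, buffers it in a MemTable, and flushes the MemTable to an immutable sorted run (standing in for an SSTable) once it holds a configurable number of keys. The flush threshold, the record layout, and all names are assumptions made for the example and are not LevelDB's API.

```python
class LogStructuredEngine:
    """Toy log-structured engine: WAL + MemTable + immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.wal = []         # deltas not yet flushed to an SSTable
        self.memtable = {}    # key -> accumulated delta (dict of field -> value)
        self.sstables = []    # oldest first; each is a tuple of (key, delta) pairs
        self.memtable_limit = memtable_limit

    def put(self, key, delta):
        self.wal.append((key, dict(delta)))               # log the delta first
        self.memtable.setdefault(key, {}).update(delta)   # then buffer it
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Copy the MemTable into a sorted, immutable run and start a new one.
        run = sorted(((k, dict(d)) for k, d in self.memtable.items()),
                     key=lambda kv: kv[0])
        self.sstables.append(tuple(run))
        self.memtable = {}
        self.wal.clear()   # flushed deltas no longer need the log

    def get(self, key):
        # Rebuild the tuple by applying deltas from oldest to newest.
        result = {}
        for run in self.sstables:
            for k, delta in run:
                if k == key:
                    result.update(delta)
        result.update(self.memtable.get(key, {}))
        return result or None
```

Reads in this sketch have to coalesce deltas from several places, and every flush copies the buffered data once more.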

#3: NVM Log-Structured Updates

[Diagram: NVM-optimized log-structured updates engine. Labels: write-ahead log, MemTable with index, immutable MemTable, SSTables with index, tuple deltas, tuple, an "only pointers" label, and a cross mark. Steps are numbered 1 to 3.]

Benefits
•  Cheaper SSTable creation
•  Reduces data duplication
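The "immutable MemTable" label suggests that a full MemTable can be sealed where it already sits in NVM instead of being copied into a separate SSTable. The Python sketch below shows that idea under this assumption. It uses `types.MappingProxyType` from the standard library as a read-only view over the existing dictionary, and every class and method name is hypothetical.

```python
from types import MappingProxyType


class NVMLogEngine:
    """Toy engine that seals full MemTables in place instead of copying them."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}     # mutable MemTable, assumed to live in NVM
        self.immutable = []    # sealed MemTables, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, delta):
        self.memtable.setdefault(key, {}).update(delta)
        if len(self.memtable) >= self.memtable_limit:
            self.seal()

    def seal(self):
        # No data is copied: the same dictionary is exposed through a
        # read-only view, and a fresh MemTable takes over new writes.
        self.immutable.append(MappingProxyType(self.memtable))
        self.memtable = {}

    def get(self, key):
        # Apply deltas from the oldest sealed MemTable to the live one.
        result = {}
        for table in self.immutable:
            result.update(table.get(key, {}))
        result.update(self.memtable.get(key, {}))
        return result or None
```

Sealing costs the same regardless of how much data the MemTable holds, which is one plausible reading of "cheaper SSTable creation".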

Summary
•  Storage optimizations
    – Avoid unnecessary data duplication
    – Leverage byte-addressability
•  Recovery optimizations
    – NVM-optimized recovery protocols
    – Non-volatile data structures

Experimental Evaluation
•  NVM Hardware Emulator
    – NVM latency = 2x DRAM latency
•  Yahoo! Cloud Serving Benchmark
    – 6 storage engines
    – 2 million records + 1 million transactions
    – Write-heavy workload
    – High-skew setting

Throughput

[Figure: throughput (txn/sec, axis from 0 to 1,200,000) for each storage engine, with the traditional engines INP, COW, and LOG next to the NVM-optimized engines NVM-INP, NVM-COW, and NVM-LOG. Annotations on the chart: 2x, 4x, 2x.]

Write Endurance

[Figure: NVM stores (millions, axis from 0 to 300) for each storage engine: INP, COW, LOG, NVM-INP, NVM-COW, NVM-LOG. Annotations on the chart: 75%, 60%, 80%.]

Recovery Latency

[Figure: recovery latency (ms, log-scale axis from 0.01 to 10000) against number of transactions (1000, 10000, 100000) for INP, COW, LOG, NVM-INP, NVM-COW, NVM-LOG. Annotation on the chart: Instant Recovery.]

Takeaways
•  Designing for NVM is important
    – Higher throughput
    – Longer device lifetime
    – Faster recovery
•  System design principles
    – Non-volatile data structures
    – Need a system-level rethink

Peloton @ CMU
•  Hybrid storage hierarchy
    – NVM + DRAM oriented design
•  HTAP workloads
    – Real-time analytics and fast transactions

END jarulraj@

Thanks!

Filesystem Interface
•  Optimized for byte-addressable NVM
•  Bypass page-cache & block device layer
•  File I/O requires only one copy
    – 7x better performance than EXT4

Allocator Interface
•  NVM-aware memory allocator
    – No system calls
    – Bypass kernel’s VFS layer
•  Flush CPU cache for durable writes
    – 10x better performance than FS interface
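Why a durable write needs a cache flush can be shown with a small Python model. It is a deliberately pessimistic simulation and not the emulator's interface: stores land in a volatile cache, only flushed addresses reach persistent memory, and a crash discards everything still in the cache. On real x86 hardware the flush would be a cache-line flush instruction such as CLFLUSH or CLWB followed by a fence, and unflushed lines may or may not have been written back by the time of a crash. All names below are hypothetical.

```python
class EmulatedNVM:
    """Toy model of NVM sitting behind a volatile CPU cache."""

    def __init__(self):
        self.persistent = {}   # contents that survive a crash
        self.cache = {}        # dirty cache lines, lost on a crash

    def store(self, addr, value):
        # A plain store only reaches the cache.
        self.cache[addr] = value

    def load(self, addr):
        # Loads see the cached value first, then fall back to persistent memory.
        if addr in self.cache:
            return self.cache[addr]
        return self.persistent.get(addr)

    def flush(self, addr):
        # Write the cache line back so the value becomes durable.
        if addr in self.cache:
            self.persistent[addr] = self.cache.pop(addr)

    def crash(self):
        # Power loss: anything not yet flushed is gone.
        self.cache.clear()


def durable_write(nvm, addr, value):
    """Store followed by a flush, so the value survives a crash."""
    nvm.store(addr, value)
    nvm.flush(addr)
```

A value written with `store` alone is lost in this model if a crash follows, while one written with `durable_write` is still readable afterwards.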