let’s talk about storage & recovery methods for non...
TRANSCRIPT
Let’s Talk About
Storage & Recovery Methods for
Non-Volatile Memory Database Systems
Joy Arulraj, Andrew Pavlo, Subramanya R. Dulloor
Non-Volatile Memory (NVM) DRAM SSD DISK NVM
Read Latency 1x 500x 105x 2-4x Write Latency 1x 5000x 105x 2-8x Persistence û ü ü ü Byte-level access ü û û ü Write endurance ü û ü û
Executive Summary
Design a DBMS storage engine for NVM v Re-‐examine tradi-onal assump-ons v Storage and recovery op-miza-ons
NVM Hardware Emulator • Configure DRAM load and store latency • Throttle memory bandwidth • Two interfaces
– Filesystem Interface (fread/fwrite) – Allocator Interface (malloc/free)
DBMS Platform • Lightweight DBMS
– NVM-only design – No volatile DRAM – Runs on the hardware emulator
• Pluggable backend storage architecture • Timestamp-based concurrency control
3 Storage Engines
ENGINE TYPE TABLE STORAGE LOGGING EXAMPLE
In-Place Updates ü ü VoltDB
Copy-on-Write Updates ü û LMDB
Log-Structured Updates û ü LevelDB
#1: In-Place Updates
TABLE
TUPLE UPDATED TUPLE
WRITE AHEAD LOG
SNAPSHOTS INDEX
TUPLE DELTA
UPDATED TUPLE
1
2
3
Optimizing for NVM • Non-volatile pointer
– Non-volatile data structures – Valid even after system restarts
• Exclusively use allocator interface – Byte-addressable NVM – Not filesystem interface
WRITE AHEAD LOG
TUPLE DELTA 1
ONLY POINTERS NOT ACTUAL DATA
TABLE
TUPLE UPDATED TUPLE 2
NON-‐VOLATILE INDEX
#1: NVM In-Place Updates
SNAPSHOTS
û Benefits
v Reduce data duplica-on v Almost instantaneous recovery v No redo log, only an undo log
#2: Copy-on-Write Updates
Dirty Directory Current Directory
Leaf 2 Leaf 1 Updated Leaf 1
Master Record
Copy-‐on-‐Write B+Tree
TUPLE UPDATED TUPLE 1
2
3
#2: NVM Copy-on-Write Updates
Dirty Directory Current Directory
Leaf Leaf
Master Record
TUPLE
2
3
ONLY POINTERS ONLY POINTERS
ALLOCATOR-‐BASED (NOT FILE-‐BASED)
Updated Leaf 1
UPDATED TUPLE 1
Benefits v Support smaller B+tree nodes v Cheaper dirty directory crea-on v Reduces data duplica-on
#3: Log-Structured Updates
INDEX MEMTABLE
WRITE AHEAD LOG
TUPLE DELTA
TUPLE DELTA
INDEX SSTABLES
TUPLE DELTA
TUPLE
1
2 3
WRITE AHEAD LOG
INDEX
TUPLE DELTA
TUPLE DELTA
INDEX
TUPLE DELTA
TUPLE
1
2 3
SSTABLES MEMTABLE
ONLY POINTERS
IMMUTABLE MEMTABLE
#3: NVM Log-Structured Updates
û Benefits
v Cheaper SSTable crea-on v Reduces data duplica-on
Summary • Storage optimizations
– Avoid unnecessary data duplication – Leverage byte-addressability
• Recovery optimizations – NVM-optimized recovery protocols – Non-volatile data structures
Experimental Evaluation • NVM Hardware Emulator
– NVM latency = 2x DRAM latency • Yahoo! Cloud Serving Benchmark
– 6 storage engines – 2 million records + 1 million transactions – Write-heavy workload – High-skew setting
Throughput
0
400000
800000
1200000
Throughp
ut (txn/sec)
Storage Engines
INP COW LOG NVM-‐INP NVM-‐COW NVM-‐LOG
Tradi5onal
NVM-‐Op5mized
2x
4x 2x
Write Endurance
0
100
200
300 NVM
Stores (M)
Storage Engines
INP COW LOG NVM-‐INP NVM-‐COW NVM-‐LOG
75%
60%
80%
Recovery Latency
0.01
1
100
10000
1000 10000 100000
Recovery Laten
cy (m
s)
Number of transac\ons
INP COW LOG NVM-‐INP NVM-‐COW NVM-‐LOG
Instant Recovery
Takeaways • Designing for NVM is important
– Higher throughput – Longer device lifetime – Faster recovery
• System design principles – Non-volatile data structures – Need a system-level rethink
Peloton @ CMU • Hybrid storage hierarchy
– NVM + DRAM oriented design • HTAP workloads
– Real-time analytics and fast transactions
Filesystem Interface • Optimized for byte-addressable NVM • Bypass page-cache & block device layer • File I/O requires only one copy
– 7x better performance than EXT4