In-Memory Database
전준민, 정주성, 이한민, 곽하녹
Table of Contents
1. Introduction
2. Disk Resident DB vs In-Memory DB
3. Column Store
4. Durability
5. Data Overflow
6. Products of IMDB
7. Optimization Aspects on IMDB
1. Introduction
What is In-Memory Database (IMDB)?
Architecture
Rise of IMDB
Applications
Myths about IMDB
What is In-Memory Database (IMDB)?
• An in-memory database system is a database management system that stores its data entirely in main memory rather than on disk.
Architecture
• Fast data access
• Algorithms optimized for main memory
• Efficient memory usage
• Durability
Rise of the IMDB
• Multicore processors
• Cheaper and larger memories
• Demand for fast databases
Applications
• Low-latency, high-volume systems
Myths about IMDB
• Given the same amount of RAM, disk DBs can perform at the same speed as IMDBs (by using caching technology).
• If a RAM disk is created and a traditional disk DB is deployed on it, it delivers the same performance as an in-memory database.
• Both claims are false, because a disk DB still carries the overhead of:
  • writes to disk
  • the buffer manager
  • indexes designed for disk
  • redundant data
2. Disk Resident DB (DRDB) vs In-Memory DB (IMDB)
DRDB vs IMDB: Overview
Indexes
Concurrency Control
DRDB vs IMDB: Overview [1]
Aspect | DRDB | IMDB
File I/O | Carries file I/O burden | No file I/O burden
Storage Usage | Assumes storage is abundant | Uses storage more efficiently
Algorithms | Algorithms optimized for disk | Algorithms optimized for memory
CPU Cycles | More CPU cycles | Fewer CPU cycles
Persistence | Non-volatile | Volatile
Lock | Fine-grained locks | Coarse-grained locks
Indexes: B+-Tree in DRDB [2]
• Redundant data (copies of key values) are kept in index structures to reduce disk I/O.
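A short worked example shows why a disk-oriented B+-tree uses wide nodes. Assuming, for illustration only, 4 KB pages and 16-byte key/pointer entries, each node holds about 256 entries, so the number of levels (roughly the number of page reads for a point lookup) grows very slowly with the number of keys. The function name `btree_levels` is hypothetical, and the model ignores partially filled nodes.

```python
def btree_levels(num_keys: int, fanout: int) -> int:
    """Smallest number of levels h such that fanout**h >= num_keys."""
    levels, reachable = 1, fanout
    while reachable < num_keys:
        reachable *= fanout        # one more level multiplies the reachable keys
        levels += 1
    return levels


# Assumed sizes, for illustration only: 4096-byte pages, 16-byte key/pointer entries.
FANOUT = 4096 // 16                          # 256 entries per node

print(btree_levels(10**9, FANOUT))           # 4  -> about 4 node (page) reads per lookup
print(btree_levels(10**9, 2))                # 30 -> a binary tree would need 30 levels
```

With a billion keys, the wide tree needs about 4 levels while a binary tree needs 30, which is why disk-resident indexes trade extra (redundant) space for fewer I/Os.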
Indexes: T-Tree in IMDB [3]
• Indexes in IMDBs focus on reducing memory consumption and CPU cycles.
• In 1986, Lehman and Carey proposed the T-tree as an index structure for main memory databases.
• T-tree indexes are more efficient than B-trees in that they require less memory space and fewer CPU cycles.
Indexes: T-Tree in IMDB
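• The T-tree evolved from AVL trees and B-trees: like an AVL tree it is a balanced binary tree, and like a B-tree each node holds multiple sorted elements.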
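The search procedure of a T-tree can be sketched as follows. This is a minimal Python illustration with hypothetical names (`TNode`, `t_search`) that builds a small tree by hand and omits insertion, deletion, and AVL-style rebalancing. A key smaller than a node's minimum goes left, a key larger than its maximum goes right, and otherwise the key can only be in that node, so a binary search inside the node finishes the lookup.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TNode:
    keys: List[int]                          # sorted keys held by this node
    left: Optional["TNode"] = None           # subtree with keys < keys[0]
    right: Optional["TNode"] = None          # subtree with keys > keys[-1]


def t_search(node: Optional[TNode], key: int) -> bool:
    """Return True if key is stored in the T-tree rooted at node."""
    while node is not None:
        if key < node.keys[0]:               # below this node's minimum: go left
            node = node.left
        elif key > node.keys[-1]:            # above this node's maximum: go right
            node = node.right
        else:                                # bounding node: binary search inside it
            i = bisect_left(node.keys, key)
            return node.keys[i] == key
    return False


root = TNode([40, 45, 50], left=TNode([10, 20, 30]), right=TNode([60, 70, 80]))
print(t_search(root, 20), t_search(root, 45), t_search(root, 55))  # True True False
```

Packing several keys per node keeps the tree shallow and the per-node pointer overhead low, while the binary-tree shape keeps navigation to a few comparisons.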
Indexes: Hash indexes in IMDB
• Hash indexes are used in key-value in-memory databases (cache servers) such as Redis and Memcached.
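A hash index maps each key straight to a bucket, so a point lookup takes expected O(1) time with no tree traversal. The sketch below is a minimal Python hash index with separate chaining and a fixed bucket count. The class name `HashIndex` is hypothetical, and this is not the internal design of Redis or Memcached; resizing is omitted.

```python
from typing import Any, List, Optional, Tuple


class HashIndex:
    """Fixed-size hash index with separate chaining (no resizing)."""

    def __init__(self, num_buckets: int = 8):
        self.buckets: List[List[Tuple[Any, Any]]] = [[] for _ in range(num_buckets)]

    def _chain(self, key: Any) -> List[Tuple[Any, Any]]:
        return self.buckets[hash(key) % len(self.buckets)]   # key -> bucket in O(1)

    def put(self, key: Any, value: Any) -> None:
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:                                      # overwrite existing entry
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def get(self, key: Any) -> Optional[Any]:
        for k, v in self._chain(key):                         # scan only one short chain
            if k == key:
                return v
        return None


idx = HashIndex()
idx.put("user:1", "alice")
idx.put("user:2", "bob")
idx.put("user:1", "carol")
print(idx.get("user:1"), idx.get("user:3"))  # carol None
```

A hash index answers only equality lookups; range queries still need an ordered structure such as a T-tree.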
Concurrency Control
• In DRDBs, locking granules are fine-grained (low level):
  • to reduce contention
  • to increase parallelism
• In IMDBs, locks can be coarse-grained because transactions are processed quickly.
  • Locking granules such as a relation or an entire database
  • No need to look up a lock hash table
  • Serial scheduling is sufficient in most cases
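With the coarsest granule, one lock guards the whole database and transactions run one after another. The Python sketch below assumes a hypothetical `SerialDatabase` class with a single database-wide `threading.Lock`: each transaction holds that lock from start to finish, so the schedule is serial and no lock table is consulted. This only pays off when transactions are short, which is the IMDB argument above.

```python
import threading


class SerialDatabase:
    """All transactions share one database-wide lock, so they execute serially."""

    def __init__(self):
        self.data = {}
        self._lock = threading.Lock()        # coarsest granule: the entire database

    def run(self, txn):
        with self._lock:                     # no per-record lock table to consult
            return txn(self.data)


def deposit(data, amount=1):
    data["balance"] = data.get("balance", 0) + amount


def worker(db, n):
    for _ in range(n):
        db.run(deposit)


db = SerialDatabase()
threads = [threading.Thread(target=worker, args=(db, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(db.data["balance"])  # 4000
```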
3. Column Store
What is Column Store?
Benefits of Column Store
Delta Storage
What is Column Store?
• A column store stores data tables as columns of data rather than as rows of data.
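The difference between the two layouts can be sketched in a few lines of Python. The table and its column names are hypothetical. An aggregate over one attribute reads a single array in the column layout, while the row layout must walk every full record; conversely, rebuilding one whole record from the column layout needs one access per column.

```python
# The same three-row table in two layouts (hypothetical data).
rows = [
    (1, "Seoul", 120),
    (2, "Busan", 80),
    (3, "Seoul", 95),
]                                            # row store: one tuple per record

columns = {
    "id":    [1, 2, 3],
    "city":  ["Seoul", "Busan", "Seoul"],
    "sales": [120, 80, 95],
}                                            # column store: one array per attribute

# SUM(sales): the row store touches every record, the column store one array.
row_total = sum(record[2] for record in rows)
col_total = sum(columns["sales"])
print(row_total, col_total)  # 295 295

# Reconstructing a full record from the column store needs one access per column.
second = tuple(columns[name][1] for name in ("id", "city", "sales"))
print(second)  # (2, 'Busan', 80)
```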
Benefits of Column Store [4]
• Column stores are more suitable for IMDBs than row stores:
  • Better parallelism
  • Better compression
  • Faster data access
    • using parallel processing
    • especially for aggregations
Benefits of Column Store: Parallelism [5]
• Column storage can easily be split into equal parts, which enables effective parallel processing.
• Highly parallelized scan operations are possible, and they can be faster than indexed searches.
• A row store cannot compete when processing is set-oriented and requires column operations, and most applications are based on set-oriented processing rather than direct tuple access.
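Because a column is just an array, it can be cut into equal-sized partitions that are scanned independently, with the partial results combined at the end. The Python sketch below shows this pattern for a filtered sum; `parallel_filtered_sum` and its parameters are hypothetical. It uses `concurrent.futures.ThreadPoolExecutor` for portability. In CPython the global interpreter lock limits real speedup for pure-Python loops, so a real engine would run the partitions on separate cores in native code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def scan_partition(values: List[int], threshold: int) -> int:
    """Scan one partition of the column: sum the values above threshold."""
    return sum(v for v in values if v > threshold)


def parallel_filtered_sum(column: List[int], threshold: int, workers: int = 4) -> int:
    # Split the column into (nearly) equal contiguous partitions, one per worker.
    size = max(1, -(-len(column) // workers))            # ceiling division, at least 1
    parts = [column[i:i + size] for i in range(0, len(column), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(scan_partition, parts, [threshold] * len(parts)))
    return sum(partials)                                  # combine per-partition results


sales = list(range(1, 101))                               # hypothetical column: 1..100
print(parallel_filtered_sum(sales, 50))  # 3775
```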
Benefits of Column Store: Parallelism
• Highly parallelized scan operations over column stores are faster than using ordinary indexes alone.
Benefits of Column Store: Compression
• A column store allows highly efficient compression because a column often contains only a few distinct values.
[Figure: Compression]
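One common technique behind this is dictionary encoding: the distinct values of a column are stored once in a sorted dictionary, and the column itself becomes an array of small integer codes. The Python sketch below is a minimal illustration with hypothetical function names (`dict_encode`, `dict_decode`). It does not model bit-packing or run-length encoding, which real column stores often layer on top.

```python
from bisect import bisect_left
from typing import List, Tuple


def dict_encode(column: List[str]) -> Tuple[List[str], List[int]]:
    dictionary = sorted(set(column))                       # each distinct value once
    codes = [bisect_left(dictionary, v) for v in column]   # value -> position in dictionary
    return dictionary, codes


def dict_decode(dictionary: List[str], codes: List[int]) -> List[str]:
    return [dictionary[c] for c in codes]


city = ["Seoul", "Busan", "Seoul", "Seoul", "Incheon", "Busan"]
dictionary, codes = dict_encode(city)
print(dictionary)  # ['Busan', 'Incheon', 'Seoul']
print(codes)       # [2, 0, 2, 2, 1, 0]
```

With only three distinct values, each code fits in 2 bits, and because the dictionary is sorted, a predicate such as `city = 'Seoul'` can be evaluated by comparing integer codes.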
Delta Storage [6]
• Because writing to a compressed column store in real time is inefficient, delta storage is used.
• Delta storage
  • optimized for write operations
• Main storage
  • compressed column store
Delta Storage
• INSERT
  • An INSERT statement inserts a new record into the delta storage. The merge process moves the record from delta to main.
• DELETE
  • A DELETE statement selects the record and marks it as invalid by setting a flag (in main or delta). The merge process removes the record from memory once no open transaction still uses it.
• UPDATE
  • An UPDATE statement inserts a new version of the record. The merge process moves the latest version from delta to main. Old versions are deleted once no open transaction still uses them.
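The three operations and the merge can be sketched as follows. This is a simplified Python model with hypothetical names (`DeltaMainTable`, `merge`). Records are keyed by a primary key, the main store is a sorted, read-optimized list rebuilt only during a merge, and the delta is an append-only list. For simplicity, DELETE flags a key rather than a specific record version, and the merge assumes no open transactions, so invalidated and superseded versions are dropped immediately. Compression and MVCC visibility are not modeled.

```python
from typing import Dict, List, Optional, Set, Tuple


class DeltaMainTable:
    """Main store (read-optimized, sorted) plus an append-only delta store."""

    def __init__(self):
        self.main: List[Tuple[int, str]] = []    # sorted by key; rebuilt only by merge
        self.delta: List[Tuple[int, str]] = []   # new records and new versions
        self.invalid: Set[int] = set()           # keys flagged as deleted

    def insert(self, key: int, value: str) -> None:
        self.delta.append((key, value))          # writes never touch the main store
        self.invalid.discard(key)

    def update(self, key: int, value: str) -> None:
        self.insert(key, value)                  # insert-only: add a new version

    def delete(self, key: int) -> None:
        self.invalid.add(key)                    # only set a flag; merge removes data

    def read(self, key: int) -> Optional[str]:
        if key in self.invalid:
            return None
        for k, v in reversed(self.delta):        # newest delta version wins
            if k == key:
                return v
        for k, v in self.main:
            if k == key:
                return v
        return None

    def merge(self) -> None:
        """Fold the delta into a new main store (assumes no open transactions)."""
        latest: Dict[int, str] = dict(self.main)
        for k, v in self.delta:                  # later versions overwrite older ones
            latest[k] = v
        for k in self.invalid:                   # physically drop deleted records
            latest.pop(k, None)
        self.main = sorted(latest.items())
        self.delta, self.invalid = [], set()


t = DeltaMainTable()
t.insert(1, "a")
t.insert(2, "b")
t.merge()
t.update(1, "a2")
t.delete(2)
print(t.read(1), t.read(2), t.main)  # a2 None [(1, 'a'), (2, 'b')]
t.merge()
print(t.main, t.delta)               # [(1, 'a2')] []
```

Between merges, the main store stays untouched, so it can remain compressed; all write cost lands on the cheap append-only delta.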
Delta Storage: Simplified View of Insert-Only Approach
Delta Storage
• The merge process starts when the delta storage grows big enough.
4. Durability
Logging and Checkpointing
Command Logging
NVM Logging
Durability
• Durability is difficult to support in IMDBs because main memory is volatile.
• Many IMDBs have added durability via the following mechanisms:
  • Checkpoints
  • Transaction logging
Checkpointing
• Checkpoints in DRDB
  • Bring pages on disk up to date
  • Reduce the work of recovery
• Checkpoints in IMDB
  • Make a copy of the data on disk (snapshot)
  • Truncate the logs
Logging and Checkpointing
[Figure: transaction, log buffer, and memory tablespace in memory; checkpoint image file and REDO log file on physical disk; log sync between them]
• Problem
  • Log I/O becomes a bottleneck
• How long do we need to keep the log?
  • Until the next checkpoint
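The interplay of the REDO log and checkpoints can be sketched in Python. The sketch uses hypothetical names (`DurableStore`, `checkpoint`, `recover`) and keeps the checkpoint image and the log as in-memory objects standing in for files on disk. Every write is appended to the log before it is applied, a checkpoint snapshots the whole memory tablespace and truncates the log, and recovery loads the latest snapshot and replays only the log written after it. It assumes no writes run concurrently with a checkpoint.

```python
import copy
from typing import Dict, List, Tuple


class DurableStore:
    def __init__(self):
        self.table: Dict[str, int] = {}              # memory tablespace (volatile)
        self.redo_log: List[Tuple[str, int]] = []    # stands in for the REDO log file
        self.checkpoint_image: Dict[str, int] = {}   # stands in for the checkpoint image file

    def write(self, key: str, value: int) -> None:
        self.redo_log.append((key, value))           # log first (write-ahead) ...
        self.table[key] = value                      # ... then update memory

    def checkpoint(self) -> None:
        self.checkpoint_image = copy.deepcopy(self.table)  # snapshot of all data
        self.redo_log.clear()                        # older log records are no longer needed

    def crash(self) -> None:
        self.table = {}                              # main memory contents are lost

    def recover(self) -> None:
        self.table = copy.deepcopy(self.checkpoint_image)
        for key, value in self.redo_log:             # replay only post-checkpoint changes
            self.table[key] = value


s = DurableStore()
s.write("x", 1)
s.write("y", 2)
s.checkpoint()
s.write("x", 10)
s.crash()
s.recover()
print(s.table, len(s.redo_log))  # {'x': 10, 'y': 2} 1
```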
Logging and Checkpointing [7]
• TPC-C benchmark on DRDBs (New Order transaction)
  • Logging takes up a non-negligible portion of the work
  • The portion is even larger in IMDBs, where disk-related overheads such as buffer management are removed
Command Logging [8]
• Lightweight, coarse-grained logging technique
• Logical logging: records the transaction's stored procedure and its parameters instead of the modified data
• Advantages
  • Writes substantially fewer bytes per transaction than physical logging
  • Reduces run-time overhead
• Disadvantages
  • Slow recovery
  • However, failures that require recovery to restore system availability are much less frequent than normal operation
• 1.5X higher throughput than a main-memory-optimized implementation of physical logging
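The difference between physical and command logging can be illustrated with a deterministic stored procedure that touches many records. In this Python sketch (all names hypothetical), the physical log records every modified value, while the command log records only the procedure name and its parameters. Recovery re-executes the logged commands in order from a snapshot, which is why it is slower. The sketch relies on the procedures being deterministic, as command logging requires.

```python
from typing import Callable, Dict, List, Tuple

Accounts = Dict[str, int]


def add_bonus(db: Accounts, amount: int) -> List[Tuple[str, int]]:
    """Deterministic stored procedure touching every account; returns after-images."""
    changes = []
    for acct in sorted(db):
        db[acct] += amount
        changes.append((acct, db[acct]))
    return changes


PROCEDURES: Dict[str, Callable[..., List[Tuple[str, int]]]] = {"add_bonus": add_bonus}


def execute(db: Accounts, command_log: list, physical_log: list, name: str, *params) -> None:
    command_log.append((name, params))                  # logical: procedure + parameters
    physical_log.extend(PROCEDURES[name](db, *params))  # physical: every modified value


def replay(snapshot: Accounts, command_log: list) -> Accounts:
    db = dict(snapshot)
    for name, params in command_log:                    # re-execute in the original order
        PROCEDURES[name](db, *params)
    return db


snapshot = {f"acct{i}": 100 for i in range(1000)}
db, command_log, physical_log = dict(snapshot), [], []
execute(db, command_log, physical_log, "add_bonus", 5)
execute(db, command_log, physical_log, "add_bonus", 7)
print(len(command_log), len(physical_log))  # 2 2000
print(replay(snapshot, command_log) == db)  # True
```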
NVM Logging [9]
• NVM (Non-Volatile Memory)
  • low read/write latency, like DRAM
  • persistent writes, like an SSD
Property | DRAM | NAND Flash | NVM
Byte-addressable | Yes | No | Yes
Capacity | 1X | 4X | 2-4X
Latency | 1X | 400X | 3-5X
NVM+DRAM Architecture
• The DBMS relies on both DRAM and NVM
5. Data Overflow
Anti-caching
Project Siberia
Data Overflow
• Datasets may not fit in DRAM
• IMDB solutions
  • Anti-caching
  • Project Siberia
Anti-caching [10]
• Used in H-Store
• Cold data is moved to disk in a transactionally safe manner
• A Bloom filter is used to track data
• Cold data is managed by maintaining an LRU chain
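The LRU chain and tuple-level eviction can be sketched with Python's `collections.OrderedDict`. The names (`AntiCache`, `memory_limit`) are hypothetical, the "disk" is a plain dictionary, and the Bloom filter is replaced by exact dictionary membership for simplicity. Every access moves a tuple to the hot end of the chain, and whenever the in-memory tuple count exceeds the limit, tuples at the cold end are moved to the anti-cache. Unlike H-Store, this sketch fetches evicted tuples back synchronously.

```python
from collections import OrderedDict
from typing import Any, Dict


class AntiCache:
    """Tuple-level LRU eviction from memory to a (simulated) anti-cache on disk."""

    def __init__(self, memory_limit: int):
        if memory_limit < 1:
            raise ValueError("memory_limit must be at least 1")
        self.memory_limit = memory_limit
        self.hot: "OrderedDict[Any, Any]" = OrderedDict()   # LRU chain: coldest first
        self.disk: Dict[Any, Any] = {}                      # stands in for the on-disk store

    def put(self, key: Any, value: Any) -> None:
        self.hot[key] = value
        self.hot.move_to_end(key)                           # mark as most recently used
        while len(self.hot) > self.memory_limit:
            cold_key, cold_value = self.hot.popitem(last=False)  # coldest tuple
            self.disk[cold_key] = cold_value                # evict this one tuple

    def get(self, key: Any) -> Any:
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        if key in self.disk:                                # simplified synchronous un-eviction
            self.put(key, self.disk.pop(key))
            return self.hot[key]
        raise KeyError(key)


cache = AntiCache(memory_limit=2)
cache.put("t1", "a")
cache.put("t2", "b")
cache.get("t1")                                  # t1 becomes the most recently used
cache.put("t3", "c")                             # t2 is now the coldest and is evicted
print(list(cache.hot), list(cache.disk))  # ['t1', 't3'] ['t2']
```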
Anti-caching
• Fine-grained eviction
  • eviction is performed at the tuple level, not the page level
• Non-blocking fetches
  • a transaction that accesses evicted data is simply aborted and then restarted later, once the data has been fetched back into memory
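Non-blocking fetches can be sketched as an abort-and-retry loop. In this simplified Python model (hypothetical names throughout), a transaction that touches an evicted tuple raises an exception recording the missing key instead of waiting on disk I/O. The engine schedules the fetch and restarts the transaction once the tuple is back in memory. The fetch is simulated synchronously between attempts; a real system performs it asynchronously while other transactions keep running. The example transaction is read-only, so no rollback is modeled.

```python
from typing import Any, Callable, Dict, List


class EvictedTupleAccess(Exception):
    """Raised to abort a transaction that touched evicted data."""

    def __init__(self, key: Any):
        super().__init__(key)
        self.key = key


class Engine:
    def __init__(self, memory: Dict[Any, Any], disk: Dict[Any, Any]):
        self.memory, self.disk = memory, disk
        self.pending_fetches: List[Any] = []
        self.aborts = 0

    def read(self, key: Any) -> Any:
        if key in self.memory:
            return self.memory[key]
        raise EvictedTupleAccess(key)                  # do not block on disk I/O

    def fetch_pending(self) -> None:
        """Simulated background fetch: move requested tuples back into memory."""
        for key in self.pending_fetches:
            self.memory[key] = self.disk.pop(key)
        self.pending_fetches.clear()

    def run(self, txn: Callable[["Engine"], Any]) -> Any:
        while True:
            try:
                return txn(self)
            except EvictedTupleAccess as e:
                self.aborts += 1                       # abort the transaction
                self.pending_fetches.append(e.key)     # schedule the fetch
                self.fetch_pending()                   # asynchronous in a real system


engine = Engine(memory={"a": 1}, disk={"b": 2})
result = engine.run(lambda e: e.read("a") + e.read("b"))
print(result, engine.aborts)  # 3 1
```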
Project Siberia [11]
• Used in Hekaton
• Automatically and transparently maintains cold data on cheaper secondary storage
• Allows more data to fit in memory
• Log-based management of cold data
6. Products of IMDB
H-Store / VoltDB
Hekaton
SAP HANA
In-memory NoSQL Databases
Products of IMDB
H-Store / VoltDB
• Distributed, row-based, in-memory relational database
• Targeted at high-performance OLTP processing
• Lightweight logging strategy (command logging)
• Anti-caching
Hekaton
• Memory-optimized OLTP engine
• Fully integrated into Microsoft SQL Server
• Multi-version concurrency control
• Project Siberia
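The core idea of multi-version concurrency control can be sketched with versions that carry begin and end timestamps. This is a generic, simplified Python illustration with hypothetical names (`Version`, `MVCCStore`). It is single-threaded, with no write-write conflict detection or garbage collection, and it is not Hekaton's actual latch-free implementation. A committed write ends the current version and appends a new one, and a reader with snapshot timestamp `ts` sees the version whose `[begin, end)` interval contains `ts`.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

INFINITY = float("inf")


@dataclass
class Version:
    value: Any
    begin: int                  # commit timestamp that created this version
    end: float = INFINITY       # commit timestamp that superseded it


class MVCCStore:
    def __init__(self):
        self.versions: Dict[Any, List[Version]] = {}
        self.ts = 0                                   # logical commit clock

    def write(self, key: Any, value: Any) -> int:
        """Commit a new version of key and return its commit timestamp."""
        self.ts += 1
        chain = self.versions.setdefault(key, [])
        if chain:
            chain[-1].end = self.ts                   # old version stops being current
        chain.append(Version(value, begin=self.ts))
        return self.ts

    def snapshot(self) -> int:
        return self.ts                                # see every commit up to now

    def read(self, key: Any, ts: int) -> Optional[Any]:
        for v in self.versions.get(key, []):
            if v.begin <= ts < v.end:                 # version visible at time ts
                return v.value
        return None


store = MVCCStore()
store.write("x", "v1")
old = store.snapshot()
store.write("x", "v2")
print(store.read("x", old), store.read("x", store.snapshot()))  # v1 v2
```

Readers never block writers: the old snapshot keeps seeing `v1` even after `v2` commits.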
SAP HANA
• A distributed in-memory database designed to integrate OLTP and OLAP
• Provides rich data analytics functionality by offering multiple query language interfaces (e.g., standard SQL, SQLScript, MDX, WIPE, FOX, and R)
SAP HANA
• Three-level column-oriented unified table structure
In-memory NoSQL Databases
• RAMCloud
  • Distributed in-memory key-value store, designed for low latency, high availability, and high memory utilization
• Bitsy
  • Embeddable in-memory graph database that implements the Blueprints API, with ACID guarantees on transactions based on the optimistic concurrency model
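Optimistic concurrency control lets transactions run without locks and checks for conflicts only at commit time. The Python sketch below is a generic backward-validation scheme with hypothetical names (`OCCStore`, `Transaction`, `ConflictError`), not Bitsy's implementation. Each record carries a version number, a transaction remembers the versions it read and buffers its writes, and commit succeeds only if none of those versions changed in the meantime. It is single-threaded; a real system must make validation and the write phase atomic.

```python
from typing import Any, Dict


class ConflictError(Exception):
    pass


class OCCStore:
    def __init__(self):
        self.data: Dict[Any, Any] = {}
        self.version: Dict[Any, int] = {}

    def begin(self) -> "Transaction":
        return Transaction(self)


class Transaction:
    def __init__(self, store: OCCStore):
        self.store = store
        self.read_set: Dict[Any, int] = {}    # key -> version observed
        self.write_set: Dict[Any, Any] = {}   # buffered writes, invisible to others

    def read(self, key: Any) -> Any:
        if key in self.write_set:
            return self.write_set[key]
        self.read_set[key] = self.store.version.get(key, 0)
        return self.store.data.get(key)

    def write(self, key: Any, value: Any) -> None:
        self.write_set[key] = value

    def commit(self) -> None:
        # Validation phase: abort if any record we read was changed by another commit.
        for key, seen in self.read_set.items():
            if self.store.version.get(key, 0) != seen:
                raise ConflictError(key)
        # Write phase: install buffered writes and bump versions.
        for key, value in self.write_set.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.version.get(key, 0) + 1


store = OCCStore()
t1, t2 = store.begin(), store.begin()
t1.write("n", (t1.read("n") or 0) + 1)
t2.write("n", (t2.read("n") or 0) + 1)
t1.commit()
try:
    t2.commit()
    t2_aborted = False
except ConflictError:
    t2_aborted = True
print(store.data["n"], t2_aborted)  # 1 True
```

The losing transaction is simply aborted and can be retried; this avoids lock overhead when conflicts are rare.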
Comparison of IMDB [12]
Systems | Data Model | Workloads | Indexes | Fault Tolerance | Memory Overflow
Relational Databases
H-Store | relation (row) | OLTP | hashing, B+-tree, binary tree | command logging, checkpoint, replica | anti-caching
Hekaton | relation (row) | OLTP | latch-free hashing, Bw-tree | logging, checkpoint, replica | Project Siberia
SAP HANA | relation, graph, text | OLTP, OLAP | timeline index | logging, checkpoint, standby server | table/partition-level swapping
NoSQL Databases
RAMCloud | key-value | object operations | hashing | logging, replica | N/A
Graph Databases
Bitsy | N/A | OLTP | optimistic concurrency control | logging, backup | N/A
7. Optimization Aspects on IMDB
Optimization Aspects on IMDB [12]
Aspects | Concerns | Related Work
Index | cache consciousness, time/space efficiency | T-Tree, CSS-Trees, CSB+-Trees, BD-Tree
Data Layout | cache consciousness, space efficiency | columnar layout, HANA Hybrid Store, log structure
Concurrency Control | overhead, correctness | virtual snapshot, transactional memory, MVCC
Query Processing | code locality, time efficiency | stored procedures, JIT compilation, sorting
Fault Tolerance | durability, correlated failures, availability | group commit and log coalescing, NVM, command logging, remote logging
Data Overflow | locality, paging, hot/cold classification | anti-caching, Hekaton Siberia, data compression, virtual memory management, pointer swizzling
References
[1] Garcia-Molina, Hector, and Kenneth Salem. "Main memory database systems: An overview." Knowledge and Data Engineering, IEEE Transactions on 4.6 (1992): 509-516.
[2] Comer, Douglas. "Ubiquitous B-tree." ACM Computing Surveys (CSUR) 11.2 (1979): 121-137.
[3] Lehman, Tobin J., and Michael J. Carey. "A study of index structures for main memory database management systems." Conference on Very Large Data Bases. Vol. 294. 1986.
[4] Abadi, Daniel J., Samuel R. Madden, and Nabil Hachem. "Column-stores vs. row-stores: how different are they really?" Proceedings of the 2008 ACM SIGMOD international conference on Management of data. ACM, 2008.
[5] Plattner, Hasso. "A common database approach for OLTP and OLAP using an in-memory column database." Proceedings of the 2009 ACM SIGMOD International Conference on Management of data. ACM, 2009.
[6] Färber, Franz, et al. "The SAP HANA Database--An Architecture Overview." IEEE Data Eng. Bull. 35.1 (2012): 28-33.
[7] Harizopoulos, Stavros, et al. "OLTP through the looking glass, and what we found there." Proceedings of the 2008 ACM SIGMOD international conference on Management of data. ACM, 2008.
[8] Malviya, Nirmesh, et al. "Rethinking main memory OLTP recovery." Data Engineering (ICDE), 2014 IEEE 30th International Conference on. IEEE, 2014.
[9] DeBrabant, Justin, et al. "A Prolegomenon on OLTP Database Systems for Non-Volatile Memory." Proceedings of the VLDB Endowment 7.14 (2014).
[10] DeBrabant, Justin, et al. "Anti-caching: A new approach to database management system architecture." Proceedings of the VLDB Endowment 6.14 (2013): 1942-1953.
[11] Eldawy, Ahmed, Justin Levandoski, and Paul Larson. "Trekking through siberia: Managing cold data in a memory-optimized database." Proceedings of the VLDB Endowment 7.11 (2014).
[12] Zhang, Hao, et al. "In-memory big data management and processing: A survey." (2015).