
Data Processing Systems for Solid State Drives

Mincheol Shin, Yonsei University

2015.11.23

Overview

• Main target: data processing systems with SSDs

• Purpose: improving I/O performance

• Data processing systems
  – Relational database management systems
    • e.g. Oracle, MySQL, PostgreSQL, SQLite
  – Distributed data processing systems
    • e.g. Hadoop Distributed File System, MapReduce, Hive, HBase, Tajo, Spark
  – Key-value stores
    • e.g. Redis

Outline

• Solid State Drive (SSD)
• RDBMS on Solid State Drive
• Big Data Processing for Solid State Drive

Solid State Drive: Flash Memory [VLDB2011Tut2]

• High performance
  – High I/O throughput: 41 MB/s read, 7.5 MB/s program [Micron 2014]
  – Fast random access: under 0.1 ms (HDD: 2.9 to 12 ms)
  – Low energy consumption

• Four constraints of NAND flash memory
  – C1: The program granularity is a page (2 KB ~ 16 KB)
  – C2: A block (256 KB ~ 1 MB) must be erased before any of its pages can be updated
  – C3: Pages must be programmed sequentially within a block
  – C4: Limited lifetime (10^4 ~ 10^5 program/erase cycles)

[Figure: an erase block (1 MB) made up of 4 KB pages]

[VLDB2011Tut2] P. Bonnet, L. Bouganim, I. Koltsidas, S. D. Viglas. System Co-Design and Data Management for Flash Devices. VLDB 2011 Tutorial.
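The four constraints above can be made concrete with a toy model of one erase block. This is a minimal sketch under stated assumptions. The class name `FlashBlock` and its parameters are hypothetical. The defaults of 256 pages of 4 KB (one 1 MB block) and 10^4 erase cycles are picked from the ranges on the slide. The model only enforces the rules; it says nothing about timing.

```python
from typing import List, Optional


class FlashBlock:
    """Toy NAND erase block that enforces constraints C1 to C4 (hypothetical sizes)."""

    def __init__(self, num_pages: int = 256, page_size: int = 4096,
                 max_erases: int = 10**4) -> None:
        self.page_size = page_size
        self.pages: List[Optional[bytes]] = [None] * num_pages
        self.next_page = 0            # C3: the only page that may be programmed next
        self.erases = 0
        self.max_erases = max_erases  # C4: endurance in program/erase cycles

    def program(self, page_no: int, data: bytes) -> None:
        if len(data) != self.page_size:
            raise ValueError("C1: must program exactly one full page")
        if self.pages[page_no] is not None:
            raise RuntimeError("C2: page already programmed; erase the block first")
        if page_no != self.next_page:
            raise RuntimeError("C3: pages must be programmed in order")
        self.pages[page_no] = data
        self.next_page += 1

    def read(self, page_no: int) -> Optional[bytes]:
        return self.pages[page_no]

    def erase(self) -> None:
        # C2: erasing is only possible for the whole block, never for a single page.
        if self.erases >= self.max_erases:
            raise RuntimeError("C4: block has reached its erase limit")
        self.pages = [None] * len(self.pages)
        self.next_page = 0
        self.erases += 1
```

An in-place update of page 0 fails here. The only way to reuse the page is to erase the whole block, which costs one of its limited erase cycles. This is why SSDs place an FTL between the host and the flash chips.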

Solid State Drive

• Solid State Drive (SSD)
  – Definition: persistent data storage with neither disks nor a drive motor
  – Supports traditional block I/O

• Characteristics of SSDs
  – Fast random access (inherited from flash memory)
  – Read/write imbalance (inherited from flash memory)
  – Exploiting internal parallelism (from the SSD's internal structure)
  – In-storage processing

[Figure: SSD architecture. The host interface (SATA, SAS, PCIe) receives Read(addr) and Write(addr, data) requests. The internal algorithm (FTL: mapping, wear leveling, garbage collection) translates them into read, program, and erase operations on the flash chips that form the physical storage.]

Solid State Drive: Flash Translation Layer (FTL)

• Flash Translation Layer
  – Converts block I/O operations into internal flash operations (read, program, erase)
  – Three major components
    • Mapping
      – Maps a logical block address (LBA) to a physical page
    • Garbage collection
      – Reclaims blocks that hold invalid pages
    • Wear leveling
      – Extends the lifetime of the SSD

[Figure: out-of-place update. Logical pages map to physical pages in Blocks 1 to 4. An update leaves the old physical copy invalid (I) next to the valid pages (v), and a block is later erased to reclaim its invalid pages.]
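The three FTL components can be sketched as a tiny page-level FTL. This is a minimal sketch under the following assumptions:

- Mapping is a plain dictionary from logical page number (LPN) to (block, page).
- Garbage collection uses greedy victim selection (the block with the fewest valid pages) and keeps one free block in reserve. Both choices are illustrative, not necessarily what any real FTL does.
- Wear leveling is not implemented. Only the per-block erase counts that a wear leveler would use are recorded.

The class and method names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class PageMappingFTL:
    """Toy page-level FTL: out-of-place updates plus greedy garbage collection."""

    def __init__(self, num_blocks: int, pages_per_block: int) -> None:
        self.ppb = pages_per_block
        # Each physical page holds (lpn, data), or None while erased.
        self.pages: List[List[Optional[Tuple[int, str]]]] = [
            [None] * pages_per_block for _ in range(num_blocks)]
        self.valid = [[False] * pages_per_block for _ in range(num_blocks)]
        self.erase_count = [0] * num_blocks            # what a wear leveler would consult
        self.l2p: Dict[int, Tuple[int, int]] = {}      # mapping: LPN -> (block, page)
        self.free_blocks = list(range(1, num_blocks))  # one block stays in reserve for GC
        self.active, self.next_page = 0, 0             # pages are programmed in order (C3)

    def read(self, lpn: int) -> str:
        block, page = self.l2p[lpn]
        return self.pages[block][page][1]

    def write(self, lpn: int, data: str) -> None:
        if self.next_page == self.ppb:                 # active block is full
            if len(self.free_blocks) <= 1:
                self._garbage_collect()                # reclaim space via the reserve block
            else:
                self.active, self.next_page = self.free_blocks.pop(), 0
        self._program(lpn, data)

    def _program(self, lpn: int, data: str) -> None:
        old = self.l2p.get(lpn)
        if old is not None:                            # no in-place overwrite (C2):
            self.valid[old[0]][old[1]] = False         # just mark the old copy invalid
        block, page = self.active, self.next_page
        self.pages[block][page] = (lpn, data)
        self.valid[block][page] = True
        self.l2p[lpn] = (block, page)
        self.next_page += 1

    def _garbage_collect(self) -> None:
        used = [b for b in range(len(self.pages)) if b not in self.free_blocks]
        victim = min(used, key=lambda b: sum(self.valid[b]))  # greedy: fewest valid pages
        if sum(self.valid[victim]) == self.ppb:
            raise RuntimeError("no invalid pages left to reclaim")
        self.active, self.next_page = self.free_blocks.pop(), 0
        for page in range(self.ppb):                   # relocate still-valid pages
            if self.valid[victim][page]:
                lpn, data = self.pages[victim][page]
                self._program(lpn, data)
        self.pages[victim] = [None] * self.ppb         # erase the whole victim block
        self.valid[victim] = [False] * self.ppb
        self.erase_count[victim] += 1
        self.free_blocks.append(victim)
```

Suppose you repeatedly rewrite 8 logical pages on a device of 4 blocks × 4 pages. Every read still returns the latest data, while the erase counts grow as garbage collection reclaims blocks full of invalid pages.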

Solid State Drive: Internal Parallelism

• An SSD can read and write data on multiple flash packages in parallel

[Figure: internal parallelism. Flash packages connect to the host interface (SATA, SAS, PCIe) through N parallel channels (channel-level parallelism), and packages that share a channel are interleaved (package-level parallelism). In the timeline, Packages 1 and 2 sit on Channel 1 and Packages 3 and 4 on Channel 2. Channel 1 transfers Data 1, 3, 5, 7 and Channel 2 transfers Data 2, 4, 6, 8, so the read of one page overlaps with the transfer of another.]
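The timeline in the figure can be reproduced with a small scheduling model. This is a minimal sketch under the following assumptions:

- Each package handles one page at a time and stays busy until that page has crossed the channel.
- Each channel carries one transfer at a time.
- Pages are striped round-robin over channels first, then over the packages on each channel.

The function name, the `Timing` fields, and the time units are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    t_read: float      # flash array -> package register (hypothetical time units)
    t_transfer: float  # package register -> controller over the shared channel


def read_makespan(num_pages: int, channels: int, pkgs_per_channel: int, t: Timing) -> float:
    """Time to read num_pages pages striped over channels and interleaved packages."""
    pkg_free = [0.0] * (channels * pkgs_per_channel)  # when each package is idle again
    chan_free = [0.0] * channels                       # when each channel bus is idle again
    finish = 0.0
    for page in range(num_pages):
        ch = page % channels                                                  # channel striping
        pkg = ch * pkgs_per_channel + (page // channels) % pkgs_per_channel   # interleaving
        read_end = pkg_free[pkg] + t.t_read           # cell read happens inside the package
        xfer_start = max(read_end, chan_free[ch])     # wait until the shared channel is free
        xfer_end = xfer_start + t.t_transfer
        chan_free[ch] = pkg_free[pkg] = xfer_end      # package holds the page until it is sent
        finish = max(finish, xfer_end)
    return finish
```

Take `Timing(1.0, 1.0)` and 8 pages. This model gives 16.0 time units for one channel with one package, 8.0 for two channels with one package each, and 5.0 for two channels with two interleaved packages each. Channel-level and package-level parallelism compound.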

Solid State Drive: Internal Parallelism

• Using internal parallelism, an SSD achieves
  – High performance for sequential I/O
    • Similar to striping (RAID 0)
    • Sequential bandwidth of a SATA SSD
      – Write: 450 MB/s
      – Read: 500 MB/s
  – High performance for concurrent I/O

[VLDB2012Roh] H. Roh, S. Park, S. Kim, M. Shin, S.-W. Lee. B+-tree index optimization by exploiting internal parallelism of flash-based Solid State Drives.
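On the host side, concurrent I/O means keeping several requests outstanding at once so the SSD can spread them over its channels and packages. This is a minimal sketch that assumes a Unix system, because `os.pread` is not available on Windows. The function name and the `depth` parameter (the number of outstanding reads) are hypothetical. Any actual speedup depends on the device, the OS I/O stack, and whether the page cache already holds the data.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def concurrent_read(path: str, offsets: Sequence[int], size: int, depth: int = 8) -> List[bytes]:
    """Issue up to `depth` positional reads at once against one file descriptor."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # os.pread takes an explicit offset, so threads can share the fd safely,
        # and it releases the GIL while blocked, letting requests overlap.
        with ThreadPoolExecutor(max_workers=depth) as pool:
            return list(pool.map(lambda off: os.pread(fd, size, off), offsets))
    finally:
        os.close(fd)
```

Results come back in the order of `offsets`, even though the reads themselves may complete out of order.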

Solid State Drive: In-Storage Processing

• An SSD has its own CPU and memory to run the FTL

• The host interface is the bottleneck
  – The host interface (H/I) has lower bandwidth than the SSD's internal bandwidth

• Two approaches
  – Lightweight filter in the SSD
    • Transfer less data through the H/I
    • Filter tuples using predicates
  – Sub-modules in the SSD
    • e.g. transaction management with copy-on-write (COW)

• Implementing ISP requires a special SSD
  – e.g. OpenSSD, SmartSSD, and so on
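The benefit of a lightweight in-storage filter comes down to how many bytes cross the host interface. This is a minimal sketch that compares the two paths under one simplifying assumption: every tuple has the same hypothetical size (`TUPLE_BYTES`). The `Row` schema and both function names are illustrative. They are not the API of OpenSSD, SmartSSD, or any other device.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

TUPLE_BYTES = 100  # hypothetical fixed tuple size in bytes


@dataclass(frozen=True)
class Row:
    key: int
    price: int


Predicate = Callable[[Row], bool]


def host_side_filter(table: List[Row], pred: Predicate) -> Tuple[List[Row], int]:
    """Conventional path: ship every tuple over the host interface, filter on the host."""
    shipped = len(table) * TUPLE_BYTES
    return [r for r in table if pred(r)], shipped


def in_storage_filter(table: List[Row], pred: Predicate) -> Tuple[List[Row], int]:
    """ISP path: the SSD's controller applies the predicate; only matches cross the H/I."""
    matches = [r for r in table if pred(r)]
    return matches, len(matches) * TUPLE_BYTES
```

Both paths return the same tuples. With a predicate that selects 10% of the rows, the in-storage path ships one tenth of the bytes. That is exactly the gap a bandwidth-limited host interface cares about.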

DBMS on Solid State Drive

• Main research areas
  – Buffer management
  – Index management
  – Query processing
  – Transaction management

• Most research on DBMSs over SSDs has focused on storage I/O

DBMS on Solid State Drive: Index Management

• FD-tree
  – Exploits the sequential bandwidth of SSDs
  – B-tree + sorted runs

• PIO B-tree
  – Exploits the internal parallelism of SSDs
  – Accesses multiple B-tree nodes along multiple paths
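Accessing B-tree nodes along multiple paths can be illustrated with a batched multi-key search. The search fetches every node needed at one tree level as a single concurrent batch, so the SSD sees many outstanding reads instead of one. This is a simplified illustration of the idea, not the PIO B-tree algorithm from [VLDB2012Roh]. `read_node` is a hypothetical stand-in for a page read, and the thread pool stands in for asynchronous I/O.

```python
from __future__ import annotations

from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # empty for a leaf
    values: List[str] = field(default_factory=list)       # payloads, leaves only


def read_node(node: Node) -> Node:
    # Hypothetical stand-in for reading one node page from the SSD.
    return node


def multi_path_search(root: Node, search_keys: List[int],
                      pool: ThreadPoolExecutor) -> Tuple[Dict[int, Optional[str]], int]:
    """Look up many keys together, fetching all nodes of one level as a single batch."""
    frontier: List[Tuple[Node, List[int]]] = [(root, sorted(search_keys))]
    results: Dict[int, Optional[str]] = {}
    rounds = 0
    while frontier:
        rounds += 1
        # One batch of concurrent node reads per tree level, covering multiple paths.
        fetched = list(pool.map(lambda item: read_node(item[0]), frontier))
        groups: Dict[int, Tuple[Node, List[int]]] = {}
        for node, (_, keys) in zip(fetched, frontier):
            if not node.children:                     # leaf: resolve the keys
                for k in keys:
                    results[k] = node.values[node.keys.index(k)] if k in node.keys else None
                continue
            for k in keys:                            # route each key to its child node
                child = node.children[bisect_right(node.keys, k)]
                groups.setdefault(id(child), (child, []))[1].append(k)
        frontier = list(groups.values())
    return results, rounds
```

The number of rounds equals the tree height, however many keys are searched. Keys that share a node along their path also share a single read of that node.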

DBMS on Solid State Drive: Query Processing

• FlashJoin: PAX-based query processing
  – NSM layout
    • The most typical page layout
    • Tuples are stored in a contiguous region
  – PAX layout
    • Values of each column are stored in a contiguous region (minipage)
    • PAX was originally designed to reduce CPU cache misses
  – FlashScan reads only the needed minipages
  – FlashJoin joins the minipages read by FlashScan
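The difference between the two layouts, and why a scan can skip unneeded columns, fits in a few lines. This is a minimal sketch. `nsm_to_pax` and `flash_scan` are hypothetical names that model only the "read only the needed minipages" idea; the actual FlashScan and FlashJoin operators are not reproduced here. A page is modeled as a dict with one list (minipage) per column.

```python
from typing import Any, Dict, List, Sequence, Tuple

Minipage = List[Any]


def nsm_to_pax(rows: Sequence[tuple], columns: Sequence[str]) -> Dict[str, Minipage]:
    """Regroup an NSM page (whole tuples stored together) into PAX minipages, one per column."""
    return {col: [row[i] for row in rows] for i, col in enumerate(columns)}


def flash_scan(page: Dict[str, Minipage], needed: Sequence[str]) -> Tuple[List[tuple], int]:
    """Read only the minipages of the needed columns and rebuild the projected tuples."""
    minipages = [page[col] for col in needed]   # columns not listed are never touched
    return list(zip(*minipages)), len(minipages)
```

Projecting `id` and `price` from a three-column page touches two minipages. On flash, with its fast random access, skipping the third column's region saves I/O instead of forcing a seek-heavy read pattern.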

DBMS on Solid State Drive: Query Processing

• FMSort
  – Exploits the internal parallelism of SSDs
  – During the merge phase,

DBMS on Solid State Drive: Transaction Mgmt.

• X-FTL: shadow paging in the SSD
  – Write operations in an SSD are similar to copy-on-write
    • When a page is updated, the modified page is written to an empty page
    • Then the old page is invalidated
  – X-FTL keeps the old pages until the transaction commits
  – The original pages never need to be copied
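Shadow paging inside the FTL can be sketched as a mapping table with per-transaction pending entries. Uncommitted writes go to fresh physical pages, and the committed mapping still points at the old pages. Commit swaps the mapping entries and only then invalidates the old pages. Abort simply discards the shadow pages. This is a minimal sketch in the spirit of X-FTL, not its actual design. The class and method names are hypothetical, and garbage collection of the `invalid` set is omitted.

```python
from typing import Dict, Hashable, Optional, Set


class TxFTL:
    """Toy transactional page-mapping FTL in the spirit of X-FTL (names are hypothetical)."""

    def __init__(self) -> None:
        self.flash: Dict[int, str] = {}                    # PPN -> data; never rewritten
        self.mapping: Dict[int, int] = {}                  # committed LPN -> PPN
        self.pending: Dict[Hashable, Dict[int, int]] = {}  # txn -> uncommitted LPN -> PPN
        self.invalid: Set[int] = set()                     # PPNs that GC may reclaim
        self.next_ppn = 0

    def _program(self, data: str) -> int:
        ppn = self.next_ppn                                # always write to an empty page
        self.flash[ppn] = data
        self.next_ppn += 1
        return ppn

    def write(self, txn: Hashable, lpn: int, data: str) -> None:
        shadow = self.pending.setdefault(txn, {})
        if lpn in shadow:                                  # rewritten inside the same txn
            self.invalid.add(shadow[lpn])
        shadow[lpn] = self._program(data)                  # committed version stays intact

    def read(self, lpn: int, txn: Optional[Hashable] = None) -> str:
        shadow = self.pending.get(txn, {}) if txn is not None else {}
        return self.flash[shadow[lpn] if lpn in shadow else self.mapping[lpn]]

    def commit(self, txn: Hashable) -> None:
        for lpn, ppn in self.pending.pop(txn, {}).items():
            if lpn in self.mapping:
                self.invalid.add(self.mapping[lpn])        # old page invalidated only now
            self.mapping[lpn] = ppn                        # no copy of the original page

    def abort(self, txn: Hashable) -> None:
        self.invalid.update(self.pending.pop(txn, {}).values())  # drop the shadow pages
```

Other readers keep seeing the committed version while a transaction is in flight. After an abort, the committed version is still there, because nothing was ever overwritten in place.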

Big Data on Solid State Drive

• Three approaches to improving performance with SSDs
  – Complete replacement
    • Higher cost per capacity
  – Selective replacement
    • e.g. intermediate results on SSDs, HDFS data on HDDs
  – SSD as a cache
    • Commercial and non-commercial cache software exists
    • Open source: bcache, flashcache, EnhanceIO, dm-cache
    • Project with SK Telecom

• Archival Storage of HDFS
  – Stores replicas on four tiers of storage
    • ARCHIVE: the slowest storage with the largest capacity (petabytes of storage)
    • DISK, SSD, RAM_DISK
    • https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html#Storage_Types:_ARCHIVE_DISK_SSD_and_RAM_DISK

• Issues
  – Industry leads the big data processing platform area
  – There is no standard model
  – CPU overhead is too high

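The "SSD as a cache" approach places a small fast device in front of a large slow one. This is a minimal sketch of that idea using a generic LRU read cache with write-through. It is not how bcache, flashcache, EnhanceIO, or dm-cache work internally; each has its own policies and on-disk formats. The class name, the dict standing in for the HDD, and the block-granularity interface are all hypothetical.

```python
from collections import OrderedDict
from typing import Dict


class SSDCache:
    """Toy block cache: an LRU set of blocks kept on the SSD in front of a slow HDD."""

    def __init__(self, hdd: Dict[int, str], capacity_blocks: int) -> None:
        self.hdd = hdd
        self.capacity = capacity_blocks
        self.ssd: "OrderedDict[int, str]" = OrderedDict()  # least -> most recently used
        self.hits = self.misses = 0

    def read(self, block: int) -> str:
        if block in self.ssd:
            self.hits += 1
            self.ssd.move_to_end(block)          # mark as most recently used
            return self.ssd[block]
        self.misses += 1
        data = self.hdd[block]                   # slow path: fetch from the HDD
        self.ssd[block] = data
        if len(self.ssd) > self.capacity:
            self.ssd.popitem(last=False)         # evict the least recently used block
        return data

    def write(self, block: int, data: str) -> None:
        self.hdd[block] = data                   # write-through: the HDD stays authoritative
        if block in self.ssd:                    # keep any cached copy coherent
            self.ssd[block] = data
            self.ssd.move_to_end(block)
```

Write-through keeps the HDD consistent if the SSD fails, at the cost of full HDD write latency. A write-back cache would absorb writes on the SSD instead, but then needs a crash-consistent flush mechanism.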


