Software Management Is the Key to Using Solid-State Drives Efficiently
Source: acs.ict.ac.cn/ncis2014/slides/NCIS2014-Keynote-ZhangXiaodong.pdf


Xiaodong Zhang

The Ohio State University, USA

Collaborators: Intel, Samsung, and VMware

9/21/2014


Flash Memory is Affordable and Widely Used

• NAND flash memory is widely used, from cell phones to cloud systems (consumer, client, and enterprise markets)

[Figure: increasingly wide adoption of NAND flash, 2007 to 2014; feature-size axis from 100nm down to 10nm; bit cost reduction from $10/GB to $1/GB to $0.35/GB]

Endurance of Flash Memory is Reduced

• The number of P/E (program/erase) cycles that flash memory can endure decreases as chip technology scaling continues
• Cell size and the distance between cells shrink
• The oxide layer becomes thinner
• A higher percentage of oxide traps interferes with P/E operations
• Write: injecting electrons into the floating gate
• Erase: extracting electrons from the floating gate

[Figure: a flash block as a grid of cells connected by bit lines (BL) and word lines (WL); within a cell, electrons tunnel through the oxide to the floating gate, and oxide traps are created by injecting/extracting electrons]

Error Rate Increases in Flash Memory

• Noise in NAND flash memory increases as chip technology scaling continues
• Cell-to-cell interference
• Random telegraph noise (RTN): "burst noise" caused by thin interface
• Retention noise: electrons leak from the floating gate
• Result: higher error rate

[Figure: a victim cell surrounded by neighboring cells on adjacent bit lines (BL) and word lines (WL)]


The Cost Reduction is not Free


Rapid Reduction of P/E Cycles


SSDs Are Not Widely Used in Enterprise Systems

SSDs are used as acceleration devices rather than alone, creating storage systems with "dual" devices, such as Intel RST and Apple Fusion Drive.

Source: Flexstar SSD Test Market Analysis, June 2012, http://info.flexstar.com/Portals/161365/docs/SSD_%20Testing_%20Market_Analysis.pdf


Distinguished Merits and Limits

• Random reads are fast at a low cost
• Low power
• Sequential accesses may not be cost effective
• The number of writes is increasingly limited
• Error rate increases
• A whole SSD solution may be suboptimal
– Cost, reliability and performance
• A hybrid storage system can best utilize SSD
– Random reads on SSD, sequential reads/writes on HDD
– Minimizing unnecessary writes to SSD
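The placement rule on this slide (random reads on SSD, sequential reads/writes on HDD, few writes to SSD) can be sketched as a tiny request router. This is only an illustration, not the talk's implementation: the `Request` type, the `route` function, and the "starts where the previous request ended" test for sequentiality are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Request:
    op: str      # "read" or "write"
    offset: int  # first block touched by the request
    length: int  # number of blocks


def is_sequential(prev: Optional[Request], cur: Request) -> bool:
    """Assumed heuristic: sequential means starting where the previous request ended."""
    return prev is not None and cur.offset == prev.offset + prev.length


def route(requests: List[Request]) -> List[Tuple[Request, str]]:
    """Send random reads to the SSD; send sequential accesses and all writes to the HDD."""
    routed = []
    prev = None
    for req in requests:
        if req.op == "read" and not is_sequential(prev, req):
            device = "SSD"
        else:
            device = "HDD"
        routed.append((req, device))
        prev = req
    return routed
```

Sending every write to the HDD is the simplest way to honor "minimizing unnecessary writes to SSD"; a real system would likely be more selective.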

Software Solutions at OS Level

• SSD-cached Disk
– Conquest [USENIX'02]
– SmartSaver [ISLPED'06]
– ReadyBoost [MS'06]
– TurboMemory [ToS'08]
– L2ARC [CACM'08]
– FlashCache [ISCA'08]
– other …
• Hybrid Storage
– Hystor
• Disk-cached SSD
– Soundararajan [FAST'10]
• Cache-based solutions
– SSD: a secondary-level cache
– HDD: the permanent storage
– Cache replacement policy
• Limitations
– Weak locality memory misses
– Intensive write traffic
– Non-trivial system changes
– High-cost on-line replacement
• Frequent on-access updates
• 10-20x larger SSD space

Hystor: A cost-efficient hybrid storage*

• SSD: high-performance, high-cost
– A small data set
– Semantically critical: F/S metadata blocks
– Performance critical: small, randomly accessed blocks
• HDD: low-cost, high-capacity
– A large data set
– Less frequently accessed
– Sequentially accessed

A prototype system developed at Ohio State and Intel® Labs.

* Collaborated work at Intel® Labs

Identifying data blocks for SSDs

• A metric highly correlated to latency
– Latency (optimal)
– Frequency
– Request size
– Reuse distance
– Seek distance
– Combinations

[Figure: percentage of total latency vs. percentage of total blocks, with three curves: the latency curve (optimal), a metric highly correlated to latency, and a metric uncorrelated to latency]

Identifying the high-cost data blocks

• A metric highly correlated to latency
– Latency (optimal)
– Frequency
– Request size
– Reuse distance
– Seek distance
– Combinations
• Frequently used small blocks
• The best metric: Frequency/Request size
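One possible reading of the Frequency/Request size metric is sketched below: every request adds 1/size to the score of each block it touches, so blocks hit often by small requests rank highest. The trace format, the function name `rank_blocks`, and the per-block accumulation are assumptions for illustration; the slides do not specify how Hystor computes or stores the metric.

```python
from collections import defaultdict


def rank_blocks(trace, ssd_capacity_blocks):
    """Pick the blocks to place on SSD.

    trace: iterable of (start_block, request_size_in_blocks) pairs.
    Each request contributes 1/size to every block it covers, which
    approximates frequency divided by request size.
    """
    score = defaultdict(float)
    for start, size in trace:
        for block in range(start, start + size):
            score[block] += 1.0 / size
    # Highest score first; ties broken by block number for determinism.
    ranked = sorted(score, key=lambda b: (-score[b], b))
    return ranked[:ssd_capacity_blocks]
```

With this scoring, a block read three times by one-block requests outranks a block covered twice by 64-block scans, which matches the slide's "frequently used small blocks".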

Hystor Performance Evaluation

• Measurement System
– Intel® D975BX, 2.66GHz Intel® Core™ 2 Quad, 4GB memory
– LSI® MegaRAID 8704 SAS card, Seagate® 15k.5 SAS HDD, Intel X25-E SSD
– Fedora Core 8 Linux, Linux kernel 2.6.25.8
• Experimental Results

[Figure: speedup (X), axis 0 to 20, for the Archive, Postmark, Mail, and TPC-H Q1 workloads; bars for HDD-only (baseline), SSD sizes of 20%, 40%, 60%, 80%, and 100% (proportional to working-set size), and Full-SSD (optimal). Annotations: 3x/2%, 3.2x/0.9%, 11%/7%, 62.5%/0.4%, 11.7x]

Impact of Hystor

• Hystor was presented at ACM ICS 2011 (Best Paper Award)
• Hystor lays a foundation for Apple's hybrid storage product Fusion Drive
– A new storage product on the market since October 23, 2012
– Consisting of a small SSD (128 GB) and a large hard drive (1 TB)
– The hybrid storage is managed by the OS in a single space
• Comments on Hystor from Apple:
– Hystor is a well-designed system, and its paper discussed several key systems trade-offs in details. The Apple software engineers had carefully and systematically evaluated Hystor. This work had a significant influence in the design of Apple's Fusion Drive. Some design elements and algorithms in Hystor have been directly used in Apple's Fusion Drive.
• Steve Jobs' philosophy on technology transfer for Apple:
– Picasso had a saying "Good artists copy, great artists steal". We have always been shameless about stealing great ideas …


hStorage‐DB: A Software Solution for Database

• DBs have different storage QoS requirements
– Different access patterns
– Different priorities of data processing requests
– Dynamic changes of requirements
• Hybrid storage can well satisfy diverse QoS of DB requests
– Should be automatic and adaptive with low overhead
– But with challenges

Existing Interface between Applications and Storage

read/write(int fd, void *buf, size_t count);

– fd: on-disk location
– buf: in-memory data
– count: request size

This interface cannot pass specific requirements to storage  

Challenges for Hybrid Storage Systems to Satisfy Different QoS Requirements

• DBMS (What I/O services do I need as a storage user?)
– Classifications of I/O requests based on QoS
– hStorage awareness
– DBMS enhancements to utilize classifications automatically
• hStorage (What can I do for you as a service provider?)
– A clear definition of supported QoS classifications
– Hide device details from the DBMS
– Efficient data management between hybrid devices
• Communication between DBMS and hStorage
– Rich information to deliver but limited by interface abilities
– Need a standard and general-purpose protocol

DBA-based Approach

• DBAs decide data placement among heterogeneous devices based on experience
• Limitations:
– Significant human effort: expertise in both DB and storage
– Large granularity, e.g. table/partition-based data placements
– Static storage layout:
• Tuned for the "common" case
• Could not respond well to execution dynamics

[Figure: the DBMS places indexes on SSD and other data on HDD]

Monitoring-based Solutions

• Storage systems automatically make data placement and replacement decisions by monitoring access patterns
– LRU (a basic structure), LIRS (MySQL), ARC (IBM I/O controller)
– Example products from industry:
• Solid State Hybrid Drive (Seagate)
• Easy Tier (IBM)
• Limitations:
– Takes time to recognize access patterns
• Hard to handle dynamics in short periods
– With concurrency, access patterns cannot be easily detected
– Certain critical insights are not related to access patterns
• Domain information (available from DBMS) is not utilized

What information from the DBMS can we use?

• System catalog
– Data type: index, regular table
– Ownership of data sets: e.g. VIP user, regular user
• Query optimizer
– Orders of operations and access path
– Estimated frequency of accesses to related data
• Query planner
– Access patterns
• Execution engine
– Life cycles of data usage

Semantic information for I/O requests is not organized

Goal: organize/utilize DBMS semantic information

[Figure: inside the DBMS, the buffer pool, query optimizer, checkpoint, vacuum, background processes, and a connection pool serving User1, User2, ... generate sequential, random, and repeated-scan accesses to system tables, indexes, user tables, and temporary data; a semantic gap separates the DBMS from storage]

The mission of hStorage-DB is to fill this gap.

Structure of hStorage-DB

[Figure: the query optimizer, query planner, and execution engine sit above the buffer pool manager; requests plus semantic information reach the storage manager, which holds the policy assignment table (Info 1 ... Info N → QoS policy); each I/O request plus its QoS policy goes to the storage system's control logic, which manages the SSDs and HDDs]

Highlights of hStorage-DB

• Policy assignment table
– Stores all the rules to assign a QoS policy for each I/O request
– Assignments are made on organized DB semantic information
• Communication between a DBMS and hStorage
– The QoS policy for each I/O request is delivered by the protocol of "Differentiated Storage Services" (DSS, SOSP'11)
– The hStorage system takes action accordingly
• hStorage-DB in practice
– First application for Intel product DSS

Caching Priorities as QoS Policies

• Priorities are enumerated
– E.g. 1, 2, 3, …, N
– Priority 1 is the highest priority
• Data from high-priority requests can evict data cached for low-priority requests
• Special "priorities"
– Bypass: requests with this priority will not affect in-cache data
– Eviction: data accessed by requests with an eviction "priority" will be immediately evicted out of cache
– Write buffer
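The priority rules above can be sketched as a small cache model. This is a simplified illustration under stated assumptions, not the hStorage-DB or DSS implementation: the class name `PriorityCache`, its methods, and the choice to keep the higher priority on a repeated access are hypothetical, and the write-buffer policy is left out.

```python
class PriorityCache:
    """Toy cache in which priority 1 is the highest and N the lowest."""

    BYPASS = "bypass"  # request must not disturb cached data
    EVICT = "evict"    # accessed data is dropped from the cache at once

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # block -> numeric priority

    def access(self, block, policy):
        if policy == self.BYPASS:
            return  # cached data stays exactly as it was
        if policy == self.EVICT:
            self.entries.pop(block, None)
            return
        if block in self.entries:
            # Assumption: a cached block keeps the higher of the two priorities.
            self.entries[block] = min(self.entries[block], policy)
            return
        if len(self.entries) < self.capacity:
            self.entries[block] = policy
            return
        # Cache is full: the victim is the entry with the lowest priority
        # (largest number), and only a higher-priority request may evict it.
        victim = max(self.entries, key=self.entries.get)
        if self.entries[victim] > policy:
            del self.entries[victim]
            self.entries[block] = policy
```

For example, with capacity 2, caching `a` at priority 3 and `b` at priority 2, a priority-1 request for `c` replaces `a`, while a later priority-3 request for `d` is not cached.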

Policy Assignment Table

• Request types: sequential accesses, random accesses, temporary data accesses, temporary data delete, updates
• QoS policies: Priority 1, Priority 2, …, Priority N, Bypass, Eviction, Write Buffer
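A policy assignment table can be sketched as a simple lookup from request type to QoS policy. The arrows of the original diagram are not preserved in this transcript, so the mapping below is an assumption inferred from the later performance slide (sequential accesses are bypassed, temporary data is cached and then evicted, updates go to the write buffer, random accesses are cached). The specific priority numbers and all names (`RequestType`, `POLICY_TABLE`, `assign_policy`) are hypothetical.

```python
from enum import Enum


class RequestType(Enum):
    SEQUENTIAL = "sequential access"
    RANDOM = "random access"
    TEMP_ACCESS = "temporary data access"
    TEMP_DELETE = "temporary data delete"
    UPDATE = "update"


# Assumed mapping; the slide's own arrows were lost in extraction.
POLICY_TABLE = {
    RequestType.SEQUENTIAL: "bypass",
    RequestType.TEMP_ACCESS: "priority 1",
    RequestType.TEMP_DELETE: "eviction",
    RequestType.UPDATE: "write buffer",
}


def assign_policy(request_type, random_priority=2):
    """Return the QoS policy tag to attach to an I/O request.

    Random accesses get a numeric priority chosen by the caller
    (assumed to range from 2 to N); other types use the fixed table.
    """
    if request_type is RequestType.RANDOM:
        return f"priority {random_priority}"
    return POLICY_TABLE[request_type]
```

Keeping the rules in one table means the DBMS only tags requests, and the storage system decides what each tag means.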

Experimental setup

• Dual-machine setup (with 10GB Ethernet)
– A DBMS: hStorage-DB based on PostgreSQL
– A dedicated storage system, with an SSD cache
• Configuration
– Xeon, 2-way, quad-core 2.33GHz, 8GB RAM
– 2 Seagate 15.7K rpm HDD
– SSD cache: Intel 320 Series 300GB (use 32GB)
• Workload
– TPC-H @30SF (46GB with 7 indexes)

Diverse Request Types in TPC-H

[Figure: for each of the 22 TPC-H queries, the percentage (0% to 100%) of temporary-data (Tmp.), random (Rand.), and sequential (Seq.) requests]

• Most queries are dominated by sequential requests
• Queries 2, 8, 9, 20, 21 have a large number of random requests
• Query 18 has a large number of temporary data requests

Performance Comparisons: HDD, SSD-caching, hStorage-DB, SSD-only

• hStorage-DB:
– Random accesses are in SSD
– Temporary data is cached and evicted in a timely manner
– Sequential accesses are bypassed
– Data in SSD are updated in a timely manner

[Figure: execution time (sec) of Query 18, axis 0 to 10000, for HDD-only, LRU, hStorage-DB, and SSD-only; bar values 8950, 8694, 6146, 5990]
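The Query 18 numbers can be turned into speedups over the HDD-only baseline with a few lines of arithmetic. The pairing of the four values with the four configurations follows the order of the bar labels and is an assumption, since the chart itself is not preserved here.

```python
# Execution times in seconds for Query 18; the value-to-label pairing
# is assumed from the order in which the labels appear on the slide.
times = {
    "HDD-only": 8950,
    "LRU": 8694,
    "hStorage-DB": 6146,
    "SSD-only": 5990,
}

baseline = times["HDD-only"]
speedup = {name: baseline / t for name, t in times.items()}

for name, s in speedup.items():
    print(f"{name}: {s:.2f}x")
```

Under that pairing, LRU caching gains about 3% over HDD-only, while hStorage-DB reaches roughly 1.46x, close to the 1.49x of the SSD-only configuration.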

Virtualization is Basic Infrastructure for Cloud

• Conventional system setups: running each application on a dedicated cluster
• Computing virtualization: running all applications on consolidated hardware resources
• Benefits:
– Reduced management cost
– Resource consolidation: e.g., when Hadoop becomes I/O intensive, its CPU resources could be allocated to other applications transparently

The Core of System Virtualization: Hypervisor

Virtualization packs the whole stack of hardware + OS + applications into a portable virtual machine (VM) package

[Figure: a conventional stack (application on operating system) next to a hypervisor hosting several VMs, each with its own app and OS]

• Host: the physical machine
• Guest: each VM is also called a guest

The hypervisor provides the supporting environment for each VM package and manages hardware resources among multiple VMs.

S-CAVE: Hypervisor-based SSD Caching

1. The SSD cache is directly managed by the hypervisor
2. The hypervisor makes efforts to best utilize the SSD cache for each VM

[Figure: several VMs run on the hypervisor, which sits above the storage system (a controller and multiple HDDs)]

What Do We Gain?

• Benefits
– Transparent to VMs: no modification to guest OS or applications
– A "global" view of all VMs' I/O activities
– Full access privilege to storage devices
• Must address the following challenges
– Effective and accurate cache space allocation
– Responsive to VMs' I/O dynamics
– Low overhead implementation
• In practice
– S-CAVE is implemented in VMware hypervisor as a software solution

S-CAVE: System Design – an overview

[Figure: VM1, VM2, ..., VMn run on the hypervisor; inside the hypervisor, one cache monitor per VM and a cache space allocator sit above the block interface, which connects to the SSD and the storage system]

• Cache Monitor
– One for each running VM
– Watches usage of allocated SSD space
– Reports usage status and space demand
• Cache Space Allocator
– A central control for space allocation
– Determines how much SSD cache space each VM should be allocated
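The split between per-VM monitors and a central allocator can be sketched as below. The slides do not state S-CAVE's actual allocation policy, so this example swaps in a plain proportional-share rule purely for illustration; the function name `allocate_cache` and the "demand in blocks" input are assumptions standing in for what the cache monitors report.

```python
def allocate_cache(total_blocks, demand):
    """Divide SSD cache blocks among VMs.

    demand: dict mapping VM name -> blocks requested, as reported by
    that VM's cache monitor. If everything fits, each VM gets what it
    asked for; otherwise space is shared in proportion to demand.
    """
    total_demand = sum(demand.values())
    if total_demand <= total_blocks:
        return dict(demand)
    # Integer proportional share, rounded down.
    alloc = {vm: total_blocks * d // total_demand for vm, d in demand.items()}
    # Rounding leaves a few blocks over; give them to the largest demands.
    leftover = total_blocks - sum(alloc.values())
    for vm in sorted(demand, key=lambda v: (-demand[v], v))[:leftover]:
        alloc[vm] += 1
    return alloc
```

Calling the allocator again whenever the monitors report new demands is one simple way to stay responsive to the VMs' I/O dynamics.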

Summary

• SSD and HDD should each be used for their respective merits
– Random reads are fast and cost-effective in SSD
– Sequential accesses are fast and cost-effective in HDD
– Writes to SSD must be limited
• Communication between applications and storage systems
– Detect access patterns, and let hybrid storage serve accordingly
– Hystor => Fusion Drive (Apple) and Hybrid Aggregate (NetApp)
– App expresses access patterns and expected storage services
– Storage system takes actions accordingly: hStorage-DB, S-CAVE
• Other efforts
– LDPC-in-SSD: placing advanced ECC in SSD


Thank you!
