Fast and Safe Performance Recovery on OS Reboot

Kenichi Kourai, Kyushu Institute of Technology


Page 1: Fast and Safe Performance Recovery on OS Reboot

Fast and Safe Performance Recovery on OS Reboot

Kenichi Kourai
Kyushu Institute of Technology

Page 2: Fast and Safe Performance Recovery on OS Reboot

OS Recovery

- OS reboot is a final but powerful recovery technique
  - For recovery from OS crashes
    - Against Mandelbugs
    - A rebooted OS rarely crashes again
  - For software rejuvenation
    - Against aging-related bugs
    - A rebooted OS restores its normal state

[Figure labels: crash, memory leak, reboot, recovered OS]

Page 3: Fast and Safe Performance Recovery on OS Reboot

Performance Degradation (1/2)

- OS reboot degrades the performance of file accesses
  - The file cache in memory is lost
  - Disk access increases due to frequent cache misses
- It takes a long time to fill the file cache
  - Reading file blocks from a disk is slow
  - Most of the free memory is used for the file cache

[Figure labels: file cache, reboot, slow disk]

Page 4: Fast and Safe Performance Recovery on OS Reboot

Performance Degradation (2/2)

- Disk access also degrades the performance of the other virtual machines (VMs)
  - VMs share a physical disk
  - Frequent disk access occupies the bandwidth
  - Prefetching makes the situation worse
    - Burst of disk access

[Figure labels: VM, VM, rebooted VM, OS, disk]

Page 5: Fast and Safe Performance Recovery on OS Reboot

Performance Recovery is Needed

- OS recovery does not complete until the performance is also recovered
- Traditional OS reboot restores only the functionalities
  - Fast reboot techniques have been proposed

Page 6: Fast and Safe Performance Recovery on OS Reboot

Warm-cache Reboot

- A new OS recovery mechanism with fast performance recovery
  - It preserves the file cache during OS reboot
  - An OS can reuse it after the reboot
- It guarantees the consistency of the file cache
  - Using the virtual machine monitor (VMM)

[Figure labels: VM, file cache, reboot, file cache, VMM, discard corrupted cache]

Page 7: Fast and Safe Performance Recovery on OS Reboot

Reusing the File Cache

- Collaboration between an OS and the VMM
  - The VMM re-allocates the same physical memory to a rebooted VM
  - A rebooted OS reserves the memory pages used for the file cache
    - Obtaining metadata from the VMM

[Figure labels: VM, file cache, VMM, reboot, deallocate, re-allocate, reserve]
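The collaboration on this slide can be illustrated with a minimal Python sketch. It is a toy model, not the CacheMind code: the class names `Vmm` and `RebootedOs`, the method `get_cache_meta`, and the `(file id, block number) -> page number` layout are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

BlockId = Tuple[int, int]  # (file id, block number); hypothetical layout


@dataclass
class Vmm:
    """Toy VMM that remembers which pages held the VM's file cache."""
    cache_meta: Dict[BlockId, int] = field(default_factory=dict)

    def get_cache_meta(self) -> Dict[BlockId, int]:
        # The rebooted OS obtains the metadata from the VMM.
        return dict(self.cache_meta)


@dataclass
class RebootedOs:
    """Toy OS that boots on the same physical memory as before."""
    free_pages: Set[int]
    file_cache: Dict[BlockId, int] = field(default_factory=dict)

    def reserve_file_cache(self, vmm: Vmm) -> None:
        # Reserve the pages that still hold file cache so that the page
        # allocator does not hand them out for other purposes.
        for block, page in vmm.get_cache_meta().items():
            if page in self.free_pages:
                self.free_pages.remove(page)
                self.file_cache[block] = page
```

In this model, calling `reserve_file_cache` early during boot moves the preserved pages out of the free pool and back into the file cache index.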

Page 8: Fast and Safe Performance Recovery on OS Reboot

Cache Consistency

- Our definition
  - Consistent if the contents of the file cache are the same as those of disks
    - Consistent when a file block is read from a disk
    - Inconsistent when the file cache is modified
    - Consistent when it is written back to a disk

[Figure labels: VM, file cache, disk, read, modify, write back]
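The definition above can be written down as a small Python sketch. The `CachedBlock` class and the dictionary standing in for a disk are illustrative assumptions, not part of the presented system.

```python
class CachedBlock:
    """One cached file block, checked against a dict that plays the disk."""

    def __init__(self, disk: dict, block_no: int):
        self.disk = disk
        self.block_no = block_no
        # Reading the block from the disk makes the cache consistent.
        self.data = disk[block_no]

    def modify(self, data: bytes) -> None:
        # The cached copy may now differ from the disk contents.
        self.data = data

    def write_back(self) -> None:
        # Writing back makes disk and cache agree again.
        self.disk[self.block_no] = self.data

    def is_consistent(self) -> bool:
        # Consistent means: same contents in the cache and on the disk.
        return self.data == self.disk[self.block_no]
```

A block is consistent right after construction, inconsistent after `modify` with different data, and consistent again after `write_back`.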

Page 9: Fast and Safe Performance Recovery on OS Reboot

Maintaining Cache Reusability

- The warm-cache reboot allows an OS to reuse only consistent file cache
- The VMM is suitable for maintaining the reusability
  - It is isolated from an OS
  - It can mediate all disk accesses
  - It can track all modifications to cache pages

[Figure labels: VM, file cache, VMM, disk, modify cache pages]

Page 10: Fast and Safe Performance Recovery on OS Reboot

Reusability Management (1/3)

- The VMM makes a cache page reusable after it reads data from a disk
  - It protects the page before the read
    - To detect page corruption by an OS during the read
    - The VMM can still write data to the page

[Figure labels: VM, VMM, disk, read request, protect, read, reusable, possible corruption]

Page 11: Fast and Safe Performance Recovery on OS Reboot

Reusability Management (2/3)

- The VMM makes a cache page non-reusable before an OS modifies its contents
  - It unprotects the page at the same time
    - To enable the OS to modify the page

[Figure labels: VM, VMM, modify request, non-reusable & unprotect, write, possible corruption]

Page 12: Fast and Safe Performance Recovery on OS Reboot

Reusability Management (3/3)

- The VMM makes a cache page reusable again after it writes data in the page to a disk
  - It protects the page before the write
    - To detect page corruption during the write

[Figure labels: VM, VMM, disk, write request, protect, write, reusable, possible corruption]
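The three reusability-management slides describe one small protocol per cache page. The Python sketch below models it with two flags per page. All names (`CachePage`, `vmm_read`, `vmm_allow_modify`, `os_modify`, `vmm_write`) are hypothetical, and page protection is simulated with an exception instead of real page-table permissions.

```python
from dataclasses import dataclass


@dataclass
class CachePage:
    data: bytes = b""
    protected: bool = False  # True: writes by the OS are refused
    reusable: bool = False   # True: the page may be reused after a reboot


def vmm_read(page: CachePage, disk: dict, block_no: int) -> None:
    page.protected = True        # protect the page before the read
    page.data = disk[block_no]   # the VMM itself can still fill the page
    page.reusable = True         # contents now match the disk


def vmm_allow_modify(page: CachePage) -> None:
    page.reusable = False        # non-reusable before the OS modifies it
    page.protected = False       # unprotect at the same time


def os_modify(page: CachePage, data: bytes) -> None:
    if page.protected:
        # Stands in for the protection fault a stray write would cause.
        raise PermissionError("write to a protected cache page")
    page.data = data


def vmm_write(page: CachePage, disk: dict, block_no: int) -> None:
    page.protected = True        # protect the page before the write
    disk[block_no] = page.data   # write the page contents back
    page.reusable = True         # reusable again: cache matches the disk
```

In this model a page is reusable only while it is protected, so any modification by the OS either is refused or first clears the reusable flag.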

Page 13: Fast and Safe Performance Recovery on OS Reboot

File Cache and Metadata (1/2)

- Consistent
  - When both data and metadata are written back, or neither is
  - When only metadata are written back
    - E.g. Ext3 writeback mode, Ext2

[Figure labels: memory, disk, file cache, data, metadata]

Page 14: Fast and Safe Performance Recovery on OS Reboot

File Cache and Metadata (2/2)

- Maybe inconsistent
  - When only data is written back, and
    - When the file size is changed, or
    - When the i-node pointers are changed
  - E.g. Ext3 ordered mode

[Figure labels: memory, disk, old metadata]
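The conditions on the two metadata slides can be condensed into one predicate. This is a sketch of the stated rules only. The function name and its boolean parameters are assumptions for illustration.

```python
def maybe_inconsistent(data_written_back: bool,
                       metadata_written_back: bool,
                       size_changed: bool,
                       inode_pointers_changed: bool) -> bool:
    """Return True when the file cache may be inconsistent with the disk.

    Per the slides: the risky case is that only data reached the disk while
    the metadata on disk is old, and the file size or the i-node pointers
    changed. Every other combination is treated as consistent.
    """
    only_data = data_written_back and not metadata_written_back
    return only_data and (size_changed or inode_pointers_changed)
```

For example, `maybe_inconsistent(True, False, True, False)` is true (data written, old metadata, size changed), while the call with only metadata written back is false.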

Page 15: Fast and Safe Performance Recovery on OS Reboot

Implementation

- CacheMind
  - Based on Xen/Linux
- The VMM maintains VM memory
  - P2M-mapping table
- The VMM maintains per-VM data
  - Cache-mapping table
  - Reuse bitmap

[Figure labels: domain 0 (blkback), domain U (blkfront), VMM, per-VM data, cache, disk]

Page 16: Fast and Safe Performance Recovery on OS Reboot

Cache-mapping Table

- A hash table from file blocks to cache pages
- Domain U adds and removes its entries
  - It looks up matching entries after OS reboot
  - Using hypercalls

[Figure labels: domain U, cache, VMM, cache-mapping table, hypercall]
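A hash table from file blocks to cache pages might look like the following Python sketch. The key layout `(device, block number)` and the method names are assumptions. In the presented system these operations are issued by domain U through hypercalls, which are modeled here as plain method calls.

```python
from typing import Dict, Optional, Tuple

BlockKey = Tuple[int, int]  # (device id, block number); hypothetical key


class CacheMappingTable:
    """Per-VM table kept by the VMM so that it survives an OS reboot."""

    def __init__(self) -> None:
        self._table: Dict[BlockKey, int] = {}

    def add(self, dev: int, block: int, page: int) -> None:
        # Called when the OS puts a file block into a cache page.
        self._table[(dev, block)] = page

    def remove(self, dev: int, block: int) -> None:
        # Called when the OS evicts the block from the file cache.
        self._table.pop((dev, block), None)

    def lookup(self, dev: int, block: int) -> Optional[int]:
        # Called after OS reboot to find the page that still holds the block.
        return self._table.get((dev, block))
```

A lookup that returns `None` means the block was not preserved and has to be read from the disk as usual.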

Page 17: Fast and Safe Performance Recovery on OS Reboot

Reuse Bitmap

- A bitmap for reusable cache pages
- Domain 0 sets and clears its bits
  - Using hypercalls
- The VMM clears its bits
  - When cache pages are unprotected

[Figure labels: domain 0 (blkback), domain U (blkfront), VMM, reuse bitmap, hypercall, cache, unprotect, disk]
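A one-bit-per-page bitmap of this kind can be sketched in Python as follows. The class and method names are hypothetical, and which caller sets or clears a bit (domain 0 via hypercall, or the VMM on unprotect) is left to the surrounding code.

```python
class ReuseBitmap:
    """One bit per page: 1 means the page may be reused after a reboot."""

    def __init__(self, num_pages: int) -> None:
        self._bits = bytearray((num_pages + 7) // 8)

    def set(self, page: int) -> None:
        # E.g. after a disk read or write-back of the page has completed.
        self._bits[page >> 3] |= 1 << (page & 7)

    def clear(self, page: int) -> None:
        # E.g. when the page is unprotected so that the OS can modify it.
        self._bits[page >> 3] &= ~(1 << (page & 7)) & 0xFF

    def test(self, page: int) -> bool:
        return bool(self._bits[page >> 3] & (1 << (page & 7)))
```

After a reboot, only pages whose bit is still set would be handed back to the OS as file cache. All other pages are treated as ordinary free memory.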

Page 18: Fast and Safe Performance Recovery on OS Reboot

Experiments

- Purposes
  - To show that the warm-cache reboot achieves fast performance recovery
    - File access, web server
  - To confirm that it does not reuse inconsistent file cache
    - Fault injection
- Server
  - CPU: 2 dual-core Opteron
  - Memory: 12 GB
  - Disk: Ultra 320 SCSI
  - NIC: Gigabit Ethernet
- Client
  - CPU: 2 Core 2 Quad
  - Memory: 4 GB
  - NIC: Gigabit Ethernet

Page 19: Fast and Safe Performance Recovery on OS Reboot

Throughput of File Reads (1/2)

- We measured the read throughput of a 1-GB file
  - All file blocks were on the file cache

[Figure: throughput (MB/s, axis 0 to 1400) of the 1st to 6th reads, before reboot and after reboot; series: normal reboot, warm-cache reboot. Callouts: "Our reboot achieved better performance", "16% degradation at maximum"]

Page 20: Fast and Safe Performance Recovery on OS Reboot

Throughput of File Reads (2/2)

- Next, we used a file-backed virtual disk
  - Disk blocks are cached on domain 0

[Figure: throughput (MB/s, axis 0 to 1400) of the 1st to 6th reads, before reboot and after reboot; series: normal reboot, warm-cache reboot. Callout: "Degradation is mitigated from 90% to 46%"]

Page 21: Fast and Safe Performance Recovery on OS Reboot

Throughput of a Web Server

- We measured the changes of the throughput during OS reboot

[Figure callouts: "60% degradation for 90 seconds", "5% degradation for 60 seconds"]

Page 22: Fast and Safe Performance Recovery on OS Reboot

Fault Injection (1/2)

- We measured inconsistent cache reuses
  - We injected various faults into the OS kernel
- First, we disabled the consistency mechanism

[Figure: inconsistent reuse (%, axis 0 to 80) per fault type (DST, INIT, BR, PANIC, FREE, COPY, STACK); series: no crash, process crash, kernel crash. Callout: "The file cache is often corrupted"]

Page 23: Fast and Safe Performance Recovery on OS Reboot

Fault Injection (2/2)

- Next, we enabled the consistency mechanism
  - Most reboots did not reuse inconsistent cache
- Reused file cache was inconsistent only for DST
  - Ext3 failed to write back
    - Faults were injected into ext3
  - The file cache was not corrupted
    - Reusing it is correct

[Figure: inconsistent reuse (%, axis 0 to 45) for DST; series: disabled, enabled]

Page 24: Fast and Safe Performance Recovery on OS Reboot

Related Work

- Rio File Cache [Chen et al.’96]
  - Reusing dirty file cache after OS crash
  - Relying on an OS
- RootHammer [Kourai et al.’07]
  - Preserving VMs during VMM reboot
- Hybrid Hard Drive [Samsung & Microsoft], Turbo Memory [Intel]
  - Including large non-volatile disk cache

Page 25: Fast and Safe Performance Recovery on OS Reboot

Conclusion

- We proposed the warm-cache reboot
  - It achieves fast performance recovery by reusing the file cache
    - 16% degradation at maximum
  - The VMM maintains consistency of the file cache
    - Consistent, or not-corrupted at least
- Future work
  - Reducing overheads of protecting cache pages
    - Impact on write performance is large