Fast and Safe Performance Recovery on OS Reboot
Kenichi Kourai
Kyushu Institute of Technology
OS Recovery
  OS reboot is a final but powerful recovery technique
    For recovery from OS crashes
      Against Mandelbugs
      A rebooted OS rarely crashes again
    For software rejuvenation
      Against aging-related bugs
      A rebooted OS restores its normal state
  [Figure: an OS with a crash or a memory leak is rebooted into a recovered OS]
Performance Degradation (1/2)
  OS reboot degrades the performance of file accesses
    The file cache in memory is lost
    Disk access increases due to frequent cache misses
  It takes a long time to fill the file cache
    Reading file blocks from a disk is slow
    Most free memory is used for the file cache
  [Figure: file cache, reboot, slow disk]
Performance Degradation (2/2)
  Disk access also degrades the performance of the other virtual machines (VMs)
    VMs share a physical disk
    Frequent disk access occupies the bandwidth
    Prefetching makes the situation worse
      Burst of disk access
  [Figure: the OS in a rebooted VM and the other VMs sharing one disk]
Performance Recovery is Needed
  OS recovery does not complete until the performance is also recovered
  Traditional OS reboot restores only the functionality
    Fast reboot techniques have been proposed
Warm-cache Reboot
  A new OS recovery mechanism with fast performance recovery
    It preserves the file cache during OS reboot
    An OS can reuse it after the reboot
  It guarantees the consistency of the file cache
    Using the virtual machine monitor (VMM)
  [Figure: the file cache of a VM before and after reboot on the VMM; corrupted cache is discarded]
Reusing the File Cache
  Collaboration between an OS and the VMM
  The VMM re-allocates the same physical memory to a rebooted VM
  A rebooted OS reserves the memory pages used for the file cache
    Obtaining metadata from the VMM
  [Figure: the VMM deallocates and re-allocates the VM's memory across the reboot; the rebooted OS reserves the file cache]
Cache Consistency
  Our definition
    Consistent if the contents of the file cache are the same as those of disks
      Consistent when a file block is read from a disk
      Inconsistent when the file cache is modified
      Consistent when it is written back to a disk
  [Figure: disk and the VM's file cache; labels: read, modify, write back]
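The definition above can be illustrated with a small model. This is only a sketch: the class `CachedBlock` and its method names are hypothetical and do not come from the slides. It treats a cached block as consistent exactly when its contents equal the on-disk contents.

```python
class CachedBlock:
    """Toy model of one file block and its copy in the file cache (hypothetical)."""

    def __init__(self, disk_data: bytes):
        self.disk = disk_data   # contents currently on the disk
        self.cache = None       # contents in the file cache (None = not cached)

    def read(self) -> None:
        # Reading a block from the disk fills the cache with the disk contents.
        self.cache = self.disk

    def modify(self, data: bytes) -> None:
        # The OS changes the cached copy only; the disk is untouched.
        self.cache = data

    def write_back(self) -> None:
        # Writing back copies the cached contents to the disk.
        self.disk = self.cache

    def is_consistent(self) -> bool:
        # Consistent if the cache contents are the same as those of the disk.
        return self.cache is not None and self.cache == self.disk


block = CachedBlock(b"old")
block.read()
after_read = block.is_consistent()
block.modify(b"new")
after_modify = block.is_consistent()
block.write_back()
after_write_back = block.is_consistent()
```

In this model the three flags follow the three cases on the slide: consistent after the read, inconsistent after the modification, and consistent again after the write-back.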
Maintaining Cache Reusability
  The warm-cache reboot allows an OS to reuse only consistent file cache
  The VMM is suitable for maintaining the reusability
    It is isolated from an OS
    It can mediate all disk accesses
    It can track all modifications to cache pages
  [Figure: VMM, VM, disk, and file cache; label: modify cache pages]
Reusability Management (1/3)
  The VMM makes a cache page reusable after it reads data from a disk
    It protects the page before the read
      To detect page corruption by an OS during the read
      The VMM can still write data to the page
  [Figure: read request from the VM to the VMM; steps: protect, read, reusable; label: possible corruption]
Reusability Management (2/3)
  The VMM makes a cache page non-reusable before an OS modifies its contents
    It unprotects the page at the same time
      To enable the OS to modify the page
  [Figure: modify request from the VM to the VMM; steps: non-reusable and unprotect, write; label: possible corruption]
Reusability Management (3/3)
  The VMM makes a cache page reusable again after it writes data in the page to a disk
    It protects the page before the write
      To detect page corruption during the write
  [Figure: write request from the VM to the VMM; steps: protect, write, reusable; label: possible corruption]
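The three steps of reusability management can be sketched as a small simulation. Everything named here (`CachePage`, `Vmm`, `os_write`, `ProtectionFault`) is hypothetical and is not the CacheMind code; in particular, modelling an OS write to a protected page as an exception is an assumption made only to show how protection lets corruption be detected.

```python
class ProtectionFault(Exception):
    """Assumed behavior: raised when the guest OS writes to a protected page."""


class CachePage:
    def __init__(self):
        self.data = b""
        self.protected = False   # write-protected against the guest OS
        self.reusable = False    # may be reused after an OS reboot


class Vmm:
    def __init__(self, disk: dict):
        self.disk = disk         # block number -> bytes

    def read(self, page: CachePage, block: int) -> None:
        page.protected = True            # protect the page before the read
        page.data = self.disk[block]     # the VMM can still write to the page
        page.reusable = True             # reusable after the read completes

    def modify_request(self, page: CachePage) -> None:
        page.reusable = False            # non-reusable before the OS modifies it
        page.protected = False           # unprotect at the same time

    def write(self, page: CachePage, block: int) -> None:
        page.protected = True            # protect the page before the write
        self.disk[block] = page.data
        page.reusable = True             # reusable again after the write


def os_write(page: CachePage, data: bytes) -> None:
    # A guest write to a protected page is detected instead of silently applied.
    if page.protected:
        raise ProtectionFault("guest wrote to a protected cache page")
    page.data = data


disk = {7: b"old"}
vmm = Vmm(disk)
page = CachePage()
vmm.read(page, 7)            # page is now protected and reusable
vmm.modify_request(page)     # page is now unprotected and non-reusable
os_write(page, b"new")
vmm.write(page, 7)           # page is protected and reusable again
```

A write that bypasses `modify_request` hits a protected page and raises `ProtectionFault`, which stands in for the corruption detection described on the slides.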
File Cache and Metadata (1/2)
  Consistent
    When data and metadata are both written back, or neither is
    When only metadata are written back
      E.g. Ext3 writeback mode, Ext2
  [Figure: file cache data and metadata in memory; metadata on the disk]
File Cache and Metadata (2/2)
  Maybe inconsistent
    When only data is written back, and
      When the file size is changed, or
      When the i-node pointers are changed
    E.g. Ext3 ordered mode
  [Figure: memory and disk; old metadata on the disk]
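The cases on the two metadata slides can be written as one predicate. This is a sketch with hypothetical names, not the check used in the implementation. It also assumes that writing back only data, with neither the file size nor the i-node pointers changed, counts as consistent, which the slides imply but do not state.

```python
def maybe_inconsistent(data_written: bool,
                       metadata_written: bool,
                       size_changed: bool = False,
                       inode_pointers_changed: bool = False) -> bool:
    """Return True when the file cache may be inconsistent with the disk."""
    # Only the "data written back, metadata not" case can be a problem.
    only_data = data_written and not metadata_written
    # It is a problem only if the old on-disk metadata no longer describes
    # the file: its size or its i-node pointers changed.
    return only_data and (size_changed or inode_pointers_changed)


both = maybe_inconsistent(True, True, size_changed=True)
neither = maybe_inconsistent(False, False, size_changed=True)
only_metadata = maybe_inconsistent(False, True, size_changed=True)
only_data_resized = maybe_inconsistent(True, False, size_changed=True)
```

Here `both`, `neither`, and `only_metadata` are the consistent cases of slide 1/2, and `only_data_resized` is the possibly inconsistent case of slide 2/2.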
Implementation
  CacheMind
    Based on Xen/Linux
  The VMM maintains VM memory
    P2M-mapping table
  The VMM maintains per-VM data
    Cache-mapping table
    Reuse bitmap
  [Figure: domain 0 (blkback), domain U (blkfront, cache), the VMM with per-VM data, and the disk]
Cache-mapping Table
  A hash table from file blocks to cache pages
  Domain U adds and removes its entries
  It looks up matching entries after OS reboot
    Using hypercalls
  [Figure: domain U (cache) issues hypercalls to the cache-mapping table in the VMM]
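As a rough illustration of the cache-mapping table, the sketch below uses a Python dictionary as the hash table. The key layout (a device identifier plus a block number), the page-frame-number value, and the method names are all assumptions; the slides only say that the table maps file blocks to cache pages and that domain U updates it through hypercalls.

```python
class CacheMappingTable:
    """Hypothetical per-VM table kept by the VMM: file block -> cache page."""

    def __init__(self):
        self._entries = {}   # (device, block number) -> page frame number

    def add(self, device: int, block: int, page_frame: int) -> None:
        # Stands in for the hypercall domain U issues when a block is cached.
        self._entries[(device, block)] = page_frame

    def remove(self, device: int, block: int) -> None:
        # Stands in for the hypercall issued when a cache page is released.
        self._entries.pop((device, block), None)

    def lookup(self, device: int, block: int):
        # After the OS reboot, domain U asks which page still holds a block.
        return self._entries.get((device, block))


table = CacheMappingTable()
table.add(device=0, block=42, page_frame=1000)
table.add(device=0, block=43, page_frame=1001)
table.remove(device=0, block=43)

# The table lives in the VMM, so it survives the reboot of domain U.
hit = table.lookup(device=0, block=42)
miss = table.lookup(device=0, block=43)
```

In this sketch `hit` is the page frame that was registered for block 42, and `miss` is `None` because the entry for block 43 was removed before the lookup.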
Reuse Bitmap
  A bitmap for reusable cache pages
  Domain 0 sets and clears its bits
    Using hypercalls
  The VMM clears its bits
    When cache pages are unprotected
  [Figure: domain 0 (blkback) issues hypercalls to the reuse bitmap in the VMM; domain U (blkfront, cache); disk; label: unprotect]
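A reuse bitmap of this kind can be sketched with one bit per cache page. The class and method names below are hypothetical, and which caller invokes which method (domain 0 through hypercalls, or the VMM on unprotect) is only indicated in comments.

```python
class ReuseBitmap:
    """Hypothetical bitmap with one bit per page: 1 means the page is reusable."""

    def __init__(self, num_pages: int):
        self._num_pages = num_pages
        self._bits = bytearray((num_pages + 7) // 8)

    def _check(self, page: int) -> None:
        if not 0 <= page < self._num_pages:
            raise IndexError("page number out of range")

    def set(self, page: int) -> None:
        # Would be called by domain 0 (via a hypercall) after a read or write.
        self._check(page)
        self._bits[page // 8] |= 1 << (page % 8)

    def clear(self, page: int) -> None:
        # Would be called by domain 0, or by the VMM when a page is unprotected.
        self._check(page)
        self._bits[page // 8] &= ~(1 << (page % 8)) & 0xFF

    def is_reusable(self, page: int) -> bool:
        self._check(page)
        return bool(self._bits[page // 8] & (1 << (page % 8)))


bitmap = ReuseBitmap(16)
bitmap.set(3)
bitmap.set(9)
bitmap.clear(3)   # e.g. page 3 was unprotected so that the OS can modify it
```

After these calls only page 9 is still marked reusable, so a rebooted OS would be allowed to reuse page 9 but not page 3.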
Experiments
  Purposes
    To show that the warm-cache reboot achieves fast performance recovery
      File access, web server
    To confirm that it does not reuse inconsistent file cache
      Fault injection
  Server
    CPU: 2 dual-core Opteron
    Memory: 12 GB
    Disk: Ultra 320 SCSI
    NIC: Gigabit Ethernet
  Client
    CPU: 2 Core 2 Quad
    Memory: 4 GB
    NIC: Gigabit Ethernet
Throughput of File Reads (1/2)
  We measured the read throughput of a 1-GB file
    All file blocks were on the file cache
  [Figure: throughput (MB/s, axis 0 to 1400) of the 1st to 6th reads, before and after reboot, for normal reboot and warm-cache reboot]
  Our reboot achieved better performance
    16% degradation at maximum
Throughput of File Reads (2/2)
  Next, we used a file-backed virtual disk
    Disk blocks are cached on domain 0
  [Figure: throughput (MB/s, axis 0 to 1400) of the 1st to 6th reads, before and after reboot, for normal reboot and warm-cache reboot]
  Degradation is mitigated from 90% to 46%
Throughput of a Web Server
  We measured the changes in throughput during OS reboot
  [Figure callouts: 60% degradation for 90 seconds; 5% degradation for 60 seconds]
Fault Injection (1/2)
  We measured inconsistent cache reuses
    We injected various faults into the OS kernel
  First, we disabled the consistency mechanism
  [Figure: inconsistent reuse (%, axis 0 to 80) for fault types DST, INIT, BR, PANIC, FREE, COPY, and STACK; series: no crash, process crash, kernel crash]
  The file cache is often corrupted
Fault Injection (2/2)
  Next, we enabled the consistency mechanism
    Most reboots did not reuse inconsistent cache
    Reused file cache was inconsistent only for DST
      Ext3 failed to write back
        Faults were injected into ext3
      The file cache was not corrupted
        Reusing it is correct
  [Figure: inconsistent reuse (%, axis 0 to 45) for DST with the consistency mechanism disabled and enabled]
Related Work
  Rio File Cache [Chen et al.’96]
    Reusing dirty file cache after OS crash
    Relying on an OS
  RootHammer [Kourai et al.’07]
    Preserving VMs during VMM reboot
  Hybrid Hard Drive [Samsung & Microsoft], Turbo Memory [Intel]
    Including large non-volatile disk cache
Conclusion
  We proposed the warm-cache reboot
    It achieves fast performance recovery by reusing the file cache
      16% degradation at maximum
    The VMM maintains consistency of the file cache
      Consistent, or not-corrupted at least
  Future work
    Reducing overheads of protecting cache pages
      Impact on write performance is large