using next generation nvram as a write buffer for flash … · 2019-05-23 · nvram write buffer...

Using Next Generation NVRAM as a Write Buffer for Flash Memory-based Storage Devices
Sooyong Kang (강수용)
Hanyang University

Motivation
Solid-State Disk performance
Low random write performance
Random writes need to be managed efficiently
Device | Sequential | Random 8KB | Price $ | Power | iops/$ | iops/watt
SCSI 15k rpm | 75 MBps | 200 iops | 500$ | 15 watt | 0.5 | 13
SATA 10k rpm | 60 MBps | 100 iops | 150$ | 8 watt | 0.7 | 12
Flash - read | 53 MBps | 2,800 iops | ?? 400$ | 0.9 watt | 7.0 | 3,100
Flash - write | 36 MBps | 27 iops | ?? 400$ | 0.9 watt | 0.07 | 30
< Jim Gray, “Flash Disk Opportunity for Server-Application”, Microsoft Research, 2007 >

Motivation
Uses of next-generation NVRAM for flash memory
Use as a metadata store
• Shorter boot time, because the mapping info. is kept in NVRAM
• Absorbs accesses to metadata
Use as a write buffer
• Fewer accesses to flash memory
• Better random write performance
Other uses, including ones that should not be treated as merely miscellaneous

NVRAM Write Buffer for Hard Disks
Write buffer algorithms for hard disks
Considerations
• Temporal locality
– Reduces the number of accesses
– e.g. LRW (Least Recently Written page)
– Metric: buffer hit ratio
• Spatial locality
– Reduces the cost of each access
– e.g. LST (Largest Segment per Track), CSCAN
– Metric: delay
Stack Model: LRW + LST
WOW: LRW + CSCAN

Differences between HDD & SSD
H/W perspective: well known
S/W perspective: a mapping layer exists
Form of the mapping layer: FTL, filesystem
Mapping scheme: block mapping, sector mapping
Frequent random writes degrade performance because of merges
• Extra operations (erase, valid page copy) occur

NVRAM Write Buffer for Flash Memory
Consider temporal locality
• Try to increase the buffer hit ratio
Consider how the mapping scheme operates
• Try to reduce the number of extra operations
[Figure: a write buffer built from PRAM, MRAM or FRAM sits above the FTL, which sits above the flash memory.]

How to reduce the number of extra operations?

Necessity of Page Clustering
[Figure: BAST FTL operation under two ways of managing the NVRAM write buffer.
Page Level Management: updated pages from different blocks (A3', A4', B1', B2', C1', ...) reach log blocks L1 and L2 interleaved, so merging A and L1 requires a smart copy.
Block Level Management: buffered pages are grouped into A, B, C and D clusters; A1' to A4' are written to L1 together, so merging A and L1 is a switch.]

Data Structure of Page Cluster
[Figure: a cluster list and its page lists. Each cluster list entry holds a block number and points to the list of buffered pages that belong to that block.]
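The cluster list and page lists could be represented roughly as follows. This is a minimal sketch, not the authors' implementation: the class name `ClusterBuffer`, the 4-page block geometry, and the use of a dictionary per page list are all assumptions made for illustration.

```python
from collections import OrderedDict

PAGES_PER_BLOCK = 4  # assumed block geometry, not taken from the slides


class ClusterBuffer:
    """Write buffer that groups dirty pages by their logical block number."""

    def __init__(self):
        # Cluster list: block number -> page list (page offset -> data).
        self.clusters = OrderedDict()

    def write(self, logical_page, data):
        """Buffer one page write; return True if it overwrote a buffered page."""
        block, offset = divmod(logical_page, PAGES_PER_BLOCK)
        pages = self.clusters.setdefault(block, {})
        hit = offset in pages
        pages[offset] = data
        return hit

    def cluster_size(self, block):
        """Number of buffered pages that belong to `block`."""
        return len(self.clusters.get(block, {}))


buf = ClusterBuffer()
for page in (0, 5, 1, 4, 0):
    buf.write(page, b"x")
# Pages 0 and 1 now form the cluster for block 0, pages 4 and 5 the one for block 1.
```

Keying the cluster list by block number means a whole cluster can be handed to the FTL in one flush, which is what the replacement policies on the following slides rely on.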

Cluster-based Algorithm 1
LRU-C: Least Recently Used (Written) Cluster first
Selects the victim cluster by cluster recency
Keeps a sequential write cluster list, whose clusters are evicted first
[Figure: the LRU cluster list ordered by recency from MRU to LRU, with a separate sequential write cluster list.]
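A minimal sketch of LRU-C follows. The slide does not say how a sequential write cluster is detected, so this sketch assumes a cluster becomes "sequential" once every page of its block is buffered; the class name `LRUCBuffer` and the 4-page block are also assumptions.

```python
from collections import OrderedDict

PAGES_PER_BLOCK = 4  # assumed geometry


class LRUCBuffer:
    def __init__(self):
        self.lru = OrderedDict()         # block -> page offsets, oldest first
        self.sequential = OrderedDict()  # full clusters, evicted before the rest

    def write(self, logical_page):
        block, offset = divmod(logical_page, PAGES_PER_BLOCK)
        pages = self.sequential.pop(block, None)
        if pages is None:
            pages = self.lru.pop(block, set())
        pages.add(offset)
        # Assumption: a cluster counts as sequential once its block is full.
        full = len(pages) == PAGES_PER_BLOCK
        target = self.sequential if full else self.lru
        target[block] = pages  # re-inserted at the MRU end

    def evict(self):
        """Return (block, pages) of the victim cluster."""
        source = self.sequential if self.sequential else self.lru
        return source.popitem(last=False)


buf = LRUCBuffer()
for page in (0, 1, 2, 3, 4, 8, 5):
    buf.write(page)
first = buf.evict()   # block 0: full, so taken from the sequential list
second = buf.evict()  # block 2: least recently written of the remaining clusters
```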

Cluster-based Algorithm 2
LC: Largest Cluster first
Selects the victim cluster by cluster size alone
[Figure: LRU Cluster Pointer Array with slots 1 to 64, laid out along a Size axis. Each slot points to a list of clusters ordered along a Recency axis; each cluster node records its block number, a pinned flag, and its cluster size.]
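One way to sketch LC is with one recency-ordered bucket per cluster size, loosely following the pointer array in the figure. The 64-page block matches the 64 array slots; the class name `LCBuffer` is hypothetical, and the pinned flag from the figure is left out to keep the sketch short.

```python
from collections import OrderedDict

PAGES_PER_BLOCK = 64  # the pointer array in the figure has slots 1..64


class LCBuffer:
    def __init__(self):
        self.pages = {}  # block -> set of buffered page offsets
        # buckets[s] holds the clusters of size s, oldest first.
        self.buckets = [OrderedDict() for _ in range(PAGES_PER_BLOCK + 1)]

    def write(self, logical_page):
        block, offset = divmod(logical_page, PAGES_PER_BLOCK)
        pages = self.pages.setdefault(block, set())
        self.buckets[len(pages)].pop(block, None)  # leave the old size bucket
        pages.add(offset)
        self.buckets[len(pages)][block] = None     # MRU end of the new bucket

    def evict(self):
        """Return (block, pages) of the largest cluster, oldest on ties."""
        for size in range(PAGES_PER_BLOCK, 0, -1):
            if self.buckets[size]:
                block, _ = self.buckets[size].popitem(last=False)
                return block, self.pages.pop(block)
        raise KeyError("write buffer is empty")


buf = LCBuffer()
for page in (0, 1, 2, 64, 128, 129, 130):
    buf.write(page)
victim_block, victim_pages = buf.evict()  # block 0: size 3, older than block 2
```

Scanning from the largest slot downward finds the victim in at most 64 steps regardless of how many clusters are buffered.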

Cluster-based Algorithm 3
Greedy-Dual
Considers recency and cluster size together
Selects as victim the cluster with a large value of Time * cluster size
[Figure: LRU Cluster Pointer Array with slots 1 to 64, laid out along a Size axis, each slot carrying a Time value. Each slot points to a list of clusters ordered along a Recency axis; each cluster node records its block number, a pinned flag, its cluster size, and a Time value.]
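The Time * cluster size rule can be sketched as below. The slides do not define Time precisely, so this sketch assumes it is the number of writes since the cluster was last written; the class name `GreedyDualBuffer` and the linear scan over clusters are illustrative choices, not the authors' data structure.

```python
PAGES_PER_BLOCK = 64  # assumed geometry, matching the 64-slot array in the figure


class GreedyDualBuffer:
    def __init__(self):
        self.clock = 0      # logical time: one tick per page write
        self.clusters = {}  # block -> [set of page offsets, last write time]

    def write(self, logical_page):
        self.clock += 1
        block, offset = divmod(logical_page, PAGES_PER_BLOCK)
        entry = self.clusters.setdefault(block, [set(), 0])
        entry[0].add(offset)
        entry[1] = self.clock

    def evict(self):
        """Return (block, pages) of the cluster maximizing age * size."""
        def score(block):
            pages, last_write = self.clusters[block]
            # Assumption: Time is the age of the cluster in write ticks.
            return (self.clock - last_write) * len(pages)

        victim = max(self.clusters, key=score)
        return victim, self.clusters.pop(victim)[0]


buf = GreedyDualBuffer()
for page in (192, 0, 1, 2, 64, 65, 66, 67, 68, 256):
    buf.write(page)
victim_block, victim_pages = buf.evict()
# Scores: block 3 -> 9*1, block 0 -> 6*3, block 1 -> 1*5, block 4 -> 0*1.
```

In this trace pure recency would evict block 3 and pure size would evict block 1; the product picks block 0, which is both fairly old and fairly large.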

Is there any off-line optimal algorithm?

Off-line Optimal Algorithm
We can know
• When a page (or cluster) will be referenced again in the future
However, it is hard to know
• How many extra operations will occur in the future as a result of replacing this page (or cluster) now
Possibility of finding an optimal algorithm
• Very, very hard
• Maybe intractable!

Off-line Pseudo Optimal Algorithms
MIN-C (MIN Cluster)
Applies the existing MIN algorithm to clusters
• Selects as victim the cluster that will be referenced furthest in the future
• Uses the distance to the next reference, d
Example
write buffer: cluster A, cluster B, cluster C, cluster D
write sequence: e f d a g b d b c f e e …..
d(A)=4, d(B)=6, d(C)=9, d(D)=7
victim cluster = C
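A minimal sketch of MIN-C victim selection, assuming the future write sequence is available as a list of cluster names. The function names and the sample sequence are hypothetical and chosen only to show the mechanics.

```python
import math


def next_reference_distance(cluster, future_writes):
    """1-based position of the next write to `cluster`, or inf if there is none."""
    for distance, target in enumerate(future_writes, start=1):
        if target == cluster:
            return distance
    return math.inf


def min_c_victim(buffered, future_writes):
    """Pick the buffered cluster whose next reference is furthest away."""
    return max(buffered, key=lambda c: next_reference_distance(c, future_writes))


# Hypothetical future write sequence, one cluster name per write.
future = ["E", "A", "B", "F", "D", "C"]
victim = min_c_victim(["A", "B", "C", "D"], future)
# d(A)=2, d(B)=3, d(D)=5, d(C)=6, so C is chosen.
```

A cluster that is never written again gets an infinite distance, so it is always preferred as the victim.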

Off-line Pseudo Optimal Algorithms
MIN-CS
Adds cluster size to the MIN-C policy: the cluster with the largest d(x)*s(x) is the victim
Example
write buffer: cluster A (size=2), cluster B (size=6), cluster C (size=1), cluster D (size=3)
write sequence: e f d a g b d b c f e e …..
d(A)*s(A)=8, d(B)*s(B)=36, d(C)*s(C)=9, d(D)*s(D)=21
victim cluster = B
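The arithmetic of the example can be checked with a few lines. This sketch takes the distances and sizes from the slide as given inputs; the function name `min_cs_victim` is hypothetical.

```python
def min_cs_victim(distance, size):
    """Pick the cluster with the largest d(x) * s(x)."""
    return max(distance, key=lambda c: distance[c] * size[c])


# Values from the slide example.
d = {"A": 4, "B": 6, "C": 9, "D": 7}
s = {"A": 2, "B": 6, "C": 1, "D": 3}

scores = {c: d[c] * s[c] for c in d}  # {'A': 8, 'B': 36, 'C': 9, 'D': 21}
victim_cs = min_cs_victim(d, s)       # 'B'
victim_c = max(d, key=d.get)          # 'C': what MIN-C would pick from d alone
```

The same buffer state yields different victims under the two policies: distance alone favors the small cluster C, while weighting by size favors the large cluster B.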

Off-Line Pseudo Optimal Algorithms
Careful conjecture
Optimal algorithm
• Would be cluster-based
• Would consider the cluster size
Metric
• Which is the more important metric, 'Recency' or 'Cluster size'?
• Maybe we can get a hint from the behavior of MIN-CS

“Recency” versus “Size” in MIN-CS
[Figure: two panels, WB size = 0.5M and WB size = 2M. Axes: Size (rrank) and Recency (rank).]

“Recency” versus “Size” in MIN-CS
[Figure: two panels, WB size = 8M and WB size = 16M. Axes: Size (rrank) and Recency (rank).]

Performance ?

Workload
Sequential and cache insensitive
Desktop Trace
Attribute | FAT32 trace | NTFS trace
Filesystem | FAT32 | NTFS
Running applications | Web surfing, sending and receiving email, document typesetting, gaming, multimedia file download | Web surfing, sending and receiving email, document typesetting, gaming, multimedia file download
Duration | One month | One month
Locality | 75% of total requests access 41% of total pages | 75% of total requests access 23% of total pages
Total pages written | 14,147,956 | 58,545,851

Workload
Random and cache sensitive
Database Trace
Attribute | Value
Filesystem | EXT2
Running application | PostgreSQL
Duration | One week
Locality | 75% of total requests access 6% of total pages
Total pages written | 16,659,675

Simulation
Underlying FTL: BAST
Metrics
• Write buffer page hit ratio
• Extra overhead for extra operations
– Erase, valid page copy
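The two metrics could be tallied in a simulator along these lines. This is only a sketch: the class name `Metrics` is hypothetical, and the per-operation latencies are placeholder values chosen for illustration, not figures from the slides.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    writes: int = 0       # page writes issued to the write buffer
    hits: int = 0         # writes that overwrote an already buffered page
    erases: int = 0       # extra block erases performed by the FTL
    page_copies: int = 0  # extra valid page copies performed by the FTL

    def hit_ratio(self):
        """Write buffer page hit ratio."""
        return self.hits / self.writes if self.writes else 0.0

    def extra_overhead_us(self, erase_us=1500.0, copy_us=300.0):
        """Time spent on extra operations; the latencies are assumed values."""
        return self.erases * erase_us + self.page_copies * copy_us


m = Metrics(writes=10, hits=4, erases=2, page_copies=4)
ratio = m.hit_ratio()
overhead = m.extra_overhead_us()
```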

Results
FAT Trace

Results
NTFS trace

Results
TPCC trace

Write Buffer-Aware FTL ?

Observations and Insight
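Observations
Flushing in units of clusters
• Dilutes the random nature of the writes
• Reduces the need for log blocks in the FTL
Problem with log-block-based FTLs
• After data is written to a log block, merging it with the data block incurs a large smart copy overhead
Insight
Merge the flushed cluster with the data block in flash memory immediately: the Instant Merge operation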

Smart Copy versus Instant Merge
[Figure: flushing a cluster from NVRAM under two schemes.
Smart copy of log block scheme FTL: involves a data block, a log block and a free block; costs 4 valid page copies and 2 erases, followed by a map update.
Instant Merge: involves a data block and a log block; costs 1 valid page copy and 1 erase, followed by a map update.]
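The counts in the figure are consistent with a simple cost model, sketched below under stated assumptions: the block holds 4 pages, all of them valid, and the flushed cluster holds 3 of them. The function names are hypothetical, and page programs of the cluster itself are not counted because both schemes need them.

```python
def smart_copy_cost(pages_per_block):
    """Log-block merge: every valid page is copied into a free block,
    then the old data block and the log block are both erased."""
    return {"page_copies": pages_per_block, "erases": 2}


def instant_merge_cost(pages_per_block, cluster_size):
    """Instant merge: only the pages missing from the flushed cluster are
    copied from the old data block, which is then erased."""
    return {"page_copies": pages_per_block - cluster_size, "erases": 1}


smart = smart_copy_cost(pages_per_block=4)
instant = instant_merge_cost(pages_per_block=4, cluster_size=3)
```

Under this model the saving from instant merge grows with the cluster size, which is why it pairs naturally with a cluster-based write buffer.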

Optimistic FTL
Be an OPTIMIST!
"If the write buffer is adequately sized and a cluster-based buffer replacement policy is used, a flushed cluster will contain sufficiently many pages."
And
Use Instant Merge
• No log block except one for IM
Use only block mapping (no page mapping)

Optimistic FTL
[Figure: Optimistic FTL flushing a victim cluster from the NVRAM write buffer. Steps (1) to (8) fill the log block page by page from the victim cluster and the data block; step (9) updates the map, so the log block becomes the data block; step (10) erases the old data block, which becomes a free block.]
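The flush path of a block-mapped FTL with instant merge might look roughly like this. It is a sketch under assumptions, not the authors' design: the class name `OptimisticFTL`, the in-memory `flash` array, the free-block queue, and the 8-page block (chosen to mirror the eight write steps in the figure) are all illustrative.

```python
PAGES_PER_BLOCK = 8  # assumed; mirrors write steps (1) to (8) in the figure


class OptimisticFTL:
    def __init__(self, num_blocks):
        self.flash = [[None] * PAGES_PER_BLOCK for _ in range(num_blocks)]
        self.block_map = {}                  # logical block -> physical block
        self.free = list(range(num_blocks))  # free physical blocks
        self.page_copies = 0
        self.erases = 0

    def flush(self, logical_block, cluster):
        """Instant-merge `cluster` (page offset -> data) into flash."""
        new = self.free.pop(0)  # block used for the merge
        old = self.block_map.get(logical_block)
        for offset in range(PAGES_PER_BLOCK):
            if offset in cluster:
                self.flash[new][offset] = cluster[offset]
            elif old is not None and self.flash[old][offset] is not None:
                self.flash[new][offset] = self.flash[old][offset]
                self.page_copies += 1  # valid page copy from the old block
        self.block_map[logical_block] = new  # update map
        if old is not None:
            self.flash[old] = [None] * PAGES_PER_BLOCK  # erase
            self.erases += 1
            self.free.append(old)


ftl = OptimisticFTL(num_blocks=4)
ftl.flush(0, {i: f"v1-{i}" for i in range(PAGES_PER_BLOCK)})
ftl.flush(0, {1: "v2-1", 2: "v2-2"})
```

The second flush carries only two pages, so six valid pages have to be copied: the scheme pays off only when the optimistic assumption about large clusters holds.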

Is it OK to be an Optimist?

Simulation Result

Simulation Result

Simulation Result

BAST vs Optimistic FTL

Conclusion
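Write buffer management for SSDs
Page clustering is needed
A cluster-based replacement policy is needed
• Consider hit ratio and extra overhead together
• e.g. Greedy-Dual
• The optimal algorithm needs further study
A WB-aware FTL needs to be developed
• e.g. Optimistic FTL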