HPMR: Prefetching and Pre-shuffling in Shared MapReduce Computation Environment
IEEE 2009
Sangwon Seo(KAIST), Ingook Jang
Kyungchang Woo, Inkyo Kim
Jin-Soo Kim, Seungryoul Maeng
2013.04.25 Special Topics in File Processing
Taehoon Kim
Contents
1. Introduction
2. Related Work
3. Design
4. Implementation
5. Evaluations
6. Conclusion
Introduction
It is difficult for Internet services to deal with the enormous volumes of data they generate: a large amount of data needs to be processed every day.
To solve this problem, the MapReduce programming model is used. It supports distributed and parallel processing for large-scale data-intensive applications, e.g., data mining and scientific simulation.
Introduction
Hadoop is based on MapReduce; since Hadoop is a distributed system, its file system is called HDFS (Hadoop Distributed File System).
An HDFS cluster consists of:
A single NameNode: a master server that manages the namespace of the file system and regulates clients' access to files
A number of DataNodes: each manages the storage directly attached to it
HDFS placement policy: place each of the three replicas on a node in the local rack
Advantage: improves write performance by cutting down inter-rack write traffic
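As an aside (not from the paper), the NameNode/DataNode split can be seen directly through Hadoop's Java client API: the sketch below asks the NameNode where the replicas of each block of a file live. The path /data/input.txt is a hypothetical example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();  // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);      // metadata operations go to the NameNode
        FileStatus status = fs.getFileStatus(new Path("/data/input.txt")); // hypothetical file
        // Ask the NameNode which DataNodes hold the replicas of each block
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset " + block.getOffset()
                    + " replicas on " + String.join(", ", block.getHosts()));
        }
        fs.close();
    }
}
```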
Introduction
It is essential to reduce the shuffling overhead in order to improve the overall performance of the MapReduce computation. The network bandwidth between nodes is also an important factor in the shuffling overhead.
[Figure: MapReduce data flow across two nodes. Files loaded from HDFS stores pass through an InputFormat and are divided into splits; RecordReaders (RR) feed the records to map tasks; a Combiner/Partitioner sorts the map output; the "shuffling" process moves the partitions over the network to the reduce tasks; each OutputFormat writes the results back to local HDFS.]
Introduction
Hadoop's basic principle: "moving computation is better."
It is better to migrate the computation closer to the data. This is used when the size of the data set is huge: migrating the computation minimizes network congestion and increases the overall throughput¹ of the system.
1) Throughput: the amount of data processed within a given time
Introduction
HOD (Hadoop On Demand, developed by Yahoo!): a management system for provisioning virtual Hadoop clusters over a large physical cluster. All physical nodes are shared by more than one Yahoo! engineer.
This increases the utilization of physical resources.
However, when the computing resources are shared by multiple users, Hadoop's "moving computation" policy is not effective, because the resources (e.g., computing, network, and hardware resources) are shared.
Introduction
To solve this problem, two optimization schemes are proposed:
Prefetching: intra-block prefetching and inter-block prefetching
Pre-shuffling
Related Work
J. Dean and S. Ghemawat: the MapReduce programming model
Traditional prefetching techniques: V. Padmanabhan and J. Mogul; T. Kroeger and D. Long; P. Cao, E. Felten et al. proposed prefetching methods to reduce I/O latency
Related Work
Zaharia et al.: LATE (Longest Approximate Time to End), which performs speculative execution more efficiently in shared environments
Dryad (Microsoft): a job can be expressed as a directed acyclic graph
The degree of data locality is highly related to MapReduce performance
Design (Prefetching Scheme)
Intra²-block prefetching: bi-directional processing, a simple prefetching technique that prefetches data within a single block while a complex computation is being performed.
2) Intra: within, inside
[Fig. 1. The intra-block prefetching in the Map phase: computation proceeds over the assigned input split for the map task from one end while prefetching proceeds in parallel from the other.]
[Fig. 2. The intra-block prefetching in the Reduce phase: the expected data for the reduce task is prefetched while computation is in progress.]
Design (Prefetching Scheme)
While a complex job is performed on one side, the data that will be required are prefetched in parallel and assigned to the corresponding task.
Advantages of intra-block prefetching (a sketch of the processing bar follows this list):
1. It uses the concept of a processing bar that monitors the current status of each side and invokes a signal if synchronization is about to be broken.
2. It tries to find the appropriate prefetching rate at which³ the performance can be maximized while minimizing the prefetching overhead, so the network overhead can also be minimized.
3) At which: when, where
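The paper describes the processing bar only conceptually; the following is a minimal sketch of the idea under stated assumptions (the class name, the 10% threshold, and the signalling are all invented): a compute thread consumes the block from the front while a prefetch thread fills it from the back, and the bar signals before the two positions collide.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the processing-bar idea.
public class ProcessingBar {
    private final long blockSize;
    private final AtomicLong computePos = new AtomicLong(0); // computation advances forward
    private final AtomicLong prefetchPos;                    // prefetching advances backward

    public ProcessingBar(long blockSize) {
        this.blockSize = blockSize;
        this.prefetchPos = new AtomicLong(blockSize);
    }

    public void advanceCompute(long bytes) {
        computePos.addAndGet(bytes);
        checkSync();
    }

    public void advancePrefetch(long bytes) {
        prefetchPos.addAndGet(-bytes);
        checkSync();
    }

    // Invoke a signal if synchronization is about to be broken,
    // i.e. computation has nearly caught up with the prefetched region.
    private void checkSync() {
        long gap = prefetchPos.get() - computePos.get();
        if (gap < blockSize / 10) { // the threshold is an assumption
            System.out.println("signal: computation is catching up, raise the prefetching rate");
        }
    }
}
```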
Design (Prefetching Scheme)
Inter-block prefetching runs at the block level, by prefetching the expected block replicas⁴ to a local rack.
[Figure: nodes n1, n2, n3 at distances D=1, D=5, and D=8 (D = distance) from the required blocks; tasks A2, A3, and A4 prefetch the required blocks.]
4) Replica: a copy of a data block
Design (Prefetching Scheme)
Inter-block prefetching algorithm (sketched below):
1. Assign the map task to the node that is nearest to the required blocks.
2. The predictor generates the list of data blocks, B, to be prefetched for the target task t.
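A rough Java sketch of these two steps; every type here (Predictor, Block, Node, Task) is a hypothetical stand-in for HPMR's internals, and the distance-sum heuristic is an assumption, not the paper's exact algorithm.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical types standing in for HPMR's internals.
interface Predictor { List<Block> predictBlocks(Task t); }
interface Block { int distanceTo(Node n); }
interface Node { void prefetch(Block b); }
interface Task { void assignTo(Node n); }

public class InterBlockPrefetching {
    // Step 1: assign the map task to the node nearest to its required blocks.
    // Step 2: the predictor generates the block list B for target task t,
    //         and each block replica is pulled toward that node's rack.
    static void schedule(Task t, Predictor predictor, List<Node> nodes) {
        List<Block> B = predictor.predictBlocks(t);
        Node nearest = nodes.stream()
                .min(Comparator.comparingInt(
                        (Node n) -> B.stream().mapToInt(b -> b.distanceTo(n)).sum()))
                .orElseThrow(IllegalStateException::new);
        t.assignTo(nearest);
        for (Block b : B) {
            nearest.prefetch(b); // pull a replica into the local rack
        }
    }
}
```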
Design (Pre-Shuffling Scheme)
Pre-shuffling processing: the pre-shuffling module in the task scheduler looks over the input split, or candidate data, in the map phase, and predicts which reducer each key-value pair will be partitioned into, so that the pairs can be shuffled with less network traffic.
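The slide does not say how the prediction is computed. One plausible sketch, assuming the job uses Hadoop's default HashPartitioner, is simply to replay the same hash that the map side will apply later:

```java
import org.apache.hadoop.io.Text;

public class PartitionPredictor {
    // Predict which reducer a key will be partitioned into by replaying
    // the default HashPartitioner logic before the map phase runs.
    static int predictReducer(Text key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // e.g., sample keys from an input split and count votes per reducer
        System.out.println(predictReducer(new Text("1950"), 4));
    }
}
```

Sampling a few keys from each input split this way would let the scheduler estimate, per split, which reducers will receive most of its output.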
Design (Optimization)
LATE (Longest Approximate Time to End) algorithm: robustly performs speculative execution to maximize performance in a heterogeneous environment, but does not consider the data locality that could accelerate the MapReduce computation further.
D-LATE (Data-aware LATE) algorithm: almost the same as LATE, except that a task is assigned as near as possible to the location where the needed data are present.
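A compressed sketch of the difference, with invented TaskStats fields: LATE picks the straggler with the longest estimated time to end, and D-LATE additionally places the speculative copy where the data already is.

```java
// Hypothetical per-task statistics; field names are illustrative, not from the paper.
class TaskStats {
    double progress;       // 0.0 .. 1.0
    double progressRate;   // progress per second so far
    String[] dataHosts;    // nodes holding the task's input replicas

    // LATE's core estimate: time remaining = (1 - progress) / progressRate
    double estimatedTimeToEnd() {
        return (1.0 - progress) / progressRate;
    }
}

class DLate {
    // D-LATE: like LATE, but the backup copy of the longest-remaining task
    // is launched as close as possible to where its data already resides.
    static String pickBackupHost(TaskStats straggler, java.util.List<String> idleHosts) {
        for (String host : straggler.dataHosts) {
            if (idleHosts.contains(host)) {
                return host;        // data-local: no block transfer needed
            }
        }
        return idleHosts.get(0);    // fall back to any idle node, as LATE would
    }
}
```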
Implementation (Optimized Scheduler)
Optimized scheduler: the Predictor module
Not only finds stragglers, but also predicts the candidate data blocks and the reducers into which the key-value pairs are partitioned.
Based on these predictions, the optimized scheduler performs the D-LATE algorithm.
Implementation (Optimized Scheduler)
Prefetcher: monitors the status of worker threads and manages the prefetching synchronization with the processing bar.
Load balancer: checks the logs (which include disk usage per node and current network traffic per data block), and is invoked to maintain load balancing based on disk usage and network traffic.
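A tiny sketch of how such a load balancer might score prefetch destinations from the logged metrics; the field names and the equal weighting are assumptions, not from the paper.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical per-node metrics parsed from the logs described above.
class NodeLoad {
    String host;
    double diskUsage;      // fraction of disk in use, 0.0 .. 1.0
    double networkTraffic; // fraction of link bandwidth in use, 0.0 .. 1.0

    // Lower is better; equal weights are an assumption.
    double score() {
        return 0.5 * diskUsage + 0.5 * networkTraffic;
    }
}

public class LoadBalancer {
    // Pick the least-loaded node as the next prefetch destination.
    static NodeLoad pickTarget(List<NodeLoad> nodes) {
        return nodes.stream()
                .min(Comparator.comparingDouble(NodeLoad::score))
                .orElseThrow(IllegalStateException::new);
    }
}
```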
Evaluation
Testbed: the Yahoo! Grid, which consists of 1,670 nodes. Each node has two dual-core 2.0 GHz AMD processors, 4 GB of main memory, 400 GB ATA hard disk drives, and a Gigabit Ethernet network interface card. The nodes are divided into 40 racks, which are connected with L3 routers.
All tests are configured so that HDFS maintains four replicas for each data block, whose size is 128 MB.
Three types of workload: wordcount, search log aggregator, and similarity calculator.
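For reference (not shown in the slides), this test configuration corresponds to the standard Hadoop-0.x configuration keys for replication factor and block size:

```java
import org.apache.hadoop.conf.Configuration;

public class TestConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.setInt("dfs.replication", 4);                  // four replicas per block
        conf.setLong("dfs.block.size", 128L * 1024 * 1024); // 128 MB blocks
        System.out.println(conf.get("dfs.replication"));
    }
}
```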
Evaluation
Fig. 8: test set #1 has the smallest ratio of the number of nodes to the number of map tasks; test set #5 benefits from a significant reduction in shuffling overhead.
Fig. 7: we can observe that HPMR shows significantly better performance than native Hadoop for all of the test sets.
Evaluation
The prefetching latency is affected by disk overhead or network congestion; therefore, a long prefetching latency indicates that the corresponding node is heavily loaded.
The prefetching rate increases beyond 60%.
Evaluation
This means that HPMR assures consistent performance even in a shared environment such as the Yahoo! Grid, where the available bandwidth fluctuates severely (from 4 Kbps to 128 Kbps).
Conclusion
Two innovative schemes:
The prefetching scheme exploits data locality.
The pre-shuffling scheme reduces the network overhead required to shuffle key-value pairs.
HPMR is implemented as a plug-in type component for Hadoop.
HPMR improves the overall performance by up to 73% compared with native Hadoop.
As the next step, we plan to evaluate more complicated workloads such as HAMA (an open-source Apache incubator project).
Appendix: MapReduce Example
MapReduce example: analyzing a weather data set.
Each record is stored as one line, in ASCII form; within a file, each field is stored at a fixed length with no delimiter.
Example record: 0057332130999991950010103004+51317+028783FM-12+017199999V0203201N00721004501CN0100001N9-01281-01391102681
Query: from the NCDC data files written between 1901 and 2001, find the highest temperature (F) for each year.
Input: data files in chunk (64 MB) units
1st Map: extract <offset, record> from each file
2nd Map: extract <year, temperature> from each record
Shuffle: organize the data into groups by year
Reduce: merge and return the final results
Appendix: MapReduce Example
1st Map: extract <Offset, Record> from the file; <Key_1, Value> = <offset, record>
<0, 0067011990999991950051507004...9999999N9+00001+99999999999...>
<106, 0043011990999991950051512004...9999999N9+00221+99999999999...>
<212, 0043011990999991950051518004...9999999N9-00111+99999999999...>
<318, 0043012650999991949032412004...0500001N9+01111+99999999999...>
<424, 0043012650999991949032418004...0500001N9+00781+99999999999...>
...
2nd Map: extract Year and Temp from each record; <Key_2, Value> = <year, temp>
<1950, 0>
<1950, 22>
<1950, −11>
<1949, 111>
<1949, 78>
…
(year, temperature)
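The 2nd Map step corresponds closely to the MaxTemperatureMapper from Hadoop: The Definitive Guide, which this appendix follows; the fixed character offsets (15-19 for the year, 87-92 for the signed temperature) come from the NCDC record layout, and quality filtering is omitted for brevity.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch of the 2nd Map step: pull <year, temperature> out of each
// fixed-width NCDC record.
public class MaxTemperatureMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(15, 19);
        // The temperature is signed, e.g. "+0022" or "-0011"
        int airTemperature;
        if (line.charAt(87) == '+') {
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        context.write(new Text(year), new IntWritable(airTemperature));
    }
}
```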
Appendix: MapReduce Example
Shuffle
Because the 2nd Map produces too many results, they are reorganized into per-year data groups; this reduces the processing cost when merging in the Reduce step.
Reduce: merge the candidate sets from all Maps and return the final result.
<1950, 0> <1950, 22> <1950, −11> <1949, 111> <1949, 78>
→ <1949, [111, 78]> <1950, [0, 22, −11]>
[Figure: 2nd Map and Shuffle. Mapper_1 emits (1950, [0, 22, −11]) and (1949, [111, 78]); Mapper_2 emits (1950, [25, 15]) and (1949, [30, 45]). After the shuffle, the Reducer receives (1950, [0, 22, −11, 25, 15]) and (1949, [111, 78, 30, 45]) and outputs (1950, 25) and (1949, 111).]
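The Reduce step likewise corresponds to the book's MaxTemperatureReducer: merge every mapper's candidate list for a year and keep the maximum, e.g. (1950, [0, 22, −11, 25, 15]) → (1950, 25).

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch of the Reduce step: return the maximum temperature per year.
public class MaxTemperatureReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}
```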
Appendix: Hadoop: The Definitive Guide, pp. 19-20