HPMR: Prefetching and Pre-shuffling in Shared MapReduce Computation Environment
IEEE 2009
Sangwon Seo(KAIST), Ingook Jang
Kyungchang Woo, Inkyo Kim
Jin-Soo Kim, Seungryoul Maeng
2013.04.25 Special Topics in File Processing
Taehoon Kim
Contents
1. Introduction
2. Related Work
3. Design
4. Implementation
5. Evaluations
6. Conclusion
Introduction
It is difficult for Internet services to deal with the enormous volumes of data they generate: a large amount of data needs to be processed every day.
To solve this problem, the MapReduce programming model is used. It supports distributed and parallel processing for large-scale data-intensive applications, e.g., data mining and scientific simulation.
Introduction
Hadoop is based on MapReduce; since Hadoop is a distributed system, its file system is called HDFS (Hadoop Distributed File System).
An HDFS cluster consists of:
A single NameNode: a master server that manages the namespace of the file system and regulates clients' access to files
A number of DataNodes: each manages the storage directly attached to it
HDFS placement policy: place each of the three replicas on a node in the local rack
Advantage: improves write performance by cutting down inter-rack write traffic
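As an aside (not from the paper), the NameNode/DataNode split can be seen directly through Hadoop's Java client API: the sketch below asks the NameNode where the replicas of each block of a file live. The path /data/input.txt is a hypothetical example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();  // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);      // metadata operations go to the NameNode
        FileStatus status = fs.getFileStatus(new Path("/data/input.txt")); // hypothetical file
        // Ask the NameNode which DataNodes hold the replicas of each block
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset " + block.getOffset()
                    + " replicas on " + String.join(", ", block.getHosts()));
        }
        fs.close();
    }
}
```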
Introduction
It is essential to reduce the shuffling overhead in order to improve the overall performance of the MapReduce computation. The network bandwidth between nodes is also an important factor in the shuffling overhead.
[Figure: MapReduce data flow across two nodes. Files loaded from HDFS stores pass through an InputFormat and are divided into splits; RecordReaders (RR) feed the records to map tasks; a Combiner/Partitioner sorts the map output; the "shuffling" process moves the partitions over the network to the reduce tasks; each OutputFormat writes the results back to local HDFS.]
Introduction
Hadoop's basic principle: "moving computation is better."
It is better to migrate the computation closer to the data. This is used when the size of the data set is huge: migrating the computation minimizes network congestion and increases the overall throughput¹ of the system.
1) Throughput: the amount of data processed within a given time
Introduction
HOD (Hadoop On Demand, developed by Yahoo!): a management system for provisioning virtual Hadoop clusters over a large physical cluster. All physical nodes are shared by more than one Yahoo! engineer.
This increases the utilization of physical resources.
However, when the computing resources are shared by multiple users, Hadoop's "moving computation" policy is not effective, because the resources (e.g., computing, network, and hardware resources) are shared.
Introduction
To solve this problem, two optimization schemes are proposed:
Prefetching: intra-block prefetching and inter-block prefetching
Pre-shuffling
Related Work
J. Dean and S. Ghemawat: the MapReduce programming model
Traditional prefetching techniques: V. Padmanabhan and J. Mogul; T. Kroeger and D. Long; P. Cao, E. Felten et al. proposed prefetching methods to reduce I/O latency
Related Work
Zaharia et al.: LATE (Longest Approximate Time to End), which performs speculative execution more efficiently in shared environments
Dryad (Microsoft): a job can be expressed as a directed acyclic graph
The degree of data locality is highly related to MapReduce performance
Design (Prefetching Scheme)
Intra²-block prefetching: bi-directional processing, a simple prefetching technique that prefetches data within a single block while a complex computation is being performed.
2) Intra: within, inside
[Fig. 1. The intra-block prefetching in the Map phase: computation proceeds over the assigned input split for the map task from one end while prefetching proceeds in parallel from the other.]
[Fig. 2. The intra-block prefetching in the Reduce phase: the expected data for the reduce task is prefetched while computation is in progress.]
Design (Prefetching Scheme)
While a complex job is performed on one side, the data that will be required are prefetched in parallel and assigned to the corresponding task.
Advantages of intra-block prefetching (a sketch of the processing bar follows this list):
1. It uses the concept of a processing bar that monitors the current status of each side and invokes a signal if synchronization is about to be broken.
2. It tries to find the appropriate prefetching rate at which³ the performance can be maximized while minimizing the prefetching overhead, so the network overhead can also be minimized.
3) At which: when, where
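The paper describes the processing bar only conceptually; the following is a minimal sketch of the idea under stated assumptions (the class name, the 10% threshold, and the signalling are all invented): a compute thread consumes the block from the front while a prefetch thread fills it from the back, and the bar signals before the two positions collide.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the processing-bar idea.
public class ProcessingBar {
    private final long blockSize;
    private final AtomicLong computePos = new AtomicLong(0); // computation advances forward
    private final AtomicLong prefetchPos;                    // prefetching advances backward

    public ProcessingBar(long blockSize) {
        this.blockSize = blockSize;
        this.prefetchPos = new AtomicLong(blockSize);
    }

    public void advanceCompute(long bytes) {
        computePos.addAndGet(bytes);
        checkSync();
    }

    public void advancePrefetch(long bytes) {
        prefetchPos.addAndGet(-bytes);
        checkSync();
    }

    // Invoke a signal if synchronization is about to be broken,
    // i.e. computation has nearly caught up with the prefetched region.
    private void checkSync() {
        long gap = prefetchPos.get() - computePos.get();
        if (gap < blockSize / 10) { // the threshold is an assumption
            System.out.println("signal: computation is catching up, raise the prefetching rate");
        }
    }
}
```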
Design (Prefetching Scheme)
Inter-block prefetching runs at the block level, by prefetching the expected block replicas⁴ to a local rack.
[Figure: nodes n1, n2, n3 at distances D=1, D=5, and D=8 (D = distance) from the required blocks; tasks A2, A3, and A4 prefetch the required blocks.]
4) Replica: a copy of a data block
Design (Prefetching Scheme)
Inter-block prefetching algorithm (sketched below):
1. Assign the map task to the node that is nearest to the required blocks.
2. The predictor generates the list of data blocks, B, to be prefetched for the target task t.
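A rough Java sketch of these two steps; every type here (Predictor, Block, Node, Task) is a hypothetical stand-in for HPMR's internals, and the distance-sum heuristic is an assumption, not the paper's exact algorithm.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical types standing in for HPMR's internals.
interface Predictor { List<Block> predictBlocks(Task t); }
interface Block { int distanceTo(Node n); }
interface Node { void prefetch(Block b); }
interface Task { void assignTo(Node n); }

public class InterBlockPrefetching {
    // Step 1: assign the map task to the node nearest to its required blocks.
    // Step 2: the predictor generates the block list B for target task t,
    //         and each block replica is pulled toward that node's rack.
    static void schedule(Task t, Predictor predictor, List<Node> nodes) {
        List<Block> B = predictor.predictBlocks(t);
        Node nearest = nodes.stream()
                .min(Comparator.comparingInt(
                        (Node n) -> B.stream().mapToInt(b -> b.distanceTo(n)).sum()))
                .orElseThrow(IllegalStateException::new);
        t.assignTo(nearest);
        for (Block b : B) {
            nearest.prefetch(b); // pull a replica into the local rack
        }
    }
}
```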
Design (Pre-Shuffling Scheme)
Pre-shuffling processing: the pre-shuffling module in the task scheduler looks over the input split, or candidate data, in the map phase, and predicts which reducer each key-value pair will be partitioned into, so that the pairs can be shuffled with less network traffic.
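The slide does not say how the prediction is computed. One plausible sketch, assuming the job uses Hadoop's default HashPartitioner, is simply to replay the same hash that the map side will apply later:

```java
import org.apache.hadoop.io.Text;

public class PartitionPredictor {
    // Predict which reducer a key will be partitioned into by replaying
    // the default HashPartitioner logic before the map phase runs.
    static int predictReducer(Text key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // e.g., sample keys from an input split and count votes per reducer
        System.out.println(predictReducer(new Text("1950"), 4));
    }
}
```

Sampling a few keys from each input split this way would let the scheduler estimate, per split, which reducers will receive most of its output.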
Design (Optimization)
LATE (Longest Approximate Time to End) algorithm: robustly performs speculative execution to maximize performance in a heterogeneous environment, but does not consider the data locality that could accelerate the MapReduce computation further.
D-LATE (Data-aware LATE) algorithm: almost the same as LATE, except that a task is assigned as near as possible to the location where the needed data are present.
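A compressed sketch of the difference, with invented TaskStats fields: LATE picks the straggler with the longest estimated time to end, and D-LATE additionally places the speculative copy where the data already is.

```java
// Hypothetical per-task statistics; field names are illustrative, not from the paper.
class TaskStats {
    double progress;       // 0.0 .. 1.0
    double progressRate;   // progress per second so far
    String[] dataHosts;    // nodes holding the task's input replicas

    // LATE's core estimate: time remaining = (1 - progress) / progressRate
    double estimatedTimeToEnd() {
        return (1.0 - progress) / progressRate;
    }
}

class DLate {
    // D-LATE: like LATE, but the backup copy of the longest-remaining task
    // is launched as close as possible to where its data already resides.
    static String pickBackupHost(TaskStats straggler, java.util.List<String> idleHosts) {
        for (String host : straggler.dataHosts) {
            if (idleHosts.contains(host)) {
                return host;        // data-local: no block transfer needed
            }
        }
        return idleHosts.get(0);    // fall back to any idle node, as LATE would
    }
}
```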
Implementation (Optimized Scheduler)
Optimized scheduler: the Predictor module
Not only finds stragglers, but also predicts the candidate data blocks and the reducers into which the key-value pairs are partitioned.
Based on these predictions, the optimized scheduler performs the D-LATE algorithm.
Implementation (Optimized Scheduler)
Prefetcher: monitors the status of worker threads and manages the prefetching synchronization with the processing bar.
Load balancer: checks the logs (which include disk usage per node and current network traffic per data block), and is invoked to maintain load balancing based on disk usage and network traffic.
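A tiny sketch of how such a load balancer might score prefetch destinations from the logged metrics; the field names and the equal weighting are assumptions, not from the paper.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical per-node metrics parsed from the logs described above.
class NodeLoad {
    String host;
    double diskUsage;      // fraction of disk in use, 0.0 .. 1.0
    double networkTraffic; // fraction of link bandwidth in use, 0.0 .. 1.0

    // Lower is better; equal weights are an assumption.
    double score() {
        return 0.5 * diskUsage + 0.5 * networkTraffic;
    }
}

public class LoadBalancer {
    // Pick the least-loaded node as the next prefetch destination.
    static NodeLoad pickTarget(List<NodeLoad> nodes) {
        return nodes.stream()
                .min(Comparator.comparingDouble(NodeLoad::score))
                .orElseThrow(IllegalStateException::new);
    }
}
```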
Evaluation
Testbed: the Yahoo! Grid, which consists of 1,670 nodes. Each node has two dual-core 2.0 GHz AMD processors, 4 GB of main memory, 400 GB ATA hard disk drives, and a Gigabit Ethernet network interface card. The nodes are divided into 40 racks, which are connected with L3 routers.
All tests are configured so that HDFS maintains four replicas for each data block, whose size is 128 MB.
Three types of workload: wordcount, search log aggregator, and similarity calculator.
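For reference (not shown in the slides), this test configuration corresponds to the standard Hadoop-0.x configuration keys for replication factor and block size:

```java
import org.apache.hadoop.conf.Configuration;

public class TestConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.setInt("dfs.replication", 4);                  // four replicas per block
        conf.setLong("dfs.block.size", 128L * 1024 * 1024); // 128 MB blocks
        System.out.println(conf.get("dfs.replication"));
    }
}
```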
Evaluation
Fig. 8: test set #1 has the smallest ratio of the number of nodes to the number of map tasks; test set #5 benefits from a significant reduction in shuffling overhead.
Fig. 7: we can observe that HPMR shows significantly better performance than native Hadoop for all of the test sets.
Evaluation
The prefetching latency is affected by disk overhead or network congestion; therefore, a long prefetching latency indicates that the corresponding node is heavily loaded.
The prefetching rate increases beyond 60%.
Evaluation
This means that HPMR assures consistent performance even in a shared environment such as the Yahoo! Grid, where the available bandwidth fluctuates severely (from 4 Kbps to 128 Kbps).
Conclusion
Two innovative schemes:
The prefetching scheme exploits data locality.
The pre-shuffling scheme reduces the network overhead required to shuffle key-value pairs.
HPMR is implemented as a plug-in type component for Hadoop.
HPMR improves the overall performance by up to 73% compared with native Hadoop.
As the next step, we plan to evaluate more complicated workloads such as HAMA (an open-source Apache incubator project).
Appendix: MapReduce Example
MapReduce example: analyzing a weather data set.
Each record is stored as one line, in ASCII form; within a file, each field is stored at a fixed length with no delimiter.
Example record: 0057332130999991950010103004+51317+028783FM-12+017199999V0203201N00721004501CN0100001N9-01281-01391102681
Query: from the NCDC data files written between 1901 and 2001, find the highest temperature (F) for each year.
Input: data files in chunk (64 MB) units
1st Map: extract <offset, record> from each file
2nd Map: extract <year, temperature> from each record
Shuffle: organize the data into groups by year
Reduce: merge and return the final results
Appendix: MapReduce Example
1st Map: extract <Offset, Record> from the file; <Key_1, Value> = <offset, record>
<0, 0067011990999991950051507004...9999999N9+00001+99999999999...>
<106, 0043011990999991950051512004...9999999N9+00221+99999999999...>
<212, 0043011990999991950051518004...9999999N9-00111+99999999999...>
<318, 0043012650999991949032412004...0500001N9+01111+99999999999...>
<424, 0043012650999991949032418004...0500001N9+00781+99999999999...>
...
2nd Map: extract Year and Temp from each record; <Key_2, Value> = <year, temp>
<1950, 0>
<1950, 22>
<1950, −11>
<1949, 111>
<1949, 78>
…
(year, temperature)
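The 2nd Map step corresponds closely to the MaxTemperatureMapper from Hadoop: The Definitive Guide, which this appendix follows; the fixed character offsets (15-19 for the year, 87-92 for the signed temperature) come from the NCDC record layout, and quality filtering is omitted for brevity.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch of the 2nd Map step: pull <year, temperature> out of each
// fixed-width NCDC record.
public class MaxTemperatureMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(15, 19);
        // The temperature is signed, e.g. "+0022" or "-0011"
        int airTemperature;
        if (line.charAt(87) == '+') {
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        context.write(new Text(year), new IntWritable(airTemperature));
    }
}
```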
Appendix: MapReduce Example
Shuffle
Because the 2nd Map produces too many results, they are reorganized into per-year data groups; this reduces the processing cost when merging in the Reduce step.
Reduce: merge the candidate sets from all Maps and return the final result.
<1950, 0> <1950, 22> <1950, −11> <1949, 111> <1949, 78>
→ <1949, [111, 78]> <1950, [0, 22, −11]>
[Figure: 2nd Map and Shuffle. Mapper_1 emits (1950, [0, 22, −11]) and (1949, [111, 78]); Mapper_2 emits (1950, [25, 15]) and (1949, [30, 45]). After the shuffle, the Reducer receives (1950, [0, 22, −11, 25, 15]) and (1949, [111, 78, 30, 45]) and outputs (1950, 25) and (1949, 111).]
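The Reduce step likewise corresponds to the book's MaxTemperatureReducer: merge every mapper's candidate list for a year and keep the maximum, e.g. (1950, [0, 22, −11, 25, 15]) → (1950, 25).

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch of the Reduce step: return the maximum temperature per year.
public class MaxTemperatureReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}
```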
Appendix: Hadoop: The Definitive Guide, pp. 19-20