MapReduce: Simplified Data Processing on Large Clusters

2009-21146 Lim JunSeok

TRANSCRIPT

Page 1: MapReduce : Simplified Data Processing on Large Clusters

MapReduce: Simplified Data Processing on Large Clusters

2009-21146 Lim JunSeok

Page 2: MapReduce : Simplified Data Processing on Large Clusters

Contents

1. Introduction
2. Programming Model
3. Structure
4. Performance & Experience
5. Conclusion

Page 3: MapReduce : Simplified Data Processing on Large Clusters


Introduction

Page 4: MapReduce : Simplified Data Processing on Large Clusters

Introduction: What is MapReduce?

A simple and powerful interface that enables automatic parallelization and distribution of large-scale computations.

A programming model that:
- executes processing in a distributed manner
- exploits large sets of commodity computers for large data sets (> 1 TB)

with an underlying runtime system that:
- parallelizes the computation across large-scale clusters of machines
- handles machine failures
- schedules inter-machine communication to make efficient use of the network and disk

Page 5: MapReduce : Simplified Data Processing on Large Clusters

Motivation: want to process lots of data (> 1 TB)

- E.g. raw data: crawled documents, Web request logs, ...
- Derived data: inverted indices, summaries of the number of pages, the set of most frequent queries in a given day

Want to parallelize across hundreds/thousands of CPUs.

And want to make all this easy.

[Figures: Google data centers; the distributed Google File System; The Digital Universe 2009-2020]

Page 6: MapReduce : Simplified Data Processing on Large Clusters

Motivation: applications that sift through large amounts of data

Used for:
- Generating the Google search index
- Clustering problems for the Google News and Froogle products
- Extraction of data used to produce reports of popular queries
- Large-scale graph computation
- Large-scale machine learning
- ...

[Figures: Google Search; PageRank; machine learning]

Page 7: MapReduce : Simplified Data Processing on Large Clusters

Motivation: the platform is clusters of inexpensive machines

- Commodity computers (15,000 machines in 2003)
- Scales to large clusters: thousands of machines
- Data is distributed and replicated across the machines of the cluster
- Recovers from machine failure
- Hadoop, Google File System

[Figures: Hadoop; Google File System]

Page 8: MapReduce : Simplified Data Processing on Large Clusters


Programming Model

Page 9: MapReduce : Simplified Data Processing on Large Clusters

MapReduce Programming Model: the MapReduce framework

- Map
- Reduce
- Partitioning function: the default produces well-balanced partitions; a different partitioning function can be specified by the user (a sketch follows below).
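A minimal sketch of the partitioning interface, assuming the paper's default of hash(key) mod R (crc32 stands in for the hash so runs are deterministic); the hostname-based variant is the paper's own example of a user-specified partitioner:

```python
import zlib
from urllib.parse import urlparse

def default_partition(key: str, R: int) -> int:
    # Default partitioner: hash(key) mod R spreads keys evenly over the
    # R reduce tasks, giving well-balanced partitions for typical key sets.
    return zlib.crc32(key.encode()) % R

def hostname_partition(url: str, R: int) -> int:
    # User-specified variant from the paper: partition URLs by hostname so
    # all pages from the same host land in the same output partition.
    return zlib.crc32(urlparse(url).hostname.encode()) % R
```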

Page 10: MapReduce : Simplified Data Processing on Large Clusters

MapReduce Programming Model

Map phase:
- Local computation
- Process each record independently and locally

Reduce phase:
- Aggregate the filtered output

[Figure: local storage -> Map -> Reduce -> result, running on commodity computers]

Page 11: MapReduce : Simplified Data Processing on Large Clusters

Example: Word Counting

File 1: "Hello World Bye SQL"
File 2: "Hello Map Bye Reduce"

Map procedure:
File 1 -> <Hello, 1> <World, 1> <Bye, 1> <SQL, 1>
File 2 -> <Hello, 1> <Map, 1> <Bye, 1> <Reduce, 1>

Partitioning function (groups values by key):
<Hello, {1,1}> <World, 1> <Map, 1>
<Bye, {1,1}> <SQL, 1> <Reduce, 1>

Reduce procedure:
<Hello, 2> <World, 1> <Map, 1> <Bye, 2> <SQL, 1> <Reduce, 1>

A runnable sketch of this pipeline follows below.
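A minimal single-process sketch of the word-count example; the grouping that the MapReduce framework performs between map and reduce is simulated with a dictionary:

```python
from collections import defaultdict

def map_fn(document: str):
    for word in document.split():
        yield word, 1                      # emit <word, 1> per occurrence

def reduce_fn(word: str, counts):
    yield word, sum(counts)                # <Hello, {1,1}> -> <Hello, 2>

files = ["Hello World Bye SQL", "Hello Map Bye Reduce"]

grouped = defaultdict(list)                # grouping done by the framework
for doc in files:
    for word, count in map_fn(doc):
        grouped[word].append(count)

for word, counts in grouped.items():
    for key, total in reduce_fn(word, counts):
        print(key, total)                  # Hello 2, World 1, Bye 2, ...
```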

Page 12: MapReduce : Simplified Data Processing on Large Clusters

Example: PageRank

PageRank review: a link-analysis algorithm.

$$PR(u) = \sum_{v \in B_u} \frac{PR(v)}{L(v)}$$

where
- $P$: the set of all Web pages
- $B_u \subseteq P$: the set of pages that link to page $u$
- $L(v)$: the total number of links going out of $v$
- $PR(u)$: the PageRank of page $u$

Page 13: MapReduce : Simplified Data Processing on Large Clusters

Example: PageRank

Key ideas for MapReduce:
- The PageRank calculation depends only on the PageRank values of the previous iteration
- The PageRank calculation of each Web page can therefore be processed in parallel

Algorithm:
- Map: provide each page's PageRank 'fragments' to the pages it links to
- Reduce: sum up the PageRank fragments for each page (a one-iteration sketch follows below)
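A minimal sketch of one such iteration, following the slide's algorithm and the formula two slides back (no damping factor; every page is assumed to have at least one outgoing link):

```python
from collections import defaultdict

def pagerank_iteration(graph, ranks):
    """graph: page -> list of pages it links to; ranks: page -> PR value."""
    # Map: each page v sends a 'fragment' PR(v)/L(v) to every page it
    # links to.
    fragments = defaultdict(list)
    for page, out_links in graph.items():
        share = ranks[page] / len(out_links)   # assumes len(out_links) > 0
        for target in out_links:
            fragments[target].append(share)
    # Reduce: the new PageRank of a page is the sum of the fragments it
    # received.
    return {page: sum(shares) for page, shares in fragments.items()}
```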

Page 14: MapReduce : Simplified Data Processing on Large Clusters

Example: PageRank

[Figure: key ideas for MapReduce]

Page 15: MapReduce : Simplified Data Processing on Large Clusters

Example: PageRank

[Figure: PageRank calculation with 4 pages]

Page 16: MapReduce : Simplified Data Processing on Large Clusters

Example: PageRank

Map phase: provide each page's PageRank 'fragments' to the pages it links to.

[Figures: PageRank fragment computation of page 1; PageRank fragment computation of page 2]

Page 17: MapReduce : Simplified Data Processing on Large Clusters

Example: PageRank

Map phase: provide each page's PageRank 'fragments' to the pages it links to.

[Figures: PageRank fragment computation of page 3; PageRank fragment computation of page 4]

Page 18: MapReduce : Simplified Data Processing on Large Clusters

Example: PageRank

Reduce phase: sum up the PageRank fragments for each page. A worked pass over a small graph is shown below.
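A worked pass using the pagerank_iteration sketch from the algorithm slide; the 4-page link structure below is assumed for illustration, since the slides' actual graph lives in the figures:

```python
graph = {
    "p1": ["p2", "p3"],        # page 1 links to pages 2 and 3
    "p2": ["p3", "p4"],
    "p3": ["p1", "p4"],
    "p4": ["p1"],
}
ranks = {page: 0.25 for page in graph}   # uniform initial PageRank

for _ in range(20):                       # iterate toward convergence
    ranks = pagerank_iteration(graph, ranks)

print(ranks)   # the four values still sum to 1.0
```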

Page 19: MapReduce : Simplified Data Processing on Large Clusters


Structure

Page 20: MapReduce : Simplified Data Processing on Large Clusters

Execution Overview

(1) Split the input files into M pieces of 16-64 MB per piece, then start many copies of the program.

(2) The master is special: the rest are workers that are assigned work by the master (M map tasks and R reduce tasks).

(3) Map phase:
- The assigned worker reads the input files
- Parses the input data into key/value pairs
- Produces intermediate key/value pairs

Page 21: MapReduce : Simplified Data Processing on Large Clusters

Execution Overview

(4) Buffered pairs are written to local disk, partitioned into R regions by the partitioning function.
- The locations are passed back to the master
- The master forwards these locations to the reduce workers

(5) Reduce phase 1: read and sort
- Reduce workers read the intermediate data for their partition
- Sort the intermediate key/value pairs to group the data by key

Page 22: MapReduce : Simplified Data Processing on Large Clusters

Execution Overview

(6) Reduce phase 2: the reduce function
- Iterate over the sorted intermediate data, passing each key and its values to the reduce function
- The output is appended to a final output file for this reduce partition

(7) Return to user code
- The master wakes up the user program
- Control returns to the user code

The whole data flow of steps (1)-(7) is sketched below.
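A minimal in-memory sketch of steps (1)-(7): split into map tasks, partition intermediate pairs into R regions, sort/group by key, reduce each group. No distribution, disk I/O, or failure handling, only the data flow:

```python
from collections import defaultdict
import zlib

def run_mapreduce(splits, map_fn, reduce_fn, R=2):
    # (1)+(3) Map phase: run one map task per input split.
    regions = [defaultdict(list) for _ in range(R)]
    for split in splits:
        for key, value in map_fn(split):
            # (4) Partition each intermediate pair into one of R regions.
            regions[zlib.crc32(key.encode()) % R][key].append(value)
    # (5)+(6) Reduce phase: sort keys to group them, then reduce each group.
    output = []
    for region in regions:
        for key in sorted(region):
            output.extend(reduce_fn(key, region[key]))
    return output   # (7) control returns to the "user code"
```

With the word-count map_fn, reduce_fn, and files from the earlier sketch, print(run_mapreduce(files, map_fn, reduce_fn)) reproduces the word counts.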

Page 23: MapReduce : Simplified Data Processing on Large Clusters

Failure Tolerance: handled via re-execution

Worker failure:
- Failure detection: heartbeat. The master pings every worker periodically.
- Handling failure: re-execution
  - Map tasks: re-execute both completed and in-progress map tasks, since map results are stored on the failed machine's local disk. Reset the state of those map tasks and re-schedule them.
  - Reduce tasks: re-execute only in-progress reduce tasks. Completed reduce tasks do NOT need to be re-executed: their results are stored in the global file system. (A sketch of this rule follows below.)
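A minimal sketch (hypothetical task representation, not Google's code) of just this re-execution rule; the asymmetry comes from where each task kind stores its results:

```python
from dataclasses import dataclass

@dataclass
class Task:
    kind: str                 # "map" or "reduce"
    state: str                # "idle", "in_progress", or "completed"
    worker: str | None = None

def reschedule_on_worker_failure(tasks, failed_worker):
    for task in tasks:
        if task.worker != failed_worker:
            continue
        if task.kind == "map" and task.state in ("in_progress", "completed"):
            # Map output lives on the failed machine's local disk, so even
            # completed map tasks must be reset and re-scheduled.
            task.state, task.worker = "idle", None
        elif task.kind == "reduce" and task.state == "in_progress":
            # Completed reduce output is already in the global file system,
            # so only in-progress reduce tasks are reset.
            task.state, task.worker = "idle", None
```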

Page 24: MapReduce : Simplified Data Processing on Large Clusters

Failure Tolerance

Master failure:
- Job state is checkpointed to the global file system
- A new master recovers and continues the tasks from the checkpoint

Robust even to large-scale worker failure: simply re-execute the tasks, simply start a new master.
- E.g. one job lost 1,600 of 1,800 machines, but finished fine.

Page 25: MapReduce : Simplified Data Processing on Large Clusters

Locality

Network bandwidth is a relatively scarce resource.
- Input data is stored on the local disks of the machines
- GFS divides each file into 64 MB blocks
- Several copies of each block are stored on different machines

Local computation (a placement sketch follows below):
- The master takes into account the locations of the input data's replicas
- A map task is scheduled on a machine whose local disk holds a replica of the input data
- Failing that, the master schedules the map task near a replica, e.g. on a worker on the same network switch
- As a result, most input data is read locally and consumes no network bandwidth
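A minimal sketch (hypothetical helper names) of that placement preference: a replica-local machine first, then the same rack/switch, then anywhere:

```python
def pick_worker_for_map_task(replica_hosts, idle_workers, rack_of):
    """replica_hosts: machines holding a replica of this input split.
    rack_of: machine -> rack/switch identifier."""
    # 1st choice: a worker that already holds a replica (no network I/O).
    for worker in idle_workers:
        if worker in replica_hosts:
            return worker
    # 2nd choice: a worker on the same rack/switch as some replica.
    replica_racks = {rack_of[host] for host in replica_hosts}
    for worker in idle_workers:
        if rack_of[worker] in replica_racks:
            return worker
    # Last resort: any idle worker; the split is then read over the network.
    return idle_workers[0]
```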

Page 26: MapReduce : Simplified Data Processing on Large Clusters

Task Granularity: fine-grained tasks

- Many more map tasks than machines
- The map tasks of a failed machine can then be spread out across all the other worker machines

Practical bounds on the sizes of M and R: the master makes scheduling decisions for every task and keeps state for every map/reduce task pair in memory.
- The constant factors for memory usage are small: one piece of state is approximately one byte per map/reduce task pair (a worked estimate follows below)
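As a rough worked example, assuming the typical configuration reported in the paper (M = 200,000 and R = 5,000 on 2,000 worker machines): at about one byte per map/reduce task pair, the master holds roughly M × R = 10^9 bytes, i.e. on the order of 1 GB of scheduling state, which is why M and R cannot be made arbitrarily large.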

Page 27: MapReduce : Simplified Data Processing on Large Clusters

Backup Tasks

Slow workers significantly lengthen completion time:
- Other jobs consuming resources on the machine
- Bad disks with soft errors: data transfers very slowly
- Weird things: processor caches disabled

Solution (a sketch follows below):
- Near the end of a phase, spawn backup copies of the remaining in-progress tasks
- Whichever copy finishes first wins
- As a result, job completion time is dramatically shortened
- E.g. sorting takes 44% longer to complete if the backup task mechanism is disabled
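A minimal sketch of the first-finisher-wins idea using Python threads (not Google's scheduler; a real master also kills the losing copy, which threads cannot do, so the duplicate here simply runs to completion):

```python
import concurrent.futures
import random
import time

def simulated_task(task_id: int) -> int:
    # Simulate stragglers: most runs are fast, an unlucky few are very slow.
    time.sleep(5.0 if random.random() < 0.1 else 0.1)
    return task_id * task_id          # the task's deterministic result

def run_with_backup(pool, task_id: int) -> int:
    primary = pool.submit(simulated_task, task_id)
    backup = pool.submit(simulated_task, task_id)   # backup copy
    done, _ = concurrent.futures.wait(
        [primary, backup],
        return_when=concurrent.futures.FIRST_COMPLETED)
    return done.pop().result()        # whichever copy finishes first wins

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    print([run_with_backup(pool, i) for i in range(4)])
```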

Page 28: MapReduce : Simplified Data Processing on Large Clusters


Performance & Experience

Page 29: MapReduce : Simplified Data Processing on Large Clusters

Performance: experiment setting

- 1,800 machines
- 4 GB of memory per machine
- Dual-processor 2 GHz Xeons with Hyper-Threading
- Dual 160 GB IDE disks
- Gigabit Ethernet per machine
- Approximately 100-200 Gbps of aggregate bandwidth

Page 30: MapReduce : Simplified Data Processing on Large Clusters

Performance: MR_Grep (grep with MapReduce)

- Grep: search for a relatively rare three-character pattern through 1 terabyte of data
- The input rate drops to zero after about 80 seconds, when the scan completes
- Computation peaks at over 30 GB/s when 1,764 workers are assigned
- The locality optimization helps: without it, rack switches would limit the rate to 10 GB/s

[Figure: data transfer rate over time]

Page 31: MapReduce : Simplified Data Processing on Large Clusters

Performance: MR_Sort (sorting with MapReduce)

- Sort: sort 1 terabyte of 100-byte records
- Takes about 14 minutes
- The input rate is higher than the shuffle rate and the output rate, thanks to locality
- The shuffle rate is higher than the output rate, because the output phase writes two copies of each record for reliability

Page 32: MapReduce : Simplified Data Processing on Large Clusters

Performance: MR_Sort, backup tasks and failure tolerance

- Backup tasks reduce job completion time significantly
- The system deals well with failures

Page 33: MapReduce : Simplified Data Processing on Large Clusters

Experience: large-scale indexing

MapReduce is used for the Google Web search service. As a result:
- The indexing code is simpler, smaller, and easier to understand
- Performance is good enough
- It is easy to change the indexing process: changes that took a few months now take a few days
- MapReduce takes care of failures and slow machines
- It is easy to make indexing faster by adding more machines

Page 34: MapReduce : Simplified Data Processing on Large Clusters

Experience

The number of MapReduce instances grows significantly over time:
- 2003/02: first version
- 2004/09: almost 900
- 2006/03: about 4,000
- 2007/01: over 6,000

[Figure: MapReduce instances over time]

Page 35: MapReduce : Simplified Data Processing on Large Clusters

Experience: new MapReduce programs per month

The number of new MapReduce programs increases continuously.

[Figure: new MapReduce programs per month]

Page 36: MapReduce : Simplified Data Processing on Large Clusters

Experience: MapReduce statistics for different months

                               Aug. '04    Mar. '06    Sep. '07
Number of jobs (1,000s)              39         171       2,217
Avg. completion time (secs)         634         874         395
Machine years used                  217       2,002      11,081
Map input data (TB)               3,288      52,254     403,152
Map output data (TB)                758       6,743      34,774
Reduce output data (TB)             193       2,970      14,018
Avg. machines per job               157         268         394
Unique implementations
  Map                               395       1,958       4,083
  Reduce                            269       1,208       2,418

Page 37: MapReduce : Simplified Data Processing on Large Clusters


Conclusion

Page 38: MapReduce : Simplified Data Processing on Large Clusters

NOT every task is suitable for MapReduce.

Suitable if...
- You have a cluster (local computation)
- You are working with a large dataset
- You are working with independent data
- The information to share across the cluster is small
- E.g. word count, grep, K-means clustering, PageRank

NOT suitable if...
- The data cannot be processed independently
- The computation cannot be cast into Map and Reduce
- The information to share across the cluster is large (linear or even exponential in the input size)
- E.g. the Fibonacci series, where each value depends on the previous ones

Page 39: MapReduce : Simplified Data Processing on Large Clusters

Is it a trend? Really?

Job market trend:
- 'World says 'No' to NoSQL', IBM (2011.9, BNT RackSwitch G8264)
- Compared to SQL, MapReduce is much harder to learn
- It cannot solve all the problems in the world (e.g. Fibonacci)
- Mainstream enterprises don't need it: they already have engineers skilled in other languages

Percentage of matching job postings:
- SQL: 4%
- MapReduce: 0……%

Page 40: MapReduce : Simplified Data Processing on Large Clusters

Conclusion

- Focus on the problem; let the library deal with the messy details
  - Automatic parallelization and distribution
- MapReduce has proven to be a useful abstraction
- MapReduce simplifies large-scale computations at Google
- The functional programming paradigm can be applied to large-scale applications

Page 41: MapReduce : Simplified Data Processing on Large Clusters


EOD