MapReduce: Simplified Data Processing on Large Clusters
DESCRIPTION
MapReduce: Simplified Data Processing on Large Clusters. 2009-21146 Lim JunSeok. Contents: 1. Introduction, 2. Programming Model, 3. Structure, 4. Performance & Experience, 5. Conclusion.
TRANSCRIPT
MapReduce: Simplified Data Processing on Large Clusters
2009-21146 Lim JunSeok
2
1. Introduction
2. Programming Model
3. Structure
4. Performance & Experience
5. Conclusion
Contents
3
Introduction
4
Introduction: What is MapReduce?
A simple and powerful interface that enables automatic parallelization and distribution of large-scale computations.
A programming model that executes processing in a distributed manner, exploiting a large set of commodity computers for large data sets (> 1 TB),
with an underlying runtime system that:
parallelizes the computation across large-scale clusters of machines
handles machine failures
schedules inter-machine communication to make efficient use of the network and disks
5
Motivation: Want to process lots of data (> 1 TB)
E.g. Raw data: crawled documents, Web request logs, …
Derived data: inverted indices, summaries of the number of pages, a set of the most frequent queries in a given day
Want to parallelize across hundreds/thousands of CPUs
And want to make all of this easy
Figures: Google data centers, the distributed file system; The Digital Universe 2009-2020
6
Motivation: Application: sifting through large amounts of data
Used for:
Generating the Google search index
Clustering problems for Google News and Froogle products
Extraction of data used to produce reports of popular queries
Large-scale graph computation
Large-scale machine learning
…
Figures: Google Search, PageRank, machine learning
7
Motivation: Platform: clusters of inexpensive machines
Commodity computers (15,000 machines in 2003)
Scale to large clusters: thousands of machines
Data distributed and replicated across the machines of the cluster
Recover from machine failure
Examples: Hadoop, Google File System
8
Programming Model
9
MapReduce Programming Model: the MapReduce framework
Partitioning function: the default (hash(key) mod R in the original paper) yields fairly well-balanced partitions; the partitioning function can also be specified by the user (see the sketch below)
Figure: Map → Partitioning function → Reduce
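As a concrete illustration, here is a minimal Python sketch (my own, not code from the slides) of the default partitioner and of a user-supplied alternative mentioned in the original paper; the host-based variant assumes the intermediate keys are URL strings:

```python
from urllib.parse import urlparse

# Default partitioning function: hash(key) mod R tends to give
# fairly well-balanced partitions across the R reduce tasks.
def default_partition(key: str, num_reduce_tasks: int) -> int:
    return hash(key) % num_reduce_tasks

# User-specified alternative (example from the original paper): partition by
# hash(Hostname(urlkey)) mod R so that all URLs from the same host end up in
# the same output file. Assumes the keys are URL strings.
def host_partition(url_key: str, num_reduce_tasks: int) -> int:
    host = urlparse(url_key).hostname or ""
    return hash(host) % num_reduce_tasks
```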
10
MapReduce Programming Model: Map phase
Local computation: process each record independently and locally
Reduce phase: aggregate the filtered output
Figure: commodity computers: local storage → Map → Reduce → result
11
Example: Word Counting
File 1: "Hello World Bye SQL"
File 2: "Hello Map Bye Reduce"
Map procedure:
File 1 → <Hello, 1> <World, 1> <Bye, 1> <SQL, 1>
File 2 → <Hello, 1> <Map, 1> <Bye, 1> <Reduce, 1>
Partitioning function (group by key):
Partition 1 → <Hello, {1,1}> <World, 1> <Map, 1>
Partition 2 → <Bye, {1,1}> <SQL, 1> <Reduce, 1>
Reduce procedure:
<Hello, 2> <World, 1> <Map, 1> <Bye, 2> <SQL, 1> <Reduce, 1>
A runnable sketch of this example follows.
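Below is a minimal, single-process Python sketch of this word-count example (my own illustration, not code from the slides or the paper); the in-memory grouping step stands in for the partitioning/shuffle phase:

```python
from collections import defaultdict

def word_count_map(_doc_name, contents):
    # Emit <word, 1> for every word in the document.
    for word in contents.split():
        yield word, 1

def word_count_reduce(word, counts):
    # Sum the partial counts for one word.
    return word, sum(counts)

def run_word_count(documents):
    # Map phase: apply the map function to every document.
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for word, count in word_count_map(name, contents):
            intermediate[word].append(count)   # group by key (the "shuffle")
    # Reduce phase: aggregate all counts for each word.
    return dict(word_count_reduce(w, c) for w, c in intermediate.items())

if __name__ == "__main__":
    docs = {"File 1": "Hello World Bye SQL", "File 2": "Hello Map Bye Reduce"}
    print(run_word_count(docs))
    # -> {'Hello': 2, 'World': 1, 'Bye': 2, 'SQL': 1, 'Map': 1, 'Reduce': 1}
```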
12
Example: PageRank. PageRank review:
Link analysis algorithm
Let $W$ be the set of all Web pages, $B_p$ the set of pages that link to page $p$, $L(q)$ the total number of links going out of page $q$, and $PR(p)$ the PageRank of page $p$. In its simplest (undamped) form, each iteration computes
$$PR(p) = \sum_{q \in B_p} \frac{PR(q)}{L(q)}$$
starting from $PR(p) = 1/|W|$.
13
Example: PageRank. Key ideas for MapReduce:
The PageRank calculation only depends on the PageRank values of the previous iteration
The PageRank calculation for each Web page can therefore be processed in parallel
Algorithm:
Map: provide each page's PageRank 'fragments' to the pages it links to
Reduce: sum up the PageRank fragments for each page
14
Example: PageRank. Key ideas for MapReduce
15
Example: PageRank. PageRank calculation with 4 pages
16
Example: PageRank. Map phase: provide each page's PageRank 'fragments' to the pages it links to
Figures: PageRank fragment computation of page 1; PageRank fragment computation of page 2
17
Example: PageRank. Map phase: provide each page's PageRank 'fragments' to the pages it links to
Figures: PageRank fragment computation of page 3; PageRank fragment computation of page 4
18
Example: PageRank. Reduce phase: sum up the PageRank fragments for each page (a toy sketch of one full iteration follows)
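The following is a toy single-process Python sketch of one PageRank iteration in map/reduce style (my own illustration; the 4-page graph and all names are hypothetical, and the damping factor is omitted to match the simplified formula above). The map step emits rank fragments along outgoing links, and the reduce step sums the fragments arriving at each page:

```python
from collections import defaultdict

def pagerank_map(page, state):
    rank, out_links = state
    # Emit a rank "fragment" to every page this page links to.
    for target in out_links:
        yield target, rank / len(out_links)
    # Forward the link structure so it is available for the next iteration.
    yield page, out_links

def pagerank_reduce(page, values):
    out_links, rank = [], 0.0
    for v in values:
        if isinstance(v, list):   # the forwarded link structure
            out_links = v
        else:                     # a rank fragment
            rank += v
    return page, (rank, out_links)

def one_iteration(graph):
    intermediate = defaultdict(list)
    for page, state in graph.items():            # map phase
        for key, value in pagerank_map(page, state):
            intermediate[key].append(value)
    return dict(pagerank_reduce(p, vals)          # reduce phase
                for p, vals in intermediate.items())

# Hypothetical 4-page graph: page -> (initial rank, outgoing links).
graph = {"p1": (0.25, ["p2", "p3"]), "p2": (0.25, ["p3"]),
         "p3": (0.25, ["p1", "p4"]), "p4": (0.25, ["p1"])}
print(one_iteration(graph))
```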
19
Structure
20
Execution Overview
(1) Split the input files into M pieces of 16-64 MB per piece. Then start many copies of the program.
(2) The master is special: the rest are workers that are assigned work by the master.
M map tasks and R reduce tasks
(3) Map phase:
The assigned worker reads the input files
Parses the input data into key/value pairs
Produces intermediate key/value pairs
21
Execution Overview
(4) Buffered pairs are written to local disk, partitioned into R regions by the partitioning function
The locations are passed back to the master
The master forwards these locations to the reduce workers
(5) Reduce phase 1: read and sort
Each reduce worker reads the intermediate data for its partition
Sorts the intermediate key/value pairs to group the data by key
22
Execution Overview
(6) Reduce phase 2: reduce function
Iterate over the sorted intermediate data, passing each key and its values to the reduce function
The output is appended to a final output file for this reduce partition
(7) Return to user code
The master wakes up the user program and returns control to the user code
A toy end-to-end sketch of these steps follows.
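To make the seven steps concrete, here is a toy single-process Python walk-through (my own illustration; real MapReduce runs the same steps across many machines under a master): split the input into M pieces, run the map tasks, partition their output into R regions with hash(key) mod R, group each region by key, apply the reduce function, and hand the R output "files" back to the caller.

```python
from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn, M=4, R=2):
    # (1) Split the input into M pieces.
    splits = [records[i::M] for i in range(M)]

    # (3) Map phase: each map task parses its split into key/value pairs
    #     and produces intermediate pairs.
    # (4) Partition the intermediate pairs into R regions by hash(key) mod R.
    regions = [defaultdict(list) for _ in range(R)]
    for split in splits:                         # one iteration = one map task
        for key, value in split:
            for k, v in map_fn(key, value):
                regions[hash(k) % R][k].append(v)

    # (5) Reduce phase 1: each reduce task reads its region and groups/sorts by key.
    # (6) Reduce phase 2: apply the reduce function and append the results to
    #     the task's output "file" (here: a list).
    outputs = [[reduce_fn(k, vs) for k, vs in sorted(region.items())]
               for region in regions]

    # (7) Return to user code, one output per reduce task.
    return outputs

# Example use with a trivial word count:
docs = [("File 1", "Hello World Bye SQL"), ("File 2", "Hello Map Bye Reduce")]
print(run_mapreduce(docs,
                    map_fn=lambda name, text: ((w, 1) for w in text.split()),
                    reduce_fn=lambda word, counts: (word, sum(counts))))
```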
23
Failure Tolerance: worker failure, handled via re-execution
Failure detection: heartbeat; the master pings every worker periodically
Handling failure: re-execution (see the sketch below)
Map tasks:
Re-execute both completed and in-progress map tasks, since their output is stored on the failed machine's local disk
Reset the state of those map tasks and re-schedule them
Reduce tasks:
Re-execute only in-progress reduce tasks
Completed reduce tasks do NOT need to be re-executed: their results are stored in the global file system
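A small Python sketch (my own illustration, with hypothetical task records) of the rule above: map tasks on a failed worker are reset whether completed or in progress, while only in-progress reduce tasks are redone.

```python
def tasks_to_reschedule(tasks, failed_worker):
    """tasks: dicts like {'kind': 'map'|'reduce', 'state': ..., 'worker': ...}."""
    redo = []
    for t in tasks:
        if t["worker"] != failed_worker:
            continue
        if t["kind"] == "map" and t["state"] in ("in_progress", "completed"):
            # Map output lives on the failed machine's local disk -> re-execute.
            redo.append(t)
        elif t["kind"] == "reduce" and t["state"] == "in_progress":
            # Completed reduce output is already in the global file system -> keep it.
            redo.append(t)
    for t in redo:
        t["state"] = "idle"   # reset so the master can schedule it on another worker
    return redo
```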
24
Failure Tolerance: master failure
Job state is checkpointed to the global file system
A new master recovers and continues the tasks from the checkpoint
Robust to large-scale worker failure: simply re-execute the tasks! Simply start a new master!
E.g. once lost 1,600 of 1,800 machines, but still finished fine.
25
Locality: network bandwidth is a relatively scarce resource
Input data is stored on the local disks of the machines: GFS divides each file into 64 MB blocks and stores several copies of each block on different machines
Local computation: the master takes into account the locations of the input data's replicas
A map task is scheduled on a machine that contains a replica of its input data
Failing that, the master schedules the map task near a replica, e.g. on a worker on the same network switch (see the sketch below)
Most input data is therefore read locally and consumes no network bandwidth
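A rough Python sketch of that scheduling preference (my own illustration; `same_switch` and the data structures are hypothetical): prefer a worker that holds a replica of the input block, then one on the same switch as a replica, then any idle worker.

```python
def pick_worker_for_map(block, idle_workers, replica_locations, same_switch):
    replicas = replica_locations[block]          # machines holding a copy of the block
    for w in idle_workers:
        if w in replicas:
            return w                             # data-local: input read from local disk
    for w in idle_workers:
        if any(same_switch(w, r) for r in replicas):
            return w                             # switch-local: one cheap network hop
    return idle_workers[0] if idle_workers else None   # fall back to any idle worker
```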
26
Task Granularity: fine-granularity tasks
Many more map tasks than machines
When a worker fails, the many map tasks it has completed can be spread out across all the other worker machines
Practical bounds on the sizes of M and R: the master makes O(M + R) scheduling decisions and keeps O(M × R) state in memory
The constant factors for memory usage are small: the state is approximately one byte of data per map/reduce task pair
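For a rough sense of scale (my own arithmetic, using the M = 200,000 and R = 5,000 figures quoted in the original paper): the master makes on the order of M + R ≈ 205,000 scheduling decisions and holds roughly M × R = 10^9 map/reduce pairs of state, i.e. about 1 GB at one byte per pair.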
27
Backup Tasks: slow workers ("stragglers") significantly lengthen completion time
Other jobs consuming resources on the machine
Bad disks with soft errors that make data transfer very slow
Weird things, e.g. processor caches disabled
Solution: near the end of a phase, spawn backup copies of the remaining in-progress tasks
Whichever copy finishes first wins (see the sketch below)
As a result, job completion time is dramatically shortened: e.g. the sort job takes 44% longer to complete if the backup task mechanism is disabled
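A minimal Python sketch of the idea (my own illustration): submit a duplicate of an (idempotent) straggling task and take whichever copy finishes first.

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def run_with_backup(task_fn, task_arg, pool: ThreadPoolExecutor):
    # Launch the primary attempt and a backup copy of the same task;
    # whichever finishes first "wins" and the other result is ignored.
    primary = pool.submit(task_fn, task_arg)
    backup = pool.submit(task_fn, task_arg)
    done, _ = wait([primary, backup], return_when=FIRST_COMPLETED)
    return next(iter(done)).result()
```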
28
Performance & Experience
29
Performance: experiment setting
1,800 machines, each with 4 GB of memory, dual-processor 2 GHz Xeons with Hyper-Threading, dual 160 GB IDE disks, and gigabit Ethernet
Approximately 100-200 Gbps of aggregate bandwidth
30
Performance: MR_Grep: a grep task with MapReduce
Grep: search for a relatively rare three-character pattern through 1 terabyte of data
The input rate falls back to zero about 80 seconds into the computation
The computation peaks at over 30 GB/s when 1,764 workers are assigned
The locality optimization helps: without it, the rack switches would limit the rate to 10 GB/s
Figure: data transfer rate over time
31
Performance: MR_Sort: a sorting task with MapReduce
Sort: sort 1 terabyte of 100-byte records
Takes about 14 minutes
The input rate is higher than the shuffle rate and the output rate, thanks to locality
The shuffle rate is higher than the output rate because the output phase writes two copies for reliability
32
Performance: MR_Sort: backup tasks and failure tolerance
Backup tasks reduce job completion time significantly
The system deals well with failures
33
Experience: large-scale indexing
MapReduce is used for the Google Web search service
As a result:
The indexing code is simpler, smaller, and easier to understand
Performance is good enough
It is easy to change the indexing process: changes that used to take a few months now take a few days
MapReduce takes care of failures and slow machines
It is easy to make indexing faster by adding more machines
34
Experience
The number of MapReduce instances grows significantly over time:
2003/02: first version
2004/09: almost 900
2006/03: about 4,000
2007/01: over 6,000
Figure: MapReduce instances over time
35
Experience
New MapReduce programs per month: the number of new MapReduce programs increases continuously
36
Experience: MapReduce statistics for different months

                                 Aug. '04    Mar. '06    Sep. '07
Number of jobs (1000s)               39         171       2,217
Avg. completion time (secs)         634         874         395
Machine years used                  217       2,002      11,081
Map input data (TB)               3,288      52,254     403,152
Map output data (TB)                758       6,743      34,774
Reduce output data (TB)             193       2,970      14,018
Avg. machines per job               157         268         394
Unique map implementations          395       1,958       4,083
Unique reduce implementations       269       1,208       2,418
37
Conclusion
38
Is every task suitable for MapReduce? NOT every task is suitable for MapReduce.
Suitable if…
Have a cluster; local computation
Working with a large dataset
Working with independent data
The information to share across the cluster is small
e.g. word count, grep, K-means clustering, PageRank
NOT suitable if…
Cannot work independently with the data
Cannot be cast into Map and Reduce
The information to share across the cluster is large (exponential or even linear in size)
e.g. the Fibonacci series
39
Is it a trend? Really? Job market trend:
"World says 'No' to NoSQL" – written by IBM (2011.9, BNT RackSwitch G8264)
Compared to SQL, it is much harder to learn, and it cannot solve all problems in the world
E.g. Fibonacci
Mainstream enterprises don't need it: they already have engineers skilled in other languages
Figure: percentage of matching job postings
SQL: 4%; MapReduce: close to 0%
40
Conclusion: focus on the problem;
let the library deal with the messy details of automatic parallelization and distribution
MapReduce has proven to be a useful abstraction
MapReduce greatly simplifies large-scale computations at Google
The functional programming paradigm can be applied to large-scale applications
41
EOD