MapReduce Study Report

MapReduce Implementation Intro.

Anty.Rao(@gmail.com)

Nov 2, 2011

Outline

• Map Reduce Overview
• Map Phase
• Reduce Phase
• Potential Optimization

Map Reduce Overview

Hadoop: The Definitive Guide

Map Phase

Map Phase Diagram

Steps of Map Phase

• Records emitted by the map function are written continuously into a circular buffer.

• When buffer usage exceeds io.sort.mb * io.sort.spill.percent, a spill starts: the records are sorted by partition and then by key, and the buffer is written to disk together with an index file that records where each partition begins.

• A final merge combines all the intermediate spill files into a single large file, plus an index file.
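
The spill step above can be sketched in a few lines. This is a simplified model, not Hadoop's actual code: the function names and the (partition, key, value) tuple layout are assumptions made for illustration, and the index stores record positions where a real index file would store byte offsets.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout for illustration: (partition, key, value).
Record = Tuple[int, str, str]


def spill_threshold_bytes(io_sort_mb: int, spill_percent: float) -> int:
    """Buffer usage, in bytes, at which a spill starts."""
    return int(io_sort_mb * 1024 * 1024 * spill_percent)


def spill(records: List[Record]) -> Tuple[List[Record], Dict[int, int]]:
    """Sort buffered records by (partition, key) and build a partition index.

    The index maps each partition to the position of its first record in
    the sorted output, standing in for the offsets kept in an index file.
    """
    ordered = sorted(records, key=lambda r: (r[0], r[1]))
    index: Dict[int, int] = {}
    for pos, (partition, _key, _value) in enumerate(ordered):
        # setdefault keeps only the first position seen for each partition.
        index.setdefault(partition, pos)
    return ordered, index
```

With io.sort.mb = 100 and io.sort.spill.percent = 0.80, for example, spill_threshold_bytes(100, 0.80) gives the usage level at which the spill would begin.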

Main map-side tuning Knobs

Reduce Phase

Reduce Phase Diagram

Steps of Reduce Phase

• Fetch map output from the map tasks. If space is available in memory and the size of the output is less than 25% * HeapSize * mapred.job.shuffle.input.buffer.percent, keep it in memory; otherwise store it directly on disk.
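
The placement rule above can be written out as a small decision function. It is a sketch that follows the rule exactly as stated on this slide; the function and parameter names are hypothetical, and the real shuffle code handles a full buffer in more detail than this model does.

```python
# The 25% factor from the step above: the largest share of the shuffle
# buffer that a single map output may occupy.
MAX_SINGLE_SEGMENT_FRACTION = 0.25


def place_map_output(
    output_bytes: int,
    used_bytes: int,
    heap_bytes: int,
    input_buffer_percent: float,
) -> str:
    """Return "memory" or "disk" for one fetched map output."""
    # Shuffle buffer = HeapSize * mapred.job.shuffle.input.buffer.percent.
    buffer_bytes = heap_bytes * input_buffer_percent
    single_limit = buffer_bytes * MAX_SINGLE_SEGMENT_FRACTION

    small_enough = output_bytes < single_limit
    has_space = used_bytes + output_bytes <= buffer_bytes
    return "memory" if small_enough and has_space else "disk"
```

For a 1000-byte heap with a buffer percent of 0.5, the buffer is 500 bytes and the single-output limit is 125 bytes, so a 100-byte output goes to memory while a 200-byte output goes to disk.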

Steps of Reduce Phase(Cont.)

• The merge operation merges and sorts data from memory and/or disk and writes the result to disk. It comes in two flavors:
– In-memory merge: triggered when the accumulated in-memory data exceeds mapred.job.shuffle.merge.percent of the shuffle buffer.
– On-disk merge: triggered when the number of files on disk exceeds a configured threshold.
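
The two merge triggers can be expressed as simple predicates. This is only an illustrative sketch: the function names are made up, and file_threshold is a placeholder for the configured value, which the slide does not specify.

```python
def should_merge_in_memory(
    used_bytes: int, buffer_bytes: int, merge_percent: float
) -> bool:
    """True once buffered data exceeds merge_percent of the shuffle buffer.

    merge_percent plays the role of mapred.job.shuffle.merge.percent.
    """
    return used_bytes > buffer_bytes * merge_percent


def should_merge_on_disk(num_files: int, file_threshold: int) -> bool:
    """True once the number of on-disk files exceeds the threshold.

    file_threshold is a hypothetical parameter standing in for the
    configured limit.
    """
    return num_files > file_threshold
```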

Steps of Reduce Phase(Cont.)

• When shuffle and sort complete, the following constraints must hold before data is fed to the reduce function:
– memory used for buffering reduce input can’t exceed mapred.job.reduce.input.buffer.percent of the heap;
– the number of files on disk can’t exceed io.sort.factor.
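
A minimal check of these two constraints might look like the following. The function name and argument layout are assumptions for illustration; the sketch only tests the conditions and does not perform the extra merging or flushing needed when they fail.

```python
def ready_for_reduce(
    in_memory_bytes: int,
    files_on_disk: int,
    heap_bytes: int,
    reduce_input_buffer_percent: float,
    io_sort_factor: int,
) -> bool:
    """True when both constraints for starting the reduce function hold."""
    # Budget set by mapred.job.reduce.input.buffer.percent.
    memory_budget = heap_bytes * reduce_input_buffer_percent
    memory_ok = in_memory_bytes <= memory_budget
    files_ok = files_on_disk <= io_sort_factor
    return memory_ok and files_ok
```

With a buffer percent of 0.0, any data still held in memory fails the check, so it has to be written to disk before the reduce function runs.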

Notes about Reduce

• Shuffle and sort use a percentage of the reduce task’s heap (mapred.job.shuffle.input.buffer.percent) to buffer shuffle data; this is possible because the reduce function can’t start until shuffle and sort complete. In the map phase, by contrast, the buffer size is set by io.sort.mb.

• Reduce input may consist of multiple files, not necessarily a single file. A heap-based iterator merges them on the fly to feed the reduce function.
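
The idea of feeding the reduce function from several sorted inputs through a heap can be sketched with Python's standard heapq.merge. This illustrates the technique rather than Hadoop's own iterator; feed_reduce and the (key, value) pair layout are assumptions for the example.

```python
import heapq
from itertools import groupby
from operator import itemgetter
from typing import Callable, Iterable, Iterator, List, Tuple

Pair = Tuple[str, int]  # hypothetical (key, value) layout


def feed_reduce(
    sorted_runs: List[Iterable[Pair]],
    reduce_fn: Callable[[str, List[int]], Pair],
) -> Iterator[Pair]:
    """Merge sorted runs lazily and call reduce_fn once per key.

    Each run must already be sorted by key, as a spill or merge output
    would be. heapq.merge keeps one head element per run in a heap, so
    the runs never have to be combined into a single file first.
    """
    merged = heapq.merge(*sorted_runs, key=itemgetter(0))
    for key, group in groupby(merged, key=itemgetter(0)):
        yield reduce_fn(key, [value for _key, value in group])


def sum_values(key: str, values: List[int]) -> Pair:
    """Example reduce function: sum all values for a key."""
    return key, sum(values)
```

Calling list(feed_reduce([run_a, run_b], sum_values)) on two key-sorted runs yields one summed pair per distinct key, in key order.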

Reduce-side Key parameters

Optimization Tuning

• We can use mapred.job.reduce.input.buffer.percent, which specifies how much memory can be set aside as the reduce input buffer.

• Compare the following cases:
– Case-1
– Case-2
– Case-3

Case-1

All reduce input resides on disk

Case-2

Part of the reduce input in memory, the rest on disk

Case-3

Much better, all data in memory

• If the reduce function doesn’t stress memory too much, we can spare some memory to buffer reduce input and boost overall performance.

• What’s more, if the input data is small, we can let reduces hold all intermediate data in memory, with no disk access involved.
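
To see how the buffer percent moves a job between the three cases, here is a deliberately simplified model. It assumes in-memory segments are kept in arrival order until the budget is used up, which may not match the order Hadoop actually flushes them in; all names are hypothetical.

```python
from typing import List, Tuple


def split_reduce_input(
    segment_sizes: List[int],
    heap_bytes: int,
    reduce_input_buffer_percent: float,
) -> Tuple[List[int], List[int]]:
    """Split in-memory segments into (kept in memory, flushed to disk)."""
    budget = heap_bytes * reduce_input_buffer_percent
    kept: List[int] = []
    flushed: List[int] = []
    used = 0
    for size in segment_sizes:
        if used + size <= budget:
            kept.append(size)
            used += size
        else:
            flushed.append(size)
    return kept, flushed


def reduce_input_case(
    kept: List[int], flushed: List[int], files_on_disk: int = 0
) -> str:
    """Name the case from the slides that a given split corresponds to."""
    if not kept:
        return "Case-1"  # all reduce input on disk
    if flushed or files_on_disk > 0:
        return "Case-2"  # part in memory, part on disk
    return "Case-3"      # all reduce input in memory
```

In this model a percent of 0.0 always gives Case-1, and raising it moves the same input toward Case-2 and then Case-3.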

Potential optimization(?)

• Since reduce input files reside on the local FS, maybe we can optimize disk access (read and write) with local file system APIs, such as mmap, without going through the HDFS API.

• When transferring local map output during the shuffle phase, maybe we can use a more efficient network API, such as sendfile(), to improve data transfer between nodes.

• Currently a reduce chooses maps at random to fetch map output from; maybe a smarter scheduling policy could improve shuffle performance.

• Map and reduce may have different memory needs, so configure their JVM options separately:
– mapred.map.child.java.opts
– mapred.reduce.child.java.opts
