MapReduce. 黃威凱 (first-year master's student, Computer Science and Information Engineering). Outline: Purpose, Example, Method, Advanced

Post on 02-Jan-2016


MapReduce

Outline

Purpose
Example
Method
Advanced

PURPOSE

Purpose

Data mining
Data processing

EXAMPLE

Example

Find the maximum temperature for each year
Data source: National Climatic Data Center (NCDC)
◦The data is stored using a line-oriented ASCII format, in which each line is a record
◦There is a directory for each year from 1901 to 2001, each containing a gzipped file for each weather station with its readings for that year

Example (Data format)

Example (Gzipped files, example for 1990)

◦% ls raw/1990 | head
  010010-99999-1990.gz
  010014-99999-1990.gz
  010015-99999-1990.gz
  010016-99999-1990.gz
  010017-99999-1990.gz
  010030-99999-1990.gz
  010040-99999-1990.gz
  010080-99999-1990.gz
  010100-99999-1990.gz
  010150-99999-1990.gz

METHOD

Method

Analyzing the data with Unix tools
Analyzing the data with Hadoop

Method (Unix tools)

Here is the beginning of a run:

◦% ./max_temperature.sh
  1901 317
  1902 244
  1903 289
  1904 256
  1905 283
  ...

The complete run for the century took 42 minutes on a single EC2 High-CPU Extra Large Instance.
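The sequential approach can also be sketched in Python. This is only an illustration, not the original max_temperature.sh: it assumes a hypothetical simplified record format (one integer reading per line) instead of the real NCDC fixed-width records, and the names `max_temperature_by_year` and `build_sample`, as well as the sample readings, are made up for the example.

```python
import gzip
import os
import tempfile


def max_temperature_by_year(root):
    """Scan root/<year>/*.gz one file at a time and return {year: max reading}.

    Assumption: each line of a file holds a single integer reading. Real NCDC
    records are fixed-width and would need field extraction instead of int().
    """
    result = {}
    for year in sorted(os.listdir(root)):
        year_dir = os.path.join(root, year)
        if not os.path.isdir(year_dir):
            continue
        readings = []
        for name in sorted(os.listdir(year_dir)):
            if not name.endswith(".gz"):
                continue
            with gzip.open(os.path.join(year_dir, name), "rt") as handle:
                readings.extend(int(line) for line in handle if line.strip())
        if readings:
            result[year] = max(readings)
    return result


def build_sample(root):
    """Write a tiny made-up data set in the directory-per-year layout."""
    sample = {
        "1901": {"station-a.gz": [317, -20], "station-b.gz": [150]},
        "1902": {"station-a.gz": [244, 100]},
    }
    for year, files in sample.items():
        os.makedirs(os.path.join(root, year))
        for name, values in files.items():
            with gzip.open(os.path.join(root, year, name), "wt") as handle:
                handle.write("\n".join(str(v) for v in values) + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_sample(root)
        print(max_temperature_by_year(root))
```

Every file is read one after another on a single machine, so the running time grows with the total amount of data. That is the limitation the Hadoop approach addresses.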

Method (Hadoop)

Use MapReduce
◦Map
◦Shuffle
◦Reduce

Method (Hadoop)

Map function
◦Pull out the year and the air temperature
◦Transform them into (year, temperature) key-value pairs
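The map step can be sketched in plain Python as a function from one input record to key-value pairs. This is not Hadoop's Java Mapper API; the function name `map_record` and the "year,temperature" record layout are assumptions made for illustration, since the real NCDC records are fixed-width.

```python
def map_record(line):
    """Map step: emit zero or one (year, temperature) pair for one record.

    Assumption: records look like "1950,22". The real NCDC format is
    fixed-width, so a real mapper would slice fields by position instead.
    """
    parts = line.strip().split(",")
    if len(parts) != 2:
        return []  # skip malformed records rather than failing the task
    try:
        return [(int(parts[0]), int(parts[1]))]
    except ValueError:
        return []  # non-numeric fields are also skipped


if __name__ == "__main__":
    for record in ["1950,0", "1950,22", "garbage", "1949,111"]:
        print(record, "->", map_record(record))
```

Returning a list lets the function emit nothing for a bad record, which mirrors how a map function may produce zero or more output pairs per input.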

Method (Hadoop)

Map function
◦The shuffle: each reduce task is fed by many map tasks.
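A rough way to picture the shuffle is as a group-by-key over the outputs of all map tasks. The sketch below is an in-memory simplification with a hypothetical function name `shuffle`; in Hadoop the same grouping happens across machines.

```python
from collections import defaultdict


def shuffle(map_outputs):
    """Group the values from every map task's output by key.

    map_outputs is a list with one entry per map task, each entry being a
    list of (key, value) pairs. Keys are returned in sorted order, as the
    framework sorts by key before handing groups to the reducer.
    """
    grouped = defaultdict(list)
    for pairs in map_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return dict(sorted(grouped.items()))


if __name__ == "__main__":
    task1 = [(1950, 0), (1949, 111)]
    task2 = [(1950, 22), (1949, 78), (1950, -11)]
    print(shuffle([task1, task2]))
```

Each key's list collects values from several map tasks, which is what "fed by many map tasks" means for a reduce task.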

Method (Hadoop)

Reduce function
◦Iterate through the list and pick up the maximum reading
◦Input: (1949, [111, 78]), (1950, [0, 22, -11])
◦Output: (1949, 111), (1950, 22)
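The reduce step above is small enough to write out directly. A minimal Python sketch, with `reduce_max` as an illustrative name rather than anything from Hadoop's API:

```python
def reduce_max(year, readings):
    """Reduce step: iterate through the readings and keep the largest one."""
    best = None
    for reading in readings:
        if best is None or reading > best:
            best = reading
    return (year, best)


if __name__ == "__main__":
    reduce_inputs = [(1949, [111, 78]), (1950, [0, 22, -11])]
    for year, readings in reduce_inputs:
        print(reduce_max(year, readings))
```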

Method (Hadoop)

Data flow

Method (Hadoop)

Java MapReduce: Mapper example

Method (Hadoop)

Java MapReduce: Reducer example

Method (Hadoop)

Java MapReduce: Job example
◦Supports multiple input paths

ADVANCED

Advanced

Case 1

Advanced

Case 2

Advanced

Case 3

Advanced

Combiner functions on map output
◦Example
  Map 1 output: (1950, 0), (1950, 20), (1950, 10)
  Map 2 output: (1950, 25), (1950, 15)
  After shuffle:
  Map 1: (1950, [0, 20, 10])
  Map 2: (1950, [25, 15])
  Reduce input without a combiner: (1950, [0, 20, 10, 25, 15])
  Reduce input with a combiner: (1950, [20, 25])
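The combiner example can be reproduced with a short Python sketch. The names `combine` and `reduce_input` are illustrative, and the whole thing runs in memory, so it only models what the combiner does to the data and not how Hadoop schedules it.

```python
from collections import defaultdict


def combine(pairs):
    """Combiner: collapse one map task's output to its maximum per key."""
    best = {}
    for key, value in pairs:
        if key not in best or value > best[key]:
            best[key] = value
    return list(best.items())


def reduce_input(map_outputs):
    """Group values by key across map tasks, as the shuffle would."""
    grouped = defaultdict(list)
    for pairs in map_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return dict(grouped)


map1 = [(1950, 0), (1950, 20), (1950, 10)]
map2 = [(1950, 25), (1950, 15)]

# Without a combiner, every map output value reaches the reducer.
without_combiner = reduce_input([map1, map2])

# With a combiner, each map task forwards only its local maximum.
with_combiner = reduce_input([combine(map1), combine(map2)])

if __name__ == "__main__":
    print(without_combiner)
    print(with_combiner)
```

Both reduce inputs yield the same maximum because taking a maximum is commutative and associative, so applying it locally first does not change the result. A function such as the mean does not have this property and cannot be used as a combiner in the same direct way.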
