
  • 7/30/2019 Hadoop API


    Large Scale Data Management Systems

    Hadoop API

    Dimitrios Michail

    Dept. of Informatics and TelematicsHarokopio University of Athens

    2012-2013


    Hadoop

    Input/Output

    (input) <k1, v1>  ->  map  ->  <k2, v2>  ->  combine  ->  <k2, v2>  ->  reduce  ->  <k3, v3>  (output)


    Hadoop

    Serializable

    The key and value classes need to be serializable by the framework:

    they need to implement the Writable interface.

    Comparable

    To facilitate sorting by the framework, the key class also needs to be comparable:

    it needs to also implement the WritableComparable interface.

    There are implementations for all basic types, e.g. DoubleWritable.

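    Hadoop's Writable and WritableComparable interfaces are not reproduced on the slide; as a rough illustration of the same contract, here is a JDK-only sketch with a hypothetical TextLengthKey class (not a real Hadoop type) that has the same three method shapes: write, readFields, and compareTo.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical key class mimicking the Writable/WritableComparable contract
// using only the JDK: write() serializes the fields, readFields() restores
// them, and compareTo() makes the key sortable by the framework.
class TextLengthKey implements Comparable<TextLengthKey> {
    String text;
    int length;

    TextLengthKey() { }                       // frameworks need a no-arg constructor
    TextLengthKey(String text) { this.text = text; this.length = text.length(); }

    // Corresponds to Writable.write(DataOutput)
    void write(DataOutputStream out) throws IOException {
        out.writeUTF(text);
        out.writeInt(length);
    }

    // Corresponds to Writable.readFields(DataInput)
    void readFields(DataInputStream in) throws IOException {
        text = in.readUTF();
        length = in.readInt();
    }

    // Corresponds to WritableComparable.compareTo: sort by length, then text
    public int compareTo(TextLengthKey other) {
        if (length != other.length) return Integer.compare(length, other.length);
        return text.compareTo(other.text);
    }
}
```

    A real Hadoop key would implement org.apache.hadoop.io.WritableComparable instead, with write(DataOutput) and readFields(DataInput) in exactly these roles.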

    WordCount Example

    public class WordCount {

        public static class Map extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, IntWritable> {

            private static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
                String line = value.toString();
                StringTokenizer tokenizer = new StringTokenizer(line);
                while (tokenizer.hasMoreTokens()) {
                    word.set(tokenizer.nextToken());
                    output.collect(word, one);
                }
            }
        }


    WordCount Example

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output,
                Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }


    WordCount Example

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }

    } // end of class WordCount


    Input

    There are three major interfaces for providing input to a map-reduce job:

    InputFormat

    InputSplit

    RecordReader


    InputFormat Interface

    InputFormat describes the input-specification for a Map-Reduce job.

    The Map-Reduce framework relies on the InputFormat of the job to:

    1. Validate the input-specification of the job.

    2. Split up the input file(s) into logical InputSplit(s), each of which is then assigned to an individual Mapper.

    3. Provide the RecordReader implementation to be used to produce input records from the logical InputSplit for processing by the Mapper.


    InputSplit Interface

    InputSplit represents the data to be processed by an individual Mapper.

    It presents a byte-oriented view on the input, and

    it is the responsibility of the RecordReader of the job to process this and present a record-oriented view.


    RecordReader Interface

    RecordReader reads <key, value> pairs from an InputSplit.

    It converts the byte-oriented view of the input, provided by the InputSplit, and

    presents a record-oriented view for the Mapper & Reducer tasks for processing.

    It thus assumes the responsibility of processing record boundaries and presenting the tasks with keys and values.


    Default InputFormat

    The default InputFormat is the TextInputFormat.

    TextInputFormat

    An InputFormat for plain text files:

    files are broken into lines; either linefeed or carriage-return is used to signal end of line.

    keys are the position in the file, and values are the line of text.

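    As a concrete (non-Hadoop) illustration of that record-oriented view, the sketch below turns a newline-separated string into (byte offset, line text) records, the way TextInputFormat's reader presents them to a Mapper. It is a simplification: it assumes one byte per character and '\n' terminators only, while the real TextInputFormat also handles carriage returns.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TextRecordSketch {
    // Produces (key = byte offset of the line start, value = line text)
    // records, mimicking the view TextInputFormat gives a Mapper.
    static Map<Long, String> toRecords(String input) {
        Map<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : input.split("\n", -1)) {
            // skip the phantom empty line after a trailing terminator
            if (offset < input.length() || !line.isEmpty()) {
                records.put(offset, line);
            }
            offset += line.length() + 1; // +1 for the '\n' terminator
        }
        return records;
    }
}
```

    For the input "hello world\nfoo bar" this yields the records (0, "hello world") and (12, "foo bar"), matching the key/value description above.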

    Output

    The major interfaces for the output of a map-reduce job are:

    OutputCollector

    OutputFormat

    RecordWriter

    FileSystem


    OutputCollector Interface

    Collects the <key, value> pairs output by Mappers and Reducers.

    OutputCollector is the generalization of the facility provided by the Map-Reduce framework to collect data output by either the Mapper or the Reducer, i.e. intermediate outputs or the output of the job.


    OutputFormat Interface

    OutputFormat describes the output-specification for a Map-Reduce job.

    The Map-Reduce framework relies on the OutputFormat of the job to:

    1. Validate the output-specification of the job, e.g. check that the output directory does not already exist.

    2. Provide the RecordWriter implementation to be used to write out the output files of the job. Output files are stored in a FileSystem.


    RecordWriter Interface

    RecordWriter writes the output <key, value> pairs to an output file.

    RecordWriter implementations write the job outputs to the FileSystem.


    FileSystem

    An abstract base class for a fairly generic filesystem.

    Implemented either as:

    local

    reflects the locally-connected disk

    Distributed File System

    the Hadoop DFS, a multi-machine system that appears as a single disk

    offers fault tolerance and potentially very large capacity


    Default OutputFormat

    The default OutputFormat is the TextOutputFormat.

    TextOutputFormat

    An OutputFormat that writes plain text files.


    JobConf: map/reduce job configuration

    JobConf is the primary interface for a user to describe a map-reduce job to the Hadoop framework for execution.

    JobConf typically specifies the

    Mapper,

    combiner (if any),

    Partitioner,

    Reducer,

    InputFormat and

    OutputFormat

    implementations to be used, etc.


    Mapper Interface

    Mapper maps input key/value pairs to a set of intermediate key/value pairs.

    Maps are the individual tasks which transform input records into intermediate records.

    The user implements the following interface:

    public interface Mapper<K1, V1, K2, V2>
            extends JobConfigurable, Closeable {

        void map(K1 key, V1 value,
                 OutputCollector<K2, V2> output,
                 Reporter reporter) throws IOException;
    }


    Hadoop Map-Reduce Framework

    Framework

    spawns one map task for each InputSplit generated by the InputFormat for the job.

    calls map(K1, V1, OutputCollector, Reporter) for each key/value pair in the InputSplit for that task.


    Hadoop Map-Reduce Framework

    Grouping

    All intermediate values associated with a given output key are subsequently grouped by the framework, and passed to a Reducer to determine the final output.

    Users can control the grouping by specifying a Comparator viaJobConf.setOutputKeyComparatorClass(Class).


    Hadoop Map-Reduce Framework

    Partitioning

    The grouped Mapper outputs are partitioned per Reducer.

    Users can control which keys (and hence records) go to which Reducer by implementing a custom Partitioner.

    A partitioner is represented by the following interface:

    public interface Partitioner<K2, V2> extends JobConfigurable {

        int getPartition(K2 key, V2 value, int numPartitions);
    }
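    For reference, Hadoop's default partitioner (HashPartitioner) implements getPartition by hashing the key; the JDK-only sketch below reproduces that formula outside of Hadoop. The bitwise mask with Integer.MAX_VALUE clears the sign bit so the result is always in range.

```java
class HashPartitionSketch {
    // Same formula as Hadoop's default HashPartitioner.getPartition:
    // hash the key and fold it into the range [0, numPartitions).
    static int getPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

    Because the result depends only on the key, every occurrence of a given key is routed to the same reducer.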


    Hadoop Map-Reduce Framework

    Combiners

    Users can optionally specify a combiner, via JobConf.setCombinerClass(Class).

    Combiners perform local aggregation of the intermediate outputs.

    This helps to cut down the amount of data transferred from the Mapper to the Reducer.


    Hadoop Map-Reduce Framework

    The intermediate, grouped outputs are stored in SequenceFiles, which are flat files consisting of binary key/value pairs.

    There are three SequenceFile Writers:

    1. Writer: uncompressed records.

    2. RecordCompressWriter: record-compressed files; only values are compressed.

    3. BlockCompressWriter: block-compressed files; both keys & values are collected in blocks separately and compressed. The size of the block is configurable.


    The actual compression algorithm used to compress keys and/or values can be specified by using the appropriate CompressionCodec.

    There are also Readers and Sorters for each SequenceFile type.


    Reducer Interface

    A Reducer reduces a set of intermediate values which share a key to a smaller set of values.

    The user implements the following interface:

    public interface Reducer<K2, V2, K3, V3>
            extends JobConfigurable, Closeable {

        void reduce(K2 key, Iterator<V2> values,
                    OutputCollector<K3, V3> output,
                    Reporter reporter) throws IOException;
    }

    The number of Reducers for the job is set by the user via JobConf.setNumReduceTasks(int).


    Hadoop Map-Reduce Framework: Reducer Phases

    Phase 1: Shuffle

    The framework, for each reducer, fetches the relevant partition of the output of all the Mappers, via HTTP.

    Phase 2: Sort

    The framework groups Reducer inputs by keys (since different Mappers may have output the same key).

    The shuffle and sort phases occur simultaneously, i.e. while outputs are being fetched they are merged.
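    The effect of the sort phase can be sketched with plain JDK collections (a simplification of what the framework actually does with on-disk merges): fetched (key, value) pairs end up sorted by key and grouped, so each key reaches the Reducer together with all of its values.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortPhaseSketch {
    // Groups fetched intermediate (key, value) pairs by key, in key order,
    // mimicking the result of the framework's sort/merge phase.
    static Map<String, List<Integer>> group(List<SimpleEntry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>(); // keys kept sorted
        for (SimpleEntry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }
}
```

    The grouped map here plays the role of the (key, Iterator of values) stream that reduce() receives.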


    Hadoop Map-Reduce Framework: Reducer Phases

    Phase 3: Reduce

    The reduce(K2, Iterator, OutputCollector, Reporter) method is called for each <key, (list of values)> pair in the grouped inputs.

    The output of the reduce task is typically written to the FileSystem via OutputCollector.collect(K3, V3).


    Reporter

    A facility for Map-Reduce applications to report progress, update counters, report status information, etc.

    Is Alive?

    Mapper and Reducer can use the Reporter provided to report progress or just indicate that they are alive.

    In scenarios where the application takes a significant amount of time to process individual key/value pairs, this is crucial, since the framework might otherwise assume that the task has timed out and kill it.
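    The keep-alive pattern can be illustrated without Hadoop at all. In the sketch below, ProgressReporter is a hypothetical stand-in for Hadoop's Reporter, and the loop marks where a long-running map task would call progress() so the framework does not conclude it has hung.

```java
// Hypothetical stand-in for Hadoop's Reporter.progress()
interface ProgressReporter {
    void progress();
}

class SlowMapperSketch {
    // Processes a number of records, signalling liveness once per record so
    // a supervising framework would not conclude the task has hung.
    static int process(int records, ProgressReporter reporter) {
        int processed = 0;
        for (int i = 0; i < records; i++) {
            // ... expensive per-record work would go here ...
            processed++;
            reporter.progress();
        }
        return processed;
    }
}
```

    In a real Mapper the Reporter is simply the last argument of map(), so reporting liveness costs one extra call per record.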


    Material

    http://hadoop.apache.org

    http://hadoop.apache.org/docs/r1.0.4

    http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html
