7/30/2019 Hadoop API
Large Scale Data Management Systems
Hadoop API
Dimitrios Michail
Dept. of Informatics and Telematics, Harokopio University of Athens
2012-2013
Hadoop
Input/Output

(input) <k1, v1>
  map
<k2, v2>
  combine
<k2, v2>
  reduce
<k3, v3> (output)
Hadoop
Serializable

The key and value classes need to be serializable by the framework:
they need to implement the Writable interface.

Comparable

To facilitate sorting by the framework, the key class also needs to be comparable:
it needs to also implement the WritableComparable interface.

There are implementations for all basic types, e.g. DoubleWritable.
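As a sketch of what these contracts look like, here is a minimal DoubleWritable-style key type. The Writable and WritableComparable interfaces are re-declared locally in simplified form so the snippet compiles without Hadoop on the classpath; the real interfaces live in org.apache.hadoop.io, and MyDoubleWritable and WritableDemo are hypothetical names.

```java
import java.io.*;

// Simplified local sketches of Hadoop's Writable / WritableComparable
// contracts (the real ones live in org.apache.hadoop.io).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

interface WritableComparable<T> extends Writable, Comparable<T> {}

// A minimal DoubleWritable-like key type: serializable and comparable.
class MyDoubleWritable implements WritableComparable<MyDoubleWritable> {
    private double value;

    public MyDoubleWritable() {}                  // framework-style no-arg constructor
    public MyDoubleWritable(double value) { this.value = value; }

    public double get() { return value; }

    @Override public void write(DataOutput out) throws IOException {
        out.writeDouble(value);
    }
    @Override public void readFields(DataInput in) throws IOException {
        value = in.readDouble();
    }
    @Override public int compareTo(MyDoubleWritable o) {
        return Double.compare(value, o.value);
    }
}

public class WritableDemo {
    public static void main(String[] args) throws IOException {
        // Round-trip through the byte-oriented representation.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new MyDoubleWritable(3.14).write(new DataOutputStream(buf));

        MyDoubleWritable copy = new MyDoubleWritable();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())));

        System.out.println(copy.get());                                // 3.14
        System.out.println(copy.compareTo(new MyDoubleWritable(2.0)) > 0); // true
    }
}
```

The no-arg constructor matters: the framework instantiates key/value objects itself and then fills them in via readFields.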
WordCount Example
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCount {

    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {

        private static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }

} // end of class WordCount
Input
There are three major interfaces for providing input to a map-reduce job:
InputFormat
InputSplit
RecordReader
InputFormat Interface
InputFormat describes the input-specification for a Map-Reduce job.

The Map-Reduce framework relies on the InputFormat of the job to:
1 Validate the input-specification of the job.
2 Split-up the input file(s) into logical InputSplit(s), each of which is then assigned to an individual Mapper.
3 Provide the RecordReader implementation to be used to produce input records from the logical InputSplit for processing by the Mapper.
InputSplit Interface
InputSplit represents the data to be processed by an individual Mapper.

It presents a byte-oriented view on the input, and
it is the responsibility of the RecordReader of the job to process this and present a record-oriented view.
RecordReader Interface
RecordReader reads <key, value> pairs from an InputSplit.

It converts the byte-oriented view of the input, provided by the InputSplit, and
presents a record-oriented view to the Mapper & Reducer tasks for processing.

It thus assumes the responsibility of processing record boundaries and presenting the tasks with keys and values.
Default InputFormat
The default InputFormat is the TextInputFormat.
TextInputFormat

An InputFormat for plain text files:
files are broken into lines; either linefeed or carriage-return is used to signal end of line.
Keys are the position in the file, and values are the line of text.
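The key/value convention above can be illustrated with a small in-memory sketch, in plain Java with no Hadoop involved (TextRecordDemo and its records helper are hypothetical names): each line becomes one record whose key is the line's starting byte offset and whose value is the line's text.

```java
import java.util.*;

// Toy sketch of the record view TextInputFormat presents: for each line,
// the key is the line's starting offset and the value is the line itself.
// (The real RecordReader works on InputSplits in HDFS; this only
// illustrates the key/value convention.)
public class TextRecordDemo {
    static LinkedHashMap<Long, String> records(String data) {
        LinkedHashMap<Long, String> out = new LinkedHashMap<>();
        long offset = 0;
        for (String line : data.split("\n", -1)) {
            // Skip only the empty fragment after a trailing newline.
            if (!line.isEmpty() || offset < data.length()) {
                out.put(offset, line);
            }
            offset += line.length() + 1; // +1 for the newline terminator
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(records("hello world\nfoo\nbar"));
        // {0=hello world, 12=foo, 16=bar}
    }
}
```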
Output
The major interfaces for outputting from a map-reduce job are:
OutputCollector
OutputFormat
RecordWriter
FileSystem
OutputCollector Interface
Collects the <key, value> pairs output by Mappers and Reducers.

OutputCollector is the generalization of the facility provided by the Map-Reduce framework to collect data output by either the Mapper or the Reducer, i.e. intermediate outputs or the output of the job.
OutputFormat Interface
OutputFormat describes the output-specification for a Map-Reduce job.

The Map-Reduce framework relies on the OutputFormat of the job to:
1 Validate the output-specification of the job, e.g. check that the output directory doesn't already exist.
2 Provide the RecordWriter implementation to be used to write out the output files of the job. Output files are stored in a FileSystem.
RecordWriter Interface
RecordWriter writes the output <key, value> pairs to an output file.

RecordWriter implementations write the job outputs to the FileSystem.
FileSystem
An abstract base class for a fairly generic filesystem.

Implemented either as:

local
reflects the locally-connected disk

Distributed File System
the Hadoop DFS, which is a multi-machine system that appears as a single disk
fault tolerance and potentially very large capacity
Default OutputFormat
The default OutputFormat is the TextOutputFormat.
TextOutputFormat
An OutputFormat that writes plain text files.
JobConf: map/reduce job configuration

JobConf is the primary interface for a user to describe a map-reduce job to the Hadoop framework for execution.

JobConf typically specifies the
Mapper,
combiner (if any),
Partitioner,
Reducer,
InputFormat, and
OutputFormat
implementations to be used, etc.
Mapper Interface
Mapper maps input key/value pairs to a set of intermediate key/value pairs.

Maps are the individual tasks which transform input records into intermediate records.

The user implements the following interface:

public interface Mapper<K1, V1, K2, V2>
        extends JobConfigurable, Closeable {

    void map(K1 key, V1 value,
             OutputCollector<K2, V2> output,
             Reporter reporter) throws IOException;
}
Hadoop Map-Reduce Framework
Framework
spawns one map task for each InputSplit generated by the InputFormat for the job.
calls map(K1, V1, OutputCollector, Reporter) for each key/value pair in the InputSplit for that task.
Hadoop Map-Reduce Framework
Grouping
All intermediate values associated with a given output key are subsequently grouped by the framework, and passed to a Reducer to determine the final output.

Users can control the grouping by specifying a Comparator via JobConf.setOutputKeyComparatorClass(Class).
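Comparator-controlled grouping can be illustrated in plain Java, outside the Hadoop API (GroupingDemo and its group helper are hypothetical names): with a case-insensitive comparator, keys that differ only in case fall into the same group, just as a custom output-key comparator would merge them in a job.

```java
import java.util.*;

// Illustration of comparator-controlled grouping: keys that the comparator
// considers equal are collected under one group, mirroring the effect of
// JobConf.setOutputKeyComparatorClass(Class).
public class GroupingDemo {
    static TreeMap<String, List<Integer>> group(String[] keys, int[] vals,
                                                Comparator<String> cmp) {
        TreeMap<String, List<Integer>> groups = new TreeMap<>(cmp);
        for (int i = 0; i < keys.length; i++) {
            groups.computeIfAbsent(keys[i], k -> new ArrayList<>()).add(vals[i]);
        }
        return groups;
    }

    public static void main(String[] args) {
        // "Cat" and "cat" land in the same group under a
        // case-insensitive comparator.
        System.out.println(group(new String[] {"Cat", "cat", "dog"},
                                 new int[] {1, 2, 3},
                                 String.CASE_INSENSITIVE_ORDER));
        // {Cat=[1, 2], dog=[3]}
    }
}
```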
Hadoop Map-Reduce Framework
Partitioning
The grouped Mapper outputs are partitioned per Reducer.

Users can control which keys (and hence records) go to which Reducer by implementing a custom Partitioner.

A partitioner is represented by the following interface:

public interface Partitioner<K2, V2> extends JobConfigurable {

    int getPartition(K2 key, V2 value, int numPartitions);
}
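The default partitioner, HashPartitioner, routes a key to a reducer by its hash code, masked to keep the result non-negative, modulo the number of partitions. A self-contained sketch of that rule (HashPartitionDemo is a hypothetical name; the real class is org.apache.hadoop.mapred.lib.HashPartitioner):

```java
// Sketch of the hash-based partitioning rule used by Hadoop's default
// HashPartitioner: non-negative hash of the key, modulo numPartitions.
public class HashPartitionDemo {
    static int getPartition(Object key, int numPartitions) {
        // Mask the sign bit so a negative hashCode cannot yield a
        // negative partition index.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // The same key is always routed to the same reducer.
        System.out.println(getPartition("hadoop", 4));
        System.out.println(getPartition("hadoop", 4));
    }
}
```

Determinism is the point: every Mapper computes the same partition for a given key, so all records sharing that key meet at the same Reducer.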
Hadoop Map-Reduce Framework
Combiners
Users can optionally specify a combiner, via JobConf.setCombinerClass(Class).

Combiners perform local aggregation of the intermediate outputs.

This helps to cut down the amount of data transferred from the Mapper to the Reducer.
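What local aggregation buys can be seen in a plain-Java toy, with no Hadoop involved (CombinerDemo and its combine helper are hypothetical names): in word count, the mapper's per-occurrence ("word", 1) records collapse into per-word partial sums before anything crosses the network.

```java
import java.util.*;

// Toy illustration of a combiner's local aggregation in word count:
// instead of shipping ("the", 1) once per occurrence, the mapper's
// output is folded into per-word partial sums on the map side.
public class CombinerDemo {
    static Map<String, Integer> combine(List<String> mapperOutputKeys) {
        Map<String, Integer> partialSums = new HashMap<>();
        for (String key : mapperOutputKeys) {
            partialSums.merge(key, 1, Integer::sum);
        }
        return partialSums;
    }

    public static void main(String[] args) {
        List<String> emitted = Arrays.asList("the", "cat", "the", "the");
        // 4 emitted records shrink to 2 partial sums.
        System.out.println(combine(emitted));
    }
}
```

This works for word count because the reduce function (summing) is associative and commutative, which is why the WordCount example can reuse Reduce as its combiner class.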
Hadoop Map-Reduce Framework
The intermediate, grouped outputs are stored in SequenceFiles, which are flat files consisting of binary key/value pairs.

There are three SequenceFile Writers:
1 Writer: uncompressed records.
2 RecordCompressWriter: record-compressed files, only compress values.
3 BlockCompressWriter: block-compressed files; both keys & values are collected in blocks separately and compressed. The size of the block is configurable.
Hadoop Map-Reduce Framework
The actual compression algorithm used to compress keys and/or values can be specified by using the appropriate CompressionCodec.
There are also Readers and Sorters for each SequenceFile type.
Reducer Interface
Reduces a set of intermediate values which share a key to a smaller set of values.

The user implements the following interface:

public interface Reducer<K2, V2, K3, V3>
        extends JobConfigurable, Closeable {

    void reduce(K2 key, Iterator<V2> values,
                OutputCollector<K3, V3> output,
                Reporter reporter) throws IOException;
}

The number of Reducers for the job is set by the user via JobConf.setNumReduceTasks(int).
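The reduce step of the earlier WordCount example, stripped down to plain Java (ReduceDemo is a hypothetical name): the framework hands the reducer a key together with an iterator over all values grouped under that key, and the reducer folds them into one result.

```java
import java.util.*;

// Plain-Java sketch of WordCount's reduce step: fold the iterator of
// grouped values for one key into a single sum.
public class ReduceDemo {
    static int reduce(String key, Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Three ("the", 1) records grouped under the key "the".
        Iterator<Integer> grouped = Arrays.asList(1, 1, 1).iterator();
        System.out.println(reduce("the", grouped)); // 3
    }
}
```

Note that the values arrive as an Iterator, not a collection: a key's value list may be too large to hold in memory, so it can only be traversed once.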
Hadoop Map-Reduce Framework
Reducer Phases

Phase 1: Shuffle
The framework, for each reducer, fetches the relevant partition of the output of all the Mappers, via HTTP.

Phase 2: Sort
The framework groups Reducer inputs by keys (since different Mappers may have output the same key).

The shuffle and sort phases occur simultaneously, i.e. while outputs are being fetched they are merged.
Hadoop Map-Reduce Framework
Reducer Phases

Phase 3: Reduce
The reduce(K2, Iterator, OutputCollector, Reporter) method is called for each <key, (list of values)> pair in the grouped inputs.

The output of the reduce task is typically written to the FileSystem via OutputCollector.collect(K3, V3).
Reporter
A facility for Map-Reduce applications to report progress and update counters, status information, etc.

Is Alive?
Mapper and Reducer can use the Reporter provided to report progress or just indicate that they are alive.

In scenarios where the application takes a significant amount of time to process individual key/value pairs, this is crucial, since the framework might otherwise assume that the task has timed out and kill it.
Material
http://hadoop.apache.org
http://hadoop.apache.org/docs/r1.0.4
http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html