MapReduce Code as Seen Through Samples

MapReduce as Seen Through Samples @shot6

Upload: shinpei-ohtani

Post on 20-Jan-2015

Category:

Technology


TRANSCRIPT

  • 1. MapReduce @shot6
  • 2. Cloudera Desktop, Avro, Sqoop, Pig, Hive, HBase, Chukwa, MapReduce, ZooKeeper, HDFS, Core
  • 3. Cloudera Desktop, Avro, Sqoop, Pig, Hive, HBase, Chukwa, MapReduce, ZooKeeper, HDFS, Core
  • 4. MapReduce: Mapper/Reducer
  • 5. MapReduce: WordCount, Mapper/Reducer, Job, InputFormat/OutputFormat, HDFS (FileSystem), Writable
  • 6. WordCount: Hadoop, Hello World, API (org.apache.hadoop.mapreduce), API
  • 7. Grep: grep, grepJob/sortJob, 2, JobConf/Mapper/Reducer, Mapper, RegexMapper, SequenceFileFormat, sortJob
  • 8. Grep - JobConf, Mapper, Reducer
  • 9. o.a.hadoop.mapred.JobConf: mapred-default.xml, conf/mapred-site.xml, XML, DOM; JobConf child = new JobConf( Conf, jar);
  • 10. mapred-site.xml: mapred.job.tracker = your-site:9001
  • 11. o.a.hadoop.mapred.Mapper: Mapper; InputSplit, Mapper; MapTask/MapRunner; map(KEY, VALUE, COLLECTOR, REPORTER); KEY: Map / VALUE: Map / COLLECTOR: / REPORTER: ; API; MapReduceBase
  • 12. o.a.hadoop.mapred.MapTask: Map; initialize (Task Reducer); (o.a.h.mapred.TaskStatus.State) RUNNING, SUCCEEDED, FAILED, UNASSIGNED, KILLED, COMMIT_PENDING, FAILED_UNCLEAN, KILLED_UNCLEAN; OutputCommitter; Task; mapred.work.output.dir
  • 13. o.a.h.mapred.MapTask cont: run, runOldMapper, JobClient, InputSplit, RecordReader
  • 14. o.a.h.mapred.MapTask cont2: Reduce; spill (*) $mapred.local.dir/taskTracker/jobcache/${taskid}/output/spill${spillNumber}.out; Reducer; Combiner, min.num.spills.for.combine, combiner; RecordWriter; MapRunner
  • 15. o.a.h.mapred.MapRunner: MapRunnable; mapred.map.runner.class; Hadoop; PipeMapRunner; Map; MultiThreadedMapRunner
  • 16. o.a.h.mapred.MapRunner cont: run(RecordReader, OutputCollector, Reporter); RecordReader: InputFormat, SplitReader (InputFormat/RecordReader), RecordReader
  • 17. MapTask, MapRunner, Mapper, RecordReader, OutputCollector; Input Split; run, createKey(), createValue(), next(key, value), EOF, Map(key, value, outputCollector, reporter); Spill, SpillThread
  • 18. m(_ _)m
  • 19. Mapper, JobConf, Mapper/MapRunner/MapTask; Reducer, Reducer, Reducer; InputFormat/RecordReader
  • 20. o.a.h.mapred.Reducer: Reducer; InputSplit, Mapper; ReduceTask/ReduceRunner; reduce(KEY, Iterator, COLLECTOR, REPORTER); KEY: / Iterator: / COLLECTOR: / REPORTER: ; API; MapReduceBase
  • 21. o.a.h.mapred.ReduceTask: SHUFFLE; ReduceTask.ReduceCopier; fetchOutputs (Merger.MergeQueue); Mapx; mapred.reduce.parallel.copies; MapOutputCopier; Map; LocalFSMerger; InMemFSMergeThread; GetMapEventsThread; Map< , MapOutputLocation(taskId, host, httpUrl)>; TaskTracker
  • 22. o.a.h.mapred.ReduceTask: run(RecordReader, OutputCollector, Reporter); SORT; Memory, disk; RawKeyValueIterator; Reducer; RecordWriter; ReduceValuesIterator
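Slides 5, 6, 11 and 20 name WordCount and the `map(KEY, VALUE, COLLECTOR, REPORTER)` / `reduce(KEY, Iterator, COLLECTOR, REPORTER)` signatures. The following is a minimal in-memory sketch of that flow in plain Java, not Hadoop code. The `SketchCollector`, `SketchMapper` and `SketchReducer` interfaces and the `MapReduceSketch` class are hypothetical stand-ins invented for illustration; the reporter argument is omitted, and a `TreeMap` stands in for the shuffle/sort step.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-ins for the old-API types; these are NOT the Hadoop interfaces.
interface SketchCollector<K, V> {
    void collect(K key, V value);
}

interface SketchMapper<K1, V1, K2, V2> {
    void map(K1 key, V1 value, SketchCollector<K2, V2> output);
}

interface SketchReducer<K2, V2, K3, V3> {
    void reduce(K2 key, Iterator<V2> values, SketchCollector<K3, V3> output);
}

class MapReduceSketch {

    // Map side: emit (word, 1) for every whitespace-separated token in a line.
    static final SketchMapper<Long, String, String, Integer> WORD_MAPPER =
        (offset, line, out) -> {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    out.collect(token, 1);
                }
            }
        };

    // Reduce side: sum all counts that arrive for one word.
    static final SketchReducer<String, Integer, String, Integer> SUM_REDUCER =
        (word, counts, out) -> {
            int sum = 0;
            while (counts.hasNext()) {
                sum += counts.next();
            }
            out.collect(word, sum);
        };

    static Map<String, Integer> run(List<String> lines) {
        // Stand-in for shuffle/sort: group map output by key in sorted order.
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        SketchCollector<String, Integer> mapOut =
            (k, v) -> grouped.computeIfAbsent(k, unused -> new ArrayList<>()).add(v);

        // Feed each line to the mapper, keyed by its character offset.
        long offset = 0;
        for (String line : lines) {
            WORD_MAPPER.map(offset, line, mapOut);
            offset += line.length() + 1;
        }

        // Call the reducer once per key with an iterator over that key's values.
        Map<String, Integer> result = new TreeMap<>();
        SketchCollector<String, Integer> reduceOut = result::put;
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            SUM_REDUCER.reduce(e.getKey(), e.getValue().iterator(), reduceOut);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a b a", "b c"))); // prints {a=2, b=2, c=1}
    }
}
```

In a real job the grouping happens across tasks through spill files and the reduce-side copy and merge that slides 14, 21 and 22 list; the sketch collapses all of that into one in-process map.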
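Slide 17 lists the calls of the MapRunner loop: `createKey()`, `createValue()`, then `next(key, value)` until EOF, with a map call per record. A rough sketch of that loop in plain Java could look like the following. It does not use Hadoop. `Holder`, `SketchRecordReader`, `SketchRecordMapper`, `ListLineReader` and `MapRunnerSketch` are hypothetical names invented here, the output collector is reduced to a `List<String>`, and the reporter is left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical mutable holder, standing in for a reusable key or value object.
final class Holder<T> {
    T value;
}

// Hypothetical stand-in for a record reader with the three calls named on the slide.
interface SketchRecordReader<K, V> {
    Holder<K> createKey();
    Holder<V> createValue();
    boolean next(Holder<K> key, Holder<V> value); // false signals end of input
}

// Hypothetical mapper; the output collector is simplified to a list of strings.
interface SketchRecordMapper<K, V> {
    void map(K key, V value, List<String> output);
}

// Reads lines from an in-memory list; the key is the running character offset.
final class ListLineReader implements SketchRecordReader<Long, String> {
    private final Iterator<String> lines;
    private long offset = 0;

    ListLineReader(List<String> lines) {
        this.lines = lines.iterator();
    }

    public Holder<Long> createKey() {
        return new Holder<>();
    }

    public Holder<String> createValue() {
        return new Holder<>();
    }

    public boolean next(Holder<Long> key, Holder<String> value) {
        if (!lines.hasNext()) {
            return false;
        }
        String line = lines.next();
        key.value = offset;   // refill the same key object on every call
        value.value = line;   // refill the same value object on every call
        offset += line.length() + 1;
        return true;
    }
}

final class MapRunnerSketch {
    // Create key and value once, then reuse them for every record until EOF.
    static <K, V> List<String> run(SketchRecordReader<K, V> input,
                                   SketchRecordMapper<K, V> mapper) {
        List<String> collected = new ArrayList<>();
        Holder<K> key = input.createKey();
        Holder<V> value = input.createValue();
        while (input.next(key, value)) {
            mapper.map(key.value, value.value, collected);
        }
        return collected;
    }

    public static void main(String[] args) {
        List<String> out = run(
            new ListLineReader(List.of("foo", "ba")),
            (offset, line, sink) -> sink.add(offset + ":" + line.toUpperCase()));
        System.out.println(out); // prints [0:FOO, 4:BA]
    }
}
```

The sketch creates the key and value holders once and refills them on each `next` call, which is why the loop passes them in rather than receiving fresh objects per record.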