Hadoop for Programmers
DESCRIPTION
An introduction to Hadoop for programmers.

TRANSCRIPT
Hadoop
2010/09/15 id:shiumachi
Agenda
Background
Hadoop
MapReduce
Background
Big Data
Big Data: data at terabyte (TB) scale and beyond
Big Data is everywhere [1]
etc...
How big is Big Data?
Google was processing 20 PB of data a day with MapReduce as of 2008 [1]
eBay holds around 10 PB of data [1]
Facebook's Hadoop cluster stored 15 PB as of 2010 [2]
Hadoop
What is Hadoop?
An open-source implementation of Google's MapReduce
Its two core components, HDFS and MapReduce, follow two Google papers (GFS and Google MapReduce)
A framework for processing Big Data
Hadoop as a framework
On the web you reach for Rails or CakePHP; on mobile, Android
For Big Data, you reach for Hadoop
Hadoop
MapReduce
What is MapReduce?
A programming model named after the map and reduce functions of functional programming
Many languages provide map and reduce
Python example with lambda:
def norm(V):
    return reduce(lambda x, y: x + y, map(lambda x: x ** 2, V)) ** 0.5
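A runnable version of this one-liner (updated for Python 3, where reduce moved into functools; the original slide targeted Python 2):

```python
from functools import reduce

def norm(V):
    # square each element (map), sum the squares (reduce), take the root
    return reduce(lambda x, y: x + y, map(lambda x: x ** 2, V)) ** 0.5

print(norm([3, 4]))  # → 5.0
```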
Map processes each record; Reduce aggregates the Map output
Processing flows Map → Shuffle → Reduce
Shuffle is the phase unique to MapReduce
Shuffle sorts the Map output and groups together all values that share a key
Map output, as (word, 1) pairs:
hello, 1  world, 1  hello, 1  hadoop, 1  I, 1  like, 1  hadoop, 1  programming, 1  I, 1  like, 1  programming, 1  world, 1
After Shuffle, grouped by key:
hadoop, 1  hadoop, 1  hello, 1  hello, 1  I, 1  I, 1  like, 1  like, 1  programming, 1  programming, 1  world, 1  world, 1
(keys in alphabetical order)
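The shuffle's sort-and-group behavior can be simulated in a few lines of Python 3 (the three input lines below are my reconstruction of the example's word pairs):

```python
from itertools import groupby

lines = ["hello world hello hadoop",
         "I like hadoop programming",
         "I like programming world"]

# Map: emit a (word, 1) pair for every word
pairs = [(w, 1) for line in lines for w in line.split()]

# Shuffle: sort by key so identical keys become adjacent, then group
pairs.sort(key=lambda kv: kv[0])
groups = {k: [v for _, v in g] for k, g in groupby(pairs, key=lambda kv: kv[0])}

# Reduce: sum each group
counts = {k: sum(vs) for k, vs in groups.items()}
print(counts)  # every word appears exactly twice in this input
```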
What Hadoop is good at
Log analysis: aggregating URLs, hit counts, etc...
The kind of job you would write with uniq, sort, grep, sed, awk, ...
but run over data far too large for one machine
Hadoop pays off as data grows from GB to TB to PB
Scale out by adding machines: 100 nodes behave like one big machine
Writing MapReduce on Hadoop
Hadoop is written in Java, so MapReduce jobs are natively written in Java
Hadoop Streaming lets any language that reads stdin and writes stdout implement Map and Reduce
Pig and Hive offer higher-level languages on top of Hadoop
The examples here use Hadoop Streaming rather than Pig or Hive
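A typical Hadoop Streaming invocation looks like the sketch below; the jar path varies by installation, and the input/output paths and script names are placeholders:

```shell
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
    -input  /wordcount/input \
    -output /wordcount/output \
    -mapper  mapper.py \
    -reducer reducer.py \
    -file mapper.py -file reducer.py
```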
Map (Python)
import sys

for line in sys.stdin:
    words = line.rstrip().split()
    for w in words:
        print "%s\t%d" % (w, 1)
Map (improved: aggregate counts inside the mapper)
import sys

d = {}
for line in sys.stdin:
    words = line.rstrip().split()
    for w in words:
        d[w] = d.get(w, 0) + 1

for word, count in d.iteritems():
    print "%s\t%d" % (word, count)
Reduce
import sys

d = {}
for line in sys.stdin:
    word, count = line.rstrip().split('\t')
    d[word] = d.get(word, 0) + int(count)

for word, count in d.iteritems():
    print "%s\t%d" % (word, count)
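The streaming scripts can be smoke-tested without a cluster via `cat input.txt | python mapper.py | sort | python reducer.py`; the same pipeline sketched in pure Python 3 (the function names are mine, and the scripts above are Python 2):

```python
def map_words(lines):
    # mirror of the mapper: emit "word\t1" for every word
    for line in lines:
        for w in line.rstrip().split():
            yield "%s\t%d" % (w, 1)

def reduce_counts(lines):
    # mirror of the reducer: sum the counts per word
    d = {}
    for line in lines:
        word, count = line.rstrip().split('\t')
        d[word] = d.get(word, 0) + int(count)
    return d

shuffled = sorted(map_words(["hello world hello hadoop"]))  # sort plays the shuffle
print(reduce_counts(shuffled))  # → {'hadoop': 1, 'hello': 2, 'world': 1}
```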
How MapReduce executes
Input is split into 128 MB blocks, and one Map task runs per block
During the Shuffle, all values for a given key are delivered to exactly one Reduce task
Jobs are usually bound by disk and network I/O rather than CPU
Map tasks run in parallel, so adding machines raises Map/Reduce throughput
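The key-to-reducer assignment can be sketched as a hash partitioner, which is the idea behind Hadoop's default; hashlib is used here only because Python's built-in hash() is randomized per process:

```python
import hashlib

def partition(key, num_reducers):
    # Stable hash of the key, modulo the number of reduce tasks
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_reducers

# The same key always lands on the same reducer, which is
# what lets one reducer see every value for that key.
print(partition("hadoop", 4) == partition("hadoop", 4))  # → True
```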
What Hadoop is bad at
Workflows shaped as a DAG (Directed Acyclic Graph) of dependent jobs
A DAG must be expressed by chaining multiple MapReduce jobs
Iterative algorithms, low-latency queries, etc...
Trying Hadoop without your own cluster
You do not need 100 machines of your own
Amazon Elastic MapReduce runs MapReduce jobs on EC2
Even graph algorithms such as BFS (breadth-first search) can be written as iterated MapReduce jobs
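A sketch of one such iteration: parallel BFS, where each Map step relaxes neighbor distances and each Reduce step keeps the minimum per node (the toy graph and function names are illustrative, not from the talk):

```python
def bfs_pass(dist, adj):
    # Map: each node emits its own distance, plus distance + 1 to each neighbor
    emitted = {}
    for node, d in dist.items():
        emitted.setdefault(node, []).append(d)
        if d != float('inf'):
            for nb in adj.get(node, []):
                emitted.setdefault(nb, []).append(d + 1)
    # Reduce: keep the minimum distance seen for each node
    return {node: min(ds) for node, ds in emitted.items()}

adj = {'a': ['b'], 'b': ['c'], 'c': []}                # toy chain a -> b -> c
dist = {'a': 0, 'b': float('inf'), 'c': float('inf')}  # BFS from 'a'
for _ in range(len(adj)):                              # one pass per frontier, worst case
    dist = bfs_pass(dist, adj)
print(dist)  # → {'a': 0, 'b': 1, 'c': 2}
```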
MapReduce algorithm design is covered in Data-Intensive Text Processing with MapReduce [4]
For Hadoop itself, see Hadoop: The Definitive Guide [3] and resources on the web
Conclusion
Big Data at TB scale and beyond is now commonplace
Hadoop is OSS that anyone can use
Hadoop puts distributed Big Data processing within an ordinary programmer's reach
References
[1] Data-Intensive Text Processing with MapReduce, http://www.umiacs.umd.edu/~jimmylin/book.html
[2] Facebook has the world's largest Hadoop cluster!, http://hadoopblog.blogspot.com/2010/05/facebook-has-worlds-largest-hadoop.html
[3] Hadoop: The Definitive Guide, Tom White, O'Reilly, 2009
[4] Data-Intensive Text Processing with MapReduce, Jimmy Lin, Chris Dyer, 2010, http://www.umiacs.umd.edu/~jimmylin/book.html
Thank you !