My First Map/Reduce
DESCRIPTION
A talk on big data and MapReduce by Rubén Orta.
TRANSCRIPT
My First Map/Reduce
Rubén Orta @agileando
1 History
2 Implementation
3 Netflix Prize in Python
4 Links
Big Data = Counting
COUNTING
Jeff Dean
Sanjay Ghemawat
map(key, value)
    new_value = a_function(value)
    return new_key, new_value
reduce(key, value)
    new_value = another_function(value)
    return key, new_value
f() f() f() f() f()
f’() f’() f’() f’() f’()
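One way to read the two rows above: the same function f() runs independently on every chunk of input, and a second function f'() then runs independently on every group of intermediate results. The following is a minimal sketch of that idea, not the speaker's code. The names `f`, `f_prime`, and `run`, the thread pool, and the even/odd example are all assumptions chosen for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def f(chunk):
    """Hypothetical map function: emit (key, value) pairs for one chunk."""
    return [("even" if number % 2 == 0 else "odd", number) for number in chunk]


def f_prime(item):
    """Hypothetical reduce function: combine all values that share a key."""
    key, values = item
    return key, sum(values)


def run(chunks, workers=5):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map phase: each chunk is handled independently by f().
        mapped = list(pool.map(f, chunks))

        # Shuffle: group the intermediate values by key.
        groups = defaultdict(list)
        for pairs in mapped:
            for key, value in pairs:
                groups[key].append(value)

        # Reduce phase: each key is handled independently by f_prime().
        return dict(pool.map(f_prime, groups.items()))


totals = run([[1, 2], [3, 4], [5, 6]])
```

Because neither f() nor f'() shares state between calls, the same structure could in principle be spread over several machines instead of threads.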
Dataset: millions of web pages
Map
    for each word in document:
        emit (word, 1);
Reduce
    total = 0
    for each item in value:
        total++
    return (key, total);
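The word-count pseudocode above can be tried on a single machine. This is a minimal in-memory sketch: the function names `map_words`, `reduce_counts`, and `word_count`, the lowercasing, and the whitespace tokenization are assumptions made for the example and do not come from the slides.

```python
from collections import defaultdict


def map_words(document):
    # Emit (word, 1) for every word in the document.
    for word in document.lower().split():
        yield word, 1


def reduce_counts(key, values):
    # Count how many items were emitted for this key.
    total = 0
    for _ in values:
        total += 1
    return key, total


def word_count(documents):
    # Group the emitted pairs by word, then reduce each group.
    groups = defaultdict(list)
    for document in documents:
        for word, one in map_words(document):
            groups[word].append(one)
    return dict(reduce_counts(word, ones) for word, ones in groups.items())


counts = word_count(["big data", "Big counting"])
```

The grouping step in `word_count` stands in for the shuffle that a real MapReduce framework performs between the two phases.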
import mincemeat

# read_data and data_files are not defined on the slides.
data = dict((f, read_data(f)) for f in data_files)

s = mincemeat.Server()
s.datasource = data
s.mapfn = mapfn
s.reducefn = reducefn

results = s.run_server(password="ruben")
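def mapfn(key, value):
    lines = value.splitlines()
    # The first line holds the film id followed by a trailing character.
    film_id = lines[0][:-1]
    # Each remaining line is "user_id,rating,date".
    for line in lines[1:]:
        items = line.split(",")
        user_id = items[0]
        rating = items[1]
        date = items[2]
        yield user_id, film_id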
def reducefn(key, values):
    number_of_films = 0
    for value in values:
        number_of_films += 1
    return number_of_films
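To check the map and reduce functions without starting a mincemeat server, they can be run through a small local driver. This sketch rests on assumptions: `run_locally` is a hypothetical helper, the sample file names and ratings are made up, and the input layout (a first line such as `1:` followed by `user_id,rating,date` lines) is inferred from the map function above.

```python
from collections import defaultdict


def mapfn(key, value):
    lines = value.splitlines()
    film_id = lines[0][:-1]  # drop the trailing character after the film id
    for line in lines[1:]:
        user_id = line.split(",")[0]
        yield user_id, film_id


def reducefn(key, values):
    # Number of films rated by this user.
    return len(values)


def run_locally(datasource):
    # Hypothetical stand-in for the server: map, group by key, reduce.
    grouped = defaultdict(list)
    for key, value in datasource.items():
        for out_key, out_value in mapfn(key, value):
            grouped[out_key].append(out_value)
    return {key: reducefn(key, values) for key, values in grouped.items()}


# Made-up sample data in the assumed layout, keyed by made-up file names.
sample = {
    "film_1.txt": "1:\n10,3,2005-09-06\n20,5,2005-05-13\n",
    "film_2.txt": "2:\n10,4,2005-10-01\n",
}
films_per_user = run_locally(sample)
```

The result maps each user id to the number of films that user rated, which is the same quantity the job on the slides computes.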
Papers
GFS http://research.google.com/archive/gfs.html
MapReduce http://research.google.com/archive/mapreduce.html
BigTable http://research.google.com/archive/bigtable.html
Dynamo http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf
Dremel http://research.google.com/pubs/pub36632.html
Spanner http://research.google.com/archive/spanner.html
Python
MinceMeat.py https://github.com/michaelfairley/mincemeatpy
Octo.py http://code.google.com/p/octopy/
Netflix DataSet http://www.lifecrunch.biz/archives/207
Rubén Orta
http://www.slideshare.net/agileando/mi-primer-map-reduce
Blog http://devspoke.com/
Twitter https://twitter.com/agileando
GitHub https://github.com/rubenorta
WE ARE LOOKING FOR PEOPLE FOR OUR TEAM
Want to join?
*unix, scripting (python, perl)
devops