My First Map/Reduce

Rubén Orta @agileando

Upload: betabeers

Post on 19-May-2015


DESCRIPTION

Talk on big data and MapReduce by Rubén Orta.

TRANSCRIPT

Page 1: My First Map/Reduce

My First Map/Reduce

Rubén Orta @agileando

Page 2: My First Map/Reduce

1. History
2. Implementation
3. Netflix Prize in Python
4. Links

Page 3: My First Map/Reduce

Big Data = Counting

1

Page 4: My First Map/Reduce

COUNTING

1

Page 5: My First Map/Reduce

Jeff Dean

Sanjay Ghemawat

1

Page 6: My First Map/Reduce

map(key, value)
    new_value = a_function(value)
    return new_key, new_value

reduce(key, value)
    new_value = another_function(value)
    return key, new_value
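The two signatures above can be exercised without a cluster. What follows is a minimal single-process sketch, not the implementation from the talk or from any library: `run_mapreduce` and the temperature records are hypothetical names and data invented for illustration. It applies the map function to every record, groups the emitted pairs by key, and hands each group to the reduce function.

```python
from collections import defaultdict


def run_mapreduce(datasource, mapfn, reducefn):
    """Hypothetical single-process driver: map, group by key, reduce."""
    groups = defaultdict(list)
    # Map phase: every (key, value) record may emit any number of pairs.
    for key, value in datasource.items():
        for new_key, new_value in mapfn(key, value):
            groups[new_key].append(new_value)
    # Reduce phase: each key is reduced independently of the others.
    return {key: reducefn(key, values) for key, values in groups.items()}


def mapfn(key, value):
    # value is a "city,temperature" line; the city becomes the new key.
    city, temperature = value.split(",")
    yield city, int(temperature)


def reducefn(key, values):
    # Keep the highest temperature seen for this city.
    return max(values)


# Invented sample records, keyed by an arbitrary record id.
readings = {0: "Madrid,31", 1: "Bilbao,22", 2: "Madrid,35", 3: "Bilbao,19"}
result = run_mapreduce(readings, mapfn, reducefn)
print(result)
```

Because each key is reduced on its own, the reduce calls could in principle run on different machines, which is the property a distributed framework exploits.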

2

Page 7: My First Map/Reduce

f() f() f() f() f()

f’() f’() f’() f’() f’()

Dataset: millions of web pages

Map
    for each word in document:
        emit (word, 1);

Reduce
    total = 0
    for each item in value:
        total++
    return (key, total);
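The word count above can be tried on a toy input. This is a sketch under stated assumptions: the two documents are invented stand-ins for the web pages, and the grouping step is done here by sorting the emitted pairs so that equal words become adjacent, which is only one possible way to bring the values for a key together.

```python
from itertools import groupby
from operator import itemgetter


def mapfn(key, value):
    # key: document name, value: document text. Emit (word, 1) per word.
    for word in value.lower().split():
        yield word, 1


def reducefn(key, values):
    # Every value is 1, so summing them counts the occurrences of the word.
    total = 0
    for item in values:
        total += item
    return key, total


# Invented two-document dataset standing in for the web pages.
documents = {
    "a.txt": "big data is counting",
    "b.txt": "counting is big",
}

# Map every document, then sort so that equal words end up next to each other.
pairs = sorted(
    pair for name, text in documents.items() for pair in mapfn(name, text)
)

# Reduce each run of identical words to a single (word, total) pair.
counts = dict(
    reducefn(word, [count for _, count in group])
    for word, group in groupby(pairs, key=itemgetter(0))
)
print(counts)
```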

2

Page 8: My First Map/Reduce

2

Page 9: My First Map/Reduce

3

Page 10: My First Map/Reduce

import mincemeat

data = dict((f, read_data(f)) for f in data_files)

s = mincemeat.Server()
s.datasource = data
s.mapfn = mapfn
s.reducefn = reducefn

results = s.run_server(password="ruben")
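The server setup above relies on two names the slide does not define, `read_data` and `data_files`. One plausible way to fill them in, assuming each input is a small text file that fits in memory, is sketched below. The glob pattern, the temporary directory, the file names and the file contents are all assumptions made for illustration, not details from the talk.

```python
import glob
import os
import tempfile


def read_data(file_name):
    """Return the whole file as one string (assumed helper, not from the talk)."""
    with open(file_name, encoding="utf-8") as handle:
        return handle.read()


# Create two throwaway input files so the sketch runs anywhere.
workdir = tempfile.mkdtemp()
samples = {
    "mv_0000001.txt": "1:\n10,3,2005-09-06\n",
    "mv_0000002.txt": "2:\n10,5,2005-10-01\n",
}
for name, text in samples.items():
    with open(os.path.join(workdir, name), "w", encoding="utf-8") as handle:
        handle.write(text)

# data_files is assumed to be a list of paths; glob is one way to build it.
data_files = sorted(glob.glob(os.path.join(workdir, "*.txt")))

# Same expression as on the slide: file name -> file contents.
data = dict((f, read_data(f)) for f in data_files)
print(len(data), "files loaded")
```

With this layout each map call receives one file name as its key and the full file text as its value.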

3

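Page 11: My First Map/Reduce

def mapfn(key, value):
    # value is the text of one film file: a "<film_id>:" header line
    # followed by one "user_id,rating,date" line per rating.
    lines = value.splitlines()
    film_id = lines[0][:-1]  # drop the trailing character of the header
    for line in lines[1:]:
        items = line.split(",")
        user_id = items[0]
        rating = items[1]
        date = items[2]
        yield user_id, film_id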
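To see what this map function emits, it can be called directly on a string. The sample below is invented and assumes the layout the code implies: a first line holding the film id followed by a colon, then one `user_id,rating,date` line per rating. The sketch drops the unused `rating` and `date` variables for brevity.

```python
def mapfn(key, value):
    lines = value.splitlines()
    film_id = lines[0][:-1]  # "1:" -> "1"
    for line in lines[1:]:
        user_id = line.split(",")[0]
        yield user_id, film_id


# Invented sample in the assumed layout; the key is a made-up file name.
sample = "1:\n1001,3,2005-09-06\n1002,5,2005-05-13\n1003,4,2005-10-19"
pairs = list(mapfn("mv_0000001.txt", sample))
print(pairs)
```

Each rating line becomes one `(user_id, film_id)` pair, so the user id is the key on which the reduce step will later group.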

3

Page 12: My First Map/Reduce

def reducefn(key, values):
    # key is a user id, values are the film ids emitted for that user.
    number_of_films = 0
    for value in values:
        number_of_films += 1
    return number_of_films
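The reduce function can be checked the same way, by grouping some pairs by hand and calling it once per user. The pairs below are invented, and the grouping loop is only a local stand-in for what a framework would do between the map and reduce phases.

```python
from collections import defaultdict


def reducefn(key, values):
    number_of_films = 0
    for value in values:
        number_of_films += 1
    return number_of_films


# Invented (user_id, film_id) pairs, shaped like the output of the map step.
pairs = [
    ("1001", "1"), ("1002", "1"),
    ("1001", "2"),
    ("1001", "3"), ("1002", "3"),
]

# Group film ids by user before reducing.
films_by_user = defaultdict(list)
for user_id, film_id in pairs:
    films_by_user[user_id].append(film_id)

films_per_user = {
    user: reducefn(user, films) for user, films in films_by_user.items()
}
print(films_per_user)
```

When `values` is a list, the loop is equivalent to `len(values)`; the explicit counter also works if the values arrive as a plain iterator.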

3

Page 13: My First Map/Reduce

Papers

GFS http://research.google.com/archive/gfs.html
MapReduce http://research.google.com/archive/mapreduce.html
BigTable http://research.google.com/archive/bigtable.html

Dynamo http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf

Dremel http://research.google.com/pubs/pub36632.html
Spanner http://research.google.com/archive/spanner.html

Python

MinceMeat.py https://github.com/michaelfairley/mincemeatpy
Octo.py http://code.google.com/p/octopy/
Netflix DataSet http://www.lifecrunch.biz/archives/207

4

Page 14: My First Map/Reduce

Rubén Orta

http://www.slideshare.net/agileando/mi-primer-map-reduce

Blog http://devspoke.com/
Twitter https://twitter.com/agileando
GitHub https://github.com/rubenorta

4

Page 15: My First Map/Reduce

WE ARE LOOKING FOR PEOPLE FOR OUR TEAM

Want to join?

*unix, scripting (python, perl)
devops