Distributed Systems - Hadoop
HADOOP
Aryel Fernandes
Renan Augusto de Miranda
Prof. Dr. Arlindo Flávio da Conceição
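What is it?
High-availability distributed object-oriented platform (a backronym: the name actually comes from a toy elephant belonging to creator Doug Cutting's son)
• Framework for computing that is:
  • Distributed
  • Scalable
  • Fault-tolerant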
What is it?
• Open-source Apache project
• Created in 2005
• Originally part of Apache Nutch
• Native Java API; other languages are supported through Hadoop Streaming and Hadoop Pipes (C++)
HDFS
• Hadoop Distributed File System
• Inspired by the Google File System (GFS)
• Distributes and replicates files across many machines
• Flexible
HDFS
Components
• NameNode
• DataNode
HDFS
NameNode
• Keeps the file index: the namespace and the mapping of files to blocks
• Does not store the file data itself
• Single point of failure (in Hadoop 1.x)
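To make the NameNode's role concrete, here is a minimal Python sketch assuming a toy in-memory model. The class `NameNodeIndex`, its `add_file`/`locate` methods and the block and node names (`blk_1`, `dn1` and so on) are hypothetical illustrations, not Hadoop's API. The index holds only metadata, never file contents.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class NameNodeIndex:
    """Toy NameNode metadata: file path -> block IDs, block ID -> DataNodes.

    Only metadata is kept here; the block contents live on the DataNodes.
    """
    file_to_blocks: Dict[str, List[str]] = field(default_factory=dict)
    block_locations: Dict[str, List[str]] = field(default_factory=dict)

    def add_file(self, path: str, blocks: Dict[str, List[str]]) -> None:
        # `blocks` maps each block ID (in file order) to the DataNodes holding a replica
        self.file_to_blocks[path] = list(blocks)
        self.block_locations.update(blocks)

    def locate(self, path: str) -> List[Tuple[str, List[str]]]:
        # A client asks the NameNode where each block is, then reads the bytes
        # directly from one of the listed DataNodes
        return [(b, self.block_locations[b]) for b in self.file_to_blocks[path]]


index = NameNodeIndex()
index.add_file("/logs/day1.txt", {"blk_1": ["dn1", "dn2", "dn3"],
                                  "blk_2": ["dn2", "dn3", "dn4"]})
print(index.locate("/logs/day1.txt"))
```

Because every lookup goes through this single index, losing the NameNode makes all files unreachable even though the DataNodes still hold the blocks.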
HDFS
DataNode
• Stores the actual data
• Files are stored as blocks (64 MB by default in Hadoop 1.x)
• DataNodes communicate with one another to replicate blocks (3 replicas by default)
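A minimal sketch of block splitting and replication, assuming a tiny block size for readability and simple round-robin placement over a hypothetical list of DataNode names. Real HDFS uses a rack-aware placement policy, so this only shows the idea that each block lives on `replication` distinct nodes; the function names are hypothetical.

```python
from typing import List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, datanodes: List[str],
                   replication: int = 3) -> List[List[str]]:
    """Assign each block to `replication` distinct DataNodes, round-robin.

    Simplification: real HDFS uses a rack-aware placement policy instead.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    n = len(datanodes)
    # Block b goes to nodes b, b+1, ..., b+replication-1 (mod n): all distinct
    return [[datanodes[(b + r) % n] for r in range(replication)]
            for b in range(num_blocks)]


blocks = split_into_blocks(b"abcdefghij", block_size=4)
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
for i, (blk, nodes) in enumerate(zip(blocks, placement)):
    print(i, blk, nodes)
```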
HDFS
MapReduce
• Hadoop's implementation of the MapReduce programming model
• Work is split between Mappers and Reducers
• Inputs are usually files stored in HDFS
• All processing operates on (key, value) pairs
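As an illustration of (key, value) input: Hadoop's default `TextInputFormat` hands each line of a text file to the mapper with the line's starting byte offset as the key and the line's contents as the value. The function below is a hypothetical Python imitation of that behavior, not Hadoop code.

```python
from typing import List, Tuple


def text_input_records(data: bytes) -> List[Tuple[int, str]]:
    """Imitate TextInputFormat: key = byte offset where the line starts,
    value = the line's text without its line terminator."""
    records = []
    offset = 0
    for raw in data.splitlines(keepends=True):
        records.append((offset, raw.rstrip(b"\r\n").decode("utf-8")))
        offset += len(raw)  # advance past the line and its terminator
    return records


print(text_input_records(b"hello world\nbye world\n"))
```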
MapReduce
• Mapper
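Word count is the usual first Mapper example. This Python sketch (the name `word_count_mapper` is hypothetical, and lowercasing words is an arbitrary choice) receives one (offset, line) record and emits a (word, 1) pair per word; in Hadoop this logic would live in the `map` method of a `Mapper` subclass.

```python
from typing import Iterator, Tuple


def word_count_mapper(offset: int, line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated word in one input line.

    The byte-offset key is not needed for word count, so it is ignored.
    """
    for word in line.split():
        yield word.lower(), 1


print(list(word_count_mapper(0, "Hello world hello")))
```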
MapReduce
• Reducer
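The matching Reducer, again as a hypothetical Python sketch: the framework hands it one key together with all values emitted for that key, possibly as a one-pass iterator, and it emits the key with the total. In Hadoop this corresponds to the `reduce` method of a `Reducer` subclass.

```python
from typing import Iterable, Tuple


def word_count_reducer(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Receive one key with all of its values (grouped by the framework during
    the shuffle) and emit the key with the sum of those values."""
    return word, sum(counts)


# Values may arrive as an iterator that can only be traversed once
print(word_count_reducer("hello", iter([1, 1, 1])))
```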
MapReduce
• JobTracker
  • Receives MapReduce job submissions from clients
  • Schedules jobs in FIFO order by default
  • Maintains checkpoints
  • Speculative execution: launches backup copies of slow tasks
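A minimal sketch of two JobTracker ideas under simplified models: a FIFO queue that hands out jobs in submission order, and a straggler check that picks candidates for speculative execution. The 0.2 progress gap and 60-second minimum resemble the heuristic commonly described for Hadoop 1.x, but treat them here as assumed parameters; all class and function names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class FifoJobQueue:
    """Hand out submitted jobs strictly in submission order."""

    def __init__(self) -> None:
        self._jobs: Deque[str] = deque()

    def submit(self, job_id: str) -> None:
        self._jobs.append(job_id)

    def next_job(self) -> Optional[str]:
        return self._jobs.popleft() if self._jobs else None


def pick_speculative(progress: Dict[str, float], runtime_s: Dict[str, float],
                     gap: float = 0.2, min_runtime_s: float = 60.0) -> List[str]:
    """Return tasks that look like stragglers: progress (0.0 to 1.0) more than
    `gap` below the average of all tasks, after at least `min_runtime_s`."""
    if not progress:
        return []
    avg = sum(progress.values()) / len(progress)
    return sorted(t for t, p in progress.items()
                  if p < avg - gap and runtime_s[t] >= min_runtime_s)


queue = FifoJobQueue()
queue.submit("job_a")
queue.submit("job_b")
print(queue.next_job(), queue.next_job(), queue.next_job())
print(pick_speculative({"m1": 0.9, "m2": 0.85, "m3": 0.3},
                       {"m1": 120, "m2": 120, "m3": 120}))
```

A backup copy of each selected task would then run on another node, and whichever copy finishes first wins.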
MapReduce
• TaskTracker
  • Accepts map or reduce tasks
  • Runs each task in a sandbox (a separate child JVM)
  • Has a limited number of “slots” (separate map slots and reduce slots)
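A toy model of slot accounting, assuming the Hadoop 1.x defaults of 2 map slots and 2 reduce slots per TaskTracker (`mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum`). In real Hadoop the JobTracker assigns tasks only to free slots reported through heartbeats; here that is simplified to `accept` returning `False`. The class and method names are hypothetical.

```python
from typing import Dict, List, Tuple


class TaskTracker:
    """Toy TaskTracker with a fixed number of map slots and reduce slots."""

    def __init__(self, map_slots: int = 2, reduce_slots: int = 2) -> None:
        self.free: Dict[str, int] = {"map": map_slots, "reduce": reduce_slots}
        self.running: List[Tuple[str, str]] = []  # (task_id, kind)

    def accept(self, task_id: str, kind: str) -> bool:
        """Start a "map" or "reduce" task if a slot of that kind is free."""
        if self.free[kind] == 0:
            return False  # no free slot of this kind; try another TaskTracker
        self.free[kind] -= 1
        self.running.append((task_id, kind))
        return True

    def finish(self, task_id: str) -> None:
        """Mark a task as done and release its slot."""
        for i, (tid, kind) in enumerate(self.running):
            if tid == task_id:
                del self.running[i]
                self.free[kind] += 1
                return
        raise KeyError(task_id)


tt = TaskTracker()
print(tt.accept("m1", "map"), tt.accept("m2", "map"), tt.accept("m3", "map"))
```

A free reduce slot cannot run a map task in this model, which is the root of the slot-based scheduling limitation.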
MapReduce
Shuffle
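The shuffle routes every map output pair to a reducer chosen by partitioning its key, then sorts and groups each reducer's input by key. In this Python sketch `zlib.crc32` stands in for the Java `hashCode()` used by Hadoop's `HashPartitioner`, which computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. Sorting is done at the reducer for simplicity, whereas Hadoop sorts map output before it is fetched and merges it on the reduce side. Function names are hypothetical.

```python
import zlib
from collections import defaultdict
from itertools import groupby
from typing import Dict, Iterable, List, Tuple


def partition(key: str, num_reducers: int) -> int:
    """Choose the reducer for a key. crc32 is a deterministic stand-in for the
    Java hashCode() used by Hadoop's HashPartitioner."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(pairs: Iterable[Tuple[str, int]],
            num_reducers: int) -> Dict[int, List[Tuple[str, List[int]]]]:
    """Route map output to reducers, then sort and group each reducer's
    input by key: {reducer: [(key, [values]), ...]}."""
    buckets: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    grouped = {}
    for reducer, items in buckets.items():
        items.sort(key=lambda kv: kv[0])  # stable sort: values keep arrival order
        grouped[reducer] = [(k, [v for _, v in grp])
                            for k, grp in groupby(items, key=lambda kv: kv[0])]
    return grouped


map_output = [("world", 1), ("hello", 1), ("hadoop", 1), ("hello", 1)]
for reducer, groups in sorted(shuffle(map_output, num_reducers=2).items()):
    print(reducer, groups)
```

Because the partition depends only on the key, every value for a given key reaches the same reducer.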
MapReduce
Overview
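Putting the phases together, here is a self-contained word-count sketch in plain Python with no Hadoop involved: each input split goes through the map phase, all map output is shuffled into sorted key groups, and the reduce phase sums each group. It assumes a single reducer and uses hypothetical function names, so it shows only the data flow, not the distribution across machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # Map: every word in every line becomes a (word, 1) pair
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    # Shuffle/sort: collect all values per key, then order the keys
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(groups: Iterable[Tuple[str, List[int]]]) -> List[Tuple[str, int]]:
    # Reduce: sum the counts of each word
    return [(key, sum(values)) for key, values in groups]


def run_word_count(splits: List[List[str]]) -> List[Tuple[str, int]]:
    # One map task per input split; their outputs feed a single reducer
    map_output = [pair for split in splits for pair in map_phase(split)]
    return reduce_phase(shuffle_phase(map_output))


print(run_word_count([["Hello world"], ["hello Hadoop", "bye world"]]))
```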
MapReduce
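Limitations
• Slot-based scheduling: map and reduce slots are fixed, so idle slots of one type cannot run tasks of the other
• Mapper bottleneck
• JVM startup cost (by default, a new JVM is launched for every task)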
Common Architecture
• NameNodes on dedicated machines:
  • Primary
  • Secondary (periodically checkpoints the namespace; not a hot standby)
  • Backup
• JobTracker on a dedicated machine
Common Architecture
Each worker node runs:
• DataNode
• TaskTracker
Example applications
• Log processing
• Marketing analysis
• Machine learning
• Image processing
• Web crawling
Hive
• Data summarization, querying, and analysis
• Provides HiveQL, an SQL-like query language
HBase
• Distributed, column-oriented database built on top of HDFS
• Inspired by Google's Bigtable
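HBase follows Bigtable's data model: a sparse, sorted map from row key to `family:qualifier` columns, each holding timestamped versions. The hypothetical `ToyTable` below is an in-memory Python sketch of that model only (no regions, no HDFS storage): `get` returns the newest version of a cell and `scan` walks row keys in sorted order, as HBase scans do.

```python
import bisect
from typing import Dict, List, Optional


class ToyTable:
    """In-memory sketch of the Bigtable/HBase data model:
    row key -> "family:qualifier" -> {timestamp: value}, rows kept sorted."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Dict[int, bytes]]] = {}
        self._sorted_keys: List[str] = []

    def put(self, row: str, column: str, ts: int, value: bytes) -> None:
        if row not in self._rows:
            bisect.insort(self._sorted_keys, row)  # keep row keys ordered
            self._rows[row] = {}
        self._rows[row].setdefault(column, {})[ts] = value

    def get(self, row: str, column: str) -> Optional[bytes]:
        """Return the newest version of a cell, or None if it is absent."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start: str, stop: str) -> List[str]:
        """Row keys in [start, stop), in sorted order."""
        lo = bisect.bisect_left(self._sorted_keys, start)
        hi = bisect.bisect_left(self._sorted_keys, stop)
        return self._sorted_keys[lo:hi]


table = ToyTable()
table.put("user2", "info:name", 1, b"Bob")
table.put("user1", "info:name", 1, b"Ann")
table.put("user1", "info:name", 2, b"Anna")
print(table.get("user1", "info:name"), table.scan("user1", "user3"))
```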