Big Data Processing
Luiz Henrique Zambom Santana
24/08/2016
Agenda
● Introduction
● Map/Reduce and Hadoop
● Lambda Architecture
● Queues with Apache Kafka
● Shared memory with Apache Ignite
● Streaming with Apache Spark
● Exercises
● Conclusions
Motivation
A quick reminder...
In the beginning there was map/reduce...
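As a reminder of the model, here is a minimal single-process sketch of map/reduce applied to word counting. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, and the shuffle step that a real framework performs across machines is simulated with a dictionary.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit one (word, 1) pair per word, as a map task would."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data processing", "big data streaming"]))
```

In a cluster, each `map_phase` call would run on the node holding that document, and each key's `reduce_phase` could run on a different node.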
...and Apache Hadoop...
Latency vs. Throughput
Which matters more?
Lambda Architecture (Nathan Marz)
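The idea behind the Lambda Architecture can be sketched with a toy page-view counter: a batch layer periodically recomputes a view from the full master dataset, a speed layer keeps incremental counts for events the batch has not yet seen, and a serving step merges both. The class and method names below are hypothetical, and the sketch ignores concurrency between ingestion and batch runs.

```python
from collections import Counter


class LambdaCounter:
    """Toy page-view counter split into batch, speed, and serving roles."""

    def __init__(self):
        self.master_dataset = []     # append-only record of every event
        self.batch_view = Counter()  # recomputed from scratch by the batch layer
        self.speed_view = Counter()  # incremental counts for events not yet batched

    def ingest(self, page):
        # New events go to both the master dataset and the speed layer.
        self.master_dataset.append(page)
        self.speed_view[page] += 1

    def run_batch(self):
        # The batch layer recomputes the whole view; the speed view is then
        # reset because every event it covered is now in the batch view.
        self.batch_view = Counter(self.master_dataset)
        self.speed_view = Counter()

    def query(self, page):
        # The serving step merges both views to answer a query.
        return self.batch_view[page] + self.speed_view[page]
```

Calling `ingest` a few times, then `run_batch`, then `ingest` again shows that `query` always reflects every event, whether it was counted by the batch view or by the speed view.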
Kappa Architecture (Jay Kreps)
Exercise Architecture
[Diagram: Twitter, Spark Streaming, master database (Cassandra), Kafka queue, and Ignite SQL, with steps numbered 1 to 3]
https://github.com/lhzsantana/neoway-processing
Apache Kafka
● Created at LinkedIn and open-sourced in early 2011
● Currently led by Confluent (http://www.confluent.io/)
● http://www.slideshare.net/GuozhangWang/apache-kafka-at-linkedin-43307044
Apache Kafka
1. Download Kafka
2. Start ZooKeeper
   a. zookeeper-server-start.bat zookeeper.properties
3. Start Kafka
   a. kafka-server-start.bat server.properties
Apache Kafka
● Exercise
  ○ Using the example as a base, write code that enqueues product objects on the topic MeusProdutos and sale objects on the topic MinhasVendas.
● Challenge
  ○ Read data from Cassandra, enqueue it, and send it to MongoDB.
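Before writing against a real broker, the shape of the exercise can be tried with an in-memory stand-in. The sketch below is not the Kafka client API: `InMemoryBroker`, `send`, and `poll` are hypothetical names, and the `Product` and `Sale` fields are assumptions rather than the classes from the course repository. It only illustrates the two ideas the exercise relies on: each topic is an append-only log, and a consumer reads from an offset.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Product:  # hypothetical fields, not taken from the course repository
    name: str
    price: float


@dataclass
class Sale:  # hypothetical fields
    product: str
    amount: float


class InMemoryBroker:
    """Stand-in for a broker: each topic is an append-only list of messages."""

    def __init__(self):
        self.topics = {}

    def send(self, topic, obj):
        # Serialize to JSON bytes, as a producer would before sending.
        payload = json.dumps(asdict(obj)).encode("utf-8")
        log = self.topics.setdefault(topic, [])
        log.append(payload)
        return len(log) - 1  # offset of the appended message

    def poll(self, topic, offset):
        # Return decoded messages from `offset` onward and the next offset to read.
        log = self.topics.get(topic, [])
        records = [json.loads(raw.decode("utf-8")) for raw in log[offset:]]
        return records, len(log)


if __name__ == "__main__":
    broker = InMemoryBroker()
    broker.send("MeusProdutos", Product("notebook", 25.0))
    broker.send("MinhasVendas", Sale("notebook", 150.0))
    print(broker.poll("MeusProdutos", 0))
```

With a real cluster, `send` would be replaced by a producer call and `poll` by a consumer subscribed to the topic; the serialization step stays essentially the same.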
Apache Ignite
● In-Memory Data Fabric
● Competes/cooperates on several fronts
DataGrid
● JCache (JSR 107)
● Advantages:
  ○ Consistency
  ○ Distributed In-Memory Caching
  ○ Lightning Fast Performance
  ○ Elastic Scalability
  ○ Distributed In-Memory Transactions
  ○ Web Session Clustering
  ○ Hibernate L2 Cache Integration
  ○ Tiered Off-Heap Storage
  ○ Distributed ANSI-99 SQL Queries with support for Joins
Apache Ignite with Spark
Apache Ignite for Streaming
IgniteSQL
● Has two operating modes: transactional and atomic
● Implements ANSI-99 SQL
Somewhat related to NewSQL...
SQL and NoSQL will merge “Not yet SQL”
Michael Stonebraker, 2015
https://www.youtube.com/watch?v=KRcecxdGxvQ
Apache Ignite
● Exercise
  ○ Using the example as a base, write code that sends the Product and the Sale to Ignite. Retrieve the products with a price greater than 20 and the sales with a value greater than 100.
  ○ Use a transaction to update the product value by 10%.
    ■ http://apacheignite.gridgain.org/v1.0/docs/transactions
● Challenge
  ○ Retrieve all sales of a product using a join: http://apacheignite.gridgain.org/docs/sql-queries#sql-joins
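The queries in this exercise can be prototyped without a cluster. The sketch below uses Python's built-in `sqlite3` module as a stand-in for Ignite SQL, so none of it is Ignite API. The table layout, column names, and sample rows are assumptions, and "update by 10%" is interpreted here as a 10% price increase.

```python
import sqlite3


def build_store():
    """Create an in-memory database with hypothetical product and sale tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL);
        CREATE TABLE sale (id INTEGER PRIMARY KEY, product_id INTEGER, amount REAL);
        INSERT INTO product VALUES (1, 'mouse', 15.0), (2, 'keyboard', 40.0);
        INSERT INTO sale VALUES (1, 1, 30.0), (2, 2, 120.0), (3, 2, 200.0);
        """
    )
    return conn


def expensive_products(conn):
    """Names of products with a price greater than 20."""
    rows = conn.execute("SELECT name FROM product WHERE price > 20 ORDER BY id")
    return [name for (name,) in rows]


def large_sales(conn):
    """Ids of sales with an amount greater than 100."""
    rows = conn.execute("SELECT id FROM sale WHERE amount > 100 ORDER BY id")
    return [sale_id for (sale_id,) in rows]


def raise_prices(conn):
    """Raise every price by 10% inside one transaction."""
    # `with conn` commits on success and rolls back if the block raises.
    with conn:
        conn.execute("UPDATE product SET price = price * 1.10")


def sales_of(conn, product_name):
    """All sales of one product, found by joining sale and product."""
    rows = conn.execute(
        """
        SELECT s.id, s.amount
        FROM sale AS s JOIN product AS p ON s.product_id = p.id
        WHERE p.name = ?
        ORDER BY s.id
        """,
        (product_name,),
    )
    return rows.fetchall()
```

The SQL text is the part most likely to carry over to Ignite; how caches are configured and how transactions are opened there is covered by the two links above.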
Apache Spark
● Started at the AMPLab
● Dr. Matei Zaharia's thesis: “An Architecture for Fast and General Data Processing on Large Clusters”
Memory abstraction
Data Frames
Memory abstraction
Spark Streaming
Spark Streaming and the Spark Engine
Discretized Stream (DStream)
Transformations on DStreams
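To make the DStream idea concrete, the following sketch discretizes a list of timestamped events into fixed-interval batches and then applies the same transformation pipeline to every batch. It is plain Python, not the Spark API: `discretize` and `transform` are hypothetical helpers, and timestamps are assumed to be non-negative seconds counted from the start of the stream.

```python
from collections import Counter


def discretize(events, batch_interval):
    """Group (timestamp, value) events into consecutive fixed-length batches."""
    batches = {}
    for timestamp, value in events:
        batch_index = int(timestamp // batch_interval)
        batches.setdefault(batch_index, []).append(value)
    if not batches:
        return []
    # Emit every interval in order, including empty ones, as a stream would.
    return [batches.get(i, []) for i in range(max(batches) + 1)]


def transform(batches, flat_map_fn, filter_fn):
    """Apply the same flatMap, filter, and count pipeline to every batch."""
    results = []
    for batch in batches:
        words = [word for item in batch for word in flat_map_fn(item)]
        kept = [word for word in words if filter_fn(word)]
        results.append(dict(Counter(kept)))
    return results


if __name__ == "__main__":
    tweets = [(0.2, "#spark rocks"), (0.9, "#kafka #spark"), (2.5, "#ignite")]
    batches = discretize(tweets, batch_interval=1.0)
    print(transform(batches, str.split, lambda word: word.startswith("#")))
```

Each inner list plays the role of one batch in the stream, which is why a transformation written for a single batch can be reused unchanged on every interval.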
Apache Spark
1. Download Spark
2. Windows:
   a. spark-class.cmd org.apache.spark.deploy.master.Master
   b. spark-class.cmd org.apache.spark.deploy.worker.Worker spark://192.168.99.1:7077
3. Linux:
   a. ./sbin/start-master.sh
   b. ./sbin/start-slave.sh spark://192.168.99.1:7077
4. Place the project's JAR in the target directory
5. Possible error: org.apache.spark.rpc.netty.RequestMessage; local class incompatible: stream classdesc
Spark Context (client)
Apache Spark
● Exercises
  ○ Using the Twitter streaming code as a base, save the information to Cassandra.
● Challenge
  ○ Use the Kafka connector to receive the Product and Sale data from the previous exercise via streaming.
Conclusions
● There are many (many!) options for data processing in Big Data
● Suggestions:
  ○ Apache Parquet
  ○ Apache Mesos
  ○ MLlib
Other Big Data frameworks
● Streaming
  ○ Storm
● Queues
  ○ Flume
● Multi-NoSQL
  ○ Apache Drill
● Hadoop management
  ○ Apache Falcon
  ○ Apache Flink
  ○ Apache Apex
● Exploration and visualization
  ○ Apache Zeppelin
References
● Hadoop
  ○ http://www.cloudera.com/developers/get-started-with-hadoop-tutorial.html
● Spark
  ○ https://www.mapr.com/blog/spark-streaming-and-twitter-sentiment-analysis
  ○ https://databricks.gitbooks.io/databricks-spark-reference-applications/content/twitter_classifier/collect.html
● Lambda Architecture
  ○ http://www.devmedia.com.br/conheca-a-arquitetura-lambda-em-java/32646
● Mesos
  ○ https://abhishek-tiwari.com/post/building-distributed-systems-with-mesos
● Kafka
  ○ https://www.mapr.com/blog/getting-started-sample-programs-apache-kafka-09
● Parquet
  ○ http://www.infoworld.com/article/2915565/big-data/apache-parquet-paves-the-way-towards-better-hadoop-data-storage.html