HDInsight Essentials Quick View
DESCRIPTION
These slides provide highlights of my book, HDInsight Essentials. The book is available at http://www.packtpub.com/establish-a-big-data-solution-using-hdinsight/book
TRANSCRIPT
![Page 1: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/1.jpg)
HDInsight Essentials ISBN : 1849695369 / ISBN 13 : 9781849695367
Rajesh Nadipalli 05/01/2014
![Page 2: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/2.jpg)
Goals of this Book
• Focus on Microsoft's new Hadoop distribution
• Serve as Quick Reference
• Provide an Overview of Hadoop
• Address both cloud and on-premise setup for HDInsight
• Highlight HDInsight differentiator
• Provide Practical & Real world examples
![Page 3: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/3.jpg)
Book Table of Contents
• Chapter 1: HDInsight in a Heartbeat
• Chapter 2: Deployment HDInsight on premise
• Chapter 3: HDInsight Azure cloud service
• Chapter 4: Administer your cluster
• Chapter 5: Ingest data to your cluster
• Chapter 6: Transform data in your cluster
• Chapter 7: Analyze & Report data from cluster
• Chapter 8: Project Planning & Architectural Considerations
![Page 4: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/4.jpg)
CHAPTER 1 HIGHLIGHTS: HDINSIGHT IN A HEARTBEAT
![Page 5: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/5.jpg)
Big Data Problem Characteristics
![Page 6: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/6.jpg)
Hadoop Overview
Self Healing Distributed Storage
Fault Tolerant Distributed Computing
+ Abstraction for Parallel Processing
CORE HADOOP COMPONENTS
• HDFS: Distributed Storage – replicated, self-healing and scalable
• MapReduce: Parallel Processing, processes data locally for efficiency
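To make "replicated, self-healing" a little more concrete, here is a toy Python model of block replication. It is only a sketch: real HDFS placement is rack-aware and driven by the NameNode, and the node names, block names, and both functions below are made up for illustration. The replication factor of 3 is the HDFS default.

```python
REPLICATION = 3  # HDFS default replication factor


def place_blocks(blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes (simple round-robin)."""
    return {
        block: {nodes[(i + k) % len(nodes)] for k in range(replication)}
        for i, block in enumerate(blocks)
    }


def heal(placement, live_nodes, replication=REPLICATION):
    """Drop replicas held by dead nodes, then copy blocks to other live nodes."""
    live = list(live_nodes)
    healed = {}
    for block, holders in placement.items():
        survivors = {node for node in holders if node in live}
        if survivors:  # a block with no surviving replica cannot be restored
            for node in live:
                if len(survivors) >= replication:
                    break
                survivors.add(node)
        healed[block] = survivors
    return healed


# Hypothetical four-node cluster holding two blocks.
nodes = ["dn1", "dn2", "dn3", "dn4"]
placement = place_blocks(["b1", "b2"], nodes)

# Simulate the loss of dn2 and let the cluster re-replicate.
after_failure = heal(placement, [n for n in nodes if n != "dn2"])
```

After the simulated failure, every block is back at three replicas on the remaining nodes, which is the behaviour the slide summarises as self-healing.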
![Page 7: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/7.jpg)
Hadoop Nodes Layout
Master Node: NameNode, Secondary NameNode, JobTracker
Slave Nodes: DataNode and TaskTracker on each node
MapReduce Layer: JobTracker, TaskTrackers
Distributed File System Layer: NameNode, Secondary NameNode, DataNodes
![Page 8: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/8.jpg)
Hadoop Eco System
Data Sources: RDBMS Databases; Audio, Images; Log Files; Sensors, RFID; Social Media, Feeds
Hadoop Data Store: HDFS, HBase (NoSQL DB)
Data Processing: MapReduce
Data Access: Hive, Pig, Mahout (Machine Learning)
Flume, Sqoop
Excel
Business Data Feeds
ZooKeeper (Distributed Process Management)
HCatalog (Metadata on Pig, Hive, MapReduce)
Oozie (Workflow, Scheduler)
Infrastructure, Operations (Monitoring, Configuration)
![Page 9: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/9.jpg)
End to End Solution on Hadoop
Collect & Import to HDFS
Process (MapReduce)
Analyze (BI Tools)
Report & Publish
![Page 10: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/10.jpg)
Popular Hadoop Distributions
• Amazon Elastic MapReduce (cloud, http://aws.amazon.com/elasticmapreduce/)
• Cloudera (http://www.cloudera.com/content/cloudera/en/home.html)
• EMC Pivotal HD (http://gopivotal.com/)
• Hortonworks HDP (http://hortonworks.com/)
• MapR (http://mapr.com/)
• Microsoft HDInsight (cloud, http://www.windowsazure.com/)
![Page 11: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/11.jpg)
HDInsight Differentiator
• Enterprise-ready Hadoop backed by Microsoft
• Analytics using Excel
• Integration with Active Directory
• Integration with .NET and JavaScript
• Connectors to RDBMS
• Scale using cloud offering: the Azure HDInsight service lets customers scale quickly and provides a seamless interface between HDFS and Azure Storage Vault
• JavaScript Console
![Page 12: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/12.jpg)
WordCount in HDInsight
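WordCount is the customary first MapReduce job. As a rough illustration of its map, shuffle, and reduce phases, the following plain-Python sketch runs everything in a single process. It is not the HDInsight sample itself, and the function names are invented for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework would do."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reducer: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


result = word_count(["big data on Hadoop", "Hadoop on Azure"])
print(result)
```

On a cluster, the mappers would run in parallel on the nodes that hold the input blocks, and the framework would handle the shuffle between them and the reducers.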
![Page 13: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/13.jpg)
CHAPTER 2 HIGHLIGHTS: HDINSIGHT INSTALL ON PREMISE
![Page 14: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/14.jpg)
HDInsight Distribution
Apache Hadoop
• Open Source Software
• Community Development
Hortonworks Data Platform
• Enterprise Hadoop Platform (HDP)
• Leaders in Hadoop
• Code committers to Hadoop
Microsoft HDInsight
• Built on top of HDP
• Integration with ASV, Excel, Power View, SQL Server, Active Directory
![Page 15: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/15.jpg)
Physical Install Options
NN SNN JT
DN / TT
Single node for development/test
Multi node for production
![Page 16: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/16.jpg)
Multi Node Install Steps
• Pre-requisites
• Networking Setup
• Remote Scripting
• Firewall Setup
• Software Install (each node)
• Hadoop Configuration
• Verification
![Page 17: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/17.jpg)
CHAPTER 3 HIGHLIGHTS: HDINSIGHT AZURE SERVICE
![Page 18: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/18.jpg)
Azure Cloud Service
Create Storage
Create HDInsight cluster
![Page 19: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/19.jpg)
CHAPTER 4 HIGHLIGHTS: ADMINISTER YOUR CLUSTER
![Page 20: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/20.jpg)
HDInsight Cluster Management
![Page 21: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/21.jpg)
HDInsight Dashboard
![Page 22: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/22.jpg)
HDInsight Dashboard
![Page 23: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/23.jpg)
NameNode Status
![Page 24: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/24.jpg)
Jobtracker Status
![Page 25: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/25.jpg)
CHAPTER 5 HIGHLIGHTS: INGEST DATA INTO YOUR CLUSTER
![Page 26: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/26.jpg)
Loading Data into your Cluster
You have the following options:
• Loading data using Hadoop commands
• Loading data using Azure Storage Vault
• Loading data using Interactive JavaScript
• Shipping data to your Cluster
• Loading data from RDBMS via Sqoop
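For the first option, loading with Hadoop commands, the usual tool is the file system shell's `hadoop fs -put`. The Python sketch below only wraps that command. The file name and HDFS directory are hypothetical, and the helper functions are not part of any HDInsight tooling.

```python
import subprocess


def build_put_command(local_path, hdfs_path):
    """Return the argument list that copies a local file into HDFS."""
    return ["hadoop", "fs", "-put", local_path, hdfs_path]


def load_to_hdfs(local_path, hdfs_path, dry_run=True):
    """Run the copy, or only return the command when dry_run is set."""
    command = build_put_command(local_path, hdfs_path)
    if not dry_run:
        # Needs the hadoop client on PATH; raises if the copy fails.
        subprocess.run(command, check=True)
    return command


# Hypothetical paths, shown as a dry run so nothing is executed.
command = load_to_hdfs("sales.csv", "/data/raw/")
print(" ".join(command))
```

Keeping `dry_run=True` as the default makes it easy to inspect the command before running it against a real cluster.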
![Page 27: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/27.jpg)
Loading via Azure Storage Explorer
![Page 28: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/28.jpg)
CHAPTER 6 HIGHLIGHTS: TRANSFORM YOUR DATA
![Page 29: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/29.jpg)
Transforming Data
You have the following options:
• MapReduce
• Hive
• Pig
• Others
![Page 30: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/30.jpg)
Processing Data in Cluster
Map for Jan2012
Map for Feb2012
Map for Apr2013
…
One Reducer
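The layout on this slide, one map task per monthly input feeding a single reducer, can be sketched in plain Python. This is an illustration only: it assumes each monthly input is a list of numbers and that the job computes totals, neither of which is stated on the slide, and the function names are invented.

```python
def map_month(month, records):
    """Map task for one monthly input: emit a partial sum keyed by month."""
    return month, sum(records)


def single_reducer(partials):
    """The one reducer: receives every mapper's output and combines it."""
    per_month = dict(partials)
    return per_month, sum(per_month.values())


# Hypothetical monthly inputs; in a real job each would be an HDFS input split.
monthly_inputs = {
    "Jan2012": [10, 20, 30],
    "Feb2012": [5, 15],
    "Apr2013": [40],
}

# Each map call is independent, so a cluster can run them in parallel.
partials = [map_month(month, records) for month, records in monthly_inputs.items()]
per_month, total = single_reducer(partials)
```

A single reducer yields one consolidated output, at the cost of funnelling every mapper's results through one task.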
![Page 31: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/31.jpg)
HDFS
Hive JDBC/ODBC
Metastore
Thrift Server
Command Line Web GUI
Driver (Parser, Planner, Executor)
MapReduce
Hive
![Page 32: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/32.jpg)
Hive or Pig?
Raw Data in HDFS
• Distributed Storage
• Reliable
Data Processing via Pig
• Pipelines
• Iterative Processing
• Research
Data Warehouse via Hive
• BI Tools
• Analysis
Diagram labels: HDFS, Data Warehouse
![Page 33: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/33.jpg)
CHAPTER 7 HIGHLIGHTS: ANALYZE & REPORT
![Page 34: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/34.jpg)
Analyze using Excel
![Page 35: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/35.jpg)
Analyze using Excel
![Page 36: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/36.jpg)
CHAPTER 8: PROJECT PLANNING & ARCHITECTURAL CONSIDERATIONS
![Page 37: Hd insight essentials quick view](https://reader033.vdocuments.pub/reader033/viewer/2022060107/554a3a4bb4c905293a8b49f4/html5/thumbnails/37.jpg)
Executive & Stakeholder Buy-in
Discovery & Analysis
Design
Implementation
User Acceptance
Production Operations
Feedback, New Requirements