Running Hadoop with Cloudera + Microsoft is apparently a good idea. #cwt2016

Cloudera Microsoft Hadoop. Cloudera World Tokyo 2016. Speakers: Rio Fujita (Microsoft Corporation, Global Blackbelt Sales Japan, OSS TSP); Tatsuo Kawasaki (Cloudera, Inc., Senior Technical Manager); Masaki Iwanaga (Hitachi Solutions, Lead Engineer).

Upload: cloudera-japan

Post on 06-Jan-2017


Category: Technology



TRANSCRIPT

Cloudera Microsoft Hadoop

Cloudera World Tokyo 2016

Microsoft Corporation Global Blackbelt Sales

Japan OSS TSP Rio Fujita

Cloudera, Inc. Senior Technical Manager Tatsuo Kawasaki

Hitachi Solutions Lead Engineer

Masaki Iwanaga

AGENDA

• Self Introduction

• Stories

• Solution

OSS ON CLOUD

https://azure.microsoft.com/ja-jp/free/

Stories

Hadoop

• ...

• ...

• = Hadoop


Hadoop 10

10 “Hadoop”

(HDFS)

(MapReduce)


“Hadoop”

Core Hadoop


Hadoop 10

Azure

• Windows Azure

• PaaS

• Microsoft Azure

• IaaS

• Linux / FreeBSD


[Figure: Microsoft Azure service overview]

Section headings: Platform Services, Infrastructure Services, Security & Management

Service labels (in extraction order): Web Apps, Mobile Apps, API Management, API Apps, Logic Apps, Notification Hubs, Content Delivery Network (CDN), Media Services, BizTalk Services, Hybrid Connections, Service Bus, Storage Queues, Hybrid Operations, Backup, StorSimple, Azure Site Recovery, Import/Export, SQL Database, DocumentDB, Redis Cache, Azure Search, Storage Tables, Data Warehouse, Azure AD Health Monitoring, AD Privileged Identity Management, Operational Analytics, Cloud Services, Batch, RemoteApp, Service Fabric, Visual Studio, App Insights, Azure SDK, VS Online, Domain Services, HDInsight, Machine Learning, Stream Analytics, Data Factory, Event Hubs, Mobile Engagement, Data Lake, IoT Hub, Data Catalog, Azure Active Directory, Multi-Factor Authentication, Automation, Portal, Key Vault, Store/Marketplace, VM Image Gallery & VM Depot, Azure AD B2C, Scheduler

Other labels: Infrastructure; +Hundreds of community supported images on; Databases; SQL; App; Clients; Management; Applications; Web App Gallery Dozens of .NET & PHP

1/3 : Linux

Hadoop

• /

→ Hadoop


Azure


Generally Available

Coming Soon

https://azure.microsoft.com/en-us/regions/

28 + 6, 2


JapanWest

JapanEast

6 duplicates


Datacenter

Internet Exchange

Terrestrial Network

Subsea Network

Edge Node

CDN Locations

56Earth laps


Compliant

Source : https://www.microsoft.com/en-us/TrustCenter/Compliance/default.aspx

Hadoop

Hadoop

Finance Government Telecom Manufacturing Energy Healthcare

[Table: Hadoop use cases by industry. The cell text is unrecoverable from the source encoding; legible fragments: RFID, Web, CRM, ERP, VR, IT]

DATA-DRIVEN PRODUCTS

• Azure Cloudera Hadoop

DATA-DRIVEN PRODUCTS

ERP


http://go.cloudera.com/LP=985

Azure

9s / 45s (1GB)

18,550km


Source : http://www.azurespeed.com


ExpressRoute

Microsoft Edge

Customer’s network

ExpressRoute Partner Edge

Traffic to public IP addresses in Azure

Traffic to Virtual Networks

Traffic to Office 365 Services


Hadoop

• Hadoop

Hadoop


[Figure: growth of the Hadoop ecosystem, 2006 to 2015. Each year's stack contains all earlier components plus the additions listed.]

2006: Core Hadoop (HDFS, MapReduce)
2007: + Solr, Pig
2008: + HBase, ZooKeeper
2009: + Hive, Mahout
2010: + Sqoop, Avro
2011: + Flume, Bigtop, Oozie, HCatalog, Hue, YARN
2012: + Spark, Tez, Impala, Kafka, Drill
2013: + Parquet, Sentry
2014: + Knox, Flink
2015: + Kudu, RecordService, Ibis, Falcon

SQL

Azure

• …

IoT

JBoss BRMS + Azure

Microsoft Power BI

SORACOM

Red Hat JBoss BRMS

Beam

SORACOM Air

SORACOM Beam

HDInsight

SQL Database

Storage

Machine Learning

Stream Analytics

Event Hubs

Microsoft Azure

IoT Microsoft Azure

IoT

INTERNET of

THINGS

RED HAT ENTERPRISE LINUX

App JBoss BRMS/CEP

JBoss EAP

Java VM DataBase

Hadoop

SORACOM

Red Hat

Cloudera


NoSQL MongoDB

• NoSQL

• Hadoop

• NoSQL RDBMS

• Hadoop NoSQL HBase Kudu

[Figure: storage positioning chart. One axis is Analytics (Fast Scans), the other is Online (Fast Random Access); both run from Slow to Fast. Plotted label: with Parquet]

Hadoop SQL

• BI Impala + Pentaho, Tableau, Power BI

• SQL Hive on Spark, Spark SQL

• Java, Python, Scala, ...

SQL

orImpala

SQL


BI SQL

/

Azure GUI

• JSON PowerShell

Azure Quickstart Templates


Apache Spark Hadoop

• Apache Spark

• Hadoop MapReduce

• Hadoop


http://www.slideshare.net/Cloudera_jp/spark-cwt2015


Spark Hadoop

Spark Impala Search MR Others

YARN

HDFS, HBase, Kudu

Spark Streaming

MLlib / Spark ML

SparkSQL DataFrame


Azure Stack

• Azure

Azure

• Azure

Azure

Hadoop

!Azure

G

A

Azure

• Template

• “HPC Linux Workload”

HPC Pack cluster for Linux workloads


Cloudera Hadoop


Fast -

How can you make real-time analytics a reality?

How can you unblock ad hoc data access?

How can you optimize the right workloads for Hadoop?


Easy -

How many people know how to manage your system?

How fast can you troubleshoot and fix issues?

How do you scale?


Secure -

How can you protect everything?

How can you control access for the entire platform?

How can you know what happened and is happening?


Hadoop

Cloudera Enterprise


Hadoop


Hadoop


MapReduce

HDFS /

Hadoop


Hadoop


 

The recommendation accounts for all hardware components such as CPU, memory, and disk options, including ephemeral SSDs.

The following table describes workload categories and services typically combined for workload types. 

Table 1: Workload categories.

Workload Type | Typical Services | Comments
Low | MRv2 (YARN), Hive, Pig, Crunch | Suitable for workloads that are predominantly batch oriented and involve MapReduce.
Medium | HBase, Solr, Spark | Suitable for higher resource-consuming services and production workloads but limited to one of these running at any time.
High / Full EDH | Impala, All CDH services | Full-scale production workloads with multiple services running in parallel on a multi-tenant cluster.
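The categories in Table 1 can be read as a small decision rule. The sketch below is one possible reading, not part of the reference architecture: it assumes that Impala, or more than one Medium service running together, places a cluster in the High category. The function name and the lowercase service keys are made up for illustration.

```python
from typing import Iterable

# Service groupings transcribed from Table 1 ("yarn" stands for MRv2 (YARN)).
LOW_SERVICES = {"yarn", "hive", "pig", "crunch"}
MEDIUM_SERVICES = {"hbase", "solr", "spark"}
HIGH_SERVICES = {"impala"}


def workload_category(services: Iterable[str]) -> str:
    """Return the Table 1 workload type for a set of service names.

    Assumed reading: Impala, or more than one Medium service at once,
    means the High / Full EDH category applies.
    """
    wanted = {s.lower() for s in services}
    unknown = wanted - LOW_SERVICES - MEDIUM_SERVICES - HIGH_SERVICES
    if unknown:
        raise ValueError(f"services not listed in Table 1: {sorted(unknown)}")
    medium = wanted & MEDIUM_SERVICES
    if wanted & HIGH_SERVICES or len(medium) > 1:
        return "High / Full EDH"
    if len(medium) == 1:
        return "Medium"
    return "Low"
```

For example, `workload_category(["Hive", "Pig"])` falls in the Low category, while adding both HBase and Spark moves the same cluster to High / Full EDH under this reading.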

Table 2 identifies service roles for the different node types. 

Table 2: Service Roles and Node Types

Service | Master Node 1 | Master Node 2 | Master Node 3 | Worker Nodes
ZooKeeper | ZooKeeper | ZooKeeper | ZooKeeper | -
HDFS | NN, QJN | NN, QJN | QJN | DataNode
YARN | RM | RM | History Server | NodeManager
Hive | - | - | MetaStore, WebHCat, HiveServer2 | -
Management (misc) | Cloudera Agent | Cloudera Agent | Cloudera Agent, Oozie, Management Services, Cloudera Manager (CM will be on dedicated node for Director Deployments) | Cloudera Agent
Hue | - | - | Hue Server | -
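The role placement in Table 2 can also be written down as data, which makes it easy to ask where a given role runs. This is a minimal sketch: the node keys (`master1`, `worker`, and so on) and the helper name are hypothetical, and the role names are copied from the table as extracted.

```python
# Role placement transcribed from Table 2. Node keys are illustrative names.
ROLE_LAYOUT = {
    "master1": ["ZooKeeper", "NN", "QJN", "RM", "Cloudera Agent"],
    "master2": ["ZooKeeper", "NN", "QJN", "RM", "Cloudera Agent"],
    "master3": [
        "ZooKeeper", "QJN", "History Server",
        "MetaStore", "WebHCat", "HiveServer2",
        "Cloudera Agent", "Oozie", "Management Services",
        "Cloudera Manager", "Hue Server",
    ],
    "worker": ["DataNode", "NodeManager", "Cloudera Agent"],
}


def hosts_for(role: str) -> list:
    """Return the sorted node types that carry the given role."""
    return sorted(node for node, roles in ROLE_LAYOUT.items() if role in roles)
```

Calling `hosts_for("QJN")` lists all three master nodes, while `hosts_for("NN")` lists only the first two, matching the table.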

Cloudera Enterprise Reference Architecture for Azure Deployments  |9  

Azure Hadoop CPU

• Azure


http://tiny.cloudera.com/cm-azure-ref-arch 

must use the Director Command Line Interface and configuration files based on the published examples. The log and data locations must not change, except as needed to reflect the number of data drives.  

Azure Marketplace 

The Cloudera Enterprise Data Hub offering in the Azure Marketplace allows you to quickly deploy a properly configured cluster in Azure. The automation logic assigns the correct number of master and worker nodes. The same configuration can also be created using the high availability (HA) option from the Azure portal. This offering provides simple, one-click provisioning of a cluster for proof-of-concept, prototype, development, and production environments. The provisioned cluster runs on the latest distribution of CDH with services such as HDFS, YARN, Impala, Oozie, Hive, Hue, and ZooKeeper with Cloudera Manager.

Deployment Scripts 

You can use the Cloudera on CentOS Azure Quickstart Template as a starting point for a simple or customized deployment to Azure. The scripts demonstrate proper setup of nodes, role assignment, and configuration.

Figure 1 shows a deployment scenario using Cloudera Director, Azure Marketplace offer or Deployment Scripts to launch a Cloudera cluster. 

Figure 1: Deployment Scenario 

 

Edge Security 

The accessibility of the Cloudera Enterprise cluster is defined by the Azure Network Security Group (NSG) configurations assigned to VMs and/or Virtual Network subnets and depends on security requirements and workload. Typically, edge nodes (also known as client nodes or gateway nodes) have direct access to the cluster's internal network. Through client applications on these edge nodes, users interact with the cluster and its data. These edge nodes can run web applications for real-time serving workloads, BI tools, or the Hadoop command-line client used to interact with HDFS.
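To make the NSG idea concrete, here is a simplified model of how a set of inbound rules might gate access to an edge node. It is a sketch under stated assumptions, not Azure's implementation: rules are checked in ascending priority order and the first match decides, and anything unmatched is denied (real NSGs also carry built-in default rules, which are left out here). The `Rule` class, the address ranges, and the port are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    priority: int  # lower number is evaluated first
    source: str    # source address range in CIDR notation
    port: int      # destination port
    allow: bool    # True = allow, False = deny


def is_allowed(rules, source_ip: str, port: int) -> bool:
    """Return the decision of the first matching rule, else deny."""
    addr = ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.port == port and addr in ip_network(rule.source):
            return rule.allow
    return False  # simplification: unmatched traffic is denied


# Hypothetical rules for an edge node: SSH only from one office range.
EDGE_RULES = [
    Rule(priority=100, source="203.0.113.0/24", port=22, allow=True),
    Rule(priority=200, source="0.0.0.0/0", port=22, allow=False),
]
```

With these rules, `is_allowed(EDGE_RULES, "203.0.113.7", 22)` is true, and SSH from any other address is refused because the broader deny rule has the higher priority number.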

Azure Resource Quotas 

The default quota for cores within an Azure subscription is 20 cores per region; for more than 20 cores, customers will need to engage Microsoft Support within the Azure portal to request an increased quota. The available quota for the region you are deploying to has to be greater than the number of cores used by all VMs you are launching.
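The quota check is simple arithmetic, and a short sketch may help when sizing a request to Microsoft Support. The node counts and cores per VM below are made-up illustration values, not a recommendation from the reference architecture. The helper treats a quota exactly equal to the requirement as sufficient, which is an assumption; the text above says the quota must be greater.

```python
DEFAULT_CORE_QUOTA = 20  # default cores per region, as stated in the text


def total_cores(plan) -> int:
    """Sum cores over (node_count, cores_per_vm) pairs."""
    return sum(count * cores for count, cores in plan)


def quota_shortfall(plan, quota: int = DEFAULT_CORE_QUOTA) -> int:
    """Return how many extra cores must be requested (0 if the plan fits).

    Assumption: a quota equal to the requirement counts as fitting.
    """
    return max(0, total_cores(plan) - quota)


# Hypothetical cluster: 3 master VMs with 8 cores, 5 worker VMs with 16 cores.
plan = [(3, 8), (5, 16)]
print(total_cores(plan))      # 3*8 + 5*16 = 104
print(quota_shortfall(plan))  # 104 - 20 = 84
```

Under the default quota this hypothetical plan is 84 cores short, so the quota increase would have to be requested before deployment.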

Workloads & Roles

In this reference architecture, we consider different kinds of workloads that are run on top of an Enterprise Data Hub. The initial requirements focus on instance types that are suitable for a diverse set of workloads. As service offerings change, these requirements may change to specify instance types that are unique to specific workloads.


 

Cloudera Hadoop Azure


Azure Hadoop


Solution

Case Studies : UNCOVER TRUTH

https://www.microsoft.com/ja-jp/casestudies/uncovertruth.aspx

© Hitachi Solutions, Ltd. 2016. All rights reserved.

November 8, 2016

Cloudera World Tokyo 2016

[Hitachi Solutions solution slides. The Japanese body text is unrecoverable from the source encoding; legible fragments: Azure, Azure Storage, http://www.hitachi-solutions.co.jp/pdap/]

END
