Running Hadoop with Cloudera + Microsoft is apparently the way to go. #cwt2016
TRANSCRIPT
Cloudera Microsoft Hadoop
Cloudera World Tokyo 2016
Microsoft Corporation Global Blackbelt Sales
Japan OSS TSP Rio Fujita
Cloudera, Inc. Senior Technical Manager Tatsuo Kawasaki
Hitachi Solutions Lead Engineer
Masaki Iwanaga
Platform Services
Infrastructure Services
Web Apps
Mobile Apps
API Management
API Apps
Logic Apps
Notification Hubs
Content Delivery Network (CDN)
Media Services
BizTalk Services
Hybrid Connections
Service Bus
Storage Queues
Hybrid Operations
Backup
StorSimple
Azure Site Recovery
Import/Export
SQL Database
DocumentDB
Redis Cache
Azure Search
Storage Tables
Data Warehouse
Azure AD Health Monitoring
AD Privileged Identity Management
Operational Analytics
Cloud Services
Batch
RemoteApp
Service Fabric
Visual Studio
App Insights
Azure SDK
VS Online
Domain Services
HDInsight
Machine Learning
Stream Analytics
Data Factory
Event Hubs
Mobile Engagement
Data Lake
IoT Hub
Data Catalog
Security & Management
Azure Active Directory
Multi-Factor Authentication
Automation
Portal
Key Vault
Store/Marketplace
VM Image Gallery & VM Depot
Azure AD B2C
Scheduler
Infrastructure
+Hundreds of community supported images on
Databases
SQL
App
Clients
Management
Applications
Web App Gallery Dozens of .NET & PHP
1/3 : Linux
Datacenter
Internet Exchange
Terrestrial Network
Subsea Network
Edge Node
CDN Locations
56 Earth laps
Hadoop
Finance Government Telecom Manufacturing Energy Healthcare
RFID, Web, CRM, ERP, VR, IT
ExpressRoute
Microsoft Edge
Customer’s network
ExpressRoute Partner Edge
Traffic to public IP addresses in Azure
Traffic to Virtual Networks
Traffic to Office 365 Services
2006: Core Hadoop (HDFS, MapReduce)
2007: Solr, Pig, Core Hadoop
2008: HBase, ZooKeeper, Solr, Pig, Core Hadoop
2009: Hive, Mahout, HBase, ZooKeeper, Solr, Pig, Core Hadoop
2010: Sqoop, Avro, Hive, Mahout, HBase, ZooKeeper, Solr, Pig, Core Hadoop
2011: Flume, Bigtop, Oozie, HCatalog, Hue, Sqoop, Avro, Hive, Mahout, HBase, ZooKeeper, Solr, Pig, YARN, Core Hadoop
2012: Spark, Tez, Impala, Kafka, Drill, Flume, Bigtop, Oozie, HCatalog, Hue, Sqoop, Avro, Hive, Mahout, HBase, ZooKeeper, Solr, Pig, YARN, Core Hadoop
2013: Parquet, Sentry, Spark, Tez, Impala, Kafka, Drill, Flume, Bigtop, Oozie, HCatalog, Hue, Sqoop, Avro, Hive, Mahout, HBase, ZooKeeper, Solr, Pig, YARN, Core Hadoop
2014: Knox, Flink, Parquet, Sentry, Spark, Tez, Impala, Kafka, Drill, Flume, Bigtop, Oozie, HCatalog, Hue, Sqoop, Avro, Hive, Mahout, HBase, ZooKeeper, Solr, Pig, YARN, Core Hadoop
2015: Kudu, RecordService, Ibis, Falcon, Knox, Flink, Parquet, Sentry, Spark, Tez, Impala, Kafka, Drill, Flume, Bigtop, Oozie, HCatalog, Hue, Sqoop, Avro, Hive, Mahout, HBase, ZooKeeper, Solr, Pig, YARN, Core Hadoop
IoT
JBoss BRMS + Azure
Microsoft Power BI
SORACOM
Red Hat JBoss BRMS
Beam
SORACOM Air
SORACOM Beam
HDInsight
SQL Database
Storage
Machine Learning
Stream Analytics
Event Hubs
Microsoft Azure
IoT Microsoft Azure
IoT
INTERNET of
THINGS
RED HAT ENTERPRISE LINUX
App JBoss BRMS/CEP
JBoss EAP
Java VM DataBase
Hadoop
SORACOM
Red Hat
Cloudera
Hadoop SQL
• BI: Impala + Pentaho, Tableau, Power BI
• SQL: Hive on Spark, Spark SQL
• Java, Python, Scala, ...
Spark Hadoop
Spark Impala Search MR Others
YARN
HDFS, HBase, Kudu
Spark Streaming
MLlib / Spark ML
SparkSQL DataFrame
Fast -
How can you make real-time analytics a reality?
How can you unblock ad hoc data access?
How can you optimize the right workloads for Hadoop?
Easy -
How many people know how to manage your system?
How fast can you troubleshoot and fix issues?
How do you scale?
Secure -
How can you protect everything?
How can you control access for the entire platform?
How can you know what happened and is happening?
recommendation accounts for all hardware components such as CPU, memory, and disk options, including ephemeral SSDs.
The following table describes workload categories and services typically combined for workload types.
Table 1: Workload categories.
Workload Type | Typical Services | Comments
Low | MRv2 (YARN), Hive, Pig, Crunch | Suitable for workloads that are predominantly batch-oriented and involve MapReduce.
Medium | HBase, Solr, Spark | Suitable for higher resource-consuming services and production workloads but limited to one of these running at any time.
High / Full EDH | Impala, All CDH services | Full-scale production workloads with multiple services running in parallel on a multi-tenant cluster.
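As a rough illustration of how Table 1 might be applied, the sketch below maps a set of services to a workload type. The service groupings come from the table; the classification rule (anything outside the Low and Medium lists, or more than one Medium service at once, counts as High) is an interpretation of the Comments column, and the function name `workload_type` is hypothetical.

```python
# Service groupings copied from Table 1.
# The decision rule below is an assumption, not a Cloudera-published algorithm.
LOW_SERVICES = {"MRv2 (YARN)", "Hive", "Pig", "Crunch"}
MEDIUM_SERVICES = {"HBase", "Solr", "Spark"}


def workload_type(services):
    """Return 'Low', 'Medium', or 'High / Full EDH' for a set of service names."""
    services = set(services)
    medium = services & MEDIUM_SERVICES
    other = services - LOW_SERVICES - MEDIUM_SERVICES
    # Impala, any other CDH service, or several Medium services in parallel.
    if other or len(medium) > 1:
        return "High / Full EDH"
    # Exactly one resource-heavy service running at a time.
    if len(medium) == 1:
        return "Medium"
    # Only batch-oriented, MapReduce-style services.
    return "Low"


print(workload_type({"Hive", "Pig"}))
print(workload_type({"Hive", "Spark"}))
print(workload_type({"Impala", "HBase"}))
```

A lookup like this is only a starting point for sizing; the table itself notes that the categories describe typical combinations.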
Table 2 identifies service roles for the different node types.
Table 2: Service Roles and Node Types
Service | Master Node 1 | Master Node 2 | Master Node 3 | Worker Nodes
ZooKeeper | ZooKeeper | ZooKeeper | ZooKeeper |
HDFS | NN, QJN | NN, QJN | QJN | DataNode
YARN | RM | RM | History Server | NodeManager
Hive | MetaStore, WebHCat, HiveServer2
Management (misc) | Cloudera Agent | Cloudera Agent | Cloudera Agent, Oozie, Management Services, Cloudera Manager (CM will be on dedicated node for Director Deployments) | Cloudera Agent
Hue | Hue Server
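The role layout in Table 2 can also be read per node rather than per service. The sketch below is one way to hold the table as data and invert it. It includes only the rows whose node placement is unambiguous in this transcript (ZooKeeper, HDFS, YARN, Management); Hive and Hue are left out because their column cannot be recovered here. The names `ROLES_BY_SERVICE` and `roles_by_node` are hypothetical.

```python
# Node types from the Table 2 header.
NODES = ["Master Node 1", "Master Node 2", "Master Node 3", "Worker Nodes"]

# One list of roles per node, in NODES order.
# Hive and Hue are omitted: their node placement is not clear from the source.
ROLES_BY_SERVICE = {
    "ZooKeeper": [["ZooKeeper"], ["ZooKeeper"], ["ZooKeeper"], []],
    "HDFS": [["NN", "QJN"], ["NN", "QJN"], ["QJN"], ["DataNode"]],
    "YARN": [["RM"], ["RM"], ["History Server"], ["NodeManager"]],
    "Management": [
        ["Cloudera Agent"],
        ["Cloudera Agent"],
        ["Cloudera Agent", "Oozie", "Management Services", "Cloudera Manager"],
        ["Cloudera Agent"],
    ],
}


def roles_by_node(table):
    """Invert the service-by-node table into a node -> list-of-roles mapping."""
    per_node = {node: [] for node in NODES}
    for cells in table.values():
        for node, roles in zip(NODES, cells):
            per_node[node].extend(roles)
    return per_node


for node, roles in roles_by_node(ROLES_BY_SERVICE).items():
    print(node, "->", ", ".join(roles))
```

Reading the table this way makes it easy to check, for example, that a QJN role lands on each of the three master nodes.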
Cloudera Enterprise Reference Architecture for Azure Deployments |9
Azure Hadoop CPU
• Azure
http://tiny.cloudera.com/cm-azure-ref-arch
must use the Director Command Line Interface and configuration files based on the published examples. The log and data locations must not change, except as needed to reflect the number of data drives.
Azure Marketplace
The Cloudera Enterprise Data Hub offering in the Azure Marketplace allows you to quickly deploy a properly configured cluster in Azure. The automation logic assigns the correct number of master and worker nodes. The same configuration can also be created using the high availability (HA) option from the Azure portal. This offering provides simple, one-click provisioning of a cluster for proof-of-concept, prototype, development, and production environments. The provisioned cluster runs on the latest distribution of CDH with services such as HDFS, YARN, Impala, Oozie, Hive, Hue, and ZooKeeper with Cloudera Manager.
Deployment Scripts
You can use the Cloudera on CentOS Azure Quickstart Template as a starting point for a simple or customized deployment to Azure. The scripts demonstrate proper setup of Nodes, Role Assignment, and Configuration.
Figure 1 shows a deployment scenario using Cloudera Director, the Azure Marketplace offer, or Deployment Scripts to launch a Cloudera cluster.
Figure 1: Deployment Scenario
Edge Security
The accessibility of the Cloudera Enterprise cluster is defined by the Azure Network Security Group (NSG) configurations assigned to VMs and/or Virtual Network subnets and depends on security requirements and workload. Typically, edge nodes (also known as client nodes or gateway nodes) have direct access to the cluster’s internal network. Through client applications on these edge nodes, users interact with the cluster and its data. These edge nodes can run web applications for real-time serving workloads, BI tools, or the Hadoop command-line client used to interact with HDFS.
Azure Resource Quotas
The default quota for cores within an Azure subscription is 20 cores per region; for more than 20 cores, customers will need to engage Microsoft Support within the Azure portal to request an increased quota. The available quota for the region you are deploying to has to be greater than the number of cores used by all VMs you are launching.
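The quota arithmetic above can be sketched as follows. The default of 20 cores per region is taken from the paragraph; the cluster plan (3 master VMs and 5 worker VMs with 16 cores each) is a hypothetical example, and treating a quota exactly equal to the demand as sufficient is an assumption of this sketch.

```python
DEFAULT_CORE_QUOTA = 20  # default cores per region, as stated in the text


def cores_required(vm_plan):
    """Sum cores over (vm_count, cores_per_vm) pairs."""
    return sum(count * cores for count, cores in vm_plan)


def quota_increase_needed(vm_plan, available_quota=DEFAULT_CORE_QUOTA):
    """Return how many extra cores to request, or 0 if the plan already fits.

    Assumption: a quota equal to the required cores counts as enough.
    """
    return max(0, cores_required(vm_plan) - available_quota)


# Hypothetical plan: 3 master VMs and 5 worker VMs, 16 cores each.
plan = [(3, 16), (5, 16)]
print(cores_required(plan))         # 128
print(quota_increase_needed(plan))  # 108
```

Even a small cluster like this one exceeds the default quota several times over, so the quota request is best filed before the first deployment attempt.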
Workloads & Roles
In this reference architecture, we consider different kinds of workloads that are run on top of an Enterprise Data Hub. The initial requirements focus on instance types that are suitable for a diverse set of workloads. As service offerings change, these requirements may change to specify instance types that are unique to specific workloads. The
© Hitachi Solutions, Ltd. 2016. All rights reserved.