Introduction to Hortonworks Data Cloud for AWS

© Hortonworks Inc. 2011 – 2016. All Rights Reserved. Hortonworks Data Cloud: Enterprise ready Hadoop on the cloud. 蒋 逸峰 (しょう いつほう / Yifeng Jiang), Solutions Engineer, Hortonworks, @uprush. December 14, 2016

Upload: yifeng-jiang

Post on 08-Jan-2017


TRANSCRIPT

Page 1: Introduction to Hortonworks Data Cloud for AWS

Hortonworks Data Cloud
Enterprise ready Hadoop on the cloud

蒋 逸峰 (しょう いつほう / Yifeng Jiang), Solutions Engineer, Hortonworks, @uprush
December 14, 2016

Page 2: Introduction to Hortonworks Data Cloud for AWS

About Me

蒋 逸峰 (しょう いつほう / Yifeng Jiang)
• Solutions Engineer, Hortonworks
• Apache HBase book author
• I like hiking & running
• Twitter: @uprush

Page 3: Introduction to Hortonworks Data Cloud for AWS

Hortonworks Data Platform (HDP)

Page 4: Introduction to Hortonworks Data Cloud for AWS

What's Missing?

→ Ambari makes deploying HDP super easy, but...
  – It is not easy to get there
  – Cluster sizing
  – HW purchase, setup in DC, network
  – OS setup
→ On average three weeks, or even more

Page 5: Introduction to Hortonworks Data Cloud for AWS


Page 6: Introduction to Hortonworks Data Cloud for AWS

Introducing Hortonworks Data Cloud for AWS

→ A new cloud product from Hortonworks
  – Powered by Hortonworks Data Platform
→ Offers Pay-As-You-Go (PAYG) pricing
→ Delivered and sold via AWS Marketplace
→ Handles most common big data use cases with Apache Hadoop, Spark, and Hive
  – Choose from a set of prescriptive cluster types
→ Focuses on ease of use and business agility
  – Avoids infinite configurability and customization
→ Optional Free Community Support**

** Enterprise Support option coming soon

Page 7: Introduction to Hortonworks Data Cloud for AWS


DEMO

Page 8: Introduction to Hortonworks Data Cloud for AWS

Architecture

[Architecture diagram. Recoverable labels: Amazon Web Services; Cloudbreak Deployer; Cloudbreak Services: Cloud controller (aka Cloudbreak), Cloudbreak DB, Connector (AWS, GCE, Azure, OpenStack); Access tools: Shell, REST API, Web UI; HDP Cluster: ETL/EDW (Master Group; Master Group: Hive, Spark; Ambari; Slave Group; Blueprint; S3a FileSystem); HDP Cluster: Analytics (Master Group; Master Group: LLAP, Zeppelin; Ambari; Slave Group; Blueprint; S3a FileSystem).]

Page 9: Introduction to Hortonworks Data Cloud for AWS

Hortonworks Data Cloud - Summary

→ Launch and manage clusters by workload type
  – ETL/EDW, Data science, Business analytics
→ Use highly scalable, durable storage for data (S3) & metadata (RDS)
→ Share data and metadata among multiple ephemeral clusters
→ Scale up and down at the click of a button
→ Secure clusters with IAM roles, security groups, etc.

Page 10: Introduction to Hortonworks Data Cloud for AWS

Improving Enterprise Readiness

Page 11: Introduction to Hortonworks Data Cloud for AWS

Enterprise Readiness

Improving enterprise readiness in the cloud
→ Cloud storage
→ Security and governance
→ Reliability and fault tolerance

Page 12: Introduction to Hortonworks Data Cloud for AWS

Matching Hadoop with the Cloud

Datacenter
• Data locality
• Consistent storage
• Single cluster administration

Cloud
• Scalable storage
• Customizability
• Cost effective compute

Combined
• Scalable storage with performance and consistency
• Customizability with ease of administration
• Cost effective compute with SLA policies

Page 13: Introduction to Hortonworks Data Cloud for AWS

Cloud Storage access facts

[Diagram: Interaction models. Recoverable labels: Application; HDFS; Input; Output; tmp; Copy.]

→ Cloud storage optimizes for scale
  – S3 data is replicated for high scale access, durability
→ Data access is remote
  – Data locality
  – Costlier metadata operations (e.g. hadoop fs -mv is actually a copy and delete)
→ Eventual consistency
  – Takes time for the effect of modification operations to permeate to all copies
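The remark that a move is really a copy and delete can be illustrated with a toy in-memory object store. This is only a sketch under stated assumptions: `ToyObjectStore` and its methods are hypothetical names, not an S3 or Hadoop API, and the byte counter simply makes the cost of the "move" visible.

```python
class ToyObjectStore:
    """In-memory stand-in for a flat key/value object store (hypothetical)."""

    def __init__(self):
        self.objects = {}        # key -> bytes
        self.bytes_copied = 0    # bookkeeping: how much data a "move" rewrites

    def put(self, key, data):
        self.objects[key] = data

    def copy(self, src, dst):
        data = self.objects[src]
        self.objects[dst] = data
        self.bytes_copied += len(data)

    def delete(self, key):
        del self.objects[key]

    def rename_prefix(self, src_prefix, dst_prefix):
        """Emulate a directory move: copy every object, then delete the sources."""
        keys = [k for k in self.objects if k.startswith(src_prefix)]
        for key in keys:
            self.copy(key, dst_prefix + key[len(src_prefix):])
        for key in keys:
            self.delete(key)
        return len(keys)


store = ToyObjectStore()
store.put("tmp/part-0", b"a" * 1000)
store.put("tmp/part-1", b"b" * 2000)
moved = store.rename_prefix("tmp/", "out/")
```

On a file system with real directories a rename touches metadata only; in this model the cost grows with the total size of the objects under the prefix.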

Page 14: Introduction to Hortonworks Data Cloud for AWS

Performance with Scalability

→ General strategy: optimize by workload types
→ ETL workloads
  – Typical pipeline: Bring in data => Transform => Repair partitions => Compute statistics
  – Multiple metadata calls: batched and issued in parallel for performance gains
→ DistCp
  – Optimized buffer management for transferring large files
  – Randomize input to DistCp to avoid hot-spotting S3 nodes
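Two of the ideas above, batching metadata calls and issuing them in parallel, and randomizing the copy order, can be sketched in a few lines. Everything here is an assumption for illustration: `fetch_metadata` is a hypothetical stand-in for one batched metadata request, and the shuffle only shows the ordering idea, not how DistCp implements it.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def batched(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def fetch_metadata(batch):
    """Hypothetical stand-in for one batched metadata request."""
    return {path: {"length": len(path)} for path in batch}


def fetch_all_metadata(paths, batch_size=3, workers=4):
    """Issue the batched requests in parallel and merge the results."""
    merged = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for result in pool.map(fetch_metadata, batched(paths, batch_size)):
            merged.update(result)
    return merged


def randomized_copy_list(paths, seed=None):
    """Return the paths in random order so neighbouring keys are not copied together."""
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)
    return shuffled


paths = [f"logs/2016/12/14/part-{i:04d}" for i in range(10)]
metadata = fetch_all_metadata(paths)
copy_order = randomized_copy_list(paths, seed=42)
```

Sorted listings tend to place keys with a shared prefix next to each other, so shuffling spreads concurrent requests across more of the key space.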

Page 15: Introduction to Hortonworks Data Cloud for AWS

Performance with Scalability

→ Analytics workloads
  – ORC file related optimizations
  – Support fast random access reads (both directions) by avoiding tearing down S3 HTTP connections
  – Pass index information to compute tasks as part of split data to avoid re-computation
→ Status: Available, but performance optimizations never stop :)

https://hortonworks.github.io/hdp-aws/s3-performance/index.html
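One possible reading of the random access bullet is that each request asks only for the bytes it needs, so no unread data is left on the wire when the reader jumps elsewhere. The toy model below counts how often a connection would have to be torn down under that assumption. `ToyRangedReader` is a hypothetical name and a deliberately simplified model, not the S3A implementation.

```python
class ToyRangedReader:
    """Counts connection teardowns for ranged reads of a remote object (toy model)."""

    def __init__(self, object_size, bounded_ranges):
        self.object_size = object_size
        self.bounded_ranges = bounded_ranges  # True: request exactly the bytes needed
        self.range_end = None   # end offset of the current request, None if idle
        self.conn_pos = None    # next byte the current request would deliver
        self.teardowns = 0

    def read_at(self, offset, length):
        """Read `length` bytes at `offset`, reusing the current request if possible."""
        end = offset + length
        in_flight = self.range_end is not None
        if in_flight and offset == self.conn_pos and end <= self.range_end:
            self.conn_pos = end   # sequential continuation of the same request
            return
        if in_flight and self.conn_pos < self.range_end:
            # Unread bytes remain on the wire: the connection must be torn down.
            self.teardowns += 1
        # Issue a new ranged request starting at the target offset.
        self.range_end = end if self.bounded_ranges else self.object_size
        self.conn_pos = end


def count_teardowns(bounded, offsets, length=10, size=1000):
    reader = ToyRangedReader(size, bounded)
    for offset in offsets:
        reader.read_at(offset, length)
    return reader.teardowns


offsets = [900, 100, 500, 0]   # jumps in both directions
unbounded_teardowns = count_teardowns(False, offsets)
bounded_teardowns = count_teardowns(True, offsets)
```

In this model the open-ended requests are abandoned on every jump, while the bounded ones are always fully consumed before the next seek.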

Page 16: Introduction to Hortonworks Data Cloud for AWS

Correctness with strong consistency

→ Write operations followed by a read may not return correct results
  – Issues for data pipelines, multi-stage jobs, etc.
→ S3Guard project: intermediate, consistent metadata store
→ Write calls from S3AFileSystem update both S3 and the metadata store
→ S3AFileSystem automatically tries to reconcile metadata between S3 and the metadata store on subsequent reads
  – Inconsistencies are handled based on policy
→ Status: In progress

https://issues.apache.org/jira/browse/HADOOP-13345
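The idea of pairing an eventually consistent store with a consistent metadata store can be sketched with two toy classes. This is not the S3Guard code: `EventuallyConsistentStore`, `GuardedFileSystem`, and the "trust the union of both views" reconciliation policy are all assumptions made for illustration.

```python
class EventuallyConsistentStore:
    """Toy object store whose listings lag behind writes (hypothetical)."""

    def __init__(self):
        self.objects = {}     # key -> data
        self.visible = set()  # keys that listings currently return

    def put(self, key, data):
        self.objects[key] = data   # the listing is NOT updated yet

    def settle(self):
        self.visible = set(self.objects)   # model the listing catching up

    def list(self, prefix):
        return {k for k in self.visible if k.startswith(prefix)}


class GuardedFileSystem:
    """Writes go to both the object store and a consistent metadata store."""

    def __init__(self, store):
        self.store = store
        self.metadata = set()   # consistent record of keys written through this client

    def write(self, key, data):
        self.store.put(key, data)
        self.metadata.add(key)

    def list(self, prefix):
        from_store = self.store.list(prefix)
        from_metadata = {k for k in self.metadata if k.startswith(prefix)}
        # Reconciliation policy assumed here: trust the union of both views.
        return sorted(from_store | from_metadata)


store = EventuallyConsistentStore()
fs = GuardedFileSystem(store)
fs.write("out/part-0", b"x")
raw_listing = store.list("out/")      # the store has not caught up yet
guarded_listing = fs.list("out/")     # the metadata store fills the gap
```

A later pipeline stage that lists `out/` through the guarded file system sees the file immediately, which is the read-after-write case the slide describes.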

Page 17: Introduction to Hortonworks Data Cloud for AWS

Securing data access via IAM Roles

→ Integration with cloud provider
→ Provide an IAM role as instance profile for a cluster
→ Attach policies for accessing S3 to the role
  – E.g. read-only access for BI cluster to specific buckets
→ Status: Available
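A read-only policy of the kind mentioned above could look roughly like the document built below. It is a minimal sketch: the bucket name is hypothetical, and the exact set of actions a given cluster needs is an assumption, so treat it as a starting point rather than a vetted policy.

```python
import json


def read_only_s3_policy(buckets):
    """Build an IAM policy document granting read-only access to the given buckets."""
    bucket_arns = [f"arn:aws:s3:::{name}" for name in buckets]
    object_arns = [f"arn:aws:s3:::{name}/*" for name in buckets]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # listing applies to the bucket itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arns,
            },
            {   # reads apply to the objects inside the bucket
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": object_arns,
            },
        ],
    }


policy = read_only_s3_policy(["example-bi-reports"])  # hypothetical bucket name
print(json.dumps(policy, indent=2))
```

Attaching such a policy to the role used as the cluster's instance profile means no access keys have to be stored on the nodes.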

Page 18: Introduction to Hortonworks Data Cloud for AWS

Data Security in Hadoop

Apache Ranger
→ Fine grained, role-based access policies to data
  – Table / column level ACL
→ Audit access information
→ Row level filtering
→ Dynamic data masking
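To make column-level ACLs, row-level filtering, and dynamic masking concrete, here is a small plain-Python sketch. It does not use or imitate the Ranger API: the policy structure, the `apply_policy` function, and the masking rule (show only the last four characters) are assumptions for illustration, and auditing is left out.

```python
def apply_policy(rows, policy, user_roles):
    """Apply column ACLs, a row filter, and masking for the caller's roles."""
    rule = next((r for r in policy if r["role"] in user_roles), None)
    if rule is None:
        return []                        # no matching role: deny by default
    result = []
    for row in rows:
        if not rule["row_filter"](row):
            continue                     # row-level filtering
        visible = {}
        for column in rule["columns"]:   # column-level ACL
            value = row[column]
            if column in rule["masked"]:
                text = str(value)
                value = "*" * (len(text) - 4) + text[-4:]   # keep the last 4 characters
            visible[column] = value
        result.append(visible)
    return result


rows = [
    {"name": "alice", "region": "JP", "card": "4111111111111111"},
    {"name": "bob", "region": "US", "card": "5500000000000004"},
]
policy = [
    {
        "role": "analyst_jp",
        "columns": ["name", "card"],
        "masked": {"card"},
        "row_filter": lambda row: row["region"] == "JP",
    }
]
result = apply_policy(rows, policy, {"analyst_jp"})
```

The caller never sees the `region` column, only rows for region JP are returned, and the card number comes back masked.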

Page 19: Introduction to Hortonworks Data Cloud for AWS

Data Governance in Hadoop

Apache Atlas
→ Auto discover & index metadata
→ Tag data
→ Track data lineage
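Tagging and lineage tracking can be pictured as a small graph of datasets. The sketch below is a hypothetical model, not the Atlas data model or API; in particular, letting tags flow downstream along lineage edges is an assumption chosen to show why the two features are useful together.

```python
from collections import defaultdict


class ToyCatalog:
    """Minimal metadata catalog with tags and lineage (hypothetical model)."""

    def __init__(self):
        self.tags = defaultdict(set)     # dataset -> tags
        self.inputs = defaultdict(set)   # dataset -> datasets it was derived from

    def tag(self, dataset, label):
        self.tags[dataset].add(label)

    def record_process(self, sources, target):
        """Record that `target` was produced from `sources`."""
        self.inputs[target].update(sources)

    def upstream(self, dataset):
        """Return every dataset that `dataset` transitively depends on."""
        seen, stack = set(), [dataset]
        while stack:
            for parent in self.inputs.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def inherited_tags(self, dataset):
        """Tags on the dataset plus tags on anything upstream of it."""
        labels = set(self.tags.get(dataset, ()))
        for parent in self.upstream(dataset):
            labels |= self.tags.get(parent, set())
        return labels


catalog = ToyCatalog()
catalog.tag("raw_customers", "PII")
catalog.record_process(["raw_customers"], "clean_customers")
catalog.record_process(["clean_customers", "orders"], "customer_report")
sources = catalog.upstream("customer_report")
labels = catalog.inherited_tags("customer_report")
```

Here the report is two steps away from the tagged table, yet the lineage walk still surfaces the `PII` label.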

Page 20: Introduction to Hortonworks Data Cloud for AWS

Data governance technical architecture – On Premise

[Diagram: On Premise HDP Cluster. Recoverable labels: Ranger Admin (Policy); Atlas Admin (Metadata); Governed HDP Component (e.g. Hive) with Ranger Plugin and Atlas Plugin; LDAP/AD; Data Steward.]

Page 21: Introduction to Hortonworks Data Cloud for AWS

Data Governance in the Cloud: Ease of administration with flexibility

→ No longer a single compute cluster generating / accessing data
→ Data & metadata are still single and shared
→ Evolve Atlas and Ranger to be data lake centric rather than cluster centric
  – Shared long running Admin components
  – Ephemeral plugins on compute clusters
→ Status: Available as a Tech Preview

https://github.com/hortonworks/hdc-cli/blob/master/shared_cluster.md

Page 22: Introduction to Hortonworks Data Cloud for AWS

Shared Ranger / Atlas admin services

Available in Tech Preview in Hortonworks Data Cloud

[Diagram. Recoverable labels: ETL-EDW Cluster and Data Analytics Cluster, each with a Governed HDP Component (e.g. Hive), Ranger Plugin, and Atlas Plugin; Shared Enterprise Services: Ranger Admin (Policy), Atlas Admin (Metadata); Cloud Controller; LDAP/AD; Data Steward.]

Page 23: Introduction to Hortonworks Data Cloud for AWS

HDP Cloud Compute nodes on AWS

→ Regular EC2 instances
→ Can attach EBS volumes or ephemeral storage disks
→ Grouped according to functionality / access requirements
→ Opportunistic provisioning – spot instances (work in progress)

[Diagram: HDP Cluster. Recoverable labels: Master Group; Group #1; Group #2; Gateway node: Ambari; Cloud Controller.]

Page 24: Introduction to Hortonworks Data Cloud for AWS

HDP Cloud Compute nodes on AWS

Page 25: Introduction to Hortonworks Data Cloud for AWS

Reliability with cost benefits

→ HDP host instances could become unhealthy
  – Unreliable underlying infrastructure
  – Spot instances are transient, dependent on bid price
  – SLA impact for workloads
→ Automatically replace unhealthy nodes
  – No costs incurred if node is not functional
  – Replace unhealthy instances to maintain a desired capacity
→ Status: Work in progress

Page 26: Introduction to Hortonworks Data Cloud for AWS

Auto-recovery of slave nodes

→ Use Ambari to detect unhealthy status & notify Cloudbreak
→ Decommission and terminate unhealthy instances
→ Provision new instances and add to cluster

[Diagram: HDP Cluster. Recoverable labels: Master Group; Group #1; Group #2; Gateway node: Ambari; Cloud Controller.]
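The detect, remove, and replace cycle for worker nodes can be sketched as a loop that keeps a group at its desired size. This is a toy model under stated assumptions: `ToyCluster` and its methods are hypothetical, health detection is reduced to a flag, and decommissioning and termination are collapsed into a single step. It does not call Ambari or Cloudbreak.

```python
import itertools


class ToyCluster:
    """Toy worker group that is kept at a desired size (hypothetical model)."""

    def __init__(self, desired_size):
        self._ids = itertools.count(1)
        self.desired_size = desired_size
        self.nodes = {}                      # node id -> "healthy" / "unhealthy"
        for _ in range(desired_size):
            self.provision()

    def provision(self):
        node_id = f"worker-{next(self._ids)}"
        self.nodes[node_id] = "healthy"
        return node_id

    def mark_unhealthy(self, node_id):
        self.nodes[node_id] = "unhealthy"    # stands in for a health alert

    def recover(self):
        """Replace unhealthy nodes; return (removed, added) node ids."""
        removed = [n for n, state in self.nodes.items() if state == "unhealthy"]
        for node_id in removed:
            # Decommission and terminate in one step in this toy model.
            del self.nodes[node_id]
        missing = self.desired_size - len(self.nodes)
        added = [self.provision() for _ in range(missing)]
        return removed, added


cluster = ToyCluster(desired_size=3)
cluster.mark_unhealthy("worker-2")
removed, added = cluster.recover()
```

Driving recovery from the gap between desired and actual size, rather than replacing nodes one for one, also covers instances that disappear without ever reporting as unhealthy, such as reclaimed spot instances.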

Page 27: Introduction to Hortonworks Data Cloud for AWS


Summary

Page 28: Introduction to Hortonworks Data Cloud for AWS

Our Connected Data Platform Solutions

Hortonworks: Powering the Future of Data (Every business is a data business, master value of data via open approach)

Modern Data Applications (Cyber Security, IoT, Partners, Custom, etc.)

Connected Data Platforms (Manage All Data: data-at-rest, data-in-motion, data center & cloud)

Training | Consulting | Community Connection | Partnerworks

Data Center Solutions: HDP, HDF, Syncsort, AtScale, Pivotal HDB, Others

Cloud Solutions: Hortonworks Data Cloud for AWS, Azure HDInsight, Rackspace, Accenture, Others

Enterprise Subscription: SmartSense, operational svc's, 24x7 Support, Maintenance, etc.

Page 29: Introduction to Hortonworks Data Cloud for AWS


http://hortonworks.com/info/aws-marketplace-credits-signup/

Page 30: Introduction to Hortonworks Data Cloud for AWS

THANK YOU