Simplified Cluster Operation & Troubleshooting Alejandro Fernandez + Jayush Luniya

Upload: hadoop-summit

Post on 07-Jan-2017


TRANSCRIPT

Page 1: Simplified Cluster Operation & Troubleshooting

Simplified Cluster Operation & Troubleshooting

Alejandro Fernandez + Jayush Luniya

Page 2: Simplified Cluster Operation & Troubleshooting

Speakers

Alejandro Fernandez
Sr. Software Engineer @ Hortonworks
Apache Ambari

Jayush Luniya
Staff Engineer @ Hortonworks
Apache Ambari

Page 3: Simplified Cluster Operation & Troubleshooting

What is Apache Ambari?

Apache Ambari is the open-source platform to provision, manage and monitor Hadoop clusters

Page 4: Simplified Cluster Operation & Troubleshooting

New Enterprise Features

Ambari 2.4

• New Services: Log Search, Zeppelin, Hive LLAP
• Role Based Access Control
• Management Packs
• Grafana UI for Ambari Metrics System
• New Views: Zeppelin, Storm

Page 5: Simplified Cluster Operation & Troubleshooting

Apache Ambari Jiras

[Chart: Jira counts per release. v2.0 (April 2015), v2.1 (July - Sept 2015), v2.2 (Dec 2015 – Feb 2016), v2.4 (today, 1542 and growing). Data labels: 1690, 1864, 277379, 797, 206, 488.]

Page 6: Simplified Cluster Operation & Troubleshooting

Simply Operations - Lifecycle

• Deploy
• Secure/LDAP
• Smart Configs
• Monitor
• Upgrade
• Scale, Extend, Analyze

Ease-of-Use Deploy

Page 7: Simplified Cluster Operation & Troubleshooting

Deploy On Premise

Ambari UI wizard handles all of these combinations and makes recommendations based on host specs.

Page 8: Simplified Cluster Operation & Troubleshooting

Deploy On The Cloud

• Certified environments
• Sysprepped VMs
• Hundreds of similar clusters

Page 9: Simplified Cluster Operation & Troubleshooting

Deploy with Blueprints

• Systematic way of defining a cluster
• Export existing cluster into blueprint
  /api/v1/clusters/:clusterName?format=blueprint

Configs Topology Hosts Cluster
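The export call above can be sketched in Python. This is a minimal illustration, not the speakers' tooling: the helper name `blueprint_export_request` and the server address `http://ambari.example.com:8080` are assumptions, authentication is left out, and the request is only built, never sent.

```python
from urllib.parse import quote
from urllib.request import Request


def blueprint_export_request(base_url: str, cluster_name: str) -> Request:
    """Build (but do not send) the GET request that exports a cluster as a blueprint."""
    # The path and the format=blueprint query parameter come from the slide;
    # the base URL is a placeholder for wherever the Ambari server runs.
    url = (
        f"{base_url.rstrip('/')}/api/v1/clusters/"
        f"{quote(cluster_name, safe='')}?format=blueprint"
    )
    return Request(url, method="GET")


# Hypothetical server address and cluster name, for illustration only.
req = blueprint_export_request("http://ambari.example.com:8080", "MyCluster")
print(req.get_method(), req.full_url)
```

Passing the resulting request to `urllib.request.urlopen`, with whatever credentials the server requires, would return the blueprint JSON for that cluster.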

Page 10: Simplified Cluster Operation & Troubleshooting

Create a cluster with Blueprints

1. POST /api/v1/blueprints/my-blueprint

{
  "configurations" : [
    { "hdfs-site" : { "dfs.datanode.data.dir" : "/hadoop/1,/hadoop/2,/hadoop/3" } }
  ],
  "host_groups" : [
    {
      "name" : "master-host",
      "components" : [ { "name" : "NAMENODE" }, { "name" : "RESOURCEMANAGER" }, … ],
      "cardinality" : "1"
    },
    {
      "name" : "worker-host",
      "components" : [ { "name" : "DATANODE" }, { "name" : "NODEMANAGER" }, … ],
      "cardinality" : "1+"
    }
  ],
  "Blueprints" : { "stack_name" : "HDP", "stack_version" : "2.5" }
}

2. POST /api/v1/clusters/my-cluster

{
  "blueprint" : "my-blueprint",
  "host_groups" : [
    { "name" : "master-host", "hosts" : [ { "fqdn" : "master001.ambari.apache.org" } ] },
    { "name" : "worker-host", "hosts" : [
        { "fqdn" : "worker001.ambari.apache.org" },
        { "fqdn" : "worker002.ambari.apache.org" },
        …
        { "fqdn" : "worker099.ambari.apache.org" }
    ] }
  ]
}
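For a cluster with many workers, the body of the second request is easier to generate than to type. The following Python sketch shows one way to do that. The helper names are hypothetical, and it assumes the elided worker hosts follow the same zero-padded naming as the ones shown on the slide, which the slide does not confirm.

```python
import json


def worker_fqdns(count: int, pattern: str = "worker{:03d}.ambari.apache.org") -> list:
    """Generate worker host names; the zero-padded pattern is an assumption."""
    return [pattern.format(i) for i in range(1, count + 1)]


def cluster_creation_template(blueprint: str, masters: list, workers: list) -> dict:
    """Map concrete hosts onto the host groups declared in the blueprint."""
    return {
        "blueprint": blueprint,
        "host_groups": [
            {"name": "master-host", "hosts": [{"fqdn": h} for h in masters]},
            {"name": "worker-host", "hosts": [{"fqdn": h} for h in workers]},
        ],
    }


template = cluster_creation_template(
    "my-blueprint", ["master001.ambari.apache.org"], worker_fqdns(99)
)
# Step 1 would POST the blueprint document to /api/v1/blueprints/my-blueprint;
# step 2 would POST this body to /api/v1/clusters/my-cluster.
body = json.dumps(template, indent=2)
print(len(template["host_groups"][1]["hosts"]), "worker hosts in the request body")
```

The host group names in the template must match the names declared in the registered blueprint, which is why they are hard-coded here to the two groups from the slide.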

Page 14: Simplified Cluster Operation & Troubleshooting

Blueprints for Large Scale

• Kerberos, secure out-of-the-box
• High Availability is set up initially for NameNode, YARN, Hive, Oozie, etc.
• Host Discovery allows Ambari to automatically install services for a Host when it comes online
• Stack Advisor recommendations

Page 15: Simplified Cluster Operation & Troubleshooting

Blueprint Host Discovery

POST /api/v1/clusters/MyCluster/hosts

[
  {
    "blueprint" : "single-node-hdfs-test2",
    "host_groups" : [
      {
        "host_group" : "slave",
        "host_count" : 3,
        "host_predicate" : "Hosts/cpu_count>1"
      },
      {
        "host_group" : "super-slave",
        "host_count" : 5,
        "host_predicate" : "Hosts/cpu_count>2&Hosts/total_mem>3000000"
      }
    ]
  }
]
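A host discovery request like this one can also be assembled programmatically. The Python sketch below only builds the JSON body. The helper names `host_request` and `discovery_body` are hypothetical, and the predicate strings are passed through unchanged rather than parsed or validated.

```python
import json


def host_request(host_group: str, host_count: int, host_predicate: str) -> dict:
    """One entry: how many hosts matching the predicate should join a host group."""
    if host_count < 1:
        raise ValueError("host_count must be at least 1")
    return {
        "host_group": host_group,
        "host_count": host_count,
        "host_predicate": host_predicate,
    }


def discovery_body(blueprint: str, requests: list) -> list:
    """Wrap the per-group requests in the list-of-one structure shown on the slide."""
    return [{"blueprint": blueprint, "host_groups": list(requests)}]


body = discovery_body(
    "single-node-hdfs-test2",
    [
        host_request("slave", 3, "Hosts/cpu_count>1"),
        host_request("super-slave", 5, "Hosts/cpu_count>2&Hosts/total_mem>3000000"),
    ],
)
# This body would be POSTed to /api/v1/clusters/MyCluster/hosts.
print(json.dumps(body, indent=2))
```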

Page 16: Simplified Cluster Operation & Troubleshooting

Kerberos Available since Ambari 2.0

• Ambari manages Kerberos principals and keytabs

• Works with existing MIT KDC or Active Directory
• Once Kerberized, handles
  o Adding hosts
  o Adding components to existing hosts
  o Adding services
  o Moving components to different hosts

Page 17: Simplified Cluster Operation & Troubleshooting

Management Packs - Motivation

• Release Management
  o Ambari core and stacks released together
  o Stack changes require Ambari release
  o Decouple stack and Ambari core releases
• Add-on Services
  o Release vehicle for 3rd party services
  o Self contained release artifacts

Page 18: Simplified Cluster Operation & Troubleshooting

Management Packs – Release Trains

Page 19: Simplified Cluster Operation & Troubleshooting

Management Packs

• Generalized release artifact for stacks, add-on services, views, etc

• Decouples stack releases from Ambari core release

• Tarballs with metadata for applicability and content

• Stack is an overlay of multiple management packs

Page 20: Simplified Cluster Operation & Troubleshooting

Overlay of Management Packs

Page 21: Simplified Cluster Operation & Troubleshooting

Management Pack++

Short Term Goals (Ambari 2.4)
• Retrofit in Stack Processing Framework
• Enable 3rd party to ship add-on services
• Command line support

Long Term Goals (Future)
• Management Pack Framework
• Deliver Views
• REST API support

Page 22: Simplified Cluster Operation & Troubleshooting

Role Based Access Control (RBAC)

As Ambari & organizations grow, so do security needs

Ambari integrates with external authentication systems & LDAP

Page 23: Simplified Cluster Operation & Troubleshooting

RBAC Terms

• Roles have permissions, e.g., add services to cluster
• Roles are applied to Resources, e.g., Ambari, particular Cluster, particular View
• Users belong to groups
• A group has a role
• Users can also have additional roles

Page 24: Simplified Cluster Operation & Troubleshooting

New RBAC Roles

• Ambari Admin: all
• Cluster Admin: except manage permissions
• Cluster Op: except add services, Kerberos, manage Alerts, & upgrades
• Service Admin: except alter cluster topology or install components
• Service Op: except change configs
• Read-Only: only view
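One way to read the "except" wording is as a cascade in which each role keeps what the role above it has, minus the listed exceptions. The Python sketch below models that reading. It is an assumption about the slide, not Ambari's implementation: the permission strings are labels taken from the slide text, and "operate services" is a hypothetical placeholder for whatever a Service Op retains beyond viewing.

```python
# Ordered from most to least privileged. Each entry lists what the role loses
# relative to the role above it (an assumed reading of the "except ..." wording).
ROLE_EXCLUSIONS = [
    ("Ambari Admin", set()),
    ("Cluster Admin", {"manage permissions"}),
    ("Cluster Op", {"add services", "Kerberos", "manage alerts", "upgrades"}),
    ("Service Admin", {"alter cluster topology", "install components"}),
    ("Service Op", {"change configs"}),
]

# "operate services" is a hypothetical placeholder, not a permission named on the slide.
BASE_PERMISSIONS = {"view", "operate services"}


def role_permissions() -> dict:
    """Derive each role's permission set by subtracting exclusions down the list."""
    everything = set(BASE_PERMISSIONS)
    for _, removed in ROLE_EXCLUSIONS:
        everything |= removed
    result = {}
    current = set(everything)
    for role, removed in ROLE_EXCLUSIONS:
        current -= removed
        result[role] = frozenset(current)
    # Read-Only is stated outright on the slide: only view.
    result["Read-Only"] = frozenset({"view"})
    return result


PERMISSIONS = role_permissions()
print(sorted(PERMISSIONS["Service Op"]))
```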

Page 27: Simplified Cluster Operation & Troubleshooting

Background: Upgrade Terminology

Manual Upgrade
• The user follows instructions to upgrade the stack
• Incurs downtime

Rolling Upgrade
• Automated
• Upgrades one component per host at a time
• Preserves cluster operation and minimizes service impact

Express Upgrade
• Automated
• Runs in parallel across hosts
• Incurs downtime
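The difference between the two automated modes comes down to how hosts are batched. The Python sketch below is a deliberate simplification rather than Ambari's orchestration logic: the function name `upgrade_batches` is hypothetical, and it ignores component ordering, service checks and failure handling.

```python
def upgrade_batches(hosts: list, mode: str) -> list:
    """Return the batches of hosts that would be upgraded together.

    rolling: one host per batch, so the rest of the cluster keeps serving.
    express: a single batch with every host, which implies downtime.
    """
    if mode == "rolling":
        return [[host] for host in hosts]
    if mode == "express":
        return [list(hosts)] if hosts else []
    raise ValueError(f"unknown upgrade mode: {mode!r}")


hosts = ["worker001", "worker002", "worker003"]  # hypothetical host names
for mode in ("rolling", "express"):
    print(mode, upgrade_batches(hosts, mode))
```

The rolling plan takes as many steps as there are hosts and the express plan takes one, which is the trade between service impact and total upgrade time that the definitions describe.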

Page 28: Simplified Cluster Operation & Troubleshooting

Automated Upgrade: Rolling or Express

1. Check Prerequisites: Review the prerequisites to confirm your cluster configs are ready
2. Prepare: Take backups of critical cluster metadata
3. Register + Install: Register the HDP repository and install the target HDP version on the cluster
4. Perform Upgrade: Perform the HDP upgrade. The steps depend on upgrade method: Rolling or Express
5. Finalize: Finalize the upgrade, making the target version the current version

Page 29: Simplified Cluster Operation & Troubleshooting

Process: Rolling Upgrade

1. ZooKeeper
2. Ranger
3. Core Masters
4. Core Slaves
5. Hive
6. Oozie
7. Falcon
8. Clients: HDFS, YARN, MR, Tez, HBase, Pig, Hive, etc.
9. Kafka
10. Knox
11. Storm
12. Slider
13. Flume
14. Finalize or Downgrade

Core Masters and Core Slaves: HDFS, YARN, HBase

Page 30: Simplified Cluster Operation & Troubleshooting

Grafana for Ambari Metrics

FEATURES
• Grafana as a “Native UI” for Ambari Metrics
• Pre-built Dashboards: Host-level, Service-level
• Supports HTTPS

DASHBOARDS
• System: Home, Servers
• HDFS: Home, NameNodes, DataNodes
• YARN: Home, Applications, Job History Server
• HBase: Home, Performance, Misc

Page 31: Simplified Cluster Operation & Troubleshooting

Grafana includes pre-built dashboards for visualizing the most important cluster metrics.

Page 32: Simplified Cluster Operation & Troubleshooting

The HDFS NameNode dashboard highlights file system activity.

Page 33: Simplified Cluster Operation & Troubleshooting

Future of Ambari

• Cloud features
• Multiple instances of same service at different versions, e.g., Spark 1.6 and Spark 2.0
• YARN assemblies
• Component & Patch Upgrades: upgrade individual components in the same stack version, e.g., just DN and RM in HDP 2.4.*.* with zero downtime