how bigtop leveraged docker for build automation and one click hadoop provisioning @ hadoopcon 2015

Download How bigtop leveraged docker for build automation and  one click hadoop provisioning @ HadoopCon 2015

If you can't read please download the document

Upload: evans-ye

Post on 13-Apr-2017

2.562 views

Category:

Software


1 download

TRANSCRIPT

How Bigtop Leveraged Docker for Build Automation and One-Click Hadoop Provisioning

Evans Ye

HadoopCon 2015

About meApache Bigtop PMC memberWAT? checkout my talk in HadoopCon 2014made Committer 2014/12, PMC member 2015/06Develop Big Data infra & apps @ Trend Micro

OutlineWhat is Apache Bigtop?Achieving Build AutomationOne-Click Hadoop ProvisioningBigtop at Trend Micro

What is Apache Bigtop ?

Simple answer :

A project helps youto build your ownBig Data Stack

Hmmwhat exactly Big Data Stack is?

Big Data StackPowered by Bigtop

Hadoop YARNHadoop HDFSTezSparkSpark StreamingSpark GraphXHivePigApp AApp BApp CTachyonHBasePheonixResource ManagementStorageProcessing EngineAPIs andInterfasesIn-house AppsIgnite

Comprehensive answer :

Bigtop is a project for

PackagingTestingDeploymentVirtualization

Build RPM, DEB packages for Hadoop ecosystem$ yum -y install hadoop$ apt-get upgrade hbaseWhy not just untar hadoop-2.7.1.tar.gz ?Version control, upgrade/downgrade, dependency managementUnified view of configuration, log, binary directory layoutDaemons for better process managementPackaging

You can find binary repos here:http://www.apache.org/dist/bigtop/bigtop-1.0.0/repos/Supported Linux distro.

Supported components

A fundamental CI problem may happen in your Big Data Stack, tooTestingHadoop 2.4.1Hadoop 2.6.0Hive 1.1.0Hadoop 2.7.1Hive 1.2.0Hive 1.2.1Tez 0.6.0Tez 0.6.2Tez 0.7.0

A fundamental CI problem may happen in your Big Data Stack, tooTestingHadoop 2.4.1Hadoop 2.6.0Hive 1.1.0Hadoop 2.7.1Hive 1.2.0Hive 1.2.1Tez 0.6.0Tez 0.6.2Tez 0.7.0

A fundamental CI problem may happen in your Big Data Stack, tooTestingHadoop 2.4.1Hadoop 2.6.0Hive 1.1.0Hadoop 2.7.1Hive 1.2.0Hive 1.2.1Tez 0.6.0Tez 0.6.2Tez 0.7.0

Does this combination still work?

Theres a need for integration tests !

Bigtop has a testing framework that provides APIsBuilt-in tests that Bigtop provides:smoke-testsBasic Hadoop ecosystem interoperabilityintegration testsAdvanced tests runs from jar filesBigtop Tests

Okay, then how to run those tests?

Use Bigtop Puppet to deploy a fully functional, distributed Hadoop clusterIntegration tests running on a fully distributed clusterBigtop PuppetMasterless PuppetNumbers of components are supportedKerberos enabled cluster deploymentDeployment

Where to deploy ?

Automate the infrastructure creation & setupBigtop ProvisionerVirtualbox VMs auto-provisioningDocker containers auto-provisioningVirtualization

Thats good !

But I dont need that muchWhat else I can have?

Apache Hadoop App developersOne-click to run distributed cluster to test your code onCluster administrators or deployment gurusUse Bigtop Puppet to deploy and manage your clusterRun Bigtop tests to ensure your cluster is workingVendorsBuild your own Hadoop distribution, customized from Bigtop bitsFrom user point of view

Achieving Build Automation

As you can see, we support a number of Linux distributions (m)Whats the challenge ?

Numbers of components (n)Whats the challenge ?

We need to build them all > m * n

Preparing build environment

Seriously ?Preparing build environment

Puppet recipes automatically install required libraries, build toolsTo transform a machine into a build slave, simply do$ ./gradlew toolchainPrerequisites :Puppet 3.xJRE 7Bigtop Toolchain

Continuous Integration(CI)

CentOS slaveFedora slaveUbuntu slaveDebian slaveOpenSUSE slave

Continuous Integration(CI)

CentOS slaveFedora slaveUbuntu slaveDebian slaveOpenSUSE slave

Bigtop ToolchainBigtop ToolchainBigtop ToolchainBigtop ToolchainBigtop Toolchain

!?

We got Bigtop Toolchain for you.

Hi Bigtop, How do I test my packaging code?A contributorA committer

(Dont want to answer)

I need to setup all the Linux environments on my laptop?A contributorA committer

A very high level view : think it like a VMThree key techniques in DockerUse Linux Kernels Namespaces to create isolated resourcesUse Linux Kernels cgroups to constrain resource usageUnion file system that supports git-like image creationDocker - How it works

LightweightFast creationRepeatablePortabilityruns on *any* Linux environmentThe nice things about it

build, test, and run different Linux distribution in one single machineBest fit in our use case

OS specific Bins/Libs

Bigtop/slaves images

Dockerhub official imagesInstall Puppetbigtop/slaves

Install Bigtop Toolchain

Now, easy to build & test on your laptop

Linux

boot2docker

CentOS slave

Ubuntu slave

Mac OS XWindows

CentOS slave

Ubuntu slave

Docker engineDocker engine

Flexible CI infra

CentOS slaveFedora slaveUbuntu slaveDebian slaveOpenSUSE slave

One-Click Hadoop Provisioning

A tool to demonstrate the full life cycle of BigtopBigtop Provisioner

PackagingTestingDeploymentVirtualization

Spin up resourcesRun Bigtop PuppetRun Bigtop Tests

Bigtop Provisioner

Fast iterative developmentTest your code in the cluster, on your laptop, w/o human interventionFlexibilityChoose any combination of components as you wantResponsive CIIntegration tests that get you the result in minsA Big Data Stack playgroundSpark + Tachyon, Pig on Tez, etcThe goal

Vagrant+Automation Code+Bigtop Puppet

One-click Hadoop Provisioning

[Bigtop Provisionerw/ provider=

We use Vagrant as an abstraction layer to support different kinds of resource providersVagrant

Providers

One click Hadoop provisioning

./docker-hadoop.sh -c 3

bigtop/deploy images on Docker hub

./docker-hadoop.sh -c 3

One click Hadoop provisioning

bigtop/deploy images on Docker hub

./docker-hadoop.sh -c 3

puppet applypuppet applypuppet applyOne click Hadoop provisioning

Bigtop/deploy images

Dockerhub official imagesinstall Puppetbigtop/deploy

install Vagrant ssh key

Bigtop image hierarchy

Dockerhub official imagesbigtop/puppetbigtop/deploybigtop/slavesbigtop/puppet

=diff

All on the Docker hub

https://hub.docker.com/u/bigtop/

A demo is worth 1k words

In case my demo failhttps://asciinema.org/a/a7kiibouqav38xl5p2wrms51r

Supported providers in Bigtop 1.0.0 releaseVirtaulbox VMDocker containerOpenStack support is in master branch nowBigtop Provisioner

For Hadoop app developers, cluster admins, usersRun a distributed cluster to test your code onTweak & test configurations before applying to Production Play around with Bigtop Big Data StackFor Bigtop contributorsEasy to test your packaging, deployment, testing codeFor vendorsCI out of the box > patch upstream code made easierUse cases

Bigtop at Trend Micro

Use Bigtop as the basis for our internal custom distribution of HadoopApply community, internal patches to upstream projects for business and operational needNewest TMH7 is based on Bigtop 1.0 SNAPSHOTTrend Micro Hadoop (TMH)

Working with community made our life easier

Knowing community status made us being able to release TMH7 based on 1.0 SNAPSHOT

Working with community made our life easier

Contribute Bigtop Provisioner, packaging code, pupept recipes, bugfixes, CI setup, anything!Knowing community status made us being able to release TMH7 based on 1.0 SNAPSHOT

Working with community made our life easier

Running Bigtop smoke tests and integration tests using Bigtop Provisioner for TMH7

Working with community made our life easier

Contribute feedback, evaluation, and bugfixes through Production level adoptionRunning Bigtop smoke tests and integration tests using Bigtop Provisioner for TMH7

Hadoop YARNHadoop HDFSMapreduceAd-hoc Query UDFsPigApp AApp COozieResource ManagementStorageProcessing EngineAPIs andInterfasesIn-house AppsTrend Micro Big Data Stack Powered by BigtopKerberosApp BApp CHBaseWuji

Challenges were facing !

Were moving our cluster to another Data CenterNew cluster will be running TMH7Need Distcp to copy data from old TMH6 clusterBoth are Kerberos enabled secure clustersMigration x Upgrade

Distcp betweenTMH6 and TMH7 ?

From our long history we can say:

Things always get complicated with Kerberos

Multi-Cluster Live Synchronization with Kerberos Federated HadoopMammi Chang16:20~17:103F

Moving data across Production and Staging cluster in different Data Centers is a needDoing Distcp across 4 clusters !Our Hadoop deployment tool needs to support Distcop auto-configurationDeveloping the deployment tool is challenging !Hadoop cluster deployment

Run multiple secure HA clusters on DockerDevelop & test the deployment tool repeatedly Docker comes to the rescue

TMH6 ProductionTMH7 ProductionTMH6 StagingTMH7 Staging

Hadoop apps development on Docker

A Devops toolkit for Hadoop app developer to develop and test its code onImages with TMH7(Bigtop) already built-in> up and running w/o deploymentA Hadoop environment for apps to test against our new Hadoop distributionHadoocker

internal Docker registry

./execute.sh

Hadoop serverHadoop clientdataZero deployment Hadoop

TMH7

Hadoop appRestfull APIs

sample datahadoop fs put

Support apps to run end-to end CI against a Hadoop environmentSupport image creation from scratch all the way up to Hadoop pseudo cluster and your appsAllowing customization for your own environmentSame feature is also delivered in Vagrant boxeshttps://github.com/evans-ye/hadoockerFeatures

Summary

Bigtop helps you to build your own Big Data StackBuilding Bigtop packages made easier by offering immutable build environment through Docker imagesBigtop Provisioner creates you Hadoop cluster by just one clickUse Docker to simulate complex environment to ease development and testing effortsShip you apps in Docker image with zero deploymentSummary

Come and join the worldwide open source party !

Questions ?

Thank you !