How Bigtop Leveraged Docker for Build Automation and One-Click Hadoop Provisioning
Evans Ye
HadoopCon 2015
About me
Apache Bigtop PMC member
WAT? Check out my talk at HadoopCon 2014
Made committer 2014/12, PMC member 2015/06
Develop Big Data infra & apps @ Trend Micro
Outline
What is Apache Bigtop?
Achieving Build Automation
One-Click Hadoop Provisioning
Bigtop at Trend Micro
What is Apache Bigtop ?
Simple answer :
A project that helps you build your own Big Data Stack
Hmm, what exactly is a Big Data Stack?
Big Data Stack Powered by Bigtop
Layers: Resource Management, Storage, Processing Engine, APIs and Interfaces, In-house Apps
Components: Hadoop YARN, Hadoop HDFS, Tez, Spark, Spark Streaming, Spark GraphX, Hive, Pig, Tachyon, HBase, Phoenix, Ignite, App A, App B, App C
Comprehensive answer :
Bigtop is a project for
Packaging, Testing, Deployment, Virtualization
Packaging
Build RPM and DEB packages for the Hadoop ecosystem
  $ yum -y install hadoop
  $ apt-get upgrade hbase
Why not just untar hadoop-2.7.1.tar.gz?
  Version control, upgrade/downgrade, dependency management
  Unified view of configuration, log, and binary directory layout
  Daemons for better process management
Supported Linux distros
You can find binary repos here: http://www.apache.org/dist/bigtop/bigtop-1.0.0/repos/
Supported components
Testing
A fundamental CI problem may happen in your Big Data Stack, too
Hadoop 2.4.1, Hadoop 2.6.0, Hadoop 2.7.1
Hive 1.1.0, Hive 1.2.0, Hive 1.2.1
Tez 0.6.0, Tez 0.6.2, Tez 0.7.0
Does this combination still work?
There's a need for integration tests!
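To make the size of the problem concrete, here is a minimal Python sketch that enumerates every Hadoop/Hive/Tez stack that can be formed from the versions listed on the Testing slide. The names `VERSIONS` and `combinations` are illustrative only and are not part of Bigtop.

```python
from itertools import product

# Versions taken from the Testing slide; the dict layout is illustrative.
VERSIONS = {
    "hadoop": ["2.4.1", "2.6.0", "2.7.1"],
    "hive": ["1.1.0", "1.2.0", "1.2.1"],
    "tez": ["0.6.0", "0.6.2", "0.7.0"],
}

def combinations(versions):
    """Return every possible stack as a dict of component -> version."""
    names = list(versions)
    return [dict(zip(names, combo)) for combo in product(*versions.values())]

stacks = combinations(VERSIONS)
print(len(stacks))  # 3 * 3 * 3 candidate stacks
print(stacks[0])
```

Even with only three components and three versions each, there are 27 candidate stacks, which is why the check has to be automated rather than done by hand.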
Bigtop Tests
Bigtop has a testing framework that provides APIs
Built-in tests that Bigtop provides:
  smoke-tests: basic Hadoop ecosystem interoperability
  integration tests: advanced tests run from jar files
Okay, then how to run those tests?
Deployment
Use Bigtop Puppet to deploy a fully functional, distributed Hadoop cluster
Integration tests run on a fully distributed cluster
Bigtop Puppet
  Masterless Puppet
  A number of components are supported
  Kerberos-enabled cluster deployment
Where to deploy ?
Virtualization
Automate the infrastructure creation & setup
Bigtop Provisioner
  VirtualBox VMs auto-provisioning
  Docker containers auto-provisioning
That's good!
But I don't need that much. What else can I have?
From a user's point of view
Apache Hadoop app developers
  One click to run a distributed cluster to test your code on
Cluster administrators or deployment gurus
  Use Bigtop Puppet to deploy and manage your cluster
  Run Bigtop tests to ensure your cluster is working
Vendors
  Build your own Hadoop distribution, customized from Bigtop bits
Achieving Build Automation
What's the challenge?
As you can see, we support a number of Linux distributions (m)
and a number of components (n)
We need to build them all → m * n builds
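The m * n growth can be sketched in a few lines of Python. The five distributions are the ones that appear on the CI slides; the three components are an arbitrary subset chosen for illustration, and the `<component>-rpm` / `<component>-deb` task naming is an assumption that should be checked against the Bigtop build files.

```python
from itertools import product

# Distributions named on the CI slides, mapped to their package format.
DISTROS = {
    "centos": "rpm",
    "fedora": "rpm",
    "opensuse": "rpm",
    "ubuntu": "deb",
    "debian": "deb",
}

# A small, hypothetical subset of components; Bigtop packages many more.
COMPONENTS = ["hadoop", "hbase", "hive"]

def build_matrix(distros, components):
    """Return one (distro, task) pair per distribution/component combination."""
    return [
        (distro, f"{component}-{distros[distro]}")
        for distro, component in product(distros, components)
    ]

matrix = build_matrix(DISTROS, COMPONENTS)
print(len(matrix))  # m * n = 5 * 3 builds
for distro, task in matrix[:3]:
    print(distro, task)
```

Adding one distribution adds n builds and adding one component adds m builds, so the matrix quickly outgrows what can be maintained by hand.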
Preparing build environment
Seriously?
Bigtop Toolchain
Puppet recipes automatically install the required libraries and build tools
To transform a machine into a build slave, simply run
  $ ./gradlew toolchain
Prerequisites:
  Puppet 3.x
  JRE 7
Continuous Integration (CI)
CentOS slave, Fedora slave, Ubuntu slave, Debian slave, OpenSUSE slave
Bigtop Toolchain on each of the five slaves
!?
We got Bigtop Toolchain for you.
A contributor: Hi Bigtop, how do I test my packaging code?
A committer: (Don't want to answer)
A contributor: Do I need to set up all the Linux environments on my laptop?
Docker - How it works
A very high-level view: think of it like a VM
Three key techniques in Docker
  Use the Linux kernel's namespaces to create isolated resources
  Use the Linux kernel's cgroups to constrain resource usage
  Union file system that supports git-like image creation
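The union file system idea can be illustrated with a toy model. The Python sketch below stacks dictionaries the way image layers are stacked: lookups fall through from the top layer to the base, and writes land in a fresh top layer. It models only the lookup semantics, not Docker's actual storage drivers, and all paths and layer names are made up.

```python
from collections import ChainMap

# Each layer maps file paths to contents; these layers are invented for illustration.
base_layer = {"/etc/os-release": "CentOS", "/usr/bin/yum": "yum binary"}
puppet_layer = {"/usr/bin/puppet": "puppet binary"}
toolchain_layer = {"/usr/bin/mvn": "maven binary", "/etc/os-release": "CentOS (patched)"}

def union_view(*layers):
    """Stack layers bottom-to-top; upper layers shadow lower ones on lookup."""
    # ChainMap searches its first mapping first, so the top layer must come first.
    return ChainMap(*reversed(layers))

image = union_view(base_layer, puppet_layer, toolchain_layer)
print(image["/usr/bin/yum"])     # falls through to the base layer
print(image["/etc/os-release"])  # the top layer shadows the base layer

# A running container adds a new writable layer on top of the image.
container = image.new_child()
container["/tmp/build.log"] = "build output"
print("/tmp/build.log" in image)  # the image layers are left untouched
```

Because each layer only records what changed relative to the layers below it, images that share a base can share those lower layers, which is what makes image creation feel git-like.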
The nice things about it
Lightweight
Fast creation
Repeatable
Portability: runs on *any* Linux environment
Best fit in our use case
Build, test, and run different Linux distributions on one single machine
OS specific Bins/Libs
bigtop/slaves images
  Docker Hub official images
  + install Puppet
  + install Bigtop Toolchain
  = bigtop/slaves
Now, easy to build & test on your laptop
Linux: Docker engine running CentOS slave, Ubuntu slave
Mac OS X / Windows: boot2docker with Docker engine running CentOS slave, Ubuntu slave
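As a rough sketch of what building on a laptop could look like, the Python snippet below only assembles the `docker run` command lines and prints them; it does not invoke Docker. The image tags (`centos-6`, `ubuntu-14.04`), the `/ws` mount point, the source path, and the Gradle task names are assumptions for illustration and should be checked against the bigtop organization on Docker Hub and the Bigtop build documentation.

```python
import shlex

def build_command(distro_tag, task, source_dir="/path/to/bigtop"):
    """Assemble a docker invocation that runs one Gradle task inside a slave image."""
    return [
        "docker", "run", "--rm",
        "-v", f"{source_dir}:/ws",  # mount the Bigtop checkout into the container
        "-w", "/ws",                # run from the mounted checkout
        f"bigtop/slaves:{distro_tag}",
        "./gradlew", task,
    ]

# Hypothetical tag/task pairs: one RPM-based and one DEB-based distribution.
for tag, task in [("centos-6", "hadoop-rpm"), ("ubuntu-14.04", "hadoop-deb")]:
    print(shlex.join(build_command(tag, task)))
```

Keeping the command as a list of arguments makes it easy to hand to `subprocess.run` later without worrying about shell quoting.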
Flexible CI infra
CentOS slave, Fedora slave, Ubuntu slave, Debian slave, OpenSUSE slave
One-Click Hadoop Provisioning
Bigtop Provisioner
A tool to demonstrate the full life cycle of Bigtop
Packaging, Testing, Deployment, Virtualization
Spin up resources → Run Bigtop Puppet → Run Bigtop Tests
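The three provisioner stages (spin up resources, run Bigtop Puppet, run Bigtop tests) form a simple fail-fast pipeline. The Python sketch below shows that control flow with placeholder actions; the function name `provision` and the lambdas are hypothetical, and a real run would call Vagrant, Puppet, and the test suite instead.

```python
def provision(steps):
    """Run named steps in order; stop at the first one that reports failure."""
    completed = []
    for name, action in steps:
        if not action():
            return completed, name  # name of the failing step
        completed.append(name)
    return completed, None

# Placeholder actions that always succeed, standing in for the real tools.
steps = [
    ("spin up resources", lambda: True),
    ("run Bigtop Puppet", lambda: True),
    ("run Bigtop tests", lambda: True),
]

done, failed = provision(steps)
print(done, failed)
```

Stopping at the first failure matters here because running tests against a cluster that Puppet failed to configure would only produce misleading results.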
The goal
Fast iterative development
  Test your code in the cluster, on your laptop, w/o human intervention
Flexibility
  Choose any combination of components you want
Responsive CI
  Integration tests that get you the result in mins
A Big Data Stack playground
  Spark + Tachyon, Pig on Tez, etc.
Vagrant + Automation Code + Bigtop Puppet
One-click Hadoop Provisioning
[Bigtop Provisioner w/ provider=
Vagrant
We use Vagrant as an abstraction layer to support different kinds of resource providers
Providers
One click Hadoop provisioning
./docker-hadoop.sh -c 3
bigtop/deploy images on Docker Hub
puppet apply on each of the three containers
bigtop/deploy images
  Docker Hub official images
  + install Puppet
  + install Vagrant ssh key
  = bigtop/deploy
Bigtop image hierarchy
Docker Hub official images
  bigtop/puppet
    bigtop/deploy
    bigtop/slaves
=diff
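The image hierarchy can be written down as a parent map, from which the build steps of any image follow by walking up the chain. This Python sketch takes the structure from the slides (a shared Puppet layer, then either the Toolchain or the Vagrant ssh key on top); the dict layout and the `build_steps` helper are illustrative, not how Bigtop actually defines its images.

```python
# Parent of each image and the step that image adds on top of its parent.
IMAGES = {
    "bigtop/puppet": ("official distro image", "install Puppet"),
    "bigtop/slaves": ("bigtop/puppet", "install Bigtop Toolchain"),
    "bigtop/deploy": ("bigtop/puppet", "install Vagrant ssh key"),
}

def build_steps(image):
    """Walk up the parent chain and return the steps from base image to `image`."""
    steps = []
    while image in IMAGES:
        parent, step = IMAGES[image]
        steps.append(step)
        image = parent
    # `image` is now the root; steps were collected top-down, so reverse them.
    return [image] + steps[::-1]

print(build_steps("bigtop/slaves"))
print(build_steps("bigtop/deploy"))
```

Both images share the first two entries of their chain, so only the final step differs between them.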
All on the Docker hub
https://hub.docker.com/u/bigtop/
A demo is worth 1k words
In case my demo fails: https://asciinema.org/a/a7kiibouqav38xl5p2wrms51r
Bigtop Provisioner
Supported providers in the Bigtop 1.0.0 release
  VirtualBox VM
  Docker container
OpenStack support is in the master branch now
Use cases
For Hadoop app developers, cluster admins, users
  Run a distributed cluster to test your code on
  Tweak & test configurations before applying them to Production
  Play around with the Bigtop Big Data Stack
For Bigtop contributors
  Easy to test your packaging, deployment, testing code
For vendors
  CI out of the box → patching upstream code made easier
Bigtop at Trend Micro
Trend Micro Hadoop (TMH)
Use Bigtop as the basis for our internal custom distribution of Hadoop
Apply community and internal patches to upstream projects for business and operational needs
Newest TMH7 is based on Bigtop 1.0 SNAPSHOT
Working with community made our life easier
Contribute Bigtop Provisioner, packaging code, Puppet recipes, bugfixes, CI setup, anything!
Knowing the community status enabled us to release TMH7 based on 1.0 SNAPSHOT
Contribute feedback, evaluation, and bugfixes through production-level adoption
Running Bigtop smoke tests and integration tests using Bigtop Provisioner for TMH7
Trend Micro Big Data Stack Powered by Bigtop
Layers: Resource Management, Storage, Processing Engine, APIs and Interfaces, In-house Apps
Components: Hadoop YARN, Hadoop HDFS, MapReduce, Ad-hoc Query UDFs, Pig, Oozie, Kerberos, HBase, Wuji, App A, App B, App C
Challenges we're facing!
Migration x Upgrade
We're moving our cluster to another data center
The new cluster will be running TMH7
Need DistCp to copy data from the old TMH6 cluster
Both are Kerberos-enabled secure clusters
DistCp between TMH6 and TMH7?
From our long history we can say:
Things always get complicated with Kerberos
Multi-Cluster Live Synchronization with Kerberos Federated Hadoop
Mammi Chang
16:20~17:10, 3F
Hadoop cluster deployment
Moving data across Production and Staging clusters in different data centers is a need
  Doing DistCp across 4 clusters!
Our Hadoop deployment tool needs to support DistCp auto-configuration
  Developing the deployment tool is challenging!
Docker comes to the rescue
Run multiple secure HA clusters on Docker
Develop & test the deployment tool repeatedly
TMH6 Production, TMH7 Production, TMH6 Staging, TMH7 Staging
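To get a feel for why auto-configuration matters with four clusters, this Python sketch enumerates every ordered source/destination pair. The talk does not say which directions are actually used, so treat the result as an upper bound; `copy_directions` is a hypothetical helper name.

```python
from itertools import permutations

CLUSTERS = ["TMH6 Production", "TMH7 Production", "TMH6 Staging", "TMH7 Staging"]

def copy_directions(clusters):
    """Return every ordered (source, destination) pair of distinct clusters."""
    return list(permutations(clusters, 2))

pairs = copy_directions(CLUSTERS)
print(len(pairs))  # 4 * 3 possible directions
for src, dst in pairs[:3]:
    print(f"{src} -> {dst}")
```

Each direction potentially needs its own Kerberos trust and client configuration, so even a modest number of clusters makes hand-written setup error-prone.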
Hadoop apps development on Docker
Hadoocker
A DevOps toolkit for Hadoop app developers to develop and test their code on
Images with TMH7 (Bigtop) already built in → up and running w/o deployment
A Hadoop environment for apps to test against our new Hadoop distribution
Zero deployment Hadoop
internal Docker registry
./execute.sh
Hadoop server, Hadoop client, data
TMH7
Hadoop app, RESTful APIs
sample data, hadoop fs -put
Features
Support apps to run end-to-end CI against a Hadoop environment
Support image creation from scratch all the way up to a Hadoop pseudo cluster and your apps
Allow customization for your own environment
The same features are also delivered in Vagrant boxes
https://github.com/evans-ye/hadoocker
Summary
Bigtop helps you build your own Big Data Stack
Building Bigtop packages is made easier by an immutable build environment offered through Docker images
Bigtop Provisioner creates a Hadoop cluster for you with just one click
Use Docker to simulate complex environments to ease development and testing efforts
Ship your apps in a Docker image with zero deployment
Come and join the worldwide open source party !
Questions ?
Thank you !