Putting Hadoop on Any Cloud. Nati Shalom at Big Data Spain 2012
DESCRIPTION
Session presented at the Big Data Spain 2012 Conference, 16th Nov 2012, ETSI Telecomunicacion, UPM, Madrid. www.bigdataspain.org
More info: http://www.bigdataspain.org/es-2012/conference/putting-hadoop-cloud/nati-shalom
TRANSCRIPT
The Elephant in the Cloud
Putting Hadoop on Any Cloud
@natishalom
Columbus & The Cloud
THE DISCOVERY OF AMERICA
THE THING THAT MADE IT POSSIBLE
Why Cloud Portability Matters
Cloud Portability Myth #1
No one really needs cloud portability
Cloud Portability Facts
Zynga moved ~80% of their workload from Amazon to their private zCloud
“own the base, rent the spike”
http://code.zynga.com/2012/02/the-evolution-of-zcloud/
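"Own the base, rent the spike" comes down to simple arithmetic: owned capacity serves demand up to a fixed base, and anything above that base is rented. The sketch below assumes demand is measured in servers per period; the function name and the numbers are hypothetical and are not Zynga data.

```python
def split_base_and_spike(demand, base_capacity):
    """Split each period's demand into an owned ("base") part and a rented ("spike") part."""
    owned = [min(d, base_capacity) for d in demand]
    rented = [max(d - base_capacity, 0) for d in demand]
    return owned, rented


# Hypothetical demand in servers per hour; not real Zynga figures.
demand = [60, 70, 80, 150, 90, 75]
owned, rented = split_base_and_spike(demand, base_capacity=80)
print(owned)   # [60, 70, 80, 80, 80, 75]
print(rented)  # [0, 0, 0, 70, 10, 0]
print(f"rented share: {sum(rented) / sum(demand):.1%}")  # rented share: 15.2%
```

Raising `base_capacity` shifts more of the load onto owned machines; the right level depends on how often the spikes actually occur.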
Cloud Portability Facts
Mixpanel started with Linode, then moved to Rackspace, then to AWS
http://code.mixpanel.com/2010/11/08/amazon-vs-rackspace/
Cloud Portability Facts
• You want the flexibility to choose what’s right for you, when it’s right for you
• Based on pricing, features, availability, performance, etc.
Cloud Portability Myth #2
Cloud Portability == Cloud API Standardization
Cloud APIs, Today
• Standard APIs (?): OCCI, VCloud
• OSS frameworks: OpenStack, CloudStack, Eucalyptus
• Abstraction frameworks: JClouds, Deltacloud, Fog, Libvirt
Cloud APIs, Today
• Standard APIs: not practical in the foreseeable future
• OSS projects: need a couple more years to converge & mature
• Abstraction frameworks: probably the only practical (near-term) option
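A minimal sketch of what an abstraction framework buys you: application code is written once against a provider-neutral interface, and each cloud plugs in through its own adapter. `ComputeDriver`, `InMemoryDriver`, and `provision_cluster` are hypothetical names for illustration. This is not the API of any framework listed above, and the in-memory driver only stands in for a real provider.

```python
from abc import ABC, abstractmethod


class ComputeDriver(ABC):
    """Hypothetical provider-neutral interface; one adapter per cloud."""

    @abstractmethod
    def start_node(self, image: str, size: str) -> str:
        """Start a VM and return its ID."""

    @abstractmethod
    def stop_node(self, node_id: str) -> None:
        """Terminate the VM with the given ID."""


class InMemoryDriver(ComputeDriver):
    """Stand-in for a real cloud adapter: it only tracks node IDs."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self.nodes: set[str] = set()
        self._next_id = 0

    def start_node(self, image: str, size: str) -> str:
        node_id = f"{self.prefix}-{self._next_id}"
        self._next_id += 1
        self.nodes.add(node_id)
        return node_id

    def stop_node(self, node_id: str) -> None:
        self.nodes.discard(node_id)


def provision_cluster(driver: ComputeDriver, node_count: int) -> list[str]:
    """Application code: depends only on the interface, never on a specific cloud."""
    return [driver.start_node(image="ubuntu", size="large") for _ in range(node_count)]


# The same application code runs against two different "clouds".
print(provision_cluster(InMemoryDriver("aws"), 2))  # ['aws-0', 'aws-1']
print(provision_cluster(InMemoryDriver("rs"), 1))   # ['rs-0']
```

Switching clouds then means writing or choosing another adapter, while `provision_cluster` stays unchanged.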
Realization: What You Really Care About Is App Portability
• The OS is the same on any cloud
• Most clouds have compute & storage
• Elasticity & scaling have the same effects on the app, regardless of the cloud
Cloud Portability Myth #3
All infrastructure clouds were born equal
Food for Thought
Offerings can vary quite a bit:
• Amazon guarantees only 99.5% uptime
• Rackspace gives you service credits ($$$) every time it goes down
• Joyent claims to be significantly faster than both
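To put an uptime guarantee in concrete terms, the downtime it still permits is (1 - uptime) × period. Taking the slide's 99.5% figure at face value, a minimal sketch of the calculation, assuming a 365-day year and a 30-day month:

```python
HOURS_PER_YEAR = 365 * 24  # 8760; leap years ignored
HOURS_PER_MONTH = 30 * 24  # 720; assumes a 30-day month


def allowed_downtime_hours(uptime_percent: float, period_hours: float) -> float:
    """Downtime that still satisfies an uptime guarantee over the given period."""
    return (1 - uptime_percent / 100) * period_hours


print(round(allowed_downtime_hours(99.5, HOURS_PER_YEAR), 1))   # 43.8
print(round(allowed_downtime_hours(99.5, HOURS_PER_MONTH), 1))  # 3.6
```

So a 99.5% guarantee still allows roughly 43.8 hours of downtime a year, or about 3.6 hours in a 30-day month.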
And Some Features Are Unique…
Amazon is the only major vendor to offer SSD storage. Netflix says it's:
• ½ the price for the same throughput
• ⅕ the latency on average
• Even the slowest requests are 6x faster
http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
@uri1803
Let’s Talk Big Data on the Cloud
A Typical Big Data App…
Managing Big Data on the Cloud
• Auto-start VMs
• Install and configure app components
• Monitor
• Repair
• (Auto) scale
• Burst…
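The monitor, repair, and scale steps above can be sketched as a single reconciliation pass: compare the desired number of instances with what is running and healthy, replace failures, and start or stop instances to close the gap. The `reconcile` function and its inputs are hypothetical, and this is not how any particular orchestrator implements it; a real controller would run such a pass repeatedly against live monitoring data.

```python
def reconcile(desired: int, running: set[str], healthy: set[str]) -> tuple[list[str], int]:
    """One pass of a hypothetical management loop.

    Returns (instances_to_stop, number_of_instances_to_start).
    """
    alive = running & healthy                # monitor: running and passing health checks
    failed = sorted(running - alive)         # repair: stop instances that failed
    surplus = sorted(alive)[desired:]        # scale in: stop anything beyond the target
    to_start = max(desired - len(alive), 0)  # auto-start / scale out: fill the gap
    return failed + surplus, to_start


print(reconcile(3, {"n1", "n2", "n3"}, {"n1", "n3"}))  # (['n2'], 1)
print(reconcile(1, {"n1", "n2"}, {"n1", "n2"}))        # (['n2'], 0)
```

Repair and scaling fall out of the same comparison: a failed node simply stops counting toward the target, so a replacement gets started.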
The Challenges
Consistent Management
Making deployment, installation, scaling, and fail-over look the same across the entire stack
The Challenges (Cont.)
Cloud Portability
Choosing the Right Cloud for the Job
Running bare metal for high-I/O workloads, public cloud for sporadic workloads
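The "right cloud for the job" rule can be expressed as a small placement function. The `Workload` fields, the target names, and the fallback for steady, non-I/O-heavy work are assumptions added for illustration; the slide itself only gives the bare-metal and public-cloud cases.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    io_intensive: bool  # e.g. a disk-heavy Hadoop batch job
    sporadic: bool      # runs occasionally rather than around the clock


def choose_target(w: Workload, default: str = "private-cloud") -> str:
    """Hypothetical placement rule based on the slide's two examples."""
    if w.io_intensive:
        return "bare-metal"    # high-I/O workloads go to bare metal (per the slide)
    if w.sporadic:
        return "public-cloud"  # sporadic workloads go to a public cloud (per the slide)
    return default             # assumption: the slide does not cover this case


print(choose_target(Workload("nightly-etl", io_intensive=True, sporadic=False)))    # bare-metal
print(choose_target(Workload("ad-hoc-report", io_intensive=False, sporadic=True)))  # public-cloud
```

Checking I/O first is a design choice: a job that is both sporadic and I/O-heavy lands on bare metal here, and reversing the two checks would send it to the public cloud instead.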
Hadoop
• Available under different distributions: Cloudera, IBM BigInsights, MapR, Hortonworks
Big Data Apps, on Any Cloud, Your Way
Open source (Apache License 2.0)
Putting Cloudify and Hadoop Together
• Run on any cloud
• Consistent management
• Dynamic scaling
• Auto recovery
• Auto scaling
• Role assignments
• Monitoring
• Simple maintenance
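Auto scaling is often driven by a threshold rule on a monitored metric: add an instance when the average is above a high-water mark, remove one when it falls below a low-water mark, and stay within fixed bounds. The sketch below assumes average CPU percentage as the metric; the thresholds, bounds, and function name are hypothetical and not taken from Cloudify's recipes.

```python
def next_instance_count(current: int, avg_cpu: float,
                        low: float = 30.0, high: float = 80.0,
                        min_n: int = 1, max_n: int = 10) -> int:
    """Threshold-based scaling decision; all thresholds are hypothetical."""
    if avg_cpu > high:
        current += 1  # scale out by one instance
    elif avg_cpu < low:
        current -= 1  # scale in by one instance
    return max(min_n, min(current, max_n))  # clamp to the allowed range


print(next_instance_count(3, 92.0))   # 4
print(next_instance_count(3, 12.0))   # 2
print(next_instance_count(10, 95.0))  # 10 (already at max_n)
```

Keeping a gap between `low` and `high` avoids flapping between scale-out and scale-in when the metric hovers near a single threshold.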
A Few Snippets
Thank You!
References:
• http://www.cloudifysource.org
• http://github.com/CloudifySource
• https://github.com/CloudifySource/cloudify-recipes/tree/master/services/biginsights