Putting Hadoop on Any Cloud - Big Data Spain
DESCRIPTION
The massive computing and storage resources needed to support big data applications make cloud environments an ideal fit. Now more than ever, there is a growing number of choices of cloud infrastructure providers, from Amazon AWS, OpenStack offered by the likes of HP, Rackspace and soon even Dell, VMware vCloud as well a...
INCLUDING
- Effectively managing your Hadoop stack in any data center (on-premise, cloud, hybrid…)
- Maintaining the flexibility to choose the right cloud for the job in an ever-changing environment
- Consistently managing your Hadoop deployment with other elements of your Big Data system such as NoSQL DB, Web Tier etc.
TRANSCRIPT
The Elephant
in the Cloud
Putting Hadoop on Any Cloud
@natishalom
Columbus & The Cloud
THE DISCOVERY OF AMERICA
THE THING THAT MADE IT POSSIBLE
Why Cloud Portability Matters
Cloud Portability Myth #1
No one really needs cloud portability
Cloud Portability Facts
Zynga moved ~80% of their workload from Amazon to their private zCloud
“own the base, rent the spike”
http://code.zynga.com/2012/02/the-evolution-of-zcloud/
Cloud Portability Facts
Mixpanel started with Linode, then moved to RackSpace, then to AWS
http://code.mixpanel.com/2010/11/08/amazon-vs-rackspace/
Cloud Portability Facts
• You want the flexibility to choose what’s right for you, when it’s right for you
• Based on pricing, features, availability, performance, etc.
Cloud Portability Myth #2
Cloud Portability == Cloud API Standardization
Cloud APIs, Today
Standard APIs (?): OCCI, vCloud
OSS frameworks: OpenStack, CloudStack, Eucalyptus
Abstraction frameworks: JClouds, Deltacloud, Fog, Libvirt
Cloud APIs, Today
Standard APIs: Not practical in the foreseeable future
OSS projects: Need a couple more years to converge & mature
Abstraction frameworks: Probably the only practical (near-term) option
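To make the abstraction-framework idea concrete, here is a minimal sketch of what such a layer does: the application codes against one neutral interface, and a per-cloud driver translates each call. Everything in it is hypothetical (the `ComputeDriver` interface, the `InMemoryDriver` stand-in, the node names); it is not the API of JClouds, Deltacloud, Fog or Libvirt, and the drivers keep nodes in memory instead of calling a real cloud.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ComputeDriver(ABC):
    """Hypothetical provider-neutral interface (an assumption for illustration)."""

    @abstractmethod
    def start_node(self, name: str) -> str:
        """Start a machine and return a provider-specific node id."""

    @abstractmethod
    def stop_node(self, node_id: str) -> None:
        """Stop the machine with the given id."""


class InMemoryDriver(ComputeDriver):
    """Stand-in for a real cloud driver: records nodes in a dict, no network calls."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self.nodes: Dict[str, str] = {}
        self._counter = 0

    def start_node(self, name: str) -> str:
        self._counter += 1
        node_id = f"{self.prefix}-{self._counter}"
        self.nodes[node_id] = name
        return node_id

    def stop_node(self, node_id: str) -> None:
        self.nodes.pop(node_id, None)


def deploy_hadoop_cluster(driver: ComputeDriver, workers: int) -> List[str]:
    """Start one master and `workers` worker nodes through whichever driver is given."""
    ids = [driver.start_node("hadoop-master")]
    ids += [driver.start_node(f"hadoop-worker-{i}") for i in range(1, workers + 1)]
    return ids


if __name__ == "__main__":
    # The same deployment code runs unchanged against two different "clouds".
    for driver in (InMemoryDriver("cloud-a"), InMemoryDriver("cloud-b")):
        print(deploy_hadoop_cluster(driver, workers=2))
```

The point of the design is that `deploy_hadoop_cluster` never mentions a provider; switching clouds means swapping the driver object, not rewriting the deployment logic.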
Realization: What You Really Care about Is App Portability
OS is the same on any cloud
Most clouds have compute & storage
Elasticity & scaling have same effects on the app, regardless of the cloud
Cloud Portability Myth #3
All infrastructure clouds were born equal
Food for Thought
Offerings can vary quite a bit:
• Amazon guarantees only 99.5% uptime
• RackSpace will give you $$$ every time they crash
• Joyent claims to be significantly faster than both
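To put an uptime percentage in perspective, it helps to convert it into hours of permitted downtime per year. The sketch below does that arithmetic, taking the 99.5% figure quoted above as given; the function name and the 365-day year are assumptions made for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(uptime_percent: float) -> float:
    """Hours per year a service may be down while still meeting the uptime figure."""
    return (1 - uptime_percent / 100) * HOURS_PER_YEAR


if __name__ == "__main__":
    # 0.5% of 8760 hours
    print(round(allowed_downtime_hours(99.5), 1))  # 43.8
```

In other words, a 99.5% guarantee still leaves room for almost two days of downtime per year.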
And Some Features Are Unique…
Amazon is the only major vendor to offer SSD storage. Netflix says it’s:
• ½ the price for the same throughput
• ⅕ the latency on avg.
• Even slowest requests are 6x faster
http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
Let’s Talk Big Data on the Cloud
A Typical Big Data App…
Managing Big Data on the Cloud
• Auto start VMs
• Install and configure app components
• Monitor
• Repair
• (Auto) Scale
• Burst…
The Challenges ..
Consistent Management
Making deployment, installation, scaling and fail-over look the same throughout the entire stack
The Challenges (Cont)..
Cloud Portability
Choosing the Right Cloud for the Job
Running bare metal for high-I/O workloads, public cloud for sporadic workloads..
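The rule of thumb above can be written down as a small placement function. This is only a sketch: the `Workload` fields and target names are hypothetical, and the private-cloud default for steady workloads is an assumption borrowed from the earlier "own the base, rent the spike" quote, not something stated on this slide.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a workload, for illustration only."""
    name: str
    io_intensive: bool
    sporadic: bool


def choose_target(workload: Workload) -> str:
    """Pick an environment: bare metal for high I/O, public cloud for sporadic load."""
    if workload.io_intensive:
        return "bare-metal"
    if workload.sporadic:
        return "public-cloud"
    # Assumption: steady, non-I/O-bound workloads stay on owned capacity.
    return "private-cloud"


if __name__ == "__main__":
    jobs = [
        Workload("hdfs-data-nodes", io_intensive=True, sporadic=False),
        Workload("monthly-report", io_intensive=False, sporadic=True),
        Workload("web-tier", io_intensive=False, sporadic=False),
    ]
    for job in jobs:
        print(job.name, "->", choose_target(job))
```

A real decision would also weigh pricing, features and availability, as listed earlier; the sketch checks I/O intensity first so that an I/O-heavy job goes to bare metal even when it is also sporadic.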
Hadoop
• Available under different distributions
• Cloudera
• IBM BigInsights
• MapR
• Hortonworks
Big Data Apps, on Any Cloud, Your Way
Open source (Apache2)
Putting Cloudify and Hadoop Together
• Run on Any Cloud
• Consistent management
• Dynamic Scaling
• Auto Recovery
• Auto Scaling
• Role Assignments
• Monitoring
• Simple maintenance
How it works..
1 Upload your recipe.
2 Cloudify creates VMs & installs agents
3 Agents install and manage your app
4 Cloudify automates the scaling
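Step 4 can be illustrated with a generic threshold rule of the kind an automated scaler might apply. This is a sketch in plain Python, not Cloudify's recipe syntax or its actual scaling logic; the function name, thresholds and instance limits are all hypothetical.

```python
def desired_instances(
    current: int,
    avg_load: float,
    high: float = 0.8,   # assumed scale-out threshold (fraction of capacity)
    low: float = 0.3,    # assumed scale-in threshold
    min_instances: int = 1,
    max_instances: int = 10,
) -> int:
    """Return the instance count after one scaling decision.

    Adds one instance when average load is above `high`, removes one when it is
    below `low`, and always stays within [min_instances, max_instances].
    """
    if avg_load > high:
        return min(current + 1, max_instances)
    if avg_load < low:
        return max(current - 1, min_instances)
    return current


if __name__ == "__main__":
    count = 3
    # Simulated load samples standing in for metrics reported by agents.
    for load in (0.9, 0.95, 0.5, 0.1):
        count = desired_instances(count, load)
        print(f"load={load:.2f} -> instances={count}")
```

Changing by one instance per decision keeps the sketch simple; a production scaler would typically also add a cool-down period so that brief load spikes do not cause repeated scale-out and scale-in.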
Few Snippets..
Thank You!
References:
http://www.cloudifysource.org
http://github.com/CloudifySource
https://github.com/CloudifySource/cloudify-recipes/tree/master/services/biginsights