openstack in the enterprise - are you ready? - maish saidel-keesing

OpenStack in the Enterprise

Maish Saidel-Keesing

Platform Architect - Cisco

June 2, 2014

Are you ready?

2 © 2013-2014 Cisco and/or its affiliates. All rights reserved.

@maishsk

Technodrone (http://technodrone.blogspot.com)

A little bit about me

http://twitter.com/maishsk

http://twitter.com/maishsk

http://technodrone.blogspot.com/


This is not an OpenStack Bashing session

I really like OpenStack

This is supposed to be an eye-opener

And have I said I really like OpenStack?

Disclaimer


Where are we today?

Enterprise Deployments

Place for improvement

Today’s Agenda


OpenStack has grown up

9th Release

OpenStack Summit ~4,500 attendees

Where are we today?


Multiple deployments/distro’s

Cisco (COI)

HP (Helion)

Redhat (RHOS)

Mirantis (Fuel)

Piston

RackSpace (Private Cloud)

IBM (Smartcloud)

Where are we today? #2

http://docwiki.cisco.com/wiki/OpenStack

http://www8.hp.com/us/en/cloud/helion-overview.html

http://openstack.redhat.com/Main_Page

http://software.mirantis.com/

http://www.pistoncloud.com/

http://www.rackspace.com/cloud/private/

http://www.ibm.com/cloud-computing/us/en/private-cloud.html


• How do I keep my Management stack running smoothly?

• How do I upgrade?

• Rapid release cycles (every 6 months)

• No Downtime during upgrades

• Support


The bible (Introduction to OpenStack High Availability)

The manual process is not simple

Automation tools alleviate this (partially)

HA is not the same for all components

Active/Active

Active/Passive

There is no single best way to do it

OpenStack HA

http://docs.openstack.org/high-availability-guide/content/ch-intro.html




An OpenStack service that provides HA for underlying components

New service graduates from incubation

Adding HA is a breeze

Eventually... Hopefully… One day…

Incubated project

Install component

OpenStack HA Service Component

is HA


Not a smooth process

It is getting better (Nova improvements in Icehouse)

Not always backward compatible

Upgrade paths between older versions don’t always work

It is not uncommon to see people running: Cactus, Diablo, Essex, Folsom, Grizzly, Icehouse All in one datacenter.

Ready for an upgrade?


Patches are provided for 2 previous releases

Perhaps an LTS version in the future? (Redhat are already going in that direction)

Introduction of a new release

Testing

Deployment plan

Implementation

Stabilize

Release Cycles and Why We Are Chasing Our Tails?

And there is a new version every 6 months

http://technodrone.blogspot.com/2012/09/release-cycles-and-why-we-are-chasing.html


Enterprises – want Enterprise support

Not everyone can provide the support themselves

If your environment crashed – you will want someone on the line Yesterday!!

Who do I release my wrath upon?


Is your enterprise Cloud ready?


Backup

The management cluster should be relatively simple to rebuild – with automation

Tenants and their workloads

Is this an issue?

Replication

Not something that can be easily provided today (There are things in the works)

DR

Nothing today.

Services provided by you today.


Monitoring Ceilometer

How do I get the relevant information out of it.

Not everything is being measured Volume metrics

Cumulative uptime



General rules for loglevels: Critical: Shit's on fire, yo. Expected, known issue where things will break and

bad.

Error: Standard unexpected error trap - final, top-level error trap should dump the message to ERROR.

Also, known error cases that someone should handle that aren't necessarily "the world is exploding"

Warn: expected error conditions that might be an issue, but not huge problems. Example at session: Glance's error at startup that it can't find a storage device ID (which is currently error, should be warn)

Info: Standard operational logging: VM request received, scheduled to launch on hypervisor X

Debug: What's going on under the hood. So you can trace down origins of errors - shouldn't have to be on by default

Trace: Super debug. Method-level logging, or some otherwise extra-detailed info like slightly sanitized api conversations

Logging as an example

https://etherpad.openstack.org/p/juno-summit-ops-monitoringlogging


Auditing & Compliance Who did what

And when

“detecting the tenants who added "allow all" rules to essentially turn off security groups”

Can this workload run in this cluster?

If not – then what?

Shut it down?

Move to correct location?

Notify the president?????



• There are several gaps that need to be addressed

• Great work is being done - there is still more to accomplish

• It is all a question of how much you are willing to be flexible? How much responsibility you are willing to take upon yourself?

• Not everything should (or can) run in OpenStack

Thank you!

openstack in the enterprise - are you ready? - maish saidel-keesing

Technology