Post on 28-Mar-2018

A closer look at Locaweb IaaS

Gleicon Moraes, Engineering Manager PaaS/IaaS, @gleicon - http://blog.7co.cc

Agenda

• Engineering Team

• IaaS

• Virtual/Physical servers

• Architecture

• OSS

• Provisioning

• CMDB/Closed Loop

• Resource usage gathering

• Software defined networks

Engineering Team

• We aim to be efficient

• DC and IaaS Automation

• IaaS and PaaS products

• Email and Domain Registration products

• Coffee/psychological help/counseling

• 40-person team (devs/architects/1 master devops)

IaaS - NIST definition

“The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications.

The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components (e.g., host firewalls).”

* http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf

IaaS - Wikipedia

“In this most basic cloud service model, cloud providers offer computers, as physical or more often as virtual machines, and other resources. The virtual machines are run as guests by a hypervisor, such as Xen or KVM. Management of pools of hypervisors by the cloud operational support system leads to the ability to scale to support a large number of virtual machines. Other resources in IaaS clouds include images in a virtual machine image library, raw (block) and file-based storage, firewalls, load balancers, IP addresses, virtual local area networks (VLANs), and software bundles. IaaS cloud providers supply these resources on demand from their large pools installed in data centers. For wide area connectivity, the Internet can be used or, in carrier clouds, dedicated virtual private networks can be configured.”

* http://en.wikipedia.org/wiki/Infrastructure_as_a_service#Service_models

IaaS - tl;dr

“Automate infrastructure so that the customer does not need to know the underlying details, does not manage them, and can provision services automagically.”

IaaS - building blocks

• Servers: virtual and physical

• Storage area

• Network devices: firewall, switches, load balancer

IaaS - High Level

• Automation

• Resource Management

• Install, Uninstall, Migrate

• High Availability, Scalability, Capacity Planning

IaaS at Locaweb

• 3 DCs, 6k physical servers, 1k storage units (6 PB storage area), 12k network devices/ports, > 100 km of cables

• 10k VMs, 3.2M email accounts, 250k hosting customers, ~500k sites, ~600k databases

• 130 people on the day-to-day 24/7 Operations team (from DC basics to managing apps and platforms), < 40 sysadmins

• Currently ~18 people from the Engineering team take care of IaaS

Virtual and Physical

• Single tenant per Physical Server

• Single tenant per VM

• Multiple tenants per VM

• Multiple tenants per Physical Server

• Multiple VMs per Physical Server = Cloud

Cloud

• Refer back to the NIST definition

• Hypervisor + set of servers + set of storages + network = time sharing

• Capacity planning differs from that of physical servers

• Flexible configuration options

• Vertical Scaling

• Horizontal scaling

Architecture - Cloud

[Diagram. Components shown: Simplestack, SimpleNet/Quantum, firewall, network gear, physical servers (hypervisor, OVS), main network, Internet]

Architecture - Physical

[Diagram. Components shown: Simplestack, SimpleNet/Quantum, firewall, network gear, physical servers, main network, Internet]

Why not?

OSS

• Ruby, Rails, Python, CFEngine, PostgreSQL, MySQL, Cassandra, Redis, Xen, KVM, Haskell, Cyclone.io, bottle.py, Quantum, R, ejabberd, Resque, lots of gems and eggs

• Up-to-date technology

• No lock-ins

• Vendor neutral

• We contribute back

Our projects

http://locaweb.github.com

• Leela - Data collection monster

• SimpleStack - Provisioning made easy

• SimpleNet - OVS and FW controller

• NET/L2 - Controller/Inventory for network equipment

• BrickLayer - packaging for normal people

• Logix - Graylog2 message bus for log streams

• xenapi-ruby - XEN API bindings for Ruby

• otto, debundler, bpmachine and more each week

Our Contributions

• Contributed to Quantum, from OpenStack

• Snorby/Snort contributions

• mod_security for Nginx, and helping with IIS

• Hired consulting from the grsecurity and Dovecot teams - we support OSS companies

Bricklayer

• First opensource project from Locaweb

• Package builder (deb + rpm) straight from git

• 150+ projects, 500+ builds/day

• Tag your project and get the packages built and published to the repositories
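A minimal sketch of the "tag, then package" idea, not Bricklayer's actual code. The `v<major>.<minor>.<patch>` tag convention, the function names and the package file names are all assumptions made for illustration.

```python
import re

# Hypothetical tag convention for this sketch: "v<major>.<minor>.<patch>".
TAG_PATTERN = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def latest_release_tag(tags):
    """Return the highest version tag, or None if no tag follows the convention."""
    versions = []
    for tag in tags:
        match = TAG_PATTERN.match(tag)
        if match:
            # Compare numerically so that v1.2.10 sorts after v1.2.9.
            versions.append((tuple(int(part) for part in match.groups()), tag))
    return max(versions)[1] if versions else None


def package_names(project, tag):
    """Derive illustrative .deb and .rpm file names from a project name and a tag."""
    version = tag[1:]  # drop the leading "v"
    return [f"{project}_{version}_all.deb", f"{project}-{version}-1.noarch.rpm"]


if __name__ == "__main__":
    tag = latest_release_tag(["v1.2.9", "v1.2.10", "nightly"])
    print(tag, package_names("leela", tag))
```

The point of the sketch is only that a tag carries enough information to name and version a build; how the real tool watches git and publishes packages is not described on the slide.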

Logix

• We have lots of logs. Everything broke.

• 26,753,205,474 log lines/day

• Highly distributed: local syslog daemon to RabbitMQ

• Elasticsearch + Graylog2 to store, search and filter
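To put the daily volume in perspective, the arithmetic below converts it to an average per-second rate, roughly 310 thousand lines per second. It assumes traffic is spread evenly over the day, which real log traffic is not; the variable names are mine.

```python
# Back-of-the-envelope rate derived from the daily log volume on the slide.
LINES_PER_DAY = 26_753_205_474   # figure quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

# Assumption: an even spread over the day; peaks will be higher than this.
average_lines_per_second = LINES_PER_DAY / SECONDS_PER_DAY

print(f"{average_lines_per_second:,.0f} lines/sec on average")
```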

Provisioning

• Ruby: Panel, Control panel, Scheduler

• Python: Provisioning, Server management, Metric collection

• REST APIs to Hypervisor, Network, Firewall, XMPP

Provisioning - Cloud

[Diagram. Components shown: Sales, Control Panel, API, Provisioner, Simplestack, SimpleNet/Quantum, firewall, network gear, physical servers (hypervisor, OVS), main network, Internet. Product: Cloud]

Provisioning - Servers

[Diagram. Components shown: Sales, Control Panel, API, Closed Loop, racked servers, Simplestack, SimpleNet/Quantum, firewall, network gear, physical servers (hypervisor, OVS), main network, Internet. Product: Dedicated Servers]

Provisioning - Managed Servers

[Diagram. Components shown: Sales, Control Panel, API, Provisioner, PaaS Provisioner, Closed Loop, racked servers, Simplestack, SimpleNet/Quantum, firewall, network gear, physical servers (hypervisor, OVS), main network, Internet. Products: Managed Servers, Dedicated Servers, Cloud]

Cloud provisioner

[Diagram. Components shown: Sales, Control Panel, Jobs, API, core, Resque, CMDB, simplestack, quantum/simplenet, FW, DHCP, console, Notifications, Leela]

Closed loop

[Diagram. Components shown: Futurama, API, CMDB, Cobbler, Conductor, Network, Hardware]

The closed loop process

• All servers get racked, wired, tested and configured

• Power management discovery

• Network configuration

• OS install: Windows, Linux and OpenSolaris aware

• Server life cycle: once deactivated, a server goes back to the pool to be used again
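The steps above can be read as a small state machine in which the last state leads back to the first. The sketch below is only an illustration of that loop: the state names and the strictly linear order are assumptions, not the actual closed loop implementation.

```python
from enum import Enum


class ServerState(Enum):
    # State names are hypothetical; they mirror the bullets on the slide.
    POOL = "pool"                              # racked, wired, tested, configured
    POWER_DISCOVERED = "power_discovered"      # power management discovery done
    NETWORK_CONFIGURED = "network_configured"  # network configuration done
    OS_INSTALLED = "os_installed"              # Windows, Linux or OpenSolaris
    IN_USE = "in_use"
    DEACTIVATED = "deactivated"


# Each state has exactly one successor; DEACTIVATED returns to POOL,
# which is what closes the loop.
TRANSITIONS = {
    ServerState.POOL: ServerState.POWER_DISCOVERED,
    ServerState.POWER_DISCOVERED: ServerState.NETWORK_CONFIGURED,
    ServerState.NETWORK_CONFIGURED: ServerState.OS_INSTALLED,
    ServerState.OS_INSTALLED: ServerState.IN_USE,
    ServerState.IN_USE: ServerState.DEACTIVATED,
    ServerState.DEACTIVATED: ServerState.POOL,
}


def advance(state):
    """Return the next state in the life cycle."""
    return TRANSITIONS[state]


if __name__ == "__main__":
    state = ServerState.POOL
    for _ in range(len(ServerState)):
        state = advance(state)
        print(state.name)
```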

CMDB

[Diagram. Components shown: Power audit, Futurama, Product provisioners, IT change management, IP provisioning, Server provisioning, SAP, Controllers, API, Database, Resque, Ops Frontend, NET/L2]

Futurama

[Diagram. Components shown: Conductor-audit, CF-Agent, Leela-agent, bkp-agent, Server side, Planet Express, CFEngine, Leela-Server, Management, Cegonha, Asdrubal, CFTools, CMDB]

Resource Metering and Monitoring - Leela

[Diagram. Components shown: Leela-Lasergun, Leela-agent, Leela-Reader, API, six Cassandra nodes]

• 18k writes/sec

• 6 TB total per cluster

• 13 baseline metrics + 68 distinct metrics

• ~600 GB/month

• 1M keys (~5k servers)

• Write latency: 15 µs

• Read latency: 1 s to read one month's worth of data

• Down to minute resolution

• http://leela.readthedocs.org/en/latest/intro/archnut.html
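A few of the figures above can be cross-checked with simple arithmetic. The calculation assumes the quoted write rate is sustained all day and that every key is written once per minute (the finest resolution mentioned); both are my assumptions, not statements from the slides.

```python
# Figures quoted on the slide.
WRITES_PER_SECOND = 18_000
KEYS = 1_000_000
SERVERS = 5_000

# Roughly how many metric keys each server contributes.
keys_per_server = KEYS / SERVERS

# Assumption: the 18k writes/sec rate is sustained over a full day.
writes_per_day = WRITES_PER_SECOND * 24 * 60 * 60

# Assumption: every key receives one write per minute.
writes_per_second_if_every_key_minutely = KEYS / 60

print(f"{keys_per_server:.0f} keys per server")
print(f"{writes_per_day:,} writes per day")
print(f"{writes_per_second_if_every_key_minutely:,.0f} writes/sec at one write per key per minute")
```

Under these assumptions, one write per key per minute gives about 16.7k writes/sec, which is in the same range as the 18k writes/sec quoted.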

• Map/Reduce with SQL-like interface:

- SELECT mov_avg_samples = 7 (function)

- FROM cpro9559.cpu.cpu8.idle (metric)

- WHERE timestamp >= 1346279003 (timeframe)
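As an illustration of what such a query computes, here is a small sketch that filters samples by timestamp and applies a 7-sample moving average. It reads `mov_avg_samples = 7` as "moving average over 7 samples", which is my interpretation; the function names and the in-memory data layout are assumptions and say nothing about how Leela implements it.

```python
from collections import deque


def moving_average(samples, window=7):
    """Yield the mean of the last `window` samples once the window is full."""
    recent = deque(maxlen=window)
    total = 0.0
    for value in samples:
        if len(recent) == window:
            total -= recent[0]  # this value is evicted by the append below
        recent.append(value)
        total += value
        if len(recent) == window:
            yield total / window


def query(points, since, window=7):
    """points: iterable of (timestamp, value) pairs for one metric (the FROM part)."""
    values = [value for timestamp, value in points if timestamp >= since]  # WHERE
    return list(moving_average(values, window))                            # SELECT


if __name__ == "__main__":
    # One sample per minute starting at the timestamp used on the slide.
    points = [(1346279003 + 60 * i, float(i + 1)) for i in range(8)]
    print(query(points, since=1346279003))
```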

• Create charts

- var widget = LEELA.widget(jQuery("#target"));

- jQuery.ajax("/v1/pastweek/cpro9559.cpu.cpu8.idle", {dataType: "jsonp", success: widget.render});

Software defined network

• Traditional equipment: local config and controller

• SDN: flows (commands), OpenFlow 1.0, central controller, distributed data plane

• Abstraction over VLANs with ACLs, tunnels or even VLAN QinQ
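The split between a central controller and a distributed data plane can be sketched with a toy flow table. This is a conceptual model only: it does not speak the OpenFlow protocol, and the class names, match fields and action strings are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Flow:
    priority: int
    match: dict   # header fields that must match, e.g. {"vlan": 10}
    action: str   # e.g. "output:2" or "drop"


@dataclass
class Switch:
    """Data plane: forwards packets using only its local flow table."""
    name: str
    flows: list = field(default_factory=list)

    def forward(self, packet):
        # The highest-priority flow whose fields all match decides the action.
        for flow in sorted(self.flows, key=lambda f: f.priority, reverse=True):
            if all(packet.get(key) == value for key, value in flow.match.items()):
                return flow.action
        return "send_to_controller"  # table miss: ask the control plane


class Controller:
    """Control plane: one central place that programs every switch."""

    def __init__(self, switches):
        self.switches = switches

    def install(self, flow):
        for switch in self.switches:
            switch.flows.append(flow)


if __name__ == "__main__":
    switch_a, switch_b = Switch("vendor-a"), Switch("vendor-b")
    controller = Controller([switch_a, switch_b])
    controller.install(Flow(10, {"vlan": 10}, "output:2"))
    controller.install(Flow(20, {"vlan": 10, "dst_port": 23}, "drop"))
    print(switch_a.forward({"vlan": 10, "dst_port": 80}))
    print(switch_b.forward({"vlan": 10, "dst_port": 23}))
    print(switch_a.forward({"vlan": 99}))
```

The switches hold no policy of their own; both vendors' switches behave identically because the same flows were pushed from the controller.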

Software defined network

[Diagram. Components shown: OpenVSwitch, Controller, API; Switch Vendor A and Switch Vendor B, each with a control path, a data path (hardware) and OpenFlow]

Software defined network

[Diagram. Components shown: Cisco, Force 10, HP, OpenVSwitch, Quantum, Net/L2, Firewalls, CMDB, API]

Ruby @ Locaweb: not only for front-end

?

Thanks!
