A closer look to Locaweb IaaS
Gleicon Moraes, Engineering Manager PaaS/IaaS
@gleicon - http://blog.7co.cc
Agenda
• Engineering Team
• IaaS
• Virtual/Physical servers
• Architecture
• OSS
• Provisioning
• CMDB/Closed Loop
• Resource usage gathering
• Software defined networks
Engineering Team
• We aim to be efficient
• DC and IaaS Automation
• IaaS and PaaS products
• Email and Domain Registration products
• Coffee/psychological help/counseling
• Team of 40 people (devs, architects, 1 master devops)
IaaS - NIST definition
“ The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications.
The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components (e.g., host firewalls).”
* http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
IaaS - Wikipedia
“In this most basic cloud service model, cloud providers offer computers, as physical or more often as virtual machines, and other resources. The virtual machines are run as guests by a hypervisor, such as Xen or KVM. Management of pools of hypervisors by the cloud operational support system leads to the ability to scale to support a large number of virtual machines. Other resources in IaaS clouds include images in a virtual machine image library, raw (block) and file-based storage, firewalls, load balancers, IP addresses, virtual local area networks (VLANs), and software bundles. IaaS cloud providers supply these resources on demand from their large pools installed in data centers. For wide area connectivity, the Internet can be used or, in carrier clouds, dedicated virtual private networks can be configured.”
* http://en.wikipedia.org/wiki/Infrastructure_as_a_service#Service_models
IaaS - tl;dr
“Automate infrastructure so that the customer does not need to know the underlying details, does not manage them, and can provision services automagically.”
IaaS - building blocks
• Servers: virtual and physical
• Storage area
• Network devices: firewall, switches, load balancer
IaaS - High Level
• Automation
• Resource Management
• Install, Uninstall, Migrate
• High Availability, Scalability, Capacity Planning
IaaS at Locaweb
• 3 DCs, 6k servers (physical), 1k storage units (6 PB), 12k network devices/ports, > 100 km of cables
• 10k VMs, 3.2M email accounts, 250k hosting customers, ~500k sites, ~600k DB
• 130 people on the day-to-day 24/7 Operations team (from DC basics to managing apps and platforms), < 40 sysadmins
• Currently ~ 18 people from Engineering team taking care of IaaS
Virtual and Physical
• Single tenant per Physical Server
• Single tenant per VM
• Multiple tenants per VM
• Multiple tenants per Physical Server
• Multiple VMs per Physical Server = Cloud
Cloud
• Refer back to the NIST definition
• Hypervisor + set of servers + set of storages + network = time sharing
• Capacity planning differs from that of physical servers
• Flexible configuration options
• Vertical Scaling
• Horizontal scaling
Architecture - Cloud
(Diagram labels: Simplestack, SimpleNet/Quantum, Firewall, Network Gear, Physical Servers, hypervisor, ovs, Main Network, Internet)
Architecture - Physical
(Diagram labels: Simplestack, SimpleNet/Quantum, Firewall, Network Gear, Physical Servers, Main Network, Internet)
Why not ?
OSS
• Ruby, Rails, Python, CFEngine, PostgreSQL, MySQL, Cassandra, Redis, XEN, KVM, Haskell, Cyclone.io, bottle.py, Quantum, R, EjabberD, Resque, lots of gems and eggs
• Up-to-date technology
• No lock-ins
• Vendor neutral
• We contribute back
Our projects
• Leela - Data collection monster
• SimpleStack - Provisioning made easy
• SimpleNet - OVS and FW controller
• NET/L2 - Controller/Inventory for network equipment
• BrickLayer - packaging for normal people
• Logix - Graylog2 message bus for log streams
• xenapi-ruby - XEN API bindings for Ruby
• otto, debundler, bpmachine and more each week
Our Contributions
• Contributed to Quantum, from OpenStack
• Snorby/Snort contributions
• ModSecurity for Nginx, and helping with IIS
• Hired consulting from the grsecurity and Dovecot teams: we support OSS companies
Bricklayer
• First opensource project from Locaweb
• Package builder (deb + rpm) straight from git
• 150+ projects, 500+ builds/day
• Tag your project and the packages get built and published to the repositories
Logix
• We have lots of logs. Everything broke.
• 26,753,205,474 log lines/day
• Highly distributed: local syslog daemon to RabbitMQ
• Elasticsearch + Graylog2 to store, search and filter
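To put the volume in perspective and show the overall shape of such a pipeline, here is a minimal sketch. It is not Logix's code: the standard-library `queue.Queue` stands in for RabbitMQ, a plain dictionary stands in for the search index, and the function names and sample log lines are hypothetical.

```python
import queue
from collections import defaultdict

LINES_PER_DAY = 26_753_205_474             # figure quoted on the slide
lines_per_second = LINES_PER_DAY / 86_400  # average rate over a day

def ship(bus, host, line):
    """Local side: a syslog-like shipper publishes each line to the bus."""
    bus.put({"host": host, "message": line})

def index_all(bus, index):
    """Central side: drain the bus into a word -> events lookup table."""
    while True:
        try:
            event = bus.get_nowait()
        except queue.Empty:
            break
        for word in event["message"].lower().split():
            index[word].append(event)

def search(index, term):
    """Return every indexed event whose message contains the given word."""
    return index.get(term.lower(), [])

bus = queue.Queue()               # stand-in for the message broker
index = defaultdict(list)         # stand-in for the search backend
ship(bus, "web01", "nginx upstream timed out")
ship(bus, "db02", "mysql slow query detected")
index_all(bus, index)
hits = search(index, "timed")
print(f"average rate: {lines_per_second:,.0f} lines/s, hits: {len(hits)}")
```

The average works out to a little under 310k lines per second, which is why the shipping side is kept local to each host and decoupled from indexing by a queue.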
Provisioning
• Ruby: Panel, Control panel, Scheduler
• Python: Provisioning, Server management, Metric collection
• REST APIs to Hypervisor, Network, Firewall, XMPP
Provisioning - Cloud
(Diagram labels: Simplestack, SimpleNet/Quantum, Firewall, Network Gear, Physical Servers, hypervisor, ovs, Main Network, Internet, Provisioner, Control Panel, API, Sales, Cloud)
Provisioning - Servers
(Diagram labels: Simplestack, SimpleNet/Quantum, Firewall, Network Gear, Physical Servers, hypervisor, ovs, Main Network, Internet, Closed Loop, Racked Servers, Control Panel, API, Sales, Dedicated Servers)
Provisioning - Managed Servers
(Diagram labels: Simplestack, SimpleNet/Quantum, Firewall, Network Gear, Physical Servers, hypervisor, ovs, Main Network, Internet, Provisioner, PaaS Provisioner, Closed Loop, Racked Servers, Control Panel, API, Sales, Managed Servers, Dedicated Servers, Cloud)
Cloud provisioner
(Diagram labels: CMDB, Resque, simplestack, core, API, quantum/simplenet, FW, DHCP, console, Notifications, Leela, Control Panel, Jobs, Sales)
Closed loop
(Diagram labels: Futurama, API, CMDB, Cobbler, Conductor, Network, Hardware)
The closed loop process
Closed loop
• All servers get racked, wired, tested and configured
• Power management discovery
• Network configuration
• OS install: Windows, Linux and OpenSolaris aware
• Server life cycle: once deactivated, a server returns to the pool to be used again
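The life cycle above can be read as a small state machine. The sketch below is only an illustration of that reading: the state names, the `Server` class and the `provision`/`deactivate` helpers are hypothetical and are not taken from Locaweb's tooling.

```python
from enum import Enum

class State(Enum):
    POOL = "pool"                        # racked, wired, tested, waiting
    POWER_DISCOVERY = "power_discovery"
    NETWORK_CONFIG = "network_config"
    OS_INSTALL = "os_install"
    IN_USE = "in_use"

# Each state has exactly one successor; IN_USE loops back to POOL,
# which is what closes the loop.
TRANSITIONS = {
    State.POOL: State.POWER_DISCOVERY,
    State.POWER_DISCOVERY: State.NETWORK_CONFIG,
    State.NETWORK_CONFIG: State.OS_INSTALL,
    State.OS_INSTALL: State.IN_USE,
    State.IN_USE: State.POOL,
}

class Server:
    def __init__(self, name):
        self.name = name
        self.state = State.POOL
        self.history = [State.POOL]

    def advance(self):
        """Move to the next state and record it."""
        self.state = TRANSITIONS[self.state]
        self.history.append(self.state)
        return self.state

def provision(server):
    """Walk a pooled server through every step until it is in use."""
    while server.state is not State.IN_USE:
        server.advance()
    return server

def deactivate(server):
    """Return an in-use server to the pool so it can be reused."""
    if server.state is not State.IN_USE:
        raise ValueError(f"{server.name} is not in use")
    return server.advance()

srv = provision(Server("srv-01"))
print([s.value for s in srv.history])
```

Modelling the steps as explicit states makes it easy to audit where each machine is, and a deactivated server re-enters the same path as a freshly racked one.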
CMDB
(Diagram labels: Power audit, Futurama, Product provisioners, IT chg management, IP provisioning, Server provisioning, SAP, Controllers, API, Database, Resque, Ops Frontend, NET/L2)
Futurama
(Diagram labels: Conductor-audit, CF-Agent, Leela-agent, bkp-agent, Server side, Planet Express, CFEngine, Leela-Server, Management, Cegonha, Asdrubal, CFTools, CMDB)
Resource Metering and Monitoring - Leela
(Diagram labels: Leela-Lasergun, Leela-agent, Leela-Reader, API, Cassandra x 6)
Resource Metering and Monitoring - Leela
• 18k writes/sec
• 6 TB total per cluster
• 13 baseline metrics + 68 distinct metrics
• ~600 GB/month
• 1M keys (~5k servers)
• Write latency: 15 µs
• Read latency: 1 s to read 1 month's worth of data
• Down to one-minute resolution
• http://leela.readthedocs.org/en/latest/intro/archnut.html
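A few back-of-the-envelope figures follow directly from the numbers on this slide. The arithmetic below is only a sanity check. The last line assumes that "6 TB total" is stored metric data and that growth stays constant, which the slide does not state.

```python
WRITES_PER_SEC = 18_000
KEYS = 1_000_000
SERVERS = 5_000
TOTAL_TB = 6
GROWTH_GB_PER_MONTH = 600

writes_per_day = WRITES_PER_SEC * 86_400       # seconds in a day
keys_per_server = KEYS / SERVERS
# Assumption: 1 TB = 1000 GB and the growth rate is constant.
months_of_data = TOTAL_TB * 1000 / GROWTH_GB_PER_MONTH

print(f"{writes_per_day:,} writes/day")
print(f"{keys_per_server:.0f} keys per server")
print(f"{months_of_data:.0f} months of data at the quoted growth rate")
```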
Resource Metering and Monitoring - Leela
• Map/Reduce with SQL-like interface:
- SELECT mov_avg_samples = 7 (function)
- FROM cpro9559.cpu.cpu8.idle (metric)
- WHERE timestamp >= 1346279003 (timeframe)
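As an illustration of what a query like this might compute, here is a minimal sketch of a trailing moving average over 7 samples, restricted to samples at or after a given timestamp. The function name, the `(timestamp, value)` input format and the choice to emit a value only once the window is full are assumptions; Leela's actual semantics may differ.

```python
from collections import deque

def moving_average(samples, window=7, since=0):
    """Trailing mean over the last `window` values.

    samples: iterable of (timestamp, value) pairs in time order.
    Samples older than `since` are ignored; output starts once the
    window holds `window` values.
    """
    recent = deque(maxlen=window)
    total = 0.0
    out = []
    for ts, value in samples:
        if ts < since:
            continue
        if len(recent) == window:
            total -= recent[0]      # value about to be evicted by append
        recent.append(value)
        total += value
        if len(recent) == window:
            out.append((ts, total / window))
    return out

START = 1346279003                  # timestamp from the slide
# One sample per minute; the first one is before START and is dropped.
data = [(START - 60, 99.0)] + [(START + 60 * i, float(i)) for i in range(10)]
result = moving_average(data, window=7, since=START)
print(result)
```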
Resource Metering and Monitoring - Leela
• Create charts
- var widget = LEELA.widget(jQuery("#target"));
- jQuery.ajax("/v1/pastweek/cpro9559.cpu.cpu8.idle", {dataType: "jsonp", success: widget.render});
Software defined network
• Traditional equipment: local config and controller
• SDN: flows (commands), OpenFlow 1.0, central controller, distributed data plane
• Abstraction over VLANs with ACLs, tunnels or even VLAN QinQ
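The split between a central controller and a distributed data plane can be illustrated with a toy model. This is not the OpenFlow protocol or any real controller API: the `Flow`, `Switch` and `Controller` classes, the match fields and the action strings are all hypothetical. The table-miss behaviour (hand the packet to the controller) is also an assumption of this sketch.

```python
from dataclasses import dataclass

@dataclass
class Flow:
    priority: int
    match: dict     # header field -> required value
    action: str     # what the data plane does on a match

class Switch:
    """Data plane: stores flows and applies the highest-priority match."""
    def __init__(self, name):
        self.name = name
        self.flows = []

    def install(self, flow):
        self.flows.append(flow)
        self.flows.sort(key=lambda f: f.priority, reverse=True)

    def forward(self, packet):
        for flow in self.flows:
            if all(packet.get(k) == v for k, v in flow.match.items()):
                return flow.action
        return "send_to_controller"   # table miss in this toy model

class Controller:
    """Control plane: one place that pushes policy to every switch."""
    def __init__(self, switches):
        self.switches = switches

    def push(self, flow):
        for switch in self.switches:
            switch.install(flow)

switches = [Switch("vendor-a"), Switch("vendor-b")]
controller = Controller(switches)
controller.push(Flow(priority=10, match={"vlan": 100}, action="output:2"))
controller.push(Flow(priority=20, match={"vlan": 100, "tcp_dst": 22}, action="drop"))

print(switches[0].forward({"vlan": 100, "tcp_dst": 80}))
print(switches[1].forward({"vlan": 100, "tcp_dst": 22}))
```

Because both switches receive the same flows from one controller, the policy is vendor-independent, while each switch still makes forwarding decisions locally.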
Software defined network
(Diagram labels: OpenVSwitch, Controller, API; Switch Vendor A and Switch Vendor B, each with Control path, Data path (hardware), Openflow)
Software defined network
(Diagram labels: Cisco, Quantum, Force 10, HP, OpenVSwitch, Net/L2, Firewalls, CMDB, API)
Ruby @ Locaweb: not only for front-end
?
Thanks !