How logging makes a private cloud a better cloud - OpenStack Latest Information Seminar (December 2016)

How logging makes a private cloud a better cloud
Dec/01/2016
Kentaro Sasaki
Global Operations Department, Rakuten, Inc.

Rakuten is... a Tokyo-based e-commerce and Internet company

Rakuten Ecosystem
The Rakuten Ecosystem and our membership database form the foundation of our business

Membership: 116.52 million

Gross Transaction Volume: 7.6 trillion JPY

Logging Infrastructure for Private Cloud

Private Cloud at Rakuten

Timeline of Private Cloud History

2010 Gen1
Hypervisor: Xen
OS Instances: 2,000+
Management features built from scratch

2012 Gen2
Hypervisor: VMware ESXi
OS Instances: 25,000+
Management features built from scratch

2015 Gen3
Hypervisor: KVM
Uses the OpenStack API

Logging Matters

Benefits
Logging enables log visualization
Logs make analysis and debugging easier

From a business point of view
Shortens the time spent on troubleshooting
Leads to better customer support

Assumptions
Messages might be unmanageable
Growing log volumes require huge log storage

Concerns
How to prevent data loss
How to parse data from different sources

Log Management

High Availability
Availability, redundancy and scalability

Maintainability
Minimal data loss and operational overhead

Huge Number of Targets
Hundreds of hypervisors (ESXi & KVM)
Tens of thousands of VMs

Cover many sorts of logs
Splunk is suited to log analytics
A time-series DB (InfluxDB) is needed for performance logs

Overview of Our Logging Infrastructure

Figure: Logging Infrastructure (diagram). Event logs: Splunk, with Splunk & PagerDuty for alerting. Performance logs: Fluentd and Metricbeat collectors, Kafka, Google Cloud Storage, InfluxDB & Grafana. Cloud Foundry also appears in the diagram.

Event Logging Infrastructure

Event Logs in OpenStack

Huge number of log files
22 log files in a single cluster
Logs must be managed for every Region and Availability Zone

Manage unmanageable logs
CRITICAL messages, with their multi-line TRACE output, are unmanageable as-is
A strong analytical storage engine is needed

Component   # Log files
Nova        8
Keystone    1
Neutron     6
Glance      2
Cinder      5
etc.        etc.

2013-02-25 21:05:51 17409 CRITICAL cinder [-] Bad or unexpected response from the storage volume backend API: volume group cinder-volumes doesn't exist
...
2013-02-25 21:05:51 17409 TRACE cinder VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: volume group cinder-volumes doesn't exist
2013-02-25 21:05:51 17409 TRACE cinder
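The slides do not show how a CRITICAL line and its TRACE lines end up as one searchable event. As an illustration only, here is a minimal Python sketch that merges TRACE continuation lines into the preceding record from the same process. The line layout (date, time, PID, level, logger, message) is an assumption read off the Cinder sample above, and `group_records` is a hypothetical helper, not part of OpenStack or Splunk.

```python
import re

# Assumed layout, taken from the Cinder sample: date time PID LEVEL logger [message]
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<pid>\d+) "
    r"(?P<level>[A-Z]+) (?P<logger>\S+)(?: (?P<msg>.*))?$"
)


def group_records(lines):
    """Merge TRACE lines into the preceding record with the same PID."""
    records = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # skip lines that do not fit the assumed layout
        if (m["level"] == "TRACE" and records
                and records[-1]["pid"] == m["pid"]):
            # Traceback continuation: attach it to the open record.
            records[-1]["trace"].append(m["msg"] or "")
            continue
        records.append({
            "ts": m["ts"], "pid": m["pid"], "level": m["level"],
            "logger": m["logger"], "msg": m["msg"] or "", "trace": [],
        })
    return records
```

With this grouping the three sample lines become a single CRITICAL record carrying its traceback, so counting or alerting on CRITICAL events no longer over-counts TRACE lines.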

Event Logs in VMware

Almost all VMware logs
Event logs from vSphere
Warning and error logs from ESXi

SAN storage logs
Error logs from multi-vendor SAN storage

Log storage for Event logs: Splunk

System Configuration
Splunk v6.4.x (as of Nov 2016)
Indexer cluster and search head cluster

Managing huge data
150+ GB of input per day
30+ TB of indexed data

Figure: charts of input size per day and indexed data size

Alerting and Reporting on Splunk

OpenStack logs
26 alerts
16 dashboards for reporting

VMware logs
68 alerts
12 dashboards for reporting (e.g. visualizing the number of errors)

Useful alerting function
Integration with PagerDuty

Strong analytical engine
Manages and analyzes almost all types of logs
Manages unmanageable logs
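The slides mention PagerDuty integration but not its mechanics. For illustration, here is a minimal sketch of what a triggered alert looks like on the PagerDuty side, using the public Events API v2. This is not necessarily how Splunk's PagerDuty integration sends events; the routing key, summary, source and dedup key in the usage are hypothetical.

```python
import json
import urllib.request

EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"  # PagerDuty Events API v2
SEVERITIES = ("critical", "error", "warning", "info")


def build_trigger(routing_key, summary, source, severity="error", dedup_key=None):
    """Build an Events API v2 'trigger' event."""
    if severity not in SEVERITIES:
        raise ValueError(f"unsupported severity: {severity}")
    event = {
        "routing_key": routing_key,  # integration key of the PagerDuty service
        "event_action": "trigger",
        "payload": {"summary": summary, "source": source, "severity": severity},
    }
    if dedup_key is not None:
        # Repeat triggers with the same dedup_key are deduplicated into one open alert.
        event["dedup_key"] = dedup_key
    return event


def send(event, url=EVENTS_URL):
    """POST the event as JSON and return the HTTP status (202 when accepted)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

A dedup key per host and alert type keeps a noisy ESXi error burst from paging the on-call engineer once per log line.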

Performance Logging Infrastructure

Log Collector Requirements

Handle log streams
Support for various log file formats
Strong parsing engine

User-friendly agent
Minimal computing resource usage
Pluggable architecture

Log Collector: Fluentd, Metricbeat

HVs and Storage Performance logs

OpenStack host logs
Fluentd exec plugin to collect nf_conntrack_count
Metricbeat v5 for cpu, memory, diskio, filesystem and network metrics
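Fluentd's exec input runs a command periodically and parses its output. The actual script used here is not shown, so the following is a minimal sketch of one that could feed it: it reads the standard procfs counters (present when the nf_conntrack module is loaded) and prints one JSON line. The output field names are hypothetical choices.

```python
#!/usr/bin/env python3
"""Print the host's conntrack table usage as one JSON line."""
import json
import socket
import time

# Standard procfs locations when nf_conntrack is loaded.
COUNT_PATH = "/proc/sys/net/netfilter/nf_conntrack_count"
MAX_PATH = "/proc/sys/net/netfilter/nf_conntrack_max"


def read_int(path):
    with open(path) as f:
        return int(f.read().strip())


def conntrack_record(count_path=COUNT_PATH, max_path=MAX_PATH):
    count = read_int(count_path)
    limit = read_int(max_path)
    return {
        "host": socket.gethostname(),
        "time": int(time.time()),
        "nf_conntrack_count": count,
        "nf_conntrack_max": limit,
        # When the table is full, the kernel drops packets for new connections.
        "usage_pct": round(100.0 * count / limit, 2),
    }


if __name__ == "__main__":
    print(json.dumps(conntrack_record()))
```

Reporting the ratio alongside the raw count makes a single Grafana threshold work across hosts with different `nf_conntrack_max` settings.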

VMware HVs and SAN logs
In-house custom Fluentd plugin for collection
Output to InfluxDB and analysis in Grafana
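The in-house plugin is not public. Whatever collects the data, each sample ends up as an InfluxDB point. The sketch below formats points in InfluxDB line protocol and posts them to the 1.x `/write` endpoint. The measurement, tag and field names in the usage are hypothetical, and the escaping covers only the common cases (commas, equals signs, spaces and quotes).

```python
import urllib.parse
import urllib.request


def _escape(s):
    # Tag keys, tag values and field keys: escape commas, equals signs and spaces.
    return s.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _field_value(v):
    if isinstance(v, bool):  # check bool first: bool is a subclass of int
        return "true" if v else "false"
    if isinstance(v, int):
        return f"{v}i"  # the "i" suffix marks an integer field
    if isinstance(v, float):
        return repr(v)
    s = str(v).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{s}"'  # string fields are double-quoted


def to_line(measurement, tags, fields, ts_ns=None):
    """Format one point in InfluxDB line protocol."""
    meas = measurement.replace(",", r"\,").replace(" ", r"\ ")
    # Tags sorted by key, as InfluxDB recommends for write performance.
    tag_part = "".join(f",{_escape(k)}={_escape(str(v))}"
                       for k, v in sorted(tags.items()))
    field_part = ",".join(f"{_escape(k)}={_field_value(v)}"
                          for k, v in sorted(fields.items()))
    line = f"{meas}{tag_part} {field_part}"
    return line if ts_ns is None else f"{line} {ts_ns}"


def write_points(base_url, db, lines):
    """POST lines to an InfluxDB 1.x /write endpoint (204 on success)."""
    url = f"{base_url}/write?{urllib.parse.urlencode({'db': db})}"
    req = urllib.request.Request(url, data="\n".join(lines).encode("utf-8"),
                                 method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

For example, `to_line("esxi_cpu", {"host": "esxi-01", "cluster": "c1"}, {"usage_pct": 42.5, "vms": 17}, 1480550400000000000)` yields `esxi_cpu,cluster=c1,host=esxi-01 usage_pct=42.5,vms=17i 1480550400000000000`. Keeping hostnames and clusters as tags (indexed) and measurements as fields is what makes per-host Grafana queries cheap.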

VMs Performance logs from Hypervisors

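#!/usr/bin/env python
# Print one JSON line per running libvirt domain; Fluentd tails the output file.
import json

import libvirt

conn = libvirt.openReadOnly(None)  # read-only connection to the default hypervisor URI
try:
    for dom_id in conn.listDomainsID():  # IDs of active (running) domains only
        dom = conn.lookupByID(dom_id)
        # dom.info() returns [state, maxMem, memory, nrVirtCpu, cpuTime]
        print(json.dumps({
            "uuid": dom.UUIDString(),
            "name": dom.name(),
            "id": dom.ID(),
            "vcpus": dom.info()[3],  # number of virtual CPUs
        }))
finally:
    conn.close()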

From KVM (OpenStack)
Custom scripts built on the libvirt Python bindings
The scripts generate JSON data, which Fluentd reads with the in_tail plugin
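Beyond static attributes such as the vCPU count, a performance metric needs two samples. libvirt's `dom.info()` reports cumulative guest CPU time in nanoseconds, so average utilization over an interval is the CPU-time delta divided by wall-clock time times the vCPU count. This is a minimal sketch of that calculation; the function names and the 10-second default interval are hypothetical, not taken from the deck.

```python
import time


def cpu_usage_pct(prev_cpu_ns, cur_cpu_ns, elapsed_s, vcpus):
    """Average guest CPU usage over an interval, as 0-100% of all vCPUs."""
    if elapsed_s <= 0 or vcpus <= 0:
        raise ValueError("elapsed_s and vcpus must be positive")
    busy_ns = cur_cpu_ns - prev_cpu_ns
    return 100.0 * busy_ns / (elapsed_s * 1e9 * vcpus)


def sample_domain(dom, interval_s=10.0):
    """Sample one libvirt domain twice.

    dom.info() is [state, maxMem, memory, nrVirtCpu, cpuTime (ns, cumulative)].
    """
    _, _, _, vcpus, cpu0 = dom.info()
    t0 = time.monotonic()
    time.sleep(interval_s)
    _, _, _, _, cpu1 = dom.info()
    return cpu_usage_pct(cpu0, cpu1, time.monotonic() - t0, vcpus)
```

For example, a 4-vCPU guest that accumulates 2 s of CPU time in 1 s of wall-clock time is at 50%. Normalizing by vCPU count keeps small and large VMs on the same 0-100 scale.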

From ESXi (VMware)
Get logs from vCenter

Log streaming: Kafka

Kafka Specs
Kafka v0.10.0
Runs on OpenStack with all-SSD storage

System Configuration
100 to 500 partitions and 3 replicas per topic
Important logs are backed up to GCS
Logs are forwarded to another Kafka cluster (if necessary)
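Three replicas per topic means every message is stored three times, which drives both disk sizing and how many partition replicas each broker hosts. Here is a back-of-the-envelope sketch, assuming replicas are spread evenly across brokers. The daily volume, retention period and broker count in the usage are hypothetical; the deck gives only the partition range and replication factor.

```python
def kafka_sizing(daily_gb, retention_days, partitions, replication_factor, brokers):
    """Rough per-broker load for one topic, assuming an even replica spread."""
    if replication_factor > brokers:
        # Kafka cannot place more replicas of a partition than there are brokers.
        raise ValueError("replication factor cannot exceed broker count")
    total_gb = daily_gb * retention_days * replication_factor
    return {
        "replicas_per_broker": partitions * replication_factor / brokers,
        "disk_gb_per_broker": total_gb / brokers,
    }
```

With a hypothetical 100 GB/day, 7-day retention, 500 partitions, 3 replicas and 6 brokers, each broker holds about 250 partition replicas and 350 GB of log segments, before any headroom.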

Figure: Kafka and Google Cloud Storage

Log storage for performance logs: InfluxDB and Grafana

InfluxDB
InfluxDB v1.1.0 runs on physical servers
Data is posted to multiple InfluxDB instances via Kafka and Fluentd

Grafana
72 dashboards for visualizing performance data
Access to multiple InfluxDBs via a load balancer

Fluentd: Useful Log Collector
Fluentd handles various log formats and makes logs easy to parse
Minimal resource usage

Redundant system
InfluxDB mirroring is achieved with Kafka and Fluentd
Data loss is minimized by transporting logs through Kafka, with GCS as an additional backup
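The slides say InfluxDB mirroring is done with Kafka and Fluentd but not how. One common Kafka pattern, assumed here, is to give each InfluxDB instance its own consumer group on the same topic. Each group keeps its own committed offset, so a mirror that goes down resumes where it stopped rather than losing data. The in-memory model below uses hypothetical class names and no real Kafka client; it only illustrates that offset behavior.

```python
class TopicLog:
    """In-memory stand-in for one Kafka partition: an append-only list."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)


class MirrorConsumer:
    """One consumer group per InfluxDB mirror, each with its own offset."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.offset = 0        # next record this group will read
        self.stored = []       # stands in for the InfluxDB instance
        self.available = True  # set to False to simulate an outage

    def poll(self):
        if not self.available:
            return 0  # nothing consumed; the offset stays put, so nothing is lost
        new = self.log.records[self.offset:]
        self.stored.extend(new)
        self.offset += len(new)  # "commit" only after the write succeeded
        return len(new)
```

If mirror B is down while A keeps consuming, B's next poll after recovery returns the backlog, and both mirrors end with identical data. This holds only while an outage is shorter than the topic's retention period, which is one reason a GCS backup is still useful.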

Summary

Two logging engines
Splunk for event logs, InfluxDB for performance logs

Covers all of our requirements
Easy troubleshooting, visualization, analysis and improvement

Our logging infrastructure makes our private cloud a better cloud
