How logging makes a private cloud a better cloud - OpenStack Latest Information Seminar (December 2016)

How logging makes a private cloud a better cloud
Dec/01/2016
Kentaro Sasaki
Global Operations Department, Rakuten, Inc.

Upload: virtualtech-japan-inc

Post on 12-Jan-2017


TRANSCRIPT

Page 1: How logging makes a private cloud a better cloud - OpenStack Latest Information Seminar (December 2016)

How logging makes a private cloud a better cloud
Dec/01/2016
Kentaro Sasaki
Global Operations Department, Rakuten, Inc.

Page 2

Rakuten is … a Tokyo-based e-commerce and Internet company

Page 3

Rakuten Ecosystem
The Rakuten Ecosystem and our membership database form the foundation of our business

Page 4

Membership
116.52 million persons

Gross Transaction Volume
7.6 trillion JPY

Page 5

Logging Infrastructure for Private Cloud

Page 6

Private Cloud at Rakuten

Page 7

Timeline of Private Cloud History

Hypervisor: Xen
OS Instances: 2,000+
Management features from scratch

Hypervisor: KVM
Use OpenStack API

2015: Gen3
2012: Gen2
2010: Gen1

Hypervisor: VMware ESXi
OS Instances: 25,000+
Management features from scratch

Page 8

Logging Matters

Page 9

Benefits
Logging enables log visualization
Analysis and debugging become easier

From a business point of view
Shortens the time spent on troubleshooting
Leads to better customer support

Page 10

Assumptions
Messages might be unmanageable
Growing log volume requires huge log storage

Concerns
How to handle data loss
How to parse data from different sources

Page 11

Log Management

Page 12

High Availability
Availability, redundancy and scalability

Maintainability
Minimum data loss and operational overhead

Page 13

Huge number of targets
Hundreds of hypervisors (ESXi & KVM)
Tens of thousands of VMs

Cover many sorts of logs
Splunk is suited for log analytics
Need a time-series DB for performance logs

Splunk

InfluxDB

Page 14

Overview of Our Logging Infrastructure

Page 15

Logging Infrastructure

[Architecture diagram. Labels: Event log, Performance log, Fluentd, Metricbeat, Kafka, Splunk & PagerDuty, InfluxDB & Grafana, Google Cloud Storage, Cloud Foundry]

Page 16

Event Logging Infrastructure

Page 17

Event Logs in OpenStack

Page 18

Huge number of log files
22 log files in a single cluster
Manage logs for every Region & Availability Zone

Manage unmanageable logs
CRITICAL messages are unmanageable
Need a strong analytical storage engine

Component   # Log files
Nova        8
Keystone    1
Neutron     6
Glance      2
Cinder      5
etc.        etc.

2013-02-25 21:05:51 17409 CRITICAL cinder [-] Bad or unexpected response from the storage volume backend API: volume group cinder-volumes doesn't exist
...
2013-02-25 21:05:51 17409 TRACE cinder VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: volume group cinder-volumes doesn't exist
2013-02-25 21:05:51 17409 TRACE cinder
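The sample Cinder lines follow a "date time pid LEVEL logger message" layout. As a minimal sketch (not the parsing that Splunk performs, and assuming every line follows that layout), such lines could be split into fields and counted by level as follows. The names LOG_LINE, parse_line and count_levels are illustrative.

```python
import re
from collections import Counter

# Assumed layout, based on the sample lines: "<date> <time> <pid> <LEVEL> <logger> <message>".
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) ?(?P<message>.*)$"
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match the assumed layout."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None

def count_levels(lines):
    """Count parsed lines per log level, skipping lines that do not parse."""
    parsed = (parse_line(line) for line in lines)
    return Counter(rec["level"] for rec in parsed if rec)

sample = [
    "2013-02-25 21:05:51 17409 CRITICAL cinder [-] Bad or unexpected response from the storage volume backend API: volume group cinder-volumes doesn't exist",
    "2013-02-25 21:05:51 17409 TRACE cinder VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: volume group cinder-volumes doesn't exist",
    "2013-02-25 21:05:51 17409 TRACE cinder",
]
print(count_levels(sample))
```

Lines that do not fit the pattern return None rather than raising, which matters when many components write to the same pipeline in slightly different formats.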

Page 19

Event Logs in VMware

Page 20

Almost all VMware logs
Event logs from vSphere
Warning and error logs from ESXi

SAN storage logs
Error logs from multi-vendor SAN storage

Page 21

Log storage for Event logs: Splunk

Page 22

System Configuration
Splunk v6.4.x (as of Nov 2016)
Uses an indexer cluster and a search head cluster

Manage huge data
150+ GB input size per day
30+ TB indexed data size

[Charts: input size per day; indexed data size]

Page 23

Alerting and Reporting on Splunk

Page 24

OpenStack logs
26 alerts
16 dashboards for reporting

VMware logs
68 alerts
12 dashboards for reporting (e.g. visualizing the number of errors)

Page 25

Useful alerting function
Integrates with PagerDuty

Strong analytical engine
Manages and analyzes almost all types of logs
Manages unmanageable logs

Page 26

Performance Logging Infrastructure

Page 27

Log Collector Requirements

Page 28

Handle log streams
Support various log file formats
Strong parsing engine

User-friendly agent
Minimal compute resource usage
Pluggable architecture

Page 29

Log Collector: Fluentd, Metricbeat

Page 30

HVs and Storage Performance logs

Page 31

OpenStack host logs
Use the Fluentd exec plugin to get nf_conntrack_count
Metricbeat v5 for cpu, mem, diskio, filesystem, network

VMware HVs and SAN logs
Use an in-house custom Fluentd plugin to collect them
Output to InfluxDB and analyze in Grafana
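For the nf_conntrack_count case, a command run by the Fluentd exec plugin only needs to print one record per invocation. The following is a minimal sketch of such a script, not the one used in the talk. It assumes the counter is exposed at /proc/sys/net/netfilter/nf_conntrack_count (the usual location on Linux when the conntrack module is loaded) and that the plugin is configured to parse JSON. The function names and the record key are hypothetical.

```python
import json
import os

# Assumed location of the counter on a Linux host with the conntrack module loaded.
CONNTRACK_PATH = "/proc/sys/net/netfilter/nf_conntrack_count"

def read_conntrack_count(path=CONNTRACK_PATH):
    """Read the current number of tracked connections as an integer."""
    with open(path) as f:
        return int(f.read().strip())

def build_record(count):
    """Serialize the value as one JSON object per line (key name is hypothetical)."""
    return json.dumps({"nf_conntrack_count": count})

if __name__ == "__main__":
    # Print nothing on hosts where the counter is not available.
    if os.path.exists(CONNTRACK_PATH):
        print(build_record(read_conntrack_count()))
```

Keeping the read and the serialization separate makes it possible to check the script against a plain text file before pointing it at /proc.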

Page 32

VMs Performance logs from Hypervisors

Page 33

#!/usr/bin/env python
import json
import libvirt

# Read-only connection to the local hypervisor
conn = libvirt.openReadOnly()

# Emit one JSON object per running domain
for id in conn.listDomainsID():
    dom = conn.lookupByID(id)
    print(json.dumps({
        "uuid": dom.UUIDString(),
        "name": dom.name(),
        "id": dom.ID(),
        "vcpus": dom.vcpus()[0][3],
    }))

From KVM (OpenStack)
Use libvirt Python bindings to build custom scripts
Generate JSON data and read it with the Fluentd in_tail plugin

From ESXi (VMware)
Get logs from vCenter
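Since in_tail follows a file, the script output has to end up as one JSON object per line in a log file. The following is a minimal sketch of that step, using plain dictionaries in place of libvirt domain objects so that it runs without a hypervisor. The file name, the helper name append_records and all sample values are hypothetical.

```python
import json
import os
import tempfile

def append_records(path, records):
    """Append each record as a single JSON line so a tailing collector can read it."""
    with open(path, "a") as f:
        for record in records:
            f.write(json.dumps(record, sort_keys=True) + "\n")

# Stand-in data with the same keys as the libvirt script (values are made up).
records = [
    {"uuid": "00000000-0000-0000-0000-000000000001", "name": "vm-a", "id": 1, "vcpus": 2},
    {"uuid": "00000000-0000-0000-0000-000000000002", "name": "vm-b", "id": 2, "vcpus": 4},
]

# A temporary directory keeps the example self-contained.
log_path = os.path.join(tempfile.mkdtemp(), "vm_stats.log")
append_records(log_path, records)

with open(log_path) as f:
    lines = f.read().splitlines()
print(len(lines))
```

Opening the file in append mode matters here: a tailing reader tracks its position in the file, so rewriting the file from scratch on every run could cause records to be skipped or re-read.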

Page 34

Log streaming: Kafka

Page 35

Kafka Specs
Kafka v0.10.0
Runs on OpenStack with all-SSD storage

System Configuration
100~500 partitions and 3 replicas per topic
Back up important logs to GCS
Transfer to another Kafka (if necessary)

[Diagram labels: Kafka, Google Cloud Storage, Kafka]

Page 36

Log storage for Performance logs: InfluxDB and Grafana

Page 37

InfluxDB
Run InfluxDB v1.1.0 on physical servers
Post to multiple InfluxDBs by using Kafka and Fluentd

Grafana
72 dashboards for visualizing performance data
Access multiple InfluxDBs via a load balancer

[Diagram labels: Kafka, Grafana]
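Records reach InfluxDB as points. As a rough illustration of what a collector has to produce, the sketch below converts a dictionary into InfluxDB line protocol ("measurement,tags fields timestamp"). It is a simplified version that handles only integer, float, boolean and string fields and does not escape the measurement name. The measurement name, tag keys and sample values are hypothetical, and in the setup described here this conversion is presumably handled by the collector's output plugin rather than by hand-written code.

```python
def escape_tag(value):
    """Escape commas, equals signs and spaces, which are special in tag keys and values."""
    return str(value).replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")

def format_field(value):
    """Format a field value: integers get an 'i' suffix, strings are double-quoted."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build one line: measurement,tag=v,... field=v,... timestamp."""
    tag_part = "".join(f",{escape_tag(k)}={escape_tag(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{escape_tag(k)}={format_field(v)}" for k, v in sorted(fields.items()))
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"

line = to_line_protocol(
    "vm_stats",
    {"host": "hv01", "name": "vm-a"},
    {"vcpus": 2, "cpu_load": 0.5},
    1480550400000000000,
)
print(line)
```

Values that identify a series (host, VM name) go into tags, while the measured numbers go into fields, which is what lets a dashboard filter by tag.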

Page 38

Fluentd - Useful Log Collector
Fluentd handles various log formats and makes logs easy to parse
Minimal resource usage

Redundant system
InfluxDB mirroring is realized with Kafka and Fluentd
Data loss is minimized by transporting logs to Kafka, with GCS as an additional backup
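The mirroring idea can be sketched as a fan-out: each record read from the stream is written to every InfluxDB instance, and a failure on one instance does not block the others. The sketch below uses in-memory stand-ins instead of real Kafka consumers and InfluxDB clients, so all class and function names are hypothetical. It illustrates only the control flow, not the actual Fluentd configuration.

```python
class MemorySink:
    """Stand-in for one InfluxDB instance; stores points in a list."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail
        self.points = []

    def write(self, point):
        if self.fail:
            raise ConnectionError(f"{self.name} is unavailable")
        self.points.append(point)

def mirror(stream, sinks):
    """Write every point to every sink; return (sink name, point) pairs that failed."""
    failed = []
    for point in stream:
        for sink in sinks:
            try:
                sink.write(point)
            except ConnectionError:
                # Assumption: with a retained stream behind it, a failed write could be replayed later.
                failed.append((sink.name, point))
    return failed

primary = MemorySink("influxdb-a")
secondary = MemorySink("influxdb-b", fail=True)
failed = mirror(["cpu=1", "cpu=2"], [primary, secondary])
print(len(primary.points), len(failed))
```

In this run the healthy sink receives both points while the failing sink's writes are collected for a later retry.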

Page 39

Summary

Page 40

Two logging engines
Splunk for event logs, InfluxDB for performance logs

Cover all of our requirements
Easy troubleshooting, visualization, analysis and improvement

Page 41

Our logging infra makes our private cloud a better cloud