Rakuten’s Journey with Splunk - Evolution of Splunk as a Service

Rakuten’s Journey with Splunk - Evolution of Splunk as a Service June/30/2016 Keisuke Noda / Peng Yang Rakuten, Inc.

Upload: rakuten-inc

Post on 12-Jan-2017


TRANSCRIPT

Rakuten’s Journey with Splunk

- Evolution of Splunk as a Service

June/30/2016

Keisuke Noda / Peng Yang

Rakuten, Inc.

About Us

• Name
  – Keisuke Noda
  – 野田啓介

• Position
  – Architect / Manager
  – Data Store Platform Group

• Background
  – Software engineer
  – Database administrator

About Us

• Name
  – Peng Yang (Larry)
  – 陽 鵬

• Position
  – Infra/Web engineer
  – Data Store Platform Group

• Background
  – Software engineer
  – AR/MR engineer

Founded: February 7, 1997

IPO: April 19, 2000 (JASDAQ Stock Exchange)

Office: Rakuten Crimson House (Tokyo, Japan)

Employees: 12,981 (as of Dec, 2015)

Capital: JPY 203,587 million (as of Dec, 2015)

About Company


• Splunk as a Service

• dotconf-assist

• Ending

• Appendix

Agenda


Splunk as a Service

• Summer 2011… I discovered Splunk

• Cool visuals

• Easy to use

• Looks interesting

Why Splunk?

• Self-made database monitoring system

• Legacy and complex system

Before

Components: Batch Server, RDBMS, Web Application

Steps: Input the data into RDB, Store the data, Visualize the data

Diagram labels: Add a column, Modify codes

Output: Database Status

Why Splunk?

• Self-made database monitoring system

• One Splunk is simple

Input Data / Store Data / Visualize Data

Output: Database Status

Then, Splunk began to be used in various groups…

So Easy!!

All in One!

Cool Visuals!!

After

Why Splunk?

. . . Splunk as a Service was born

• Splunking in various groups

• Many repetitive operations (license management, construction, operations, etc.)

One big platform will solve the problem.

In addition, it may have many other benefits…

Why as a Service?

Example (organization chart)

Departments: Dep. A, Dep. B, Dep. C, Dep. D, Dep. E

Groups: Corporate IT, Merchant, Security, Server, Marketplace, Credit Card, E-money, Database, Network, …

• Rakuten’s organization

• Many departments and groups

Service Overview

Admin: Our Group

User: Corporate IT, Credit Card, Security, Network, …

• Roles of Splunk as a Service

• Admin

• User

Service Overview

• No need to manage Infrastructure

• Easy to start Splunking instantly

• Charged by measured rate

• Input Size

• Storage Size

Rakuten Splunk as a Service

Details in later part

For User

Service Overview
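The measured-rate charging mentioned on this slide (by input size and storage size) could be sketched as below. This is only an illustration: the unit prices, the `Usage` record, and the monthly billing period are assumptions, since the slides do not state actual rates or how usage is aggregated.

```python
from dataclasses import dataclass

# Hypothetical unit prices; the slides give no rates or currency.
RATE_PER_GB_INPUT = 100.0   # charge per GB of data indexed in the period
RATE_PER_GB_STORED = 10.0   # charge per GB of data kept in storage


@dataclass
class Usage:
    """Measured usage for one Splunk account (hypothetical record)."""
    account: str
    input_gb: float
    storage_gb: float


def monthly_charge(usage: Usage,
                   input_rate: float = RATE_PER_GB_INPUT,
                   storage_rate: float = RATE_PER_GB_STORED) -> float:
    """Sum the input-size charge and the storage-size charge."""
    return usage.input_gb * input_rate + usage.storage_gb * storage_rate


if __name__ == "__main__":
    print(monthly_charge(Usage("db-team", input_gb=30.0, storage_gb=200.0)))
```

Keeping the two rates separate mirrors the two metrics on the slide, so a user with small daily input but long retention is charged mainly for storage.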

• Environment
  • Private Cloud
    • High availability
    • On time delivery
    • Flexibility
  • Physical servers for Indexer
    • Many-core
    • Large Capacity SSD

Service Design

Splunk as a service

• System configuration
  • v6.3.X (as of June 2016)
  • Using indexer cluster
  • Using SHC
• Full components
  • SHC, >10 Dedicated SHs
  • Cluster Master
  • >10 Indexers
  • Heavy Forwarders
  • Deployment Server

Diagram components: CM, SHC, Dedicated SH, Indexer, Forwarder, Server, DS

Service Design

• Other specifications
  • A Splunk account is created for each user
  • 1 user = 1 group, 1 service, or 1 project
  • Each user has his/her own App
  • Basically, a user can see only his/her own data
  • Users can choose the storage retention period, from 1 day to 6 years, for each input
  • Admin does not do backups
  • A dedicated Search Head is available for users who need one
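One way the per-input retention choice (1 day to 6 years) could be turned into Splunk configuration is through the `frozenTimePeriodInSecs` setting in `indexes.conf`. The sketch below assumes each retention choice maps to its own index and counts a year as 365 days. Neither detail is stated in the slides, and the index name is hypothetical.

```python
SECONDS_PER_DAY = 86_400
MIN_DAYS = 1
MAX_DAYS = 6 * 365  # "6 years", ignoring leap days (assumption)


def retention_stanza(index_name: str, retention_days: int) -> str:
    """Render an indexes.conf fragment that sets retention for one index.

    frozenTimePeriodInSecs is the age, in seconds, after which Splunk
    rolls indexed data to frozen (deleted by default).
    """
    if not MIN_DAYS <= retention_days <= MAX_DAYS:
        raise ValueError(
            f"retention must be between {MIN_DAYS} and {MAX_DAYS} days"
        )
    seconds = retention_days * SECONDS_PER_DAY
    return f"[{index_name}]\nfrozenTimePeriodInSecs = {seconds}\n"


if __name__ == "__main__":
    # Hypothetical index name for a 30-day retention input.
    print(retention_stanza("user_a_30d", 30))
```

A complete index stanza also needs path settings such as `homePath`, `coldPath`, and `thawedPath`. They are left out here to keep the focus on retention.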

Service Design

Database: Real-time monitoring, Troubleshooting, Usage report, Service KPI management

Security: IDS real-time monitoring, Fraud detection

Private Cloud: Real-time monitoring, Resource management

Application: Real-time monitoring, Service KPI management, Performance management

Storage: Real-time monitoring, Resource management, Service KPI management

Network: Real-time monitoring, Troubleshooting, Trend analysis

Server: Real-time monitoring, Troubleshooting, Usage report

More…

Use Cases


Application

Use Cases

• Before

• Analyze by grep command

• Take >10 minutes to handle incidents

• After

• Application access/error monitoring in real-time

• Address incidents automatically

Application

Use Cases

Application: Real-time monitoring, Performance management

access log

Log Sharing among users

Security: IDS real-time monitoring, Fraud detection

Use Cases

Security

Use Cases

• Before

• Had difficulty getting access logs

• Took a lot of time to analyze…

• After

• Analyze logs easily by themselves

• Detect irregular accesses with deep algorithm

Security

Use Cases


Charts: Availability Rate, Indexed Data Size (annotations: Upgrade to v6.2, Upgrade to v6.3)

Current Status

Charts: Input Size, # of Accounts

Current Status


dotconf-assist

• Users
  • Difficult to start using Splunk (Small number of users)
  • No standard format to configure .conf files (Takes much time)
  • Difficult to manage current configurations (Inconvenient management)

• Admins
  • Make configurations for each user request manually (High man-hours)
  • Difficult to manage current configurations (Hard to maintain)

Need a tool to improve the situation

Why dotconf-assist?

• Users of dotconf-assist
  • User
  • Admin

• Application type
  • RESTful web application based on Splunk API

• The features of dotconf-assist
  • (User) manage Splunk Inputs, Apps, Forwarders, Server Classes, Deployment requests, etc.
  • (Admin) manage Splunk account information, users’ configurations, users’ requests, etc.

What is dotconf-assist?
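To illustrate what managing inputs through a tool can mean in practice, here is a minimal sketch that renders an `inputs.conf` monitor stanza from a few user-supplied values. It is not taken from dotconf-assist (which is a Ruby on Rails application). The function name, the example path, and the chosen settings are assumptions for illustration only.

```python
def monitor_stanza(path: str, index: str, sourcetype: str) -> str:
    """Render an inputs.conf stanza that monitors one log file.

    The user supplies only the file path, the target index, and the
    sourcetype; everything else is filled in with fixed defaults.
    """
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    lines = [
        f"[monitor://{path}]",
        f"index = {index}",
        f"sourcetype = {sourcetype}",
        "disabled = false",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values a user might enter in a form.
    print(monitor_stanza("/var/log/app/access.log", "user_a", "access_combined"))
```

Generating the stanza from a small set of validated fields gives every user the same format, which is the kind of standardization the "Why dotconf-assist?" slide asks for.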

Diagram lanes: User Process, Admin Process

Diagram systems: Users’ Servers, dotconf-assist, Splunk Servers

Environments: DEV, STG, PROD

Steps shown: Sign up, Sign in, Approve, Create Splunk Account, Set Server Class, Set App, Request Deployment, Approve Deployment, Manage Deployment, Deploy Apps (Automatically), Install Forwarders, Search

Workflow of dotconf-assist
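The "Set Server Class" and "Set App" steps in this workflow correspond to deployment server configuration. The sketch below shows how such a request might be rendered into `serverclass.conf` stanzas. The function, the host names, and the choice to restart splunkd after deployment are assumptions, not details from the slides or from the dotconf-assist source.

```python
from typing import Sequence


def serverclass_stanzas(server_class: str, hosts: Sequence[str], app: str) -> str:
    """Render serverclass.conf stanzas mapping forwarder hosts to one app."""
    if not hosts:
        raise ValueError("at least one host is required")
    lines = [f"[serverClass:{server_class}]"]
    # whitelist.<n> entries select which deployment clients belong to the class.
    for n, host in enumerate(hosts):
        lines.append(f"whitelist.{n} = {host}")
    lines.append("")
    lines.append(f"[serverClass:{server_class}:app:{app}]")
    # Assumption: forwarders are restarted after the app is deployed.
    lines.append("restartSplunkd = true")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical server class, hosts, and app names.
    print(serverclass_stanzas("user_a_prod", ["web01", "web02"], "user_a_app"))
```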

Demo of dotconf-assist

Splunk Users (Before → After)

• Configurations: Send ticket to admin → Only input necessary values

• Deployment request: Send ticket to admin → Simple clicks

• Lead time to start Splunk: 1 day → <10 min

Splunk Admins (Before → After)

• Handle users’ requests: Create an account (>10 min) → 1 click (5 sec); Make input config (>5 min) → 4-step click (10 sec)

• Statistics information (user, hosts, inputs…): View from multiple Splunk servers → View from one interface

Contributions of dotconf-assist

• GitHub
  • https://github.com/rakutentech/dotconf-assist

• Frameworks
  • Ruby on Rails, Bootstrap

• License
  • MIT License

• Policies
  • Freely use
  • Accept pull requests

How to Access Source Code


Ending

• Expand users

• Upgrade to v6.4

• Enhance dotconf-assist

• Improve usability

• Visualize stats index size for each input

• Complete automation

• Re-Architect Log Management System in Rakuten

What is Next?

• Rakuten is using one big Splunk as a Service
  • Advantages for user
    • No need to manage Infrastructure, License, and detailed configurations
    • Easy log sharing among users
  • Advantages for admin
    • Can manage operations and license efficiently
    • Have many satisfied users

• dotconf-assist improves Rakuten Splunk as a Service
  • Helped users to start Splunking easily
  • Decreased man-hours for Admins

Wrap up

• Tips for starting Splunk

• Purpose is very important

• Consider your business demands/problems

• No need to modify log format

• Collaborate with existing systems/tools

• Take advantage of useful training and Q&A meetups held by Splunk engineers

Appendix - Splunk Tips

• Tips for managing Splunk
  • Newer Splunk versions are better than older ones
  • High-end servers are much better for Indexers
  • Heavy forwarders are useful for splitting the workload of the indexing pipeline
  • Easy access control for users by using Tags
  • Use DMC for monitoring
  • Use Splunk API for better usability and reduced administration cost

Appendix - Splunk Tips
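As an illustration of the "Use Splunk API" tip, the sketch below builds (but does not send) a request that creates a search job through the Splunk REST endpoint `/services/search/jobs` on the management port. The host name, the session key value, and the helper name are placeholders chosen for this example. The session key is assumed to have been obtained beforehand from `/services/auth/login`.

```python
import urllib.parse
import urllib.request


def build_search_job_request(base_url: str, spl: str,
                             session_key: str) -> urllib.request.Request:
    """Build a POST request that asks Splunk to start a search job."""
    body = urllib.parse.urlencode({
        # The REST API expects the query to start with a generating
        # command, so a bare SPL filter is prefixed with "search".
        "search": f"search {spl}",
        "output_mode": "json",
    }).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/services/search/jobs", data=body, method="POST"
    )
    request.add_header("Authorization", f"Splunk {session_key}")
    return request


if __name__ == "__main__":
    # Placeholder host and session key; nothing is sent over the network here.
    req = build_search_job_request(
        "https://splunk.example.com:8089", "index=main | head 5", "SESSION_KEY"
    )
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would submit the job. That call is left out so the example runs without a Splunk server.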

• Tips for using Splunk
  • Use alerts and automatic delivery of reports & dashboards
  • Use embedded reports
  • See Splunk Answers
  • Share log data with other teams
  • Use Splunk API for collaboration with existing systems
  • Dark background for dashboard is cool
  • Enjoy Splunk

Appendix - Splunk Tips

• Rakuten is hiring

• http://global.rakuten.com/corp/careers/engineering/

Appendix - Hiring


Thank You