SplunkLive! Utrecht 2016 - Exact

Upload: splunk

Post on 06-Jan-2017

Copyright © 2015 Splunk Inc.

DevOps - Lower mean time to resolution while facilitating growth

André van de Graaf

Principal, Quality Assurance

Exact Software

© 2015 EXACT

We build business software aimed at SMBs

• Dutch-based company, founded in 1984 by Dutch students
• 2005: launched Exact Online, a SaaS-based version of the product
• 350,000 companies
• 7 countries
• 5 datacenters

Exact Infrastructure & Operations

Team of 7 engineers running a platform supporting 350,000 companies

Splunk team:
• 0.5 FTE: setup and configuration
• 0.5 FTE: data import, dashboards, alerts, reports

Exact’s ambition: Exponential growth

For exponential growth we need to automate. Splunk will facilitate this.

Last quarter: 250 new companies added on a daily basis

Situation before Splunk?

• Support department was our alert system
• 2 datacenters
• 4 countries
• Weekly builds
• Manually analyzing logs
• At least 1 war room session per month

How Did We Do?

Splunk implementation step by step in 12 months

Splunk Implementation

• Operational Visibility
• Business Insight
• Proactive Monitoring
• Search and Investigation

Operational Visibility: about 20 different data sources

• IIS log
• Eol log
• Perfmon counters

Operational Visibility: Status Monitor dashboard

Oops, 1 datacenter down!

Drill down to the Current status dashboard

Business insight: Trends and patterns

Proactive monitoring: Fair-use policy - Exceptional uploads and downloads

Search and investigate: Detailed perfmon counters

• We log performance counters every 5 seconds, so we can trace an issue to specific moments: when it started and when it ended. Logging per minute is not detailed enough for us.
• For statistics, we aggregate performance counters per hour.
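
The two granularities above can be illustrated outside Splunk with a minimal Python sketch. It assumes raw samples arrive as (timestamp, value) pairs and rolls them up into per-hour count, average, and maximum. The function name `aggregate_per_hour`, the sample data, and the choice of statistics are illustrative assumptions, not details from the presentation.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def aggregate_per_hour(samples):
    """Group (timestamp, value) samples by hour and summarize each hour."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Truncate the timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {
        hour: {"count": len(values), "avg": mean(values), "max": max(values)}
        for hour, values in sorted(buckets.items())
    }


# Hypothetical data: two hours of CPU readings, one sample every 5 seconds.
start = datetime(2016, 1, 1, 9, 0, 0)
samples = [
    (start + timedelta(seconds=5 * i), 20.0 if i < 720 else 60.0)
    for i in range(1440)
]
hourly = aggregate_per_hour(samples)
```

The raw 5-second samples stay available for pinpointing when an incident started and ended, while the hourly roll-up keeps long-term statistics compact.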

Search + Investigation

Bug reports: Splunk queries show how many customers are affected by a bug across all countries. This helps development teams prioritize the bug intake.

After deployment, we can run the same Splunk query to check whether the bug is really fixed.
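
As a rough illustration of that workflow, the Python sketch below counts distinct affected customers per country for one error, then reruns the same count on post-deployment events. The event fields (`country`, `customer`, `error`), the error codes, and the function name are hypothetical; the actual Splunk queries are not shown in the slides.

```python
from collections import defaultdict


def affected_customers(events, error_code):
    """Return {country: number of distinct customers that hit error_code}."""
    customers = defaultdict(set)
    for event in events:
        if event["error"] == error_code:
            customers[event["country"]].add(event["customer"])
    return {country: len(ids) for country, ids in customers.items()}


# Hypothetical log events; field names and error codes are made up.
before_fix = [
    {"country": "NL", "customer": "c1", "error": "E42"},
    {"country": "NL", "customer": "c1", "error": "E42"},
    {"country": "NL", "customer": "c2", "error": "E42"},
    {"country": "BE", "customer": "c3", "error": "E42"},
    {"country": "BE", "customer": "c4", "error": "E07"},
]
after_fix = [
    {"country": "BE", "customer": "c4", "error": "E07"},
]

impact_before = affected_customers(before_fix, "E42")
impact_after = affected_customers(after_fix, "E42")
```

Counting distinct customers rather than raw events keeps one noisy customer from inflating the impact, and an empty result after deployment suggests the fix worked.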

Alerts to Mail and VictorOps

Where are we as of today?

• From 2 to 5 datacenters
• From 4 to 7 countries
• From weekly to daily builds
• Adding 250 companies on a daily basis
• Data size increased by 100% in 1 year

Where are we as of today?

• Fully operational in the DevOps team
• Lowered mean time to resolution by 75%
• Proactively inform the support department
• Scale the platform, not the team
• Splunk is part of the delivery process of new Exact Online functionality

What Did We Learn?

• Start documentation from the beginning
• Rubbish in = rubbish out. Fix the source!
• Implement one naming convention that applies to all datacenters
• One naming convention within Splunk: reports, lookups, etc.
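
A naming convention is easier to keep when it can be checked automatically. The sketch below is one possible approach, assuming a made-up convention of `<area>_<object type>_<description>` in lowercase; the pattern, the object types, and the example names are hypothetical and not Exact's actual convention.

```python
import re

# Hypothetical convention: <area>_<object type>_<description>, all lowercase.
NAME_PATTERN = re.compile(r"^[a-z]+_(report|lookup|alert|dashboard)_[a-z0-9_]+$")


def nonconforming(names):
    """Return the names that do not follow NAME_PATTERN."""
    return [name for name in names if not NAME_PATTERN.fullmatch(name)]


# Hypothetical object names as they might be exported from a Splunk instance.
names = ["ops_report_iis_errors", "ops_lookup_datacenters", "My Test Report"]
bad = nonconforming(names)
```

Running such a check regularly across all datacenters would flag objects that drift from the convention before they accumulate.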

What Did We Learn?

• Do not hard-code anything
• The end result of an incident is a new alert and/or dashboard. Continuous improvement, as part of the RCA process.
• Analyze your imported sources. Is everything useful to import?

Obstacles to overcome

• No ADFS support (supported as of Splunk 6.4)
• Resources: small team

Next Steps

• Roll out Splunk to our Development and Support teams
• Upgrade to Splunk 6.5
• Implement Splunk IT Service Intelligence (ITSI)
• Proof of concept (POC) for machine learning

Thank You
