Beyond The Pretty Charts: Analytics for the rest of us. Toufic Boubez, Ph.D., Co-Founder and CTO, Metafor Software

Uploaded by tboubez on 26-Jan-2015


DESCRIPTION

Current monitoring tools are clearly reaching the limit of their capabilities. That's because these tools are based on fundamental assumptions that are no longer true, such as assuming that the underlying system being monitored is relatively static, or that the behavioral limits of these systems can be defined by static rules and thresholds. Interest in applying analytics and machine learning to detect anomalies in dynamic web environments is gaining steam. However, understanding which algorithms should be used to identify and predict anomalies accurately within all that data we generate is not so easy. This talk builds on an Open Space discussion that was started at DevOps Days Austin. We will begin with a brief definition of the types of anomalies commonly found in dynamic data center environments and then discuss some of the key elements to consider when thinking about anomaly detection, such as:

• Understanding your data, and the two main approaches for analyzing operations data: parametric and non-parametric methods
• The importance of context
• Simple data transformations that can give you powerful results

TRANSCRIPT

Page 1: Beyond pretty charts, Analytics for the rest of us. Toufic Boubez DevOps Days Silicon Valley 2013-06-22

Beyond The Pretty Charts

Analytics for the rest of us

Toufic Boubez, Ph.D., Co-Founder and CTO, Metafor Software

Page 2

Toufic intro – who I am

• Co-Founder/CTO, Metafor Software
• Co-Founder/CTO, Layer 7 Technologies
  – API Management
  – Acquired by Computer Associates in 2013
• I escaped
• Building large-scale software systems for 20 years (I’m older than I look, I know!)

Page 3

Why this talk?

• DevOps Days Austin: Open Space talk
  – Blog: http://metaforsoftware.com/beyond-the-pretty-charts-a-report-from-devopsdays-in-austin/
• Five major discussion points/lessons learned
• Note: no labels on charts – on purpose!!
• Note: real data

Page 4

Wall of charts

Page 5

1. We’ve moved beyond static thresholds

• Most current monitoring tools assume that the underlying system is relatively static, so we can surround it with static thresholds and rules. BUT:
  – So what if my unicorn usage is at 91%, and has been stable at 91% for a while?
  – I’d much rather know if it’s at 60% and has been rapidly increasing over the last few hours.
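The contrast on this slide can be sketched in a few lines. This is a minimal illustration, not the author's method: the 90% threshold, the six hourly samples, and the slope limit of 2 points per interval are invented values, and `static_threshold_alert` / `trend_alert` are hypothetical helpers. A least-squares slope over the recent window stands in for "rapidly increasing".

```python
def static_threshold_alert(samples, threshold=90.0):
    """Fire if the latest sample exceeds a fixed threshold."""
    return samples[-1] > threshold


def slope(samples):
    """Least-squares slope of the samples, in units per sample interval."""
    n = len(samples)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def trend_alert(samples, max_slope=2.0):
    """Fire if usage is climbing faster than max_slope per interval (hypothetical limit)."""
    return slope(samples) > max_slope


# Invented hourly usage samples (percent).
flat_high = [91.0] * 6                          # stable at 91%
rising = [35.0, 40.0, 45.0, 50.0, 55.0, 60.0]   # at 60% and climbing

print(static_threshold_alert(flat_high), trend_alert(flat_high))  # True False
print(static_threshold_alert(rising), trend_alert(rising))        # False True
```

The static rule pages on the harmless flat series and stays silent on the one that is heading for trouble; the trend check does the opposite.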

Page 6

Need more better analytics

• Thresholds won’t help you in this case
• Need some more dynamic analytics

Page 7

2. Context is really important

– Do I really want to be alerted when I know someone is performing maintenance or backups?

– Is there an event that caused the change in behaviour (e.g. new deploy)?

– Correlate your event line with your monitoring

[Chart annotation: "Down for maintenance?"]
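One way to act on these questions is to check each detected anomaly against an event line before paging anyone. This is a minimal sketch, assuming a hypothetical in-memory list of (start, end, description) events and an arbitrary five-minute slack window; in practice the events would come from your deploy tooling, backup schedule, or change calendar, and `explaining_event` / `triage` are illustrative names only.

```python
from datetime import datetime, timedelta

# Hypothetical event line: (start, end, description) for known maintenance, backups, deploys.
EVENTS = [
    (datetime(2013, 6, 22, 2, 0), datetime(2013, 6, 22, 3, 0), "nightly backup"),
    (datetime(2013, 6, 22, 14, 0), datetime(2013, 6, 22, 14, 15), "deploy v1.2"),
]


def explaining_event(ts, events, slack=timedelta(minutes=5)):
    """Return the description of an event overlapping ts (plus slack), else None."""
    for start, end, desc in events:
        if start - slack <= ts <= end + slack:
            return desc
    return None


def triage(anomaly_times, events):
    """Split anomalies into ones worth paging on and ones explained by a known event."""
    page, annotated = [], []
    for ts in anomaly_times:
        desc = explaining_event(ts, events)
        if desc is None:
            page.append(ts)
        else:
            annotated.append((ts, desc))
    return page, annotated


anomalies = [
    datetime(2013, 6, 22, 2, 30),   # during the backup
    datetime(2013, 6, 22, 9, 0),    # no known event
    datetime(2013, 6, 22, 14, 10),  # during the deploy
]
page, annotated = triage(anomalies, EVENTS)
print(page)       # only the 09:00 anomaly
print(annotated)  # the other two, labelled with their event
```

Annotating rather than silently dropping the explained anomalies keeps the "did the deploy cause this?" question answerable later.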

Page 8

3. Know your data!!

– You need to understand the statistical properties of your data, and where it comes from, in order to determine what kind of analytics to use.

• For example, it’s important to know if your data is normally distributed.

• http://codeascraft.com/2013/06/11/introducing-kale/
• https://github.com/etsy/skyline/blob/master/src/analyzer/algorithms.py
  – Three-sigma, Grubbs and other algorithms assume normal distribution
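To make the normality assumption concrete, here is a minimal three-sigma detector of the kind the slide refers to, next to a crude skewness check. This is an illustrative sketch, not the Skyline code linked above; the function names and data are hypothetical, and a skewness far from zero is only a hint that data is not normal, not a formal normality test.

```python
import statistics


def three_sigma_anomalies(values):
    """Indices of values more than 3 standard deviations from the mean.

    The 3-sigma rule only has its usual meaning (about 0.3% of points
    flagged by chance) when the data is roughly normally distributed.
    """
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) > 3 * sigma]


def skewness(values):
    """Population skewness: about 0 for symmetric data such as a normal sample."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return 0.0
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


# Invented data: a flat metric with one spike.
series = [10.0] * 20 + [50.0]
print(three_sigma_anomalies(series))  # [20]
print(skewness(series) > 1)           # True: strongly right-skewed
```

In practice you would check the shape on a baseline window before choosing a detector. Note also that a large outlier inflates the standard deviation it is measured against, so on short windows a single spike can partly mask itself.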

Page 9

What’s normal?

Page 10

What’s my distribution?

Page 11

Another common distribution
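When the data turns out not to be normal (for example, skewed with a long tail), a non-parametric approach avoids assuming any particular shape. A minimal sketch, assuming a hypothetical training window of past values and an arbitrary 99th-percentile cutoff: new values are flagged if they exceed the empirical percentile of the history. The helper names are illustrative, and nearest-rank is only one of several common percentile definitions.

```python
import math


def empirical_percentile(history, q):
    """q-th percentile (0 < q <= 100) of history, using the nearest-rank method."""
    if not history:
        raise ValueError("history must not be empty")
    ordered = sorted(history)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def percentile_anomalies(history, new_values, q=99.0):
    """Flag new values above the q-th percentile of past values.

    No distribution is assumed: the cutoff comes straight from the data.
    """
    cutoff = empirical_percentile(history, q)
    return [v for v in new_values if v > cutoff]


# Invented history of 100 past readings (1..100) and a few new ones.
history = list(range(1, 101))
print(empirical_percentile(history, 99))                  # 99
print(percentile_anomalies(history, [50, 99, 100, 250]))  # [100, 250]
```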

Page 12

4. Is all data important to collect?

– Two camps:
  • Data is data, let’s collect and analyze everything and figure out the trends.
  • Not all data is important, so let’s figure out what’s important first and understand the underlying model so we don’t waste resources on the rest.
– Similar to the very public bun fight between Noam Chomsky and Peter Norvig
  • http://norvig.com/chomsky.html
– Unresolved as far as I know

Page 13

Do we need both metrics?
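One quick way to approach this question for a pair of metrics is to measure how strongly they move together. A minimal sketch using the Pearson correlation coefficient; the memory numbers, the 16 GB host, and the 0.95 cutoff are invented for illustration. A high correlation only suggests redundancy, it does not prove one metric can be dropped.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)


# Invented samples from a 16 GB host where available is simply 16 - used.
used_gb = [6.1, 7.4, 8.0, 9.3, 11.2, 10.5]
available_gb = [16.0 - u for u in used_gb]

r = pearson(used_gb, available_gb)
print(round(r, 6))    # -1.0
print(abs(r) > 0.95)  # True: the second metric likely adds little new information
```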

Page 14

5. We all want to automate

• Having humans in the way of detecting and solving DevOps issues doesn’t scale.

• At some point, we need systems that can detect anomalies before problems become critical, and take appropriate action.

Page 15

Open Loop Control System: Heating your house – the wrong way!

• Steps:
  – Tweak heater input
  – Get to ideal temperature
  – Lock gas valve
  – Hope nothing changes

[Diagram: Controller (gas valve), System (heater), Sensor (thermometer); no feedback path]

Page 16

Closed Loop Control System: Heating your house – the right way

• Steps:
  – Set the desired temperature
  – Sit back and let the system deal with changes

[Diagram: Controller (gas valve) → System (heater) → Sensor (thermometer); the current temperature (-) is fed back and compared with the desired temperature (+), and the delta drives the controller]
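The difference between the two heating slides can be simulated with a toy model. This is a hedged sketch with made-up physics: the room model, `heater_gain`, `loss_rate`, and the proportional gain `kp` are all assumptions chosen for illustration. The open-loop valve is locked at the setting that held 20 degrees on a mild day; the closed-loop controller recomputes the valve from the delta between desired and current temperature on every step.

```python
DESIRED = 20.0  # desired temperature, degrees C


def simulate(controller, outside_temps, start_temp=20.0,
             heater_gain=3.0, loss_rate=0.1):
    """Toy room model (assumed, not physical): each step the heater adds
    heater_gain * valve degrees and the room loses loss_rate * (inside - outside).
    Returns the final inside temperature."""
    temp = start_temp
    for outside in outside_temps:
        valve = min(1.0, max(0.0, controller(temp)))  # valve opening clamped to [0, 1]
        temp += heater_gain * valve - loss_rate * (temp - outside)
    return temp


def open_loop(_current_temp):
    """The wrong way: valve locked at the setting that held 20 C when it was 10 C outside."""
    return 1.0 / 3.0


def closed_loop(current_temp, kp=0.3):
    """The right way: a proportional controller driven by the delta (desired - current)."""
    return kp * (DESIRED - current_temp)


# 50 mild steps at 10 C outside, then a cold snap at 0 C.
weather = [10.0] * 50 + [0.0] * 50
final_open = simulate(open_loop, weather)
final_closed = simulate(closed_loop, weather)
print(round(final_open, 1))    # 10.1: the locked valve cannot react
print(round(final_closed, 1))  # 18.0: feedback holds the temperature near 20
```

A proportional-only controller like this one settles slightly below the setpoint; real controllers typically add an integral term (PI or PID control) to remove that offset. Even so, the simple feedback loop rides out the cold snap far better than the locked valve.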

Page 17

What’s missing to get to self-healing systems

[Diagram: Controller (Puppet, Chef, CFEngine…) → System (My Infrastructure) → Sensor (Nagios, Cacti, Zabbix…); the current state (-) is fed back and compared with the desired state (+) to produce a delta, with a "?" marking the missing piece]

• We have most of the tools already
• Need to add:
  – Error tracking (anomaly detection)
  – Corrective action
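To make the two missing pieces concrete, here is a minimal sketch of one pass of such a loop, assuming desired and current state can both be reduced to per-service instance counts. The state dictionaries, service names, and command strings are hypothetical. In a real system the desired state would come from a tool like Puppet, Chef, or CFEngine and the current state from monitoring such as Nagios, Cacti, or Zabbix, and error tracking would need genuine anomaly detection rather than an exact comparison.

```python
from typing import Dict, List


def detect_errors(desired: Dict[str, int], current: Dict[str, int]) -> Dict[str, int]:
    """Error tracking: per-service delta between desired and observed instance counts."""
    services = set(desired) | set(current)
    deltas = {s: desired.get(s, 0) - current.get(s, 0) for s in services}
    return {s: d for s, d in deltas.items() if d != 0}


def corrective_actions(errors: Dict[str, int]) -> List[str]:
    """Corrective action: turn each delta into a (hypothetical) start/stop command."""
    actions = []
    for service, delta in sorted(errors.items()):
        verb = "start" if delta > 0 else "stop"
        actions.append(f"{verb} {abs(delta)} {service}")
    return actions


desired_state = {"web": 4, "worker": 2}             # what config management asked for
current_state = {"web": 3, "worker": 2, "cron": 1}  # what monitoring reports

print(corrective_actions(detect_errors(desired_state, current_state)))
# ['stop 1 cron', 'start 1 web']
```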

Page 18

How much data do we need?

• Trend towards higher and higher sampling rates in data collection
• Reminds me of Jorge Luis Borges’ story about Funes the Memorious
  – Perfect recollection of the slightest details of every instant of his life, but lost the ability for abstraction
• Our brain works on abstraction
  – We notice patterns BECAUSE we can abstract

Page 19

The danger of over-abstraction

[Illustration: two images joined by "+", captioned "= comfortable?"]
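One common way over-abstraction bites in monitoring is aggregation: an average can make a system look comfortable while hiding its extremes. A hedged sketch with invented data: an hour of per-minute CPU readings is collapsed into a single value, and the mean and max tell very different stories. The helper names are illustrative; keeping a max (or a high percentile) alongside the mean for each bucket is one way to hedge against this.

```python
def downsample(samples, bucket, agg):
    """Collapse each run of `bucket` consecutive samples into one value using agg."""
    return [agg(samples[i:i + bucket]) for i in range(0, len(samples), bucket)]


def mean(xs):
    return sum(xs) / len(xs)


# Invented data: one hour of per-minute CPU readings (percent) with a 3-minute saturation.
per_minute = [20.0] * 60
per_minute[30:33] = [100.0, 100.0, 100.0]

hourly_mean = downsample(per_minute, 60, mean)
hourly_max = downsample(per_minute, 60, max)
print(hourly_mean)  # [24.0]: looks comfortable
print(hourly_max)   # [100.0]: the saturation is still visible
```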

Page 20

So, how much data DO you need?

– You don’t need a sampling rate much beyond twice the highest frequency you care about (Nyquist-Shannon sampling theorem)

– Most of the algorithms for analytics will smooth, average, filter, and pre-process the data.

– Watch out for correlated metrics (e.g. used vs. available memory)
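The Nyquist point can be turned into quick arithmetic. A hedged sketch: the function names are hypothetical and the 10-minute period is an invented example. The second function shows what goes wrong below the Nyquist rate (aliasing), where a fast cycle masquerades as a slow one.

```python
def nyquist_interval_bound(shortest_period_s: float) -> float:
    """Upper bound (seconds, exclusive) on the sampling interval for a pattern
    whose period is shortest_period_s.

    Nyquist-Shannon: the sampling rate must be more than twice the highest
    frequency, i.e. the interval must be less than half the shortest period.
    """
    if shortest_period_s <= 0:
        raise ValueError("period must be positive")
    return shortest_period_s / 2.0


def alias_frequency(signal_hz: float, sample_hz: float) -> float:
    """Frequency a sampled sinusoid appears to have (folded into [0, sample_hz / 2])."""
    return abs(signal_hz - sample_hz * round(signal_hz / sample_hz))


# Invented example: the fastest behaviour you care about repeats every 10 minutes.
bound = nyquist_interval_bound(600.0)
print(bound)  # 300.0: sample more often than every 5 minutes

# Sampling a 60 s cycle only every 50 s makes it look like a slow 300 s cycle.
apparent_period = 1.0 / alias_frequency(1 / 60, 1 / 50)
print(round(apparent_period))  # 300
```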

Page 21

More?

• I want to talk more about analytics, in more depth, but time’s up!!
  – (Actually John won’t let me)
• Come talk to me during the breaks!
• Thank you!