The longest 5 minutes in our life

@tagomoris, 2013/11/30, Monitoring Casual Talks in Kyoto

Upload: satoshi-tagomori

Post on 15-Jan-2015

DESCRIPTION

Monitoring as retrospection/inspection/prospection

TRANSCRIPT

Page 1: The longest 5 minutes in our life

The longest 5 minutes in our life.

@tagomoris
2013/11/30 Monitoring Casual Talks in Kyoto

Saturday, November 30, 2013

Page 2: The longest 5 minutes in our life

* The title is deliberately melodramatic (chūnibyō style).

Page 3: The longest 5 minutes in our life

TAGOMORI Satoshi (@tagomoris)
LINE Corp.

Hadoop, Fluentd, Norikra, ...

Page 4: The longest 5 minutes in our life

Page 5: The longest 5 minutes in our life

We won ISUCON

Page 6: The longest 5 minutes in our life

Ishikari data center tour evangelist

Page 7: The longest 5 minutes in our life

What are the 5 minutes for?

ISUCON

Our new service launches

Our services in trouble

Page 8: The longest 5 minutes in our life

What can we do in 5 minutes?

Investigate logs! Logs! Logs!

Hot request paths

Heavy request paths

How many requests? How many users?

and, and, and ...

Page 9: The longest 5 minutes in our life

Logs

Retrospection: logs from the past N min.

Inspection: logs being tailed right now

Prospection: the next N min. of incoming logs

Page 10: The longest 5 minutes in our life

Retrospection in ISUCON

We MUST NOT be slaves to information.

Too much is bad.

We MUST at least know the key factors.

Too little is also bad.

Page 11: The longest 5 minutes in our life

analyze_apache_logs

Bundled with Apache::Log::Parser (on CPAN)

Reads logs from STDIN and analyzes them

For each method/path:

HTTP response status code

Response duration (avg/min/max)

Query Strings / Referers (option)
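
The per-path aggregation listed above can be sketched in Python. This is not how analyze_apache_logs is implemented; it is a minimal sketch that assumes the log lines have already been parsed into hypothetical (method, path, status, duration) tuples, and the names PathStats and summarize are made up for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class PathStats:
    """Running statistics for one request path (hypothetical helper)."""
    durations: list = field(default_factory=list)
    statuses: Counter = field(default_factory=Counter)

    def report(self):
        d = self.durations
        return {
            "avg": sum(d) / len(d),
            "min": min(d),
            "max": max(d),
            "status": dict(self.statuses),
        }


def summarize(records):
    """Group (method, path, status, duration) records by path."""
    stats = defaultdict(PathStats)
    for _method, path, status, duration in records:
        stats[path].durations.append(duration)
        stats[path].statuses[status] += 1
    return {path: s.report() for path, s in stats.items()}


# Assumed sample data; the duration unit is whatever the log records.
records = [
    ("GET", "/", 200, 6617),
    ("GET", "/", 200, 134667),
    ("POST", "/entry", 200, 41780),
    ("POST", "/entry", 500, 378686),
]
summary = summarize(records)
```

Keeping only a handful of numbers per path fits the point of the previous slide: enough to find the factors, not so much that it buries them.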

Page 12: The longest 5 minutes in our life

$ cat /var/log/httpd/access_log | analyze_apache_logs -s path
TOTAL: 1801
*
    duration avg:97.33, min:76, max:110
    status 200:3
/
    duration avg:73517.00, min:6617, max:134667
    status 200:6
/entry
    duration avg:168814.06, min:41780, max:378686
    status 200:33
/entry/15035
    duration avg:34386.00, min:34386, max:34386
    status 200:1
/follow
    duration avg:171574.81, min:4032, max:610354
    status 200:145
/icon
    duration avg:262889.95, min:117225, max:784451
    status 200:21
/icon/03df2637e15ff22eeb825d3aa664c2ecbf399cbc0257c94db002497d508a476c
    duration avg:292981.50, min:239181, max:346782
    status 200:2
/icon/06e3640fd416acffbbc63177bf5a65b9981de8dc3aae19ca9224fcf45c6fa1f6
    duration avg:270258.61, min:73933, max:492001
    status 200:18
/icon/09228075c09882cbf065a30848e79bdc3e43f7b43273be98304a5f7712aa37d8
    duration avg:198728.00, min:116202, max:271046
    status 200:3
/icon/0ab3a5827c926a148ef28d572e44a878a99ceecc11296025319f21826b77f352
    duration avg:250647.07, min:63798, max:503243
    status 200:14
/icon/0d5f799ba92380f94f6108521aacb50280da2a731a9d5fb19d6da1f224837a4a

Page 13: The longest 5 minutes in our life

Retrospection in action

Shib: Hive WebUI -> MapReduce

e.g. N min. of logs from 10 min. ago

Import lag / MapReduce lag

Kibana: Elasticsearch WebUI

Scalability?

Fluentd + GrowthForecast

without on-demand queries

Page 14: The longest 5 minutes in our life

Retrospection: Fluentd + GrowthForecast

HTTP Response Times (avg and 50th/90th/95th/98th/99th percentiles)

HTTP Response Status

Page 15: The longest 5 minutes in our life

Inspection

ImHacker by @cho45
http://subtech.g.hatena.ne.jp/cho45/20120810/1344606438

Page 16: The longest 5 minutes in our life

Prospection

Queries for future/incoming logs

for both access logs and application logs

results for 5 min. of logs, just 5 min. later

Page 17: The longest 5 minutes in our life

Norikra: Schema-less Stream Processing with SQL

Page 18: The longest 5 minutes in our life

Norikra (1)

Schema-less event stream:
Add/remove data fields whenever you want

SQL:
No more restarts to add/remove queries
w/ JOINs, w/ SubQueries
w/ UDF

Truly complex events:
Nested Hash/Array, accessible directly from SQL

Page 19: The longest 5 minutes in our life

Norikra (2)

Open source software:
Licensed under GPLv2
Based on Esper
UDF plugins from rubygems.org

Ultra-fast bootstrap & small start:
3 mins to install/start
1 server

Page 20: The longest 5 minutes in our life

Norikra Queries: (1)

SELECT name, age
FROM events

Page 21: The longest 5 minutes in our life

Norikra Queries: (1)

SELECT name, age
FROM events

Input: {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Kyoto"}

Output: {"name":"tagomoris","age":34}

Page 22: The longest 5 minutes in our life

Norikra Queries: (1)

SELECT name, age
FROM events

Input: {"name":"tagomoris", "address":"Tokyo", "corp":"LINE", "current":"Kyoto"}

Output: nothing (the event has no "age" field)

Page 23: The longest 5 minutes in our life

Norikra Queries: (2)

SELECT name, age
FROM events
WHERE current="Kyoto"

Input: {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Kyoto"}

Output: {"name":"tagomoris","age":34}

Page 24: The longest 5 minutes in our life

Norikra Queries: (2)

SELECT name, age
FROM events
WHERE current="Kyoto"

Input: {"name":"secondlife", "age":99, "address":"Tokyo", "corp":"Cookpad", "current":"Nara"}

Output: nothing (current is "Nara", not "Kyoto")
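
The behavior shown in queries (1) and (2) can be imitated in a few lines of Python. This is only a sketch of what the slides show, not Norikra's implementation: run_query is a hypothetical helper, and the WHERE clause is reduced to simple equality checks passed as a dict.

```python
def run_query(event, fields, where=None):
    """Return the projected event, or None when the query outputs nothing.

    Sketch of the behavior on the slides: an event that lacks a field the
    query refers to is ignored by that query.
    """
    needed = set(fields) | set(where or {})
    if not all(name in event for name in needed):
        return None  # a field used by the query is missing
    if where and any(event[k] != v for k, v in where.items()):
        return None  # filtered out by the WHERE condition
    return {name: event[name] for name in fields}


full = {"name": "tagomoris", "age": 34, "address": "Tokyo",
        "corp": "LINE", "current": "Kyoto"}
no_age = {"name": "tagomoris", "address": "Tokyo",
          "corp": "LINE", "current": "Kyoto"}
in_nara = {"name": "secondlife", "age": 99, "address": "Tokyo",
           "corp": "Cookpad", "current": "Nara"}

projected = run_query(full, ["name", "age"])
skipped = run_query(no_age, ["name", "age"])
filtered = run_query(in_nara, ["name", "age"], where={"current": "Kyoto"})
```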

Page 25: The longest 5 minutes in our life

Norikra Queries: (3)

SELECT age, COUNT(*) AS cnt
FROM events.win:time_batch(5 mins)
GROUP BY age

Page 26: The longest 5 minutes in our life

Norikra Queries: (3)

SELECT age, COUNT(*) AS cnt
FROM events.win:time_batch(5 mins)
GROUP BY age

Input: {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Kyoto"}

Output, every 5 mins: {"age":34,"cnt":3}, {"age":33,"cnt":1}, ...
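
A time-batch window with GROUP BY can be approximated offline as below. This is a simplified sketch, not Esper or Norikra code: time_batch_counts is a hypothetical helper, timestamps are assumed to be integer seconds in ascending order, windows are aligned to multiples of the window length, and results are emitted when the next window starts rather than on a timer.

```python
from collections import Counter


def time_batch_counts(events, window_seconds, key):
    """Yield (window_start, {key_value: count}) per non-empty window.

    `events` is an iterable of (timestamp_seconds, event_dict) pairs,
    assumed to be sorted by timestamp.
    """
    current_window = None
    counts = Counter()
    for ts, event in events:
        window = ts - ts % window_seconds
        if current_window is not None and window != current_window:
            yield current_window, dict(counts)  # close the previous batch
            counts = Counter()
        current_window = window
        if key in event:
            counts[event[key]] += 1
    if current_window is not None:
        yield current_window, dict(counts)


# Assumed sample stream: four events in the first 5 minutes, one after.
stream = [
    (0, {"age": 34}),
    (10, {"age": 34}),
    (120, {"age": 33}),
    (299, {"age": 34}),
    (300, {"age": 51}),
]
batches = list(time_batch_counts(stream, 300, "age"))
```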

Page 27: The longest 5 minutes in our life

Norikra Queries: (4)

SELECT age, COUNT(*) AS cnt
FROM events.win:time_batch(5 mins)
GROUP BY age

Input: {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Kyoto"}

Output, every 5 mins: {"age":34,"cnt":3}, {"age":33,"cnt":1}, ...

SELECT max(age) AS max
FROM events.win:time_batch(5 mins)

Output: {"max":51}

Page 28: The longest 5 minutes in our life

Norikra Queries: (5)

SELECT age, COUNT(*) AS cnt
FROM events.win:time_batch(5 mins)
GROUP BY age

{"name":"tagomoris", "user":{"age":34, "corp":"LINE", "address":"Tokyo"}, "current":"Kyoto", "speaker":true, "attend":[true,true,false, ...]}

Page 29: The longest 5 minutes in our life

Norikra Queries: (5)

SELECT user.age, COUNT(*) AS cnt
FROM events.win:time_batch(5 mins)
GROUP BY user.age

{"name":"tagomoris", "user":{"age":34, "corp":"LINE", "address":"Tokyo"}, "current":"Kyoto", "speaker":true, "attend":[true,true,false, ...]}

Page 30: The longest 5 minutes in our life

Norikra Queries: (5)

SELECT user.age, COUNT(*) AS cnt
FROM events.win:time_batch(5 mins)
WHERE current="Kyoto" AND attend.$0 AND attend.$1
GROUP BY user.age

{"name":"tagomoris", "user":{"age":34, "corp":"LINE", "address":"Tokyo"}, "current":"Kyoto", "speaker":true, "attend":[true,true,false, ...]}
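
The dotted notation in queries (5) reads nested hashes by key and arrays by "$N" index. A small Python sketch of that lookup follows; get_path is a hypothetical helper that mirrors the notation on the slides and says nothing about how Norikra resolves fields internally. The sample event shortens the elided "attend" array to three elements.

```python
def get_path(event, path):
    """Resolve a dotted path such as "user.age" or "attend.$0".

    A "$N" segment indexes into a list; any missing segment yields None.
    """
    value = event
    for segment in path.split("."):
        if segment.startswith("$") and segment[1:].isdigit():
            index = int(segment[1:])
            if not isinstance(value, list) or index >= len(value):
                return None
            value = value[index]
        elif isinstance(value, dict) and segment in value:
            value = value[segment]
        else:
            return None
    return value


# Sample event from the slide, with "attend" cut to three items.
event = {
    "name": "tagomoris",
    "user": {"age": 34, "corp": "LINE", "address": "Tokyo"},
    "current": "Kyoto",
    "speaker": True,
    "attend": [True, True, False],
}

matches_where = (
    get_path(event, "current") == "Kyoto"
    and get_path(event, "attend.$0")
    and get_path(event, "attend.$1")
)
```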

Page 31: The longest 5 minutes in our life

Before: Hive, EVERY HOUR!

SELECT
  yyyymmdd, hh, campaign_id, region, lang,
  count(*) AS click,
  count(distinct member_id) AS uu
FROM (
  SELECT
    yyyymmdd, hh,
    get_json_object(log, '$.campaign.id') AS campaign_id,
    get_json_object(log, '$.member.region') AS region,
    get_json_object(log, '$.member.lang') AS lang,
    get_json_object(log, '$.member.id') AS member_id
  FROM applog
  WHERE service='myservice'
    AND yyyymmdd='20131101' AND hh='00'
    AND get_json_object(log, '$.type') = 'click'
) x
GROUP BY yyyymmdd, hh, campaign_id, region, lang

Page 32: The longest 5 minutes in our life

After: Norikra

SELECT
  campaign.id AS campaign_id,
  member.region AS region,
  count(*) AS click,
  count(distinct member.id) AS uu
FROM myservice.win:time_batch(1 hours)
WHERE type="click"
GROUP BY campaign.id, member.region
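
What this query computes for one window (clicks and unique users per campaign and region) can be written out in plain Python. This is an illustrative sketch only: clicks_and_uu is a hypothetical helper, the sample events are invented, and the windowing itself is left out.

```python
from collections import defaultdict


def clicks_and_uu(events):
    """Aggregate the click events of one window by (campaign.id, member.region)."""
    clicks = defaultdict(int)
    members = defaultdict(set)
    for e in events:
        if e.get("type") != "click":
            continue  # same role as WHERE type="click"
        key = (e["campaign"]["id"], e["member"]["region"])
        clicks[key] += 1
        members[key].add(e["member"]["id"])  # set size gives count(distinct)
    return {k: {"click": clicks[k], "uu": len(members[k])} for k in clicks}


# Invented events standing in for one hour of the "myservice" stream.
window = [
    {"type": "click", "campaign": {"id": 1}, "member": {"id": "a", "region": "JP"}},
    {"type": "click", "campaign": {"id": 1}, "member": {"id": "a", "region": "JP"}},
    {"type": "click", "campaign": {"id": 1}, "member": {"id": "b", "region": "JP"}},
    {"type": "view", "campaign": {"id": 1}, "member": {"id": "c", "region": "JP"}},
    {"type": "click", "campaign": {"id": 2}, "member": {"id": "d", "region": "US"}},
]
report = clicks_and_uu(window)
```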

Page 33: The longest 5 minutes in our life

Before: Fluentd

<match for.target.service>
  type numeric_monitor
  unit minute
  tag service.response
  output_key_prefix request_api
  aggregate all
  monitor_key api_response_time
  percentiles 50,90,95,98,99
</match>

FOR EACH SERVICE

... AND A RESTART OF FLUENTD!!!!!!!!!!!!!!

Page 34: The longest 5 minutes in our life

After: Norikra

SELECT percentiles(api_response_time, [50,90,95,98,99]) AS p
FROM target_service.win:time_batch(1 min)

FOR EACH SERVICE!

WITHOUT ANY RESTARTS!
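
For reference, percentiles over one window of response times can be computed as below. This sketch uses the nearest-rank definition with integer percent points, which is an assumption; the percentiles function on the slide and the Fluentd plugin may define percentiles differently.

```python
def percentiles(values, points):
    """Nearest-rank percentiles of `values` for each integer percent in `points`."""
    ordered = sorted(values)
    if not ordered:
        return {}
    n = len(ordered)
    result = {}
    for p in points:
        rank = max(1, (p * n + 99) // 100)  # ceil(p * n / 100) in integers
        result[p] = ordered[rank - 1]
    return result


# Assumed sample: 100 response times with values 1..100.
response_times = list(range(1, 101))
p = percentiles(response_times, [50, 90, 95, 98, 99])
```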

Page 35: The longest 5 minutes in our life

Conclusion

Retrospection is important

We have many methods for retrospection now

Prospection is also important

For complex logs

For immediate reports

For less system management