the longest 5 minutes in our life
DESCRIPTION
Monitoring as retrospection/inspection/prospection
TRANSCRIPT
The longest 5 minutes in our life.
@tagomoris
2013/11/30 Monitoring Casual Talks in Kyoto
Saturday, November 30, 2013
Note: the title is deliberately melodramatic (chuuni).
TAGOMORI Satoshi (@tagomoris)
LINE Corp.
Hadoop, Fluentd, Norikra, ...
We won ISUCON
Ishikari data center tour evangelist
What are the 5 min. for?
ISUCON
Our new service launches
Our services in trouble
What can we do in 5 min.?
Investigate logs! Logs! Logs!
Hot request paths
Heavy request paths
How many requests? How many users?
and, and, and ...
Logs
Retrospection: logs from the past N min.
Inspection: logs being tailed right now
Prospection: logs arriving in the next N min.
Retrospection in ISUCON
We MUST NOT be slaves to information.
Too much is bad.
We MUST know at least the key factors.
Too little is bad, too.
analyze_apache_logs
Bundled with Apache::Log::Parser (on CPAN)
Reads logs from STDIN and analyzes them
For each method/path:
HTTP response status code
Response duration (avg/min/max)
Query Strings / Referers (option)
$ cat /var/log/httpd/access_log | analyze_apache_logs -s path
TOTAL: 1801
*
    duration avg:97.33, min:76, max:110
    status 200:3
/
    duration avg:73517.00, min:6617, max:134667
    status 200:6
/entry
    duration avg:168814.06, min:41780, max:378686
    status 200:33
/entry/15035
    duration avg:34386.00, min:34386, max:34386
    status 200:1
/follow
    duration avg:171574.81, min:4032, max:610354
    status 200:145
/icon
    duration avg:262889.95, min:117225, max:784451
    status 200:21
/icon/03df2637e15ff22eeb825d3aa664c2ecbf399cbc0257c94db002497d508a476c
    duration avg:292981.50, min:239181, max:346782
    status 200:2
/icon/06e3640fd416acffbbc63177bf5a65b9981de8dc3aae19ca9224fcf45c6fa1f6
    duration avg:270258.61, min:73933, max:492001
    status 200:18
/icon/09228075c09882cbf065a30848e79bdc3e43f7b43273be98304a5f7712aa37d8
    duration avg:198728.00, min:116202, max:271046
    status 200:3
/icon/0ab3a5827c926a148ef28d572e44a878a99ceecc11296025319f21826b77f352
    duration avg:250647.07, min:63798, max:503243
    status 200:14
/icon/0d5f799ba92380f94f6108521aacb50280da2a731a9d5fb19d6da1f224837a4a
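The per-path summary above (durations as avg/min/max, plus a count per status code) can be sketched in a few lines of Python. This is not how analyze_apache_logs is implemented; it is only an illustration that assumes each log line has already been parsed into a (path, status, duration) tuple. The function name summarize and the sample records are hypothetical.

```python
from collections import defaultdict

def summarize(records):
    """Aggregate (path, status, duration) tuples into per-path statistics."""
    durations = defaultdict(list)
    statuses = defaultdict(lambda: defaultdict(int))
    for path, status, duration in records:
        durations[path].append(duration)
        statuses[path][status] += 1
    summary = {}
    for path, values in durations.items():
        summary[path] = {
            "avg": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
            "status": dict(statuses[path]),
        }
    return summary

# Hypothetical, already-parsed records (not taken from the slide's log).
records = [
    ("/entry", 200, 100),
    ("/entry", 200, 300),
    ("/icon", 200, 50),
    ("/icon", 500, 150),
]
summary = summarize(records)
for path in sorted(summary):
    s = summary[path]
    print(path)
    print(f"    duration avg:{s['avg']:.2f}, min:{s['min']}, max:{s['max']}")
    for code in sorted(s["status"]):
        print(f"    status {code}:{s['status'][code]}")
```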
Retrospection in action
Shib: Hive WebUI -> MapReduce
e.g. N min. of logs from 10 min. ago
Import lag / MapReduce lag
Kibana: Elasticsearch WebUI
Scalability?
Fluentd + GrowthForecast
without on-demand queries
Retrospection: Fluentd + GrowthForecast
HTTP Response Times (Avg, [50,90,95,98,99]%tiles)
HTTP Response Status
Inspection
ImHacker by @cho45
http://subtech.g.hatena.ne.jp/cho45/20120810/1344606438
Prospection
Queries for future/incoming logs
for both access logs and application logs
results for 5 min. of logs, just 5 min. later
Norikra: Schema-less Stream Processing with SQL
Norikra (1):
Schema-less event stream:
Add/remove data fields whenever you want
SQL:
No more restarts to add/remove queries
w/ JOINs, w/ SubQueries
w/ UDF
Truly complex events:
Nested Hash/Array, accessible directly from SQL
Norikra (2):
Open source software:
Licensed under GPLv2
Based on Esper
UDF plugins from rubygems.org
Ultra-fast bootstrap & small start:
3 mins to install/start
1 server
Norikra Queries: (1)
SELECT name, age
FROM events
Event: {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Kyoto"}
Result: {"name":"tagomoris","age":34}
Norikra Queries: (1)
SELECT name, age
FROM events
Event: {"name":"tagomoris", "address":"Tokyo", "corp":"LINE", "current":"Kyoto"}
Result: nothing (the event has no "age" field)
Norikra Queries: (2)
SELECT name, age
FROM events
WHERE current="Kyoto"
Event: {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Kyoto"}
Result: {"name":"tagomoris","age":34}
Norikra Queries: (2)
SELECT name, age
FROM events
WHERE current="Kyoto"
Event: {"name":"secondlife", "age":99, "address":"Tokyo", "corp":"Cookpad", "current":"Nara"}
Result: nothing ("current" is "Nara", not "Kyoto")
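The behavior shown in queries (1) and (2) can be mimicked with a small Python function: an event yields a row only if it carries every field the query refers to and passes the WHERE condition. This is a toy model of what the slides show, not Norikra's actual engine; the function run_query and its (field, value) form of the WHERE clause are hypothetical.

```python
def run_query(event, select_fields, where=None):
    """Return the selected fields of one event, or None when the event is skipped.

    where: optional (field, value) equality test, like WHERE current="Kyoto".
    """
    required = list(select_fields)
    if where is not None:
        required.append(where[0])
    # Events lacking any field the query refers to produce nothing.
    if any(field not in event for field in required):
        return None
    # Events that fail the WHERE condition produce nothing either.
    if where is not None and event[where[0]] != where[1]:
        return None
    return {field: event[field] for field in select_fields}

full = {"name": "tagomoris", "age": 34, "address": "Tokyo", "corp": "LINE", "current": "Kyoto"}
no_age = {"name": "tagomoris", "address": "Tokyo", "corp": "LINE", "current": "Kyoto"}
nara = {"name": "secondlife", "age": 99, "address": "Tokyo", "corp": "Cookpad", "current": "Nara"}

print(run_query(full, ["name", "age"]))
print(run_query(no_age, ["name", "age"]))
print(run_query(full, ["name", "age"], where=("current", "Kyoto")))
print(run_query(nara, ["name", "age"], where=("current", "Kyoto")))
```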
Norikra Queries: (3)
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
Event: {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Kyoto"}
Result: {"age":34,"cnt":3}, {"age":33,"cnt":1}, ... every 5 mins
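The idea behind time_batch(5 mins) with GROUP BY age, collecting events for 5 minutes and then emitting one count per age, can be sketched offline in Python. This is only an approximation: the batches here are aligned to multiples of the window size, which is a simplification of this sketch and may differ from how Norikra/Esper picks batch boundaries. The function count_by_age, the timestamps, and the sample events are hypothetical.

```python
from collections import Counter, defaultdict

WINDOW_SEC = 5 * 60  # 5 minutes

def count_by_age(timed_events, window_sec=WINDOW_SEC):
    """Group (timestamp_sec, event) pairs into fixed windows and count events per age."""
    batches = defaultdict(Counter)
    for ts, event in timed_events:
        window_start = (ts // window_sec) * window_sec
        batches[window_start][event["age"]] += 1
    # One list of {"age", "cnt"} rows per window, keyed by the window start time.
    return {
        start: [{"age": age, "cnt": cnt} for age, cnt in sorted(counts.items(), reverse=True)]
        for start, counts in sorted(batches.items())
    }

# Hypothetical events: four in the first 5 minutes, one in the next window.
timed_events = [
    (0, {"name": "a", "age": 34}),
    (10, {"name": "b", "age": 34}),
    (200, {"name": "c", "age": 33}),
    (299, {"name": "d", "age": 34}),
    (300, {"name": "e", "age": 51}),
]
result = count_by_age(timed_events)
for start, rows in result.items():
    print(start, rows)
```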
Norikra Queries: (4)
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
Event: {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Kyoto"}
Result: {"age":34,"cnt":3}, {"age":33,"cnt":1}, ... every 5 mins
SELECT max(age) as max
FROM events.win:time_batch(5 mins)
Result: {"max":51}
Norikra Queries: (5)
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
Event: {"name":"tagomoris", "user":{"age":34, "corp":"LINE", "address":"Tokyo"}, "current":"Kyoto", "speaker":true, "attend":[true,true,false, ...]}
Norikra Queries: (5)
SELECT user.age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY user.age
Event: {"name":"tagomoris", "user":{"age":34, "corp":"LINE", "address":"Tokyo"}, "current":"Kyoto", "speaker":true, "attend":[true,true,false, ...]}
Norikra Queries: (5)
SELECT user.age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
WHERE current="Kyoto" AND attend.$0 AND attend.$1
GROUP BY user.age
Event: {"name":"tagomoris", "user":{"age":34, "corp":"LINE", "address":"Tokyo"}, "current":"Kyoto", "speaker":true, "attend":[true,true,false, ...]}
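The nested access used in query (5), user.age for a hash member and attend.$0 for an array element, can be illustrated with a small path resolver in Python. This is a hypothetical helper written for this note, not Norikra code; the sample event shortens the "attend" array, which is truncated on the slide, to three elements.

```python
def resolve(event, path):
    """Resolve a dotted path such as 'user.age' or 'attend.$0' against a nested event."""
    value = event
    for part in path.split("."):
        if part.startswith("$"):
            value = value[int(part[1:])]  # "$N" selects the N-th array element
        else:
            value = value[part]           # a plain name selects a hash member
    return value

# Sample event; "attend" is shortened to three elements for illustration.
event = {
    "name": "tagomoris",
    "user": {"age": 34, "corp": "LINE", "address": "Tokyo"},
    "current": "Kyoto",
    "speaker": True,
    "attend": [True, True, False],
}

# WHERE current="Kyoto" AND attend.$0 AND attend.$1
matches = (
    resolve(event, "current") == "Kyoto"
    and resolve(event, "attend.$0")
    and resolve(event, "attend.$1")
)
print(resolve(event, "user.age"), matches)
```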
Before: Hive EVERY HOUR!
SELECT
  yyyymmdd, hh, campaign_id, region, lang,
  count(*) AS click,
  count(distinct member_id) AS uu
FROM (
  SELECT
    yyyymmdd, hh,
    get_json_object(log, '$.campaign.id') AS campaign_id,
    get_json_object(log, '$.member.region') AS region,
    get_json_object(log, '$.member.lang') AS lang,
    get_json_object(log, '$.member.id') AS member_id
  FROM applog
  WHERE service='myservice'
    AND yyyymmdd='20131101' AND hh='00'
    AND get_json_object(log, '$.type') = 'click'
) x
GROUP BY yyyymmdd, hh, campaign_id, region, lang
After: Norikra
SELECT
  campaign.id AS campaign_id, member.region AS region,
  count(*) AS click,
  count(distinct member.id) AS uu
FROM myservice.win:time_batch(1 hours)
WHERE type="click"
GROUP BY campaign.id, member.region
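What the Norikra query computes for one batch, clicks and unique users (uu) per campaign and region, can be written out in Python as a rough illustration. The sketch assumes the events of a single 1-hour window are already in a list; the function clicks_and_uu and the sample events are hypothetical.

```python
from collections import defaultdict

def clicks_and_uu(events):
    """Count clicks and distinct member ids per (campaign.id, member.region)."""
    clicks = defaultdict(int)
    members = defaultdict(set)
    for e in events:
        if e.get("type") != "click":  # WHERE type="click"
            continue
        key = (e["campaign"]["id"], e["member"]["region"])
        clicks[key] += 1               # count(*)
        members[key].add(e["member"]["id"])  # count(distinct member.id)
    return {key: {"click": clicks[key], "uu": len(members[key])} for key in clicks}

# Hypothetical events from a single window.
events = [
    {"type": "click", "campaign": {"id": 1}, "member": {"id": "u1", "region": "JP"}},
    {"type": "click", "campaign": {"id": 1}, "member": {"id": "u1", "region": "JP"}},
    {"type": "click", "campaign": {"id": 1}, "member": {"id": "u2", "region": "JP"}},
    {"type": "view", "campaign": {"id": 1}, "member": {"id": "u3", "region": "JP"}},
    {"type": "click", "campaign": {"id": 2}, "member": {"id": "u1", "region": "TW"}},
]
report = clicks_and_uu(events)
print(report)
```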
Before: Fluentd
<match for.target.service>
  type numeric_monitor
  unit minute
  tag service.response
  output_key_prefix request_api
  aggregate all
  monitor_key api_response_time
  percentiles 50,90,95,98,99
</match>
FOR EACH SERVICE
... AND A RESTART OF FLUENTD!!!!!!!!!!!!!!
After: Norikra
SELECT percentiles(api_response_time, [50,90,95,98,99]) AS p
FROM target_service.win:time_batch(1 min)
FOR EACH SERVICE!
WITHOUT ANY RESTARTS!
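To make the percentile report concrete, here is a Python sketch that computes the 50/90/95/98/99th percentiles of the response times collected in one 1-minute batch. It uses the nearest-rank method, which is an assumption of this sketch; the Norikra UDF and the Fluentd plugin may compute percentiles differently. The function names and sample values are hypothetical.

```python
def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending, non-empty list (p is an integer 1..100)."""
    n = len(sorted_values)
    # Integer ceiling of p * n / 100, avoiding floating-point rounding surprises.
    rank = max(1, (p * n + 99) // 100)
    return sorted_values[rank - 1]

def percentiles(values, ps=(50, 90, 95, 98, 99)):
    """Return {p: value} for each requested percentile of one batch of values."""
    ordered = sorted(values)
    return {p: percentile(ordered, p) for p in ps}

# Hypothetical response times collected during one minute: 1..100.
batch = list(range(100, 0, -1))
p = percentiles(batch)
print(p)
```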
Conclusion
Retrospection is important
We have many methods for retrospection now
Prospection is also important
For complex logs
For immediate reports
For less system management