Norikra: SQL Stream Processing In Ruby 2014/11/19 RubyConf 2014 DAY 3 Satoshi Tagomori (@tagomoris)

DESCRIPTION

Presentation at RubyConf 2014

TRANSCRIPT

Page 1: Norikra: SQL Stream Processing In Ruby

Norikra: SQL Stream Processing In Ruby

2014/11/19 RubyConf 2014 DAY 3

Satoshi Tagomori (@tagomoris)

Page 2: Norikra: SQL Stream Processing In Ruby

Topics

Why I wrote Norikra

Norikra overview

Norikra queries

Use cases in production

JRuby for me

Page 3: Norikra: SQL Stream Processing In Ruby

Satoshi Tagomori (@tagomoris)

Tokyo, Japan

LINE Corporation

Page 7: Norikra: SQL Stream Processing In Ruby

Monitoring/Data Analytics Overview

[Diagram: pipeline with stages labeled collect, parse/clean up, process, store, process, visualize; input: access logs, application logs, ...]

Page 13: Norikra: SQL Stream Processing In Ruby

[Diagram: pipeline stages collect, parse/clean up, process, store, process, visualize]

Fluentd stream aggregation: good for simple data/calculation

Page 14: Norikra: SQL Stream Processing In Ruby

Our services:

More and more different services

Many changes in a day (including logging)

Many kinds of logs for each service

Many different metrics for each service

Page 15: Norikra: SQL Stream Processing In Ruby

[Diagram: pipeline stages collect, parse/clean up, process, store, process, visualize]

Fluentd stream aggregation: not good for processing in a complex/fragile environment...

Page 16: Norikra: SQL Stream Processing In Ruby

We want to:

add/remove queries anytime we want

write many queries for a service log stream

ignore events without data we want

let our service directors / growth hackers write their own queries!

Page 17: Norikra: SQL Stream Processing In Ruby

[Diagram: pipeline stages collect, parse/clean up, process, store, process, visualize]

Page 18: Norikra: SQL Stream Processing In Ruby

break.

Page 20: Norikra: SQL Stream Processing In Ruby

Norikra: Schema-less Stream Processing with SQL

Server software, written in JRuby, runs on JVM

Open source software (GPLv2)

http://norikra.github.io/

https://github.com/norikra/norikra

Page 21: Norikra: SQL Stream Processing In Ruby

How To Set Up Norikra:

Install JRuby

download jruby.tar.gz, extract it and export $PATH

use rbenv

rbenv install jruby-1.7.xx

rbenv shell jruby-..

Install Norikra

gem install norikra

Execute Norikra server

norikra start

Page 22: Norikra: SQL Stream Processing In Ruby

Norikra Interface:

CLI client/Client library: norikra-client

norikra-client target open ...

norikra-client query add ...

tail -f ... | norikra-client event send ...

WebUI

show status

show/add/remove queries

HTTP API

JSON, MessagePack

Page 23: Norikra: SQL Stream Processing In Ruby

Norikra:

Schema-less event stream: Add/Remove data fields whenever you want

SQL: No more restarts to add/remove queries

w/ JOINs, w/ SubQueries

w/ UDF (in Java/Ruby as rubygems)

Truly Complex events: Nested Hash/Array, accessible directly from SQL

Page 24: Norikra: SQL Stream Processing In Ruby

Norikra Queries: (1)

SELECT name, age
FROM events

("events" is the target)

Page 25: Norikra: SQL Stream Processing In Ruby

Norikra Queries: (1)

SELECT name, age
FROM events

{"name":"tagomoris", "age":35, "address":"Tokyo", "corp":"LINE", "current":"San Diego"}

{"name":"tagomoris","age":35}

Page 26: Norikra: SQL Stream Processing In Ruby

Norikra Queries: (1)

SELECT name, age
FROM events

nothing

without "age"

{"name":"tagomoris", "address":"Tokyo", "corp":"LINE", "current":"San Diego"}

Page 27: Norikra: SQL Stream Processing In Ruby

Norikra Queries: (2)

SELECT name, age
FROM events
WHERE current="San Diego"

{"name":"tagomoris","age":35}

{"name":"tagomoris", "address":"Tokyo", "corp":"LINE", "current":"San Diego"}

Page 28: Norikra: SQL Stream Processing In Ruby

Norikra Queries: (2)

SELECT name, age
FROM events
WHERE current="San Diego"

nothing

{"name":"nobu", "age":0, "address":"Somewhere", "corp":"Heroku", "current":"SAN"}

current is not "San Diego"
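The behaviour of queries (1) and (2) can be sketched in plain Ruby. This is only an illustration of the semantics shown on the slides, not how Norikra implements them; `run_query` and its `fields:` / `where:` parameters are hypothetical names. The key point is that an event missing any field the query refers to produces no output at all.

```ruby
# Plain-Ruby illustration of the slide behaviour; not Norikra's implementation.
# `run_query`, `fields:` and `where:` are hypothetical names.
def run_query(events, fields:, where: {})
  referenced = (fields + where.keys).uniq
  # Events lacking any referenced field are ignored, not padded with nil.
  complete = events.select { |event| referenced.all? { |f| event.key?(f) } }
  # WHERE: every condition must match exactly.
  matched = complete.select { |event| where.all? { |f, expected| event[f] == expected } }
  # SELECT: keep only the requested fields.
  matched.map { |event| fields.map { |f| [f, event[f]] }.to_h }
end

events = [
  { "name" => "tagomoris", "age" => 35, "address" => "Tokyo", "corp" => "LINE", "current" => "San Diego" },
  { "name" => "tagomoris", "address" => "Tokyo", "corp" => "LINE", "current" => "San Diego" }, # no "age"
  { "name" => "nobu", "age" => 0, "address" => "Somewhere", "corp" => "Heroku", "current" => "SAN" }
]

query1 = run_query(events, fields: %w[name age])
query2 = run_query(events, fields: %w[name age], where: { "current" => "San Diego" })
p query1
p query2
```

In this sketch the second event is dropped by both queries because it has no "age", and the third is dropped by query (2) because its "current" is "SAN".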

Page 29: Norikra: SQL Stream Processing In Ruby

Norikra Queries: (3)

SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)

GROUP BY age

Page 30: Norikra: SQL Stream Processing In Ruby

Norikra Queries: (3)

SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age

{"age":35,"cnt":3}, {"age":33,"cnt":1}, ...

every 5 mins

{"name":"tagomoris", "address":"Tokyo", "corp":"LINE", "current":"San Diego"}
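A rough plain-Ruby sketch of what a 5-minute batch window with GROUP BY produces. It assumes fixed, clock-aligned 300-second buckets and integer epoch timestamps, which is a simplification chosen for this example; the real windowing is done by the query engine. `batch_counts` and the sample data are hypothetical.

```ruby
# Illustration only: bucket events into 5-minute batches and count per age.
# Assumes clock-aligned buckets; `batch_counts` is a hypothetical helper.
BATCH_SECONDS = 5 * 60

def batch_counts(timed_events, batch_seconds = BATCH_SECONDS)
  batches = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  timed_events.each do |timestamp, event|
    next unless event.key?("age")            # events without "age" are ignored
    batches[timestamp / batch_seconds][event["age"]] += 1
  end
  # One result set per batch, emitted in time order.
  batches.sort_by { |batch_index, _| batch_index }.map do |_, counts|
    counts.map { |age, cnt| { "age" => age, "cnt" => cnt } }
  end
end

timed_events = [
  [10,  { "age" => 35 }],
  [20,  { "age" => 35 }],
  [100, { "age" => 33 }],
  [120, { "name" => "tagomoris" }],           # no "age": skipped
  [250, { "age" => 35 }],
  [310, { "age" => 51 }]                      # falls into the next batch
]

output = batch_counts(timed_events)
output.each { |rows| p rows }
```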

Page 31: Norikra: SQL Stream Processing In Ruby

Norikra Queries: (4)

SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age

{"age":35,"cnt":3}, {"age":33,"cnt":1}, ...

SELECT max(age) as max
FROM events.win:time_batch(5 mins)

{"max":51}

every 5 mins

{"name":"tagomoris", "address":"Tokyo", "corp":"LINE", "current":"San Diego"}

Page 32: Norikra: SQL Stream Processing In Ruby

Norikra Queries: (5)

SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age

{"name":"tagomoris", "user":{"age":35, "corp":"LINE", "address":"Tokyo"}, "current":"San Diego", "speaker":true, "attend":[true,true,false, ...]}

Page 33: Norikra: SQL Stream Processing In Ruby

Norikra Queries: (5)

SELECT user.age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY user.age

{"name":"tagomoris", "user":{"age":35, "corp":"LINE", "address":"Tokyo"}, "current":"San Diego", "speaker":true, "attend":[true,true,false, ...]}

Page 34: Norikra: SQL Stream Processing In Ruby

Norikra Queries: (5)

SELECT user.age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
WHERE current="San Diego" AND attend.$0 AND attend.$1
GROUP BY user.age

{"name":"tagomoris", "user":{"age":35, "corp":"LINE", "address":"Tokyo"}, "current":"San Diego", "speaker":true, "attend":[true,true,false, ...]}
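The nested field syntax in query (5), `user.age` for a hash key and `attend.$0` for an array index, can be mimicked in plain Ruby as below. This is a sketch of the addressing idea only; `resolve` is a hypothetical helper, and the "attend" array is shortened to three elements because the slide elides the rest.

```ruby
# Illustration only: resolve paths such as "user.age" or "attend.$0"
# against a nested event. `resolve` is a hypothetical helper.
def resolve(event, path)
  path.split(".").reduce(event) do |value, key|
    case value
    when Hash
      return nil unless value.key?(key)
      value[key]
    when Array
      match = /\A\$(\d+)\z/.match(key)        # "$0", "$1", ... index into arrays
      return nil if match.nil? || match[1].to_i >= value.size
      value[match[1].to_i]
    else
      return nil                              # cannot descend into a scalar
    end
  end
end

event = {
  "name" => "tagomoris",
  "user" => { "age" => 35, "corp" => "LINE", "address" => "Tokyo" },
  "current" => "San Diego",
  "speaker" => true,
  "attend" => [true, true, false]             # shortened sample data
}

user_age = resolve(event, "user.age")
first_two_days = resolve(event, "attend.$0") && resolve(event, "attend.$1")
matches_where = event["current"] == "San Diego" && first_two_days
p user_age
p matches_where
```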

Page 35: Norikra: SQL Stream Processing In Ruby

break.

next: use cases

Page 36: Norikra: SQL Stream Processing In Ruby

Use case 1: External API call reports for partners (LINE)

External API call for LINE Business Connect

LINE backend sends requests to partner’s API endpoint using users’ messages

http://developers.linecorp.com/blog/?p=3386

Page 37: Norikra: SQL Stream Processing In Ruby

Use case 1: External API call reports for partners (LINE)

[Diagram: channel gateway, partner's server, logs, query results, MySQL, Mail]

SELECT
  channelId AS channel_id,
  reason,
  detail,
  count(*) AS error_count,
  min(timestamp) AS first_timestamp,
  max(timestamp) AS last_timestamp
FROM api_error_log.win:time_batch(60 sec)
GROUP BY channelId, reason, detail
HAVING count(*) > 0

http://developers.linecorp.com/blog/?p=3386
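What the error-report query computes for one 60-second batch can be sketched in plain Ruby. The log records and their values are invented sample data, and `summarize_errors` is a hypothetical helper; only the field names come from the query on the slide.

```ruby
# Illustration only: per (channelId, reason, detail) error summary for one batch.
# `summarize_errors` and the sample records are hypothetical.
def summarize_errors(logs)
  groups = logs.group_by { |log| log.values_at("channelId", "reason", "detail") }
  # group_by never yields empty groups, so HAVING count(*) > 0 holds implicitly.
  groups.map do |(channel_id, reason, detail), rows|
    timestamps = rows.map { |row| row["timestamp"] }
    {
      "channel_id" => channel_id,
      "reason" => reason,
      "detail" => detail,
      "error_count" => rows.size,
      "first_timestamp" => timestamps.min,
      "last_timestamp" => timestamps.max
    }
  end
end

logs = [
  { "channelId" => "c1", "reason" => "timeout",  "detail" => "read",         "timestamp" => 100 },
  { "channelId" => "c1", "reason" => "timeout",  "detail" => "read",         "timestamp" => 130 },
  { "channelId" => "c2", "reason" => "http_500", "detail" => "server error", "timestamp" => 110 }
]

summary = summarize_errors(logs)
summary.each { |row| p row }
```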

Page 38: Norikra: SQL Stream Processing In Ruby

Use case 1: External API call reports for partners (LINE)

API error response summaries

http://developers.linecorp.com/blog/?p=3386

Page 39: Norikra: SQL Stream Processing In Ruby

Use case 2: Lambda architecture

Prompt reports for Ad service console

Prompt reports with Norikra + Fixed reports with Hive

[Diagram: app servers (x6), Fluentd, HDFS, console service; labels: impression logs, fetch query results (frequently), execute hive query (daily)]

Page 40: Norikra: SQL Stream Processing In Ruby

SELECT
  yyyymmdd, hh, campaign_id, region, lang,
  COUNT(*) AS click,
  COUNT(DISTINCT member_id) AS uu
FROM (
  SELECT yyyymmdd, hh,
    get_json_object(log, '$.campaign.id') AS campaign_id,
    get_json_object(log, '$.member.region') AS region,
    get_json_object(log, '$.member.lang') AS lang,
    get_json_object(log, '$.member.id') AS member_id
  FROM applog
  WHERE service='myservice'
    AND yyyymmdd='20140913'
    AND get_json_object(log, '$.type')='click'
) x
GROUP BY yyyymmdd, hh, campaign_id, region, lang

Hive query for fixed reports

Use case 2: Prompt reports for Ad service console

Page 41: Norikra: SQL Stream Processing In Ruby

SELECT
  campaign.id AS campaign_id,
  member.region AS region,
  member.lang AS lang,
  COUNT(*) AS click,
  COUNT(DISTINCT member.id) AS uu
FROM myservice.win:time_batch(1 hours)
WHERE type="click"
GROUP BY campaign.id, member.region, member.lang

Norikra query for prompt reports

Use case 2: Prompt reports for Ad service console
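Both the Hive and the Norikra query compute the same two numbers per group: total clicks and unique users ("uu"). A plain-Ruby sketch of that calculation for one hourly batch is shown below; `click_report` and the sample events are hypothetical, and the nested layout follows the field paths used in the Norikra query.

```ruby
# Illustration only: clicks and unique users per (campaign, region, lang).
# `click_report` and the sample events are hypothetical.
def click_report(events)
  clicks = events.select { |event| event["type"] == "click" }
  groups = clicks.group_by do |event|
    [event.dig("campaign", "id"), event.dig("member", "region"), event.dig("member", "lang")]
  end
  groups.map do |(campaign_id, region, lang), rows|
    {
      "campaign_id" => campaign_id,
      "region" => region,
      "lang" => lang,
      "click" => rows.size,                                          # COUNT(*)
      "uu" => rows.map { |event| event.dig("member", "id") }.uniq.size # COUNT(DISTINCT member.id)
    }
  end
end

def sample_event(type, campaign, member, region, lang)
  { "type" => type, "campaign" => { "id" => campaign },
    "member" => { "id" => member, "region" => region, "lang" => lang } }
end

events = [
  sample_event("click", "c1", "m1", "JP", "ja"),
  sample_event("click", "c1", "m1", "JP", "ja"),
  sample_event("click", "c1", "m2", "JP", "ja"),
  sample_event("impression", "c1", "m1", "JP", "ja"),  # not a click: ignored
  sample_event("click", "c2", "m3", "US", "en")
]

report = click_report(events)
report.each { |row| p row }
```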

Page 42: Norikra: SQL Stream Processing In Ruby

Use case 3: Realtime access dashboard on Google Platform

Access log visualization

Count using Norikra (2-step), Store on Google BigQuery

Dashboard on Google Spreadsheet + Apps Script

https://www.youtube.com/watch?v=EZkw5TDcCGw

http://qiita.com/kazunori279/items/6329df57635799405547

Page 43: Norikra: SQL Stream Processing In Ruby

Use case 3: Realtime access dashboard on Google Platform

https://www.youtube.com/watch?v=EZkw5TDcCGw

http://qiita.com/kazunori279/items/6329df57635799405547

[Diagram: Server (nginx, access log, Fluentd); labels: access logs to BigQuery; norikra query to aggregate locally; norikra query results to aggregate node]

Page 44: Norikra: SQL Stream Processing In Ruby

Use case 3: Realtime access dashboard on Google Platform

https://www.youtube.com/watch?v=EZkw5TDcCGw

http://qiita.com/kazunori279/items/6329df57635799405547

70 servers, 120,000 requests/sec (or more!)

[Diagram: nginx servers (one per host, ...), Fluentd, Google BigQuery, Google Spreadsheet + Apps Script; labels: counts per host, logs to store, total count]
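The 2-step counting on this slide (counts per host first, then a total on the aggregate node) can be sketched in plain Ruby. Host names and log lines are invented sample data and both helper names are hypothetical. The last line is simple arithmetic on the slide's figures: 120,000 requests/sec spread over 70 servers is roughly 1,714 requests/sec per server.

```ruby
# Illustration only: two-step aggregation with hypothetical helpers and data.

# Step 1 (on each server): count requests locally for one batch.
def local_count(host, access_log_lines)
  { "host" => host, "count" => access_log_lines.size }
end

# Step 2 (on the aggregate node): sum the per-host counts.
def total_count(per_host_counts)
  per_host_counts.sum { |row| row["count"] }
end

logs_by_host = {
  "web01" => ["GET /", "GET /a", "GET /b"],
  "web02" => ["GET /", "GET /c"]
}

# Only one small row per host travels to the aggregate node, not every request.
per_host = logs_by_host.map { |host, lines| local_count(host, lines) }
total = total_count(per_host)

average_per_server = (120_000 / 70.0).round
p per_host
p total
p average_per_server
```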

Page 45: Norikra: SQL Stream Processing In Ruby

Why Norikra is written in JRuby

Esper

CEP (Complex Event Processing) library, written in Java

Rubygems.org

Open repository, for public UDF plugins of Norikra provided as gem

Page 46: Norikra: SQL Stream Processing In Ruby

JRuby for me

Ruby! (by great JRuby developer team!)

makes developing Norikra dramatically faster

with rubygems and rubygems.org for easy deployment/installation

with Java libraries, ex: Jetty, Esper, ...

There are not so many users in Tokyo :(

Page 47: Norikra: SQL Stream Processing In Ruby

More queries, more simplicity and less latency

in data processing

Thanks!

photo: by my co-workers

http://norikra.github.io/

https://github.com/norikra/norikra

Page 49: Norikra: SQL Stream Processing In Ruby

Storm or Norikra?

Simple and fixed workload for huge traffic

Use Storm!

Complex and fragile workload for non-huge traffic

Use Norikra!

Page 50: Norikra: SQL Stream Processing In Ruby

Scalability?

10,000 - 100,000 events/sec

on a 2 CPU, 8 core server

Page 51: Norikra: SQL Stream Processing In Ruby

HA? Distributed?

NO!

I have some ideas, but I have no time to implement them

There is no need for HA/Distributed processing

Page 52: Norikra: SQL Stream Processing In Ruby

Data flow & API?

Use Fluentd!