Fluentd: Data streams in Ruby world #rdrc2014

Fluentd: Data streams in Ruby world, @tagomoris, RedDotRubyConf 2014, Day 1, 26 June 2014

Uploaded by: satoshi-tagomori

Posted on: 08-Sep-2014

TRANSCRIPT

Page 1: Fluentd: Data streams in Ruby world #rdrc2014

Fluentd: Data streams in Ruby world

@tagomoris, RedDotRubyConf 2014

Day 1, Thursday, 26 June 2014

Page 2

TAGOMORI Satoshi a.k.a. @tagomoris

Page 3

Page 4

Page 5

Page 6

Fluentd

Fluentd is an open source data collector to simplify log management. Fluentd is designed to process high-volume data streams reliably. Use cases include real-time search and monitoring, Big Data analytics, reliable archiving and more.

http://www.fluentd.org/

Page 7

Before Fluentd:

Access logs (apache, nginx) -> Metrics (graphs), Archives (Amazon S3, filesystem)

connected with tail -f, scp, python

Error handling? Buffering?

Page 8

Before Fluentd:

Access logs (apache, nginx) -> Metrics (graphs), Analytics (Hadoop, MySQL, MongoDB, Redshift), Archives (Amazon S3, filesystem)

connected with tail -f, scp, python, ruby, cmd

Error handling? Buffering? Routing? API keys?

Page 9

Before Fluentd:

Access logs (apache, nginx), App logs (frontend, backend) -> Metrics (graphs), Analytics (Hadoop, MySQL, MongoDB, Redshift), Archives (Amazon S3, filesystem)

connected with tail -f, scp, python, ruby, cmd, file, logger

Error handling? Buffering? Routing? API keys? Formats?

Page 10

Before Fluentd:

Access logs (apache, nginx), App logs (frontend, backend), System logs (syslogd, snmp data) -> Metrics (graphs), Analytics (Hadoop, MySQL, MongoDB, Redshift), Archives (Amazon S3, filesystem)

connected with tail -f, scp, python, ruby, cmd, file, logger

Error handling? Buffering? Routing? API keys? Formats?

Page 11

Before Fluentd: CHAOS

Access logs (apache, nginx), App logs (frontend, backend), System logs (syslogd, snmp data), various logs -> Metrics (graphs), Analytics (Hadoop, MySQL, MongoDB, Redshift), Archives (Amazon S3, filesystem)

connected with tail -f, scp, python, ruby, cmd, file, logger, ...

Error handling? Buffering? Routing? API keys? Formats?

Page 12

After Fluentd: Controllable

Access logs (apache, nginx), App logs (frontend, backend), System logs (syslogd, snmp data), various logs -> Fluentd -> Metrics (graphs), Analytics (Hadoop, MySQL, MongoDB, Redshift), Archives (Amazon S3, filesystem)

Page 13

After Fluentd: Controllable

Access logs (apache, nginx), App logs (frontend, backend), System logs (syslogd, snmp data), various logs -> Fluentd -> Metrics (graphs), Analytics (Hadoop, MySQL, MongoDB, Redshift), Archives (Amazon S3, filesystem)

Fluentd does: Format, Buffer, Retry, Route

Page 14

Fluentd

Open source data collector

Written in Ruby, runs on CRuby on UNIX-like OSes

With error handling and routing in core

Plugin systems

Input, Output and Buffer (with many built-in plugins)

Distributed on rubygems.org

Fluentd and its plugins: gem install fluentd

rpm/deb packages are also available (td-agent)

Page 15

Why Fluentd?

Page 16

Why Fluentd?

Fluentd's logo is very cute!

Page 17

He is also very cute...

Page 18

Why Fluentd?

Simple data structure

tag, time and record (hash)

Apache-like configuration syntax

Simple / powerful routing

Many public plugins

Just a few steps to write custom plugins

Scalability

Page 19

Fluentd Event

app.device.ios

2014-06-24 16:28:50

{ "username": "tagomoris", "fullname": "TAGOMORI Satoshi", "age": 34, "device": "iPhone 5", ... }

Event

Page 20

Fluentd Event

app.device.ios

1403512916 (2014-06-23 16:41:56 +0800)

{ "username": "tagomoris", "fullname": "TAGOMORI Satoshi", "age": 34, "device": "iPhone 5", ... }

tag, time

record

Page 21

Fluentd Event

app.device.ios

1403512916 (2014-06-23 16:41:56 +0800)

{ "username": "tagomoris", "fullname": "TAGOMORI Satoshi", "age": 34, "device": "iPhone 5", ... }

tag for routing

record as structured data

time by unix time
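The event triple can be pictured in plain Ruby. This is a minimal sketch: the `Event` struct and its `time_in` helper are hypothetical names for illustration, not Fluentd classes. It also confirms that the Unix time 1403512916 is 2014-06-23 16:41:56 at UTC+08:00, as written on the slide.

```ruby
# Hypothetical Event struct for illustration only; Fluentd passes tag, time and
# record around, but this Struct is not one of Fluentd's own classes.
Event = Struct.new(:tag, :time, :record) do
  # Render the Unix time at a fixed UTC offset such as "+08:00".
  def time_in(offset)
    Time.at(time).getlocal(offset).strftime("%Y-%m-%d %H:%M:%S %z")
  end
end

event = Event.new(
  "app.device.ios",            # tag: dot-separated, used for routing
  1403512916,                  # time: Unix time in seconds
  {                            # record: structured data as a Hash
    "username" => "tagomoris",
    "fullname" => "TAGOMORI Satoshi",
    "age"      => 34,
    "device"   => "iPhone 5"
  }
)

puts event.time_in("+08:00")   # prints 2014-06-23 16:41:56 +0800
```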

Page 22

# read from a file and parse
<source>
  type tail
  path /var/log/httpd.log
  format apache2
  tag web.access
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to MongoDB and S3
<match app.**>
  type copy
  <store>
    type mongo
    host mongo.example.com
    capped
    capped_size 200m
  </store>
  <store>
    type s3
    path archive/
  </store>
</match>

Fluentd Configuration

Page 23

Fluentd Configuration

<source> sections: for input

<match> sections: for output

Page 24

# read from a file and parse
source {
  type "tail"
  path "/var/log/httpd.log"
  format "apache2"
  tag "web.access"
}

# logs from client libraries
source {
  type "forward"
  port 24224
}

# store logs to MongoDB and S3
match("app.**") {
  type "copy"
  store {
    type "mongo"
    host "mongo.example.com"
    capped
    capped_size "200m"
  }
  store {
    type "s3"
    path "archive/"
  }
}

Fluentd Configuration DSL

Page 25

Tag based routing

Diagram: inputs emit events (tag, time, record) into the Fluentd core, which passes each event to the output whose tag pattern matches: web.log, sys.*, app.**, **

Page 26

Tag based routing

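Diagram: inputs emit events (tag, time, record) into the Fluentd core, which passes each event to the output whose tag pattern matches: web.log, sys.*, app.**, **; an event can also come back into the core under a new tag such as converted.web.log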
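Tag matching with first-match routing can be sketched in a few lines of Ruby. This simplified version handles only the `*` (exactly one dot-separated part) and `**` (zero or more parts) wildcards and is not Fluentd's matcher; the `route` helper and the `:output_a` to `:output_d` names are hypothetical. It shows why an event re-tagged as `converted.web.log` no longer hits the `web.log` output and falls through to the catch-all.

```ruby
# Simplified tag matcher (a sketch, not Fluentd's implementation):
# "*" matches exactly one dot-separated part, "**" matches zero or more parts.
# Real Fluentd patterns support more syntax; this sketch handles only these two.
def match_parts?(pattern_parts, tag_parts)
  return tag_parts.empty? if pattern_parts.empty?

  head, *rest = pattern_parts
  case head
  when "**"
    # let "**" swallow 0, 1, 2, ... leading tag parts
    (0..tag_parts.size).any? { |n| match_parts?(rest, tag_parts.drop(n)) }
  when "*"
    !tag_parts.empty? && match_parts?(rest, tag_parts.drop(1))
  else
    tag_parts.first == head && match_parts?(rest, tag_parts.drop(1))
  end
end

def tag_match?(pattern, tag)
  match_parts?(pattern.split("."), tag.split("."))
end

# Hypothetical routing table: patterns are tried top to bottom and the first
# match wins, in the same spirit as <match> sections in a config file.
ROUTES = [
  ["web.log", :output_a],
  ["sys.*",   :output_b],
  ["app.**",  :output_c],
  ["**",      :output_d]   # catch-all
].freeze

def route(tag)
  entry = ROUTES.find { |pattern, _output| tag_match?(pattern, tag) }
  entry && entry[1]
end

route("web.log")            # => :output_a
route("sys.cpu")            # => :output_b
route("sys.cpu.user")       # => :output_d ("*" covers only one part)
route("app.device.ios")     # => :output_c
route("converted.web.log")  # => :output_d
```

Because the catch-all `**` sits last, any tag not claimed by an earlier pattern still has somewhere to go.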

Page 27

300+ Public Plugins

access, add, aes-forward, airbrake-python, amazon_sns, amplifier-filter, amqp, amqp2, andon, anomalydetect, anonymizer, arango, arduino, axlsx, backlog, bigquery, boundio, buffer-lightening, buffered-filter, buffered-hipchat, buffered-stdout, bufferize, calc, cassandra, cassandra-cql, cloudstack, cloudwatch, cloudwatch_ya, combiner, conditional_filter, config-expander, config_pit, config_reloader, convert-value-to-sha, copy_ex, couch, couch-sharded, couchbase, dashing, data-rejecter, datacalculator, datacounter, dbi, dd, debug, delay-inspector, delayed, derive, df, droonga, dstat, dummydata-producer, dynamodb, ec2-metadata, elapsed-time, elasticsearch, elasticsearch-cluster, elasticsearch-ruby, elb-log, embedded-elasticsearch, eval-filter, event-tail, extract_query_params, file-alternative, file-sprintf, filter, filter_keys, flatten, flatten-hash, flowcounter, flowcounter-simple, flume, fnordmetric, forest, fork, format, forward-aws, ftp, gamobile, ganglia, gc, geoip, glusterfs, graphite, grassland, gree_community, grep, grepcounter, groonga, groupcounter, growl, growthforecast, gstore, hash-forward, hato, hbase, hekk_redshift, heroku-postgres, heroku-syslog, hipchat, histogram, hoop, hostname, hrforecast, http-enhanced, http-ex, http-list, http-status, https-json, idobata, ikachan, imagefile, imkayac, in-udp-event, incremental, influxdb, influxdb_metrics, inline-classifier, irc, jabber, json-api, json-nest2flat, jsonbucket, jstat, jubatus, jvmwatcher, kafka, kanicounter, keep-forward, kestrel, kibana-server, kinesis-alt, latency, leftronic, librato-metrics, loggly, lossycount, mackerel, mail, map, measure_time, mecab, metricsense, mixi_community, mixpanel, mobile-carrier, mongo, mongo-typed, mongokpi, mqtt, msgpack-rpc, mssql, multiprocess, munin, mysql, mysql-binlog, mysql-bulk, mysql-load, mysql-prepared-statement, mysql-query, mysql-replicator, mysqlslowquery, mysqlslowquerylog, nats, network-probe, nginx-status, nicorepo, norikra, notifier, numeric-counter, numeric-monitor, onlineuser, openldap-monitor, opentsdb, order, out-http, out-http-buffered, out-solr, parser, pgdist, pghstore, pgjson, ping-message, postgres, qqwry, rambler, rawexec, rds-log, rds-slowlog, reassemble, record ...

http://www.fluentd.org/plugins

Page 28

Fluentd patterns

Page 29

1. Read logs from files and write them to storage

file -> in_tail (read, parse) -> out_file (format, write) -> file
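What in_tail does with each new line (read it, then parse it into an event) can be approximated as follows. The regexp is modeled on the apache2 format but written as an approximation rather than copied from Fluentd's source; `parse_line`, `APACHE2` and the sample log line are hypothetical.

```ruby
require "time"

# Regexp modeled on Fluentd's apache2 format (an approximation, not copied
# from Fluentd's source): combined log format with optional referer/agent.
APACHE2 = /\A(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^"]*)" "(?<agent>[^"]*)")?\z/
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

# Turn one log line into a [tag, time, record] triple, roughly what in_tail
# does for each line appended to the file. Returns nil for unparseable lines.
def parse_line(tag, line)
  m = APACHE2.match(line.chomp)
  return nil unless m

  time = Time.strptime(m[:time], TIME_FORMAT).to_i
  record = {}
  m.names.each do |name|
    next if name == "time" || m[name].nil?   # time goes into the event, not the record
    record[name] = m[name]
  end
  [tag, time, record]
end

line = '192.168.0.1 - - [23/Jun/2014:16:41:56 +0800] "GET /index.html HTTP/1.1" 200 777 "-" "Mozilla/5.0"'
tag, time, record = parse_line("web.access", line)
# time is 1403512916; record holds host, user, method, path, code, size,
# referer and agent, all as strings
```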

Page 30

1. Read logs from files and write them to storage

file -> in_tail (read, parse) -> out_mongo (insert) -> MongoDB

https://github.com/fluent/fluent-plugin-mongo

Page 31

1. Read logs from files and write them to storage

file -> in_tail (read, parse) -> out_mysql (insert) -> MySQL

https://github.com/tagomoris/fluent-plugin-mysql

Page 32

1. Read logs from files and write them to storage

file -> in_tail (read, parse) -> out_elasticsearch (send) -> Elasticsearch

https://github.com/uken/fluent-plugin-elasticsearch

Page 33

1. Read logs from files and write them to storage

file -> in_tail (read, parse) -> out_webhdfs (format, write) -> Hadoop HDFS

https://github.com/fluent/fluent-plugin-webhdfs

Page 34

1. Read logs from files and write them to storage

file -> in_tail (read, parse) -> out_s3 (format, write) -> Amazon S3

https://github.com/fluent/fluent-plugin-s3

Page 35

1. Read logs from files and write them to storage

file -> in_tail (read, parse) -> out_redshift (insert) -> Amazon Redshift

https://github.com/hapyrus/fluent-plugin-redshift

Page 36

1. Read logs from files and write them to storage

file -> in_tail (read, parse) -> out_bigquery (insert) -> Google BigQuery

https://github.com/tagomoris/fluent-plugin-bigquery

Page 37

2. Receive and forward data from/to other nodes

fluent-logger-ruby, fluent-logger-java, ... -> forward (input events) ... forward (output events) -> forward (input events)

send events over TCP

Page 38

2. Receive and forward data from/to other nodes

forward -> forward -> forward (several nodes)

load balance, active-standby forward
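Load balancing with an active-standby fallback can be sketched as a node picker. This is a hypothetical illustration assuming plain round-robin over live, non-standby nodes; out_forward's real logic (node weights, heartbeat-based failure detection) is more involved, and `Node` and `NodePicker` are made-up names.

```ruby
# Hypothetical node picker: round-robin across live active nodes, falling back
# to live standby nodes only when no active node is alive. A sketch, not
# out_forward's algorithm.
Node = Struct.new(:name, :standby, :alive)

class NodePicker
  def initialize(nodes)
    @nodes  = nodes
    @cursor = 0
  end

  # Returns the next node to send a chunk to, or nil if every node is down.
  def next_node
    active = @nodes.select { |n| n.alive && !n.standby }
    pool   = active.empty? ? @nodes.select { |n| n.alive && n.standby } : active
    return nil if pool.empty?

    node = pool[@cursor % pool.size]
    @cursor += 1
    node
  end
end

nodes = [
  Node.new("worker1", false, true),
  Node.new("worker2", false, true),
  Node.new("backup",  true,  true)   # standby
]
picker = NodePicker.new(nodes)
order = 3.times.map { picker.next_node.name }  # => ["worker1", "worker2", "worker1"]

nodes[0].alive = false
nodes[1].alive = false
picker.next_node.name  # => "backup"
```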

Page 39

2'. Receive and forward data from/to other nodes, over the internet with SSL

datacenter: secure-forward -> secure-forward: datacenter

send events over SSL with authentication

https://github.com/tagomoris/fluent-plugin-secure-forward

Page 40

3. Connect with other middleware

Inputs: syslog -> in_syslog, Flume -> in_flume, Scribe -> in_scribe, Kafka -> in_kafka

Outputs: out_flume -> Flume, out_scribe -> Scribe, out_kafka -> Kafka

https://github.com/fluent/fluent-plugin-flume

https://github.com/fluent/fluent-plugin-scribe

https://github.com/htgc/fluent-plugin-kafka/

Page 41

4. Copy events

forward -> copy -> forward

copy -> webhdfs -> Hadoop HDFS
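The copy pattern boils down to fanning each event out to several stores. In this minimal sketch, `CopyOutput` and the two arrays are hypothetical stand-ins for out_copy and its forward/webhdfs stores, not plugin code.

```ruby
# Minimal sketch of the copy idea (not out_copy's code): each event is handed
# to every configured store. The stores here are plain arrays standing in for
# outputs such as forward and webhdfs.
class CopyOutput
  def initialize(*stores)
    @stores = stores
  end

  def emit(tag, time, record)
    # This sketch gives each store its own shallow copy of the record.
    @stores.each { |store| store << [tag, time, record.dup] }
  end
end

forward_store = []
hdfs_store    = []
copy = CopyOutput.new(forward_store, hdfs_store)
copy.emit("app.access", 1403512916, { "path" => "/" })
# forward_store and hdfs_store now hold one identical event each
```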

Page 42

5. Count events by string values

events from forward -> datacounter -> any outputs

datacounter: count records by regexp patterns, emitting events such as:

{ "pattern1_count": 60, "pattern1_rate": 1.0, "pattern2_count": 20, "pattern2_rate": 0.33, ... }

https://github.com/tagomoris/fluent-plugin-datacounter
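The counting idea can be sketched as follows, assuming a 60-second interval and rate = count / interval seconds, which reproduces the 1.0 and 0.33 rates in the sample output. `count_patterns` and its arguments are hypothetical; the plugin's README describes its real parameters.

```ruby
# Sketch of datacounter-style counting (not the plugin's code): over one
# interval, count records whose `key` field matches each named regexp and
# report counts plus per-second rates. Each record is counted once, under
# the first pattern it matches; that rule is this sketch's choice.
def count_patterns(records, key, patterns, interval_sec)
  counts = Hash.new(0)
  records.each do |record|
    value = record[key].to_s
    name, _regexp = patterns.find { |_name, regexp| regexp =~ value }
    counts[name] += 1 if name
  end

  patterns.keys.each_with_object({}) do |name, out|
    out["#{name}_count"] = counts[name]
    out["#{name}_rate"]  = (counts[name].to_f / interval_sec).round(2)
  end
end

# 60 "2xx" and 20 "4xx" records seen during a 60-second interval
records  = Array.new(60) { { "status" => "200" } } + Array.new(20) { { "status" => "404" } }
patterns = { "pattern1" => /\A2\d\d\z/, "pattern2" => /\A4\d\d\z/ }
result   = count_patterns(records, "status", patterns, 60)
# pattern1: count 60, rate 1.0; pattern2: count 20, rate 0.33
```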

Page 43

5. Count events by numeric values

events from forward -> numeric-counter -> any outputs

numeric-counter: count records by numerical range, emitting events such as:

{ "pattern1_count": 60, "pattern1_rate": 1.0, "pattern2_count": 20, "pattern2_rate": 0.33, ... }

https://github.com/tagomoris/fluent-plugin-numeric-counter
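Counting by numeric range follows the same shape as counting by pattern, with ranges in place of regexps. `count_ranges`, the half-open `[min, max)` convention and the range names below are assumptions for illustration, not numeric-counter's configuration syntax.

```ruby
# Sketch of counting records by numeric range (not numeric-counter's code).
# Each named range is half-open [min, max); names and ranges are hypothetical.
def count_ranges(records, key, ranges, interval_sec)
  counts = Hash.new(0)
  records.each do |record|
    value = record[key]
    next unless value.is_a?(Numeric)   # skip missing or non-numeric fields
    name, _range = ranges.find { |_name, (min, max)| value >= min && value < max }
    counts[name] += 1 if name
  end

  ranges.keys.each_with_object({}) do |name, out|
    out["#{name}_count"] = counts[name]
    out["#{name}_rate"]  = (counts[name].to_f / interval_sec).round(2)
  end
end

# response times in milliseconds, collected over a 60-second interval
records = [12, 48, 95, 130, 480, 2500].map { |ms| { "reqtime_ms" => ms } }
ranges  = { "fast" => [0, 100], "normal" => [100, 1000], "slow" => [1000, Float::INFINITY] }
result  = count_ranges(records, "reqtime_ms", ranges, 60)
# fast: 3, normal: 2, slow: 1
```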

Page 44

5. Aggregate numeric values

events from forward -> numeric-monitor -> any outputs

numeric-monitor: calculate real-time metrics of numeric values, emitting events such as:

{ "max": 128, "min": 16, "avg": 64.0, "sum": 1024, "num": 20, "percentile_50": 48, "percentile_90": 112, ... }

https://github.com/tagomoris/fluent-plugin-numeric-monitor
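Metrics like those in the sample output can be computed from a list of values like this. `monitor` is a hypothetical helper, and the nearest-rank percentile method is an assumption; the plugin may compute percentiles differently.

```ruby
# Sketch of numeric-monitor-style aggregation (not the plugin's code).
# Percentiles use the nearest-rank method, which is an assumption here.
def monitor(values, percentiles: [50, 90])
  return {} if values.empty?

  sorted = values.sort
  sum = sorted.inject(0, :+)
  stats = {
    "max" => sorted.last,
    "min" => sorted.first,
    "avg" => sum.to_f / sorted.size,
    "sum" => sum,
    "num" => sorted.size
  }
  percentiles.each do |p|
    rank = (p / 100.0 * sorted.size).ceil          # 1-based nearest rank
    stats["percentile_#{p}"] = sorted[[rank, 1].max - 1]
  end
  stats
end

stats = monitor([16, 32, 48, 64, 128])
# max 128, min 16, avg 57.6, sum 288, num 5, percentile_50 48, percentile_90 128
```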

Page 45

6. Various inputs: Linux performance (dstat)

dstat -> in_dstat

collect server performance data

https://github.com/shun0102/fluent-plugin-dstat

Page 46

6. Various inputs: SQL execution

RDBMS -> in_sql

input from SELECT queries

https://github.com/fluent/fluent-plugin-sql

Page 47

6. Various inputs: external command

any command -> in_exec

input from the STDOUT of any command

Page 48

7. Various outputs: notification on IRC

out_ikachan

notice on IRC channel

IRC

https://github.com/tagomoris/fluent-plugin-ikachan

Page 49

7. Various outputs: notification on IRC

14:56 ikachan: HTTP status_4xx crit [2014-06-23 14:56:29 +0900] serviceX: 100.00 (threshold 75.0) http://graph.tool.local/view_graph/accesslog/httpstatus/serviceX_4xx_percentage

14:57 kazeburo: ↑ 40x 100%...

Page 50

7. Various outputs: notification on HipChat

out_hipchat

notice on HipChat

HipChat

https://github.com/hotchpotch/fluent-plugin-hipchat

Page 51

7. Various outputs: graph tools

out_growthforecast

POST data into graph tools

GrowthForecast or Focuslight

https://github.com/tagomoris/fluent-plugin-growthforecast

Page 52

7. Various outputs

Page 53

7. Various outputs: external command

out_exec -> any command

output into STDIN of any commands

Page 54

8. Filters: stream processing with an external command

events from any inputs -> exec_filter -> any outputs

exec_filter: formats events and writes them into the command's STDIN, then reads and parses the command's STDOUT

any command: reads from STDIN, does WHATEVER you want, writes into STDOUT

e.g. tail -f | grep ... | sed ... | cat
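Any program that reads STDIN and writes STDOUT can serve as the external command. A minimal Ruby filter might look like this, assuming exec_filter is configured to exchange one JSON object per line; `filter_stream` and the 5xx/`alert` logic are hypothetical, and the plugin's actual format options should be checked in its documentation.

```ruby
require "json"
require "stringio"

# Hypothetical filter command for exec_filter. It assumes one JSON object per
# line in both directions. Keeps only 5xx records and flags them.
def filter_stream(input, output)
  input.each_line do |line|
    begin
      record = JSON.parse(line)
    rescue JSON::ParserError
      next                                # skip lines that are not valid JSON
    end
    next unless record.is_a?(Hash) && record["code"].to_i >= 500

    record["alert"] = true
    output.puts(JSON.generate(record))
    output.flush                          # hand each result back promptly
  end
end

# Try it without a real pipe; as the actual command, the script would call
# filter_stream($stdin, $stdout) instead.
out = StringIO.new
filter_stream(StringIO.new(%Q({"code":"200"}\n{"code":"503"}\n)), out)
out.string  # => "{\"code\":\"503\",\"alert\":true}\n"
```

Flushing after every line keeps events from sitting in the command's own output buffer while Fluentd waits for them.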

Page 55

8. Filters: stream processing with an external server (RPC)

events from any inputs -> out_norikra (send) -> Norikra: stream processing with SQL -> in_norikra (fetch) -> any outputs

http://norikra.github.io/

SELECT stage, score, COUNT(*) AS c
FROM results.win:time_batch(1 min)
WHERE stage > 1 AND user.valid
GROUP BY stage, score
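To make the query concrete, here is what it computes, expressed in plain Ruby. This is an illustration, not Norikra: batches are aligned to epoch minutes for simplicity, the field names come from the query, and `time_batch_counts` is a hypothetical helper.

```ruby
# Plain-Ruby illustration of what the Norikra query computes (not Norikra):
# events are grouped into 1-minute batches (aligned to the epoch here, a
# simplification), filtered with "stage > 1 AND user.valid", then counted
# per (stage, score). Each event is a Hash with "time", "stage", "score", "user".
def time_batch_counts(events, batch_sec: 60)
  batches = events.group_by { |e| e["time"] / batch_sec }   # integer division
  batches.sort_by { |batch, _| batch }.map do |batch, evs|
    rows = evs
      .select { |e| e["stage"] > 1 && (e["user"] || {})["valid"] }
      .group_by { |e| [e["stage"], e["score"]] }
      .map { |(stage, score), group| { "stage" => stage, "score" => score, "c" => group.size } }
    [batch * batch_sec, rows]
  end
end

events = [
  { "time" => 0,  "stage" => 2, "score" => 10, "user" => { "valid" => true } },
  { "time" => 30, "stage" => 2, "score" => 10, "user" => { "valid" => true } },
  { "time" => 45, "stage" => 1, "score" => 10, "user" => { "valid" => true } },   # stage too low
  { "time" => 50, "stage" => 3, "score" => 5,  "user" => { "valid" => false } },  # invalid user
  { "time" => 70, "stage" => 3, "score" => 5,  "user" => { "valid" => true } }
]
result = time_batch_counts(events)
# first minute: stage 2 / score 10 counted twice; second minute: stage 3 / score 5 once
```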

Page 56

... And, Fluentd does

error handling and retries for all of these plugins!
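The retry behaviour can be sketched as exponential backoff around a chunk flush. This assumes a doubling wait and a fixed retry limit; `flush_with_retry` and its keyword arguments are hypothetical and only loosely echo Fluentd's buffered-output settings, not its implementation.

```ruby
# Sketch of flushing one buffered chunk with exponential-backoff retries.
# The doubling wait and the retry limit are assumptions for illustration;
# Fluentd's real retry logic lives in its buffered output classes.
def flush_with_retry(chunk, retry_wait: 1.0, retry_limit: 5, sleeper: ->(sec) { sleep(sec) })
  failures = 0
  begin
    yield chunk                                   # try to write the whole chunk
  rescue StandardError
    failures += 1
    raise if failures > retry_limit               # give up and re-raise the last error
    sleeper.call(retry_wait * 2**(failures - 1))  # waits 1s, 2s, 4s, ...
    retry                                         # re-run the begin block
  end
end

# Usage with a fake sleeper and a destination that fails twice, then succeeds.
waits = []
calls = 0
written = flush_with_retry(%w[e1 e2], sleeper: ->(sec) { waits << sec }) do |chunk|
  calls += 1
  raise IOError, "destination down" if calls < 3
  chunk.size
end
# written == 2, waits == [1.0, 2.0]
```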

Page 57

Before Fluentd: CHAOS

Access logs (apache, nginx), App logs (frontend, backend), System logs (syslogd, snmp data), various logs -> Metrics (graphs), Analytics (Hadoop, MySQL, MongoDB, Redshift), Archives (Amazon S3, filesystem)

connected with tail -f, scp, python, ruby, cmd, file, logger, ...

Page 58

After Fluentd: Controllable

Access logs (apache, nginx), App logs (frontend, backend), System logs (syslogd, snmp data), various logs -> Fluentd -> Metrics (graphs), Analytics (Hadoop, MySQL, MongoDB, Redshift), Archives (Amazon S3, filesystem)

Page 59

Fluentd: Now and then

Page 60

Fluentd versions

Latest: v0.10.50

released on Jun 17, 2014

v0.10.x: Stable versions

many minor feature updates, bug fixes

new features for v1

Page 61

Fluentd v1

Planned as the first major release

someday in 2014 (?)

100% compatible with v0.10.x

New (and additional) features on the v1.x roadmap

https://github.com/fluent/fluentd/issues/251

new configuration syntax, plugin backends

daemon process management

multi core CPU supports

Page 62

Fluentd on JRuby

Under development!

trying to fix Cool.io to support JRuby

Page 63

Fluentd on Windows

Under development!

"windows" branch on GitHub fluent/fluentd

Page 64

Use case in LINE

Page 65

Analytics data flow overview

Diagram: servers -> Fluentd cluster; destinations: archive, visualization, notifications, application metrics; systems: Hadoop, Fluentd, Norikra

Page 66

Diagram: servers -> Fluentd cluster; destinations: archive, visualization, notifications, application metrics; systems: Hadoop, Fluentd, Norikra

Stages: delivery / stream-map, aggregate / stream-reduce

Page 67

Diagram (destinations: archive, visualization, notifications, application metrics; systems: Hadoop, Norikra)

fluent-agent-lite: non-parsed raw logs, non-parsed access logs

deliver: receive / archive / load-balance

worker: parse / store / forward

watcher: monitor / notify

cep: general-purpose stream processing

Page 68

Fluentd cluster statistics

Fluentd nodes

access/application logs from 600+ nodes

receiver: 5 servers (60 processes)

parser/converter: 10 servers (90 processes)

stream processing: 3 servers

Page 69

Fluentd cluster statistics

Daily: 5.5+ billion events, 1.5+ TB of data

Peak time: 150,000+ events/sec, 300+ Mbps
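As a rough consistency check, treating the "+" lower bounds as exact values and assuming decimal units (1 TB = 10^12 bytes), the daily and peak figures imply similar event sizes. Assuming the peak is spread evenly over the 60 receiver processes, each process stays well below the 10-20k events/sec quoted for simple event transfer in the FAQ:

```latex
\begin{aligned}
\text{daily average rate} &= \frac{5.5\times10^{9}\ \text{events}}{86\,400\ \text{s}} \approx 6.4\times10^{4}\ \text{events/s} \\
\text{bytes per event (daily)} &= \frac{1.5\times10^{12}\ \text{B}}{5.5\times10^{9}\ \text{events}} \approx 273\ \text{B} \\
\text{bytes per event (peak)} &= \frac{300\times10^{6}\ \text{bit/s}}{8 \times 1.5\times10^{5}\ \text{events/s}} = 250\ \text{B} \\
\text{peak load per receiver process} &= \frac{1.5\times10^{5}\ \text{events/s}}{60} = 2\,500\ \text{events/s}
\end{aligned}
```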

Page 70

Fluentd is the best partner for stream-processing newbies and rubyists!

Check out sites and code!

http://fluentd.org/

https://github.com/fluent/fluentd

Page 71

FAQ

Page 72

Fault-tolerance?

Node level fault-tolerance

File buffer: processing data can be serialized on disk

Cluster level fault-tolerance

Copy + Forward (load balance, active-standby)

Event level assurance: ACK?

NO (for performance reasons)

Page 73

Performance?

NOT SO BAD:

real throughput depends on plugin/configuration

simple event transferring: 10-20k events/sec

Page 74

vs Scribe? vs Flume?

Page 75

vs Storm?

Page 76

Eco-system? Clones?

ik

fluent-agent-lite

fluenpy
