retrospection / prospection and schema

Retrospection / prospection and schema. TAGOMORI Satoshi (@tagomoris), LINE Corp. 2014/01/31 (Fri) at University of Tsukuba, the 1st half.

Upload: satoshi-tagomori

Post on 08-May-2015


DESCRIPTION

University of Tsukuba intensive lecture materials, 2014/01/31

TRANSCRIPT

Page 1: Retrospection / prospection and schema

Retrospection / prospection and schema

TAGOMORI Satoshi (@tagomoris), LINE Corp.

2014/01/31 (Fri) at University of Tsukuba, the 1st half

Friday, January 31, 2014

Page 2: Retrospection / prospection and schema

TAGOMORI Satoshi (@tagomoris), LINE Corp.

Development Support Team

Page 3: Retrospection / prospection and schema

Page 4: Retrospection / prospection and schema

Page 5: Retrospection / prospection and schema

Logs

Service metrics (Users, PageViews, ...)

UX/UI metrics (Access path, Taps/views, ...)

Monitoring metrics (Traffic Gbps, TBytes/day, ...)

System monitoring (Error rates, Response time, ...)

Page 6: Retrospection / prospection and schema

Software for Logging

Collection: Fluentd, Scribed, Flume, LogStash, ...

Storage: RDBMS, Hadoop HDFS, NoSQLs, Elasticsearch, ...

Processing: SQL, Hadoop MapReduce (Hive), Presto, Impala, ...

Stream-Processing: Storm, Kafka, Norikra, ...

Visualization: Kibana, Tableau, Fnordmetric, GrowthForecast, Focuslight, ...

Appliance: DWH + BI Tools

Services: Google BigQuery, Treasure Data, ...

Page 7: Retrospection / prospection and schema

How to inspect logs

Retrospection (reactive search)

Store data, and search

Prospection (proactive search)

Define what should be processed, and store data
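The two approaches above can be contrasted with a minimal sketch. The event fields (`path`, `status`) and the helper `on_event` are assumptions made for illustration, not anything shown in the slides.

```python
from collections import Counter

# Hypothetical access-log events; field names are illustrative only.
events = [
    {"path": "/top", "status": 200},
    {"path": "/top", "status": 500},
    {"path": "/news", "status": 200},
]

# Retrospection: store every event first, decide what to ask later.
stored = []
for event in events:
    stored.append(event)
errors_found_later = [e for e in stored if e["status"] >= 500]

# Prospection: the question is defined up front; each event is evaluated
# on arrival and only the result of the question is kept.
error_counts = Counter()

def on_event(event):
    if event["status"] >= 500:
        error_counts[event["path"]] += 1

for event in events:
    on_event(event)
```

In the retrospective half the raw events stay available for questions nobody has thought of yet; in the prospective half only the per-path error counts survive.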

Page 8: Retrospection / prospection and schema

What logs are inspected

Schema-full data:

strict schema: predefined fields w/ types (or reject)

schema on read: try to read known fields (or ignore)

Schema-less data:

any fields (or ignore), any types (implicit/explicit conversion)

fit for services in-development (all internet services!)
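A rough sketch of how the three behaviours differ when reading the same record. `SCHEMA`, the reader functions, and the sample record are hypothetical names chosen for this example.

```python
# Hypothetical schema: field name -> expected Python type.
SCHEMA = {"user_id": int, "path": str}

def read_strict(record):
    """Strict schema: exactly the predefined fields with matching types, or reject."""
    if set(record) != set(SCHEMA):
        raise ValueError("unexpected or missing fields")
    for name, expected in SCHEMA.items():
        if not isinstance(record[name], expected):
            raise ValueError(f"bad type for {name}")
    return dict(record)

def read_schema_on_read(record):
    """Schema on read: pick up the known fields, ignore everything else."""
    return {name: record[name] for name in SCHEMA if name in record}

def read_schemaless(record):
    """Schema-less: accept any fields and any types as they arrive."""
    return dict(record)

# A record from a service still in development: an extra field, and a
# numeric value that arrived as a string.
record = {"user_id": "42", "path": "/top", "agent": "curl"}
```

Here `read_strict(record)` raises `ValueError`, `read_schema_on_read(record)` drops `agent`, and `read_schemaless(record)` keeps everything, leaving any type conversion to whoever queries the data.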

Page 9: Retrospection / prospection and schema

How/what

How \ What: Schema-full / Schema-less

Retrospect, schema-full: RDBMS, Hive, BigQuery, Cassandra, HBase, ...

Retrospect, schema-less: MongoDB, Hive (SerDe), TD, plain text file, ...

Prospect, schema-full: Esper, many stream CEPs, ...

Prospect, schema-less: Norikra, ...

Page 10: Retrospection / prospection and schema

Data size: schema & index

Logs: size is always important (xTB - xPB)

Schema:

size optimization

access optimization on memory/disk

Index:

access optimization on memory/disk

more memory/disk required

hard to distribute
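One way to see the size effect of a schema is to encode the same record with and without one. This is only a sketch: the record and the fixed binary layout `RECORD_FORMAT` are assumptions, and real storage engines use their own formats.

```python
import json
import struct

# Hypothetical log record.
record = {"user_id": 42, "status": 200, "elapsed_ms": 1500}

# Without a schema, every record carries its own field names.
as_json = json.dumps(record).encode("utf-8")

# With a schema, field names and types are known in advance, so only values
# are stored: uint32 user_id, uint16 status, uint32 elapsed_ms (little-endian).
RECORD_FORMAT = "<IHI"
as_packed = struct.pack(
    RECORD_FORMAT, record["user_id"], record["status"], record["elapsed_ms"]
)

# Fixed-width fields also allow direct access to the n-th record by offset.
user_id, status, elapsed_ms = struct.unpack(RECORD_FORMAT, as_packed)
```

The packed form takes `struct.calcsize(RECORD_FORMAT)` bytes per record, which is 10 here, while the JSON form repeats the field names in every record.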

Page 11: Retrospection / prospection and schema

Query response improvements of retrospection

Schema-full + indexed (RDBMS)

Query plan optimization

Schema on read

I/O and Task size optimization & scale out

Schema-less + indexed (Mongo)

mmap-ed index & data (!)

Page 12: Retrospection / prospection and schema

Query response improvements of prospection

Time window + incremental calculation

Stream processing engines
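A time window with incremental calculation might look like the following sketch. The class `TumblingAverage` and the 60-second window are assumptions for illustration; stream processing engines provide richer window types.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed window length

class TumblingAverage:
    """Keeps a running count and sum per fixed time window.

    Each event updates the aggregate in O(1); raw events are never stored.
    """

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)    # window start -> number of events
        self.sums = defaultdict(float)    # window start -> sum of values

    def add(self, timestamp, value):
        ts = int(timestamp)
        start = ts - ts % self.window_seconds
        self.counts[start] += 1
        self.sums[start] += value

    def average(self, window_start):
        return self.sums[window_start] / self.counts[window_start]

response_times = TumblingAverage()
response_times.add(0, 100.0)
response_times.add(30, 300.0)
response_times.add(61, 50.0)   # falls into the next window
```

When a window closes, its result is already computed, so the answer is available without rescanning any stored data.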

Page 13: Retrospection / prospection and schema

Stream processing and data size

No disks: reduction of failure points

Less memory:

size of just processing and I/O buffers

aggregation results

Easy to distribute:

stream duplication

stream splitting by aggregation key
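The two distribution strategies above can be sketched as follows. The function names and the use of CRC32 as the partitioning hash are assumptions for this example.

```python
import zlib

def duplicate_stream(events, copies):
    """Stream duplication: every consumer receives the full stream."""
    return [list(events) for _ in range(copies)]

def partition_for(key, partitions):
    """Stable hash, so the same key always reaches the same worker."""
    return zlib.crc32(key.encode("utf-8")) % partitions

def split_stream(events, key_field, partitions):
    """Stream splitting: route each event by its aggregation key."""
    shards = [[] for _ in range(partitions)]
    for event in events:
        index = partition_for(str(event[key_field]), partitions)
        shards[index].append(event)
    return shards

# Hypothetical events, aggregated per path.
events = [{"path": "/top"}, {"path": "/news"}, {"path": "/top"}]
copies = duplicate_stream(events, 2)
shards = split_stream(events, "path", 4)
```

Because all events for one key land on one worker, each worker can aggregate its keys independently and no cross-worker merge is needed for per-key results.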

Page 14: Retrospection / prospection and schema

Stream processing and schema

Stream processing: query -> data

Prospective schema by queries:

Queries know required fields and their types

Unused fields can be ignored

Implicit type conversion available

Schema-less data + schema-full queries
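The idea of a query carrying the schema can be illustrated with a small sketch. The `Query` class below is hypothetical and is not Norikra's API; it only shows a query declaring the fields and types it needs while events stay free-form.

```python
class Query:
    """A hypothetical query that declares the fields and types it requires."""

    def __init__(self, fields, predicate):
        self.fields = fields          # field name -> type used for conversion
        self.predicate = predicate
        self.matched = 0

    def feed(self, event):
        row = {}
        for name, to_type in self.fields.items():
            if name not in event:
                return                # event lacks a required field; skip it
            try:
                row[name] = to_type(event[name])   # implicit type conversion
            except (TypeError, ValueError):
                return                # value cannot be converted; skip it
        # Fields not named by the query were never copied into row.
        if self.predicate(row):
            self.matched += 1

slow_requests = Query(
    {"path": str, "elapsed_ms": int},
    lambda row: row["elapsed_ms"] > 1000,
)

for event in [
    {"path": "/top", "elapsed_ms": "1500", "agent": "curl"},  # string is converted
    {"path": "/top", "elapsed_ms": 20},
    {"status": 200},                                          # unrelated event
    {"path": "/news", "elapsed_ms": "n/a"},                   # unconvertible value
]:
    slow_requests.feed(event)
```

Only the first event matches: its `elapsed_ms` arrives as a string, is converted to an integer by the query, and the extra `agent` field is ignored.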

Page 15: Retrospection / prospection and schema

My goal: Schema-less data stream + schema-full queries

It’s Norikra!
