Retrospection / prospection and schema
University of Tsukuba intensive lecture material, 2014/01/31
TAGOMORI Satoshi (@tagomoris), LINE Corp.
2014/01/31 (Fri) at University of Tsukuba, the 1st half
Friday, January 31, 2014
TAGOMORI Satoshi (@tagomoris), LINE Corp.
Development Support Team
Logs
Service metrics (Users, PageViews, ...)
UX/UI metrics (Access path, Taps/views, ...)
Monitoring metrics (Traffic Gbps, TBytes/day, ...)
System monitoring (Error rates, Response time, ...)
Software for Logging
Collection: Fluentd, Scribed, Flume, Logstash, ...
Storage: RDBMS, Hadoop HDFS, NoSQLs, Elasticsearch, ....
Processing: SQL, Hadoop MapReduce (Hive), Presto, Impala, ...
Stream processing: Storm, Kafka, Norikra, ...
Visualization: Kibana, Tableau, Fnordmetric, GrowthForecast, Focuslight, ...
Appliances: DWH + BI tools
Services: Google BigQuery, Treasure Data, ...
How to inspect logs
Retrospection (reactive search)
Store data, and search
Prospection (proactive search)
Define what should be processed, and store data
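A minimal Python sketch of the difference, assuming hypothetical log records with `path` and `status` fields (not from the slides). Both functions answer the same question ("server errors per path"); retrospection keeps every record and runs the query afterwards, while prospection fixes the query first and keeps only its running result.

```python
from collections import Counter

# Hypothetical sample log records (illustration only).
LOGS = [
    {"path": "/", "status": 200},
    {"path": "/login", "status": 500},
    {"path": "/", "status": 200},
    {"path": "/login", "status": 200},
]


def retrospect(records):
    """Retrospection: store everything first, decide the query later."""
    stored = list(records)  # 1. store all data as-is
    # 2. a query written after the fact scans the stored data
    return Counter(r["path"] for r in stored if r["status"] >= 500)


def prospect(records):
    """Prospection: define the query first, then process data as it arrives."""
    errors_by_path = Counter()  # 1. only the query's state is kept
    for r in records:           # 2. each record is seen once, then dropped
        if r["status"] >= 500:
            errors_by_path[r["path"]] += 1
    return errors_by_path


if __name__ == "__main__":
    print(retrospect(LOGS))
    print(prospect(LOGS))
```

The trade-off: retrospection can answer questions nobody thought of in advance, but must keep all raw data; prospection needs far less storage, but only answers the queries defined up front.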
What logs are inspected
Schema-full data:
strict schema: predefined fields with types (or reject)
schema on read: try to read known fields (or ignore)
Schema-less data:
any fields (or ignore), any types (implicit/explicit conversion)
fits services under development (i.e., all internet services!)
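The three reading modes can be sketched in Python as below. The `SCHEMA` dict and the field names are hypothetical, and treating "reject" as raising an exception is an assumption of this sketch; real systems differ in how they report rejected records.

```python
from typing import Any

# Hypothetical schema: field name -> expected type.
SCHEMA = {"user_id": int, "path": str}


def read_strict(record: dict[str, Any]) -> dict[str, Any]:
    """Strict schema: exactly the predefined fields with the right types, or reject."""
    if set(record) != set(SCHEMA):
        raise ValueError(f"fields do not match schema: {sorted(record)}")
    for name, typ in SCHEMA.items():
        if not isinstance(record[name], typ):
            raise TypeError(f"{name} must be {typ.__name__}")
    return record


def read_on_read(record: dict[str, Any]) -> dict[str, Any]:
    """Schema on read: try to read known fields; ignore unknown or unreadable ones."""
    result = {}
    for name, typ in SCHEMA.items():
        try:
            result[name] = typ(record[name])  # e.g. "42" -> 42
        except (KeyError, TypeError, ValueError):
            pass  # missing or unreadable known field: ignored
    return result


def read_schemaless(record: dict[str, Any]) -> dict[str, Any]:
    """Schema-less: accept any fields and any types; convert only at use time."""
    return dict(record)


if __name__ == "__main__":
    raw = {"user_id": "42", "path": "/", "ua": "curl"}
    print(read_on_read(raw))     # unknown "ua" dropped, "42" converted to int
    print(read_schemaless(raw))  # kept exactly as received
```

Note that `read_strict(raw)` would reject this record twice over: it has an extra field and a string where an `int` is expected.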
How/what
How \ What | Schema-full | Schema-less
Retrospect | RDBMS, Hive, BigQuery, Cassandra, HBase, ... | MongoDB, Hive (SerDe), TD, plain text files, ...
Prospect | Esper, many stream CEPs, ... | Norikra, ...
Data size: schema & index
Logs: size is always important (xTB - xPB)
Schema:
size optimization
access optimization on memory/disk
Index:
access optimization on memory/disk
more memory/disk required
hard to distribute
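A small single-process Python sketch of the index trade-off, using hypothetical records: the full scan needs no extra space but touches every record, while the index answers lookups directly at the cost of an extra structure that grows with the data. The sketch does not model the distribution problem mentioned on the slide.

```python
from collections import defaultdict

# Hypothetical sample records; real log stores hold xTB - xPB of these.
RECORDS = [
    {"user": "alice", "path": "/"},
    {"user": "bob", "path": "/a"},
    {"user": "alice", "path": "/b"},
]


def scan(records, user):
    """No index: a full scan reads every record but needs no extra space."""
    return [r for r in records if r["user"] == user]


def build_index(records, field):
    """Index: map each field value to record positions.

    Lookups become fast, but the index itself costs memory/disk
    roughly proportional to the number of records.
    """
    index = defaultdict(list)
    for pos, r in enumerate(records):
        index[r[field]].append(pos)
    return index


def lookup(records, index, user):
    """Indexed access: jump straight to the matching positions."""
    return [records[pos] for pos in index.get(user, [])]


if __name__ == "__main__":
    idx = build_index(RECORDS, "user")
    print(scan(RECORDS, "alice") == lookup(RECORDS, idx, "alice"))  # True
```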
Query response improvements of retrospection
Schema-full + indexed (RDBMS)
Query plan optimization
Schema on read
I/O and Task size optimization & scale out
Schema-less + indexed (MongoDB)
mmap-ed index & data (!)
Query response improvements of prospection
Time window + incremental calculation
Stream processing engines
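"Time window + incremental calculation" can be sketched in Python as a per-key counter that is updated on every event and emitted when its window closes. The choice of tumbling (fixed, non-overlapping) windows, the 60-second length, and the assumption that events arrive in time order are all illustrative assumptions, not details from the slide.

```python
from collections import defaultdict

WINDOW_SEC = 60  # hypothetical window length


def windowed_counts(events, window_sec=WINDOW_SEC):
    """Incrementally count events per key within tumbling time windows.

    events: iterable of (unix_time, key) pairs, assumed to arrive in time order.
    Yields (window_start, counts) each time a window closes.
    """
    current_start = None
    counts = defaultdict(int)
    for ts, key in events:
        start = ts - ts % window_sec  # the window this event falls into
        if current_start is not None and start != current_start:
            yield current_start, dict(counts)  # window closed: emit its result
            counts.clear()                     # and discard its state
        current_start = start
        counts[key] += 1  # incremental update, O(1) per event
    if current_start is not None:
        yield current_start, dict(counts)  # flush the last open window


if __name__ == "__main__":
    events = [(0, "a"), (10, "b"), (30, "a"), (65, "a"), (130, "b")]
    print(list(windowed_counts(events)))
    # [(0, {'a': 2, 'b': 1}), (60, {'a': 1}), (120, {'b': 1})]
```

Memory use depends only on the number of distinct keys in the current window, not on how many events have passed through.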
Stream processing and data size
No disks: fewer points of failure
Less memory:
only processing and I/O buffers
aggregation results
Easy to distribute:
stream duplication
stream splitting by aggregation key
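Stream splitting by aggregation key can be sketched in Python as hash partitioning: every event with the same key is routed to the same worker, so each worker aggregates its share independently and no key's result ever has to be merged across workers. The worker count, the `(key, value)` event shape, and the use of CRC32 as the hash are assumptions of this sketch.

```python
import zlib
from collections import Counter

# Hypothetical (key, value) events, e.g. (path, bytes sent).
EVENTS = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]


def worker_for(key: str, n_workers: int) -> int:
    """Pick a worker by hashing the aggregation key.

    zlib.crc32 is stable across processes, unlike Python's built-in
    hash() for str, which is randomized per process.
    """
    return zlib.crc32(key.encode("utf-8")) % n_workers


def split_by_key(events, n_workers):
    """Stream splitting: all events with the same key go to the same worker."""
    partitions = [[] for _ in range(n_workers)]
    for key, value in events:
        partitions[worker_for(key, n_workers)].append((key, value))
    return partitions


def aggregate(stream):
    """A worker keeps only its aggregation results, not the raw events.

    (Here the partition is a list for simplicity; a real worker would
    consume it one event at a time.)
    """
    totals = Counter()
    for key, value in stream:
        totals[key] += value
    return totals


if __name__ == "__main__":
    for i, part in enumerate(split_by_key(EVENTS, 3)):
        print(i, dict(aggregate(part)))
```

Stream duplication is simpler still: each consumer (for example, a different query) receives its own copy of every event.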
Stream processing and schema
Stream processing: query -> data
Prospective schema by queries:
Queries know the required fields and their types
Unused fields can be ignored
Implicit type conversion available
Schema-less data + schema-full queries
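A rough Python sketch of "schema-less data + schema-full queries", assuming a hypothetical `Query` type that declares the fields it needs and their types. It illustrates the idea on this slide and is not Norikra's actual implementation or API: unused fields are dropped, declared fields are converted implicitly, and records that lack a field or fail conversion simply do not match.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Query:
    """Hypothetical schema-full query: declared fields/types plus a filter."""
    schema: dict[str, type]
    predicate: Callable[[dict[str, Any]], bool]


def apply_schema(query: Query, record: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Project a schema-less record onto the query's schema, or return None."""
    typed = {}
    for name, typ in query.schema.items():
        if name not in record:
            return None  # required field missing: record does not match
        try:
            typed[name] = typ(record[name])  # implicit conversion, e.g. "500" -> 500
        except (TypeError, ValueError):
            return None  # not convertible: record does not match
    return typed  # fields the query never mentions are ignored


def run(query: Query, stream):
    """Evaluate the query over a stream of schema-less records."""
    for record in stream:
        typed = apply_schema(query, record)
        if typed is not None and query.predicate(typed):
            yield typed


if __name__ == "__main__":
    errors = Query(schema={"status": int, "path": str},
                   predicate=lambda r: r["status"] >= 500)
    stream = [
        {"status": "500", "path": "/login", "ua": "curl"},
        {"status": 200, "path": "/"},
        {"path": "/no-status"},
    ]
    print(list(run(errors, stream)))
```

Only the first record matches: `"500"` is converted to `500`, and the unused `ua` field is dropped.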
My goal: schema-less data stream + schema-full queries
It’s Norikra!