Technologies, Data Analytics Service and Enterprise Businesses SENDAI IT COMMUNE #2 2018-01-09 Satoshi Tagomori (@tagomoris) Treasure Data, Inc.

Upload: satoshi-tagomori

Post on 23-Jan-2018

TRANSCRIPT

Page 1: Technologies, Data Analytics Service and Enterprise Business

Technologies, Data Analytics Service and Enterprise Businesses

SENDAI IT COMMUNE #2 2018-01-09

Satoshi Tagomori (@tagomoris) Treasure Data, Inc.

Page 2: Technologies, Data Analytics Service and Enterprise Business

Satoshi Tagomori (@tagomoris)

Fluentd, MessagePack-Ruby, Norikra, Woothee, ...

Treasure Data, Inc.

Page 3: Technologies, Data Analytics Service and Enterprise Business
Page 4: Technologies, Data Analytics Service and Enterprise Business

Retry-able Failures or Not

Idempotent Operations (Japanese: 冪等な操作, read "bekitō")

Page 5: Technologies, Data Analytics Service and Enterprise Business

Technologies Data Analytics Service Enterprise Business

Page 6: Technologies, Data Analytics Service and Enterprise Business

Technologies ↓ Data Analytics Service ↓ Enterprise Business

Page 7: Technologies, Data Analytics Service and Enterprise Business

Enterprise Business ?

• Many different definitions and discussions about "Enterprise"... :(

• MY DEFINITION IN THIS TALK: "Businesses NOT about IT"

• Thus, most businesses are "Enterprise", everywhere, not only in Tokyo

Page 8: Technologies, Data Analytics Service and Enterprise Business

Data Analytics Service ?

• Provides ways to know:
  • How many people are reaching our products?
  • How many times are they seeing our advertisements?
  • And how many times do they buy our products?
  • When do they use our products?
  • When did they buy our products?
  • Where did they buy our products?
  • ...

• Something that helps our business using data

Page 9: Technologies, Data Analytics Service and Enterprise Business

Data Analytics Service for Enterprise Business ?

• Something that helps "Business not about IT", using data (IT)

• Staff (using the data analytics service) don't know about IT
  • and also don't care about IT
  • but "need" the results of analytics

• Everyone checks the report about yesterday at 10:00 AM
  • We need results before 10:00 AM
  • 10:10 AM is too late, but 2:00 AM is too early...

Page 10: Technologies, Data Analytics Service and Enterprise Business

Deadline and Retries

[Figure: job timelines against the 10:00AM deadline; time axis labels 00:00AM, 01:00AM, 05:30AM, 10:00AM]

• Big Job, Power 1: Crash! -> Delay...
• Big Job, Power 2: Crash! -> OK!
• Small Jobs, Power 1: Crash! -> OK!
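The three cases on this slide can be read as simple arithmetic. The sketch below is only an illustration: the slide gives clock labels but no durations, so every duration, the helper `finish_time`, and the split into six small jobs are assumptions chosen to reproduce the "Delay" / "OK" outcomes.

```python
from datetime import datetime, timedelta

DAY = datetime(2018, 1, 9)               # arbitrary date; only the clock times matter
START = DAY + timedelta(hours=1)         # jobs start at 01:00 AM
DEADLINE = DAY + timedelta(hours=10)     # the report is read at 10:00 AM


def finish_time(start, piece_time, pieces, lost_work):
    """Finish time of a job made of `pieces` sequential pieces.

    A single crash throws away `lost_work` (the time already spent on the
    piece that was running), and that piece is then rerun from scratch.
    """
    return start + piece_time * pieces + lost_work


# Hypothetical numbers, not taken from the talk.
cases = {
    # One 6-hour job, crashes at 05:30 AM after 4.5 hours of work.
    "big job, power 1": finish_time(START, timedelta(hours=6), 1, timedelta(hours=4, minutes=30)),
    # Twice the computing power: one 3-hour job, crashes after 2 hours.
    "big job, power 2": finish_time(START, timedelta(hours=3), 1, timedelta(hours=2)),
    # Same power as case 1, but split into six 1-hour jobs; a crash loses 30 minutes.
    "small jobs, power 1": finish_time(START, timedelta(hours=1), 6, timedelta(minutes=30)),
}

for name, done in cases.items():
    status = "OK" if done <= DEADLINE else "late"
    print(f"{name}: finished at {done:%H:%M} ({status})")
```

Under these assumed numbers, the big job on small resources misses the deadline after one crash, while either more power or smaller jobs leaves enough room for the retry.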

Page 11: Technologies, Data Analytics Service and Enterprise Business

Missions of Data Analytics Service for Enterprise Business

• Fast "enough"
• Cheap "enough"
• Stable
• Easy to use "enough"

Page 12: Technologies, Data Analytics Service and Enterprise Business

Technologies for Data Analytics Service

• Data Management System

• Distributed Processing System

• Queue and Scheduler

• Connecting Systems and Services

• Controlling Jobs, Tasks and Workflows

• Managing Retries

Page 13: Technologies, Data Analytics Service and Enterprise Business

Data Management Systems

• Data Collecting Systems
  • Fluentd, Embulk, ...

• Distributed Database and Storage
  • Storing data in efficient format (MPC1, MessagePack columnar format)
  • Managing index
  • Managing schema
  • Providing transactional operations

Page 14: Technologies, Data Analytics Service and Enterprise Business

Distributed Processing System

• Running Analytics Queries
  • MapReduce engines: Hadoop + Hive
  • MPP (Massively Parallel Processing) systems: Presto

• Running Data Management Jobs
  • Converting data formats, re-indexing, detecting schema, ...

• Computing Resource Management
  • Customer queries (and internal use) must be separated!

Page 15: Technologies, Data Analytics Service and Enterprise Business

Queue and Scheduler

• Queuing Queries
  • Allows queries to be enqueued and run one after another

• Scheduling Queries
  • Run queries when it's ok to run

[Figure labels: Power 1, Customer Request, Data for Queries, 01:00AM, 03:00AM]

Page 16: Technologies, Data Analytics Service and Enterprise Business

Connecting Systems and Services

• Non-"connected" Data Analytics Service

[Figure: "Ultra Super Great Analytics Service" with Database and Query Result; caption: Not "easy enough"]

Page 17: Technologies, Data Analytics Service and Enterprise Business

Connecting Systems and Services

• Data Analytics Service MUST be "connected"

[Figure: Treasure Data with Database and Query Result]

Page 18: Technologies, Data Analytics Service and Enterprise Business

Control Jobs/Tasks

• A Job needs results of other jobs

[Figure: "Risky" time based schedule, A,B,C -> D,E -> F; time labels 01:00AM, 03:10AM ?, 03:30AM, 06:30AM ?, 07:00AM, 10:00AM]

• "Risk" for failures

[Figure: "Risky" time based schedule, A,B,C -> D,E -> F; time labels 01:00AM, 03:30AM, 07:00AM, 08:15AM ?, 10:00AM; annotations "Crash!" and "Oops, No Data..." (twice)]

Page 19: Technologies, Data Analytics Service and Enterprise Business

Control Jobs/Tasks

• A Job needs results of other jobs

[Figure: time based schedule, A,B,C -> D,E -> F, with two "Space for Retries" gaps; time labels 01:00AM, 03:10AM ?, 06:00AM, 08:30AM ?, 11:00AM ???]

• "Time based schedule" needs
  • Wide space for retries
  • Big resource for fast results (not cheap!)

Page 20: Technologies, Data Analytics Service and Enterprise Business

Control Jobs/Tasks

• Workflow pattern

[Figure: workflow execution, A,B,C -> D,E -> F, with workflow control barriers; time labels 01:00AM, 07:15AM ?, 10:00AM]

• Workflow pattern with retries

[Figure: workflow execution, A,B,C -> D,E -> F, with workflow control barriers and a "Crash!"; time labels 01:00AM, 10:00AM]
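The workflow pattern with retries can be sketched in a few lines. This is a minimal illustration, not how any particular workflow engine works: `TransientError`, `run_with_retries`, `run_workflow` and the simulated crash of task B are all hypothetical. Stages run in order, tasks inside a stage run in parallel, and the next stage starts only when every task of the current one has succeeded (the "barrier").

```python
import concurrent.futures as cf


class TransientError(Exception):
    """Hypothetical marker for a retry-able failure (e.g. a crashed node)."""


def run_with_retries(task, max_attempts=3):
    """Call `task` until it succeeds or `max_attempts` is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except TransientError:
            if attempt == max_attempts:
                raise


def run_workflow(stages, max_attempts=3):
    """Run a list of stages; each stage maps a task name to a callable."""
    results = {}
    with cf.ThreadPoolExecutor() as pool:
        for stage in stages:
            futures = {
                name: pool.submit(run_with_retries, task, max_attempts)
                for name, task in stage.items()
            }
            # Barrier: wait for every task in this stage before moving on.
            for name, future in futures.items():
                results[name] = future.result()
    return results


log = []
crashes_left = {"B": 1}  # simulate: task B crashes once, then succeeds


def make_task(name):
    def task():
        if crashes_left.get(name, 0) > 0:
            crashes_left[name] -= 1
            log.append(f"{name}: crash")
            raise TransientError(name)
        log.append(f"{name}: ok")
        return name
    return task


# A,B,C -> D,E -> F
stages = [
    {name: make_task(name) for name in "ABC"},
    {name: make_task(name) for name in "DE"},
    {"F": make_task("F")},
]
results = run_workflow(stages)
print(log)
```

Because D and E wait on the barrier rather than on a clock time, the retry of B only delays the later stages by the time it actually costs, instead of leaving them to start with no data.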

Page 21: Technologies, Data Analytics Service and Enterprise Business

Retries !!!!!!!!!!!!!!!!!!!!!!!!

Page 22: Technologies, Data Analytics Service and Enterprise Business

Retry-able Failures or Not

• "Retry-able Failures"
  • Crash of compute nodes
  • Communication errors
  • Service down of "connected" services
  • ...

• Non-"Retry-able Failures"
  • SQL syntax error
  • Missing data sources / Missing tables
  • Wrong API key of "connected" services
  • ...
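One way to express this distinction in code is to put the two kinds of failure in separate exception hierarchies and retry only one of them. The sketch below assumes hypothetical exception classes and a hypothetical `run_with_retry` helper; none of these names come from the talk or from a real library.

```python
import time


class RetryableError(Exception):
    """Hypothetical base class: the failure may go away if we try again."""


class NodeCrashed(RetryableError):
    pass


class CommunicationError(RetryableError):
    pass


class NonRetryableError(Exception):
    """Hypothetical base class: retrying cannot help; a human must fix it."""


class SqlSyntaxError(NonRetryableError):
    pass


class MissingTable(NonRetryableError):
    pass


def run_with_retry(job, max_attempts=3, wait_seconds=0.0):
    """Retry `job` on retry-able failures only; anything else propagates at once."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except RetryableError:
            if attempt == max_attempts:
                raise
            time.sleep(wait_seconds)


calls = {"flaky": 0, "broken": 0}


def flaky_job():
    """Fails twice with a communication error, then succeeds."""
    calls["flaky"] += 1
    if calls["flaky"] < 3:
        raise CommunicationError("connection reset")
    return "done"


def broken_job():
    """Always fails: no number of retries fixes a syntax error."""
    calls["broken"] += 1
    raise SqlSyntaxError("syntax error in query")


flaky_result = run_with_retry(flaky_job)
try:
    run_with_retry(broken_job)
except NonRetryableError as error:
    broken_error = error

print(flaky_result, calls)
```

The flaky job is attempted three times and succeeds; the broken job is attempted once and fails immediately, so no time before the deadline is wasted on retries that cannot succeed.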

Page 23: Technologies, Data Analytics Service and Enterprise Business

Retry-able Operations ?

• For example.... :
  • Run Query A
  • Append result of A into B
  • Count rows of B

• Failures?:
  • Run Query A
  • Append result of A into B ... (Failed!)
  • Retry Query A
  • Retry to append result of A into B
  • Count rows of B

[Figure: Table B, drawn twice; row labels 1234, 12, 1234]
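The problem with retrying an append can be reproduced with an in-memory SQLite database. This is a sketch under assumptions: the table names `a_source` and `b`, the four rows, and the simulated crash after two rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a_source (v INTEGER)")
conn.executemany("INSERT INTO a_source VALUES (?)", [(1,), (2,), (3,), (4,)])
conn.execute("CREATE TABLE b (v INTEGER)")
conn.commit()


def query_a(conn):
    """Stand-in for 'Run Query A'."""
    return conn.execute("SELECT v FROM a_source ORDER BY v").fetchall()


def append_to_b(conn, rows, fail_after=None):
    """Append rows to B one by one, committing each row.

    If `fail_after` is set, simulate a crash after that many rows were written.
    """
    for i, row in enumerate(rows):
        if fail_after is not None and i == fail_after:
            raise RuntimeError("simulated crash during append")
        conn.execute("INSERT INTO b VALUES (?)", row)
        conn.commit()


try:
    append_to_b(conn, query_a(conn), fail_after=2)   # writes 1, 2 and then fails
except RuntimeError:
    append_to_b(conn, query_a(conn))                 # retry writes 1, 2, 3, 4

row_count = conn.execute("SELECT COUNT(*) FROM b").fetchone()[0]
print(row_count)  # 6, although query A only returns 4 rows
```

The retry itself succeeds, but the rows written before the crash stay in B, so the final count is wrong. The append is not safe to retry.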

Page 24: Technologies, Data Analytics Service and Enterprise Business

Idempotent Operations

• "Idempotent" (Japanese: 冪等である, read "bekitō") operation
  • can get the "same" result when it's executed twice or more

• Idempotent Operation:
  • Run Query A
  • "Replace" table B with result of A
  • Count rows of B

[Figure: Table B, drawn twice; row labels 1234, 12]
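A "replace" can be made idempotent in several ways; the sketch below uses one of them, a delete-and-insert inside a single transaction on an in-memory SQLite database. The table names, the rows and the simulated crash are assumptions for illustration, and the talk does not say which mechanism a real service uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a_source (v INTEGER)")
conn.executemany("INSERT INTO a_source VALUES (?)", [(1,), (2,), (3,), (4,)])
conn.execute("CREATE TABLE b (v INTEGER)")
conn.commit()


def replace_b(conn, fail_after=None):
    """Replace the contents of B with the result of query A, all or nothing.

    If `fail_after` is set, simulate a crash after that many rows were written.
    """
    rows = conn.execute("SELECT v FROM a_source ORDER BY v").fetchall()
    try:
        conn.execute("DELETE FROM b")          # starts the transaction
        for i, row in enumerate(rows):
            if fail_after is not None and i == fail_after:
                raise RuntimeError("simulated crash during replace")
            conn.execute("INSERT INTO b VALUES (?)", row)
        conn.commit()
    except Exception:
        conn.rollback()                        # half-written rows are discarded
        raise


def count_b(conn):
    return conn.execute("SELECT COUNT(*) FROM b").fetchone()[0]


try:
    replace_b(conn, fail_after=2)   # fails after writing 1, 2
except RuntimeError:
    pass
count_after_crash = count_b(conn)

replace_b(conn)                     # retry
count_after_retry = count_b(conn)

replace_b(conn)                     # running it again changes nothing
count_after_replay = count_b(conn)

print(count_after_crash, count_after_retry, count_after_replay)
```

However many times the step runs, and whether or not an earlier attempt failed halfway, B ends up holding exactly the result of A.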

Page 25: Technologies, Data Analytics Service and Enterprise Business

Replay-able Data Analytics Workflow

• Need to do a lot of "trial-and-error"
  • w/ updated queries
  • w/ updated data...

• Idempotent operations make workflows "Replay-able"
  • Fast trial-and-error (PDCA!) cycles
  • → Fast business growth!

Page 26: Technologies, Data Analytics Service and Enterprise Business

Enterprise Business ❤

Technologies

Thank you! @tagomoris