Google for Mobile Apps 16:00: The New Standard for Mobile KPI Analytics: Fluentd + Google Big...

Post on 20-Jul-2015

Category: Technology

The New Standard for Mobile KPI Analytics: Fluentd + Google BigQuery

Kazunori Sato, Developer Advocate, Cloud Platform Team

#gcpライブ

+Kazunori Sato, @kazunori_279

Developer Advocate, Cloud Platform, Google Inc.

- GCP developer community support
- GCP product launch support

Agenda

Big Data in Google and Google BigQuery

Why is BigQuery so fast?

Real-time Streaming Import by Fluentd + BigQuery

Real-time KPI analytics by Lambda Architecture

Big Data in Google and Google BigQuery

100 hours/min

100 petabytes

500+ million users

900+ million devices

Big Data in Google

At Google, we have “big” big data everywhere

What if a Googler is asked: “Can you give me the list of top 20 Android apps installed in 2012?”

At Google, we don’t use MapReduce for this

We use Dremel

Google BigQuery

FROM installlog.2012
ORDER BY count DESC

It scans 100B rows in ~30 sec. No index used.

Google BigQuery: Massively Parallel Query Service

Cost of BigQuery

Storage: $0.020 per GB per month

Queries: $5 per TB
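The two prices quoted above combine into a simple monthly estimate. The sketch below is only an illustration: the function name and the example workload (1,000 GB stored, 20 TB scanned) are assumptions, not figures from the talk, and the prices are the ones shown on the slide.

```python
# Prices as quoted on the slide; current pricing may differ.
STORAGE_USD_PER_GB_MONTH = 0.020
QUERY_USD_PER_TB = 5.0


def estimate_monthly_cost(stored_gb: float, scanned_tb: float) -> float:
    """Return storage cost plus query cost in USD for one month."""
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH
    queries = scanned_tb * QUERY_USD_PER_TB
    return storage + queries


if __name__ == "__main__":
    # Hypothetical workload: 1,000 GB stored and 20 TB scanned per month.
    print(f"${estimate_monthly_cost(1000, 20):.2f}")
```

Under these assumed numbers the query charge dominates the bill, which is why scanning fewer columns matters for cost as well as for speed.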

Gaming, Social, Mobile

Ads, Digital Marketing, DMP,

Media

Monitoring, Alerting and Security

Retail

Internet of Things (IoT)

Applications

BigQuery Analytic Service in the Cloud

Import, Analyze and Export

[Diagram: BigQuery at the center of an import, analyze and export flow. Labels: Event Logs, Databases, IoT Devices, Adwords, DoubleClick, Google Analytics, R and Pandas, Microsoft Excel, Google Spreadsheet, Hadoop/Hive, Spark, BI Tools.]

BigQuery + BI

Tableau Demo

BIME Demo

Why is BigQuery so fast?

Column Oriented Storage

Record Oriented Storage vs. Column Oriented Storage

Less bandwidth, More compression

SELECT top(title), COUNT(*)
FROM publicdata:samples.wikipedia
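To make the "less bandwidth" point concrete, here is a minimal sketch that compares how many values a scan touches in a record-oriented layout and in a column-oriented layout when a query needs only one column. The toy table and the function names are assumptions for illustration; this is not how BigQuery's storage is actually implemented.

```python
from typing import Dict, List

# Toy table: three records with three fields each (hypothetical data).
rows: List[Dict[str, object]] = [
    {"title": "Alpha", "views": 10, "language": "en"},
    {"title": "Beta", "views": 25, "language": "ja"},
    {"title": "Alpha", "views": 7, "language": "en"},
]


def to_columns(records: List[Dict[str, object]]) -> Dict[str, list]:
    """Pivot a list of records into one list per column."""
    columns: Dict[str, list] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def values_read_row_store(records: List[Dict[str, object]]) -> int:
    """A record-oriented scan reads every field of every record."""
    return sum(len(record) for record in records)


def values_read_column_store(columns: Dict[str, list], wanted: List[str]) -> int:
    """A column-oriented scan reads only the requested columns."""
    return sum(len(columns[name]) for name in wanted)


if __name__ == "__main__":
    columns = to_columns(rows)
    print(values_read_row_store(rows))                   # 9
    print(values_read_column_store(columns, ["title"]))  # 3
```

Storing each column contiguously also places similar values next to each other, which is what makes the "more compression" half of the slide plausible.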

Massively Parallel Processing

Scanning 1 TB in 1 sec takes 5,000 disks

Each query runs on thousands of servers
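The disk figure above implies a per-disk read rate that can be checked with a line of arithmetic. The sketch below assumes decimal units (1 TB = 1,000,000 MB) and uses hypothetical helper names; the 5,000-disk figure is the one from the slide.

```python
import math


def per_disk_throughput_mb_s(total_tb: float, seconds: float, disks: int) -> float:
    """Read rate each disk must sustain to finish the scan in time."""
    total_mb = total_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return total_mb / seconds / disks


def disks_needed(total_tb: float, seconds: float, disk_mb_s: float) -> int:
    """Number of disks needed if each one sustains disk_mb_s."""
    total_mb = total_tb * 1_000_000
    return math.ceil(total_mb / seconds / disk_mb_s)


if __name__ == "__main__":
    print(per_disk_throughput_mb_s(1, 1, 5000))  # 200.0
    print(disks_needed(1, 1, 200))               # 5000
```

In other words, the slide's number corresponds to roughly 200 MB/s of sequential read per disk, all running in parallel.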

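Fast aggregation by tree structure

[Diagram: a serving tree with Mixer 0 at the root, two Mixer 1 nodes below it, and four Shards at the leaves reading ColumnIO on Colossus. Query fragments shown in the diagram: SELECT state, year; COUNT(*); GROUP BY state; WHERE year >= 1980 and year < 1990; ORDER BY count_babies DESC; LIMIT 10. COUNT(*) and GROUP BY state appear at more than one level of the tree.]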
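The tree-structured aggregation can be sketched as partial counts computed at the leaves and merged on the way up. This is a toy model under stated assumptions: the function names, the sample rows, and the exact split of work between levels are illustrative, not a description of Dremel's internals.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Row = Tuple[str, int]  # (state, year)


def shard_scan(rows: Iterable[Row]) -> Counter:
    """Leaf: apply the WHERE filter and count rows per state."""
    return Counter(state for state, year in rows if 1980 <= year < 1990)


def mixer_merge(partials: Iterable[Counter]) -> Counter:
    """Intermediate mixer: sum the partial counts of its children."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


def root_mixer(partials: Iterable[Counter], limit: int = 10) -> List[Tuple[str, int]]:
    """Root: merge, order by count descending, apply the limit."""
    return mixer_merge(partials).most_common(limit)


if __name__ == "__main__":
    # Hypothetical data spread over four shards.
    shards = [
        [("CA", 1981), ("TX", 1985), ("CA", 1995)],
        [("CA", 1989), ("NY", 1980)],
        [("TX", 1979), ("TX", 1984)],
        [("NY", 1990), ("CA", 1982)],
    ]
    leaves = [shard_scan(s) for s in shards]
    level1 = [mixer_merge(leaves[:2]), mixer_merge(leaves[2:])]
    print(root_mixer(level1))  # [('CA', 3), ('TX', 2), ('NY', 1)]
```

Because counting is associative, each level only forwards small per-state totals rather than raw rows, so the root handles very little data.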

Inside BQ: Big JOIN

Big JOIN: executed with shuffling

- Both tables can be > 8MB

- BQ shuffler doesn’t sort; just hash partitioning

From: Google BigQuery Analytics
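A shuffle based on hash partitioning, with no sorting, can be sketched as follows. The slide only states that the shuffler hash-partitions rather than sorts; the per-partition hash join, the function names, and the sample tables below are assumptions made for illustration.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def partition(rows: List[dict], key: str, num_partitions: int) -> Dict[int, List[dict]]:
    """Hash-partition rows by join key; no sorting is involved."""
    parts: Dict[int, List[dict]] = defaultdict(list)
    for row in rows:
        bucket = zlib.crc32(str(row[key]).encode()) % num_partitions
        parts[bucket].append(row)
    return parts


def shuffle_join(left: List[dict], right: List[dict], key: str,
                 num_partitions: int = 4) -> List[dict]:
    """Join two tables by shuffling both sides into matching partitions."""
    left_parts = partition(left, key, num_partitions)
    right_parts = partition(right, key, num_partitions)
    joined: List[dict] = []
    for bucket in range(num_partitions):
        # Rows with equal keys always land in the same bucket, so each
        # bucket can be joined independently (here with a small hash table).
        index: Dict[object, List[dict]] = defaultdict(list)
        for row in right_parts.get(bucket, []):
            index[row[key]].append(row)
        for row in left_parts.get(bucket, []):
            for match in index.get(row[key], []):
                joined.append({**row, **match})
    return joined


if __name__ == "__main__":
    # Hypothetical tables.
    users = [{"user_id": 1, "name": "a"}, {"user_id": 2, "name": "b"}]
    purchases = [
        {"user_id": 1, "item": "x"},
        {"user_id": 1, "item": "y"},
        {"user_id": 3, "item": "z"},
    ]
    result = shuffle_join(users, purchases, "user_id")
    print(sorted(r["item"] for r in result))  # ['x', 'y']
```

Since each bucket is independent, the buckets could be processed on different workers, which is the point of shuffling before a join between two large tables.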

Real-time Streaming Import with Fluentd + BigQuery

“I want a real-time dashboard for collecting the votes and system stats from 200 servers”

BigQuery Streaming

Low cost: $0.01 per 100,000 rows

Real time availability of data

100,000 rows per second x tables
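For a sense of scale, the streaming price can be applied to a sustained event rate. The sketch below assumes a hypothetical load of 200 servers each emitting 50 rows per second; only the price per 100,000 rows comes from the slide.

```python
STREAMING_USD_PER_100K_ROWS = 0.01  # price quoted on the slide
SECONDS_PER_DAY = 86_400


def rows_per_day(rows_per_second: int) -> int:
    """Total rows produced in a day at a constant rate."""
    return rows_per_second * SECONDS_PER_DAY


def streaming_cost_usd(rows: int) -> float:
    """Streaming insert cost for the given number of rows."""
    return rows / 100_000 * STREAMING_USD_PER_100K_ROWS


if __name__ == "__main__":
    # Hypothetical load: 200 servers x 50 rows per second each.
    daily_rows = rows_per_day(200 * 50)
    print(daily_rows)                                # 864000000
    print(round(streaming_cost_usd(daily_rows), 2))  # 86.4
```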

Slideshare uses Fluentd for collecting logs from >500 servers.

"We take full advantage of its extendable plugin architecture and use it as a message bus that collects data from hundreds of servers into multiple backend systems." Sylvain Kalache, Operations Engineer

Why Fluentd? Because it’s super easy to use, and has extensive plugins written by an active community.

Now Fluentd logs can be imported into BigQuery really easily, at ~1M rows/s

Search “fluentd bigquery” on GitHub

Google Spreadsheet

IoT Example: RasPi > BigQuery > Spreadsheet

Real-time KPI Analytics with Lambda Architecture

Lambda Architecture is a complementary pair of:

- in-memory real-time processing: fast, but small and volatile
- large HDD/SSD batch processing: slow, but large and persistent

Proposed by Nathan Marz

ex. Twitter Summingbird
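The pairing of a fast volatile layer with a slow persistent layer can be sketched as a counter that answers queries by combining both views. The class and method names below are hypothetical, and the in-process lists stand in for what would be separate systems in practice.

```python
from collections import Counter
from typing import List


class LambdaCounter:
    """Toy event counter with a batch view and a real-time view."""

    def __init__(self) -> None:
        self.batch_view: Counter = Counter()     # large, persistent, updated slowly
        self.realtime_view: Counter = Counter()  # small, volatile, updated per event
        self.pending: List[str] = []             # events not yet absorbed by the batch layer

    def ingest(self, event_type: str) -> None:
        """Record an event; it is visible immediately via the real-time view."""
        self.pending.append(event_type)
        self.realtime_view[event_type] += 1

    def run_batch(self) -> None:
        """Fold pending events into the batch view and reset the real-time view."""
        self.batch_view.update(self.pending)
        self.pending.clear()
        self.realtime_view.clear()

    def query(self, event_type: str) -> int:
        """Serve a query by combining both views."""
        return self.batch_view[event_type] + self.realtime_view[event_type]


if __name__ == "__main__":
    counter = LambdaCounter()
    for _ in range(3):
        counter.ingest("vote")
    counter.run_batch()
    for _ in range(2):
        counter.ingest("vote")
    print(counter.query("vote"))  # 5
```

Losing the real-time view costs only the events since the last batch run, which is why that layer can afford to be small and volatile.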

Norikra: an open source stream processing tool

Production use at LINE, the largest Asian SNS with 500M users, for massive log analysis

Super easy to use: requires no heavyweight cluster setup

Real-time KPI analysis with SQL-based in-memory continuous query
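An in-memory continuous query keeps only the events inside a time window and re-evaluates the aggregate as time moves on. The sketch below shows that idea in plain Python; it is not Norikra's API (Norikra expresses such queries in SQL), the class name is hypothetical, and it assumes events arrive in timestamp order.

```python
from collections import deque
from typing import Deque


class SlidingWindowCount:
    """Count events seen within the last `window_seconds`."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self.timestamps: Deque[float] = deque()

    def add(self, timestamp: float) -> None:
        """Record one event; timestamps are assumed to be non-decreasing."""
        self.timestamps.append(timestamp)

    def count(self, now: float) -> int:
        """Evict events that fell out of the window, then return the count."""
        cutoff = now - self.window_seconds
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        return len(self.timestamps)


if __name__ == "__main__":
    # Ten-minute window, matching the "last 10 minutes" style of KPI question.
    window = SlidingWindowCount(window_seconds=600)
    for ts in (0, 200, 650):
        window.add(ts)
    print(window.count(now=700))  # 2
```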

Proposed Solution: Lambda Architecture

1. Fluentd: event log collection from various event sources
2. Norikra: easy, scalable real time stream processing
3. BigQuery: scalable query engine for large datasets
4. Google Spreadsheet: flexible dashboard with charts
5. Docker: repeatable deployment in 10 minutes

Applications

Real-time KPI Dashboard

● Gaming: How many new users have purchased their first item in the last 10 minutes?

● Media: How many people hit the vote button during the live TV program?

● Retail: What is the current total revenue of all stores nationwide?

● Ads: What is the conversion rate of impressions/clicks to purchase?

Real-time Monitoring and Alerting

● Correlate system resource usage with access/application logs

● Real-time DoS or cheating detection

● Send e-mail notification from Apps Script triggered by Norikra

Solution Benefits

Easy real-time SQL-based KPI analytics at 1M+ rows/sec by Norikra

Easy real-time streaming import at 1M+ rows/sec by BigQuery + Fluentd

Real-time dashboard with Google Spreadsheet

Deployable within 10 min with Docker

Search “lambda dashboard” on GitHub
