AWS Summit Seoul 2015 - Data Analytics for Game Service Innovation

SEOUL © 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved

Upload: amazon-web-services-korea

Post on 16-Jul-2015


TRANSCRIPT


Game Analytics with AWS

김일호 - Solution Architect, Amazon Web Services

Agenda

• Real-time data analytics
– Data ingestion and storage
– Data processing

• GREE International
– Game data analytics
– Data analytics architecture
– Tips and notes

• Key takeaways

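What is real-time data analytics?

Real-time data ingestion
• High scalability
• Data durability
• Resource elasticity
• Data can be read concurrently or multiple times

Continuous data processing
• Load balancing of stream data
• Availability, checkpoint / replay
• Resource elasticity
• Parallel processing by multiple applications

Real-time data ingestion + continuous data processing:
• Continuous data flow and processing
• Low end-to-end latency
• Sustained real-time workloads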

Data ingestion

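A simple example...

[Diagram: foo-analysis.com feeds a single global top-10 computation.]

(Naturally) distribute the workload…

[Diagram: foo-analysis.com feeds three local top-10 computations, whose results are combined into the global top-10.]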
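The distributed top-10 idea can be sketched in a few lines of Python. This is only an illustration, not code from the talk: `partition`, `local_top_k` and `global_top_k` are hypothetical helper names, and the byte-sum hash is a stand-in for a real partition-key hash. Merging local results is exact here because every occurrence of an item is routed to the same worker.

```python
from collections import Counter

def partition(items, workers):
    """Route each item to one worker by a deterministic hash of the item."""
    buckets = [[] for _ in range(workers)]
    for item in items:
        # Stand-in for a partition-key hash (assumption for this sketch).
        buckets[sum(item.encode("utf-8")) % workers].append(item)
    return buckets

def local_top_k(items, k=10):
    """Top-k items seen by a single worker, as (item, count) pairs."""
    return Counter(items).most_common(k)

def global_top_k(local_results, k=10):
    """Combine the workers' local top-k lists into one global top-k."""
    merged = Counter()
    for result in local_results:
        for item, count in result:
            merged[item] += count
    return merged.most_common(k)

page_views = ["/a"] * 5 + ["/b"] * 3 + ["/c"] * 4 + ["/d"]
local = [local_top_k(bucket, k=2) for bucket in partition(page_views, 3)]
top = global_top_k(local, k=2)
print(top)
```

If items were spread across workers at random instead, an item could miss every local top-k list and the merged result would only be approximate.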

Or put a scalable data broker in between…

[Diagram labels: foo-analysis.com; Global top-10; KINESIS; Data Record; Stream; Shard; Partition Key; Worker; My top-10; Data Record; Sequence Number (14, 17, 18, 21, 23)]

Amazon Kinesis: a managed store for data streams

Amazon Kinesis: a multi-purpose data broker

[Diagram: data sources (including foo-analysis.com) send records through the AWS endpoint into a stream made up of Shard 1, Shard 2, ... Shard N, spanning three Availability Zones. Four applications read the stream: App. 1 [Data Archive], App. 2 [Metric Extraction], App. 3 [Sliding Window Analysis], App. 4 [Machine Learning]. Downstream services shown: S3, DynamoDB, Redshift, EMR.]

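Kinesis – Streams and Shards

• Stream: the Kinesis stream object that collects and stores data
• Shard: the unit of throughput capacity
• Put: up to 1 MB/sec or 1,000 TPS per shard, whichever limit is reached first
• Get: up to 2 MB/sec or 5 TPS per shard, whichever limit is reached first
• Scale by adding or removing shards
• 24-hour data retention (window)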

Sizing a Kinesis stream

Suppose 2 producers each send 2 KB records at 500 TPS.

You can start with 2 shards, which give a total capacity of 2 MB/s in and 4 MB/s out.

[Diagram: Producers → 2 shards, each receiving 2 KB * 500 TPS = 1000 KB/s → Application]

Sizing a Kinesis stream

If 3 data consumer applications are running, you can simply add a shard to the stream to spread the load.

[Diagram: Producers (2 KB * 500 TPS = 1000 KB/s each) → 3 shards → 3 applications]
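The sizing arithmetic on these two slides can be written as a small calculator. This is a rough sketch under stated assumptions: it uses the per-shard limits quoted earlier (1 MB/sec or 1,000 TPS in, 2 MB/sec out), treats 1 MB as 1000 KB as the slides do, assumes every consumer application reads the full stream, and ignores the 5 TPS read limit. `shards_needed` is a hypothetical helper name.

```python
import math

PUT_KB_PER_SHARD = 1000       # 1 MB/sec write capacity per shard
PUT_RECORDS_PER_SHARD = 1000  # 1,000 TPS write capacity per shard
GET_KB_PER_SHARD = 2000       # 2 MB/sec read capacity per shard

def shards_needed(producers, record_kb, tps_per_producer, consumers=1):
    """Smallest shard count that covers write bytes, write TPS and read bytes."""
    write_kb = producers * record_kb * tps_per_producer
    write_tps = producers * tps_per_producer
    read_kb = write_kb * consumers  # each consumer app reads everything
    return max(
        math.ceil(write_kb / PUT_KB_PER_SHARD),
        math.ceil(write_tps / PUT_RECORDS_PER_SHARD),
        math.ceil(read_kb / GET_KB_PER_SHARD),
    )

one_app = shards_needed(producers=2, record_kb=2, tps_per_producer=500)
three_apps = shards_needed(producers=2, record_kb=2, tps_per_producer=500, consumers=3)
print(one_app, three_apps)
```

With one application the write volume (2 MB/s) is the bottleneck, so 2 shards suffice. With three applications the read volume (6 MB/s) dominates, which is why the second slide adds a third shard.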

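Amazon Kinesis: distributed streams

• Supports both batch and continuous real-time processing
• Stream capacity can be scaled up or down easily without losing data
• Applications with different purposes can continuously access and process the last 24 hours of data
• Scales to GB/sec while storing data durably
– Records are stored across multiple Availability Zones
• Multiple applications can process the data at the same time
– RDBMS, S3, Data Warehouse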

Data processing

Batch / Micro-batch / Real time

Real-time analytics patterns…

[Diagram labels: Data Streams; Data Archive; Batch Analysis (DW, Hadoop); Streaming Analytics (Spark, Storm, KCL); Deep Learning; Notifications & Alerts; Dashboards/visualizations; APIs]

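Real-time analytics approaches

• Streaming – events arrive continuously and are handled within seconds (e.g., analyzing transaction logs to detect financial fraud)
• Micro-batch – information that may have business impact is obtained within minutes (e.g., analyzing transaction logs from other regions to detect anomalies)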

Kinesis Client Library (KCL)

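• Helps read data from multiple shards in a distributed way
• Handles failures
• Adapts flexibly to changes in the shards
• Supports distributed processing such as worker scaling and checkpointing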

KCL Design Components

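• Worker: the processing unit that runs on each application instance
• Record processor: the processing unit that actually reads and processes data from a shard of the Kinesis stream
• Checkpointer: tracks which records in a shard have already been processed

If a worker fails, KCL restarts processing so that it continues from the last record that was being processed.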
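The checkpoint-and-resume behaviour can be illustrated with a toy simulation. This is not the KCL API: `Checkpointer` and `run_worker` are hypothetical names, the shard is an in-memory list, and the sequence numbers are borrowed from the earlier diagram.

```python
class Checkpointer:
    """Remembers the last processed sequence number for each shard."""
    def __init__(self):
        self._last = {}

    def checkpoint(self, shard_id, sequence_number):
        self._last[shard_id] = sequence_number

    def last(self, shard_id):
        return self._last.get(shard_id)

def run_worker(shard_id, records, checkpointer, handle, fail_at=None):
    """Process records after the checkpoint; optionally crash at one record."""
    last = checkpointer.last(shard_id)
    for sequence_number, data in records:
        if last is not None and sequence_number <= last:
            continue  # already processed before the restart
        if sequence_number == fail_at:
            raise RuntimeError("simulated worker failure")
        handle(data)
        checkpointer.checkpoint(shard_id, sequence_number)

records = [(14, "a"), (17, "b"), (18, "c"), (21, "d"), (23, "e")]
checkpointer = Checkpointer()
seen = []

try:
    run_worker("shard-1", records, checkpointer, seen.append, fail_at=18)
except RuntimeError:
    pass  # the first worker died while handling record 18

# A replacement worker resumes from the checkpoint (17) and finishes the shard.
run_worker("shard-1", records, checkpointer, seen.append)
print(seen, checkpointer.last("shard-1"))
```

The sketch checkpoints after every record, which is the simplest case. A consumer that checkpoints less often will reprocess some records after a restart, so downstream steps should tolerate duplicates.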

Amazon Kinesis Connector Library

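• Amazon S3
– Archives data by creating files in S3
– Collects records and stores them in sequentially named files

• Amazon Redshift
– Micro-batches data directly into the Redshift DW (manifest support)
– Transforms messages into the desired format

• Amazon DynamoDB
– Writes directly to DB tables with batch puts (the BatchWriteItem API)
– Transforms messages into the desired format

• Elasticsearch
– Writes data directly to an Elasticsearch cluster
– Transforms messages into the desired format

[Diagram: Kinesis → S3, DynamoDB, Redshift]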

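EMR and Kinesis integration

Read data directly from Kinesis and process it with Hive, Pig, Streaming, or Cascading.

Batch-process the real-time data source directly, while other applications read and process the same data at the same time.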

Spark Streaming: basic concepts

[Diagram: Messages → Receiver → DStream (RDD@T1, RDD@T2)]

• Messages are abstracted as Discretized Streams (DStreams)
• Data is processed in order, one RDD at a time

http://spark.apache.org/docs/latest/streaming-kinesis-integration.html

Apache Storm: basic concepts

• Streams: unbounded sequences of data tuples
• Spout: the source of a stream
• Bolts: process input streams and produce new output streams
• Topologies: networks of connected spouts and bolts

https://github.com/awslabs/kinesis-storm-spout

Overall architecture

[Diagram labels: Producer; App Client; Amazon Kinesis; KCL (three consumers); S3; EMR; DynamoDB; Redshift; BI Tools; paths marked Batch, Micro-batch, and Real time]

GREE game data analytics

GREE Headquarters: Tokyo, Japan
GREE International, Inc.: San Francisco, CA
GREE Canada: Vancouver, BC

A Global Gaming Powerhouse

QUICK FACTS
• 6 continents playing GREE games
• 1,882 employees worldwide
• 13 games made in North America

MILESTONES
• 2004, 2011, 2013

GAME STATS - 4 titles in top 100 grossing*

Crime City (Studios)
• Reached Top 10 Grossing in 140 countries
• Top 100 Grossing in 19 countries, over 3 years since launch

Knights & Dragons (Publishing)
• Reached Top 10 Grossing in 41 countries
• Top 100 Grossing in 22 countries

*As of Sep. 2014 – Source: App Annie

GREE data analytics flow

[Diagram labels: Ad Clicks; Downloads; Perf Data; Attribution; Campaign Performance; SC Balance; HC Balance; IAP; Player Targeting]

Data Collection

Data sources
• Mobile Devices
• Game Servers
• Ad Networks

Data volume and growth
• 500 GB+/day
• 500M+ events/day
• Size of event ~ 1 KB

Contents of an analytics event

{"player_id":"323726381807586881","player_level":169,"device":"iPhone 5","version":"iOS 7.1.2","platfrom":"ios","client_build":"440","db":"mw_dw_ios","table":"player_login","uuid":"1414566719-rsl3hvhu7o","time_created":"2014-10-29 00:11:59"}
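As a rough illustration of how a consumer might read such an event, the sketch below parses the sample record, checks a few fields, and derives the target table from `db` and `table`. The `REQUIRED` list and `parse_event` are assumptions made for this example. The talk does not say which fields GREE validates.

```python
import json
from datetime import datetime

RAW = (
    '{"player_id":"323726381807586881","player_level":169,'
    '"device":"iPhone 5","version":"iOS 7.1.2","platfrom":"ios",'
    '"client_build":"440","db":"mw_dw_ios","table":"player_login",'
    '"uuid":"1414566719-rsl3hvhu7o","time_created":"2014-10-29 00:11:59"}'
)

# Hypothetical set of mandatory fields, chosen only for illustration.
REQUIRED = ("player_id", "db", "table", "uuid", "time_created")

def parse_event(line):
    """Parse one JSON event, reject incomplete ones, and type the timestamp."""
    event = json.loads(line)
    missing = [key for key in REQUIRED if key not in event]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    event["time_created"] = datetime.strptime(
        event["time_created"], "%Y-%m-%d %H:%M:%S"
    )
    return event

event = parse_event(RAW)
target = f'{event["db"]}.{event["table"]}'
print(target, event["time_created"].isoformat())
```

The `uuid` field is what makes de-duplication possible later in the pipeline, since a re-sent event carries the same identifier.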

Key Requirements

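• Guaranteed data delivery at all times
• Zero data loss
• Zero data corruption
• Easy to add new processing
• Near-real-time analytics
• Real-time analytics
• Minimal management and operations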

Data analytics architecture

[Diagram labels: Game DB; Game Servers; Kinesis; S3 Consumer; Amazon S3 (JSON); Amazon S3 (DSV); Amazon EMR; Amazon Redshift; Real-time Stats Consumer; ElastiCache (Redis); Dashboard]

Amazon Kinesis Sender

[Diagram: the Sender reads analytics files into a read buffer, compresses the data (50KB), and sends it with PutRecord to the Kinesis stream (Shard 1, Shard 2, Shard 3, ... Shard n). It uses Describe Stream to sync shards.]
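The batch-then-compress step of a sender could look roughly like this. The sketch assumes newline-delimited JSON, `zlib` compression, and a flush once the uncompressed buffer reaches 50 KB. The slide only shows "Compress" and "50KB", so these details are guesses. `make_batches` and `read_batch` are hypothetical names, and the actual PutRecord call is left out.

```python
import json
import zlib

MAX_BATCH_BYTES = 50 * 1024  # assumed flush threshold (uncompressed)

def make_batches(events, max_bytes=MAX_BATCH_BYTES):
    """Group events into newline-delimited JSON batches and compress each one."""
    batch, size = [], 0
    for event in events:
        line = json.dumps(event, separators=(",", ":")).encode("utf-8") + b"\n"
        if batch and size + len(line) > max_bytes:
            yield zlib.compress(b"".join(batch))
            batch, size = [], 0
        batch.append(line)
        size += len(line)
    if batch:
        yield zlib.compress(b"".join(batch))

def read_batch(payload):
    """Inverse of make_batches for one payload (what a consumer would do)."""
    return [json.loads(line) for line in zlib.decompress(payload).splitlines()]

events = [
    {"uuid": f"{i}-x", "table": "player_login", "player_level": i}
    for i in range(2000)
]
payloads = list(make_batches(events))
restored = [event for payload in payloads for event in read_batch(payload)]
print(len(payloads), max(len(p) for p in payloads))
```

Each payload would then be sent as the data of one record, so one put carries many events.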

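Considerations when producing and sending data

• A single stream vs. a separate stream per game
• Sending in batches vs. sending every event individually
• Compressed vs. uncompressed
• PartitionKey vs. ExplicitHashKey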
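The last trade-off is easier to see with the hashing spelled out. Kinesis maps a partition key to a 128-bit integer with MD5 and routes the record to the shard whose hash key range contains that value. An explicit hash key supplies the integer directly. The sketch below assumes evenly split shard ranges and reads the digest as a big-endian integer. Both are assumptions of this example, and the helper names are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128

def even_shard_ranges(shard_count):
    """Split the 128-bit hash key space into equal, contiguous ranges."""
    step = HASH_SPACE // shard_count
    return [
        (i * step, HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1)
        for i in range(shard_count)
    ]

def hash_of_partition_key(partition_key):
    """MD5 of the key, read as a 128-bit integer (big-endian assumed)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

def shard_for(hash_value, ranges):
    """Index of the shard whose range contains the hash value."""
    for index, (start, end) in enumerate(ranges):
        if start <= hash_value <= end:
            return index
    raise ValueError("hash value outside the key space")

ranges = even_shard_ranges(4)
# PartitionKey: the service picks the shard through the hash.
by_partition_key = shard_for(hash_of_partition_key("323726381807586881"), ranges)
# ExplicitHashKey: the sender picks the shard by choosing a value in its range.
by_explicit_key = shard_for(ranges[2][0], ranges)
print(by_partition_key, by_explicit_key)
```

A partition key keeps all records for one key on the same shard. An explicit hash key lets a sender that batches many players into one record spread its batches evenly across the shards it discovered.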

Consumer: storing data in S3 in DSV format

[Diagram: Kinesis stream (Shard 1, Shard 2, ... Shard n) → consumers in an Auto Scaling Group, each running the Kinesis Client Library with one or more record processors. Steps shown for each record: Decompress, De-Dupe, Validation, Target Table, DSV Transformation, Buffer (flushed on size/timeout), Compress. Outputs: S3 and the File Metadata DB.]
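A minimal sketch of the decompress, de-dupe and DSV-transformation steps is shown below. It assumes `zlib`-compressed, newline-delimited JSON payloads, a `|` delimiter, and a hypothetical column list. De-duplication uses an in-memory set of `uuid` values, whereas a real consumer would need a store that survives restarts.

```python
import csv
import io
import json
import zlib

# Hypothetical column order for the target table.
COLUMNS = ["uuid", "player_id", "player_level", "time_created"]

def process_payload(payload, seen_uuids, delimiter="|"):
    """Decompress one payload, drop duplicate events, and emit DSV text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for line in zlib.decompress(payload).splitlines():
        event = json.loads(line)
        if event["uuid"] in seen_uuids:
            continue  # the same event was delivered more than once
        seen_uuids.add(event["uuid"])
        writer.writerow([event.get(column, "") for column in COLUMNS])
    return out.getvalue()

events = [
    {"uuid": "u1", "player_id": "1", "player_level": 3,
     "time_created": "2014-10-29 00:11:59"},
    {"uuid": "u1", "player_id": "1", "player_level": 3,
     "time_created": "2014-10-29 00:11:59"},
    {"uuid": "u2", "player_id": "2", "player_level": 7,
     "time_created": "2014-10-29 00:12:03"},
]
payload = zlib.compress("\n".join(json.dumps(e) for e in events).encode("utf-8"))
dsv = process_payload(payload, seen_uuids=set())
print(dsv)
```

Using `csv.writer` rather than a plain string join means values that contain the delimiter are quoted instead of silently corrupting a row.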

Loading data directly into Amazon Redshift

[Diagram: files in Amazon S3 are tracked in the File Metadata DB. Several loaders each run Create Manifest → Execute COPY against Amazon Redshift. Other labels: Update Status, Transaction, Status.]
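The "Create Manifest → Execute COPY" step can be sketched as two small builders. The manifest layout (`entries` with `url` and `mandatory`) and the `MANIFEST` option of `COPY` are standard Redshift features. The bucket, table, role ARN, delimiter and the `GZIP` option are placeholders and assumptions for illustration, and the statement is only built here, not executed.

```python
import json

def build_manifest(s3_urls):
    """Manifest listing exactly the files one COPY should load."""
    entries = [{"url": url, "mandatory": True} for url in s3_urls]
    return json.dumps({"entries": entries}, indent=2)

def build_copy_sql(table, manifest_url, iam_role, delimiter="|"):
    """COPY statement that loads the files named in the manifest.

    Values are interpolated directly, so they must come from trusted config.
    """
    return (
        f"COPY {table} FROM '{manifest_url}' "
        f"IAM_ROLE '{iam_role}' "
        f"DELIMITER '{delimiter}' GZIP MANIFEST;"
    )

files = [
    "s3://example-bucket/dsv/player_login/0001.gz",  # hypothetical paths
    "s3://example-bucket/dsv/player_login/0002.gz",
]
manifest = build_manifest(files)
sql = build_copy_sql(
    table="player_login",
    manifest_url="s3://example-bucket/manifests/player_login.manifest",
    iam_role="arn:aws:iam::123456789012:role/example-redshift-load",
)
print(manifest)
print(sql)
```

Loading through a manifest keeps each COPY to a known set of files, which fits the diagram's status tracking in the File Metadata DB.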

Consumer: real-time stats board

[Diagram: Kinesis stream (Shard 1, Shard 2, ... Shard n) → consumers in an Auto Scaling Group, each running the Kinesis Client Library with one or more record processors. Steps shown for each record: Decompress, De-Dupe, Target Table, Filter Events. A Configuration defines Metric, Segment & Value, and Timeslot. Results go to ElastiCache (Redis) and are shown on the Dashboard.]
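The filter-and-count logic for such a stats consumer could look like the sketch below. A plain dictionary stands in for ElastiCache (Redis), where an increment command would play the same role. The configuration format, the metric name and the 5-minute slot are all assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical configuration: count logins per platform in 5-minute slots.
CONFIG = {
    "table": "player_login",
    "metric": "logins",
    "segment": "platfrom",  # field name as it appears in the sample event
    "slot_minutes": 5,
}

def timeslot(timestamp, minutes):
    """Round a 'YYYY-MM-DD HH:MM:SS' timestamp down to the start of its slot."""
    t = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    t = t.replace(minute=t.minute - t.minute % minutes, second=0)
    return t.strftime("%Y-%m-%d %H:%M")

def update_stats(counters, event, config=CONFIG):
    """Filter the event and bump the counter for (metric, segment, slot)."""
    if event["table"] != config["table"]:
        return
    key = (
        config["metric"],
        event[config["segment"]],
        timeslot(event["time_created"], config["slot_minutes"]),
    )
    counters[key] += 1

counters = defaultdict(int)  # stand-in for the Redis counters
events = [
    {"table": "player_login", "platfrom": "ios", "time_created": "2014-10-29 00:11:59"},
    {"table": "player_login", "platfrom": "ios", "time_created": "2014-10-29 00:14:02"},
    {"table": "player_login", "platfrom": "android", "time_created": "2014-10-29 00:15:00"},
    {"table": "iap", "platfrom": "ios", "time_created": "2014-10-29 00:11:59"},
]
for event in events:
    update_stats(counters, event)
print(dict(counters))
```

Keying counters by timeslot lets a dashboard read a fixed set of keys for each refresh instead of scanning raw events.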

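Tips and notes

Sender

• Decouple data generation from data sending
• Use compression and send in batches
• PutRecord can return HTTP 5XX errors, so retries may send duplicate data
• Monitor ProvisionedThroughputExceeded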
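The sender tips about retries and duplicates can be demonstrated with a simulated stream. Nothing here calls AWS: `FlakyStream`, `ThroughputExceeded` and `ServerError` are hypothetical stand-ins, and the backoff parameters are arbitrary. The simulated 5XX case stores the record but still reports an error, which is how a retry ends up producing a duplicate.

```python
import random
import time

class ThroughputExceeded(Exception):
    """Stand-in for a ProvisionedThroughputExceeded response."""

class ServerError(Exception):
    """Stand-in for an HTTP 5XX response."""

def send_with_retry(put, payload, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call put(payload), retrying with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return put(payload)
        except (ThroughputExceeded, ServerError):
            if attempt == max_attempts - 1:
                raise
            sleep(random.uniform(0, base_delay * 2 ** attempt))

class FlakyStream:
    """Simulated stream that fails with the given errors before succeeding."""
    def __init__(self, failures):
        self.failures = list(failures)
        self.records = []

    def put(self, payload):
        if self.failures:
            error = self.failures.pop(0)
            if error is ServerError:
                self.records.append(payload)  # stored, but the reply was lost
            raise error()
        self.records.append(payload)
        return len(self.records)

stream = FlakyStream([ThroughputExceeded, ServerError])
delays = []
result = send_with_retry(stream.put, b"batch-1", sleep=delays.append)
print(result, stream.records, len(delays))
```

After two retries the stream holds the same payload twice, which is why the consumers in the earlier diagrams include a De-Dupe step.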

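Tips and notes

Consumer

• Always use KCL!
• Monitor the workload and use Auto Scaling

Overall

• Provision enough shard capacity
• Make sure services and applications shut down gracefully
• Follow AWS best practices for error handling (retries) and exception handling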

Key takeaways

Kinesis

• Data available for processing within seconds

• Robust API, KCL, and Connector libraries

AWS

• Managed

• Scalable

• Cost effective

• Quick to get up and running
