AWS Summit Seoul 2015 - Data Analytics for Game Service Innovation
© 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved
Analyzing Games with AWS
김일호 - Solution Architect, Amazon Web Services
What is real-time data analysis?
Real-time data ingestion
• High scalability
• Data durability
• Resource elasticity
• Data can be read concurrently or multiple times
Continuous data processing
• Load balancing of stream data
• Availability, checkpoint / replay
• Resource elasticity
• Multiple applications can process data in parallel
Continuous data flow and processing
Low end-to-end latency
Sustained real-time workloads
[Diagram: data records, each with a partition key and a sequence number, flow into the shards of an Amazon Kinesis stream; workers read the shards to compute a "Global top-10" and a "My top-10".]
Amazon Kinesis – managed data stream store
[Diagram: data sources send records to an AWS endpoint (foo-analysis.com); the Kinesis stream (Shard 1, Shard 2, ... Shard N) spans three Availability Zones and feeds four applications (App. 1 to App. 4) for data archive, metric extraction, sliding window analysis, and machine learning, with outputs to S3, DynamoDB, Redshift, and EMR.]
Amazon Kinesis – a data broker for many purposes
Kinesis – Stream and Shards
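• Stream: a Kinesis stream object that collects and stores data
• Shards: the unit of throughput capacity; each shard supports
• Put - 1 MB/sec OR 1,000 records/sec
• Get - 2 MB/sec OR 5 transactions/sec
• Scale by adding or removing shards
• 24-hour data retention (window)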
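The partition key decides which shard a record lands in. Kinesis hashes the partition key with MD5 into a 128-bit integer, and each shard owns a contiguous range of those hash keys. The Python sketch below illustrates that mapping under two assumptions: the digest is read as a big-endian unsigned integer, and the key space is split evenly across shards (as for a freshly created stream; splits and merges can make ranges uneven). The helper names `hash_key`, `even_ranges`, and `shard_for` are hypothetical, not part of any AWS SDK.

```python
import hashlib

# Hash keys are unsigned 128-bit integers: 0 .. 2**128 - 1.
MAX_HASH_KEY = 2**128 - 1


def hash_key(partition_key: str) -> int:
    """MD5 of the UTF-8 partition key, read as a big-endian unsigned integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_ranges(shard_count: int) -> list[tuple[int, int]]:
    """Split the hash key space into equal, contiguous (start, end) ranges.
    Assumption: real streams can have uneven ranges after splits and merges."""
    size = (MAX_HASH_KEY + 1) // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = MAX_HASH_KEY if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key: str, ranges: list[tuple[int, int]]) -> int:
    """Index of the shard whose hash key range contains the key's hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash key is not covered by any shard range")
```

Because the mapping is deterministic, every event with the same key (for example the same `player_id`) goes to the same shard, which preserves per-key ordering; a single hot key, however, cannot be spread over more than one shard.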
Sizing a Kinesis stream
Example: two producers, each sending 2 KB records at 500 TPS
You can start the service with two shards, for a total capacity of 2 MB/s in and 4 MB/s out
[Diagram: two producers (2 KB * 500 TPS = 1,000 KB/s each) write to two shards, which one application reads.]
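Sizing a Kinesis stream (continued)
If three consumer (processing) applications each read the full stream, they need 3 × 2 MB/s = 6 MB/s of read throughput, more than the 4 MB/s two shards provide
Simply add a shard to the stream to spread the load: three shards provide 6 MB/s of read capacity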
[Diagram: the same producers (2 KB * 500 TPS = 1,000 KB/s each) write to three shards, which three applications read.]
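The sizing arithmetic on these two slides can be written as a small helper. This is a sketch, not an AWS tool: it uses the per-shard limits quoted earlier (1 MB/s or 1,000 records/s in, 2 MB/s out), treats 1 MB as 1,024 KB, assumes every consumer application reads the whole stream, and ignores the 5 reads/s GetRecords limit. `shards_needed` is a hypothetical name.

```python
import math

# Per-shard limits quoted on the slide; 1 MB is treated as 1,024 KB here.
WRITE_KB_PER_SEC = 1024
WRITE_RECORDS_PER_SEC = 1000
READ_KB_PER_SEC = 2048


def shards_needed(record_kb: float, records_per_sec: int,
                  producers: int, consumers: int) -> int:
    """Smallest shard count whose write and read limits cover the load.
    Assumes every consumer application reads the entire stream."""
    in_kb = record_kb * records_per_sec * producers
    in_records = records_per_sec * producers
    out_kb = in_kb * consumers  # each consumer reads all data once
    return max(
        1,
        math.ceil(in_kb / WRITE_KB_PER_SEC),
        math.ceil(in_records / WRITE_RECORDS_PER_SEC),
        math.ceil(out_kb / READ_KB_PER_SEC),
    )
```

`shards_needed(2, 500, 2, 1)` returns 2 and `shards_needed(2, 500, 2, 3)` returns 3, matching the two scenarios above: with three consumers it is read throughput, not write throughput, that forces the third shard.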
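Amazon Kinesis – distributed streams
• Supports both batch and continuous real-time processing
• Scale stream capacity up or down easily, without losing data
• Applications with different purposes can continuously access and process the last 24 hours of data
• Scales to GB/sec and stores data durably – records are stored across multiple Availability Zones
• Multiple applications can process the data concurrently – RDBMS, S3, data warehouse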
[Diagram: processing spectrum: Batch, Micro Batch, Real Time]
Real-time analytics patterns
[Diagram: data streams feed batch analysis (DW, Hadoop), streaming analytics (Spark, Storm, KCL), and a data archive; outputs include notifications & alerts, dashboards/visualizations, APIs, and deep learning.]
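Real-time analytics approaches
• Streaming – events are processed continuously, within seconds of occurring (e.g., analyzing transaction logs to detect financial fraud)
• Micro-batch – information with potential business impact is obtained within minutes (e.g., analyzing transaction logs from other regions to detect anomalies)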
Kinesis Client Library (KCL)
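• Helps applications read data from many shards in a distributed way
• Handles failures
• Adapts as shards are added or removed
• Supports distributed processing features such as worker scaling and checkpointing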
KCL Design Components
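• Worker: the process that runs on each application instance
• Record processor: the unit that actually reads and processes data from a shard of the Kinesis stream
• Checkpointer: tracks which records in a shard have already been processed
If a worker fails, the KCL restarts processing so that it resumes from the last checkpointed record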
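The checkpoint/replay behavior can be simulated in a few lines. The real KCL is a Java library that keeps its leases and checkpoints in a DynamoDB table; the classes below are hypothetical stand-ins written only to show the semantics: after a crash, processing resumes after the last checkpoint, so records handled since that checkpoint are delivered again.

```python
from dataclasses import dataclass, field


@dataclass
class CheckpointStore:
    """Hypothetical stand-in for the KCL's checkpoint table:
    the last checkpointed sequence number per shard."""
    positions: dict[str, int] = field(default_factory=dict)

    def checkpoint(self, shard_id: str, sequence_number: int) -> None:
        self.positions[shard_id] = sequence_number

    def last(self, shard_id: str) -> int:
        return self.positions.get(shard_id, -1)  # -1: nothing checkpointed yet


def process_shard(shard_id, records, store, handled, checkpoint_every=2, fail_at=None):
    """Process (sequence_number, data) records that come after the last checkpoint.
    Checkpoints every `checkpoint_every` records; `fail_at` simulates a crash."""
    resume_after = store.last(shard_id)
    pending = 0
    last_seq = None
    for seq, data in records:
        if seq <= resume_after:
            continue  # already covered by a checkpoint
        if seq == fail_at:
            raise RuntimeError(f"worker crashed at sequence {seq}")
        handled.append(data)
        last_seq = seq
        pending += 1
        if pending == checkpoint_every:
            store.checkpoint(shard_id, seq)
            pending = 0
    if pending:
        store.checkpoint(shard_id, last_seq)  # checkpoint the tail at a clean stop
```

If a worker crashes after record 3 but before checkpointing it, the restarted worker processes record 3 again. This is at-least-once processing, so consumers should be idempotent or de-duplicate. Checkpointing less often means fewer writes to the checkpoint store but more replay after a failure.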
Amazon Kinesis Connector Library
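• Amazon S3
– Archives data by creating files in S3
– Batches records and names the files in sequence order
• Amazon Redshift
– Micro-batches data directly into the Redshift DW (manifest support)
– Transforms messages into the desired format
• Amazon DynamoDB
– Writes directly to a DynamoDB table with batch puts
– Transforms messages into the desired format
• Elasticsearch
– Indexes data directly into an Elasticsearch cluster
– Transforms messages into the desired format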
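The connectors share a buffer-then-emit pattern. Records accumulate in a buffer until a count or size threshold is reached, and the batch is then written to the destination in one operation. The connector library itself is Java; this Python sketch only illustrates the pattern for the S3 case. The class name, the thresholds, and naming each file after its first and last sequence numbers are assumptions made for illustration.

```python
class RecordBuffer:
    """Hypothetical buffer in the spirit of the connector library: collects
    records and reports when to flush, by record count or total byte size."""

    def __init__(self, max_records: int, max_bytes: int):
        self.max_records = max_records
        self.max_bytes = max_bytes
        self.records: list[tuple[str, bytes]] = []  # (sequence_number, data)
        self.size = 0

    def add(self, sequence_number: str, data: bytes) -> None:
        self.records.append((sequence_number, data))
        self.size += len(data)

    def should_flush(self) -> bool:
        return len(self.records) >= self.max_records or self.size >= self.max_bytes

    def object_key(self, prefix: str) -> str:
        """Name the file after the first and last sequence numbers it holds,
        so file names follow stream order (assumed naming scheme)."""
        first, last = self.records[0][0], self.records[-1][0]
        return f"{prefix}{first}-{last}"

    def drain(self) -> bytes:
        """Return the buffered records as newline-delimited bytes and reset."""
        body = b"\n".join(data for _, data in self.records) + b"\n"
        self.records.clear()
        self.size = 0
        return body
```

A consumer would call `add` per record, and when `should_flush()` is true, upload `drain()` under `object_key(...)` and only then checkpoint, so a crash before the upload replays the records instead of losing them.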
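[Diagram: Kinesis, S3, DynamoDB, Redshift]
Read data directly from Kinesis and process it with Hive, Pig, Hadoop Streaming, or Cascading
Read real-time data sources directly for batch processing, while other applications read and process the same data at the same time
EMR and Kinesis integration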
[Diagram: a Receiver turns incoming messages into a DStream, a sequence of RDDs (RDD@T1, RDD@T2, ...)]
Spark Streaming – basic concepts
• Abstracts messages as discretized streams (DStreams)
• Processes them as a sequence of RDDs
http://spark.apache.org/docs/latest/streaming-kinesis-integration.html
Apache Storm: basic concepts
• Streams: unbounded sequences of data tuples
• Spout: a source of streams
• Bolts: process input streams and emit new output streams
• Topologies: networks of spouts and bolts
https://github.com/awslabs/kinesis-storm-spout
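In the sliding-window case, a typical bolt is a rolling counter. It keeps counts for the most recent N seconds and emits the current top entries, like the "Global top-10" in the earlier diagram. The class below is a hypothetical, single-process Python sketch of that logic, not the Storm or kinesis-storm-spout API. It assumes events arrive in timestamp order.

```python
from collections import Counter, deque


class SlidingWindowTopN:
    """Hypothetical rolling counter: counts keys seen in the last
    `window_seconds` and reports the most frequent ones."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, key), oldest first
        self.counts = Counter()  # key -> count inside the window

    def add(self, timestamp: float, key: str) -> None:
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop events that are window_seconds old or older.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def top(self, n: int) -> list[tuple[str, int]]:
        return self.counts.most_common(n)
```

Evicting expired events on every `add` keeps the counts exact for the window. A Storm topology could partition keys across several counter bolts and merge their partial rankings in a final bolt.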
Overall architecture
[Diagram: Producer and App Client, Amazon Kinesis, KCL applications and EMR, S3, DynamoDB, Redshift, BI Tools]
• Best Practices for Micro-Batch Loading on Amazon Redshift
• Implement a Real-time, Sliding-Window Application Using Amazon Kinesis and Apache Storm
• Visualizing Real-time, Geotagged Data with Amazon Kinesis
GREE Headquarters: Tokyo, Japan
GREE International, Inc.: SEOUL, CA
GREE Canada: Vancouver, BC
QUICK FACTS
6 continents playing GREE games
1,882 employees worldwide
13 games made in North America
MILESTONES: 2004, 2011, 2013
GAME STATS - 4 titles in top 100 grossing*
Crime City (Studios)
Reached Top 10 Grossing in 140 countries
Top 100 Grossing in 19 countries, over 3 years since launch
*As of Sep. 2014 – Source: App Annie
A Global Gaming Powerhouse
Knights & Dragons (Publishing)
Reached Top 10 Grossing in 41 countries
Top 100 Grossing in 22 countries
[Diagram: data in GREE's analysis flow: ad clicks, downloads, perf data, attribution, campaign performance, SC balance, HC balance, IAP, player targeting]
GREE data analysis flow
Data Collection
Data sources
• Mobile Devices
• Game Servers
• Ad Networks
Data size and growth
• 500 GB+/day
• 500M+ events/day
• Event size ~1 KB
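A quick back-of-the-envelope check turns these daily figures into a per-second load. The sketch assumes one event per Kinesis record, no compression or batching, and a uniform rate across the day, so it gives only the average. Real traffic peaks need headroom, while the sender's compression and batching reduce the byte and record counts. `average_load` is a hypothetical helper that reuses the per-shard write limits from the Kinesis slides.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60
WRITE_KB_PER_SEC = 1024       # per-shard write limit, 1 MB treated as 1,024 KB
WRITE_RECORDS_PER_SEC = 1000  # per-shard write limit in records


def average_load(events_per_day: int, event_kb: float) -> tuple[float, float, int]:
    """Average events/s, KB/s, and the minimum shard count for that average,
    assuming one uncompressed event per record."""
    events_per_sec = events_per_day / SECONDS_PER_DAY
    kb_per_sec = events_per_sec * event_kb
    shards = max(
        math.ceil(kb_per_sec / WRITE_KB_PER_SEC),
        math.ceil(events_per_sec / WRITE_RECORDS_PER_SEC),
    )
    return events_per_sec, kb_per_sec, shards
```

For 500M events/day at 1 KB each, `average_load(500_000_000, 1)` gives about 5,787 events/s (roughly 5.7 MB/s) and a minimum of 6 shards, before any allowance for peaks.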
Sample analytics event
{"player_id":"323726381807586881","player_level":169,"device":"iPhone 5","version":"iOS 7.1.2","platfrom":"ios","client_build":"440","db":"mw_dw_ios","table":"player_login","uuid":"1414566719-rsl3hvhu7o","time_created":"2014-10-29 00:11:59"}
[Diagram: Game DB and game servers, Kinesis, Amazon S3, an S3 consumer, DSV and JSON files, Amazon Redshift, Amazon EMR, and a real-time stats consumer writing to ElastiCache (Redis) for a real-time stats dashboard]
Analytics architecture
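[Diagram: the sender reads analytics files into a read buffer, compresses the data into ~50 KB payloads, and sends them with PutRecord to the Kinesis stream (Shard 1 ... Shard n), using Describe Stream to keep its view of the shards in sync]
Amazon Kinesis Sender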
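The sender's compress-and-batch step can be sketched as greedy packing. Newline-delimited events are added to a batch until the gzip-compressed batch would exceed the size limit, and then a new batch is started. The slide does not say whether 50 KB is measured before or after compression; this sketch assumes after. `build_payloads` is a hypothetical name, and it recompresses on every append for clarity rather than speed. A single event that compresses above the limit would still produce an oversized payload.

```python
import gzip

MAX_PAYLOAD_BYTES = 50 * 1024  # assumed: target size of one compressed payload


def build_payloads(lines: list[bytes], max_bytes: int = MAX_PAYLOAD_BYTES) -> list[bytes]:
    """Greedily pack newline-delimited events into gzip payloads of at most max_bytes."""
    payloads = []
    batch: list[bytes] = []
    for line in lines:
        candidate = batch + [line]
        if batch and len(gzip.compress(b"\n".join(candidate))) > max_bytes:
            # Adding this line would overflow: seal the current batch.
            payloads.append(gzip.compress(b"\n".join(batch)))
            batch = [line]
        else:
            batch = candidate
    if batch:
        payloads.append(gzip.compress(b"\n".join(batch)))
    return payloads
```

Each payload would then become the data of one PutRecord call, well under the 1 MB per-record limit, and far fewer calls are needed than with one event per record.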
Consumer – storing data in S3 in DSV format
[Diagram: consumers in an Auto Scaling group run the Kinesis Client Library, with a record processor per shard (Shard 1 ... Shard n); each record is decompressed, de-duplicated, validated, assigned a target table, transformed to DSV, and buffered; on a size or timeout threshold the buffer is compressed and written to S3, and the file is recorded in the file metadata DB]
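For the sample `player_login` event, the per-record steps in this consumer (decompress, de-dupe, validate, pick the target table, transform to DSV) might look like the following. The column list, the `|` delimiter, and de-duplicating on `uuid` are assumptions. The in-memory `seen_uuids` set grows without bound, so a long-running consumer would need a bounded or shared store. Buffering and the size/timeout flush to S3 are left out.

```python
import gzip
import json

# Hypothetical column list for the player_login table, taken from the sample event.
COLUMNS = ["player_id", "player_level", "device", "uuid", "time_created"]
REQUIRED = set(COLUMNS) | {"db", "table"}
DELIMITER = "|"


def to_dsv(payload: bytes, seen_uuids: set) -> dict:
    """Decompress one Kinesis record, drop duplicates by uuid, validate,
    and return pipe-delimited rows grouped by target table ("db.table")."""
    rows = {}
    for line in gzip.decompress(payload).splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        if not REQUIRED.issubset(event):
            continue  # validation: skip events missing required fields
        if event["uuid"] in seen_uuids:
            continue  # de-dupe: e.g. the same data re-sent after a PutRecord 5xx
        seen_uuids.add(event["uuid"])
        target = f'{event["db"]}.{event["table"]}'
        # Replace the delimiter inside values so every row keeps the same column count.
        values = [str(event[col]).replace(DELIMITER, " ") for col in COLUMNS]
        rows.setdefault(target, []).append(DELIMITER.join(values))
    return rows
```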
Loading data directly into Amazon Redshift
[Diagram: using the file metadata DB, the loader repeatedly creates a manifest of files in Amazon S3, executes COPY into Amazon Redshift, and updates each file's status within a transaction]
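When loading through a manifest, COPY reads exactly the files listed in a JSON manifest on S3. The manifest format (`entries` with `url` and `mandatory`) and the `MANIFEST`, `DELIMITER`, and `GZIP` COPY options are Redshift features. The helper names, table name, and IAM role below are hypothetical, and the statement is only built as a string. Writing the manifest to S3, running COPY, and updating the file metadata DB are not shown. Values are interpolated directly into SQL, so they must come from trusted configuration.

```python
import json


def build_manifest(s3_urls: list[str]) -> str:
    """Redshift COPY manifest; mandatory=True makes COPY fail if a listed file is missing."""
    return json.dumps({"entries": [{"url": url, "mandatory": True} for url in s3_urls]})


def build_copy(table: str, manifest_url: str, iam_role_arn: str) -> str:
    """COPY for gzip-compressed, pipe-delimited files listed in a manifest."""
    return (
        f"COPY {table} FROM '{manifest_url}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "MANIFEST DELIMITER '|' GZIP;"
    )
```

Because every listed file is mandatory, a load never silently skips a file, and the status update for the listed files can follow a successful COPY.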
Consumer – real-time stats board
[Diagram: consumers in an Auto Scaling group run Kinesis Client Library record processors (Shard 1 ... Shard n) that decompress and de-duplicate records, determine the target table, filter events according to a configuration (metric, segment & value, timeslot), and write results to ElastiCache (Redis), which backs the dashboard]
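The stats board reduces each filtered event to a counter keyed by metric, segment value, and timeslot. In this sketch a plain dict stands in for ElastiCache (Redis), where each increment could be a single `HINCRBY` or `INCRBY`. The metric configuration, the 60-second slot, and treating `time_created` as UTC are all assumptions.

```python
from datetime import datetime, timezone

SLOT_SECONDS = 60  # hypothetical timeslot size

# Hypothetical configuration: metric -> which events count and which field segments them.
METRICS = {
    "logins": {"table": "player_login", "segment": "device"},
}


def timeslot(time_created: str, slot_seconds: int = SLOT_SECONDS) -> int:
    """Start of the slot in epoch seconds, assuming time_created is UTC."""
    ts = datetime.strptime(time_created, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    epoch = int(ts.timestamp())
    return epoch - epoch % slot_seconds


def aggregate(events, counters=None):
    """Filter events per the metric config and count them per (metric, segment, slot).
    The dict stands in for Redis; each update could be one HINCRBY in ElastiCache."""
    counters = {} if counters is None else counters
    for event in events:
        for metric, rule in METRICS.items():
            if event.get("table") != rule["table"]:
                continue  # filter: this event does not feed this metric
            segment = event.get(rule["segment"], "unknown")
            key = (metric, segment, timeslot(event["time_created"]))
            counters[key] = counters.get(key, 0) + 1
    return counters
```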
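Tips and notes
Sender
• Decouple data generation from sending
• Compress data and send it in batches
• PutRecord can return HTTP 5xx errors; retrying may send duplicate data
• Monitor for ProvisionedThroughputExceeded errors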
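For the retry advice, a common pattern is capped exponential backoff with jitter around PutRecord. The sketch below is hypothetical: `send` stands in for whatever SDK call the sender makes, and `ThroughputExceeded` stands in for ProvisionedThroughputExceeded or an HTTP 5xx error. A 5xx response does not prove the record was not written, so a retry can create a duplicate and consumers still need de-duplication.

```python
import random
import time


class ThroughputExceeded(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceeded or an HTTP 5xx."""


def put_with_retry(send, payload, max_attempts=5, base_delay=0.1,
                   max_delay=5.0, sleep=time.sleep):
    """Call send(payload); on a retryable error, wait a random time up to a
    capped, exponentially growing delay ("full jitter") and try again."""
    for attempt in range(max_attempts):
        try:
            return send(payload)
        except ThroughputExceeded:
            if attempt == max_attempts - 1:
                raise  # give up; the caller decides whether to buffer or drop
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

The random wait spreads retries from many senders over time, so throttled producers do not all hit the shard again at the same moment.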
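Tips and notes (continued)
Consumer
• Always use the KCL!
• Monitor the workload and use Auto Scaling
Overall
• Provision enough shards
• Shut down services and applications gracefully
• Implement error handling (retries) and exception handling following AWS best practices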
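A graceful shutdown usually means three steps: stop taking new work when the process is told to exit, flush what is buffered, and checkpoint only what was flushed. The worker below is a hypothetical Python sketch of that ordering. The flush and checkpoint are in-memory placeholders, and `signal.signal` can only install handlers from the main thread.

```python
import signal
import threading


class GracefulWorker:
    """Hypothetical worker: on SIGTERM/SIGINT it stops taking new records,
    then flushes its buffer and checkpoints the last flushed record."""

    def __init__(self):
        self.stop_requested = threading.Event()
        self.buffer = []       # records read but not yet written out
        self.flushed = []      # placeholder for the real destination (S3, Redshift, ...)
        self.checkpoint = None

    def install_handlers(self) -> None:
        # signal.signal must be called from the main thread.
        signal.signal(signal.SIGTERM, self._on_signal)
        signal.signal(signal.SIGINT, self._on_signal)

    def _on_signal(self, signum, frame) -> None:
        self.stop_requested.set()  # only set a flag; the real work happens in run()

    def run(self, records) -> None:
        for seq, data in records:
            if self.stop_requested.is_set():
                break
            self.buffer.append((seq, data))
        self.shutdown()

    def shutdown(self) -> None:
        if self.buffer:
            self.flushed.extend(self.buffer)      # 1. flush buffered records
            self.checkpoint = self.buffer[-1][0]  # 2. then checkpoint what was flushed
            self.buffer.clear()
```

Checkpointing after the flush means a crash in between only causes replay, never data loss.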
Key takeaways
Kinesis
• Data available for processing within seconds
• Robust API, KCL, and Connector libraries
AWS
• Managed
• Scalable
• Cost effective
• Quick to get up and running