Media Service on a Cloud :: Contents Alliance Platform :: AWS Media Day 2016

Project Westworld POOQ Platform Evolution Story

Uploaded by amazon-web-services-korea on 07-Jan-2017



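Who Am I?

• 조휘열 (Hwi-yeol Cho)
• Head of Platform Operations, Contents Alliance Platform
• In charge of service planning, server & client development, and platform & service operations
• 26 years of development experience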


In the Beginning

• May 2012 – CAP (Contents Alliance Platform Co., Ltd.) formed
  - Joint venture between MBC & SBS
• July 2012 – POOQ service opened to the public
• September 2012 – POOQ became a paid subscription service

What is POOQ?

POOQ 1.0

• Most of the platform came from iMBC
• Hosted at the Nonhyun IDC (LG)
• Typical web app structure:
  - ASP.Net (PC web service)
  - MS SQL (data storage)
  - Ingestion engine
  - MAPI (mobile API server)
  - ASP.Net (back office)

Houston, We Have a Problem

• As the number of users grew fast, problems started to arise
• Occasional heavy load: the platform could not withstand more than 80,000 concurrent users
• Servers had to be reset
• Poor user experience

Evidence - 2013/10/19

Evidence - 2013/10/20

Evidence – 2013/11/8

Evidence - 2014/4/24

Why?

• IDC
  - The platform could not grow rapidly
  - Everything had to be planned a month ahead
• Old technology
  - RDBMS-backed, real-time query design
  - A cache was developed on ASP.Net, but it was complex and inefficient
  - Could not increase the number of RDBMS servers
  - Everybody hates “sharding”

New Game Plan

• New type of service: “Emergency Platform”
  - RDBMS-less, authentication-less
  - Operates when the primary platform fails
• From IDC to cloud
  - Rapid scalability is the name of the game
• New platform design
  - Unified API
  - Personalization
  - Scalable performance

Emergency Platform

• Preparation for primary platform failure
• Collects live & VOD information at boot
• Does not rely on any regular platform servers
• No authentication: when the primary servers fail, everybody can watch VOD!

Diagram: ASP.Net (PC web service + MAPI emulation), fed by an external data source (live & VOD info)
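The talk does not show how the emergency platform was implemented. A minimal sketch, assuming the external source returns a JSON document with `live` and `vod` lists (a hypothetical format): the catalog is loaded once at boot and then served read-only, with no database and no authentication check.

```python
import json

# Hypothetical stand-in for the external live & VOD data source; the real
# source and its format are not described in the talk.
EXTERNAL_FEED = """
{"live": [{"id": "ch-1", "title": "Live channel 1"}],
 "vod":  [{"id": "ep-100", "title": "Episode 100"}]}
"""


class EmergencyCatalog:
    """Read-only catalog loaded once at boot: no RDBMS, no authentication."""

    def __init__(self, feed_text: str) -> None:
        data = json.loads(feed_text)
        # Index every item by id so lookups never call another server.
        self._items = {
            item["id"]: item
            for kind in ("live", "vod")
            for item in data.get(kind, [])
        }

    def get(self, content_id: str):
        # No user or session argument: anyone can watch while the primary is down.
        return self._items.get(content_id)

    def list_ids(self):
        return sorted(self._items)


catalog = EmergencyCatalog(EXTERNAL_FEED)
print(catalog.get("ep-100"))
```

Keeping everything in memory is what lets this mode survive a primary failure: the only dependency is the external feed at startup.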


From IDC to Cloud

• Migration was decided; the question was where to
• Two options were considered:
  - KT ucloud
  - AWS Tokyo
• A regulatory issue complicated the AWS option
  - “One cannot move customers’ personal records outside the country without consent”
• KT ucloud was cheaper, too
• So, destination: KT ucloud
• The problem was, we really did not know much about KT ucloud
  - No good documentation or books existed, either
  - We thought all clouds were almost the same: big mistake


New Platform Design

• Unified API
• Personalization
• Scalable performance


New Platform Design

• Unified API model
  - Eliminate duplicated resource allocation
  - Same behavior on all clients
• Requirements
  - RESTful API
  - JSON output
  - Specification & documentation
  - Testing
  - Automatic server generation from the spec

Diagram: the separate ASP.Net and MAPI servers are merged into one Unified API


Unified API

• Swagger (http://swagger.io/)
  - RESTful API definition in .yaml
  - Free documentation and testing tools available
  - Auto-generates clients & servers for several languages
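To make the `.yaml` definition concrete, here is a minimal Swagger 2.0 document written as a Python dict (the same tree the YAML file would parse into), plus a check of a few fields the Swagger 2.0 specification marks as required. The `/bookmarks/{contentId}` operation is hypothetical, and this check is far shallower than the validation Swagger Editor performs.

```python
# Minimal Swagger 2.0 document; the path and operationId are hypothetical.
SPEC = {
    "swagger": "2.0",
    "info": {"title": "POOQ Unified API (example)", "version": "1.0"},
    "produces": ["application/json"],
    "paths": {
        "/bookmarks/{contentId}": {
            "get": {
                "operationId": "getBookmark",
                "parameters": [
                    {"name": "contentId", "in": "path", "required": True, "type": "string"}
                ],
                "responses": {"200": {"description": "Last viewing position"}},
            }
        }
    },
}

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch"}


def check_spec(spec: dict) -> list:
    """Return problems with a few fields Swagger 2.0 marks as required."""
    problems = []
    if spec.get("swagger") != "2.0":
        problems.append("swagger must be '2.0'")
    for key in ("title", "version"):
        if key not in spec.get("info", {}):
            problems.append(f"info.{key} is required")
    if "paths" not in spec:
        problems.append("paths is required")
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            # Skip non-operation keys such as path-level "parameters".
            if method in HTTP_METHODS and "responses" not in op:
                problems.append(f"{method.upper()} {path}: responses is required")
    return problems


print(check_spec(SPEC))  # an empty list means the checked fields are present
```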


Unified API - Swagger


Unified API - Swagger

Workflow: API design (.yaml) → validate & test (Swagger Editor) → commit (to GitHub) → server code generation (custom tool) and API doc site publishing
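The talk names only a "custom tool" for server code generation. As a hypothetical sketch of the idea, the function below walks the `paths` section of a spec and emits one Python handler stub per operation, turning path parameters into function arguments; the paths and operation IDs are illustrative.

```python
import re

# Hypothetical spec fragment; in the workflow above it would come from the committed .yaml.
PATHS = {
    "/bookmarks/{contentId}": {
        "get": {"operationId": "getBookmark"},
        "put": {"operationId": "putBookmark"},
    },
    "/popular": {"get": {"operationId": "listPopular"}},
}


def to_snake(name: str) -> str:
    # getBookmark -> get_bookmark, contentId -> content_id
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def generate_stubs(paths: dict) -> str:
    """Emit one Python handler stub per operation, path params as arguments."""
    out = []
    for path in sorted(paths):
        params = re.findall(r"{(\w+)}", path)
        for method in sorted(paths[path]):
            func = to_snake(paths[path][method]["operationId"])
            args = ", ".join(to_snake(p) for p in params)
            out.append(f"def {func}({args}):")
            out.append(f'    """{method.upper()} {path}"""')
            out.append("    raise NotImplementedError")
            out.append("")
    return "\n".join(out)


source = generate_stubs(PATHS)
print(source)
```

Generating stubs from the spec keeps every server's routes in step with the single definition that the documentation site is also built from.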


Unified API - Swagger


Unified API - Auto Code Gen


New Platform Design

• Personalization
  - Continue viewing across N-screen
  - Popular content listing
  - Your program listing
  - All based on the “bookmark” concept
• Client sends streaming status to the server
  - Every 10 seconds
  - Supports UDP- and HTTP-based protocols
  - Heavy load on the server side
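A minimal sketch of how one bookmark concept can drive continue viewing across N-screen, assuming each status report carries the user, device, content, playback position and a client timestamp (a hypothetical message shape). Because UDP can reorder packets, only the newest report per (user, content) is kept, regardless of which device sent it.

```python
from dataclasses import dataclass


@dataclass
class Heartbeat:
    # Hypothetical message shape; the talk only says clients report
    # streaming status every 10 seconds over UDP or HTTP.
    user_id: str
    device: str
    content_id: str
    position_sec: int
    sent_at: float  # client timestamp in seconds


class BookmarkStore:
    """Keeps the latest position per (user, content), whatever device sent it."""

    def __init__(self) -> None:
        self._latest = {}

    def receive(self, hb: Heartbeat) -> None:
        key = (hb.user_id, hb.content_id)
        current = self._latest.get(key)
        # Out-of-order (e.g. late UDP) reports must not move the bookmark back.
        if current is None or hb.sent_at >= current.sent_at:
            self._latest[key] = hb

    def resume_position(self, user_id: str, content_id: str) -> int:
        hb = self._latest.get((user_id, content_id))
        return hb.position_sec if hb else 0


store = BookmarkStore()
store.receive(Heartbeat("u1", "pc", "ep-100", 600, sent_at=1000.0))
store.receive(Heartbeat("u1", "mobile", "ep-100", 610, sent_at=1010.0))
store.receive(Heartbeat("u1", "pc", "ep-100", 590, sent_at=990.0))  # late packet
print(store.resume_position("u1", "ep-100"))  # 610
```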


Continue Viewing


Bookmark Engine

Diagram components: Client, Bookmark Receivers, ActiveMQ, Cassandra (bookmark data store), Apache Spark, MongoDB, API Server, Service
Data volume: about 17 TB per 2 months
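The diagram separates receiving bookmarks from storing and processing them. The sketch below illustrates that decoupling in a single process, with stand-ins named in the comments: `queue.Queue` for ActiveMQ, a list for the Cassandra log, a `Counter` batch job for Spark, and a dict for the MongoDB read cache. It shows the pattern only, not the production code.

```python
import queue
import threading
from collections import Counter

# In-process stand-ins (assumptions, not the production components):
#   queue.Queue -> ActiveMQ, list -> Cassandra log, dict -> MongoDB read cache.
mq = queue.Queue()
bookmark_log = []
read_cache = {}
STOP = object()


def bookmark_receiver(messages):
    # Receivers only enqueue, so they stay cheap under heavy heartbeat load.
    for msg in messages:
        mq.put(msg)


def log_writer():
    # Drains the queue into the append-only bookmark log.
    while True:
        msg = mq.get()
        if msg is STOP:
            break
        bookmark_log.append(msg)


def batch_popular(top_n=2):
    # Batch aggregation (Spark's role): count distinct viewers per content.
    pairs = {(user, content) for user, content, _pos in bookmark_log}
    counts = Counter(content for _user, content in pairs)
    read_cache["popular"] = [c for c, _n in counts.most_common(top_n)]


writer = threading.Thread(target=log_writer)
writer.start()
batches = [
    [("u1", "ep-100", 60), ("u2", "ep-100", 30)],
    [("u3", "ep-100", 10), ("u1", "ep-200", 5), ("u2", "ep-200", 90)],
    [("u3", "ep-300", 20)],
]
receivers = [threading.Thread(target=bookmark_receiver, args=(b,)) for b in batches]
for r in receivers:
    r.start()
for r in receivers:
    r.join()
mq.put(STOP)  # every message is already queued ahead of the sentinel
writer.join()
batch_popular()
print(read_cache["popular"])  # ['ep-100', 'ep-200']
```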


Scalable Design

• Every participating server must be able to increase performance by increasing the number of VMs
  - Rules out MS SQL as the online database
  - Active/Active is not enough
  - MongoDB is the database of choice (mostly as a read cache)
  - Cassandra NoSQL as the data logging server
• A single VM failure should not affect the whole platform’s functionality
  - Use lots of load balancers
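The rule that a single VM failure must not affect the platform is usually met by putting each server group behind a load balancer that stops routing to failed members. A hypothetical round-robin sketch (the backend names are made up):

```python
import itertools


class LoadBalancer:
    """Round-robin over backends, skipping any marked unhealthy (hypothetical sketch)."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        self.healthy.add(backend)

    def pick(self):
        # Try at most one full lap; if every VM is down, fail loudly.
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends")


lb = LoadBalancer(["api-1", "api-2", "api-3"])
lb.mark_down("api-2")  # a single VM fails
print([lb.pick() for _ in range(4)])  # ['api-1', 'api-3', 'api-1', 'api-3']
```

Adding capacity is then just adding a backend to the group, which matches the "scale by number of VMs" requirement.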


Selection of NoSQL


Cassandra Performance


Scalable Design

Diagram components: Client, Cassandra, Apache Spark, MongoDB, API Server, Service, and MS SQL with Pentaho Kettle batch updates
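The diagram shows MS SQL feeding Pentaho Kettle batch updates. As a stand-in (not Kettle itself), and assuming the batch job refreshes a document read cache (MongoDB's role in this design), the sketch below uses an in-memory SQLite table for the relational source and a dict for the cache, copying only rows changed since the last run via an `updated_at` watermark. Table and column names are hypothetical.

```python
import sqlite3

# Stand-ins (assumptions): in-memory SQLite plays the MS SQL source,
# and a dict of documents plays the MongoDB read cache.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE program (id TEXT PRIMARY KEY, title TEXT, updated_at INTEGER)")
src.executemany(
    "INSERT INTO program VALUES (?, ?, ?)",
    [("p1", "Drama A", 100), ("p2", "News B", 200), ("p3", "Show C", 300)],
)

doc_cache = {}


def batch_update(conn, cache, since):
    """Copy rows changed after `since` into the cache; return the new watermark."""
    rows = conn.execute(
        "SELECT id, title, updated_at FROM program WHERE updated_at > ? ORDER BY updated_at",
        (since,),
    ).fetchall()
    for pid, title, _updated_at in rows:
        # Upsert: API servers only ever read these documents.
        cache[pid] = {"_id": pid, "title": title}
    return rows[-1][2] if rows else since


watermark = batch_update(src, doc_cache, since=0)
src.execute("UPDATE program SET title = 'News B (HD)', updated_at = 400 WHERE id = 'p2'")
watermark = batch_update(src, doc_cache, since=watermark)
print(watermark, doc_cache["p2"]["title"])  # 400 News B (HD)
```

The online path never queries the RDBMS; it only sees the periodically refreshed documents.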


MongoDB Scale Out

Diagram: one MongoDB master (primary, WRITE) replicating to several MongoDB slaves (secondaries, READ); up to 50 replica set members
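With one master taking writes and many slaves serving reads, read capacity grows by adding replica members. A hypothetical router sketch of that split; in practice MongoDB drivers provide this through read preferences such as `secondaryPreferred`, so this class only illustrates the routing decision.

```python
import itertools


class ReplicaSetRouter:
    """Send writes to the master, spread reads over the slaves (hypothetical sketch)."""

    WRITE_OPS = ("insert", "update", "delete")

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self._next_slave = itertools.cycle(self.slaves)

    def route(self, op):
        if op in self.WRITE_OPS:
            return self.master
        # Fall back to the master when no slaves are configured.
        return next(self._next_slave) if self.slaves else self.master


router = ReplicaSetRouter("mongo-master", ["mongo-s1", "mongo-s2", "mongo-s3"])
print([router.route(op) for op in ("find", "find", "update", "find", "find")])
# ['mongo-s1', 'mongo-s2', 'mongo-master', 'mongo-s3', 'mongo-s1']
```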


Scalable Design

Diagram components: load balancer (LB) groups fronting web servers (static pages), API servers, and bookmark collectors; MongoDB (master) and MongoDB (slaves); a Cassandra ring of six nodes

The Result?

• POOQ obtained exclusive Internet streaming rights to the Premier12 baseball tournament
• 19 November 2015 – Premier12, Korea vs. Japan

The Result

• 551G network
• 283,577 users
• The platform was stable
  - CPU utilization was 20%
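Combining this result with the 10-second bookmark interval gives a rough sense of the server-side load. Assuming, as the talk does not state, that all 283,577 users were streaming at the same time:

```python
import math


def heartbeat_rate(concurrent_viewers: int, interval_sec: float = 10.0) -> float:
    """Bookmark messages per second if every viewer reports once per interval."""
    return concurrent_viewers / interval_sec


# Assumption: all 283,577 users were watching simultaneously, so this is
# an upper-bound style estimate, not a measured figure.
rate = heartbeat_rate(283_577)
print(math.ceil(rate))  # 28358
```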

The Road So Far…

• Built a solid platform foundation
• With the collaboration of a fantastic developer team
• Confidence in handling large user interaction
• Still a long way to go, though

Lessons Learned (About Cloud)

• Very efficient; we would recommend it for almost any platform
• Enjoying decent support in general
• VMs are really slower than you think
• Not all VMs are equal
• Sometimes your bottleneck is not the number of VMs
• Long way from perfection / shit happens

Migration Toward AWS

• Project Halloween
• Winter 2016 ~ Spring 2017

Auto-Scale

• Metric-based & schedule-based
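A hypothetical sketch of how the two auto-scale modes can be combined: a metric-based size proportional to CPU utilization over a target (similar in spirit to target tracking), and a schedule-based floor for known busy windows. The schedule, target and size limits are made-up values, and this is not the AWS Auto Scaling API.

```python
import math
from datetime import time

# Hypothetical schedule: (start, end, minimum instances) for a known prime-time window.
SCHEDULE = [(time(18, 0), time(23, 0), 10)]


def scheduled_floor(now, schedule):
    # Highest floor among the windows that contain `now`, else 0.
    return max((n for start, end, n in schedule if start <= now < end), default=0)


def desired_capacity(current, cpu_pct, target_pct, now,
                     schedule=SCHEDULE, min_size=2, max_size=40):
    # Metric-based: resize so average CPU moves toward the target.
    by_metric = math.ceil(current * cpu_pct / target_pct)
    # Schedule-based: never drop below the floor for this time window.
    wanted = max(by_metric, scheduled_floor(now, schedule))
    return min(max(wanted, min_size), max_size)


print(desired_capacity(4, cpu_pct=90, target_pct=50, now=time(14, 0)))   # 8
print(desired_capacity(4, cpu_pct=20, target_pct=50, now=time(19, 30)))  # 10
```

The schedule covers predictable spikes such as a live game, where waiting for CPU metrics to rise would add VMs too late.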

GPU Instances

• Machine learning & deep learning

Gadgets

• Lambda
• CloudWatch & 3rd-party tools
• RDS
• Marketplace

Q&A