media service on a cloud :: 콘텐츠연합플랫폼 :: aws media day 2016
TRANSCRIPT
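Who Am I?• 조휘열
• Head of Platform Operations, 콘텐츠연합플랫폼 (Contents Alliance Platform)
• Responsible for service planning, server & client development, and platform & service operations
• 26 years of development experience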
In the beginning• May, 2012 - CAP (Contents Alliance Platform Co., Ltd.) formed
Joint venture between MBC & SBS
• July, 2012 – POOQ service opened to the public
• September, 2012 – POOQ became a paid subscription service
POOQ 1.0• Most of the platform came from iMBC
• Hosted at Nonhyun IDC (LG)
• Typical Web App Structure
ASP.Net (PC Web Service)
MS SQL (Data Storage)
Ingestion Engine
MAPI (Mobile API Server)
ASP.Net (Backoffice)
Houston, we have a problem• As the number of users grew fast, problems started to arise
• Occasional heavy load – could not withstand more than 80,000 concurrent users
• Required server resets
• Poor user experience
Why?• IDC
No rapid platform growth
Everything had to be planned months ahead
• Old technology
RDBMS-backed, real-time query design
Developed a cache on ASP.Net, but it was complex and inefficient
Cannot increase the number of RDBMS servers
Everybody hates “Sharding”
New Game Plan• New type of service: “Emergency Platform”
RDBMS-less, Authentication-less
Operates when the primary platform fails
• From IDC to Cloud
Rapid scalability is the name of the game
• New platform design
Unified API
Personalization
Scalable Performance
Emergency Platform• Preparation for primary platform failure
• Collects live & VOD information at boot
• Does not rely on any regular platform servers
• No authentication – when the primary server fails, everybody can watch VOD!
ASP.Net (PC Web Service + MAPI emulation)
External Data Source (live & VOD info)
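A minimal sketch of the idea above, assuming the emergency service loads one snapshot of live and VOD listings at startup and then answers every request from memory, with no database and no login check. The class name, the snapshot shape, and `fake_external_source` are hypothetical; the slides do not describe the real data feed.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical snapshot shape: {"live": [...], "vod": [...]}
Snapshot = Dict[str, List[dict]]


class EmergencyCatalog:
    """Serves live/VOD listings from memory: no RDBMS, no authentication."""

    def __init__(self, fetch_snapshot: Callable[[], Snapshot]) -> None:
        # Collect everything once at boot so that later requests do not
        # depend on any regular platform server.
        snapshot = fetch_snapshot()
        self._live = list(snapshot.get("live", []))
        self._vod = {item["id"]: item for item in snapshot.get("vod", [])}

    def list_live(self) -> List[dict]:
        return list(self._live)

    def get_vod(self, content_id: str) -> Optional[dict]:
        # No user or subscription check: anyone may look up any VOD item.
        return self._vod.get(content_id)


def fake_external_source() -> Snapshot:
    # Stand-in for the external data source (assumption for illustration).
    return {
        "live": [{"id": "ch1", "title": "Channel 1"}],
        "vod": [{"id": "v100", "title": "Episode 1"}],
    }


catalog = EmergencyCatalog(fake_external_source)
```

Because the snapshot is taken only at boot, refreshing the listings in this sketch means restarting the process.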
Emergency Platform
From IDC to Cloud
New Platform Design
From IDC to Cloud• Migration was decided; the question was where to?
• Two options were considered
KT ucloud
AWS Tokyo
• A regulatory issue complicated the AWS option
“One cannot move a customer’s personal records outside the country without consent”
• KT ucloud was cheaper, too
• So, destination: KT ucloud
• Problem was, we really did not know KT ucloud
No good documentation or books existed either
Thought clouds were almost all the same: Big Mistake
New Platform Design• Unified API
• Personalization
• Scalable Performance
Emergency Platform
From IDC to Cloud
New Platform Design
Unified API
Personalization
Scalable
New Platform Design• Unified API model
Eliminate duplicated resource allocation
Same behavior on all clients
• Requirements
RESTful API
JSON output
Specification & Documentation
Testing
Auto server generation from spec
ASP.Net
MAPI
Unified API
Unified API• Swagger (http://swagger.io/)
RESTful API definition in .yaml
Free documentation and testing tools available
Auto-generates clients & servers for several languages
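To illustrate generating server code from a spec, here is a rough sketch that walks a Swagger 2.0 style definition and emits handler stubs. The spec is written as a Python dict rather than loaded from .yaml to keep the example free of third-party libraries. The paths, the `operationId` values, and `generate_stubs` are made up for illustration; this is neither the talk's custom tool nor Swagger's own code generator.

```python
HTTP_METHODS = ("get", "put", "post", "delete", "patch")

# Hypothetical API definition following the Swagger 2.0 layout
# (top-level "swagger", "info", and "paths" keys).
spec = {
    "swagger": "2.0",
    "info": {"title": "Example API", "version": "1.0.0"},
    "paths": {
        "/contents/{contentId}": {
            "get": {
                "operationId": "getContent",
                "summary": "Fetch one content item",
            },
        },
        "/bookmarks": {
            "post": {
                "operationId": "saveBookmark",
                "summary": "Store playback position",
            },
        },
    },
}


def generate_stubs(spec: dict) -> str:
    """Return Python source with one unimplemented handler per operation."""
    lines = []
    for path, operations in sorted(spec["paths"].items()):
        for method in HTTP_METHODS:
            operation = operations.get(method)
            if operation is None:
                continue
            lines.append("# " + method.upper() + " " + path)
            lines.append("def " + operation["operationId"] + "(request):")
            lines.append('    """' + operation.get("summary", "") + '"""')
            lines.append("    raise NotImplementedError")
            lines.append("")
    return "\n".join(lines)


stubs = generate_stubs(spec)
```

Printing `stubs` shows one function per operation. Keeping the spec as the single source of truth means clients, documentation, and server skeletons can all be regenerated from the same file.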
Unified API - Swagger
Unified API-Swagger
API Design (.yaml)
Validate & Test (Swagger editor)
Commit (to GitHub)
Server Code Gen (Custom Tool)
API Doc Site Publishing
Unified API-Swagger
Unified API-Auto Code Gen
New Platform Design• Personalization
Continue viewing – across N-Screen
Popular content listing
Your program listing
All based on the “Bookmark” concept
Client sends streaming status to the server
Every 10 seconds
UDP and HTTP based protocol support
Heavy load on the server side
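The bookmark concept can be sketched roughly as follows, assuming each report carries a user, a content item, a playback position, and a timestamp, and that "continue viewing" simply returns the newest position regardless of device. All field and class names here are hypothetical. The helper at the end shows why the server load is heavy: one report per viewer every 10 seconds.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

REPORT_INTERVAL_SEC = 10  # from the slide: clients report every 10 seconds


@dataclass
class Bookmark:
    user_id: str
    content_id: str
    position_sec: int   # playback position when the report was sent
    device: str         # e.g. "phone", "tv" (illustrative values)
    reported_at: float  # client or server timestamp, in seconds


class BookmarkStore:
    """Keeps only the newest bookmark per (user, content) pair."""

    def __init__(self) -> None:
        self._latest: Dict[Tuple[str, str], Bookmark] = {}

    def record(self, bookmark: Bookmark) -> None:
        key = (bookmark.user_id, bookmark.content_id)
        current = self._latest.get(key)
        # Reports sent over UDP may arrive out of order, so an older
        # report must not overwrite a newer one.
        if current is None or bookmark.reported_at >= current.reported_at:
            self._latest[key] = bookmark

    def resume_position(self, user_id: str, content_id: str) -> Optional[int]:
        bookmark = self._latest.get((user_id, content_id))
        return bookmark.position_sec if bookmark else None


def reports_per_second(concurrent_viewers: int) -> float:
    return concurrent_viewers / REPORT_INTERVAL_SEC
```

For example, at the 80,000 concurrent users mentioned earlier, `reports_per_second(80_000)` gives 8,000 incoming reports per second.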
Continue Viewing
Bookmark Engine
Client
Bookmark Receiver (x4)
ActiveMQ
Bookmark
Cassandra
Data Store
API Server
Apache Spark
Mongo DB
Service
About 17TB per 2 months
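One plausible reading of the diagram is that receivers only accept reports and hand them to a message queue, while a separate writer drains the queue into the data store in batches. The sketch below assumes that ordering, which the flattened diagram does not confirm. It uses Python's in-process `queue.Queue` as a stand-in for ActiveMQ and a plain list as a stand-in for Cassandra; the class names are hypothetical.

```python
import queue
from typing import List


class BookmarkReceiver:
    """Accepts a client report and enqueues it; never touches the store."""

    def __init__(self, name: str, broker: "queue.Queue[dict]") -> None:
        self.name = name
        self._broker = broker

    def handle(self, report: dict) -> None:
        self._broker.put(report)


class BookmarkWriter:
    """Drains the broker and appends reports to the store in batches."""

    def __init__(self, broker: "queue.Queue[dict]", data_store: List[dict],
                 batch_size: int = 100) -> None:
        self._broker = broker
        self._data_store = data_store
        self._batch_size = batch_size

    def drain_once(self) -> int:
        batch = []
        while len(batch) < self._batch_size:
            try:
                batch.append(self._broker.get_nowait())
            except queue.Empty:
                break
        # One bulk append stands in for a batched write to the data store.
        self._data_store.extend(batch)
        return len(batch)


broker: "queue.Queue[dict]" = queue.Queue()
receivers = [BookmarkReceiver(f"receiver-{i}", broker) for i in range(4)]
data_store: List[dict] = []
writer = BookmarkWriter(broker, data_store, batch_size=4)
```

The queue lets receivers absorb bursts of reports even when the store is momentarily slow, and more receivers can be added without changing the writer.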
Scalable Design• Every participating server must be able to increase performance by increasing the number of VMs
Rules out MSSQL as the online database
Active/Active is not enough
MongoDB is the database of choice (mostly as a read cache)
Cassandra (NoSQL) as the data logging server
• A single VM failure should not affect the whole platform’s functionality
Utilize lots of load balancers
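The two rules above (scale by adding VMs, survive a single VM failure) are what a load balancer provides. As a rough illustration, here is a round-robin balancer that skips backends marked as down and accepts new ones at runtime. The class and method names are hypothetical, and real load balancers detect failures through health checks rather than manual calls.

```python
from typing import List, Set


class RoundRobinBalancer:
    """Rotates over backends, skipping any that are marked down."""

    def __init__(self, backends: List[str]) -> None:
        self._backends: List[str] = list(backends)
        self._healthy: Set[str] = set(backends)
        self._next = 0

    def add_backend(self, backend: str) -> None:
        # Scaling out: a new VM starts receiving traffic immediately.
        self._backends.append(backend)
        self._healthy.add(backend)

    def mark_down(self, backend: str) -> None:
        self._healthy.discard(backend)

    def mark_up(self, backend: str) -> None:
        if backend in self._backends:
            self._healthy.add(backend)

    def pick(self) -> str:
        # Try each backend at most once per call.
        for _ in range(len(self._backends)):
            backend = self._backends[self._next % len(self._backends)]
            self._next += 1
            if backend in self._healthy:
                return backend
        raise RuntimeError("no healthy backend available")
```

With backends `vm1`, `vm2`, `vm3` and `vm2` marked down, successive `pick()` calls alternate between `vm1` and `vm3`, so the failed VM costs capacity but not availability.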
Selection of NoSQL
Cassandra Performance
Scalable Design
Client
Cassandra
API Server
Apache Spark
Mongo DB
Service
MS SQL
Pentaho Kettle
Batch Update
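Assuming the batch update copies records from the relational database into the read cache that serves API requests, the step could look roughly like this. The direction of the flow is an inference from the diagram, not something the slides state. The sketch uses the standard-library `sqlite3` as a stand-in for MS SQL and a dict as a stand-in for MongoDB; the `programs` table and its columns are invented for illustration, and the actual job ran in Pentaho Kettle.

```python
import sqlite3
from typing import Dict


def batch_update(conn: sqlite3.Connection, read_cache: Dict[int, dict]) -> int:
    """Copy every program row into the read cache as a document."""
    rows = conn.execute("SELECT id, title, genre FROM programs").fetchall()
    for program_id, title, genre in rows:
        # Upsert keyed by id, so re-running the batch is harmless.
        read_cache[program_id] = {
            "_id": program_id,
            "title": title,
            "genre": genre,
        }
    return len(rows)


# In-memory stand-in for the relational source (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE programs (id INTEGER PRIMARY KEY, title TEXT, genre TEXT)"
)
conn.executemany(
    "INSERT INTO programs VALUES (?, ?, ?)",
    [(1, "Drama A", "drama"), (2, "Show B", "variety")],
)

read_cache: Dict[int, dict] = {}
updated = batch_update(conn, read_cache)
```

Online requests then read only from the cache, so the relational database is never on the request path.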
MongoDB Scale Out
Mongo DB (Master, WRITE)
Mongo DB (Slave, READ) x5
Up to 50 replica set members
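The read/write split in this layout can be sketched as a small router: writes always go to the master, reads rotate over the slaves. This is an illustration of the routing idea only, not MongoDB driver code; the class name and host labels are hypothetical, and the 50-member cap is taken from the slide.

```python
import itertools
from typing import List

MAX_REPLICA_MEMBERS = 50  # from the slide


class ReplicaSetRouter:
    """Sends writes to the master and spreads reads over the slaves."""

    def __init__(self, master: str, slaves: List[str]) -> None:
        if 1 + len(slaves) > MAX_REPLICA_MEMBERS:
            raise ValueError("too many replica set members")
        self._master = master
        # With no slaves, reads fall back to the master.
        self._readers = itertools.cycle(slaves or [master])

    def for_write(self) -> str:
        return self._master

    def for_read(self) -> str:
        return next(self._readers)


router = ReplicaSetRouter("mongo-master", ["mongo-slave-1", "mongo-slave-2"])
```

Read capacity grows with each added slave while write capacity stays fixed, which suits a store used mostly as a read cache. Reads from slaves can lag slightly behind the master.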
Scalable Design
LB Group: Web Servers (Static Page) x4
LB Group: Web Servers (Static Page) x3, API Servers
MongoDB (Slave)
MongoDB (Master)
Cassandra Ring: Node x6
LB Group: Web Servers (Static Page) x3, Bookmark Collectors
The Result?• POOQ obtained exclusive Internet streaming rights to the Premier12 baseball games
• 19th November, 2015 - Premier12, Korea vs. Japan
The Road So Far…• Built a solid platform foundation
• In collaboration with a fantastic developer team
• Confidence in handling large-scale user interaction
• Still a long way to go, though
Lessons Learned (About Cloud)• Very efficient; would recommend it to almost any platform
• Enjoying decent support in general
• VMs are much slower than you think
• Not all VMs are equal
• Sometimes your bottleneck is not the number of VMs
• A long way from perfection/shit happens