Distributed Caching for Beginners
TRANSCRIPT
[email protected] http://charsyam.wordpress.com
Slacker server developer at NHN
Interests!!!
Cloud, Big Data
Specialty!!!
Winging presentations!!!
What is a Cache?
Cache is a component
that transparently stores data so
that future requests for that
data can be served faster.
In Wikipedia
A cache stores results that will be requested later, so they can be served quickly.
Lots of Data ↔ Cache
CPU Cache
Browser Cache
Why Cache?
Use Case: Login
Common Case Read From DB
Select * from Users where id='charsyam';
Typical DB setup
Master
Slave
REPLICATION/FailOver
The Master handles all traffic; the Slave stands by for failover
So the Master gets overloaded!!!
A fork in the road!!!
Scale UP Vs
Scale OUT
Scale UP
1,000 TPS
-> 3,000 TPS
Deploy one server that can handle 3x the load
Scale OUT
1,000 TPS
-> 2,000 TPS
-> 3,000 TPS
We chose Scale Out!!! (If you have the money, Scale Up is fine too.)
Request analysis
70% reads, 30% writes
Conclusion
70% reads, so let's distribute the reads!!!
One Write Master
+ Multi Read Slave
Client
Master
Slave
ONLY WRITE
Slave Slave
Only READ
REPLICATION
Eventual Consistency
Ex) Replication Master
Slave
REPLICATION
Replication can lag because the Slave is under load or for other reasons, but eventually the copies converge.
What about login data that must never diverge, even briefly? => Back to square one!!!
Early on, life was good.
Client
Master
Slave Slave Slave
REPLICATION
Slave Slave
Then the load grew even more!!!
Client
Master
Slave
REPLICATION
Slave Slave Slave Slave Slave
Slave Slave Slave Slave Slave Slave
Performance barely improved!!! WHY?
A machine's I/O is zero-sum: every Slave still replays every write from replication, so each new Slave adds less and less read capacity.
Partitioning
Client
PART 1
Web Server
DBMS
PART 2
Web Server
DBMS
Scalable Partitioning
Partitioning: performance, but also management issues and cost
A simple solution
Use SSDs for the DB servers' disks, and more memory than data
=> Money, money, money!!!
Why Cache?
Use Case: Login
Read From Cache
Get charsyam
Use Case: Login
Apply Cache For Read
General Cache Layer
Cache
DBMS
Storage Layer
Application Server
READ
WRITE
WRITE UPDATE
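The READ/WRITE flow above is the classic look-aside pattern: reads try the cache first and fall back to the DB on a miss; writes go to the DB and then refresh the cache. A minimal sketch, with plain dicts standing in for memcached and MySQL (all names and data are illustrative):

```python
# Plain dicts stand in for the real cache server and DBMS.
db = {"charsyam": {"name": "charsyam", "last_login": "2012-01-01"}}
cache = {}

def read_user(user_id):
    """READ path: try the cache first, fall back to the DB, then populate."""
    if user_id in cache:
        return cache[user_id]      # cache hit: no DB round trip
    row = db.get(user_id)          # cache miss: SELECT ... WHERE id=user_id
    if row is not None:
        cache[user_id] = row       # store for future requests
    return row

def write_user(user_id, row):
    """WRITE path: update the DB first, then update the cache (WRITE UPDATE)."""
    db[user_id] = row
    cache[user_id] = row           # keep the cache in sync
```

The second `read_user("charsyam")` call never touches `db`, which is exactly the point of the cache layer.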
Type 1: 1 Key – N Items
KEY: Profile:User_ID
VALUE: Profile { LastLoginTime, UserName, Host, Name }
Type 2: 1 Key – 1 Item
KEY: name:User_ID -> VALUE: UserName
KEY: LastLoginTime:User_ID -> VALUE: LastLoginTime
Pros And Cons
Type 1: 1 Key – N Items
Pros: Just 1 get.
Cons: if one item changes, the whole value must be rewritten, which requires a cache update and opens a race condition
Pros And Cons
Type 2: 1 Key – 1 Item
Pros: if one item changes, update just that item. Cons: some items can be evicted while others survive
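The two layouts can be sketched as follows, with a dict standing in for the cache (keys and field values are illustrative):

```python
# A dict stands in for the cache server.
cache = {}

# Type 1: 1 key, N items: the whole profile lives under one key.
cache["Profile:charsyam"] = {
    "LastLoginTime": "2012-01-01",
    "UserName": "charsyam",
    "Host": "1.2.3.4",
    "Name": "charsyam",
}

# Type 2: 1 key, 1 item: every field gets its own key.
cache["name:charsyam"] = "charsyam"
cache["LastLoginTime:charsyam"] = "2012-01-01"

# Updating one field under Type 2 touches only that key...
cache["LastLoginTime:charsyam"] = "2012-01-02"

# ...while Type 1 needs a read-modify-write of the whole value,
# which is the race-condition window from the Cons slide.
profile = dict(cache["Profile:charsyam"])
profile["LastLoginTime"] = "2012-01-02"
cache["Profile:charsyam"] = profile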
Data that doesn't change
Data that does change
Divide them
Don't update the cache when you didn't read from the DB first
VALUE: Profile { LastLoginTime, UserName, Host, Name }
EVENT: Update Only LastLoginTime
Read Cache: User Profile
Update Data & DB
Just Save Cache
This creates a race condition
Need Global Lock
=> Skipping that today!!!
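The race the slides warn about is a classic lost update: two writers read the same cached profile, each changes a different field, and the later write-back silently discards the other's change. A small sketch of that interleaving, assuming an unlocked 1-key-N-items value (names are illustrative):

```python
# A sketch of the lost-update race on a 1-key-N-items cached value.
cache = {"Profile:u1": {"LastLoginTime": "t0", "Name": "old"}}

a = dict(cache["Profile:u1"])   # writer A reads the profile
b = dict(cache["Profile:u1"])   # writer B reads the same snapshot

a["LastLoginTime"] = "t1"       # A changes only the login time
b["Name"] = "new"               # B changes only the name

cache["Profile:u1"] = a         # A writes the whole value back
cache["Profile:u1"] = b         # B writes last: A's update is silently lost

# cache["Profile:u1"] is now {"LastLoginTime": "t0", "Name": "new"}
```

Preventing this without a global lock is exactly the part the talk skips.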
How to Test
Using real data
From 25 million user IDs
100,000 requests
Replayed from logs
Test Result
136 vs 1613 seconds
Memcached vs MySQL
Result of Applying Cache
Select Query
CPU utility
Uniform response times!!!
Reduced load!!!
What's Hard?
Key and Value – SYNC: the easy case
1. Update the DB
2. The transaction fails
Fail!!! (nothing was cached, so nothing diverges)
Key and Value – SYNC: the hard case
1. Update the DB
2. The cache update fails!!!
Key and Value – how to handle it
1. Update the DB
2. The cache update fails!!!
RETRY / BATCH / DELETE
Data! The most important thing
If data is not important
Cache Updating is not important
Login Count
Last Login Time
ETC
BUT!
If data is important
Cache Updating is important
Server Address
Data Path
ETC
HOW!
RETRY! Retry, retry, retry solves over 9x% of failures. Or just delete the cache entry!!!
Batch! Queuing Service
Error Log Queue
Error Log
Error Log
Error Log
Error Log
Batch Processor
Cache Server
UPDATE
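The retry-then-batch flow above can be sketched as follows. `flaky_set` is a hypothetical cache client that fails a few times before recovering, and a deque stands in for the queuing service:

```python
from collections import deque

cache = {}
error_log_queue = deque()        # stands in for the queuing service

def set_with_retry(cache_set, key, value, retries=3):
    """RETRY: attempt the cache update a few times; if it keeps failing,
    append the write to the error-log queue for the batch processor."""
    for _ in range(retries):
        try:
            cache_set(key, value)
            return True
        except ConnectionError:
            continue
    error_log_queue.append((key, value))
    return False

def run_batch_processor(cache_set):
    """BATCH: drain the error-log queue and re-apply each failed
    update to the cache server."""
    while error_log_queue:
        key, value = error_log_queue.popleft()
        cache_set(key, value)

# A fake cache client that fails a set number of times, then recovers.
failures = [5]
def flaky_set(key, value):
    if failures[0] > 0:
        failures[0] -= 1
        raise ConnectionError("cache server down")
    cache[key] = value
```

While the cache server is down, `set_with_retry(flaky_set, "k", "v")` gives up and queues the write; once it recovers, `run_batch_processor(flaky_set)` replays the queue and the cache catches up.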
Caches
Memcache
Redis
Memcached
Key:Value
Atomic operations
Single thread
Over 100,000 TPS
Max item size: 1 MB
LRU eviction, expire time
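The atomic operations above are what make memcached safe under concurrent writers; memcached exposes this as `gets`/`cas` (check-and-set, with a version token), and also has a native atomic `incr`. A pure-Python sketch of the idea, with a dict plus a version counter standing in for the server, and an increment modeled as a CAS retry loop (names are illustrative):

```python
# key -> (value, version); the version plays the role of the cas token.
store = {}

def gets(key):
    """Read a value together with its version token."""
    return store.get(key, (None, 0))

def cas(key, value, version):
    """Write only if nobody has modified the key since our read."""
    _, current = store.get(key, (None, 0))
    if current != version:
        return False                   # someone else won the race
    store[key] = (value, current + 1)
    return True

def incr(key):
    """Increment via a CAS retry loop (real memcached has native incr)."""
    while True:
        value, version = gets(key)
        if cas(key, (value or 0) + 1, version):
            return (value or 0) + 1
```

A `cas` with a stale token is rejected, which is how two racing writers avoid the lost-update problem from earlier.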
Redis
Key:Value
Expire time, replication, snapshots
Collections: List, Sorted Set, Hash
Caveats
Item size
1 to XXX KB; the smaller the item, the better.
Cache In
Global Service
Memcached In Facebook
• Facebook, Google, and many other companies
• Facebook – 500 million logins per day (800 million per month) (latest; the figures below are from 2010/04)
– 70 million active users
– growing by 1 million users every 4 days
– 10,000 web servers, 20 million web requests per second
– 805 Memcached servers -> 15 TB, hit rate: 95%
– 1,800 MySQL servers, Master/Slave (900 each)
• Mem: 25 TB, 500,000 SQL queries per second
Cache In Twitter(Old) API WEB
DB DB
Page Cache
Cache In Twitter(NEW) API WEB
DB DB DB
Page Cache
Fragment Cache
Row Cache
Vector Cache
Redis In Wonga
Using Redis for DataStore Write/Read
Redis In Wonga
Flash Client Ruby Backend
NoCache In Wonga
1 million daily Users
200 million daily HTTP Requests
NoCache In Wonga
1 million daily Users
200 million daily HTTP Requests
100,000 DB Operation Per Sec
40,000 DB update Per Sec
NoCache In Wonga
First Scale Out
The first setup lasted 3 months
More Traffic
MySQL hiccups – DB Problems Began
Analysis About DBMS
ActiveRecord's status check caused 20% extra DB load
60% of all updates were done on the 'tiles' table
Applying Sharding 2xD
Result of Applying Sharding 2xD
Doubling MySQL
Result of Doubling MySQL
50,000 TPS on EC2
Wonga chose Redis! But hit some failures
=> Skipping that today!!!
Result!!!
Consistent Hashing
Origin: Proxy -> Server x5
User Request
K = 10,000, N = 5
One server fails: Proxy -> Server x4
User Request
K = 10,000, N = 4
FAIL: redistributes about 2,000 users
The server recovers: Proxy -> Server x5
User Request
K = 10,000, N = 5
RECOVER: redistributes about 2,500 users
Add servers A, B, C to the ring
Add item 1
Add item 2
Add items 3, 4, 5
Server B fails!! (items 2, 3, 4, 5 remain)
Add item 1 again -> allocated to server C
Recover server B -> add item 1
Thank You!
Real Implementation: each server is placed on the ring as multiple virtual nodes (A+1, A+2, A+3, A+4, B+1, B+2, B+3, C+1, C+2, C+3)
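The virtual-node ring above can be sketched as follows. The server names, the MD5 hash, and the replica count are illustrative choices, not the talk's exact implementation:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hash ring with virtual nodes ("A+1", "A+2", ...,
    as on the slide)."""

    def __init__(self, replicas=100):
        self.replicas = replicas
        self.ring = {}          # hash point on the ring -> server name
        self.sorted_keys = []   # sorted hash points, for bisecting

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, server):
        # Place `replicas` virtual nodes for this server on the ring.
        for i in range(self.replicas):
            point = self._hash("%s+%d" % (server, i + 1))
            self.ring[point] = server
            bisect.insort(self.sorted_keys, point)

    def remove(self, server):
        # Removing a server deletes only its own virtual nodes, so only
        # the keys it owned remap to the next server around the ring.
        for i in range(self.replicas):
            point = self._hash("%s+%d" % (server, i + 1))
            del self.ring[point]
            self.sorted_keys.remove(point)

    def get(self, key):
        # A key belongs to the first virtual node clockwise from its hash.
        if not self.sorted_keys:
            return None
        idx = bisect.bisect(self.sorted_keys, self._hash(key))
        if idx == len(self.sorted_keys):
            idx = 0  # wrap around the ring
        return self.ring[self.sorted_keys[idx]]
```

Removing a server remaps only the keys it owned, which is why the failure on the earlier slide redistributed only about K/N of the users instead of nearly all of them.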