pinterest的数据库分片架构

39

Upload: tommy-chiu

Post on 17-May-2015

1.274 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: Pinterest的数据库分片架构

大会官方网站与资料下载地址:www.archsum m it.com

感谢您参加本次ArchSummit全球架构师峰会!

Page 2: Pinterest的数据库分片架构

分享产品与技术 分享腾讯与互联网

更多精彩内容

架构之美:开放环境下的网络架构

QQ空间技术架构之峥嵘岁月

QQ基础数据库架构演变之路

查看更多

Page 3: Pinterest的数据库分片架构

Sharding

Marty WeinerMetropolis

Evrhet MilamBatcave

12年8月10⽇日星期五

Page 4: Pinterest的数据库分片架构

Mar 2010 Jan 2011 Jan 2012

Scaling PinterestScaling Pinterest

Page Views / Day

Mar 2010 Jan 2011 Jan 2012 May 2012

· Amazon EC2 + S3 + CloudFront· 2 NGinX, 16 Web Engines + 2 API Engines· 5 Functionally Sharded MySQL DB + 9 read slaves· 4 Cassandra Nodes· 15 Membase Nodes (3 separate clusters)· 8 Memcache Nodes· 10 Redis Nodes· 3 Task Routers + 4 Task Processors· 4 Elastic Search Nodes· 3 Mongo Clusters· 3 Engineers

12年8月10⽇日星期五

Page 5: Pinterest的数据库分片架构

Scaling Pinterest

Clustering

Sharding

· Data distributed automatically

· Data can move

· Rebalances to distribute capacity

· Nodes communicate with each other

12年8月10⽇日星期五

Page 6: Pinterest的数据库分片架构

Scaling Pinterest

Clustering

Sharding

· Data distributed manually

· Data does not move

· Split data to distribute load

· Nodes are not aware of each other

12年8月10⽇日星期五

Page 7: Pinterest的数据库分片架构

Our Transition1 DB + Foreign Keys + Joins1 DB + Denormalized + Cache

Several functionally sharded DBs + Read slaves + Cache

1 DB + Read slaves + Cache

ID sharded DBs + Backup slaves + Cache

Scaling Pinterest

12年8月10⽇日星期五

Page 8: Pinterest的数据库分片架构

How we sharded

Scaling Pinterest

12年8月10⽇日星期五

Page 9: Pinterest的数据库分片架构

Sharded Server Topology

Initially, 8 physical servers, each with 512 DBs

Scaling Pinterest

db00001db00002

.......db00512

db00513db00514

.......db01024

db03584db03585

.......db04096

db03072db03073

.......db03583

12年8月10⽇日星期五

Page 10: Pinterest的数据库分片架构

High Availability

Multi Master replication for hot swappingAll writes/reads go to only 1 master

Scaling Pinterest

db00001db00002

.......db00512

db00513db00514

.......db01024

db03584db03585

.......db04096

db03072db03073

.......db03583

12年8月10⽇日星期五

Page 11: Pinterest的数据库分片架构

Configuration

Scaling Pinterest

· Initially 8 masters, each with 512 DBs

· Now at 80 masters, some 32 DBs and some 64 DBs

· Over 10 TB

· All tables are InnoDB

12年8月10⽇日星期五

Page 12: Pinterest的数据库分片架构

Configuration

Scaling Pinterest

· Ubuntu 11.10 on AWS managed by Puppet

· m1.xlarge using 4 stripe-raided ephemeral drives

· cc2.8xlarge using 4 stripe-raided ephemeral drives

· Soon: hi1.4xlarge (SSD)

12年8月10⽇日星期五

Page 13: Pinterest的数据库分片架构

ID Structure

· A lookup data structure has physical server to shard ID range (cached by each app server process)

· Shard ID denotes which shard

· Type denotes object type (e.g., pins)

· Local ID denotes position in table

Shard ID Local ID

64 bits

Scaling Pinterest

Type

12年8月10⽇日星期五

Page 14: Pinterest的数据库分片架构

Lookup Structure

Scaling Pinterest

sharddb003a

{“sharddb001a”: ( 1, 512), “sharddb002b”: ( 513, 1024), “sharddb003a”: (1025, 1536), ... “sharddb008b”: (3585, 4096)}

DB01025 users

users

user_has_boards

boards

1 json

2 json

3 json

12年8月10⽇日星期五

Page 15: Pinterest的数据库分片架构

· New users are randomly distributed across shards

· Boards, pins, etc. try to be collocated with user

· Local ID’s are assigned by auto-increment

· Enough ID space for 65536 shards, but only first 4096 opened initially. Can expand horizontally.

Scaling Pinterest

ID Generation

mysql_query(“USE pbdata%d” % shard_id)local_id = mysql_query(“INSERT INTO users (body) VALUES (%s)”, json)full_id = shard_id << 46 | USER_TYPE << 10 | local_id

12年8月10⽇日星期五

Page 16: Pinterest的数据库分片架构

Objects and Mappings· Object tables (e.g., pin, board, user, comment)

· Local ID MySQL blob (JSON / Serialized thrift)

· Mapping tables (e.g., user has boards, pin has likes)

· Full ID Full ID (+ sequence ID / timestamp)

· Naming schema is noun_verb_noun

· Queries are PK or index lookups (no joins)

· Data DOES NOT MOVE

· All tables exist on all shards

· No schema changes required (index = new table)

Scaling Pinterest

12年8月10⽇日星期五

Page 17: Pinterest的数据库分片架构

Loading a Page· Rendering user profile

· Most of these calls will be a cache hit

· Omitting offset/limits and mapping sequence id sort

SELECT body FROM users WHERE id=<local_user_id>SELECT board_id FROM user_has_boards WHERE user_id=<user_id>SELECT body FROM boards WHERE id IN (<board_ids>)SELECT pin_id FROM board_has_pins WHERE board_id=<board_id>SELECT body FROM pins WHERE id IN (pin_ids)

Scaling Pinterest

12年8月10⽇日星期五

Page 18: Pinterest的数据库分片架构

Increased load on DB?

To increase capacity, a server is replicated and the new replica becomes responsible for some DBsScaling Pinterest

db00001db00002db00003db00004

sharddb001a

sharddb001b

{“sharddb001a”: ( 1, 4), “sharddb002b”: ( 5, 8), “sharddb003a”: ( 9, 12), ... “sharddb008b”: ( 29, 32)}

12年8月10⽇日星期五

Page 19: Pinterest的数据库分片架构

Increased load on DB?

To increase capacity, a server is replicated and the new replica becomes responsible for some DBsScaling Pinterest

db00001db00002db00003db00004

sharddb001a

sharddb001b

sharddb009a

sharddb009b

db00001db00002db00003db00004

{“sharddb001a”: ( 1, 4), “sharddb002b”: ( 5, 8), “sharddb003a”: ( 9, 12), ... “sharddb008b”: ( 29, 32)}

12年8月10⽇日星期五

Page 20: Pinterest的数据库分片架构

Increased load on DB?

To increase capacity, a server is replicated and the new replica becomes responsible for some DBsScaling Pinterest

db00001db00002db00003db00004

sharddb001a

sharddb001b

sharddb009a

sharddb009b

db00001db00002db00003db00004

{“sharddb001a”: ( 1, 2), “sharddb002b”: ( 5, 8), “sharddb003a”: ( 9, 12), ... “sharddb008b”: ( 29, 32), “sharddb009a”: ( 3, 4)}

12年8月10⽇日星期五

Page 21: Pinterest的数据库分片架构

Increased load on DB?

To increase capacity, a server is replicated and the new replica becomes responsible for some DBsScaling Pinterest

db00001db00002db00003db00004

sharddb001a

sharddb001b

sharddb009a

sharddb009b

db00001db00002db00003db00004

{“sharddb001a”: ( 1, 2), “sharddb002b”: ( 5, 8), “sharddb003a”: ( 9, 12), ... “sharddb008b”: ( 29, 32), “sharddb009a”: ( 3, 4)}

12年8月10⽇日星期五

Page 22: Pinterest的数据库分片架构

Increased load on DB?

To increase capacity, a server is replicated and the new replica becomes responsible for some DBsScaling Pinterest

db00001db00002

sharddb001a

sharddb001b

sharddb009a

sharddb009b

db00003db00004

{“sharddb001a”: ( 1, 2), “sharddb002b”: ( 5, 8), “sharddb003a”: ( 9, 12), ... “sharddb008b”: ( 29, 32), “sharddb009a”: ( 3, 4)}

12年8月10⽇日星期五

Page 23: Pinterest的数据库分片架构

Scripting· Must get old data into your shiny new shard

· 500M pins, 1.6B follower rows, etc

· Build a scripting farm with Pyres + Redis

· Spawn more workers and complete the task faster

· Can push millions of jobs onto the queue

Scaling Pinterest

12年8月10⽇日星期五

Page 24: Pinterest的数据库分片架构

Sharding Alternatives

Scaling Pinterest

12年8月10⽇日星期五

Page 25: Pinterest的数据库分片架构

ID Generation

Scaling Pinterest

· ID generated by shard + type + auto-increment

· Pinterest, Facebook

· Pro: ID generation scales with size of shard

· Pro: No routing services which can fail

· Pro: No holes in ID space

· Pro: Type verification built-in

· Con: No global ordering (requires sequence ID)

· Con: Can’t move parts of databases around12年8月10⽇日星期五

Page 26: Pinterest的数据库分片架构

ID Generation

Scaling Pinterest

· ID contains timestamp + shard

· Instagram

· Pro: Global ordering

· Pro: No routing services which can fail

· Con: No type verification

· Con: Can’t move parts of databases around

· Con: Not much space for shard ID’s

12年8月10⽇日星期五

Page 27: Pinterest的数据库分片架构

ID Generation

Scaling Pinterest

· ID generated by service

· Twitter, Etsy

· Requires: ID routing service

· Pro: Possible global ordering

· Pro: Can move individual data elements

· Pro: Can perform type-checking

· Con: ID routing service can fail

· Con: ID routing service can be unwieldy and complicated

12年8月10⽇日星期五

Page 28: Pinterest的数据库分片架构

Problems

Scaling Pinterest

12年8月10⽇日星期五

Page 29: Pinterest的数据库分片架构

Hotspots

Scaling Pinterest

· Our board_followedby_users tables were unbalanced

· Rearchitected to have board_followedby_user_shard map boards to shards containing subsets of the board_followedby_users data.

· Pro: Data evenly distributed

· Con: May have to hit multiple shards

12年8月10⽇日星期五

Page 30: Pinterest的数据库分片架构

No Joins

Scaling Pinterest

· Home is user_follows_board JOIN board_has_pins

· Each user has a home feed redis list

· New pin ID placed in each following user’s list

for user_id in board_followedby_users: redisconn.lpush(“home:%s” % user_id, new_pin_id)

12年8月10⽇日星期五

Page 31: Pinterest的数据库分片架构

No Automatic Failover

Scaling Pinterest

· Some fraction of users see timeouts if a DB fails

· Manually change settings and re-deploy

· Needs to be automated, especially with 128+ DBs

· Moving to automation with Zookeeper

12年8月10⽇日星期五

Page 32: Pinterest的数据库分片架构

Caching

Scaling Pinterest

12年8月10⽇日星期五

Page 33: Pinterest的数据库分片架构

Caching

Scaling Pinterest

· Redis lists for persistently caching homefeeds

· lpush homefeed:82363 7233494

· lrange homefeed:82363 0 49

· Memcache for caching lists

· MessagePack to serialize data

· Append to add new data

· Never delete from middle

· Memcache for caching objects

12年8月10⽇日星期五

Page 34: Pinterest的数据库分片架构

Caching

Scaling Pinterest

· Caches separated by pools

· Example: pin objects, board_has_pins, etc

· Better stats

· Easier to scale and manage

12年8月10⽇日星期五

Page 35: Pinterest的数据库分片架构

Caching with decorators

def user_get_many(self, user_ids):cursor = get_mysql_conn().cursorcursor.execute(“SELECT * FROM users WHERE id in (%s)”, user_ids)return cursor.fetchall()

Scaling Pinterest

@mc_objects(USER_MC_CONN, format = “user:%d”,version=1,serialization=simplejsonexpire_time=0)

12年8月10⽇日星期五

Page 36: Pinterest的数据库分片架构

Caching with decorators

def get_board_pins(self, board_id, offset, limit):cursor = get_mysql_conn().cursorcursor.execute(“SELECT pin_id FROM bp WHERE id =%s OFFSET

%d LIMIT %d”, board_id, offset, limit)return cursor.fetchall()

Scaling Pinterest

@paged_list(BOARD_MC_CONN, format = “b:p:%d”,version=1,expire_time=24*60*60)

12年8月10⽇日星期五

Page 37: Pinterest的数据库分片架构

We are [email protected]

Scaling Pinterest

12年8月10⽇日星期五

Page 39: Pinterest的数据库分片架构

杭 州 站 ·2012年 10月 25日 ~27日大会官网:www.qconhangzhou.com