Self Introduction
• Joined Mercari in July 2014
• SRE (Site Reliability Engineer)
• Role
• Development Productivity
What is SRE?
• Site Reliability Engineer
• A role introduced at Google: “Software Engineers responsible for ensuring that all of Google’s services are super reliable and super fast, all of the time.”
• Mercari SRE team members are responsible for
• Availability
• Performance
• Building and operating the log analytics platform
• Server provisioning and deployment
• Security
• Building the development environment
Infrastructure Overview
• JP
• SAKURA Internet Ishikari DC dedicated server + cloud
• US
• AWS Oregon
• Shared
• Akamai
• Amazon Route53, S3, CloudFront
• Google BigQuery
Infrastructure 2016
app
lb nat
internet
lb_pascal
DB
memcached
batch
Q4M
Worker
lb_push
push
lb_search
search
deploybase monitor dns logview cep
global
private
logbatchlog
lb_general
Software (2016)
• nginx
• PHP 5.6
• Apache + mod_php
• Go
• Node
• MySQL
• Q4M
• memcached
• Solr
• Gaurun
• fluentd
• Norikra
• Kibana
• Zabbix
• kurado
• etc..
Improve? or Crisis!
• Continuous Increase in Access
• Continuous Increase in Data Volume
• Growth of Specifications
• Unstable Deployment
Continuous Increase in Access
https://pixabay.com/en/traffic-rush-hour-rush-hour-urban-843309/
Problem: Continuous Increase in Access
• Lack of CPU resources
• Slower response times
• Lack of network bandwidth
• Network congestion
Improvement: Introduce dedicated servers
• BEFORE
• SAKURA Cloud
• AFTER (Ex.)
• CPU: Xeon 6 cores x 2
• Mem: 32 GB
• Disk: 240 GB SSD
Improvement: Introduce lb based on nginx
• BEFORE
• All httpd servers faced the internet directly
• DNS Round Robin
• AFTER
• nginx!
• Reverse proxy, TLS and SPDY terminator
internet
lb lb lb lb
DNS RR
Improvement: Continuous application tuning
• MySQL index tuning
• (Ex.) Large 2-dimensional array -> convert the 2nd level to text data and parse it
Of course, there is no silver bullet.
Improvement: Continuous application tuning
require_once('master_data.php'); was slow!!
Large
http://www.slideshare.net/kazeburo/big-master-data-php-blt-1
Improvement: Continuous application tuning
http://www.slideshare.net/kazeburo/big-master-data-php-blt-1
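The array example above (keep the 2nd level as text and parse it) might look roughly like the sketch below. It is written in Python rather than the PHP the slides refer to, and the use of JSON as the text format, the class name LazyMasterData, and the sample category data are all assumptions made for illustration, not the original implementation.

```python
import json


class LazyMasterData:
    """Keeps each second-level record as text and decodes it on first access."""

    def __init__(self, raw):
        # raw maps a key to a JSON-encoded string (the "text data").
        self._raw = raw
        self._parsed = {}

    def __getitem__(self, key):
        # Decode only the entry that is actually requested, then cache it.
        if key not in self._parsed:
            self._parsed[key] = json.loads(self._raw[key])
        return self._parsed[key]

    def parsed_count(self):
        # Number of entries decoded so far.
        return len(self._parsed)


# Hypothetical master data: category id -> JSON text of its attributes.
RAW_CATEGORIES = {
    1: '{"name": "Fashion", "parent": 0}',
    2: '{"name": "Shoes", "parent": 1}',
    3: '{"name": "Books", "parent": 0}',
}

categories = LazyMasterData(RAW_CATEGORIES)
```

Loading the data then only builds a flat map of strings, and the cost of building a nested structure is paid per entry, when that entry is read.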
Problem: Increasing DB historical table records
• Increasing DB historical table records
• Shortage of disk capacity
• Slower item search throughput
• Growing access logs
• Customer service turnaround time became too slow
Improvement: Server partitioning (MySQL)
• DB tables are partitioned across multiple servers
• Slave servers exist only in the main cluster
• Using DNS RR
[Diagram: MySQL servers split into clusters labeled Main, todolists, l2-db, cs-tool, and anon-db; server labels are Master, Slave, Backup, and AnonDB]
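One way to picture the partitioning described above is a lookup from table name to cluster, with read-only queries on the main cluster spread across its slaves. This is only a sketch under assumptions: the cluster names come from the slide, but the table names, host names, and the pick_host function are hypothetical, and an in-process round robin stands in for DNS RR.

```python
import itertools

# Hypothetical table-to-cluster map. Cluster names are from the slide,
# table names are invented for illustration.
TABLE_CLUSTER = {
    "todolists": "todolists",
    "cs_tickets": "cs-tool",
}
DEFAULT_CLUSTER = "main"

# Hypothetical host names. Only the main cluster has slaves.
MASTERS = {
    "main": "main-master",
    "todolists": "todolists-master",
    "cs-tool": "cs-tool-master",
}
MAIN_SLAVES = itertools.cycle(["main-slave1", "main-slave2", "main-slave3"])


def pick_host(table, readonly=False):
    """Return the host that should serve a query against the given table."""
    cluster = TABLE_CLUSTER.get(table, DEFAULT_CLUSTER)
    if readonly and cluster == "main":
        # Stands in for DNS round robin over the main cluster's slaves.
        return next(MAIN_SLAVES)
    # Other clusters have no slaves, so reads and writes go to the master.
    return MASTERS[cluster]
```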
Improvement: Server partitioning (Solr)
• Solr
• Master - Slave
• latest & all cluster
• nginx
• load balancer
• Lua controls cluster access
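[Diagram: app, Worker, lb_search, and two Solr clusters, each with one SolrMaster and two SolrSlaves]
• Double write: updates are sent to both clusters
• latest cluster: the most recent N days
• all index cluster: all items
• Search latest first; if it returns too few hits, search all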
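The latest-then-all lookup can be sketched as follows. The slides say Lua inside nginx controls cluster access; this Python version only illustrates the decision logic, and the function names and the in-memory stand-ins for the two clusters are assumptions.

```python
def search(query, limit, search_latest, search_all):
    """Query the latest cluster first; use the all-items cluster if it falls short."""
    hits = search_latest(query, limit)
    if len(hits) >= limit:
        return hits[:limit]
    # Not enough hits among recent items, so ask the full index instead.
    return search_all(query, limit)


# Hypothetical in-memory stand-ins for the two Solr clusters.
LATEST_INDEX = ["item9", "item8"]
ALL_INDEX = ["item9", "item8", "item3", "item1"]


def search_latest(query, limit):
    return LATEST_INDEX[:limit]


def search_all(query, limit):
    return ALL_INDEX[:limit]
```

Most requests are satisfied by the small latest cluster, so the larger all-items cluster is only queried when recent items do not fill the page.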
Improvement
app
Worker Batch
access_log, application_log, app_error_log, error_log, php_log, ...
log
BigQuery
nat
logview
kibana: Log Viewer
cep
Mackerel
Slack
Norikra: Stream Processing
Improvement: Scheduled, automated deploy
• Deploy many times per day instead of once a week
• Deployment driven by both Google Calendar and chat
http://tech.mercari.com/entry/2015/10/15/183000
Unstable Deployment
http://popsych.org/wp-content/uploads/2015/05/jenga-tower.jpg
Problem: 50x responses on each deploy
• Cause
• Inconsistent PHP OPcache
• Result
• Negative customer feedback
Improvement: ngx_dynamic_upstream + rsync
deploybase
App
YES!!!
App
App
App
App
App
App
Worker
Worker
Batch
lb lb lb
• ngx_dynamic_upstream
• Dynamically attach and detach app servers to and from the lb
• Using --rsync-path
• detach from lb
• rsync
• attach to lb
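The detach, rsync, attach sequence above can be sketched as a loop over app servers. This is a minimal illustration, not the actual deploy script: the detach, attach, and run callables are hypothetical hooks (in practice they would talk to ngx_dynamic_upstream and execute the command), and the rsync flags shown are an assumption.

```python
def rsync_command(src, host, dest):
    """Build the rsync invocation for one host (the flags are an assumption)."""
    return ["rsync", "-a", "--delete", src, f"{host}:{dest}"]


def rolling_deploy(hosts, src, dest, detach, attach, run):
    """Deploy to one host at a time so it serves no traffic while files change."""
    for host in hosts:
        detach(host)                         # take the host out of the lb
        run(rsync_command(src, host, dest))  # sync the new code
        attach(host)                         # put the host back into the lb
        # If run() raises, the host stays detached and the deploy stops.
```

Because each server is out of rotation while its files are replaced, requests are not served from a half-updated code tree.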
Conclusion
http://s0.geograph.org.uk/geophotos/02/95/15/2951585_5b854214.jpg
References
• Big “Master” Data (http://www.slideshare.net/kazeburo/big-master-data-php-blt-1)
• ngx_dynamic_upstream (https://github.com/cubicdaiya/ngx_dynamic_upstream)
• 大人のスタートアップは大人のリリースができる。そう、ChatOpsならね。 ("A grown-up startup can do grown-up releases. Yes, with ChatOps.") (http://tech.mercari.com/entry/2015/10/15/183000)