SRE in startup Zonky 17.1.2017 Ladislav Prskavec, Apiary [email protected] @abtris 1

Upload: ladislav-prskavec

Post on 21-Jan-2017

TRANSCRIPT

Page 1: SRE in Startup

SRE in startup

Zonky 17.1.2017

Ladislav Prskavec, [email protected]

@abtris

Page 2: SRE in Startup

What is SRE?

Page 3: SRE in Startup

"What happens when a software engineer is tasked with what used to be called operations."

» Ben Treynor Sloss, Vice President, Google Engineering, founder of Google SRE

Page 4: SRE in Startup

"Our work is like being part of the world's most intense pit crew. We change the tires of a race car as it's going 100 mph."

» Andrew Widdowson, Site Reliability Engineer, Mountain View

Page 5: SRE in Startup

In general, an SRE team is responsible for:

» availability

» latency

» performance

» efficiency

» change management

» monitoring

» emergency response

» capacity planning

Page 6: SRE in Startup

Page 7: SRE in Startup

If the team agrees on a 99.9% SLA, that gives them an error budget of 0.1%.
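The arithmetic behind the error budget can be sketched as follows. This is a minimal illustration, not something shown in the slides: the function names and the 30-day window are assumptions chosen for the example.

```python
def error_budget(sla_percent: float) -> float:
    """Return the error budget as a fraction, e.g. an SLA of 99.9 gives about 0.001."""
    if not 0.0 < sla_percent <= 100.0:
        raise ValueError("sla_percent must be in the range (0, 100]")
    return (100.0 - sla_percent) / 100.0


def allowed_downtime_minutes(sla_percent: float, window_days: int = 30) -> float:
    """Translate the budget into minutes of full outage over a window.

    The 30-day default window is an assumption made for this sketch.
    """
    minutes_in_window = window_days * 24 * 60
    return error_budget(sla_percent) * minutes_in_window


if __name__ == "__main__":
    print(f"budget: {error_budget(99.9):.2%}")
    print(f"downtime: {allowed_downtime_minutes(99.9):.1f} min per 30 days")
```

Under these assumptions, a 99.9% SLA leaves roughly 43.2 minutes of complete unavailability per 30 days (0.1% of 43,200 minutes).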

Page 8: SRE in Startup

Page 9: SRE in Startup

Rule

» If the service is within SLA, launch away: clearly the DEV team is doing a good job

» If the service is not within SLA, launch freeze until you earn back enough error budget
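The rule can be expressed as a small gate. The sketch below assumes the SLA is measured as the share of successful requests in a window; the slides do not say how it is measured, and `allowed_failures` and `launch_decision` are hypothetical names used only for illustration.

```python
def allowed_failures(sla_percent: float, total_requests: int) -> float:
    """Number of failed requests the error budget tolerates in the window."""
    return total_requests * (100.0 - sla_percent) / 100.0


def launch_decision(sla_percent: float, total_requests: int, failed_requests: int) -> str:
    """Return "launch" while error budget remains, otherwise "freeze".

    Assumption: availability is request-based (successful / total requests).
    """
    remaining = allowed_failures(sla_percent, total_requests) - failed_requests
    return "launch" if remaining >= 0 else "freeze"


if __name__ == "__main__":
    # 500 failures out of 1,000,000 requests: within a 99.9% SLA.
    print(launch_decision(99.9, 1_000_000, 500))
    # 2,000 failures out of 1,000,000 requests: the budget is overspent.
    print(launch_decision(99.9, 1_000_000, 2_000))
```

A freeze then lasts until the measurement window contains few enough failures for the remaining budget to be non-negative again.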

Page 10: SRE in Startup

Error budget

» removes SRE - DEV conflict

» DEV teams self-police

Page 11: SRE in Startup

Common staffing pool

» one more SRE = one less Dev

Page 12: SRE in Startup

SRE hires only coders

» they get bored easily

» speak same language as Dev

Page 13: SRE in Startup

50% cap on ops work

» if you succeed, work scales with traffic

» coding reduces the work / traffic ratio
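One way to watch the 50% cap is to compare tracked ops hours against total hours. This is only a sketch: tracking by hours, and the names `ops_fraction` and `over_cap`, are assumptions for illustration and do not come from the slides.

```python
OPS_CAP = 0.50  # the 50% cap on ops work named on the slide


def ops_fraction(ops_hours: float, total_hours: float) -> float:
    """Share of working time spent on ops (tickets, oncall, and so on)."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return ops_hours / total_hours


def over_cap(ops_hours: float, total_hours: float, cap: float = OPS_CAP) -> bool:
    """True when ops work exceeds the cap, leaving too little time for coding."""
    return ops_fraction(ops_hours, total_hours) > cap


if __name__ == "__main__":
    # Hypothetical week: 25 of 40 hours spent on ops work.
    print(over_cap(25, 40))
```

When the check returns `True`, the team is spending more than half its time on operations, which is the signal that the excess load needs to go somewhere else.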

Page 14: SRE in Startup

Keep Dev in rotation

» 5% ops handled by devs

Page 15: SRE in Startup

Speaking of Dev and Ops work

» excess operations load (tickets, oncall, etc.)

Page 16: SRE in Startup

SRE portability

» no requirement to stick with project or SRE

Page 17: SRE in Startup

Outages

» minimize impact

» prevent recurrence

Page 18: SRE in Startup

Minimize damage

» no NOC

» good diagnostic information

» practice, practice, practice

Page 19: SRE in Startup

Prevent recurrence

1. Handle event

2. Write post-mortems

3. Reset

Page 20: SRE in Startup

Post-mortem philosophy

» blameless, focus on process and technology

» create timeline

» get all facts

» create bugs for all followup work

Page 21: SRE in Startup

What is specific about SRE in a startup?

Page 22: SRE in Startup

1:10

Page 23: SRE in Startup

Horizontal team

Page 24: SRE in Startup

SaaS oriented

Page 25: SRE in Startup

Oncall culture

Page 26: SRE in Startup

It's cool work

Page 28: SRE in Startup

"May the Queries Flow, And the Pagers Remain Silent"

SRE Benediction
