Anti-fragility, Microservices & DevOps: A Study

Posted on 21-Apr-2017









By William


• The Principle of Anti-fragility
• Microservices Architecture
• The Principle of DevOps

Topic: What’s the Antonym of Fragile?

• Robust?
• Anti-fragile


Fragile: shatters when exposed to even a small stressor.


The Problem with Robust

• Robust is just Fragile with a thicker skin…
• Encourages a defensive, static mindset
• Resistant to change?
• Vulnerable to “Black Swan” events…
  – Something we haven’t anticipated
  – A failure mode we couldn’t have foreseen
  – A cascade of errors that we did not plan for

Black Swans


Anti-fragile: when exposed to stress, it gets stronger.


Some things benefit from shocks…volatility, randomness, disorder, and stressors and love adventure, risk, and uncertainty… there is no word for the exact opposite of fragile. Let’s call it antifragile.

Nassim N. Taleb, “Antifragile: Things That Gain from Disorder”

Triple Prism of Fragile, Robust & Anti-fragile

                      Fragile               Robust                     Anti-fragile
Icon                  Glass                 Medieval Castle
Methodology           “Spaghetti”           ITIL                       DevOps
Attitude to change    Fear Change           Resist Change              Embrace Change
Response to change    Break                 Repel                      Adapt
Rate of change        Ideally never!        Slow                       Rapid
Change initiated by   Needs CEO approval    Change Management Board    User-initiated (via automation)
Focuses on            Survival              Process                    Business Value

Is the System in Your Company…

• Fragile?
• Robust??
• Anti-fragile???

Anti-fragile Microservices Architecture

Microservices Architecture – A Case in Practice

Service Dependency

Delay in a Single Dependency Blocks the User Request

All User Requests Will Be Blocked at Peak Hour (Cascading Failure)

Circuit Breaker & Bulkhead Isolation Pattern
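A delayed dependency holds on to the request threads that call it; at peak hour every thread ends up waiting and the whole service stalls. A circuit breaker stops calling a dependency that keeps failing, and a bulkhead caps how many concurrent calls one dependency may hold. The sketch below is a minimal single-process illustration of both patterns with a fallback, assuming consecutive-failure counting and a semaphore-style bulkhead. It is not the API of Netflix’s Hystrix library (which popularized these patterns); the class names, thresholds and the "cached" fallback are hypothetical.

```python
import threading
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is short-circuited."""


class BulkheadFullError(Exception):
    """Raised when a dependency's concurrency budget is used up."""


class CircuitBreaker:
    """CLOSED -> OPEN after `failure_threshold` consecutive failures;
    OPEN -> HALF_OPEN once `reset_timeout` seconds have passed;
    HALF_OPEN -> CLOSED on a successful trial call, back to OPEN on a failure.
    Simplified: not thread-safe, and HALF_OPEN does not limit trial calls."""

    def __init__(self, failure_threshold=3, reset_timeout=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so tests can control time
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "CLOSED"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "HALF_OPEN"
        return "OPEN"

    def call(self, fn, *args, **kwargs):
        if self.state == "OPEN":
            # Fail fast instead of tying up a request thread on a sick dependency.
            raise CircuitOpenError("dependency is failing; short-circuiting")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # Trip when the threshold is hit, or re-open after a failed trial call.
            if self.failures >= self.failure_threshold or self.opened_at is not None:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.opened_at = None
        return result


class Bulkhead:
    """Semaphore-style bulkhead: caps in-flight calls to one dependency so a
    slow dependency cannot absorb every request thread."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, **kwargs):
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("too many in-flight calls to this dependency")
        try:
            return fn(*args, **kwargs)
        finally:
            self._slots.release()


def call_with_protection(breaker, bulkhead, fn, fallback):
    """Run `fn` inside the bulkhead and the breaker; degrade to `fallback` on failure."""
    try:
        return bulkhead.call(breaker.call, fn)
    except Exception:
        return fallback()


if __name__ == "__main__":
    now = [0.0]
    breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])
    bulkhead = Bulkhead(max_concurrent=10)

    def slow_dependency():
        raise TimeoutError("dependency timed out")

    for _ in range(3):
        # Prints "cached CLOSED", then "cached OPEN" twice.
        print(call_with_protection(breaker, bulkhead, slow_dependency, lambda: "cached"), breaker.state)
```

Once the breaker is open, the failing dependency is not called at all, so the request thread returns the fallback immediately instead of waiting on a timeout; the bulkhead bounds the damage while the breaker is still counting failures.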

Cross IDC Active-Active

[Figure: two data centers, DC 1 and DC 2, each containing a DC-aware gateway, an SOA edge service, an SOA middle-tier service, a DC-aware client and a service registry. Services register with and look up their local registry, the two registries peer-sync, and invocations fall back to the other DC.]
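The register, lookup, peer-sync and fallback-invocation flow on this slide can be sketched as below. Everything here is hypothetical: `ServiceRegistry`, `DCAwareClient` and the `transport` callable are illustrative names, not the APIs of the system in the talk, and the peer sync is a naive one-shot merge that ignores deregistration and network partitions.

```python
class ServiceRegistry:
    """Hypothetical per-DC registry: service name -> {address: dc}."""

    def __init__(self, dc):
        self.dc = dc
        self._entries = {}

    def register(self, service, address):
        # Instances register with the registry in their own DC.
        self._entries.setdefault(service, {})[address] = self.dc

    def peer_sync(self, peer):
        # Naive one-way merge of the peer's entries; a real registry would also
        # propagate deregistrations and cope with partitions and conflicts.
        for service, entries in peer._entries.items():
            self._entries.setdefault(service, {}).update(entries)

    def lookup(self, service):
        # Returns sorted (dc, address) pairs for every known instance.
        return sorted((dc, addr) for addr, dc in self._entries.get(service, {}).items())


class DCAwareClient:
    """Prefers instances in its own DC; falls back to the peer DC on failure."""

    def __init__(self, local_dc, registry, transport):
        self.local_dc = local_dc
        self.registry = registry
        self.transport = transport  # hypothetical: (address, request) -> response

    def invoke(self, service, request):
        entries = self.registry.lookup(service)
        local = [addr for dc, addr in entries if dc == self.local_dc]
        remote = [addr for dc, addr in entries if dc != self.local_dc]
        errors = []
        for address in local + remote:  # local first, then fallback invocation
            try:
                return self.transport(address, request)
            except ConnectionError as exc:
                errors.append((address, exc))
        raise ConnectionError(f"no reachable instance of {service}: {errors}")


if __name__ == "__main__":
    dc1, dc2 = ServiceRegistry("DC1"), ServiceRegistry("DC2")
    dc1.register("orders", "10.1.0.5")
    dc2.register("orders", "10.2.0.7")
    dc1.peer_sync(dc2)
    dc2.peer_sync(dc1)

    down = {"10.1.0.5"}  # simulate the DC 1 instance being unavailable

    def transport(address, request):
        if address in down:
            raise ConnectionError(address)
        return f"{request} served by {address}"

    client = DCAwareClient("DC1", dc1, transport)
    print(client.invoke("orders", "GET /orders/42"))  # GET /orders/42 served by 10.2.0.7
```

Trying local instances first keeps latency low in the normal case; crossing to the other DC only when local calls fail is what lets one DC absorb the other’s traffic.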

Building Distributed Systems Is Extremely Hard

• Even Harder to Test Sufficiently
  – Massive data sets and changing shape
  – Internet-scale traffic
  – Complex interaction and information flow
  – Asynchronous nature
  – 3rd-party services
  – All while innovating and building features

For most large-scale systems, sufficient testing is prohibitively expensive, if not impossible.

There Is Another Way

• Assume everything will fail
• Cause failure to validate resiliency
• Test design assumptions by stressing them
• Don’t wait for random failure. Remove its uncertainty by forcing it periodically.
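Forcing failure periodically can be illustrated with a toy drill: kill a random instance each round, check that the service still answers, then let auto-healing replace the instance. `Cluster`, `kill_random_instance` and `run_chaos_drill` are hypothetical names in an in-memory model; the real Chaos Monkey terminates cloud instances and is not reproduced here.

```python
import random


class Cluster:
    """Hypothetical stand-in for one service's instance group behind a load balancer."""

    def __init__(self, size):
        self.healthy = {f"instance-{i}" for i in range(size)}
        self.next_id = size

    def handle(self, request):
        if not self.healthy:
            raise RuntimeError("outage: no healthy instances")
        # Route to any healthy instance; min() just keeps the pick deterministic.
        return f"{request} -> {min(self.healthy)}"

    def replace_failed(self, target_size):
        # Auto-healing: launch replacements until the group is back to full size.
        while len(self.healthy) < target_size:
            self.healthy.add(f"instance-{self.next_id}")
            self.next_id += 1


def kill_random_instance(cluster, rng):
    """Terminate one healthy instance at random, in the spirit of Chaos Monkey."""
    victim = rng.choice(sorted(cluster.healthy))
    cluster.healthy.discard(victim)
    return victim


def run_chaos_drill(cluster, size, rounds, rng):
    """Force a failure every round and check the service still answers."""
    for _ in range(rounds):
        kill_random_instance(cluster, rng)
        cluster.handle("GET /health")  # raises RuntimeError if the kill caused an outage
        cluster.replace_failed(size)
    return True


if __name__ == "__main__":
    rng = random.Random(42)
    print(run_chaos_drill(Cluster(3), size=3, rounds=100, rng=rng))  # True
    try:
        run_chaos_drill(Cluster(1), size=1, rounds=1, rng=rng)
    except RuntimeError as exc:
        print("drill exposed a weakness:", exc)
```

The single-instance case fails the drill, which is the point: the drill surfaces a single point of failure on a schedule you choose rather than during an unplanned outage.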

What Netflix Has Done: Embrace Chaos!

“One of the first systems our engineers built in AWS is called the Chaos Monkey. The Chaos Monkey’s job is to randomly kill instances and services within our architecture. If we aren’t constantly testing our ability to succeed despite failure, then it isn’t likely to work when it matters most – in the event of an unexpected outage.”

Netflix Simian Army

Representative Anti-fragile Organization

The Netflix cloud architecture is anti-fragile… The Netflix culture is anti-fragile… Getting stronger through failure is the basis of anti-fragility. Avoiding failure at all costs… makes you brittle and vulnerable…

Adrian Cockcroft, “Looking back at 2012 with pointers to 2013”

Architecture for Imperfection

A highly agile and highly available service constructed from ephemeral and often broken components. It is a service-oriented architecture built on micro-services, none of which are essential to the operation of the whole. The software is written to run across three Amazon datacenters, and will tolerate the loss of any one. We can lose a third of our infrastructure without our customers noticing and calling customer services. It’s no idle claim: Netflix even tests this aspect of its infrastructure. A few weeks ago the team deliberately killed one of the three zones, knocking out 3000 servers in one fell swoop, just to prove that we could do it.

By Adrian Cockcroft, from “Netflix, HANA and the meaning of cloud”
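Tolerating the loss of any one of three datacenters has a capacity cost. As a back-of-the-envelope sketch, assuming traffic re-spreads evenly over the surviving datacenters and ignoring cross-region latency, cache warm-up and data replication, each of the three must be able to serve half of peak load, so the fleet as a whole is provisioned at 1.5× peak. The helper names below are hypothetical.

```python
from fractions import Fraction


def per_dc_capacity(datacenters, tolerated_losses=1):
    """Share of peak load each datacenter must be able to serve so that the
    survivors still cover 100% of peak after `tolerated_losses` of them fail.
    Assumes traffic re-spreads evenly over the surviving datacenters."""
    survivors = datacenters - tolerated_losses
    if survivors < 1:
        raise ValueError("at least one datacenter must survive")
    return Fraction(1, survivors)


def total_provisioned(datacenters, tolerated_losses=1):
    """Capacity across all datacenters, as a multiple of peak load."""
    return datacenters * per_dc_capacity(datacenters, tolerated_losses)


if __name__ == "__main__":
    print(per_dc_capacity(3), total_provisioned(3))  # 1/2 3/2
    print(per_dc_capacity(2), total_provisioned(2))  # 1 2
```

With only two datacenters, each one must carry the full peak on its own (2× in total), which is one reason spreading across three is cheaper than it first looks.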

Netflix Global Active-Active Cloud Architecture

What on Earth Is DevOps?

Devops means giving a sh*t about your job enough to not pass the buck.
Devops means giving a sh*t about your job enough to want to learn all the parts and not just your little world.
Developers need to understand infrastructure.
Operations people need to understand code.
- John E. Vincent (@Lusis)

The First Way

Silo vs. System Thinking: focus on the end-to-end value flow.

The Second Way

System improvement via visibility, feedback and data-driven decisions

The Third Way

• Embrace change
• Be willing to experiment
• Learn from your mistakes

Microservices Organizational Structure

Takeaways

1. Obsessively protecting a system against extremely rare events makes it more fragile.

2. Monoculture is fragile, diversity is anti-fragile.

3. If it hurts, do it more often, and bring the pain forward.

4. To create an anti-fragile system, stress it continuously so that we are forced to simplify and automate.

Reading for System and Architectural Thinking – recommended by Adrian Cockcroft
