Page 1: Architecture-aware Analysis of Concurrent Software

Architecture-aware Analysis of Concurrent Software

Rajeev Alur University of Pennsylvania

Joint work with Sebastian Burckhardt and Milo Martin

Intel Haifa Symposium, Sept 2009

Page 2: Architecture-aware Analysis of Concurrent Software

Challenge: Exploiting Concurrency, Correctly

Multi-threaded Software Shared-memory Multiprocessor

Concurrent Executions

Bugs

Page 3: Architecture-aware Analysis of Concurrent Software

Correctness is formalized as a mathematical claim to be proved or falsified rigorously

always with respect to the given specification

Is formal verification of “real” software possible?

- Verification problem is undecidable
- Even approximate versions are computationally intractable (model checking is PSPACE-hard)
- Use requires great expertise
- … and many such hurdles

[Diagram: a Verifier takes software/model and a correctness specification as inputs, and outputs either yes/proof or no/bug]

Page 4: Architecture-aware Analysis of Concurrent Software

1970s – Program verification

Proof calculi for proving correctness

Challenge: Finding invariants

Current tools: ACL2, PVS, ESC-Java

Main applications: Microprocessor verification, Correctness of JVM…

1980s – Protocol analysis

Automated reachability analysis

Temporal logic model checking

Challenge: State-space explosion

Current tools: SPIN, CADP, …

Main applications: Distributed algorithms, network protocols

1990s – Symbolic model checking

Constraint-based analysis of boolean systems using fixpoints

Efficient data structures (OBDDs)

Bugs in Verilog/VHDL designs

Commercial tools and industrial groups (Cadence, NEC, Intel, Motorola, IBM, …)

Page 5: Architecture-aware Analysis of Concurrent Software

2000s: Model Checking of C code

Phase 1: Given a program P, build an abstract finite-state (Boolean) model A such that set of behaviors of P is a subset of those of A (conservative abstraction)

Phase 2: Model check A wrt specification: this can prove P to be correct, or reveal a bug in P, or suggest inadequacy of A

Shown to be effective on Windows device drivers in Microsoft Research project SLAM

    do {
        KeAcquireSpinLock();
        nPacketsOld = nPackets;
        if (request) {
            request = request->Next;
            KeReleaseSpinLock();
            nPackets++;
        }
    } while (nPackets != nPacketsOld);
    KeReleaseSpinLock();

Does this code obey the locking spec?
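The locking spec can be read as a two-state monitor: acquire is legal only when the lock is free, release only when it is held. The sketch below is a hypothetical illustration, not SLAM's method: `step_acquire`, `step_release`, and `obeys_locking_spec` are invented names, the kernel calls are replaced by monitor steps, and `pending_requests` stands in for the `request` list. It replays one concrete execution of the loop against the monitor, whereas the abstraction-based analysis covers all executions.

```c
#include <assert.h>
#include <stdbool.h>

/* Monitor states for the locking spec; ERROR is a trap state. */
typedef enum { UNLOCKED, LOCKED, ERROR } lock_state;

static lock_state step_acquire(lock_state s) { return s == UNLOCKED ? LOCKED : ERROR; }
static lock_state step_release(lock_state s) { return s == LOCKED ? UNLOCKED : ERROR; }

/* Replays the driver loop with monitor steps in place of the kernel calls.
 * pending_requests (an assumption) models the length of the request list. */
bool obeys_locking_spec(int pending_requests)
{
    assert(pending_requests >= 0);
    lock_state s = UNLOCKED;
    int nPackets = 0, nPacketsOld;

    do {
        s = step_acquire(s);
        nPacketsOld = nPackets;
        if (pending_requests > 0) {      /* if (request) */
            pending_requests--;          /* request = request->Next */
            s = step_release(s);
            nPackets++;
        }
    } while (nPackets != nPacketsOld);
    s = step_release(s);

    /* The spec holds on this run if the monitor never trapped. */
    return s == UNLOCKED;
}
```

Calling `obeys_locking_spec(3)` walks three loop iterations that release the lock early and one final iteration that leaves the loop with the lock held, so the closing release is legal.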

Page 6: Architecture-aware Analysis of Concurrent Software

Software Model Checking

Tools for verifying source code combine many techniques

Program analysis techniques such as slicing, range analysis

Abstraction

Model checking

Refinement from counter-examples

New challenges for model checking (beyond finite-state reachability analysis)

Recursion gives pushdown control

Pointers, dynamic creation of objects, inheritance, …

A very active and emerging research area

Abstraction-based tools: SLAM, BLAST,…

Direct state encoding: F-SOFT, CBMC, CheckFence…

Page 7: Architecture-aware Analysis of Concurrent Software

Concurrency on Multiprocessors

Initially x = y = 0

thread 1: x = 1; y = 1

thread 2: print y (→ 1); print x (→ 0)

Output not consistent with any interleaved execution!

- can be the result of out-of-order stores
- can be the result of out-of-order loads
- improves performance, but unintuitive
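The claim that no interleaving prints y as 1 and then x as 0 can be checked by brute force. A minimal sketch, with the hypothetical helper names `explore` and `sc_allows`: it enumerates every interleaving of the two threads that respects program order and records which output pairs occur.

```c
#include <assert.h>
#include <stdbool.h>

/* i and j count the steps already taken by thread 1 and thread 2.
 * x, y are the shared variables; py, px are the values printed so far. */
static void explore(int i, int j, int x, int y, int py, int px, bool seen[2][2])
{
    if (i == 2 && j == 2) {
        seen[py][px] = true;
        return;
    }
    if (i == 0) explore(1, j, 1, y, py, px, seen);   /* thread 1: x = 1 */
    if (i == 1) explore(2, j, x, 1, py, px, seen);   /* thread 1: y = 1 */
    if (j == 0) explore(i, 1, x, y, y, px, seen);    /* thread 2: print y */
    if (j == 1) explore(i, 2, x, y, py, x, seen);    /* thread 2: print x */
}

/* True if some interleaved (sequentially consistent) execution prints
 * printed_y for y and then printed_x for x. */
bool sc_allows(int printed_y, int printed_x)
{
    assert(printed_y == 0 || printed_y == 1);
    assert(printed_x == 0 || printed_x == 1);
    bool seen[2][2] = { { false, false }, { false, false } };
    explore(0, 0, 0, 0, 0, 0, seen);
    return seen[printed_y][printed_x];
}
```

Of the four conceivable outputs, `sc_allows(1, 0)` is the only one that returns false, so observing it on hardware shows that the machine reordered the stores or the loads.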

Page 8: Architecture-aware Analysis of Concurrent Software

Architectures with Weak Memory Models

A modern multiprocessor does not enforce global ordering of all instructions for performance reasons

- Each processor has pipelined architecture and executes multiple instructions simultaneously
- Each processor has a local cache, and loads/stores to local cache become visible to other processors and shared memory at different times

Lamport (1979): Sequential consistency semantics for correctness of multiprocessor shared memory (like interleaving)

Considered too limiting, and many “relaxations” proposed

- In theory: TSO, RMO, Relaxed …
- In practice: Alpha, Intel IA-32, IBM 370, Sun SPARC, PowerPC …

Active research area in computer architecture

Page 9: Architecture-aware Analysis of Concurrent Software

Programming with Weak Memory Models

Concurrent programming is already hard; shouldn’t the effects of weaker models be hidden from the programmer?

Mostly yes …

- Safe programming through extensive use of synchronization primitives
- Use locks for every access to shared data
- Compilers use memory fences to enforce ordering

Not always …

- Non-blocking data structures
- Highly optimized library code for concurrency
- Code for lock/unlock instructions

Page 10: Architecture-aware Analysis of Concurrent Software

Non-blocking Queue (MS’96)

    boolean_t dequeue(queue_t *queue, value_t *pvalue)
    {
        node_t *head;
        node_t *tail;
        node_t *next;

        while (true) {
            head = queue->head;
            tail = queue->tail;
            next = head->next;
            if (head == queue->head) {
                if (head == tail) {
                    if (next == 0)
                        return false;
                    cas(&queue->tail, (uint32) tail, (uint32) next);
                } else {
                    *pvalue = next->value;
                    if (cas(&queue->head, (uint32) head, (uint32) next))
                        break;
                }
            }
        }
        delete_node(head);
        return true;
    }

[Diagram: queue as a linked list of nodes 1, 2, 3 with head and tail pointers]

Queue is being possibly updated concurrently

Atomic compare-and-swap for synchronization
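The slides do not show how `cas` is implemented. One possible sketch, assuming a C11 compiler and a 32-bit word as in the `(uint32)` casts above, wraps `atomic_compare_exchange_strong`; the parameter names and the `_Atomic uint32_t` type are assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Atomically: if *loc == old, store new_value and return true;
 * otherwise leave *loc unchanged and return false. */
bool cas(_Atomic uint32_t *loc, uint32_t old, uint32_t new_value)
{
    assert(loc != NULL);
    /* On failure C11 writes the observed value into `old`; the queue
     * code re-reads the pointers itself, so that value is discarded. */
    return atomic_compare_exchange_strong(loc, &old, new_value);
}
```

The default C11 operation is sequentially consistent, which is stronger than a bare hardware compare-and-swap on a relaxed architecture, so this wrapper can hide reorderings of the kind the talk sets out to find.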

Page 11: Architecture-aware Analysis of Concurrent Software

Software Model Checking for Concurrent Code on Multiprocessors

Why?: Real bugs in real code

Opportunities

- 10s to 100s of lines of low-level library C code
- Hard to design and verify -> buggy
- Effects of weak memory models, fences …

Challenges

- Lots of behaviors possible: high level of concurrency
- How to formalize and reason about weak memory models?

Page 12: Architecture-aware Analysis of Concurrent Software

Talk Outline

Motivation

Relaxed Memory Models

CheckFence: Analysis tool for Concurrent Data Types

Page 13: Architecture-aware Analysis of Concurrent Software

Hierarchy of Abstractions

Programs (multi-threaded)

- Synchronization primitives: locks, atomic blocks
- Library primitives (e.g. shared queues)

Assembly code

- Synchronization primitives: compare&swap, LL/SC
- Loads/stores from shared memory

Hardware

- multiple processors
- write buffers, caches, communication bus …

[Diagram labels: Application level memory model; Architecture level memory model; Hardware Verification; Software Verification]

Page 14: Architecture-aware Analysis of Concurrent Software

Shared Memory Consistency Models

Specifies restrictions on what values a read from shared memory can return

Program Order: x <p y if x and y are instructions belonging to the same thread and x appears before y

Sequential Consistency (Lamport 79): There exists a global order < of all accesses such that

- If x <p y then x < y
- Each load returns value of most recent, according to <, store to the same location (or initial value, if no such store exists)

Clean abstraction for programmers, but high implementation cost

Page 15: Architecture-aware Analysis of Concurrent Software

Effect of Memory Model

Initially flag1 = flag2 = 0

thread 1:

    1. flag1 = 1;
    2. if (flag2 == 0) crit. sect.

thread 2:

    1. flag2 = 1;
    2. if (flag1 == 0) crit. sect.

Ensures mutual exclusion if architecture supports SC memory

Most architectures do not enforce ordering of accesses to different memory locations

- Does not ensure mutual exclusion under weaker models

Ordering can be enforced using “fence” instructions

- Insert MEMBAR between lines 1 and 2 to ensure mutual exclusion
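In portable C11 the role of MEMBAR can be played by a sequentially consistent fence. The sketch below is an assumption-laden illustration, not code from the talk: the function names are invented, and `atomic_thread_fence(memory_order_seq_cst)` is swapped in for the SPARC instruction. With the fence between the store and the load in both threads, at most one call can return true.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Shared flags; static storage starts them at 0. */
static atomic_int flag1, flag2;

/* Thread 1's entry attempt: returns true if it may enter the critical section. */
bool thread1_try_enter(void)
{
    atomic_store_explicit(&flag1, 1, memory_order_relaxed);      /* line 1 */
    atomic_thread_fence(memory_order_seq_cst);                   /* the fence */
    return atomic_load_explicit(&flag2, memory_order_relaxed) == 0; /* line 2 */
}

/* Thread 2's entry attempt, symmetric to thread 1. */
bool thread2_try_enter(void)
{
    atomic_store_explicit(&flag2, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&flag1, memory_order_relaxed) == 0;
}
```

Removing the two fences leaves only relaxed accesses to different locations, and then both functions may return true when run concurrently, which is the failure the slide describes.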

Page 16: Architecture-aware Analysis of Concurrent Software

Relaxed Memory Models

A large variety of models exist; a good starting point: Adve & Gharachorloo, “Shared Memory Consistency Models: A Tutorial”, IEEE Computer, 1996

How to relax memory order requirement?

- Operations of same thread to different locations need not be globally ordered

How to relax write atomicity requirement?

- Read may return value of a write not yet globally visible

Uniprocessor semantics preserved

Typically defined in architecture manuals (e.g. SPARC manual)

Page 17: Architecture-aware Analysis of Concurrent Software

Unusual Effects of Memory Models

Initially A = flag1 = flag2 = 0

thread 1: flag1 = 1; A = 1; reg1 = A; reg2 = flag2;

thread 2: flag2 = 1; A = 2; reg3 = A; reg4 = flag1;

Result: reg1 = 1; reg3 = 2; reg2 = reg4 = 0

Possible on TSO/SPARC

- Write to A propagated only to local reads to A
- Reads to flags can occur before writes to flags

Not allowed on IBM 370

- Read of A on a processor waits till write to A is complete
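A rough way to see why TSO admits this result is to simulate per-processor store buffers. The sketch below is a simplified operational model written for illustration (all names are hypothetical, and it is not the SPARC specification): stores wait in a FIFO buffer, a load first checks its own processor's buffer, and buffers drain to memory later. It replays one schedule in which both threads finish before either buffer drains.

```c
#include <assert.h>

enum { FLAG1, FLAG2, A, NLOC };
#define BUF_CAP 4

/* FIFO store buffer of one processor. */
typedef struct { int loc[BUF_CAP], val[BUF_CAP], n; } store_buffer;

static int memory[NLOC];

static void store(store_buffer *b, int loc, int val)
{
    assert(b->n < BUF_CAP);
    b->loc[b->n] = loc;
    b->val[b->n] = val;
    b->n++;
}

/* A load sees the newest matching entry in its own buffer, else memory. */
static int load(const store_buffer *b, int loc)
{
    for (int i = b->n - 1; i >= 0; i--)
        if (b->loc[i] == loc)
            return b->val[i];
    return memory[loc];
}

/* Drain the buffer to shared memory in FIFO order. */
static void flush(store_buffer *b)
{
    for (int i = 0; i < b->n; i++)
        memory[b->loc[i]] = b->val[i];
    b->n = 0;
}

/* Fills reg[0..3] with reg1..reg4 for one particular schedule. */
void run_litmus(int reg[4])
{
    store_buffer b1 = { .n = 0 }, b2 = { .n = 0 };
    for (int i = 0; i < NLOC; i++)
        memory[i] = 0;

    store(&b1, FLAG1, 1);            /* thread 1 */
    store(&b1, A, 1);
    reg[0] = load(&b1, A);
    reg[1] = load(&b1, FLAG2);

    store(&b2, FLAG2, 1);            /* thread 2 */
    store(&b2, A, 2);
    reg[2] = load(&b2, A);
    reg[3] = load(&b2, FLAG1);

    flush(&b1);                      /* stores become globally visible late */
    flush(&b2);
}
```

Each thread reads its own write to A from its buffer while the other thread's flag is still invisible, which reproduces the result on the slide.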

Page 18: Architecture-aware Analysis of Concurrent Software

Which Memory Model should a Verifier use?

Memory models are platform dependent

We propose a conservative approximation “Relaxed” to capture common effects

Once code is correct for “Relaxed”, it is correct for many models

Tool allows user to specify a memory model using axioms

[Diagram relating the memory models SC, 390, TSO, PSO, IA-32, Alpha, RMO, and Relaxed]

Page 19: Architecture-aware Analysis of Concurrent Software

Formalization of Relaxed

Program Order: x <p y if x and y are instructions belonging to the same thread and x appears before y

Execution over a set X of accesses is correct wrt Relaxed if there exists a total order < over X such that

1. If x <p y, and both x and y are accesses to the same address, and y is a store, then x < y must hold

2. For a load l and a store s visible to l, either s and l have same value, or there exists another store s’ visible to l with s < s’

A store s is visible to load l if they are to the same address and either s < l or s <p l (i.e. stores are locally visible)

Constraint-based specification that can be easily encoded in logical formulas
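Because the two axioms are plain constraints over a candidate order, they can be checked directly for a small execution. A minimal sketch under stated assumptions: `access_t`, `relaxed_ok`, and the example arrays are hypothetical names, `pos[i]` gives the position of access `i` in the total order <, and initial values are modeled as stores by an extra thread 0. The example is the two-thread program with x = 1; y = 1 against print y; print x.

```c
#include <stdbool.h>

typedef struct {
    int thread, index;   /* index within the thread gives program order */
    bool is_store;
    int addr, value;
} access_t;

/* x <p y: same thread, x earlier. */
static bool po(const access_t *x, const access_t *y)
{
    return x->thread == y->thread && x->index < y->index;
}

/* Store s is visible to load l: same address and (s < l or s <p l). */
static bool visible(const access_t *a, const int *pos, int s, int l)
{
    return a[s].is_store && !a[l].is_store && a[s].addr == a[l].addr &&
           (pos[s] < pos[l] || po(&a[s], &a[l]));
}

bool relaxed_ok(const access_t *a, const int *pos, int n)
{
    /* Axiom 1: x <p y, same address, y a store  =>  x < y. */
    for (int x = 0; x < n; x++)
        for (int y = 0; y < n; y++)
            if (po(&a[x], &a[y]) && a[x].addr == a[y].addr &&
                a[y].is_store && pos[x] >= pos[y])
                return false;

    /* Axiom 2: a visible store with a different value must be followed
     * in < by another store visible to the same load. */
    for (int l = 0; l < n; l++)
        for (int s = 0; s < n; s++) {
            if (!visible(a, pos, s, l) || a[s].value == a[l].value)
                continue;
            bool overwritten = false;
            for (int t = 0; t < n; t++)
                if (visible(a, pos, t, l) && pos[s] < pos[t])
                    overwritten = true;
            if (!overwritten)
                return false;
        }
    return true;
}

/* Addresses: x = 0, y = 1. Thread 0 holds the initial stores (assumption). */
const access_t mp_accesses[6] = {
    { 0, 0, true,  0, 0 },   /* init x = 0 */
    { 0, 1, true,  1, 0 },   /* init y = 0 */
    { 1, 0, true,  0, 1 },   /* thread 1: x = 1 */
    { 1, 1, true,  1, 1 },   /* thread 1: y = 1 */
    { 2, 0, false, 1, 1 },   /* thread 2: load y, sees 1 */
    { 2, 1, false, 0, 0 },   /* thread 2: load x, sees 0 */
};
/* Witness order: init x, init y, y = 1, load y, load x, x = 1. */
const int reordered_stores[6] = { 0, 1, 5, 2, 3, 4 };
/* Order that keeps every thread's accesses in program order. */
const int in_program_order[6] = { 0, 1, 2, 3, 4, 5 };
```

`relaxed_ok(mp_accesses, reordered_stores, 6)` holds because Relaxed does not order thread 1's stores to different addresses, while the same observation fails for `in_program_order`. A SAT-based tool searches for such an order instead of being handed one.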

Page 20: Architecture-aware Analysis of Concurrent Software

Talk Outline

Motivation

Relaxed memory models

CheckFence: Analysis tool for Concurrent Data Types

Page 21: Architecture-aware Analysis of Concurrent Software

CheckFence Focus

concurrency libraries with lock-free synchronization

... are simple, fast, and safe to use

- concurrent versions of queues, sets, maps, etc.
- more concurrency, less waiting
- fewer deadlocks

... are notoriously hard to design and verify

- tricky interleavings routinely escape reasoning and testing
- exposed to relaxed memory models

code needs to contain memory fences for correct operation!

Page 22: Architecture-aware Analysis of Concurrent Software

CheckFence Tool

Architecture:

+ Multiprocessors
+ Relaxed memory models

Concurrent Algorithms:

+ Lock-free queues, sets, lists

Computer-Aided Verification:

+ Model checking C code
+ Counterexamples

References: [CAV 2006], [PLDI 2007], Burckhardt thesis

Page 23: Architecture-aware Analysis of Concurrent Software

Non-blocking Queue

The implementation

- optimized: no locks
- not race-free
- exposed to memory model

    void enqueue(int val) { ... }
    int dequeue() { ... }

The client program

- on multiple processors
- calls operations

Processor 1: ... enqueue(1) ... enqueue(2) ...

Processor 2: ... a = dequeue() b = dequeue() ...

Page 24: Architecture-aware Analysis of Concurrent Software

Non-blocking Queue (MS’96)

    boolean_t dequeue(queue_t *queue, value_t *pvalue)
    {
        node_t *head;
        node_t *tail;
        node_t *next;

        while (true) {
            head = queue->head;
            tail = queue->tail;
            next = head->next;
            if (head == queue->head) {
                if (head == tail) {
                    if (next == 0)
                        return false;
                    cas(&queue->tail, (uint32) tail, (uint32) next);
                } else {
                    *pvalue = next->value;
                    if (cas(&queue->head, (uint32) head, (uint32) next))
                        break;
                }
            }
        }
        delete_node(head);
        return true;
    }

Page 25: Architecture-aware Analysis of Concurrent Software

Correctness Condition

Data type implementations must appear sequentially consistent to the client program: the observed argument and return values must be consistent with some interleaved, atomic execution of the operations.

Observation

thread 1: enqueue(1); dequeue() -> 2

thread 2: enqueue(2); dequeue() -> 1

Witness Interleaving

enqueue(1); enqueue(2); dequeue() -> 1; dequeue() -> 2

Page 26: Architecture-aware Analysis of Concurrent Software

How To Bound Executions

Verify individual “symbolic tests”

- finite number of concurrent threads
- finite number of operations/thread
- nondeterministic input values

Example

thread 1: enqueue(X)

thread 2: dequeue() → Y

User creates suite of tests of increasing size

Page 27: Architecture-aware Analysis of Concurrent Software

Why symbolic test programs?

1) Make everything finite

- State is unbounded (dynamic memory allocation) ... is bounded for individual test
- Checking sequential consistency is undecidable (AMP 96) ... is decidable for individual test

2) Gives us finite instruction sequence to work with

- State space too large for interleaved system model ... can directly encode value flow between instructions
- Memory model specified by axioms ... can directly encode ordering axioms on instructions

Page 28: Architecture-aware Analysis of Concurrent Software

Bounded Model Checker

Pass: all executions of the test are observationally equivalent to a serial execution

Fail:

[Diagram labels: CheckFence; Memory Model Axioms]

Inconclusive: runs out of time or memory

Page 29: Architecture-aware Analysis of Concurrent Software

Tool Architecture

[Diagram labels: C code, Symbolic Test, Memory model, Trace]

Symbolic test gives exponentially many executions (symbolic inputs, dynamic memory allocation, ordering of instructions).

CheckFence solves for “incorrect” executions.

Page 30: Architecture-aware Analysis of Concurrent Software

[Diagram labels: C code, Symbolic Test, Memory model, Trace]

- automatic, lazy loop unrolling
- automatic specification mining (enumerate correct observations)
- construct CNF formula whose solutions correspond precisely to the concurrent executions

Page 31: Architecture-aware Analysis of Concurrent Software

Specification Mining

thread 1: enqueue(X); enqueue(Y)

thread 2: dequeue() → Z

Possible Operation-level Interleavings

- enqueue(X); enqueue(Y); dequeue() -> Z
- enqueue(X); dequeue() -> Z; enqueue(Y)
- dequeue() -> Z; enqueue(X); enqueue(Y)

For each interleaving, obtain symbolic constraint by encoding corresponding executions in SAT solver

Spec is disjunction of all possibilities:

Spec: (Z=X) | (Z=null)

To find bugs, check satisfiability of Phi & ~Spec, where Phi encodes all possible concurrent executions
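For this test the mined spec can be reproduced by hand: run each operation-level interleaving on a serial reference queue and collect what Z can be. A minimal sketch with hypothetical names (`ref_queue`, `mine_spec`, and the `Z_*` tags); it enumerates concretely where CheckFence encodes the serial executions symbolically for the SAT solver.

```c
#include <assert.h>

/* Symbolic outcomes for Z, used as bit positions in the result. */
enum { Z_NULL = 0, Z_IS_X = 1, Z_IS_Y = 2 };

/* Serial reference queue; two slots suffice for this test. */
typedef struct { int items[2]; int head, tail; } ref_queue;

static void enqueue(ref_queue *q, int v)
{
    assert(q->tail < 2);
    q->items[q->tail++] = v;
}

/* Returns Z_NULL on an empty queue. */
static int dequeue(ref_queue *q)
{
    return q->head == q->tail ? Z_NULL : q->items[q->head++];
}

/* Returns a bitmask with bit z set if some interleaving yields Z = z. */
unsigned mine_spec(void)
{
    const int thread1[2] = { Z_IS_X, Z_IS_Y };   /* enqueue(X); enqueue(Y) */
    unsigned spec = 0;

    /* Thread 2's dequeue runs after 0, 1, or 2 of thread 1's enqueues. */
    for (int before = 0; before <= 2; before++) {
        ref_queue q = { { 0, 0 }, 0, 0 };
        int z = Z_NULL;
        for (int i = 0; i <= 2; i++) {
            if (i == before)
                z = dequeue(&q);
            if (i < 2)
                enqueue(&q, thread1[i]);
        }
        spec |= 1u << z;
    }
    return spec;
}
```

The result has only the `Z_NULL` and `Z_IS_X` bits set, matching Spec: (Z=X) | (Z=null); an execution on the relaxed model that returns Y would satisfy Phi & ~Spec.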

Page 32: Architecture-aware Analysis of Concurrent Software

Encoding Memory Order

thread 1: s1 store; s2 store

thread 2: l1 load; l2 load

Variables for encoding

- Use boolean vars for relative order (x<y) of memory accesses
- Use bitvector variables Ax and Dx for address and data values associated with memory access x

Encode constraints

- encode transitivity of memory order
- encode ordering axioms of the memory model. Example (for SC): (s1<s2) & (l1<l2)
- encode value flow: “Loaded value must match last value stored to same address”

Example: value must flow from s1 to l1 under following conditions:

((s1<l1)&(As1 = Al1)&((s2<s1)|(l1<s2)|(As2 != Al1))) -> (Ds1 = Dl1)
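The value-flow constraint can be sanity-checked against a concrete reading of “last value stored to same address”. The sketch below is illustrative and uses invented helper names: it evaluates the formula on concrete positions, addresses, and data, and compares it with a direct computation of what l1 loads, over every order of s1, s2, l1 and every choice of two addresses.

```c
#include <stdbool.h>

/* The slide's constraint for flow from s1 to l1, on concrete values.
 * s1, s2, l1 are positions in the memory order; A* addresses; D* data. */
bool value_flow(int s1, int s2, int l1,
                int As1, int As2, int Al1, int Ds1, int Dl1)
{
    bool premise = (s1 < l1) && (As1 == Al1) &&
                   ((s2 < s1) || (l1 < s2) || (As2 != Al1));
    return !premise || (Ds1 == Dl1);
}

/* Reference semantics: l1 returns the latest earlier store to its
 * address, or 0 (assumed initial value) if there is none. */
static int concrete_load(int s1, int s2, int l1,
                         int As1, int As2, int Al1, int Ds1, int Ds2)
{
    int value = 0, latest = -1;
    if (s1 < l1 && As1 == Al1 && s1 > latest) { value = Ds1; latest = s1; }
    if (s2 < l1 && As2 == Al1 && s2 > latest) { value = Ds2; latest = s2; }
    return value;
}

/* True if the constraint holds in every concrete execution enumerated. */
bool constraint_holds_everywhere(void)
{
    static const int perms[6][3] = {
        { 0, 1, 2 }, { 0, 2, 1 }, { 1, 0, 2 },
        { 1, 2, 0 }, { 2, 0, 1 }, { 2, 1, 0 },
    };
    for (int p = 0; p < 6; p++)
        for (int bits = 0; bits < 8; bits++) {
            int s1 = perms[p][0], s2 = perms[p][1], l1 = perms[p][2];
            int As1 = bits & 1, As2 = (bits >> 1) & 1, Al1 = (bits >> 2) & 1;
            int Dl1 = concrete_load(s1, s2, l1, As1, As2, Al1, 1, 2);
            if (!value_flow(s1, s2, l1, As1, As2, Al1, 1, Dl1))
                return false;
        }
    return true;
}
```

In the encoding the positions, addresses, and data are solver variables, not loop counters, so the same implication prunes assignments in which l1 returns anything other than s1's data.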

Page 33: Architecture-aware Analysis of Concurrent Software

Example: Memory Model Bug

Processor 1 links new node into list (leading numbers give the order in which the memory accesses happen):

    ...
    3   node->value = 2;
    ...
    1   head = node;
    ...

Processor 2 reads value at head of list:

    ...
    2   value = head->value;
    ...

--> Processor 2 loads uninitialized value

Processor 1 reorders the stores! memory accesses happen in order 1 2 3

adding a fence between lines on left side prevents reordering

[Diagram: list with nodes 1, 2, 3 and a head pointer]
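In C11 terms the missing fence is a store-store barrier between initializing the node and publishing it. A minimal sketch, assuming C11 atomics in place of a hardware fence instruction; `link_node` and `read_head_value` are hypothetical names, and the acquire fence on the reading side is something the C11 model asks for, beyond the left-side fence the slide mentions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct node { int value; } node_t;

/* Shared list head; static storage starts it as a null pointer. */
static _Atomic(node_t *) head;

/* Processor 1: initialize the node, then make it reachable. */
void link_node(node_t *node, int value)
{
    assert(node != NULL);
    node->value = value;
    /* Keeps the store to node->value ahead of the store to head. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&head, node, memory_order_relaxed);
}

/* Processor 2: read the value at the head, or if_empty for an empty list. */
int read_head_value(int if_empty)
{
    node_t *n = atomic_load_explicit(&head, memory_order_relaxed);
    if (n == NULL)
        return if_empty;
    /* Pairs with the release fence so n->value is the initialized value. */
    atomic_thread_fence(memory_order_acquire);
    return n->value;
}
```

Without the release fence the two stores in `link_node` may become visible in the order shown on the slide, and a concurrent reader can see the new head before the value is written.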

Page 34: Architecture-aware Analysis of Concurrent Software

Algorithms Analyzed

    Type      Description           LOC   Source
    Queue     Two-lock queue         80   M. Michael and L. Scott (PODC 1996)
    Queue     Non-blocking queue     98   M. Michael and L. Scott (PODC 1996)
    Set       Lazy list-based set   141   Heller et al. (OPODIS 2005)
    Set       Nonblocking list      174   T. Harris (DISC 2001)
    Deque     “snark” algorithm     159   D. Detlefs et al. (DISC 2000)
    LL/VL/SC  CAS-based              74   M. Moir (PODC 1997)
    LL/VL/SC  Bounded Tags          198

Page 35: Architecture-aware Analysis of Concurrent Software

    Type      Description           Regular bugs
    Queue     Two-lock queue
    Queue     Non-blocking queue
    Set       Lazy list-based set   1 unknown
    Set       Nonblocking list
    Deque     original “snark”      2 known
    Deque     fixed “snark”
    LL/VL/SC  CAS-based
    LL/VL/SC  Bounded Tags

Results

- snark algorithm has 2 known bugs
- lazy list-based set had an unknown bug (missing initialization; missed by formal correctness proof [CAV 2006] because of hand-translation of pseudocode)

Page 36: Architecture-aware Analysis of Concurrent Software

    Type      Description           Regular bugs
    Queue     Two-lock queue
    Queue     Non-blocking queue
    Set       Lazy list-based set   1 unknown
    Set       Nonblocking list
    Deque     original “snark”      2 known
    Deque     fixed “snark”
    LL/VL/SC  CAS-based
    LL/VL/SC  Bounded Tags

# Fences inserted (values as they appear in the transcript; their assignment to algorithms is not recoverable)

    StoreStore:      4, 1, 1, 2, 1
    LoadLoad:        2, 4
    DependentLoads:  4, 2, 3, 1, 1
    AliasedLoads:    4, 3, 6, 3, 2

Results

- snark algorithm has 2 known bugs
- lazy list-based set had an unknown bug (missing initialization; missed by formal correctness proof [CAV 2006] because of hand-translation of pseudocode)

Many failures on relaxed memory model

• inserted fences by hand to fix them
• small testcases sufficient for this purpose

Page 37: Architecture-aware Analysis of Concurrent Software

Typical Tool Performance

Very efficient on small testcases (< 100 memory accesses)

Example (nonblocking queue): T0 = i (e | d), T1 = i (e | e | d | d)

- find counterexamples within a few seconds
- verify within a few minutes
- enough to cover all 9 fences in nonblocking queue

Slows down with increasing number of memory accesses in test

Example (snark deque): Dq = pop_l | pop_l | pop_r | pop_r | push_l | push_l | push_r | push_r

- has 134 memory accesses (77 loads, 57 stores)
- Dq finds second snark bug within ~1 hour

Does not scale past ~300 memory accesses

Page 38: Architecture-aware Analysis of Concurrent Software

Summary

Software model checking of low-level concurrent software requires encoding of memory models

Challenge for model checking due to high level of concurrency and axiomatic specifications

Opportunity to find bugs in library code that’s hard to design and verify

CheckFence project at Penn

- SAT-based bounded model checking for concurrent data types
- Bugs in real code with fences

Page 39: Architecture-aware Analysis of Concurrent Software

Research Challenges

What’s the best way to verify C code (on relaxed memory models)?

- SAT-based encoding seems suitable to capture specifications of memory models, but many opportunities for improvement
- Can one develop abstract operational models for multiprocessor architectures?
- Proof methods for relaxed memory models

Language-level memory models: Can we verify Java concurrency libraries using the new Java memory model?

Hardware support for transactional memory

- Current interest in industry and architecture research
- Can formal verification influence designs/standards?

Page 40: Architecture-aware Analysis of Concurrent Software

Programs (multi-threaded)

System-level code Concurrency libraries

Highly parallel hardware -- multicores, SoCs

Application level concurrency model

Architecture level concurrency model

Complex; efficient use of parallelism

Simple; usable by programmers

Architecture-aware Concurrency Analysis