
Program Analysis via Graph Reachability

Thomas Reps

University of Wisconsin

PLDI 00 Tutorial, Vancouver, B.C., June 18, 2000

http://www.cs.wisc.edu/~reps/

PLDI 00 Registration Form

• PLDI 00: …………………….. $ ____

• Tutorial (morning): …………… $ ____

• Tutorial (afternoon): ………….. $ ____

• Tutorial (evening): ……………. $ – 0 –

Applications

• Program optimization
• Program understanding and software reengineering
• Security
– information flow
• Verification
– model checking
– security of crypto-based protocols for distributed systems

[Figure: timeline, 1987-1998, of the research topics covered: Slicing & Applications; Dataflow Analysis; Demand Algorithms; Set Constraints; Structure-Transmitted Dependences; CFL-Reachability.]

. . . As Well As . . .

• Flow-insensitive points-to analysis
• Complexity results
– Linear . . . cubic . . . undecidable variants
– PTIME-completeness
• Model checking of recursive hierarchical finite-state machines
– "infinite"-state systems
– linear-time and cubic-time algorithms

. . . And Also

• Analysis of attribute grammars
• Security of crypto-based protocols for distributed systems [Dolev, Even, & Karp 83]
• Formal-language problems
– CFL-recognition (given G and $\omega$, is $\omega \in L(G)$?)
– 2DPDA- and 2NPDA-simulation (given M and $\omega$, is $\omega \in L(M)$?)
• String-matching problems

Unifying Conceptual Model for Dataflow-Analysis Literature

• Linear-time gen-kill [Hecht 76], [Kou 77]
• Path-constrained DFA [Holley & Rosen 81]
• Linear-time GMOD [Cooper & Kennedy 88]
• Flow-sensitive MOD [Callahan 88]
• Linear-time interprocedural gen-kill [Knoop & Steffen 93]
• Linear-time bidirectional gen-kill [Dhamdhere 94]
• Relationship to interprocedural DFA [Sharir & Pnueli 81], [Knoop & Steffen 92]

Collaborators

• Susan Horwitz

• Mooly Sagiv

• Genevieve Rosay

• David Melski

• David Binkley

• Michael Benedikt

• Patrice Godefroid

Themes

• Harnessing CFL-reachability

• Relationship to other analysis paradigms

• Exhaustive alg. vs. demand alg.

• Understanding complexity
– Linear . . . cubic . . . undecidable

• Beyond CFL-reachability

Backward Slice

#include <stdio.h>

int main() {
    int sum = 0;
    int i = 1;
    while (i < 11) {
        sum = sum + i;
        i = i + 1;
    }
    printf("%d\n", sum);
    printf("%d\n", i);
}

Backward slice with respect to printf("%d\n", i)

Slice Extraction

int main() {
    int i = 1;
    while (i < 11) {
        i = i + 1;
    }
    printf("%d\n", i);
}

Backward slice with respect to printf("%d\n", i)

Forward Slice

int main() {
    int sum = 0;
    int i = 1;
    while (i < 11) {
        sum = sum + i;
        i = i + 1;
    }
    printf("%d\n", sum);
    printf("%d\n", i);
}

Forward slice with respect to "sum = 0"

What Are Slices Useful For?

• Understanding programs
– What is affected by what?
• Restructuring programs
– Isolation of separate "computational threads"
• Program specialization and reuse
– Slices = specialized programs
– Only reuse needed slices
• Program differencing
– Compare slices to identify changes
• Testing
– What new test cases would improve coverage?
– What regression tests must be rerun after a change?

Line-Character-Count Program

void line_char_count(FILE *f) {
    int lines = 0;
    int chars;
    BOOL eof_flag = FALSE;
    int n;
    extern void scan_line(FILE *f, BOOL *bptr, int *iptr);
    scan_line(f, &eof_flag, &n);
    chars = n;
    while (eof_flag == FALSE) {
        lines = lines + 1;
        scan_line(f, &eof_flag, &n);
        chars = chars + n;
    }
    printf("lines = %d\n", lines);
    printf("chars = %d\n", chars);
}

Character-Count Program

void char_count(FILE *f) {
    int lines = 0;
    int chars;
    BOOL eof_flag = FALSE;
    int n;
    extern void scan_line(FILE *f, BOOL *bptr, int *iptr);
    scan_line(f, &eof_flag, &n);
    chars = n;
    while (eof_flag == FALSE) {
        lines = lines + 1;
        scan_line(f, &eof_flag, &n);
        chars = chars + n;
    }
    printf("lines = %d\n", lines);
    printf("chars = %d\n", chars);
}

Line-Count Program

void line_count(FILE *f) {
    int lines = 0;
    int chars;
    BOOL eof_flag = FALSE;
    int n;
    extern void scan_line2(FILE *f, BOOL *bptr, int *iptr);
    scan_line2(f, &eof_flag, &n);
    chars = n;
    while (eof_flag == FALSE) {
        lines = lines + 1;
        scan_line2(f, &eof_flag, &n);
        chars = chars + n;
    }
    printf("lines = %d\n", lines);
    printf("chars = %d\n", chars);
}

Specialization Via Slicing

wc -lc

wc -c wc -l

void line_count(FILE *f);

Not partial evaluation!

Control Flow Graph

[Figure: control-flow graph of the sum program. Nodes: Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i). The while node has a T edge into the loop body and an F edge out of the loop.]

Flow Dependence Graph

[Figure: flow dependence graph of the sum program over the nodes Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i).]

Flow dependence p → q: the value of the variable assigned at p may be used at q.

Control Dependence Graph

[Figure: control dependence graph of the sum program over the same nodes; every edge shown is labeled T.]

Control dependence p →T q: q is reached from p if condition p is true (T), not otherwise. Similar for false (F).

Program Dependence Graph (PDG)

[Figure: PDG of the sum program: the control dependence edges (labeled T) and the flow dependence edges drawn over one set of nodes.]

Program Dependence Graph (PDG)

int main() {
    int i = 1;
    int sum = 0;
    while (i < 11) {
        sum = sum + i;
        i = i + 1;
    }
    printf("%d\n", sum);
    printf("%d\n", i);
}

Opposite order of the two initializations, same PDG.

Backward Slice

[Figure, frames (1) to (4): a backward slice is highlighted step by step on the PDG of the sum program by following dependence edges backward.]

Slice Extraction

int main() {
    int i = 1;
    while (i < 11) {
        i = i + 1;
    }
    printf("%d\n", i);
}

[Figure: the PDG restricted to the slice. Nodes: Enter, i = 1, while(i < 11), i = i + 1, printf(i).]
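The slicing slides above amount to graph reachability over the PDG. The following is a minimal sketch, assuming a hand-derived edge list for the sum program (the names `PDG_EDGES`, `backward_slice`, and `forward_slice` are illustrative and not taken from any tool mentioned in the tutorial):

```python
from collections import defaultdict

# Each pair (p, q) means "q depends on p". The edges were derived by hand
# for the sum program; they are an assumption of this sketch.
PDG_EDGES = [
    # control dependences (all labeled T)
    ("Enter", "sum = 0"), ("Enter", "i = 1"), ("Enter", "while(i < 11)"),
    ("Enter", "printf(sum)"), ("Enter", "printf(i)"),
    ("while(i < 11)", "sum = sum + i"), ("while(i < 11)", "i = i + 1"),
    ("while(i < 11)", "while(i < 11)"),
    # flow dependences
    ("sum = 0", "sum = sum + i"), ("sum = 0", "printf(sum)"),
    ("sum = sum + i", "sum = sum + i"), ("sum = sum + i", "printf(sum)"),
    ("i = 1", "while(i < 11)"), ("i = 1", "sum = sum + i"),
    ("i = 1", "i = i + 1"), ("i = 1", "printf(i)"),
    ("i = i + 1", "while(i < 11)"), ("i = i + 1", "sum = sum + i"),
    ("i = i + 1", "i = i + 1"), ("i = i + 1", "printf(i)"),
]

def reachable(edges, start):
    """Return every node reachable from start (start included)."""
    succ = defaultdict(set)
    for p, q in edges:
        succ[p].add(q)
    seen, work = {start}, [start]
    while work:
        n = work.pop()
        for m in succ[n]:
            if m not in seen:
                seen.add(m)
                work.append(m)
    return seen

def backward_slice(criterion):
    # Follow dependence edges against their direction.
    return reachable([(q, p) for p, q in PDG_EDGES], criterion)

def forward_slice(criterion):
    # Follow dependence edges along their direction.
    return reachable(PDG_EDGES, criterion)
```

With these edges, `backward_slice("printf(i)")` contains exactly the five nodes kept by the slice-extraction slide, and `forward_slice("sum = 0")` contains only the statements that mention `sum`.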

CodeSurfer

Browsing a Dependence Graph

Pretend this is your favorite browser

What does clicking on a link do? You get a new page

Or you move to an internal tag

Interprocedural Slice

int main() {
    int sum = 0;
    int i = 1;
    while (i < 11) {
        sum = add(sum, i);
        i = add(i, 1);
    }
    printf("%d\n", sum);
    printf("%d\n", i);
}

int add(int x, int y) {
    return x + y;
}

Backward slice with respect to printf("%d\n", i)

Superfluous components are included by Weiser's slicing algorithm [TSE 84]. They are left out by the algorithm of Horwitz, Reps, & Binkley [PLDI 88; TOPLAS 90].

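System Dependence Graph (SDG)

[Figure: SDG schema with nodes Enter main, two Call p sites, and Enter p.]

SDG for the Sum Program

[Figure: SDG of the sum program with the following nodes.
Enter main: sum = 0, i = 1, while(i < 11), printf(sum), printf(i)
First Call add: xin = sum, yin = i, sum = xout
Second Call add: xin = i, yin = 1, i = xout
Enter add: x = xin, y = yin, x = x + y, xout = x]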

Interprocedural Backward Slice

[Figure, frames (1) to (7): a backward slice is traced step by step on an SDG with nodes Enter main, two Call p sites, and Enter p. The edges into and out of p are labeled ( and ) for one call site and [ and ] for the other.]

Matched-Parenthesis Path

[Figure: example paths on the same SDG, including label sequences such as ) ( and ) [.]

Slice Extraction

[Figure: the extracted slice, with nodes Enter main, one Call p site, and Enter p.]

Slice of the Sum Program

[Figure: slice of the sum program's SDG with the following nodes.
Enter main: i = 1, while(i < 11), printf(i)
Call add: xin = i, yin = 1, i = xout
Enter add: x = xin, y = yin, x = x + y, xout = x]

CFL-Reachability [Yannakakis 90]

• G: Graph (N nodes, E edges)
• L: A context-free language
• L-path from s to t iff $s \xrightarrow{\alpha}{}^{*} t$ with $\alpha \in L$
• Running time: $O(N^3)$

Interprocedural Slicing via CFL-Reachability

• Graph: System dependence graph
• L: L(matched) [roughly]
• Node m is in the slice w.r.t. n iff there is an L(matched)-path from m to n

Asymptotic Running Time [Reps, Horwitz, Sagiv, & Rosay 94]

• CFL-reachability
– System dependence graph: N nodes, E edges
– Running time: $O(N^3)$
• System dependence graph has special structure
– Running time: $O(E + \mathit{CallSites} \times \mathit{MaxParams}^3)$

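[Figure: a graph whose edges are labeled e, (, ), [, and ].]

matched → ε | e | [ matched ] | ( matched ) | matched matched

CFL-Reachability

[Figure: a path from s to t over edges labeled e, (, ), [, and ].]

Ordinary Graph Reachability

[Figure: a path from s to t in an unlabeled graph.]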

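CFL-Reachability via Dynamic Programming

Grammar: A → B C

Graph: [Figure: a B-labeled edge followed by a C-labeled edge induces an A-labeled edge from the source of the B edge to the target of the C edge.]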
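The dynamic-programming step on this slide (add an A edge wherever a B edge is followed by a C edge) can be iterated to a fixed point with a worklist. The sketch below is one way to do that, assuming the grammar has already been normalized to productions with at most two symbols on the right-hand side; the function name `cfl_reach` and the example graph are illustrative.

```python
from collections import defaultdict

def cfl_reach(edges, unary, binary, epsilon=()):
    """edges: (src, label, dst) triples.
    unary: (A, B) pairs for productions A -> B.
    binary: (A, B, C) triples for productions A -> B C.
    epsilon: nonterminals A with A -> empty string.
    Returns all original and derived edges."""
    edges = list(edges)
    nodes = {n for s, _, t in edges for n in (s, t)}
    result, work = set(), []
    out_edges = defaultdict(set)   # node -> {(label, dst)}
    in_edges = defaultdict(set)    # node -> {(label, src)}

    def add(s, a, t):
        if (s, a, t) not in result:
            result.add((s, a, t))
            out_edges[s].add((a, t))
            in_edges[t].add((a, s))
            work.append((s, a, t))

    for s, a, t in edges:
        add(s, a, t)
    for a in epsilon:
        for n in nodes:
            add(n, a, n)

    while work:
        s, b, t = work.pop()
        for a, rhs in unary:
            if rhs == b:
                add(s, a, t)
        for a, left, right in binary:
            if left == b:          # popped edge is the B of A -> B C
                for c, u in list(out_edges[t]):
                    if c == right:
                        add(s, a, u)
            if right == b:         # popped edge is the C of A -> B C
                for c, u in list(in_edges[s]):
                    if c == left:
                        add(u, a, t)
    return result

# Matched-parenthesis grammar, normalized by hand for this sketch:
#   M -> e | M M | LPM ) | LBM ] | empty,  LPM -> ( M,  LBM -> [ M
EDGES = [("s", "(", "n1"), ("n1", "e", "n2"), ("n2", ")", "t"), ("n2", "]", "u")]
UNARY = [("M", "e")]
BINARY = [("M", "M", "M"), ("M", "LPM", ")"), ("M", "LBM", "]"),
          ("LPM", "(", "M"), ("LBM", "[", "M")]
closure = cfl_reach(EDGES, UNARY, BINARY, epsilon=["M"])
```

In the example, `t` is M-reachable from `s` because the path is labeled `( e )`, while `u` is not, because `( e ]` is mismatched.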

Degenerate Case: CFL-Recognition

exp → id | exp + exp | exp * exp | ( exp )

Is "(a + b) * c" $\in L(\mathrm{exp})$?

[Figure: the string as a chain graph from s to t, one edge per token: ( a + b ) * c.]

Is "a + b) * c +" $\in L(\mathrm{exp})$?

[Figure: the string as a chain graph from s to t, one edge per token: a + b ) * c +.]

CYK: Context-Free Recognition

$\omega$ = "( [ ] ) [ ]"

Is $\omega \in L(M)$?

M → M M | ( M ) | [ M ] | ( ) | [ ]

Normalized grammar:

M → M M | LPM ) | LBM ] | ( ) | [ ]
LPM → ( M
LBM → [ M

CYK table for "( [ ] ) [ ]" (indexed by substring length and start position):

length 6, start 1: {M}
length 4, start 1: {M}
length 3, start 1: {LPM}
length 2, start 2: {M}; start 5: {M}
length 1, starts 1 to 6: { ( } { [ } { ] } { ) } { [ } { ] }

Steps shown: M → [ ] and LPM → ( M fill the short entries; M → M M fills the length-6 entry.

CYK: Graphs vs. Tables

Is "( [ ] ) [ ]" $\in L(M)$?

[Figure: the string as a chain graph from s to t with edges labeled ( [ ] ) [ ]; each table entry appears as an added edge labeled M or LPM that spans the corresponding substring.]
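The table on the CYK slides can be recomputed mechanically. This is a small sketch of the table-filling loop for the normalized grammar; terminals are stored directly in the length-1 cells, and start positions are 0-based here (the slide counts from 1). The names `cyk` and `GRAMMAR` are illustrative.

```python
def cyk(tokens, binary):
    """table[(i, l)] = set of symbols that derive tokens[i:i+l]."""
    n = len(tokens)
    table = {(i, 1): {tok} for i, tok in enumerate(tokens)}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            cell = set()
            for split in range(1, length):
                left = table[(i, split)]
                right = table[(i + split, length - split)]
                for a, b, c in binary:      # production a -> b c
                    if b in left and c in right:
                        cell.add(a)
            table[(i, length)] = cell
    return table

# Normalized grammar from the slide, one triple per production A -> B C.
GRAMMAR = [
    ("M", "M", "M"), ("M", "LPM", ")"), ("M", "LBM", "]"),
    ("M", "(", ")"), ("M", "[", "]"),
    ("LPM", "(", "M"), ("LBM", "[", "M"),
]

table = cyk(list("([])[]"), GRAMMAR)
accepted = "M" in table[(0, 6)]
```

Each non-empty cell corresponds to one of the M or LPM edges drawn over the chain graph in the "Graphs vs. Tables" slide.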

CFL-Reachability via Dynamic Programming

Grammar: A → B C

Graph: [Figure: a B-labeled edge followed by a C-labeled edge induces an A-labeled edge.]

Dynamic Transitive Closure ?!

• Aiken et al.
– Set-constraint solvers
– Points-to analysis
• Henglein et al.
– Type inference
• But a CFL captures a non-transitive reachability relation [Valiant 75]

Program Chopping

Given source S and target T, what program points transmit effects from S to T?

Intersect forward slice from S with backward slice from T, right?

Non-Transitivity and Slicing

int main() {
    int sum = 0;
    int i = 1;
    while (i < 11) {
        sum = add(sum, i);
        i = add(i, 1);
    }
    printf("%d\n", sum);
    printf("%d\n", i);
}

int add(int x, int y) {
    return x + y;
}

The slides highlight, in turn, on this program:

• Forward slice with respect to "sum = 0"
• Backward slice with respect to printf("%d\n", i)
• Both slices overlaid
• Chop with respect to "sum = 0" and printf("%d\n", i)

[Figure: SDG of the sum program (Enter main; two Call add sites with xin = sum, yin = i, sum = xout and xin = i, yin = 1, i = xout; Enter add with x = xin, y = yin, x = x + y, xout = x), with a path marked by the mismatched labels ( and ].]

Program Chopping

Given source S and target T, what program points transmit effects from S to T?

"Precise interprocedural chopping" [Reps & Rosay FSE 95]

CF-Recognition vs. CFL-Reachability

• CF-Recognition
– Chain graphs
– General grammar: sub-cubic time [Valiant 75]
– LL(1), LR(1): linear time
• CFL-Reachability
– General graphs: $O(N^3)$
– LL(1): $O(N^3)$
– LR(1): $O(N^3)$
– Certain kinds of graphs: $O(N+E)$
– Regular languages: $O(N+E)$

Callouts on the slide: Gen/kill IDFA; GMOD IDFA

Regular-Language Reachability [Yannakakis 90]

• G: Graph (N nodes, E edges)
• L: A regular language
• L-path from s to t iff $s \xrightarrow{\alpha}{}^{*} t$ with $\alpha \in L$
• Running time: $O(N+E)$ vs. $O(N^3)$
• Ordinary reachability (= transitive closure)
– Label each edge with e
– L is e*
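One standard way to see why the regular case is cheap: search the product of the graph with a finite automaton for L. The sketch below assumes L is given as a DFA (a transition dictionary, a start state, and a set of accepting states); `regular_reach` is an illustrative name.

```python
from collections import deque

def regular_reach(edges, dfa_delta, dfa_start, dfa_accept, source):
    """Nodes t such that some path source ->* t spells a word the DFA accepts.
    edges: (src, label, dst) triples; dfa_delta: {(state, label): state}."""
    succ = {}
    for s, a, t in edges:
        succ.setdefault(s, []).append((a, t))
    start = (source, dfa_start)
    seen, queue = {start}, deque([start])
    while queue:
        node, state = queue.popleft()
        for a, t in succ.get(node, []):
            nxt = dfa_delta.get((state, a))
            if nxt is not None and (t, nxt) not in seen:
                seen.add((t, nxt))
                queue.append((t, nxt))
    return {n for n, q in seen if q in dfa_accept}

# Ordinary reachability: every edge labeled e, L = e*.
plain = regular_reach(
    [("s", "e", "a"), ("a", "e", "t"), ("t", "e", "a"), ("u", "e", "s")],
    {(0, "e"): 0}, 0, {0}, "s")

# L = a b*: one a-edge followed by any number of b-edges.
ab_star = regular_reach(
    [("s", "a", "x"), ("x", "b", "y"), ("s", "b", "z")],
    {(0, "a"): 1, (1, "b"): 1}, 0, {1}, "s")
```

For a fixed automaton, the search visits each (node, state) pair once, so the cost per source is linear in the size of the graph.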

Security of Crypto-Based Protocols for Distributed Systems

• "Ping-pong" protocols
(1) X → Y: Encrypt_Y(M X)
(2) Y → X: Encrypt_X(M)
• [Dolev & Yao 83]
– $O(N^8)$ algorithm
• [Dolev, Even, & Karp 83]
– Less well known than [Dolev & Yao 83]
– $O(N^3)$ algorithm

[Dolev, Even, & Karp 83]

Id → Encrypt_X Id Decrypt_X
Id → Decrypt_X Id Encrypt_X
Id → . . .
Id → ?

[Figure: graph with nodes Message and Saboteur and edges labeled EY, EY, AX, AZ.]

Themes

• Harnessing CFL-reachability

• Relationship to other analysis paradigms

• Exhaustive alg. vs. demand alg.

• Understanding complexity
– Linear . . . cubic . . . undecidable

• Beyond CFL-reachability

Relationship to Other Analysis Paradigms

• Dataflow analysis

–reachability versus equation solving

• Deduction

• Set constraints

Dataflow Analysis

• Goal: For each point in the program, determine a superset of the “facts” that could possibly hold during execution

• Examples
– Constant propagation
– Reaching definitions
– Live variables
– Possibly uninitialized variables

Useful For . . .

• Optimizing compilers

• Parallelizing compilers

• Tools that detect possible logical errors

• Tools that show the effects of a proposed modification

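Possibly Uninitialized Variables

[Figure: control-flow graph with nodes Start, x = 3, if . . ., y = x, y = w, w = 8, printf(y). Each node carries a dataflow function, and each edge is annotated with a set of possibly uninitialized variables; the sets shown are {w,x,y}, {w,y}, {w}, and {}.]

Dataflow functions:

Start: $\lambda V.\,\{w, x, y\}$
x = 3: $\lambda V.\,V - \{x\}$
if . . .: $\lambda V.\,V$
printf(y): $\lambda V.\,V$
w = 8: $\lambda V.\,V - \{w\}$
y = x: $\lambda V.\ \text{if } x \in V \text{ then } V \cup \{y\} \text{ else } V - \{y\}$
y = w: $\lambda V.\ \text{if } w \in V \text{ then } V \cup \{y\} \text{ else } V - \{y\}$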

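Precise Intraprocedural Analysis

[Figure: a path p from start to n whose edges carry the dataflow functions $f_1, f_2, \ldots, f_{k-1}, f_k$; the value C enters at start.]

$pf_p = f_k \circ f_{k-1} \circ \cdots \circ f_2 \circ f_1$

$\mathrm{MOP}[n] = \bigsqcup_{p \in \mathrm{PathsTo}[n]} pf_p(C)$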

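[Figure: control-flow graphs of an example program with a recursive procedure.
main: start main, x = 3, p(x,y), return from p, printf(y), exit main.
p: start p(a,b), if . . ., b = a, p(a,b), return from p, printf(b), exit p.
Call and return edges are labeled with parentheses.]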

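Precise Interprocedural Analysis

[Figure: a path from start to n that passes through a call: call q, start q, exit q, ret. Its edges carry dataflow functions $f_1, f_2, \ldots, f_k$, and the call and return edges are labeled ( and ).]

$\mathrm{MOMP}[n] = \bigsqcup_{p \in \mathrm{MatchedPathsTo}[n]} pf_p(C)$

[Sharir & Pnueli 81]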

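Representing Dataflow Functions

[Figure: each function is drawn as a graph from the facts a, b, c before a statement to the facts a, b, c after it.]

Identity function: $f = \lambda V.\,V$, so $f(\{a,b\}) = \{a,b\}$

Constant function: $f = \lambda V.\,\{b\}$, so $f(\{a,b\}) = \{b\}$

"Gen/kill" function: $f = \lambda V.\,(V - \{b\}) \cup \{c\}$, so $f(\{a,b\}) = \{a,c\}$

Non-"gen/kill" function: $f = \lambda V.\ \text{if } a \in V \text{ then } V \cup \{b\} \text{ else } V - \{b\}$, so $f(\{a,b\}) = \{a,b\}$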

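[Figure: the example program (main: start main, x = 3, p(x,y), return from p, printf(y), exit main; p: start p(a,b), if . . ., b = a, p(a,b), return from p, printf(b), exit p) with every node expanded into one copy per variable x, y, a, b.]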

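Composing Dataflow Functions

$f_1 = \lambda V.\ \text{if } a \in V \text{ then } V \cup \{b\} \text{ else } V - \{b\}$

$f_2 = \lambda V.\ \text{if } b \in V \text{ then } \{c\} \text{ else } \emptyset$

[Figure: the graphs of $f_1$ and $f_2$ over the facts a, b, c, placed end to end.]

$f_2 \circ f_1(\{a,c\}) = \{c\}$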
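The graph view of dataflow functions makes composition a matter of chaining edges. The sketch below uses one common encoding for functions that distribute over set union: an extra node, here called `ZERO`, stands for the empty set. The encoding details and all names are assumptions of this sketch rather than something the slides spell out.

```python
ZERO = "0"  # extra node standing for the empty set (assumed encoding)

def to_graph(f, domain):
    """Edge set representing f, valid when f distributes over union."""
    base = f(frozenset())
    edges = {(ZERO, ZERO)} | {(ZERO, d) for d in base}
    for d1 in domain:
        edges |= {(d1, d2) for d2 in f(frozenset([d1])) if d2 not in base}
    return edges

def apply_graph(edges, facts):
    """Evaluate the represented function on a set of facts."""
    sources = set(facts) | {ZERO}
    return {d2 for d1, d2 in edges if d1 in sources and d2 != ZERO}

def compose(g_first, g_second):
    """Graph of (second after first): join edges on the shared middle node."""
    return {(a, c) for a, b in g_first for b2, c in g_second if b == b2}

D = {"a", "b", "c"}
f1 = lambda V: (V | {"b"}) if "a" in V else (V - {"b"})
f2 = lambda V: {"c"} if "b" in V else set()
gen_kill = lambda V: (V - {"b"}) | {"c"}

g1, g2 = to_graph(f1, D), to_graph(f2, D)
g21 = compose(g1, g2)
```

Applying `g21` to `{a, c}` follows the two-edge path a → b → c and reproduces the value on the slide.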

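[Figure: the expanded graph of the example program (columns x, y, a, b), with call and return edges labeled by parentheses and two queries:
At printf(b): might b be uninitialized here? NO! The only path uses mismatched labels ( and ].
At printf(y): might y be uninitialized here? YES! There is a path with matching labels ( and ).]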

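matched → matched matched
        | (_i matched )_i      for 1 ≤ i ≤ CallSites
        | edge
        | ε

[Figure: paths of ( and ) edges shown next to the call stack they induce; some edges are marked "Off Limits!".]

unbalLeft → matched unbalLeft
          | (_i unbalLeft      for 1 ≤ i ≤ CallSites
          | ε

[Figure: a path with unmatched ( edges shown next to its call stack; some edges are marked "Off Limits!".]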
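The two grammars describe what a call stack does along a path: a call edge pushes its call site, and a return edge is allowed only if it pops the same site. As a rough sketch (the label encoding as `("call", i)`, `("ret", i)`, and `("edge", None)` tuples is an assumption made here), membership in L(matched) and L(unbalLeft) for a single path can be checked like this:

```python
def pending_calls(labels):
    """Simulate the call stack along a path.
    Returns the list of still-open call sites, or None if some return edge
    does not match the most recent unreturned call ("off limits")."""
    stack = []
    for kind, site in labels:
        if kind == "call":
            stack.append(site)
        elif kind == "ret":
            if not stack or stack[-1] != site:
                return None
            stack.pop()
        # ordinary intraprocedural edges leave the stack unchanged
    return stack

def in_unbal_left(labels):
    # Every return is matched; calls may remain open.
    return pending_calls(labels) is not None

def in_matched(labels):
    # Every return is matched and no call remains open.
    return pending_calls(labels) == []
```

A path that enters a procedure through call site 1 and leaves through the return edge of call site 2 is rejected by both checks, which is the situation the "Off Limits!" annotations point at.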

Interprocedural Dataflow Analysis via CFL-Reachability

• Graph: Exploded control-flow graph
• L: L(unbalLeft)
• Fact d holds at n iff there is an L(unbalLeft)-path from $\langle start_{main}, \Lambda \rangle$ to $\langle n, d \rangle$

Asymptotic Running Time [Reps, Horwitz, & Sagiv 95]

• CFL-reachability
– Exploded control-flow graph: ND nodes
– Running time: $O(N^3 D^3)$
• Exploded control-flow graph has special structure
– Running time: $O(E D^3)$
– Typically $E \approx N$, hence $O(E D^3) \approx O(N D^3)$
– "Gen/kill" problems: $O(E D)$

Why Bother? "We're only interested in million-line programs"

• Know thy enemy!
– "Any" algorithm must do these operations
– Avoid pitfalls (e.g., claiming an $O(N^2)$ algorithm)
• The essence of "context sensitivity"
• Special cases
– "Gen/kill" problems: $O(E D)$
• Compression techniques
– Basic blocks
– SSA form, sparse evaluation graphs
• Demand algorithms

Relationship to Other Analysis Paradigms

• Dataflow analysis

–reachability versus equation solving

• Deduction

• Set constraints

The Need for Pointer Analysis

int main() {
    int sum = 0;
    int i = 1;
    int *p = &sum;
    int *q = &i;
    int (*f)(int,int) = add;
    while (*q < 11) {
        *p = (*f)(*p, *q);
        *q = (*f)(*q, 1);
    }
    printf("%d\n", *p);
    printf("%d\n", *q);
}

int add(int x, int y) {
    return x + y;
}

With the pointers resolved, the loop and the output statements read:

int main() {
    int sum = 0;
    int i = 1;
    int *p = &sum;
    int *q = &i;
    int (*f)(int,int) = add;
    while (i < 11) {
        sum = add(sum, i);
        i = add(i, 1);
    }
    printf("%d\n", sum);
    printf("%d\n", i);
}

int add(int x, int y) {
    return x + y;
}

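Flow-Sensitive Points-To Analysis

p = &q;
p = q;
p = *q;
*p = q;

[Figure: for each of the four statement forms, the points-to edges it creates, drawn over nodes p, q, r1, r2, s1, s2, s3.]

Flow-Sensitive vs. Flow-Insensitive

[Figure: two graphs from start main to exit main over statements numbered 1 to 5, one for the flow-sensitive view and one for the flow-insensitive view.]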

Flow-Insensitive Points-To Analysis [Andersen 94, Shapiro & Horwitz 97]

p = &q;
p = q;
p = *q;
*p = q;

[Figure: for each of the four statement forms, the points-to edges it creates, drawn over nodes p, q, r1, r2, s1, s2, s3.]

Flow-Insensitive Points-To Analysis

a = &e;
b = a;
c = &f;
*b = c;
d = *a;

[Figure: points-to graph over the nodes a, b, c, d, e, f.]

Flow-Insensitive Points-To Analysis

• Andersen [Thesis 94]
– Formulated using set constraints
– Cubic-time algorithm
• Shapiro & Horwitz (1995; [POPL 97])
– Re-formulated as a graph-grammar problem
• Reps (1995; [unpublished])
– Re-formulated as a Horn-clause program
• Melski (1996; see [Reps, IST98])
– Re-formulated via CFL-reachability

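CFL-Reachability via Dynamic Programming

Grammar: A → B C

Graph: [Figure: a B-labeled edge followed by a C-labeled edge induces an A-labeled edge.]

CFL-Reachability = Chain Programs

Grammar: A → B C

Chain rule: a(X,Z) :- b(X,Y), c(Y,Z).

Graph: [Figure: a B edge from x to y and a C edge from y to z induce an A edge from x to z.]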

Base Facts for Points-To Analysis

p = &q;    assignAddr(p,q).
p = q;     assign(p,q).
p = *q;    assignStar(p,q).
*p = q;    starAssign(p,q).

Rules for Points-To Analysis (I)

p = &q;    pointsTo(P,Q) :- assignAddr(P,Q).

p = q;     pointsTo(P,R) :- assign(P,Q), pointsTo(Q,R).

[Figure: p = &q adds an edge from p to q; p = q adds edges from p to every node r1, r2 that q points to.]

Rules for Points-To Analysis (II)

p = *q;    pointsTo(P,S) :- assignStar(P,Q), pointsTo(Q,R), pointsTo(R,S).

*p = q;    pointsTo(R,S) :- starAssign(P,Q), pointsTo(P,R), pointsTo(Q,S).

[Figure: the points-to edges added by p = *q and by *p = q, drawn over nodes p, q, r1, r2, s1, s2, s3.]
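The four rules above can be run to a fixed point directly. The following is a naive sketch (it re-scans all facts each round, so it is not the cubic-time algorithm the slides refer to), applied to the five-statement example from the earlier slide; the function name `points_to` is illustrative.

```python
def points_to(assign_addr, assign, assign_star, star_assign):
    """Each argument is a list of (p, q) base facts.
    Returns the set of derived pointsTo(x, y) pairs."""
    pts = set(assign_addr)          # pointsTo(P,Q) :- assignAddr(P,Q).
    changed = True
    while changed:
        new = set()
        for p, q in assign:         # p = q
            new |= {(p, r) for q2, r in pts if q2 == q}
        for p, q in assign_star:    # p = *q
            for q2, r in pts:
                if q2 == q:
                    new |= {(p, s) for r2, s in pts if r2 == r}
        for p, q in star_assign:    # *p = q
            for p2, r in pts:
                if p2 == p:
                    new |= {(r, s) for q2, s in pts if q2 == q}
        changed = not new <= pts
        pts |= new
    return pts

# a = &e; b = a; c = &f; *b = c; d = *a;
result = points_to(
    assign_addr=[("a", "e"), ("c", "f")],
    assign=[("b", "a")],
    assign_star=[("d", "a")],
    star_assign=[("b", "c")],
)
```

The derived facts say that a and b point to e, c points to f, e points to f (through `*b = c`), and d points to f (through `d = *a`).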

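Creating a Chain Program

In this part, an overline marks the inverse of a relation: $\overline{r}(Y,X)$ holds iff r(X,Y) holds.

*p = q;    pointsTo(R,S) :- starAssign(P,Q), pointsTo(P,R), pointsTo(Q,S).

Reorder the body:

pointsTo(R,S) :- pointsTo(P,R), starAssign(P,Q), pointsTo(Q,S).

Replace the first literal by its inverse so that the variables chain:

pointsTo(R,S) :- $\overline{pointsTo}$(R,P), starAssign(P,Q), pointsTo(Q,S).

$\overline{pointsTo}$(R,P) :- pointsTo(P,R).

Base Facts for Points-To Analysis

p = &q;    assignAddr(p,q).    $\overline{assignAddr}$(q,p).
p = q;     assign(p,q).        $\overline{assign}$(q,p).
p = *q;    assignStar(p,q).    $\overline{assignStar}$(q,p).
*p = q;    starAssign(p,q).    $\overline{starAssign}$(q,p).

Creating a Chain Program

pointsTo(P,Q) :- assignAddr(P,Q).
pointsTo(P,R) :- assign(P,Q), pointsTo(Q,R).
pointsTo(P,S) :- assignStar(P,Q), pointsTo(Q,R), pointsTo(R,S).
pointsTo(R,S) :- $\overline{pointsTo}$(R,P), starAssign(P,Q), pointsTo(Q,S).

$\overline{pointsTo}$(Q,P) :- $\overline{assignAddr}$(Q,P).
$\overline{pointsTo}$(R,P) :- $\overline{pointsTo}$(R,Q), $\overline{assign}$(Q,P).
$\overline{pointsTo}$(S,P) :- $\overline{pointsTo}$(S,R), $\overline{pointsTo}$(R,Q), $\overline{assignStar}$(Q,P).
$\overline{pointsTo}$(S,R) :- $\overline{pointsTo}$(S,Q), $\overline{starAssign}$(Q,P), pointsTo(P,R).

. . . and now to CFL-Reachability

pointsTo → assign pointsTo
pointsTo → assignStar pointsTo pointsTo
pointsTo → assignAddr
pointsTo → $\overline{pointsTo}$ starAssign pointsTo

$\overline{pointsTo}$ → $\overline{assignAddr}$
$\overline{pointsTo}$ → $\overline{pointsTo}$ $\overline{pointsTo}$ $\overline{assignStar}$
$\overline{pointsTo}$ → $\overline{pointsTo}$ $\overline{starAssign}$ pointsTo
$\overline{pointsTo}$ → $\overline{pointsTo}$ $\overline{assign}$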

Relationship to Other Analysis Paradigms

• Dataflow analysis

–reachability versus equation solving

• Deduction

• Set constraints

[Figure: the 1987-1998 timeline of topics again (Slicing & Applications; Dataflow Analysis; Demand Algorithms; Set Constraints; Structure-Transmitted Dependences; CFL-Reachability), now singling out Structure-Transmitted Dependences and Set Constraints.]

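Structure-Transmitted Dependences [Reps 1995]

McCarthy's equations:
car(cons(x,y)) = x
cdr(cons(x,y)) = y

w = cons(x,y);
v = car(w);

[Figure: dependence graph over x, y, w, v. The edges into w are labeled hd (from x) and tl (from y); the edge from w to v is labeled hd^-1. Dependence paths are generated by a nonterminal dep in which each hd is matched by a later hd^-1.]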

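Set Constraints

w = cons(x,y);    $W \supseteq \mathrm{cons}(X, Y)$
v = car(w);       $V \supseteq \mathrm{cons}_1^{-1}(W)$

McCarthy's Equations Revisited

$\mathrm{cons}_1^{-1}(\mathrm{cons}(X, Y)) = X$, provided $I(Y) \neq \emptyset$

Semantics of Set Constraints

$I(\mathrm{cons}(V_1, V_2)) = \{\mathrm{cons}(v_1, v_2) \mid v_1 \in I(V_1) \text{ and } v_2 \in I(V_2)\}$

$I(\mathrm{cons}_1^{-1}(V)) = \{v_1 \mid \mathrm{cons}(v_1, v_2) \in I(V)\}$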

CFL-Reachability versus Set Constraints

• Lazy languages: CFL-reachability is more natural
– car(cons(X,Y)) = X
• Strict languages: Set constraints are more natural
– car(cons(X,Y)) = X, provided $I(Y) \neq \emptyset$
• But . . . SC and CFL-reachability are equivalent!
– [Melski & Reps 97]

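Solving Set Constraints

• $W \supseteq a$ ⇒ W is "inhabited"
• $W \supseteq \mathrm{cons}(X, Y)$, X is "inhabited", Y is "inhabited" ⇒ W is "inhabited"
• $W \supseteq \mathrm{cons}(X, Y)$, $V \supseteq \mathrm{cons}_1^{-1}(W)$, Y is "inhabited" ⇒ $V \supseteq X$
• $W \supseteq \mathrm{cons}(X, Y)$, $U \supseteq \mathrm{cons}_2^{-1}(W)$, X is "inhabited" ⇒ $U \supseteq Y$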

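Simulating "Inhabited"

[Figure: the constraint $W \supseteq a$ encoded as a graph with edges labeled a, dep, and inhab.]

Simulating "Inhabited"

[Figure: the constraint $W \supseteq \mathrm{cons}(X, Y)$ encoded as a graph over X, Y, W, V with edges labeled hd, tl, and inhab.]

Simulating "Provided $I(Y) \neq \emptyset$"

[Figure: the constraints $W \supseteq \mathrm{cons}(X, Y)$ and $V \supseteq \mathrm{cons}_1^{-1}(W)$ encoded with edges labeled hd, tl, hd^-1, dep, and inhab; the inhab edge at Y enforces the side condition.]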

Themes

• Harnessing CFL-reachability

• Relationship to other analysis paradigms

• Exhaustive alg. vs. demand alg.

• Understanding complexity
– Linear . . . cubic . . . undecidable

• Beyond CFL-reachability

Exhaustive Versus Demand Analysis

• Exhaustive analysis: All facts at all points
• Optimization: Concentrate on inner loops
• Program-understanding tools: Only some facts are of interest

Exhaustive Versus Demand Analysis

• Demand analysis:
– Does a given fact hold at a given point?
– Which facts hold at a given point?
– At which points does a given fact hold?
• Demand analysis via CFL-reachability
– single-source/single-target CFL-reachability
– single-source/multi-target CFL-reachability
– multi-source/single-target CFL-reachability

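[Figure: the expanded graph of the example program (main: start main, x = 3, p(x,y), return from p, printf(y), exit main; p: start p(a,b), if . . ., b = a, p(a,b), return from p, printf(b), exit p; columns x, y, a, b), with two demands:
At printf(y): might y be uninitialized here? YES!
At printf(b): might b be uninitialized here? NO!]

"Semi-exhaustive": All "appropriate" demands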

Experimental Results [Horwitz, Reps, & Sagiv 1995]

• 53 C programs (200-6,700 lines)
• For a single fact of interest:
– demand always better than exhaustive
• All "appropriate" demands beats exhaustive when percentage of "yes" answers is high
– Live variables
– Truly live variables
– Constant predicates
– . . .

A Related Result [Sagiv, Reps, & Horwitz 1996]

• [Uses a generalized analysis technique]
• 38 C programs (300-6,000 lines)
– copy-constant propagation
– linear-constant propagation
• All "appropriate" demands always beats exhaustive
– factor of 1.14 to about 6

Exhaustive Versus Demand Analysis

• Demand algorithms for

– Interprocedural dataflow analysis

– Set constraints

– Points-to analysis

Demand Analysis and LP Queries (I)

• Flow-insensitive points-to analysis
– Does variable p point to q?
  • Issue query: ?- pointsTo(p, q).
  • Solve single-source/single-target L(pointsTo)-reachability problem
– What does variable p point to?
  • Issue query: ?- pointsTo(p, Q).
  • Solve single-source L(pointsTo)-reachability problem
– What variables point to q?
  • Issue query: ?- pointsTo(P, q).
  • Solve single-target L(pointsTo)-reachability problem

Demand Analysis and LP Queries (II)

• Flow-sensitive analysis
– Does a given fact f hold at a given point p?
  ?- dfFact(p, f).
– Which facts hold at a given point p?
  ?- dfFact(p, F).
– At which points does a given fact f hold?
  ?- dfFact(P, f).
• E.g., flow-sensitive points-to analysis
  ?- dfFact(p, pointsTo(x, Y)).
  ?- dfFact(P, pointsTo(x, y)).
  etc.

Themes

• Harnessing CFL-reachability

• Relationship to other analysis paradigms

• Exhaustive alg. vs. demand alg.

• Understanding complexity
– Linear . . . cubic . . . undecidable

• Beyond CFL-reachability

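Interprocedural Backward Slice

[Figure: SDG with nodes Enter main, two Call p sites, and Enter p; the edges into and out of p are labeled ( and ) for one call site and [ and ] for the other.]

[Figure: the expanded graph of the example program (columns x, y, a, b) with the same parenthesis labels on call and return edges; annotation at printf(y): "y may be uninitialized here".]

Structure-Transmitted Dependences [Reps 1995]

McCarthy's equations:
car(cons(x,y)) = x
cdr(cons(x,y)) = y

w = cons(x,y);
v = car(w);

[Figure: dependence graph over x, y, w, v with edges labeled hd, tl, and hd^-1.]

Dependences + Matched Paths?

[Figure: SDG with Enter main and Enter p in which w = cons(x,y) is passed through two Call p sites before v = car(w). Edges are labeled hd, tl, and hd^-1 for structure, and ( ), [ ] for calls and returns.]

Undecidable! [Reps, TOPLAS 00]

Interleaved parentheses: the hd / hd^-1 pairs and the ( / ) pairs must each match, and the two kinds of pairs can interleave.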

Themes

• Harnessing CFL-reachability

• Relationship to other analysis paradigms

• Exhaustive alg. vs. demand alg.

• Understanding complexity
– Linear . . . cubic . . . undecidable

• Beyond CFL-reachability

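CFL-Reachability via Dynamic Programming

Grammar: A → B C

Graph: [Figure: a B-labeled edge followed by a C-labeled edge induces an A-labeled edge.]

Beyond CFL-Reachability: Composition of Linear Functions

[Figure: an edge labeled $\lambda x.\,3x+5$ followed by an edge labeled $\lambda x.\,2x+1$ induces a summary edge labeled $\lambda x.\,6x+11$.]

$(\lambda x.\,2x+1) \circ (\lambda x.\,3x+5) = \lambda x.\,6x+11$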
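Composing edge labels that are linear functions only needs the two coefficients. A minimal sketch, with `Linear` and `after` as illustrative names:

```python
from typing import NamedTuple

class Linear(NamedTuple):
    """The function x -> a*x + b."""
    a: int
    b: int

    def __call__(self, x):
        return self.a * x + self.b

    def after(self, g):
        """self composed with g, i.e. x -> self(g(x))."""
        # a1*(a2*x + b2) + b1 = (a1*a2)*x + (a1*b2 + b1)
        return Linear(self.a * g.a, self.a * g.b + self.b)

f = Linear(2, 1)      # x -> 2x + 1
g = Linear(3, 5)      # x -> 3x + 5
h = f.after(g)        # x -> 6x + 11
```

Because the composite is again a pair of coefficients, a summary edge stays constant-size no matter how many edges it summarizes.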

Beyond CFL-Reachability: Composition of Linear Functions

• Interprocedural constant propagation
– [Sagiv, Reps, & Horwitz TCS 96]
• Interprocedural path profiling
– The number of path fragments contributed by a procedure is a function
– [Melski & Reps CC 99]

Model-Checking of Recursive HFSMs [Benedikt, Godefroid, & Reps (in prep.)]

• Non-recursive HFSMs [Alur & Yannakakis 98]
• Ordinary FSMs
– T-reachability/circularity queries
• Recursive HFSMs
– Matched-parenthesis T-reachability/circularity
• Key observation: Linear-time algorithms for matched-parenthesis T-reachability/cyclicity
– Single-entry/multi-exit [or multi-entry/single-exit]
– Deterministic, multi-entry/multi-exit

T-Cyclicity in Hierarchical Kripke Structures

SN/SX    SN/MX    MN/SX    MN/MX
non-rec: O(|k|)    non-rec: O(|k|)    ?    ?
rec: O(|k|^3)    rec: ?

SN/SX    SN/MX    MN/SX    MN/MX
O(|k|)    O(|k|)    O(|k|)    O(|k|^3)
O(|k||t|) [lin rec]    O(|k|) [det]

Recursive HFSMs: Data Complexity

        SN/SX    SN/MX    MN/SX    MN/MX
LTL     non-rec: O(|k|)    non-rec: O(|k|)    ?    ?
        rec: P-time    rec: ?
CTL     O(|k|)    bad    ?    bad
CTL*    O(|k|^2) [L2]    bad    ?    bad

Recursive HFSMs: Data Complexity

        SN/SX    SN/MX    MN/SX    MN/MX
LTL     O(|k|)    O(|k|)    O(|k|)    O(|k|^3)
        O(|k||t|) [lin rec]    O(|k|) [det]
CTL     O(|k|)    bad    O(|k|)    bad
CTL*    O(|k|)    bad    O(|k|)    bad

Not Dual Problems!

CFL-Reachability: Scope of Applicability

• Static analysis
– Slicing, DFA, structure-transmitted dep., points-to analysis
• Verification
– Security of crypto-based protocols for distributed systems [Dolev, Even, & Karp 83]
– Model-checking recursive HFSMs
• Formal-language theory
– CF-, 2DPDA-, 2NPDA-recognition
– Attribute-grammar analysis

CFL-Reachability: Benefits

• Algorithms
– Exhaustive & demand
• Complexity
– Linear-time and cubic-time algorithms
– PTIME-completeness
– Variants that are undecidable
• Complementary to
– Equations
– Set constraints
– Types
– . . .

Most Significant Contributions: 1987-2000

• Asymptotically fastest algorithms
– Interprocedural slicing
– Interprocedural dataflow analysis
• Demand algorithms
– Interprocedural dataflow analysis [CC94, FSE95]
– All "appropriate" demands beats exhaustive
• Tool for slicing and browsing ANSI C
– Slices programs as large as 75,000 lines
– University research distribution
– Commercial product: CodeSurfer (GrammaTech, Inc.)

Most Significant Contributions: 1987-2000

• Unifying conceptual model
– [Kou 77], [Holley & Rosen 81], [Cooper & Kennedy 88], [Callahan 88], [Horwitz, Reps, & Binkley 88], . . .
• Identifies fundamental bottlenecks
– Cubic-time "barrier"
– Litmus test: quadratic-time algorithm?!
– PTIME-complete: limits to parallelizability
• Existence proofs for new algorithms
– Demand algorithm for set constraints
– Demand algorithm for points-to analysis

References

• Papers by Reps and collaborators:
– http://www.cs.wisc.edu/~reps/
• CFL-reachability
– Yannakakis, M., Graph-theoretic methods in database theory, PODS 90.
– Reps, T., Program analysis via graph reachability, Inf. and Softw. Tech. 98.
• Slicing, chopping, etc.
– Horwitz, Reps, & Binkley, TOPLAS 90
– Reps, Horwitz, Sagiv, & Rosay, FSE 94
– Reps & Rosay, FSE 95
• Dataflow analysis
– Reps, Horwitz, & Sagiv, POPL 95
– Horwitz, Reps, & Sagiv, FSE 95, TR-1283
• Structure dependences; set constraints
– Reps, PEPM 95
– Melski & Reps, Theor. Comp. Sci. 00
• Complexity
– Undecidability: Reps, TOPLAS 00?
– PTIME-completeness: Reps, Acta Inf. 96.
• Verification
– Dolev, Even, & Karp, Inf. & Control 82.
– Benedikt, Godefroid, & Reps, In prep.
• Beyond CFL-reachability
– Sagiv, Reps, & Horwitz, Theor. Comp. Sci. 96
– Melski & Reps, CC 99, TR-1382
