string analysis for binaries mihai christodorescu nicholas kidd wen-han goh university of wisconsin,...

Post on 06-Jan-2018

216 Views

Category:

Documents

2 Downloads

Preview:

Click to see full reader

DESCRIPTION

6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE Why Do We Need String Analysis? We could just use the strings program: $ strings no_msg /lib/ld-linux.so.2 libc.so.6... no msg This food has %s. $  

TRANSCRIPT

String Analysis for BinariesString Analysis for Binaries

Mihai ChristodorescuNicholas KiddNicholas KiddWen-Han Goh

University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 2

What is What is String AnalysisString Analysis??• Recovery of values a string variable

might take at a given program point.void main( void ){char * msg = “no msg”;printf( “This food has %s.\n”, msg );

}

Output: This food has no msg.

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 3

Why Do We Need String Why Do We Need String Analysis?Analysis?

• We could just use the strings program:$ strings no_msg/lib/ld-linux.so.2libc.so.6......no msgThis food has %s.$

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 4

Why Perform String Analysis?Why Perform String Analysis?• Computer forensics

Given an unknown program, we want to know the files it might access,the registry keys it might get and set,the commands it might execute.

• Program verificationSQL queries, embedded scripting, …

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 5

A Complicated ExampleA Complicated Examplevoid main( void ){char buf[257];strcpy( buf, “/” );strcat( buf, “b” );strcat( buf, “i” );strcat( buf, “n” );...system( buf );

}

Running strings:/abcd...

Running a string analysis:/bin/ifconfig –a |

/bin/mail ...@...

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 6

Our ContributionsOur Contributions• Developed a string analysis for

binaries.

• Implemented x86sa, a string analyzer for Intel IA-32 binaries.

• Evaluated on both benign and malicious binaries.

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 7

OutlineOutline String analysis for Java.

String analysis for x86.

Evaluation.

Applications & future work.

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 8

String Analysis for JavaString Analysis for Java1.Create string flowgraph.void main( void ){

String x = “/”;x = x + “b”;x = x + “i”;x = x + “n”;...System.exec( x );

}

Christensen, Møller, Schwartzbach "Precise Analysis of String Expressions" (SAS’03)

“/” “b”

concat “i”

concat...

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 9

String Analysis for Java [2]String Analysis for Java [2]2. Create context-free grammar.3. Approximate with finite automaton.

“/” “b”

concat “i”

concat...

A1 “/” “b”A2 A1 “i”A3 A2 “n”A4 A3 “/”…

“/bin/ifconfig ...”

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 10

From Java to x86 executablesFrom Java to x86 executables

Flowgraph CFG FSAJava.class

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 11

From Java to x86 executablesFrom Java to x86 executables

Rest of this talk:Bridge the syntactic and semantic gaps between Java and assembly language.

x86executable Flowgraph CFG FSA

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 12

OutlineOutline String analysis for Java.

String analysis for x86.

Evaluation.

Applications & future work.

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 13

Four Problems with AssemblyFour Problems with Assembly1. No types.

2. No high-level constructs.

3. No argument passing convention.

4. No Java string semantics.

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 14

Problem 1: Problem 1: No TypesNo Types• Solution: infer types from C lib. funcs.

Assumption #1:Strings are manipulated only using string library functions.

char * strcat( char * dest, char * src )• After: “eax” points to a string.• Before: “dest” and “src” point to a

string.

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 15

Problem 1: Problem 1: No Types [cont.]No Types [cont.]• Perform a backwards analysis to find

the strings:– Destination registers “kill” string type

information.– Libc string functions “gen” string type

information.– Strings at entry to CFG are constant

strings or function parameters.

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 16

Problem 1: Problem 1: No Types No Types [example][example]

String variables:– after the call: { eax }– before the call: { ebx, ecx }

eax = _strcat( ebx, ecx );

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 17

Problem 2: Problem 2: Function Function ParametersParameters

• Function parameters are not explicit in x86 machine code.

mov ecx, [ebp+var1]push ecxmov ebx, [ebp+var2]push ebxcall _strcatadd esp, 8

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 18

Problem 2: Problem 2: Fn. Params Fn. Params [example][example]

mov ecx, [ebp+var1]push ecxmov ebx, [ebp+var2]push ebxcall _strcatadd esp, 8

Stack pointer push ecx

push ebx

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 19

mov ecx, [ebp+var1]push ecxmov ebx, [ebp+var2]push ebxcall _strcatadd esp, 8

Problem 2: Problem 2: Function Function ParametersParameters

• Solution: Perform forwards analysis modeling x86 instructions effects on the stack.

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 20

Problem 3: Problem 3: Unmodeled Unmodeled FunctionsFunctions

• String type information and stack model may be incorrect!

Assumption #2:“_cdecl” calling convention and well behaved functions

• Treat all function arguments and return values as strings.

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 21

Problem 4: Problem 4: Java vs. x86 Java vs. x86 SemanticsSemantics

• Java strings are immutable,x86 strings are not.

String y, x=“x”;y = x;y = y + “123”;System.out.println(x);

char *y;char x[10] = “x”;y = x;y = strcat(y,”123”);printf(x);

=> “x” => “x123”

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 22

Problem 4: Problem 4: Java vs. x86 Java vs. x86 SemanticsSemantics

• Solution: May-Must alias analysis.

0x200: _strcat( ebx,ecx )

eax

ebx

esi

200

eax

ebx

esi

“May alias” relations:

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 23

Problem 4: Problem 4: Java vs. x86 Java vs. x86 SemanticsSemantics

• Solution: May-Must alias analysis.

0x200: _strcat( ebx,ecx )

eax

ebx

esi

200

eax

ebx

esi

“Must alias” relations:

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 24

Stack HeightTransformers

String TypeTransformersAlias AnalysisTransformers

UTF-8 func’s

Stack HeightTransformers

String TypeTransformersAlias AnalysisTransformers

libc char* func’s

Stack HeightTransformers

String TypeTransformersAlias AnalysisTransformers

Unicode func’s

x86sax86sa Architecture Architecture

EXE

IDA Pro Connector WPDS++

JSA

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 25

IntraIntraprocedural Analysis procedural Analysis SummarySummary

1. Recover callsite arguments.(stack-operation modeling)

2. Infer string types.(backward type analysis)

3. Discover aliases.(may-, must-alias forward analysis)

Generate the String Flow Graph for the Control Flow Graph.

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 26

OutlineOutline String analysis for Java.

String analysis for x86.

Evaluation.

Applications & future work.

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 27

Example 1: Example 1: simplesimplechar * s1 = "Argc has ";char * s2;char * s3 = " arguments";char * s4;switch( argc ) { case 1: s2 = "1“ ; break; case 2: s2 = "2“ ; break; default: s2 = "> 2"; break;}s4 = malloc( strlen(s1)+strlen(s2)+strlen(s3)+1 );s4[0] = 0;strcat( strcat( strcat( s4, s1 ), s2), s3 );printf( "%s\n", s4 );

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 28

Example 1: String Flow GraphExample 1: String Flow Graph“Argc has”

“ arguments”

“1”

concat

concat

concat

malloc “2” “> 2”

assign

Our result:"Argc has 1 arguments""Argc has 2 arguments""Argc has > 2 arguments"

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 29

Example 2: Example 2: cstarcstarchar * c = “c";char * s4 = malloc(101);for( int i=0; i < 100 ; i++ ) { strcat( s4, c );}printf( "%s\n", s4 );

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 30

Example 2: String Flow GraphExample 2: String Flow Graph

“c”

concat

malloc

assign

Our result: c*Correct answer: c100

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 31

Example 3: Example 3: Lion WormLion Worm• Code and String Flow Graph omitted.

• x86sa analysis results:"/sbin/ifconfig -a|/bin/mail angelz1578@usa.net"

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 32

Future Work:Future Work:InterInterprocedural Analysisprocedural Analysis

1. Inline everything and apply intra-procedural analysis.

2. “Hook” intraprocedural String Flow Graphs into a “Super String Flow Graph”.

3. Polyvariant analysis over function summaries for String Flow Graphs.

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 33

Future Work:Future Work:Relax AssumptionsRelax Assumptions

• Relax the assumptions:– Strings can be manipulated in many ways.– Calling conventions can vary in a

program.

• Value Set Analysis looks promising:– Identifies “variables” based on usage

patterns.

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 34

Future Work:Future Work:More ApplicationsMore Applications

• Malicious code analysis

• Analysis of dynamic code generators:– Packed programs– Shell code generators

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 35

What to Take Away:What to Take Away:• If you have type information, AND

arguments to calls, ANDno aliasing

then string analysis is straightforward.

• String analysis on binaries performs well and provides useful results.

String Analysis for BinariesString Analysis for Binaries

Mihai ChristodorescuNicholas KiddNicholas KiddWen-Han Goh

University of Wisconsin, Madison{ mihai, kiddkidd, wen-han }@cs.wisc.edu

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 37

From Java to x86 BinariesFrom Java to x86 Binaries• Input: x86 binary file

• Java string analyzer input: Flowgraph• Output: finite automaton

?

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 38

Problem 4: Java vs. x86 Problem 4: Java vs. x86 SemanticsSemantics

• Solution: Alias analysis.0x100: mov eax,ecx

• λS.let A = aliases(ecx)S – {(eax,*)} ({eax} A)

0x200: _strcat( ebx, ecx )• λS. let A = alias(ebx)

let N = vars(A) (S (N {200})) – {(ebx,*),{(eax,*)}

{(ebx,200),(eax,200)}

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 39

Example 1: x86sa Analysis Example 1: x86sa Analysis ResultsResults

1. "Argc has 1 arguments"2. "Argc has 2 arguments"3. "Argc has > 2 arguments"

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 40

cstar’s x86sa analysis resultscstar’s x86sa analysis results• Infinitely many strings with common

prefix: “c”

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 41

Transformers – String Transformers – String InferenceInference

• 0x200: _strcat( ebx, ecx )• String inference

λd.{ d – {eax} {ebx,ecx} }

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 42

Transformers – Alias AnalysisTransformers – Alias Analysis• 0x200: _strcat( ebx, ecx )

• ebx mustlet T = aliasmust(ebx)let V = aliasmay(ebx)λd. { must(d) {(ebx, *), (eax, *)}

{(ebx, 200), (eax, 200)} -aT{(a,*)} aT{(a,200)},

(may(d) {(ebx, *), (eax, *)} aV{(a,200)})

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 43

Transformers – Alias AnalysisTransformers – Alias Analysis• 0x200: _strcat( ebx, ecx )

• ebx maylet T = aliasmust(ebx)let V = aliasmay(ebx)λd. { must(d) {(eax, *)} -aT{(a,*)}

{(ebx, 200), (eax, 200)}, {may(d) {(ebx, *), (eax, *)}

aT{(a,200),(a,old(a)} aV{(a,200)}}

top related