Reverse-Engineering Instruction Encodings

Wilson Hsieh, University of Utah

Dawson Engler, Stanford University

Godmar Back, University of Utah

USENIX ’01

What’s the Problem?

Dynamic code generation, JIT compilation
– Emit instructions quickly
– Therefore, avoid assembler

Need to know how to produce binary instructions

Want to express instructions in assembly

“Generate add %l1, %l2, %l1 for SPARC”

What Do We Do?

How can I get the following mapping: assembly instruction → binary format

That mapping exists in the assembler already!

[Figure: assembly instruction → assembler → binary instruction]

So let’s reverse-engineer it out of the assembler.

DERIVE Tool Chain

[Figure: instruction description → DERIVE (which invokes the assembler) → encoding description → code emitter generator → code emitter → JIT compiler. The figure also shows a debugger and a disassembler as consumers.]

Instruction Descriptions

/* SPARC fragment */

iregs = ( %g0, %g1, %g2, ..., %i6, %i7 );

and, andcc, andn, ...
    &op& r_1:iregs, r_2:iregs, r_dest:iregs
  | &op& r_1:iregs, imm, r_dest:iregs ;

ba, bn, bne, ...
    &op& &label&
  | &op&",a" &label& ;

Encoding Descriptions

/* MIPS breakpoint instruction */

{
    "break", "&op& imm",
    1,                          /* operand */
    4,                          /* bytes */
    ...
    { 0xd, 0x0, 0x0, 0x0, },    /* opcode information */
    {                           /* operand information */
        {
            "imm",              /* name */
            IMMED,              /* an immediate */
            IDENT,              /* encoded value = input value */
            0,                  /* lowest value */
            10,                 /* length */
            ...
            16,                 /* bit offset */
            I_UNSIGNED,         /* unsigned field */
            ...
        },
    }
}
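To illustrate how a record like this could be consumed, the sketch below packs an operand into the opcode template using the length and bit offset from the description. The names `operand_desc`, `opcode_word`, and `encode_break` are hypothetical and are not DERIVE's actual data structures; only the values (opcode bytes 0xd 0x0 0x0 0x0, length 10, bit offset 16) come from the slide. Reading the opcode bytes as a little-endian word is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified operand record: only the fields used here. */
struct operand_desc {
    unsigned lowest;      /* lowest legal value */
    unsigned length;      /* field width in bits */
    unsigned bit_offset;  /* position of the field's low bit */
};

/* Values taken from the MIPS "break" description on the slide. */
static const uint8_t break_opcode[4] = { 0xd, 0x0, 0x0, 0x0 };
static const struct operand_desc break_imm = { 0, 10, 16 };

/* Combine the opcode bytes into one word (assumed little-endian order). */
static uint32_t opcode_word(const uint8_t bytes[4])
{
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}

/* Encode "break imm". IDENT means the encoded value equals the input,
   so the operand is only range-checked and shifted into place. */
static uint32_t encode_break(uint32_t imm)
{
    uint32_t max = (1u << break_imm.length) - 1u;
    assert(imm >= break_imm.lowest && imm <= max);
    return opcode_word(break_opcode) | (imm << break_imm.bit_offset);
}
```

Calling `encode_break(5)` places the value 5 at bit 16 of the opcode word. A generated emitter would presumably fold these constants into a macro rather than look them up at run time.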

Code Emitters

/* x86 addl instruction */

#define E_addl_rr_1(_code, rf, rt) do {\
    register unsigned short _0 = (0xc001\
        | (((rf)) << 11)\
        | (((rt)) << 8));\
    *(unsigned short*)((char*) _code) = _0;\
    _code = (void *)((char *) _code + 2);\
} while (0)

/* emit "addl %ecx, %ebx" in code_buffer */
E_addl_rr_1(code_buffer, REGecx, REGebx);
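The macro stores a 16-bit value directly, which relies on a little-endian host that tolerates unaligned stores (true on x86 itself). As a hedged alternative, the sketch below writes the same two bytes one at a time. The function `emit_addl_rr` is a hypothetical name, and the register numbers are the standard x86 ones, assumed to match the slide's `REGecx` and `REGebx`.

```c
#include <assert.h>
#include <stdint.h>

/* Standard x86 register numbers as used in the ModRM byte. */
enum { REGeax = 0, REGecx = 1, REGedx = 2, REGebx = 3 };

/* Write "addl %rf, %rt" at code and return the next free byte.
   Byte 0 is the opcode 0x01 (low byte of 0xc001); byte 1 is the
   ModRM byte 0xc0 | rf << 3 | rt (high byte of the macro's value). */
static uint8_t *emit_addl_rr(uint8_t *code, unsigned rf, unsigned rt)
{
    assert(rf < 8 && rt < 8);
    code[0] = 0x01;
    code[1] = (uint8_t)(0xc0u | (rf << 3) | rt);
    return code + 2;
}
```

Emitting `addl %ecx, %ebx` with `emit_addl_rr(buf, REGecx, REGebx)` writes the bytes 0x01 0xcb, the same bytes the macro produces on a little-endian host.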

Instruction Model

Opcode

Registers (names)
– Register sets
– Cache prefetch hints on MIPS
– Address scale on x86

Immediates (integers)
– Not registers

Labels (jump targets)
– Absolute jumps
– Relative jumps

[Figure: an instruction word, bits 0 to 31, divided into OPCODE, ARG 1, ARG 2, and ARG 3 fields.]

Overall Strategy

Solve for one field at a time
– Hold other fields fixed and vary the desired field
– Use randomization when necessary to find legal values

Anything that is not in a field is the opcode
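A minimal sketch of this strategy, assuming a stand-in for the assembler: `mock_assemble_and` hard-codes the SPARC `and` register-register layout so the example runs on its own, whereas DERIVE would invoke the real assembler. The names `field_mask` and `opcode_mask` are hypothetical. Each field's bits are found by varying one operand and collecting every bit that changes; whatever is left over is treated as opcode. Randomized search for legal values is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for running the real assembler on "and rs1, rs2, rd". */
static uint32_t mock_assemble_and(unsigned rs1, unsigned rs2, unsigned rd)
{
    assert(rs1 < 32 && rs2 < 32 && rd < 32);
    return 0x80080000u | ((uint32_t)rd << 25) | ((uint32_t)rs1 << 14) | rs2;
}

/* Vary one operand (0 = rs1, 1 = rs2, 2 = rd) over all 32 registers while
   the other two stay fixed, and OR together every bit that differs from
   the base encoding. */
static uint32_t field_mask(int which)
{
    const unsigned ops[3] = { 7, 6, 0 };     /* and %g7, %g6, %g0 */
    uint32_t base = mock_assemble_and(ops[0], ops[1], ops[2]);
    uint32_t mask = 0;
    for (unsigned r = 0; r < 32; r++) {
        unsigned v[3] = { ops[0], ops[1], ops[2] };
        v[which] = r;
        mask |= base ^ mock_assemble_and(v[0], v[1], v[2]);
    }
    return mask;
}

/* Anything that is not in a field is the opcode. */
static uint32_t opcode_mask(void)
{
    return ~(field_mask(0) | field_mask(1) | field_mask(2));
}
```

With this mock, the three field masks come out as disjoint 5-bit ranges, and masking any encoding with `opcode_mask()` leaves the constant opcode bits.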

Intuition Behind DERIVE

Assembly instruction      Binary encoding

and %g7, %g6, %g0;        0x8009 0xc006
and %g7, %g6, %g1;        0x8209 0xc006
and %g7, %g6, %g2;        0x8409 0xc006
and %g7, %g6, %g3;        0x8609 0xc006
and %g7, %g6, %g4;        0x8809 0xc006
and %g7, %g6, %g5;        0x8a09 0xc006
and %g7, %g6, %g6;        0x8c09 0xc006
and %g7, %g6, %g7;        0x8e09 0xc006
and %g7, %g6, %o0;        0x9009 0xc006
and %g7, %g6, %o1;        0x9209 0xc006
and %g7, %g6, %o2;        0x9409 0xc006
and %g7, %g6, %o3;        0x9609 0xc006
and %g7, %g6, %o4;        0x9809 0xc006

DERIVE Structure

Field Type               Solver

register fields          register solver
absolute jump targets    immediate solver
immediate fields         immediate solver
relative jump targets    jump solver

Register Solver

Primary assumptions (for purposes of the talk):
– Register fields are independent
– All register values are legal

Enumerate registers for one field at a time
– Hold other fields constant
– Solve each field separately

Example: 3 register fields, 5 bits per field
– 2^5 * 3 = 32 * 3 = 96 combinations
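The enumeration can be sketched as follows, under the two assumptions listed on the slide. `mock_and` again stands in for the real assembler (here the SPARC `and` layout), and `reg_field` and `solve_register_field` are hypothetical names. A call counter shows that solving three fields independently costs 96 assembler runs, compared with 2^15 = 32768 if all three fields were enumerated jointly.

```c
#include <assert.h>
#include <stdint.h>

static unsigned assembler_calls = 0;

/* Stand-in for the real assembler: ops = { rs1, rs2, rd } of "and". */
static uint32_t mock_and(const unsigned ops[3])
{
    assembler_calls++;
    return 0x80080000u | ((uint32_t)ops[2] << 25)
                       | ((uint32_t)ops[0] << 14) | ops[1];
}

/* Result for one register field: which bits it occupies and what each
   register contributes inside those bits. */
struct reg_field {
    uint32_t mask;
    uint32_t value[32];
};

/* Enumerate all 32 registers in operand `which`, holding the others fixed. */
static void solve_register_field(int which, struct reg_field *out)
{
    unsigned ops[3] = { 7, 6, 0 };           /* and %g7, %g6, %g0 */
    uint32_t enc[32];
    out->mask = 0;
    for (unsigned r = 0; r < 32; r++) {
        ops[which] = r;
        enc[r] = mock_and(ops);
        out->mask |= enc[r] ^ enc[0];        /* bits that ever change */
    }
    for (unsigned r = 0; r < 32; r++)
        out->value[r] = enc[r] & out->mask;  /* this register's bits */
}
```

Solving all three operands in turn fills one `reg_field` per operand, which is the per-register table a code emitter needs.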

Immediate Solver

Primary assumptions:
– Immediate field is a single range of bits in instruction

Explore each bit size to find encoding of one field
– Values of 1, 2, 4, 8, 16, ...
– Again, hold other fields constant

Example: 10-bit immediate field
– 10 combinations
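A rough sketch of the power-of-two probe, assuming the field is one contiguous bit range. `mock_break` stands in for the real assembler and models the MIPS `break imm` encoding shown earlier (10 bits at offset 16); it returns 0 when the immediate does not fit, as an assembler would reject it. The names `imm_field` and `solve_immediate` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the real assembler: MIPS "break imm".
   Returns 1 and stores the encoding, or 0 if imm is out of range. */
static int mock_break(uint32_t imm, uint32_t *word)
{
    if (imm >= (1u << 10))
        return 0;
    *word = 0x0000000du | (imm << 16);
    return 1;
}

struct imm_field {
    unsigned bit_offset;  /* position of the field's low bit */
    unsigned length;      /* number of bits in the field */
};

/* Probe with 1, 2, 4, ...: each accepted value accounts for one more bit
   of the field. The encoding of 0 is the baseline for comparison. */
static struct imm_field solve_immediate(void)
{
    struct imm_field f = { 0, 0 };
    uint32_t base = 0, enc = 0;
    int ok = mock_break(0, &base);
    assert(ok);
    (void)ok;
    for (unsigned k = 0; k < 32; k++) {
        if (!mock_break(1u << k, &enc))
            break;                          /* value rejected: field ended */
        if (k == 0) {
            uint32_t diff = enc ^ base;
            assert(diff != 0);
            while (!((diff >> f.bit_offset) & 1u))
                f.bit_offset++;             /* lowest bit that changed */
        }
        f.length++;
    }
    return f;
}
```

For the 10-bit field this takes ten accepted probes (1 through 512), matching the slide's count, plus one baseline and one rejected probe.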

Jump Solver

Primary assumptions:
– Label field is a single range of bits

Emit jumps to different offsets
– Find where label goes for encoding of “0”
– Find smallest jump size
– Find high bit by emitting a negative-valued jump
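The three steps can be sketched as below. This is an illustration under assumptions, not DERIVE's implementation: `mock_ba` stands in for the assembler and models a SPARC `ba` with a 22-bit word displacement, taking the label position as a byte offset from the branch. The names `jump_field` and `solve_jump` are hypothetical, and the field is assumed to be two's complement.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the real assembler: SPARC "ba label", label given as a
   byte offset from the branch. Returns 0 for misaligned or out-of-range
   targets. */
static int mock_ba(int32_t offset, uint32_t *word)
{
    if (offset % 4 != 0 || offset < -(1 << 23) || offset >= (1 << 23))
        return 0;
    *word = 0x10800000u | (((uint32_t)offset >> 2) & 0x003fffffu);
    return 1;
}

struct jump_field {
    unsigned low_bit;   /* lowest bit of the label field */
    unsigned high_bit;  /* highest bit of the label field */
    int32_t step;       /* smallest jump size, in bytes */
};

static struct jump_field solve_jump(void)
{
    struct jump_field f = { 0, 0, 0 };
    uint32_t zero = 0, enc = 0, diff;
    int ok = mock_ba(0, &zero);             /* encoding of offset "0" */
    assert(ok);

    /* Smallest jump size: first power-of-two offset that is accepted
       and changes the encoding. */
    for (int32_t off = 1; off < (1 << 23); off *= 2) {
        if (mock_ba(off, &enc) && enc != zero) {
            f.step = off;
            break;
        }
    }
    assert(f.step != 0);
    diff = enc ^ zero;
    while (!((diff >> f.low_bit) & 1u))
        f.low_bit++;

    /* High bit: the smallest negative jump sets every bit of the field. */
    ok = mock_ba(-f.step, &enc);
    assert(ok);
    (void)ok;
    diff = enc ^ zero;
    for (unsigned b = 0; b < 32; b++)
        if ((diff >> b) & 1u)
            f.high_bit = b;
    return f;
}
```

A smallest step of 4 bytes landing in bit 0 of the field indicates that the two low bits of the byte offset are dropped before encoding.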

Solving Time

Processor    Run Time (minutes)    Description (lines)

Alpha        6.3                   104
ARM          ~43                   227
MIPS         2.5                   81
PowerPC      4.8                   186
SPARC        4.8                   97
x86          ~240                  221
x86-kaffe    4.9                   106

Instruction Emitter Generator

Reads in DERIVE-generated specifications

Produces C macros
– Can generate runtime checks
– Debugging support
– Handles multiple instruction encodings
– “Linkage” macros for backpatching

Used to retarget Kaffe (publicly available JVM) on x86
– Reduced backend description from 2084 to 1267 lines (40%)

Extensions

Can handle instructions that take a subset of registers
– SPARC double-word loads

Special encodings that are register-dependent
– %eax on x86

Can handle simple transformations
– Low bits dropped off of jump offsets

User can specify transformations
– Address scaling on x86

User can specify registers that are dependent
– PowerPC post-increment instructions

Future Work

Extending DERIVE
– Fields that are broken up into multiple bit ranges
– Memoization of computations

ATOM-like tools
– Reverse-engineering linkers

Related Work

Instruction encoding munging
– NJ Toolkit [Ramsey & Fernández, USENIX 1995]

Testing assemblers
– NJ Toolkit [Fernández & Ramsey, ICSE 1997]

Reverse engineering compiler technology
– Retarget back-end generators [Collberg, PLDI 1997]

Summary

DERIVE is a cool hack, but it isn’t just a hack.
– It is a useful tool.
– It is a good proof of concept.
– We did some clever tricks to build it.

http://www.cs.utah.edu/~wilson/derive.tar.gz