
Software Verification: Symbolic Execution (Part One)

Milena Vujošević Janičić

Faculty of Mathematics, University of Belgrade

Contents

1 Introduction
1.1 Symbolic execution through an example
1.2 History, tools, execution tree
1.3 Applications of symbolic execution
1.4 Challenges of symbolic execution

2 Concolic execution and design principles
2.1 Dynamic symbolic execution
2.2 Selective symbolic execution
2.3 Design principles

3 Path exploration strategies
3.1 Naive approaches, random strategy
3.2 Coverage-guided execution
3.3 Miscellaneous strategies
3.4 Backward execution

4 References

1 Introduction

1.1 Symbolic execution through an example

Concrete execution

Symbolic execution

Informally...

• We execute the program on symbols, i.e. we track symbolic states instead of concrete inputs.

• We execute many paths simultaneously: when the execution of a path diverges, we create new paths and add constraints over the symbolic values.

• Executing a single path in effect simulates a large number of tests, since it accounts for all inputs that follow that same path.
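To make the last point concrete, here is a minimal sketch. The function `classify` and its three path conditions are hypothetical and written by hand; a real engine would derive the conditions automatically and pass them to a solver. The brute-force loop only checks that the conditions partition a small input range and counts how many concrete tests each path stands for.

```python
def classify(x, y):
    """Hypothetical program under test with three feasible paths."""
    if x > y:
        if x - y > 10:
            return "far"
        return "near"
    return "not-greater"

# Path conditions written by hand over the symbolic inputs x and y.
# Each entry: (result produced on that path, predicate describing its inputs).
PATHS = [
    ("far", lambda x, y: x > y and x - y > 10),
    ("near", lambda x, y: x > y and not (x - y > 10)),
    ("not-greater", lambda x, y: not (x > y)),
]

def inputs_per_path(lo=-20, hi=20):
    """Count how many concrete inputs in [lo, hi]^2 each path represents."""
    counts = {name: 0 for name, _ in PATHS}
    for x in range(lo, hi + 1):
        for y in range(lo, hi + 1):
            matching = [name for name, pc in PATHS if pc(x, y)]
            # Path conditions partition the input space: exactly one holds.
            assert len(matching) == 1
            assert classify(x, y) == matching[0]
            counts[matching[0]] += 1
    return counts

counts = inputs_per_path()
```

Three symbolic paths cover all 41 × 41 concrete inputs in this range, which is the sense in which one path "simulates" many tests.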

1.2 History, tools, execution tree

Origin of the idea...

The technique dates back to the 1970s.
The most cited paper: James C. King. 1976. Symbolic execution and program testing. Commun. ACM 19, 7 (July 1976), 385-394. DOI=http://dx.doi.org/10.1145/360248.360252 https://yurichev.com/mirrors/king76symbolicexecution.pdf

There are other similar papers from that period:

Robert S. Boyer, Bernard Elspas, and Karl N. Levitt. 1975. SELECT - a formal system for testing and debugging programs by symbolic execution. In Proceedings of the international conference on Reliable software. ACM, New York, NY, USA, 234-245. DOI=http://dx.doi.org/10.1145/800027.808445

Lori A. Clarke. 1976. A program testing system. In Proceedings of the 1976 annual conference (ACM '76). ACM, New York, NY, USA, 488-491. DOI=http://dx.doi.org/10.1145/800191.805647

Leon J. Osterweil and Lloyd D. Fosdick. 1976. Program testing techniques using simulated execution. In Proceedings of the 4th symposium on Simulation of computer systems (ANSS '76), Harold Joseph Highland (Ed.). IEEE Press, Piscataway, NJ, USA, 171-177.

Challenges

Practical use of symbolic execution only since 2005
When the idea first appeared, it was not clear how to solve the basic problems that arose immediately. The breakthrough came with these tools:

• DART: Godefroid, Klarlund, and Sen, PLDI 2005 (introduced dynamic execution into symbolic execution)

• EXE: Cadar, Ganesh, Pawlowski, Dill, and Engler, CCS 2006 (STP: support for the theory of arrays)

Tools

• KLEE (Stanford): open source, runs on top of LLVM, has found many bugs in open-source software

• SAGE: Microsoft internal tool, uses symbolic execution to find bugs in file parsers (e.g., JPEG, DOCX, PPT)

• Cloud9 parallel symbolic execution, also supports threads

• Pex symbolic execution for .NET

• CUTE (UC Berkeley) and jCUTE (symbolic execution for Java)

• Java PathFinder (NASA) - symbolic execution

• S2E (EPFL) — LLVM based platform

• SymDroid - symbolic execution on Dalvik Bytecode

• KleeNet - testing interaction protocols for sensor networks

Symbolic execution

Execution tree

• The program is executed over symbolic values.

• Symbolic states map variables to symbolic values.

• The path condition is a quantifier-free formula over the symbolic inputs that records all branch decisions taken so far.

• All paths of a program form an execution tree.

Symbolic execution tree

Example

1.3 Applications of symbolic execution

• Finding bugs

– At Microsoft, 30% of bugs were discovered by symbolic execution during the development of Windows 7 (these were bugs that other program analyses and blackbox testing techniques missed).

– Symbolic execution is the key technique used in the DARPA[1] Cyber Grand Challenge[2].

• Test case generation

• Detecting unreachable paths


• Proving two code segments are equivalent (Code Hunt https://www.codehunt.com/)

• Advanced applications:

– Generating program invariants

– Program repair

– Debugging

[1] Defense Advanced Research Projects Agency
[2] A two-year competition seeking to create automatic systems for vulnerability detection, exploitation, and patching in near real-time

Finding bugs

Test case generation

Unreachable/infeasible paths

1.4 Challenges of symbolic execution

Theory and practice of symbolic execution
A symbolic execution of a program can generate, in theory, all possible control flow paths that the program could take during its concrete executions on specific inputs. While modeling all possible runs allows for very interesting analyses, it is typically infeasible in practice, especially on real-world software. Indeed, complex applications are often built on top of very sophisticated software stacks. Implementing a symbolic execution engine able to statically analyze the whole stack can be rather challenging given the difficulty in accurately evaluating any possible side effect during execution. Several problems arise in this context, which can hardly be faced following the purely symbolic approach.

Challenges of symbolic execution

State explosion

• State space explosion: how does symbolic execution deal with path explosion? Language constructs such as loops might exponentially increase the number of execution states. It is thus unlikely that a symbolic execution engine can exhaustively explore all the possible states within a reasonable amount of time.
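The exponential growth is easy to reproduce. The sketch below assumes a hypothetical loop that runs `n` times and branches once per iteration on a fresh symbolic input, so both outcomes are always feasible. It is a counting exercise, not an engine.

```python
def count_paths(n):
    """Count execution paths of a loop whose body branches on a fresh symbolic bit."""
    # Worklist of (iteration index, path condition as a tuple of branch decisions).
    worklist = [(0, ())]
    finished = 0
    while worklist:
        i, pc = worklist.pop()
        if i == n:
            finished += 1
            continue
        # The branch depends on symbolic input i, so both outcomes are feasible.
        worklist.append((i + 1, pc + (True,)))
        worklist.append((i + 1, pc + (False,)))
    return finished

growth = [count_paths(n) for n in (1, 2, 4, 8)]  # doubles with every iteration
```

Each extra iteration doubles the number of paths, so a loop bounded by a symbolic value quickly exceeds any time budget.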

Path Explosion

Heap modeling and reasoning about pointers

• Memory: how does the symbolic engine handle pointers, arrays, or other complex objects? Code manipulating pointers and data structures may give rise not only to symbolic stored data, but also to addresses being described by symbolic expressions.

Modeling the environment (library functions, system calls)

• Environment and third-party components: how does the engine handle interactions across the software stack? Calls to library and system code can cause side effects, e.g., the creation of a file, that could later affect the execution and must be accounted for. However, evaluating any possible interaction outcome may be infeasible.

Efficient solvers: SMT

• Constraint solving: what can a constraint solver do in practice? SMT solvers can scale to complex combinations of constraints over hundreds of variables. However, constructs such as non-linear arithmetic pose a major obstacle to efficiency.

Inside a symbolic execution tool

Binary code

• Binary code: what issues can arise when symbolically executing binary code? In some scenarios binary code is the only available representation of a program. However, having the source code of an application can make symbolic execution significantly easier, as it can exploit high-level properties (e.g., object shapes) that can be inferred statically by analyzing the source code.

Decisions depend on the context of use
Depending on the specific context in which symbolic execution is used, different choices and assumptions are made to address the highlighted questions. Although these choices typically affect soundness or completeness, in several scenarios a partial exploration of the space of possible execution states may be sufficient to achieve the goal (e.g., identifying a crashing input for an application) within a limited time budget.

2 Concolic execution and design principles

Concrete and symbolic execution

Practical problems

• Exhaustive exploration of external library calls may lead to an exponential explosion of states, preventing the analysis from reaching interesting code portions.

• Calls to external third-party components may not be traceable by the executor.

• Symbolic engines continuously invoke SMT solvers during the analysis. The time spent in constraint solving is one of the main performance barriers for an engine, and programs may yield constraints that even powerful solvers cannot handle well.

Concolic execution (concrete + symbolic)
A fundamental idea to cope with the aforementioned issues and to make symbolic execution feasible in practice is to mix concrete and symbolic execution, called concolic execution.

Types of concolic execution

• Dynamic symbolic execution

• Selective symbolic execution

2.1 Dynamic symbolic execution

Symbolic execution driven by concrete values
In addition to the symbolic store and the path constraints, the execution engine maintains a concrete store. After choosing an arbitrary input to begin with, it executes the program both concretely and symbolically by simultaneously updating the two stores and the path constraints. Whenever the concrete execution takes a branch, the symbolic execution is directed toward the same branch and the constraints extracted from the branch condition are added to the current set of path constraints. In short, the symbolic execution is driven by a specific concrete execution. As a consequence, the symbolic engine does not need to invoke the constraint solver to decide whether a branch condition is (un)satisfiable: this is directly tested by the concrete execution.

In order to explore different paths, the path conditions given by one or more branches can be negated and the SMT solver invoked to find a satisfying assignment for the new constraints, i.e., to generate a new input. This strategy can be repeated as much as needed to achieve the desired coverage.

See the animation in the slides, which explains dynamic symbolic execution through an example.
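The loop described above can be sketched as follows. Everything here is an assumption made for illustration: the program under test, the `branch` instrumentation helper, and above all `solve`, which stands in for an SMT solver by brute-forcing a small integer domain.

```python
from itertools import product

def branch(trace, cond, inputs):
    """Evaluate cond concretely and record it together with the outcome taken."""
    taken = cond(inputs)
    trace.append((cond, taken))
    return taken

def program(inputs, trace):
    """Hypothetical program under test; the error needs x == 2 * y and x > 6."""
    if branch(trace, lambda v: v["x"] == 2 * v["y"], inputs):
        if branch(trace, lambda v: v["x"] > 6, inputs):
            return "error"
    return "ok"

def solve(constraints, domain=range(-10, 11)):
    """Toy stand-in for an SMT solver: brute-force search over a small domain."""
    for x, y in product(domain, repeat=2):
        v = {"x": x, "y": y}
        if all(cond(v) == want for cond, want in constraints):
            return v
    return None

def concolic(seed):
    """Run concretely, record the path, negate branches, solve, repeat."""
    worklist, seen, results = [seed], set(), {}
    while worklist:
        inputs = worklist.pop()
        trace = []
        results[(inputs["x"], inputs["y"])] = program(inputs, trace)
        outcomes = tuple(taken for _, taken in trace)
        for i in range(1, len(outcomes) + 1):
            seen.add(outcomes[:i])              # prefixes covered by this run
        for i, (cond, taken) in enumerate(trace):
            flipped = outcomes[:i] + (not taken,)
            if flipped in seen:
                continue
            seen.add(flipped)
            # Keep the prefix, negate branch i, and ask the solver for an input.
            new_inputs = solve(trace[:i] + [(cond, not taken)])
            if new_inputs is not None:
                worklist.append(new_inputs)
    return results

results = concolic({"x": 1, "y": 2})
```

Note that the branch direction of each run is decided by the concrete values alone; the solver is only called when a new input is needed.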

Which condition to negate?

Search techniques
Although dynamic symbolic execution uses concrete inputs to drive the symbolic execution toward a specific path, it still needs to pick a branch to negate whenever a new path has to be explored. Notice also that each concrete execution may add new branches that will have to be visited. Since the set of non-taken branches across all performed concrete executions can be very large, adopting effective search heuristics can play a crucial role.

• DART: DFS (depth-first search strategy)

• SAGE: generational search

• ...

Choice of initial values

Impact on the search
Since the state space is only partially explored, the initial input plays a crucial role in the effectiveness of the overall approach. The importance of the first input is similar to what happens in traditional black-box fuzzing and, for this reason, symbolic engines such as SAGE are often referred to as white-box fuzzers.

How to track function calls whose code is not available?

Impact (Figure 4a)
Consider function foo in Figure 4a and suppose that bar is not symbolically tracked by the concolic engine (e.g., it could be provided by a third-party component, written in a different language, or analyzed following a black-box approach). Assuming that x = 1 and y = 2 are randomly chosen as the initial input parameters, the concolic engine executes bar (which returns a = 0) and skips the branch that would trigger the error statement. At the same time, the symbolic execution tracks the path constraint αy ≥ 0 inside function foo. Notice that branch conditions in function bar are not known to the engine.

To explore the alternative path, the engine negates the path constraint of the branch in foo, generating inputs, such as x = 1 and y = -4, that actually drive the concrete execution to the alternative path. With this approach, the engine can explore both paths in foo even if bar is not symbolically tracked.
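Figure 4a is not reproduced in this transcript. The sketch below is a reconstruction from the description only: the body of `bar` is invented (chosen so that `bar(1)` returns 0), and `negate_y_branch` is a hypothetical one-line stand-in for the solver step.

```python
def bar(x):
    """Untracked external function (hypothetical body, chosen so that bar(1) == 0)."""
    return 0 if x >= 0 else 1

def foo(x, y, trace):
    a = bar(x)                        # executed concretely only; nothing is recorded
    taken = y < 0
    trace.append(("y < 0", taken))    # the only branch the engine can see
    if taken:
        return "ERROR"
    return "ok"

def negate_y_branch(x, y):
    """Toy solver step: keep x and pick a y that flips the recorded branch on y."""
    return (x, -4) if y >= 0 else (x, 0)

trace1 = []
first = foo(1, 2, trace1)             # tracked path constraint: y >= 0
x2, y2 = negate_y_branch(1, 2)        # solve y < 0, e.g. y = -4
trace2 = []
second = foo(x2, y2, trace2)          # now reaches the error statement
```

The branch in `foo` depends only on `y`, so both of its paths are reachable without knowing anything about `bar`.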

Impact (Figure 4b)
A variant of the previous code is shown in Figure 4b, where function qux, unlike foo, takes a single input parameter but checks the result of bar in the branch condition. Although the engine can track the path constraint in the branch condition tested inside qux, there is no guarantee that an input able to drive the execution toward the alternative path is generated: the relationship between a and x is not known to the concolic engine, as bar is not symbolically tracked. In this case, the engine could re-run the code using a different random input, but in the end it could fail to explore one interesting path in qux.

Impact (Figure 4c)
A related issue is presented by Figure 4c. Function baz invokes the external function abs, which simply computes the absolute value of a number. Choosing x = 1 as the initial concrete value, the concrete execution does not trigger the error statement, but the concolic engine tracks the path constraint αx ≥ 0 due to the branch in baz, trying to generate a new input by negating it. However, the new input, e.g., x = -1, does not trigger the error statement due to the (untracked) side effects of abs. In this case, after generating a new input the engine detects a path divergence: a concrete execution that does not follow the predicted path. Interestingly, in this example no input could actually trigger the error, but the engine is not able to detect this property.
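Figure 4c is likewise missing from the transcript. The following sketch reconstructs the idea from the text alone: `baz` is hypothetical code, and `engine_view` models what an engine would predict if it ignored the effect of `abs`.

```python
def baz(x, trace):
    """Reconstruction of the idea behind Figure 4c (hypothetical code)."""
    x = abs(x)                 # external call: its effect on x is NOT tracked
    taken = x < 0              # the engine believes this tests the input itself
    trace.append(taken)
    return "ERROR" if taken else "ok"

def engine_view(x_input):
    """Branch outcome the engine predicts, ignoring the effect of abs."""
    return x_input < 0

def run_and_check(x_input):
    """Run concretely and report whether the run diverged from the prediction."""
    trace = []
    result = baz(x_input, trace)
    predicted = [engine_view(x_input)]
    return result, predicted != trace   # True means a path divergence

first = run_and_check(1)       # prediction and concrete run agree
second = run_and_check(-1)     # input obtained by negating "x >= 0"
never_fails = all(baz(x, []) == "ok" for x in range(-50, 51))
```

The second run diverges: the engine expected the error branch, the concrete run did not take it. The engine cannot conclude that the error is unreachable; it only sees that its prediction was wrong.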

Consequences

Symbolic execution can produce false negatives
As shown by the example, false negatives (i.e., missed paths) and path divergences are notable downsides of dynamic symbolic execution. Dynamic symbolic execution trades soundness for performance and implementation effort: false negatives are possible, because some program executions, and therefore possible erroneous behaviours, may be missed, leading to a complete, but under-approximate form of program analysis.

Basic notions (when checking correctness)
Soundness: all errors will be found.
Completeness: anything reported as an error really is an error.

2.2 Selective symbolic execution

Selective symbolic execution: S2E

A different approach to mixing symbolic and concrete execution
A different approach to mix symbolic and concrete execution is based on the observation that one might want to explore only some components of a software stack in full, not caring about others. Selective symbolic execution carefully interleaves concrete and symbolic execution, while keeping the overall exploration meaningful.

Suppose a function A calls a function B and the execution mode changes at the call site.

From concrete to symbolic and back

Concrete execution of A, symbolic execution of B
The arguments of B are made symbolic and B is explored symbolically in full. B is also executed concretely and its concrete result is returned to A. After that, A resumes concretely.

From symbolic to concrete and back

Symbolic execution of A, concrete execution of B
The arguments of B are concretized, B is executed concretely, and execution resumes symbolically in A.

This may impact both soundness and completeness of the analysis.

Completeness

False positives
To make sure that symbolic execution skips all paths that are not feasible under the performed concretization, i.e. to avoid false positives, the engine has to collect all path constraints of the path executed under that concretization, all side effects produced by B, and the return value that B produces.

Soundness

False negatives
Concretization can cause symbolic execution to skip branches that are feasible after returning to A, which can lead to false negatives. To avoid this, the collected constraints are marked as soft constraints: whenever a branch, after returning to A, is marked infeasible because of soft constraints, the execution backtracks and chooses the arguments of B differently. To guide the new concretization of the arguments of B, the branch conditions collected for B are used, and concrete values are chosen that enable a different concrete execution through B.

2.3 Design principles

Progress, work repetition, and analysis reuse

• Progress: the executor should be able to proceed for an arbitrarily long time without exceeding the given resources. Memory consumption can be especially critical, due to the potentially gargantuan number of distinct control flow paths.

• Work repetition: no execution work should be repeated, avoiding restarting a program several times from its very beginning in order to analyze different paths that might have a common prefix.

• Analysis reuse: analysis results from previous runs should be reused as much as possible. In particular, costly invocations to the SMT solver on previously solved path constraints should be avoided.

How to choose design priorities?

• Online execution

• Offline execution

• Hybrid execution (a combination of the previous two)

Design choices

Online execution

• Symbolic executors that attempt to execute multiple paths simultaneously in a single run, also called online, clone the execution state at each input-dependent branch (KLEE, AEG, S2E).

• These engines never re-execute previous instructions, thus avoiding work repetition.

• However, many active states need to be kept in memory and memory consumption can be large, possibly hindering progress.

Design choices

Offline execution

• Offline executors reason about a single path per run.

• On the other side, work can be largely repeated, since each run usually restarts the execution of the program from the very beginning.

• In a typical implementation of offline executors, runs are concrete and require an input seed: the program is first executed concretely, a trace of instructions is recorded, and the recorded trace is then executed symbolically.


3 Path exploration strategies

Goal-driven heuristics
Since enumerating all paths of a program can be prohibitively expensive, in many software engineering activities related to testing and debugging the search is prioritized by looking at the most promising paths first. Among several strategies for selecting the next path to be explored, we now briefly overview some of the most effective ones. We remark that path selection heuristics are often tailored to help the symbolic engine achieve specific goals (e.g., overflow detection). Finding a universally optimal strategy remains an open problem.

3.1 Naive approaches, random strategy

The most common strategies: DFS and BFS

Naive techniques: DFS and BFS
Techniques based on the structure of the code

DFS

• Depth-first search (DFS) expands a path as much as possible before backtracking to the deepest unexplored branch.

• DFS is often adopted when memory usage is at a premium, but is hampered by paths containing loops and recursive calls.

BFS

• Breadth-first search (BFS) expands all paths in parallel.

• In spite of the higher memory pressure and of the long time required to complete the exploration of specific paths, some tools resort to BFS, which allows the engine to quickly explore diverse paths, detecting interesting behaviors early.

• On the other hand, if the ultimate goal requires completing the exploration of one or more paths, BFS may take a very long time.
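The memory and completion-time trade-off between the two can be seen on a toy model. The sketch assumes a complete binary execution tree of fixed depth, with states represented as tuples of branch decisions; the only difference between the strategies is which end of the worklist is used.

```python
from collections import deque

def explore(depth, strategy):
    """Explore a complete binary execution tree of the given depth.

    Returns the order in which states are taken from the worklist and the
    peak worklist size (a rough proxy for memory use).
    """
    worklist = deque([()])
    order, peak = [], 1
    while worklist:
        # DFS takes the most recently added state, BFS the oldest one.
        state = worklist.pop() if strategy == "dfs" else worklist.popleft()
        order.append(state)
        if len(state) < depth:
            worklist.append(state + (0,))
            worklist.append(state + (1,))
        peak = max(peak, len(worklist))
    return order, peak

dfs_order, dfs_peak = explore(3, "dfs")
bfs_order, bfs_peak = explore(3, "bfs")
```

In this model DFS finishes its first complete path after four steps and never holds more than `depth + 1` states, while BFS holds all `2 ** depth` leaves at once and completes no path until every shallower state has been expanded.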

Random strategy

Random path selection

• How is random search carried out?

– Idea 1: Choose the next path to explore at random.

– Idea 2: Restart the search at random if nothing new has happened for some time.

– Idea 3: When two paths of equal priority remain to be explored, choose the next one at random.

– ...

• Reproducibility is a problem: a pseudo-random generator is used and its seed is saved.

• KLEE assigns probabilities to paths based on their length and on the branch arity: it favors paths that have been explored fewer times, preventing starvation caused by loops and other path explosion factors.
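One way to obtain probabilities that depend on path length and branch arity is to walk the execution tree from the root and choose uniformly among the children at every node. The sketch below assumes the tree is stored explicitly as a dictionary; the tree, the node names, and both helper functions are hypothetical and not KLEE's actual code.

```python
import random
from fractions import Fraction

# Hypothetical execution tree: inner nodes map to their children,
# leaves are the pending states. "loop1" and "loop2" are loop iterations.
TREE = {
    "root": ["A", "loop1"],
    "loop1": ["B", "loop2"],
    "loop2": ["C", "D"],
}

def pick_state(tree, rng, node="root"):
    """Walk from the root, choosing uniformly among children, until a leaf."""
    while node in tree:
        node = rng.choice(tree[node])
    return node

def selection_probabilities(tree, node="root", p=Fraction(1)):
    """Exact probability that pick_state ends in each leaf."""
    if node not in tree:
        return {node: p}
    probs = {}
    for child in tree[node]:
        probs.update(selection_probabilities(tree, child, p / len(tree[node])))
    return probs

probs = selection_probabilities(TREE)
chosen = pick_state(TREE, random.Random(0))  # fixed seed for reproducibility
```

The shallow state A is picked half of the time no matter how many states the loop keeps producing, which is how this scheme avoids starvation. Passing an explicit seeded generator also addresses the reproducibility concern from the previous slide.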

3.2 Coverage-guided execution

Maximize coverage
Choose the path that is most likely to execute some new instruction.

• Try to visit instructions that have not been executed before.

• If there is no such path, choose the path whose instructions have been executed the fewest times.

A good property: bugs are often in rarely executed parts of the program, and this strategy tries to reach everywhere.
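The two rules above can be sketched as a single ranking function. The representation is an assumption: each candidate path is given as the list of instruction labels it would execute next, and `exec_counts` records how often each instruction has run so far.

```python
from collections import Counter

def pick_path(candidates, exec_counts):
    """Choose the candidate that covers the most new instructions.

    Ties (including the case with no new instructions at all) are broken
    in favor of the path whose instructions were executed the fewest times.
    """
    def key(path):
        new = sum(1 for ins in path if exec_counts[ins] == 0)
        total = sum(exec_counts[ins] for ins in path)
        return (-new, total)
    return min(candidates, key=key)

# Hypothetical execution counts; a Counter returns 0 for unseen instructions.
counts = Counter({"i1": 5, "i2": 3, "i3": 1})
with_new = pick_path([["i1", "i2"], ["i2", "i3"], ["i3", "i4"]], counts)
without_new = pick_path([["i1", "i2"], ["i2", "i3"]], counts)
```

Here `with_new` is the path containing the unseen instruction `i4`, and `without_new` is the path through the less frequently executed `i2` and `i3`.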

Coverage-guided execution: states

Coverage-optimized search
Coverage-optimized search, discussed in KLEE, computes for each state a weight, which is later used to randomly select states. The weight is obtained by considering how far the nearest uncovered instruction is, whether new code was recently covered by the state, and the state's call stack.

Coverage-guided execution: paths

Subpath-guided search
Subpath-guided search attempts to explore less traveled parts of a program by selecting the subpath of the control flow graph that has been explored fewer times. This is achieved by maintaining a frequency distribution of explored subpaths, where a subpath is defined as a consecutive subsequence of length n from a complete path. Interestingly, the value n plays a crucial role with respect to the code coverage achieved by a symbolic engine using this heuristic and no specific value has been shown to be universally optimal.
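A minimal sketch of the bookkeeping, assuming paths are lists of control-flow-graph node names and that a pending state is ranked by the last n nodes of its path so far. The function names and the example paths are hypothetical.

```python
from collections import Counter

def subpaths(path, n):
    """All consecutive subsequences of length n of a path (a list of CFG nodes)."""
    return [tuple(path[i:i + n]) for i in range(len(path) - n + 1)]

def record(freq, path, n):
    """Update the frequency distribution with an explored path."""
    freq.update(subpaths(path, n))

def pick_state(states, freq, n):
    """Pick the pending state whose most recent length-n subpath is least explored."""
    return min(states, key=lambda path: freq[tuple(path[-n:])])

freq = Counter()
record(freq, ["a", "b", "c", "d"], 2)
record(freq, ["a", "b", "c", "e"], 2)
chosen = pick_state([["a", "b", "c"], ["a", "b", "f"]], freq, 2)
```

With n = 2 the subpath (b, c) has been seen twice and (b, f) never, so the state ending in f is chosen. A larger n distinguishes more contexts but makes every subpath rarer, which is one way to see why the choice of n matters.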

3.3 Miscellaneous strategies

Shortest-distance symbolic execution
Shortest-distance symbolic execution does not target coverage, but aims at identifying program inputs that trigger the execution of a specific point in a program. As in coverage-based strategies, however, the heuristic is based on a metric for evaluating the shortest distance to the target point. This is computed as the length of the shortest path in the inter-procedural control-flow graph, and paths with the shortest distance are prioritized by the engine.
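The distance metric can be sketched with a breadth-first search on the reversed graph. The graph below is a hypothetical, single-procedure stand-in for an inter-procedural control-flow graph, and pending states are identified simply by the node they are currently at.

```python
from collections import deque

def distances_to(cfg, target):
    """Shortest distance (in edges) from every node to target, via BFS on reversed edges."""
    reverse = {}
    for src, succs in cfg.items():
        for dst in succs:
            reverse.setdefault(dst, []).append(src)
    dist = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for pred in reverse.get(node, []):
            if pred not in dist:
                dist[pred] = dist[node] + 1
                queue.append(pred)
    return dist

def pick_state(states, dist):
    """Prioritize the pending state whose current node is closest to the target."""
    return min(states, key=lambda node: dist.get(node, float("inf")))

# Hypothetical control-flow graph: node -> successors.
CFG = {
    "entry": ["a", "b"],
    "a": ["c"],
    "b": ["target"],
    "c": ["d"],
    "d": ["target"],
    "target": [],
}
dist = distances_to(CFG, "target")
chosen = pick_state(["a", "b"], dist)
```

Nodes that cannot reach the target never receive a distance and are ranked last, which is the behavior one would want from a target-directed search.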

Generational search (SAGE)

Generational search
A hybrid of DFS and coverage-guided execution

• Generation 0: Execute a random path to the end.

• Generation 1: Take all paths from generation 0, negate one condition so that it leads to a new path prefix, find a solution for that path, and then execute it.

• ...

• Generation N: similar, except that branching is done with respect to the paths of generation N-1 (a code coverage heuristic is used to choose the path).
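The generations above can be sketched on a toy program. All names are assumptions made for illustration: `run` is a hypothetical program with three independent checks, `solve` is a brute-force stand-in for an SMT solver, and the `bound` that prevents re-negating earlier branches follows the usual description of SAGE rather than anything stated in these notes.

```python
from itertools import product

MAGIC = (1, 2, 3)

def run(inputs):
    """Hypothetical program: three independent checks; returns the branch outcomes."""
    return tuple(inputs[i] == MAGIC[i] for i in range(3))

def solve(wanted):
    """Toy solver: find inputs in a small domain whose first outcomes match wanted."""
    for candidate in product(range(4), repeat=3):
        if run(candidate)[:len(wanted)] == wanted:
            return candidate
    return None

def generational_search(seed):
    covered = set()              # (branch index, outcome) pairs seen so far
    generation = [(seed, 0)]     # (inputs, bound): only branches >= bound are negated
    history = []
    while generation:
        children = []
        for inputs, bound in generation:
            outcomes = run(inputs)
            covered.update(enumerate(outcomes))
            history.append(inputs)
            # Expand every branch of this path at once, not just the deepest one.
            for j in range(bound, len(outcomes)):
                child = solve(outcomes[:j] + (not outcomes[j],))
                if child is not None:
                    children.append((child, j + 1))
        # Coverage heuristic: run first the children that add the most new outcomes.
        children.sort(key=lambda c: -len(set(enumerate(run(c[0]))) - covered))
        generation = children
    return history

history = generational_search((0, 0, 0))
```

Starting from the seed (0, 0, 0), the search executes eight inputs over four generations, one per path, and the last one passes all three checks.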

Combined strategy

Searching with several algorithms at once

• Run several different algorithms simultaneously and alternate between them.

• Depending on the conditions needed to find a bug in the code, the combination behaves like the best of the algorithms, up to a constant factor of time and memory spent on all the other algorithms.

• Different algorithms can be used to reach different parts of the program.
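Alternating between strategies can be as simple as a round-robin scheduler. In this sketch each strategy is assumed to be an iterator that yields the next state it wants to explore; the strategy names and the states are placeholders.

```python
from collections import deque

def interleave(searchers):
    """Alternate between several search strategies, one step each per round."""
    active = deque(searchers)
    while active:
        name, steps = active.popleft()
        try:
            item = next(steps)
        except StopIteration:
            continue                 # this strategy is exhausted; drop it
        yield name, item
        active.append((name, steps))

# Placeholder strategies: each is just an iterator over state labels.
dfs_steps = iter(["d1", "d2", "d3"])
bfs_steps = iter(["b1"])
schedule = list(interleave([("dfs", dfs_steps), ("bfs", bfs_steps)]))
```

With k strategies, each one receives every k-th step, so the combination is at most k times slower than whichever strategy happens to be best for the bug at hand. That is the constant factor mentioned above.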

3.4 Backward execution

Symbolic Backward Execution (SBE)

• Symbolic backward execution is a variant of symbolic execution in which the exploration starts from a target point and proceeds toward the program entry point, i.e. the analysis runs in the reverse direction.

• The main goal of this approach is to produce a test case that triggers the execution of a specific line of code (usually an assert or a throw statement).

• This is also very useful for debugging and regression testing.

Backward execution (SBE)

Similarity with ordinary symbolic execution

• Since the execution starts from the target line of code, path constraints are collected along the branches going backward.

• Several paths are explored at a time and, as in ordinary symbolic execution, paths are periodically checked for feasibility.

• If a path is not feasible, it is discarded and the engine backtracks.

Backward execution (CCBSE)

Call-chain backward symbolic execution (CCBSE)

• The technique starts by determining a valid path within the function that contains the target line.

• Once such a path is found, we move to a caller of that function and try to reconstruct a valid path from the caller's entry to the call of the function containing the target line.

• The process continues recursively until the main function is reached.

• The main difference between SBE and CCBSE is that within each function CCBSE performs ordinary (forward) symbolic execution, whereas SBE executes backward.

Backward execution

Limitations

• For backward execution to be applicable, an inter-procedural CFG (control-flow graph) must be available that describes the control flow of the whole program and makes it possible to determine the call sites of all functions involved in the exploration.

• Unfortunately, constructing such a graph is often very complex in practice.

• In addition, each function may be called from several places in the code, which further complicates (slows down) the search.

4 References

This text is based on the paper A Survey of Symbolic Execution Techniques by Roberto Baldoni, Emilio Coppa, Daniele Cono D'Elia, Camil Demetrescu, and Irene Finocchi. https://arxiv.org/abs/1610.00502