malware-analyse und reverse engineering€¦ · analyse des “stoned”-virus mare 08 –...
TRANSCRIPT
Malware-AnalyseundReverseEngineering
8: StatischeunddynamischeAnalysenzurMalware-Erkennung
11.5.2017Prof.Dr.MichaelEngel
Überblick
Themen:• Statische Analysen zur Malware-Erkennung• Dynamische Analysen zur Malware-Erkennung
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
2
Malware-Erkennung
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
StatischeAnalysen• Signaturen• AbstractInterpretation
Dynamische Analysen• Anomalieerkennung• Kontrollflussverfolgung
3
StatischeAnalysen
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
VorigeVorlesung• StatischeAnalysevonProgramm-QuellcodezumFinden
möglicherSicherheitslücken
Jetzt• StatischeAnalysevonBinärprogrammen,umfestzustellen,ob
einProgrammMalwareenthält• StatischeAnalyse:• InformationenüberdasVerhalteneinesProgramms
ermitteln,ohnedasProgrammselbstauszuführen
4
EinfacheStatischeAnalyse:Signaturen
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
Idee• Malwarecodebestehtausbestimmter(Maschinen-)Befehlsfolge• WenndieseinProgrammcodevorkommt⇒Malwareentdeckt• GrundlagevielereinfacherVirenscanner
Signatur• FolgevonBytesimProgrammcode• Nichtnotwendigerweiseaufeinanderfolgend• AuchkomplexereMustermöglich
• SyntaktischeMethode– keinVerständnisdesVerhaltens derMalware
5
Signaturen:Beispiel
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
„Stoned“-Virusvon1987• AngeblichvoneinemStudenteninNeuseelandentwickelt• 1989:inNeuseelandundAustralienverbreitet• 1990:VariantenaufderganzenWeltverbreitet
• Bootsektor-Virus:befälltBootsektorenvonDiskettenundFestplatten(unterMS-DOS!)• KopiertsichbeimBootenresidentindenHauptspeicher• InfiziertneueingelegteDisketten(undneuangeschlossene
Festplatten– diewarenaberdamalsnichtimBetriebwechselbar...)• Malware-Funktionalität:• Beijedem8.BootvorgangdesPC(vonbefallenerDiskette
oderFestplatte)wirdderText„YourPCisnowStoned!”ausgegeben
6
Stoned“inthewild”
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
Ein Virusvon1987macht 20Jahre später noch Ärger…
7
Dumpdes“Stoned”-Virus
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
8
Analysedes“Stoned”-Virus
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
Bootsektor=„MasterBootRecord“(MBR)• ErsterSektoreinerDiskette/FestplattebeiPCs,diemit
klassischemBIOS(nichtUEFI)arbeiten:512Bytes• WirdvonBIOS
beimStartdesPCvonDiskette/Plattegeladen
• EnthältCode,dermitHilfevonBIOS-Funktionen(SW-Interrupts)RestdesOSlädt
9
Der“Stoned”-Bootsektor
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
0x1FE/1FF:SignaturdesBootsektors(wird später eingefügt)
0x000-004:Sprungbefehl“JMP07C0:0005”(wird oftzur Erkennungeines gültigen Bootsektorsverwendet)
0x1BE-0x1FD:Partitionstabelle(wird freigelassenundspäter ergänzt)
10
Analyse:“Stoned”-Bootsektor
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
00000000 EA0500C007 jmp 07C0:0005 ; jump to set correct Code SegmentSet_Code_Segment:00000005 E99900 jmp Stoned_Start; initial address of stoned boot virus
BIOSlädt Bootsektor immer anAdresse 0x07C0:0000Virusnutzt absoluteAdresse aus,umDaten usw.zu referenzieren
11
Erkennungvon“Stoned”(1)
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
ErkennenvonBytefolge,dieeindeutigVirusidentifiziertKein geeigneter Kandidat:• jeder Bootsektor enthält diese
Bytefolge(oder eine sehr ähnliche)
http://www.stoned-vienna.com/analysis-of-stoned.html
12
Erkennungvon“Stoned”(2)
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
ErkennenvonBytefolge,dieeindeutigVirusidentifiziert
8
Figure 3: Search Pattern for Stoned virus [18]
3.2 Anomaly based virus detection
Anomaly based detection systems monitor the processes on a host machine for any
abnormal activity. If any abnormal activity is identified, the system raises an alarm signaling the
possible presence of malware [15]. In this detection technique, the system uses the collected
heuristics to categorize an activity as normal or malicious. Even though chances of false alarm
are relatively higher in this method, it is more reliable because it is also capable of detecting new
viruses. The important thing to note is that raising a false alarm is not as potential harmful as
allowing a new virus. However, these systems can be trained gradually by intruders to consider
abnormal behavior as normal. Thus, system will fail to detect the abnormal activity in such cases
[15].
3.3 Emulation based detection
The emulation based detection is an effective method where a virus is executed in a virtual
environment by emulating the instructions in the virus code. This type of detection is used to
detect polymorphic, as well as metamorphic, viruses. The virus instance can be executed in the
Bessere Signatur:• eigentlicher Code des Virus• länger, damit höhere
Wahrscheinlichkeit, dassBytefolge eindeutig ist
13
Signaturen:Probleme
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
False Positives(falscherAlarm)• FolgevonBytes„legaler“BestandteileinesProgramms• Wiezwischengutundböseunterscheiden?
False Negatives• Virusistvorhanden,wirdabernichterkannt• Ursache:kleineÄnderungdesVirus-Codes• AndereAdressen,VerwendungandererRegister,...
• PolymorpheMalware• Viren,diesichbeijederInfektionselbstverändern,um
Signaturprüfungzuentgehen
14
Signaturen:Beispielfürfalsepositives
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
„Stoned“-VirusundBitcoins• Virus-Signaturvon„Stoned“wurdeimMai2014indie
Blockchain vonBitcoineingeführt• Blockchain =Daten,keinausführbares Programm• VirenscannerscanntDateienaufFestplatte(+evtl.Inhaltdes
Hauptspeichers),kannnichtzwischenCodeundDatenunterscheiden
• Folge:MicrosoftSecurityEssentialserkannteBlockchain alsVirusundwollteDateilöschen• Bitcoin-SWlädtBlockchain neu• Virenscannererkennt„Virus“erneut...
http://thehackernews.com/2014/05/microsoft-security-essential-found.html
15
WeitereProblemevonSignaturen
MitjedemneuenVirussteigtGrößederSignatur-Datenbank• AufwendigererScanvorgang– immermehrSignaturenmüssen
überprüftwerden
Signaturdatenbankmussaktuellgehaltenwerden• NeuauftretendeMalwarewirdnichterkannt,daHerstellervon
Virenscannerndieseerstanalysierenmüssen• DanachwirdSignaturdatenbankaktualisiert• ...unddann(irgendwann)verteilt...
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
16
PolymorpheMalware
Ziel:UmgehenderErkennungdurchSignaturen• ÄnderungvonBytesinCodedesProgramms
• Absichtlich:MutationoderObfuskation• Einfügenvon„NOP“:0x90 =XCHGEAX,EAXoderanderen
Instruktionen,diedieSemantiknichtändern• ÄndernderCodereihenfolgedurchSprungbefehle• Ersetzenvon(Folgenvon)Maschineninstruktionendurch
semantischidentischeInstruktionen
• ...oderaucheherunabsichtlichdurchAdressänderungenimCode,Compileroptimierungenbei„Update“einesVirususw.
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
17
Malware-Obfuskation:BeispielfürEinfügenvonNOPs(1)Einfügenvon„NOP“-Operationen• BeibehaltungderSemantik• aber:andereSignatur
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
Original code Obfuscated codeE8 00000000 call 0h E8 00000000 call 0h5B pop ebx 5B pop ebx8D 4B 42 lea ecx, [ebx + 42h] 8D 4B 42 lea ecx, [ebx + 45h]51 push ecx 90 nop50 push eax 51 push ecx50 push eax 50 push eax0F01 4C 24 FE sidt [esp - 02h] 50 push eax5B pop ebx 90 nop83 C3 1C add ebx, 1Ch 0F01 4C 24 FE sidt [esp - 02h]FA cli 5B pop ebx8B 2B mov ebp, [ebx] 83 C3 1C add ebx, 1Ch
90 nopFA cli8B 2B mov ebp, [ebx]
Signature New signatureE800 0000 005B 8D4B 4251 5050 E800 0000 005B 8D4B 4290 51500F01 4C24 FE5B 83C3 1CFA 8B2B 5090 0F01 4C24 FE5B 83C3 1C90
FA8B 2B
Figure 1: Original code and obfuscated code from Chernobyl/CIH, and their corresponding signatures. Newly addedinstructions are highlighted.
E800 0000 00(90)* 5B(90)*8D4B 42(90)* 51(90)* 50(90)*50(90)* 0F01 4C24 FE(90)*5B(90)* 83C3 1C(90)* FA(90)*8B2B
Figure 2: Extended signature to catch nop-insertion.
HareFinally, the Hare virus infects the bootloader sectorsof floppy disks and hard drives, as well as executableprograms. When the payload is triggered, the virusoverwrites random sectors on the hard disk, making thedata inaccessible. The virus spreads by polymorphicallychanging its decryption routine and encrypting its mainbody.The Hare and Chernobyl/CIH viruses are well known
in the antivirus community, with their presence in thewild peaking in 1996 and 1998, respectively. In spiteof this, we discovered that current commercial virusscanners could not detect slightly obfuscated versionsof these viruses.
4 Obfuscation Attacks on CommercialVirus Scanners
We tested three commercial virus scanners againstseveral common obfuscation transformations. To test theresilience of commercial virus scanners to common ob-fuscation transformations, we have developed an obfus-cator for binaries. Our obfuscator supports four com-mon obfuscation transformations: dead-code insertion,code transposition, register reassignment, and instruc-tion substitution. While there are other generic obfus-
cation techniques [11, 12], those described here seemto be preferred by malicious code writers, possibly be-cause implementing them is easy and they add little tothe memory footprint.
4.1 Common Obfuscation Transformations4.1.1 Dead-Code InsertionAlso known as trash insertion, dead-code insertion
adds code to a program without modifying its behav-ior. Inserting a sequence of nop instructions is the sim-plest example. More interesting obfuscations involveconstructing challenging code sequences that modify theprogram state, only to restore it immediately.Some code sequences are designed to fool antivirus
software that solely rely on signature matching as theirdetection mechanism. Other code sequences are com-plicated enough to make automatic analysis very time-consuming, if not impossible. For example, passing val-ues through memory rather than registers or the stackrequires accurate pointer analysis to recover values. Theexample shown in Figure 3 should clarify this. The codemarked by (*) can be easily eliminated by automatedanalysis. On the other hand, the second and third inser-tions, marked by (**), do cancel out but the analysis ismore complex. Our obfuscator supports dead-code in-sertion.Not all dead-code sequence can be detected and elim-
inated, as this problem reduces to program equivalence(i.e., is this code sequence equivalent to an empty pro-gram?), which is undecidable. We believe that manycommon dead-code sequences can be detected and elim-inated with acceptable performance. To quote the docu-
Codeaus Chernobyl/CIH-Virus:MihaiChristodorescu andSomesh Jha,“Staticanalysisofexecutablestodetectmaliciouspatterns”,USENIXSecuritySymposium- Volume12,2003
18
Malware-Obfuskation:BeispielfürEinfügenvonNOPs(2)AndereSignaturbenötigt• Einfügenvon
NOPsistaberan beliebigenStellenmöglich...
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
Original code Obfuscated codeE8 00000000 call 0h E8 00000000 call 0h5B pop ebx 5B pop ebx8D 4B 42 lea ecx, [ebx + 42h] 8D 4B 42 lea ecx, [ebx + 45h]51 push ecx 90 nop50 push eax 51 push ecx50 push eax 50 push eax0F01 4C 24 FE sidt [esp - 02h] 50 push eax5B pop ebx 90 nop83 C3 1C add ebx, 1Ch 0F01 4C 24 FE sidt [esp - 02h]FA cli 5B pop ebx8B 2B mov ebp, [ebx] 83 C3 1C add ebx, 1Ch
90 nopFA cli8B 2B mov ebp, [ebx]
Signature New signatureE800 0000 005B 8D4B 4251 5050 E800 0000 005B 8D4B 4290 51500F01 4C24 FE5B 83C3 1CFA 8B2B 5090 0F01 4C24 FE5B 83C3 1C90
FA8B 2B
Figure 1: Original code and obfuscated code from Chernobyl/CIH, and their corresponding signatures. Newly addedinstructions are highlighted.
E800 0000 00(90)* 5B(90)*8D4B 42(90)* 51(90)* 50(90)*50(90)* 0F01 4C24 FE(90)*5B(90)* 83C3 1C(90)* FA(90)*8B2B
Figure 2: Extended signature to catch nop-insertion.
HareFinally, the Hare virus infects the bootloader sectorsof floppy disks and hard drives, as well as executableprograms. When the payload is triggered, the virusoverwrites random sectors on the hard disk, making thedata inaccessible. The virus spreads by polymorphicallychanging its decryption routine and encrypting its mainbody.The Hare and Chernobyl/CIH viruses are well known
in the antivirus community, with their presence in thewild peaking in 1996 and 1998, respectively. In spiteof this, we discovered that current commercial virusscanners could not detect slightly obfuscated versionsof these viruses.
4 Obfuscation Attacks on CommercialVirus Scanners
We tested three commercial virus scanners againstseveral common obfuscation transformations. To test theresilience of commercial virus scanners to common ob-fuscation transformations, we have developed an obfus-cator for binaries. Our obfuscator supports four com-mon obfuscation transformations: dead-code insertion,code transposition, register reassignment, and instruc-tion substitution. While there are other generic obfus-
cation techniques [11, 12], those described here seemto be preferred by malicious code writers, possibly be-cause implementing them is easy and they add little tothe memory footprint.
4.1 Common Obfuscation Transformations4.1.1 Dead-Code InsertionAlso known as trash insertion, dead-code insertion
adds code to a program without modifying its behav-ior. Inserting a sequence of nop instructions is the sim-plest example. More interesting obfuscations involveconstructing challenging code sequences that modify theprogram state, only to restore it immediately.Some code sequences are designed to fool antivirus
software that solely rely on signature matching as theirdetection mechanism. Other code sequences are com-plicated enough to make automatic analysis very time-consuming, if not impossible. For example, passing val-ues through memory rather than registers or the stackrequires accurate pointer analysis to recover values. Theexample shown in Figure 3 should clarify this. The codemarked by (*) can be easily eliminated by automatedanalysis. On the other hand, the second and third inser-tions, marked by (**), do cancel out but the analysis ismore complex. Our obfuscator supports dead-code in-sertion.Not all dead-code sequence can be detected and elim-
inated, as this problem reduces to program equivalence(i.e., is this code sequence equivalent to an empty pro-gram?), which is undecidable. We believe that manycommon dead-code sequences can be detected and elim-inated with acceptable performance. To quote the docu-
Original code Obfuscated codeE8 00000000 call 0h E8 00000000 call 0h5B pop ebx 5B pop ebx8D 4B 42 lea ecx, [ebx + 42h] 8D 4B 42 lea ecx, [ebx + 45h]51 push ecx 90 nop50 push eax 51 push ecx50 push eax 50 push eax0F01 4C 24 FE sidt [esp - 02h] 50 push eax5B pop ebx 90 nop83 C3 1C add ebx, 1Ch 0F01 4C 24 FE sidt [esp - 02h]FA cli 5B pop ebx8B 2B mov ebp, [ebx] 83 C3 1C add ebx, 1Ch
90 nopFA cli8B 2B mov ebp, [ebx]
Signature New signatureE800 0000 005B 8D4B 4251 5050 E800 0000 005B 8D4B 4290 51500F01 4C24 FE5B 83C3 1CFA 8B2B 5090 0F01 4C24 FE5B 83C3 1C90
FA8B 2B
Figure 1: Original code and obfuscated code from Chernobyl/CIH, and their corresponding signatures. Newly addedinstructions are highlighted.
E800 0000 00(90)* 5B(90)*8D4B 42(90)* 51(90)* 50(90)*50(90)* 0F01 4C24 FE(90)*5B(90)* 83C3 1C(90)* FA(90)*8B2B
Figure 2: Extended signature to catch nop-insertion.
HareFinally, the Hare virus infects the bootloader sectorsof floppy disks and hard drives, as well as executableprograms. When the payload is triggered, the virusoverwrites random sectors on the hard disk, making thedata inaccessible. The virus spreads by polymorphicallychanging its decryption routine and encrypting its mainbody.The Hare and Chernobyl/CIH viruses are well known
in the antivirus community, with their presence in thewild peaking in 1996 and 1998, respectively. In spiteof this, we discovered that current commercial virusscanners could not detect slightly obfuscated versionsof these viruses.
4 Obfuscation Attacks on CommercialVirus Scanners
We tested three commercial virus scanners againstseveral common obfuscation transformations. To test theresilience of commercial virus scanners to common ob-fuscation transformations, we have developed an obfus-cator for binaries. Our obfuscator supports four com-mon obfuscation transformations: dead-code insertion,code transposition, register reassignment, and instruc-tion substitution. While there are other generic obfus-
cation techniques [11, 12], those described here seemto be preferred by malicious code writers, possibly be-cause implementing them is easy and they add little tothe memory footprint.
4.1 Common Obfuscation Transformations4.1.1 Dead-Code InsertionAlso known as trash insertion, dead-code insertion
adds code to a program without modifying its behav-ior. Inserting a sequence of nop instructions is the sim-plest example. More interesting obfuscations involveconstructing challenging code sequences that modify theprogram state, only to restore it immediately.Some code sequences are designed to fool antivirus
software that solely rely on signature matching as theirdetection mechanism. Other code sequences are com-plicated enough to make automatic analysis very time-consuming, if not impossible. For example, passing val-ues through memory rather than registers or the stackrequires accurate pointer analysis to recover values. Theexample shown in Figure 3 should clarify this. The codemarked by (*) can be easily eliminated by automatedanalysis. On the other hand, the second and third inser-tions, marked by (**), do cancel out but the analysis ismore complex. Our obfuscator supports dead-code in-sertion.Not all dead-code sequence can be detected and elim-
inated, as this problem reduces to program equivalence(i.e., is this code sequence equivalent to an empty pro-gram?), which is undecidable. We believe that manycommon dead-code sequences can be detected and elim-inated with acceptable performance. To quote the docu-
19
FlexibleErkennungvondurch„NOP“mutiertenVariantenSignatur wirdzureguläremAusdruck• MöglichkeitderBeschreibungoptionalerundwiederholter
BytesundBytefolgennotwendig• Aufwendigerzuanalysieren
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
Original code Obfuscated codeE8 00000000 call 0h E8 00000000 call 0h5B pop ebx 5B pop ebx8D 4B 42 lea ecx, [ebx + 42h] 8D 4B 42 lea ecx, [ebx + 45h]51 push ecx 90 nop50 push eax 51 push ecx50 push eax 50 push eax0F01 4C 24 FE sidt [esp - 02h] 50 push eax5B pop ebx 90 nop83 C3 1C add ebx, 1Ch 0F01 4C 24 FE sidt [esp - 02h]FA cli 5B pop ebx8B 2B mov ebp, [ebx] 83 C3 1C add ebx, 1Ch
90 nopFA cli8B 2B mov ebp, [ebx]
Signature New signatureE800 0000 005B 8D4B 4251 5050 E800 0000 005B 8D4B 4290 51500F01 4C24 FE5B 83C3 1CFA 8B2B 5090 0F01 4C24 FE5B 83C3 1C90
FA8B 2B
Figure 1: Original code and obfuscated code from Chernobyl/CIH, and their corresponding signatures. Newly addedinstructions are highlighted.
E800 0000 00(90)* 5B(90)*8D4B 42(90)* 51(90)* 50(90)*50(90)* 0F01 4C24 FE(90)*5B(90)* 83C3 1C(90)* FA(90)*8B2B
Figure 2: Extended signature to catch nop-insertion.
HareFinally, the Hare virus infects the bootloader sectorsof floppy disks and hard drives, as well as executableprograms. When the payload is triggered, the virusoverwrites random sectors on the hard disk, making thedata inaccessible. The virus spreads by polymorphicallychanging its decryption routine and encrypting its mainbody.The Hare and Chernobyl/CIH viruses are well known
in the antivirus community, with their presence in thewild peaking in 1996 and 1998, respectively. In spiteof this, we discovered that current commercial virusscanners could not detect slightly obfuscated versionsof these viruses.
4 Obfuscation Attacks on CommercialVirus Scanners
We tested three commercial virus scanners againstseveral common obfuscation transformations. To test theresilience of commercial virus scanners to common ob-fuscation transformations, we have developed an obfus-cator for binaries. Our obfuscator supports four com-mon obfuscation transformations: dead-code insertion,code transposition, register reassignment, and instruc-tion substitution. While there are other generic obfus-
cation techniques [11, 12], those described here seemto be preferred by malicious code writers, possibly be-cause implementing them is easy and they add little tothe memory footprint.
4.1 Common Obfuscation Transformations4.1.1 Dead-Code InsertionAlso known as trash insertion, dead-code insertion
adds code to a program without modifying its behav-ior. Inserting a sequence of nop instructions is the sim-plest example. More interesting obfuscations involveconstructing challenging code sequences that modify theprogram state, only to restore it immediately.Some code sequences are designed to fool antivirus
software that solely rely on signature matching as theirdetection mechanism. Other code sequences are com-plicated enough to make automatic analysis very time-consuming, if not impossible. For example, passing val-ues through memory rather than registers or the stackrequires accurate pointer analysis to recover values. Theexample shown in Figure 3 should clarify this. The codemarked by (*) can be easily eliminated by automatedanalysis. On the other hand, the second and third inser-tions, marked by (**), do cancel out but the analysis ismore complex. Our obfuscator supports dead-code in-sertion.Not all dead-code sequence can be detected and elim-
inated, as this problem reduces to program equivalence(i.e., is this code sequence equivalent to an empty pro-gram?), which is undecidable. We believe that manycommon dead-code sequences can be detected and elim-inated with acceptable performance. To quote the docu-
Original code Obfuscated codeE8 00000000 call 0h E8 00000000 call 0h5B pop ebx 5B pop ebx8D 4B 42 lea ecx, [ebx + 42h] 8D 4B 42 lea ecx, [ebx + 45h]51 push ecx 90 nop50 push eax 51 push ecx50 push eax 50 push eax0F01 4C 24 FE sidt [esp - 02h] 50 push eax5B pop ebx 90 nop83 C3 1C add ebx, 1Ch 0F01 4C 24 FE sidt [esp - 02h]FA cli 5B pop ebx8B 2B mov ebp, [ebx] 83 C3 1C add ebx, 1Ch
90 nopFA cli8B 2B mov ebp, [ebx]
Signature New signatureE800 0000 005B 8D4B 4251 5050 E800 0000 005B 8D4B 4290 51500F01 4C24 FE5B 83C3 1CFA 8B2B 5090 0F01 4C24 FE5B 83C3 1C90
FA8B 2B
Figure 1: Original code and obfuscated code from Chernobyl/CIH, and their corresponding signatures. Newly addedinstructions are highlighted.
E800 0000 00(90)* 5B(90)*8D4B 42(90)* 51(90)* 50(90)*50(90)* 0F01 4C24 FE(90)*5B(90)* 83C3 1C(90)* FA(90)*8B2B
Figure 2: Extended signature to catch nop-insertion.
HareFinally, the Hare virus infects the bootloader sectorsof floppy disks and hard drives, as well as executableprograms. When the payload is triggered, the virusoverwrites random sectors on the hard disk, making thedata inaccessible. The virus spreads by polymorphicallychanging its decryption routine and encrypting its mainbody.The Hare and Chernobyl/CIH viruses are well known
in the antivirus community, with their presence in thewild peaking in 1996 and 1998, respectively. In spiteof this, we discovered that current commercial virusscanners could not detect slightly obfuscated versionsof these viruses.
4 Obfuscation Attacks on CommercialVirus Scanners
We tested three commercial virus scanners againstseveral common obfuscation transformations. To test theresilience of commercial virus scanners to common ob-fuscation transformations, we have developed an obfus-cator for binaries. Our obfuscator supports four com-mon obfuscation transformations: dead-code insertion,code transposition, register reassignment, and instruc-tion substitution. While there are other generic obfus-
cation techniques [11, 12], those described here seemto be preferred by malicious code writers, possibly be-cause implementing them is easy and they add little tothe memory footprint.
4.1 Common Obfuscation Transformations4.1.1 Dead-Code InsertionAlso known as trash insertion, dead-code insertion
adds code to a program without modifying its behav-ior. Inserting a sequence of nop instructions is the sim-plest example. More interesting obfuscations involveconstructing challenging code sequences that modify theprogram state, only to restore it immediately.Some code sequences are designed to fool antivirus
software that solely rely on signature matching as theirdetection mechanism. Other code sequences are com-plicated enough to make automatic analysis very time-consuming, if not impossible. For example, passing val-ues through memory rather than registers or the stackrequires accurate pointer analysis to recover values. Theexample shown in Figure 3 should clarify this. The codemarked by (*) can be easily eliminated by automatedanalysis. On the other hand, the second and third inser-tions, marked by (**), do cancel out but the analysis ismore complex. Our obfuscator supports dead-code in-sertion.Not all dead-code sequence can be detected and elim-
inated, as this problem reduces to program equivalence(i.e., is this code sequence equivalent to an empty pro-gram?), which is undecidable. We believe that manycommon dead-code sequences can be detected and elim-inated with acceptable performance. To quote the docu-
Andenmit (90)*markiertenStellen könnten möglicherweiseeine oder mehrere NOP-Instruktionen zusätzlich auftauchen
Flexible Signatur
20
Malware-Obfuskation:BeispielfürCode-UmsortierungCodewirdimSpeicherumgeordnet• VerwendenvonSprungbefehlen,umKontrollflusszuverändern
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
21
Code obfuscated through Code obfuscated through Code obfuscated throughOriginal code dead-code insertion code transposition instruction substitutioncall 0h call 0h call 0h call 0hpop ebx pop ebx pop ebx pop ebxlea ecx, [ebx+42h] lea ecx, [ebx+42h] jmp S2 lea ecx, [ebx+42h]push ecx nop (*) S3: push eax sub esp, 03hpush eax nop (*) push eax sidt [esp - 02h]push eax push ecx sidt [esp - 02h] add [esp], 1Chsidt [esp - 02h] push eax jmp S4 mov ebx, [esp]pop ebx inc eax (**) add ebx, 1Ch inc espadd ebx, 1Ch push eax jmp S6 clicli dec [esp - 0h] (**) S2: lea ecx, [ebx+42h ] mov ebp, [ebx]mov ebp, [ebx] dec eax (**) push ecx
sidt [esp - 02h] jmp S3pop ebx S4: pop ebxadd ebx, 1Ch clicli jmp S5mov ebp, [ebx] S5: mov ebp, [ebx]
Figure 3: Examples of obfuscation through dead-code insertion, code transposition, and instruction substitution. Newly addedinstructions are highlighted.
Norton® McAfee® Command®SAFEAntivirus VirusScan Antivirus
7.0 6.01 4.61.2
Chernobyl original ✓ ✓ ✓ ✓
obfuscated ✕[1] ✕[1,2] ✕[1,2] ✓
z0mbie-6.boriginal ✓ ✓ ✓ ✓
obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓
f0sf0r0original ✓ ✓ ✓ ✓
obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓
Hareoriginal ✓ ✓ ✓ ✓
obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓
Obfuscations considered: [1] = nop-insertion (a form of dead-code insertion)[2] = code transposition
Table 1: Results of testing various virus scanners on obfuscated viruses.
in control flow and register reassignments. Similarly,one must construct an abstract representation of the ex-ecutable in which we are trying to find a malicious pat-tern. Once the generalization of the malicious code andthe abstract representation of the executable are created,we can then detect the malicious code in the executable.We now describe each component of SAFE.Generalizing the malicious code: Building the mali-cious code automatonThe malicious code is generalized into an automatonwith uninterpreted symbols. Uninterpreted symbols(Section 6.2) provide a generic way of representing datadependencies between variables without specifically re-ferring to the storage location of each variable.Pattern-definition loaderThis component takes a library of abstraction patternsand creates an internal representation. These abstractionpatterns are used as alphabet symbols by the maliciouscode automaton.The executable loaderThis component transforms the executable into an in-ternal representation, here the collection of controlflow graphs (CFGs), one for each program procedure.
The executable loader (Figure 5) uses two off-the-shelfcomponents, IDA Pro and CodeSurfer. IDA Pro (byDataRescue [42]) is a commercial interactive disassem-bler. CodeSurfer (by GrammaTech, Inc. [24]) is aprogram-understanding tool that performs a variety ofstatic analyses. CodeSurfer provides an API for ac-cess to various structures, such as the CFGs and the callgraph, and to results of a variety of static analyses, suchas points-to analysis. In collaboration with GrammaT-ech, we have developed a connector that transforms IDAPro internal structures into an intermediate form thatCodeSurfer can parse.The annotatorThis component inputs a CFG from the executable andthe set of abstraction patterns and produces an anno-tated CFG, the abstract representation of a program pro-cedure. The annotated CFG includes information thatindicates where a specific abstraction pattern was foundin the executable. The annotator runs for each proce-dure in the program, transforming each CFG. Section 6describes the annotator in detail.
Code obfuscated through Code obfuscated through Code obfuscated throughOriginal code dead-code insertion code transposition instruction substitutioncall 0h call 0h call 0h call 0hpop ebx pop ebx pop ebx pop ebxlea ecx, [ebx+42h] lea ecx, [ebx+42h] jmp S2 lea ecx, [ebx+42h]push ecx nop (*) S3: push eax sub esp, 03hpush eax nop (*) push eax sidt [esp - 02h]push eax push ecx sidt [esp - 02h] add [esp], 1Chsidt [esp - 02h] push eax jmp S4 mov ebx, [esp]pop ebx inc eax (**) add ebx, 1Ch inc espadd ebx, 1Ch push eax jmp S6 clicli dec [esp - 0h] (**) S2: lea ecx, [ebx+42h ] mov ebp, [ebx]mov ebp, [ebx] dec eax (**) push ecx
sidt [esp - 02h] jmp S3pop ebx S4: pop ebxadd ebx, 1Ch clicli jmp S5mov ebp, [ebx] S5: mov ebp, [ebx]
Figure 3: Examples of obfuscation through dead-code insertion, code transposition, and instruction substitution. Newly addedinstructions are highlighted.
Norton® McAfee® Command®SAFEAntivirus VirusScan Antivirus
7.0 6.01 4.61.2
Chernobyl original ✓ ✓ ✓ ✓
obfuscated ✕[1] ✕[1,2] ✕[1,2] ✓
z0mbie-6.boriginal ✓ ✓ ✓ ✓
obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓
f0sf0r0original ✓ ✓ ✓ ✓
obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓
Hareoriginal ✓ ✓ ✓ ✓
obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓
Obfuscations considered: [1] = nop-insertion (a form of dead-code insertion)[2] = code transposition
Table 1: Results of testing various virus scanners on obfuscated viruses.
in control flow and register reassignments. Similarly,one must construct an abstract representation of the ex-ecutable in which we are trying to find a malicious pat-tern. Once the generalization of the malicious code andthe abstract representation of the executable are created,we can then detect the malicious code in the executable.We now describe each component of SAFE.Generalizing the malicious code: Building the mali-cious code automatonThe malicious code is generalized into an automatonwith uninterpreted symbols. Uninterpreted symbols(Section 6.2) provide a generic way of representing datadependencies between variables without specifically re-ferring to the storage location of each variable.Pattern-definition loaderThis component takes a library of abstraction patternsand creates an internal representation. These abstractionpatterns are used as alphabet symbols by the maliciouscode automaton.The executable loaderThis component transforms the executable into an in-ternal representation, here the collection of controlflow graphs (CFGs), one for each program procedure.
The executable loader (Figure 5) uses two off-the-shelfcomponents, IDA Pro and CodeSurfer. IDA Pro (byDataRescue [42]) is a commercial interactive disassem-bler. CodeSurfer (by GrammaTech, Inc. [24]) is aprogram-understanding tool that performs a variety ofstatic analyses. CodeSurfer provides an API for ac-cess to various structures, such as the CFGs and the callgraph, and to results of a variety of static analyses, suchas points-to analysis. In collaboration with GrammaT-ech, we have developed a connector that transforms IDAPro internal structures into an intermediate form thatCodeSurfer can parse.The annotatorThis component inputs a CFG from the executable andthe set of abstraction patterns and produces an anno-tated CFG, the abstract representation of a program pro-cedure. The annotated CFG includes information thatindicates where a specific abstraction pattern was foundin the executable. The annotator runs for each proce-dure in the program, transforming each CFG. Section 6describes the annotator in detail.
Malware-Obfuskation:VerwendungsemantischidentischerInstr.Instruktionenwerdenersetzt• Semantikwirdbeibehalten• z.B.pushdurchexpliziteSP-Operation+Storeersetzen
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
22
Code obfuscated through Code obfuscated through Code obfuscated throughOriginal code dead-code insertion code transposition instruction substitutioncall 0h call 0h call 0h call 0hpop ebx pop ebx pop ebx pop ebxlea ecx, [ebx+42h] lea ecx, [ebx+42h] jmp S2 lea ecx, [ebx+42h]push ecx nop (*) S3: push eax sub esp, 03hpush eax nop (*) push eax sidt [esp - 02h]push eax push ecx sidt [esp - 02h] add [esp], 1Chsidt [esp - 02h] push eax jmp S4 mov ebx, [esp]pop ebx inc eax (**) add ebx, 1Ch inc espadd ebx, 1Ch push eax jmp S6 clicli dec [esp - 0h] (**) S2: lea ecx, [ebx+42h ] mov ebp, [ebx]mov ebp, [ebx] dec eax (**) push ecx
sidt [esp - 02h] jmp S3pop ebx S4: pop ebxadd ebx, 1Ch clicli jmp S5mov ebp, [ebx] S5: mov ebp, [ebx]
Figure 3: Examples of obfuscation through dead-code insertion, code transposition, and instruction substitution. Newly addedinstructions are highlighted.
Norton® McAfee® Command®SAFEAntivirus VirusScan Antivirus
7.0 6.01 4.61.2
Chernobyl original ✓ ✓ ✓ ✓
obfuscated ✕[1] ✕[1,2] ✕[1,2] ✓
z0mbie-6.boriginal ✓ ✓ ✓ ✓
obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓
f0sf0r0original ✓ ✓ ✓ ✓
obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓
Hareoriginal ✓ ✓ ✓ ✓
obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓
Obfuscations considered: [1] = nop-insertion (a form of dead-code insertion)[2] = code transposition
Table 1: Results of testing various virus scanners on obfuscated viruses.
in control flow and register reassignments. Similarly,one must construct an abstract representation of the ex-ecutable in which we are trying to find a malicious pat-tern. Once the generalization of the malicious code andthe abstract representation of the executable are created,we can then detect the malicious code in the executable.We now describe each component of SAFE.Generalizing the malicious code: Building the mali-cious code automatonThe malicious code is generalized into an automatonwith uninterpreted symbols. Uninterpreted symbols(Section 6.2) provide a generic way of representing datadependencies between variables without specifically re-ferring to the storage location of each variable.Pattern-definition loaderThis component takes a library of abstraction patternsand creates an internal representation. These abstractionpatterns are used as alphabet symbols by the maliciouscode automaton.The executable loaderThis component transforms the executable into an in-ternal representation, here the collection of controlflow graphs (CFGs), one for each program procedure.
The executable loader (Figure 5) uses two off-the-shelfcomponents, IDA Pro and CodeSurfer. IDA Pro (byDataRescue [42]) is a commercial interactive disassem-bler. CodeSurfer (by GrammaTech, Inc. [24]) is aprogram-understanding tool that performs a variety ofstatic analyses. CodeSurfer provides an API for ac-cess to various structures, such as the CFGs and the callgraph, and to results of a variety of static analyses, suchas points-to analysis. In collaboration with GrammaT-ech, we have developed a connector that transforms IDAPro internal structures into an intermediate form thatCodeSurfer can parse.The annotatorThis component inputs a CFG from the executable andthe set of abstraction patterns and produces an anno-tated CFG, the abstract representation of a program pro-cedure. The annotated CFG includes information thatindicates where a specific abstraction pattern was foundin the executable. The annotator runs for each proce-dure in the program, transforming each CFG. Section 6describes the annotator in detail.
Code obfuscated through Code obfuscated through Code obfuscated throughOriginal code dead-code insertion code transposition instruction substitutioncall 0h call 0h call 0h call 0hpop ebx pop ebx pop ebx pop ebxlea ecx, [ebx+42h] lea ecx, [ebx+42h] jmp S2 lea ecx, [ebx+42h]push ecx nop (*) S3: push eax sub esp, 03hpush eax nop (*) push eax sidt [esp - 02h]push eax push ecx sidt [esp - 02h] add [esp], 1Chsidt [esp - 02h] push eax jmp S4 mov ebx, [esp]pop ebx inc eax (**) add ebx, 1Ch inc espadd ebx, 1Ch push eax jmp S6 clicli dec [esp - 0h] (**) S2: lea ecx, [ebx+42h ] mov ebp, [ebx]mov ebp, [ebx] dec eax (**) push ecx
sidt [esp - 02h] jmp S3pop ebx S4: pop ebxadd ebx, 1Ch clicli jmp S5mov ebp, [ebx] S5: mov ebp, [ebx]
Figure 3: Examples of obfuscation through dead-code insertion, code transposition, and instruction substitution. Newly addedinstructions are highlighted.
Norton® McAfee® Command®SAFEAntivirus VirusScan Antivirus
7.0 6.01 4.61.2
Chernobyl original ✓ ✓ ✓ ✓
obfuscated ✕[1] ✕[1,2] ✕[1,2] ✓
z0mbie-6.boriginal ✓ ✓ ✓ ✓
obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓
f0sf0r0original ✓ ✓ ✓ ✓
obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓
Hareoriginal ✓ ✓ ✓ ✓
obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓
Obfuscations considered: [1] = nop-insertion (a form of dead-code insertion)[2] = code transposition
Table 1: Results of testing various virus scanners on obfuscated viruses.
in control flow and register reassignments. Similarly,one must construct an abstract representation of the ex-ecutable in which we are trying to find a malicious pat-tern. Once the generalization of the malicious code andthe abstract representation of the executable are created,we can then detect the malicious code in the executable.We now describe each component of SAFE.Generalizing the malicious code: Building the mali-cious code automatonThe malicious code is generalized into an automatonwith uninterpreted symbols. Uninterpreted symbols(Section 6.2) provide a generic way of representing datadependencies between variables without specifically re-ferring to the storage location of each variable.Pattern-definition loaderThis component takes a library of abstraction patternsand creates an internal representation. These abstractionpatterns are used as alphabet symbols by the maliciouscode automaton.The executable loaderThis component transforms the executable into an in-ternal representation, here the collection of controlflow graphs (CFGs), one for each program procedure.
The executable loader (Figure 5) uses two off-the-shelfcomponents, IDA Pro and CodeSurfer. IDA Pro (byDataRescue [42]) is a commercial interactive disassem-bler. CodeSurfer (by GrammaTech, Inc. [24]) is aprogram-understanding tool that performs a variety ofstatic analyses. CodeSurfer provides an API for ac-cess to various structures, such as the CFGs and the callgraph, and to results of a variety of static analyses, suchas points-to analysis. In collaboration with GrammaT-ech, we have developed a connector that transforms IDAPro internal structures into an intermediate form thatCodeSurfer can parse.The annotatorThis component inputs a CFG from the executable andthe set of abstraction patterns and produces an anno-tated CFG, the abstract representation of a program pro-cedure. The annotated CFG includes information thatindicates where a specific abstraction pattern was foundin the executable. The annotator runs for each proce-dure in the program, transforming each CFG. Section 6describes the annotator in detail.
ErkennungvonObfuskationen
Vorgehensweise:• ErstellendesKontrollflussgraphen
desMaschinencodes• Markieren„unnötiger“Operationen
(NOP,JMP,etc.)• Vergleichmitbekanntem
Kontrollflussgraphen(Graph-Isomorphismus)
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
23
mov eax, dr1
jmp n_11
mov ebx, [eax+10h]
mov edi, [eax]
Loop: pop ecx
jecxz n_18
nop
(F)
pop ebx
(T)
mov esi, ecx
nop
nop
mov eax, 0d601h
jmp n_13
pop edx
jmp n_02
pop ecx
nop
call edi
jmp Loop
pop eax
push eax
pop eax
stc
pushf
mov eax, dr1
jmp n_11
Assign(eax,dr1)
mov ebx, [eax+10h]
IrrelevantJump
jmp n_02
Assign(ebx,[eax+10h])
IrrelevantJump
mov edi, [eax]Assign(edi,[eax])
Loop: pop ecxLoop: Pop(ecx)
jecxz n_18If(ecx==0)
nop
(F)
pop ebx
(T)
IrrelevantInstr
mov esi, ecxAssign(esi,ecx)
nopIrrelevantInstr
nop
mov eax, 0d601hAssign(eax,0d601h)
jmp n_13IrrelevantJump
pop edxPop(edx)
pop ecxPop(ecx)
nopIrrelevantInstr
call ediIndirectCall(edi)
jmp LoopGoTo(Loop)
Pop(ebx)
pop eaxPop(eax)
push eaxIrrelevantInstr
pop eax
stcAssign(Carry,1)
pushfPush(flags)
Figure 8: Control flow graph of obfuscated code fragment, and annotations.
ÜberblickObfuskationstechnikenKombinationderverschiedenenTechnikenmöglich• ExtremhoherErkennungsaufwand
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
24
Code obfuscated through Code obfuscated through Code obfuscated throughOriginal code dead-code insertion code transposition instruction substitutioncall 0h call 0h call 0h call 0hpop ebx pop ebx pop ebx pop ebxlea ecx, [ebx+42h] lea ecx, [ebx+42h] jmp S2 lea ecx, [ebx+42h]push ecx nop (*) S3: push eax sub esp, 03hpush eax nop (*) push eax sidt [esp - 02h]push eax push ecx sidt [esp - 02h] add [esp], 1Chsidt [esp - 02h] push eax jmp S4 mov ebx, [esp]pop ebx inc eax (**) add ebx, 1Ch inc espadd ebx, 1Ch push eax jmp S6 clicli dec [esp - 0h] (**) S2: lea ecx, [ebx+42h ] mov ebp, [ebx]mov ebp, [ebx] dec eax (**) push ecx
sidt [esp - 02h] jmp S3pop ebx S4: pop ebxadd ebx, 1Ch clicli jmp S5mov ebp, [ebx] S5: mov ebp, [ebx]
Figure 3: Examples of obfuscation through dead-code insertion, code transposition, and instruction substitution. Newly addedinstructions are highlighted.
Norton® McAfee® Command®SAFEAntivirus VirusScan Antivirus
7.0 6.01 4.61.2
Chernobyl original ✓ ✓ ✓ ✓
obfuscated ✕[1] ✕[1,2] ✕[1,2] ✓
z0mbie-6.boriginal ✓ ✓ ✓ ✓
obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓
f0sf0r0original ✓ ✓ ✓ ✓
obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓
Hareoriginal ✓ ✓ ✓ ✓
obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓
Obfuscations considered: [1] = nop-insertion (a form of dead-code insertion)[2] = code transposition
Table 1: Results of testing various virus scanners on obfuscated viruses.
in control flow and register reassignments. Similarly,one must construct an abstract representation of the ex-ecutable in which we are trying to find a malicious pat-tern. Once the generalization of the malicious code andthe abstract representation of the executable are created,we can then detect the malicious code in the executable.We now describe each component of SAFE.Generalizing the malicious code: Building the mali-cious code automatonThe malicious code is generalized into an automatonwith uninterpreted symbols. Uninterpreted symbols(Section 6.2) provide a generic way of representing datadependencies between variables without specifically re-ferring to the storage location of each variable.Pattern-definition loaderThis component takes a library of abstraction patternsand creates an internal representation. These abstractionpatterns are used as alphabet symbols by the maliciouscode automaton.The executable loaderThis component transforms the executable into an in-ternal representation, here the collection of controlflow graphs (CFGs), one for each program procedure.
The executable loader (Figure 5) uses two off-the-shelfcomponents, IDA Pro and CodeSurfer. IDA Pro (byDataRescue [42]) is a commercial interactive disassem-bler. CodeSurfer (by GrammaTech, Inc. [24]) is aprogram-understanding tool that performs a variety ofstatic analyses. CodeSurfer provides an API for ac-cess to various structures, such as the CFGs and the callgraph, and to results of a variety of static analyses, suchas points-to analysis. In collaboration with GrammaT-ech, we have developed a connector that transforms IDAPro internal structures into an intermediate form thatCodeSurfer can parse.The annotatorThis component inputs a CFG from the executable andthe set of abstraction patterns and produces an anno-tated CFG, the abstract representation of a program pro-cedure. The annotated CFG includes information thatindicates where a specific abstraction pattern was foundin the executable. The annotator runs for each proce-dure in the program, transforming each CFG. Section 6describes the annotator in detail.
Code obfuscated through Code obfuscated through Code obfuscated throughOriginal code dead-code insertion code transposition instruction substitutioncall 0h call 0h call 0h call 0hpop ebx pop ebx pop ebx pop ebxlea ecx, [ebx+42h] lea ecx, [ebx+42h] jmp S2 lea ecx, [ebx+42h]push ecx nop (*) S3: push eax sub esp, 03hpush eax nop (*) push eax sidt [esp - 02h]push eax push ecx sidt [esp - 02h] add [esp], 1Chsidt [esp - 02h] push eax jmp S4 mov ebx, [esp]pop ebx inc eax (**) add ebx, 1Ch inc espadd ebx, 1Ch push eax jmp S6 clicli dec [esp - 0h] (**) S2: lea ecx, [ebx+42h ] mov ebp, [ebx]mov ebp, [ebx] dec eax (**) push ecx
sidt [esp - 02h] jmp S3pop ebx S4: pop ebxadd ebx, 1Ch clicli jmp S5mov ebp, [ebx] S5: mov ebp, [ebx]
Figure 3: Examples of obfuscation through dead-code insertion, code transposition, and instruction substitution. Newly addedinstructions are highlighted.
Norton® McAfee® Command®SAFEAntivirus VirusScan Antivirus
7.0 6.01 4.61.2
Chernobyl original ✓ ✓ ✓ ✓
obfuscated ✕[1] ✕[1,2] ✕[1,2] ✓
z0mbie-6.boriginal ✓ ✓ ✓ ✓
obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓
f0sf0r0original ✓ ✓ ✓ ✓
obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓
Hareoriginal ✓ ✓ ✓ ✓
obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓
Obfuscations considered: [1] = nop-insertion (a form of dead-code insertion)[2] = code transposition
Table 1: Results of testing various virus scanners on obfuscated viruses.
in control flow and register reassignments. Similarly,one must construct an abstract representation of the ex-ecutable in which we are trying to find a malicious pat-tern. Once the generalization of the malicious code andthe abstract representation of the executable are created,we can then detect the malicious code in the executable.We now describe each component of SAFE.Generalizing the malicious code: Building the mali-cious code automatonThe malicious code is generalized into an automatonwith uninterpreted symbols. Uninterpreted symbols(Section 6.2) provide a generic way of representing datadependencies between variables without specifically re-ferring to the storage location of each variable.Pattern-definition loaderThis component takes a library of abstraction patternsand creates an internal representation. These abstractionpatterns are used as alphabet symbols by the maliciouscode automaton.The executable loaderThis component transforms the executable into an in-ternal representation, here the collection of controlflow graphs (CFGs), one for each program procedure.
The executable loader (Figure 5) uses two off-the-shelfcomponents, IDA Pro and CodeSurfer. IDA Pro (byDataRescue [42]) is a commercial interactive disassem-bler. CodeSurfer (by GrammaTech, Inc. [24]) is aprogram-understanding tool that performs a variety ofstatic analyses. CodeSurfer provides an API for ac-cess to various structures, such as the CFGs and the callgraph, and to results of a variety of static analyses, suchas points-to analysis. In collaboration with GrammaT-ech, we have developed a connector that transforms IDAPro internal structures into an intermediate form thatCodeSurfer can parse.The annotatorThis component inputs a CFG from the executable andthe set of abstraction patterns and produces an anno-tated CFG, the abstract representation of a program pro-cedure. The annotated CFG includes information thatindicates where a specific abstraction pattern was foundin the executable. The annotator runs for each proce-dure in the program, transforming each CFG. Section 6describes the annotator in detail.
Kommerzielle VirenscannererkanntenmutierteVarianten nicht!
PolymorpheMalware:BeispielfürAdreßänderungenVerwendungeinerstatischgelinktenLibrary• Identischer
Assembler-Quelltext
• UnterschiedlicheabsoluteAdressendurchanderesSpeicherlayout
• KomplexereSignaturerkennungnotwendig
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
FLIRT Signatures: Good Scenario
● Library statically-linked, not recompiled
25
PolymorpheMalware:BeispielfürCompileroptimierungenCompilereliminiertnichtbenötigteMaschineninstruktionen• Oftidentischer
(z.B.C-)Quelltext• Optimierungs-
phase desCompilerserkennt,dasseinigeerzeugteMaschinen-instruktionen inKontextnichtbenötigtwerden
• EntferntdieseausProgramm
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
FLIRT Signatures: Bad Scenario
● Library was recompiled
26
KomplexereStatischeAnalyse:AbstrakteInterpretation
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
Idee• ErmittelnderSemantikeinesProgramms,ohneesauszuführen• ErkennungvonAnomalien
Methode:AbstrakteInterpretation• Informationen über dasVerhalten vonProgrammen (Analyse
derSemantik)erhalten,indemmanvonTeilen desProgrammsabstrahiert unddieAnweisungen Schritt für Schrittnachvollzieht (Interpretation)
27
PatrickCousot,Radhia Cousot:“Abstractinterpretation:aunified latticemodelforstaticanalysisofprogramsbyconstructionorapproximationoffixpoints”, SixthAnnualACMSIGPLAN-SIGACTSymposium onPrinciplesofProgramming Languages,1977
AbstrakteInterpretation
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
Vorgehensweise• KonzentrationaufTeilaspektederAusführungderAnweisungen• GezieltesWeglassenvonInformation(Abstraktion)• Ergebnis:NäherungandieProgrammsemantik• KannfürgewünschtenZweckvollkommenausreichendsein
• VieleEigenschaftenvonProgrammensindnichtberechenbar• MankannkeinProgrammangeben,welchessieinendlicherZeitmit
endlichenSpeicherresourcen fürbeliebigeProgrammeberechnet(→Halteproblem)
• Approximation(WeglasseneinigerInformationen)ermöglichteinigeweitereAnalysen
28
AbstrakteInterpretation:Beispiel
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
AbstrakteInterpretationistkomplexermathematischerAnsatz• DieGrundprinzipiensindabereinfach• Beispiel:BestimmendesVorzeichensvonVariablen
29
Abstract Interpretation: Signs Analysis
● AI is complicated, but the basic ideas are not
● Ex: determine each variable's sign at each point
● Replaced the
● concrete state with an abstract state
● concrete semantics with an abstract semantics
Ersetzen vonkonkretem Zustand durch abstrakten Zustandundkonkreter Semantik durch abstrakte Semantik
Concept: Abstract the State
● Different abstract interpretations use different abstract states.
● For the signs analysis, each variable could be
● Unknown: either positive or negative (+/-)
● Positive: x >= 0 (0+)
● Negative: x <= 0 (0-)
● Zero (0)
● Uninitialized (?)
● Ignore all other information, e.g., the actual values of variables.
AbstrakteInterpretation:Zustandsabstraktion
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
VerschiedeneMethodenderabstr. InterpretationverwendenunterschiedlicheAbstraktionen• FürVorzeichenanalyseistjedeVariable:• unbekannt:positivodernegativ(+/-)• positiv:x>=0(0+)• negativ:x<=0(0-)• null(0)• uninitialisiert (?)
• AlleweiterenInformationenwerdenignoriert• Hierz.B.derkonkreteWertderVariablen
30
AbstrakteInterpretation:Semantikabstraktion(1)
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
AbstrakteInterpretationbefolgtVorzeichenregelnfürdieMultiplikation:• „+“*„+“=„+“• „–“*„–“=„+“• „–“*„+“=„+“*„–“=„–“
• Hinweis:dieseRegelnbeziehensichaufmathematischeganzeZahlen,imComputerrepräsentierteZahlenkönnenüberlaufenundunabsichtlichihrVorzeichenändern!
31
Concept: Abstract the Semantics (*)● Abstract multiplication follows the well-known
“rule of signs” from grade school
● A positive times a positive is positive
● A negative times a negative is positive
● A negative times a positive is negative
● Note: these remarks refer to mathematical integers; machine integers are subject to overflow
* ? 0 0+ 0- +/-
? +/- 0 +/- +/- +/-
0 0 0 0 0 0
0+ +/- 0 0+ 0- +/-
0- +/- 0 0- 0+ +/-
+/- +/- 0 +/- +/- +/-
AbstrakteInterpretation:Semantikabstraktion(2)
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
AbstrakteInterpretationbenötigtVorzeichenregelnfürdieAddition:• „+“+„+“=„+“• „–“+„–“=„–“• „–“+„+“=„+“+„–“=„?“
• Beispiel:• –5+5⇒ konkretesErgebnis0• –6+5⇒ konkretesErgebnis –1• –5+6⇒ konkretesErgebnis 1
32
Concept: Abstract the Semantics (+)
● Positive + positive = positive.
● Negative + negative = negative.
● Negative + positive = unknown:
● -5 + 5. Concretely, the result is 0.
● -6 + 5. Concretely, the result is -1.
● -5 + 6. Concretely, the result is 1.
+ ? 0 0+ 0- +/-
? +/- +/- +/- +/- +/-
0 +/- 0 0+ 0- +/-
0+ +/- 0+ 0+ +/- +/-
0- +/- 0 0- 0+ +/-
+/- +/- 0 +/- +/- +/-
KonkreteAnwendungderabstraktenInterpretation:SprunganalyseHerausfindenderStruktureinercompilierten „switch“-Anweisung• CompilerverwendenSprungtabellenfüreffizientenCode
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
33
Switch Tables: Contiguous, Indexed
Sprunganalyse(2)
SwitchmitweitverteiltenWerten• Sprungtabellewärenurdünnbesetzt(vieleNullen)• ErsetzungdurchFolgenvon„if“-Abfagen istineffizient• VieleAbfragenundbedingteSprüngenotwendig:O(N)• stattdessenverwendenCompilerEntscheidungsbäume,
Aufwand:O(log(n))
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
34
Switch Tables: Sparsely-Populated
Switch cases are sparsely-distributed.
Cannot implement efficiently with a table.
One option is to replace the construct with a series of if-statements.
This works, but takes O(N) time.
Instead, compilers generate decision trees that take O(log(N)) time, as shown on the next slide.
Decision Trees for Sparse SwitchesSprunganalyse(3)
Entscheidungsbaum
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
35
Sprunganalyse:Assemblercode(1)
ImplementierungdesEntscheidungsbaums
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
36
Assembly Language Reification
Additional, slight complication: red instructions modify EAX throughout the decision tree.
Vergleich mit Entscheidungswert
Größer?Springen!
Gleich?Springen!
Sonst:weiter im Code,nächsten Vergleich ausführen
Kleines Problem:eax wird vonmarkierten Instruktionenverändert
Sprunganalyse:Assemblercode(2)
Kontrollflussdarstellung:
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
37
Assembly Language Reification, Graphical
AbstraktionderEntscheidungen imswitch-Statement• Erkenntnis:• wichtig:welcheWertebereiche führen
zurAusführungwelchescase-Falls?
• Datenabstraktion:• Intervalle[l,u]mitl<=u
• switch implementiertmitsub,dec undcmp-Instruktionen– allesSubtraktionen– undbedingtenSprüngen
• Semantikabstraktion:• BeibehaltenderSubtraktionen• Verzweigungbeibranch-Befehlen
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
38
Switch Tables: Sparsely-Populated
Switch cases are sparsely-distributed.
Cannot implement efficiently with a table.
One option is to replace the construct with a series of if-statements.
This works, but takes O(N) time.
Instead, compilers generate decision trees that take O(log(N)) time, as shown on the next slide.
Assembly Language Reification
Additional, slight complication: red instructions modify EAX throughout the decision tree.
Analysis Results
Beginning with no information about arg_0, each path through the decision tree induces a constraint upon its range of possible values, with single values or simple
ranges at case labels.
Analyseergebnisseswitch-Statement• KeineinitialeInformationüberarg_0vorhanden• JederPfaddurchCodeschränktmöglicheWertefürarg_0ein• WertanBlättern(case-Labels):Wert/einfacherWertebereich
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
39
DynamischeAnalysen
ÜberwachungdesProgrammverhaltenszurLaufzeit• StatischeAnalysenbenötigenSignaturen• Könnenneueundevtl.mutierteMalwarenichterkennen
• IdeederdynamischenAnalyse:• BetrachtendesVerhaltenseinesProgrammszurLaufzeit
• SemantischeMethode– AnalysedesVerhaltens einesProgramms
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
40
DynamischeAnalyse:KontrollflussverfolgungIdee:Überprüfen,obtatsächlichstattgefundenerKontrollflussmitSemantikeinesProgrammsübereinstimmt• z.B.aufFunktionsebene:
ÜberprüfenvonRücksprungadressenbeiEintrittinFunktion
• VergleichmitFunktionsaufruf-graph
• Entdecktz.B.manipulierteRück-sprungadressen
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
41
KomplexerFunktions-aufrufgraphKomplexereProgrammemachenAnalyseaufwendig...
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
42
ImplementierungderKontrollflussverfolgungIdee:ErstelleneinerRepräsentationdesAufrufgraphen• Automatischdurch
CompilererstellbarBeiEintrittinFunktion:• Überprüfen,obKantevom
Aufrufer(Returnadresse istauf demStack!)inGraphenthaltenist
• z.B.darf(kann?)„io_png_write_raw“nurvon„io_png_write_f32“und„io_png_write_u8“aufgerufenwerden
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
43
ProblemederKontrollflussverfolgungDynamischerCode• VerwendungvonFunktionszeigern/Sprungtabellen:
CompilerkannzurÜbersetzungszeitkeinevollständigeListederFunktionenerstellen,diezumAufrufeinerFunktionfführen• AlsoistderAufrufgraphunvollständig
• Lösung:• Zeigeranalysewäreerforderlich...✘
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
44
Fazit
StatischeMethodenzurErkennungvonMalware• Signaturcheckssindproblematisch• MutierendeMalware,mangelndeAktualität
• AufwendigerestatischeMethodenexistieren• AbstrakteInterpretation
DynamischeAnalysenzurErkennungvonMalware• KontrollflussanalysedurchÜberwachungvonFunktionsaufrufen• mehr:nächsteWoche...
MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung
45