Upload: mrphilroth

Post on 21-Apr-2017


Examining Malware with Python

Phil Roth, Data Scientist at Endgame

@mrphilroth

Conclusions

Python tools for text classification can easily be adopted for malware classification.

When using instruction ngrams, your disassembler and analysis passes are very important.

references: http://bit.ly/scipy-malware


Yes it’s malware, but what kind?

The Data

10868 labeled samples
10873 unlabeled samples
~500 GB uncompressed
9 classes

Classes

Hex Dump

raw data in hex

00401000 00 00 80 40 40 28 00 1C 02 42 00 C4 00 20 04 20
00401010 00 00 20 09 2A 02 00 00 00 00 8E 10 41 0A 21 01
00401020 40 00 02 01 00 90 21 00 32 40 00 1C 01 40 C8 18
00401030 40 82 02 63 20 00 00 09 10 01 02 21 00 82 00 04
00401040 82 20 08 83 00 08 00 00 00 00 02 00 60 80 10 80
00401050 18 00 00 20 A9 00 00 00 00 04 04 78 01 02 70 90
00401060 00 02 00 08 20 12 00 00 00 40 10 00 80 00 40 19
00401070 00 00 00 00 11 20 80 04 80 10 00 20 00 00 25 00
00401080 00 00 01 00 00 04 00 10 02 C1 80 80 00 20 20 00
00401090 08 A0 01 01 44 28 00 00 08 10 20 00 02 08 00 00
004010A0 00 40 00 00 00 34 40 40 00 04 00 08 80 08 00 08
004010B0 10 00 40 00 68 02 40 04 E1 00 28 14 00 08 20 0A
004010C0 06 01 02 00 40 00 00 00 00 00 00 20 00 02 00 04
004010D0 80 18 90 00 00 10 A0 00 45 09 00 10 04 40 44 82
004010E0 90 00 26 10 00 00 04 00 82 00 00 00 20 40 00 00
004010F0 B4 00 00 40 00 02 20 25 08 00 00 00 00 00 00 00
00401100 08 00 00 50 00 08 40 50 00 02 06 22 08 85 30 00
00401110 00 80 00 80 60 00 09 00 04 20 00 00 00 00 00 00
00401120 00 82 40 02 00 11 46 01 4A 01 8C 01 E6 00 86 10
00401130 4C 01 22 00 64 00 AE 01 EA 01 2A 11 E8 10 26 11
00401140 4E 11 8E 11 C2 00 6C 00 0C 11 60 01 CA 00 62 10
00401150 6C 01 A0 11 CE 10 2C 11 4E 10 8C 00 CE 01 AE 01
00401160 6C 10 6C 11 A2 01 AE 00 46 11 EE 10 22 00 A8 00
00401170 EC 01 08 11 A2 01 AE 10 6C 00 6E 00 AC 11 8C 00
00401180 EC 01 2A 10 2A 01 AE 00 40 00 C8 10 48 01 4E 11
00401190 0E 00 EC 11 24 10 4A 10 04 01 C8 11 E6 01 C2 00

Each line of the dump starts with an offset (for example 00401180), followed by the data in hex (for example EC 01 2A 10 2A 01 AE).
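To make the offset / data split concrete, here is a minimal sketch of parsing one dump line. The function name `parse_hex_line` is hypothetical, and returning `None` for unknown "??" bytes is an assumption made for illustration, not something the slides specify.

```python
def parse_hex_line(line):
    """Split one hex-dump line into (offset, byte_values).

    Unknown bytes written as "??" come back as None (an illustrative
    choice, not taken from the talk).
    """
    fields = line.split()
    offset = int(fields[0], 16)  # first field is the offset
    values = [None if f == "??" else int(f, 16) for f in fields[1:]]
    return offset, values


offset, values = parse_hex_line(
    "00401180 EC 01 2A 10 2A 01 AE 00 40 00 C8 10 48 01 4E 11")
print(hex(offset), values[:7])
```

The same split (drop the first field, skip "??") is what the byte ngram extraction later in the talk relies on.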

Disassembly

HEADER:00400000 ;
HEADER:00400000 ; +-------------------------------------------------------------------------+
HEADER:00400000 ; | This file has been generated by The Interactive Disassembler (IDA) |
HEADER:00400000 ; | Copyright (c) 2013 Hex-Rays, <[email protected]> |
HEADER:00400000 ; | License info: |
HEADER:00400000 ; | Microsoft |
HEADER:00400000 ; +-------------------------------------------------------------------------+
HEADER:00400000 ;
HEADER:00400000
HEADER:00400000
HEADER:00400000 .686p
HEADER:00400000 .mmx
HEADER:00400000 .model flat
HEADER:00400000
HEADER:00400000 ; ===========================================================================
HEADER:00400000
HEADER:00400000 ; [00001000 BYTES: COLLAPSED SEGMENT HEADER. PRESS KEYPAD CTRL-"+" TO EXPAND]
.text:00401000 ;
.text:00401000 ; Format : Portable executable for 80386 (PE)
.text:00401000 ; Imagebase : 400000
.text:00401000 ; Section 1. (virtual address 00001000)
.text:00401000 ; Virtual size : 00071050 ( 462928.)
.text:00401000 ; Section size in file : 00071200 ( 463360.)
.text:00401000 ; Offset to raw data for section: 00000400
.text:00401000 ; Flags 60000020: Text Executable Readable
.text:00401000 ; Alignment : default
.text:00401000 ; ===========================================================================

Each line of the disassembly starts with a section name and an offset, for example HEADER:00400000 (section name HEADER, offset 00400000).

.text:00470050 ; =============== S U B R O U T I N E ====================================
.text:00470050
.text:00470050 ; Attributes: bp-based frame
.text:00470050
.text:00470050 sub_470050 proc near ; CODE XREF: start+D8D↓p
.text:00470050
.text:00470050 var_68 = dword ptr -68h
.text:00470050 var_64 = dword ptr -64h
.text:00470050 var_60 = dword ptr -60h
.text:00470050
.text:00470050 55 push ebp
.text:00470051 8B EC mov ebp, esp
.text:00470053 83 C4 98 add esp, 0FFFFFF98h
.text:00470056 33 C0 xor eax, eax
.text:00470058 8B 15 7C 10 4B 00 mov edx, dword_4B107C
.text:0047005E 89 55 EC mov [ebp+var_14], edx
.text:00470061 89 45 EC mov [ebp+var_14], eax
.text:00470064 53 push ebx
.text:00470065 8B 1D 7C 10 4B 00 mov ebx, dword_4B107C
.text:0047006B 83 FB 2D cmp ebx, 2Dh
.text:0047006E 75 03 jnz short loc_470073
.text:00470070 89 5D EC mov [ebp+var_14], ebx
.text:00470073
.text:00470073 loc_470073: ; CODE XREF: sub_470050+1E↑j
.text:00470073 56 push esi
.text:00470074 33 C0 xor eax, eax
.text:00470076 8B 5D EC mov ebx, [ebp+var_14]


Annotated on the listing: the line mov ebx, dword_4B107C, with the labels "operation string" and "instruction".
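As a rough illustration of how a listing line breaks down into section name, offset, raw bytes and instruction, the sketch below parses one line with a regular expression. The pattern and the name `parse_instruction` are assumptions made for this example (upper-case hex byte pairs, lower-case mnemonics); it is not the parser used in the talk.

```python
import re

# Matches listing lines that carry an instruction, e.g.
# ".text:00470065 8B 1D 7C 10 4B 00 mov ebx, dword_4B107C".
# Assumption: raw bytes are upper-case hex pairs, mnemonics are lower-case.
INSTRUCTION_LINE = re.compile(
    r"^(?P<section>[^:\s]+):(?P<offset>[0-9A-F]{8})\s+"
    r"(?P<raw>(?:[0-9A-F]{2}\s+)+)"
    r"(?P<mnemonic>[a-z]\w*)\s*(?P<operands>[^;]*)")


def parse_instruction(line):
    """Return the parts of an instruction line, or None for other lines."""
    m = INSTRUCTION_LINE.match(line)
    if m is None:
        return None
    return {
        "section": m.group("section"),
        "offset": int(m.group("offset"), 16),
        "raw": bytes.fromhex(m.group("raw")),
        "mnemonic": m.group("mnemonic"),
        "operands": m.group("operands").strip(),
    }


parsed = parse_instruction(
    ".text:00470065 8B 1D 7C 10 4B 00 mov ebx, dword_4B107C")
print(parsed["mnemonic"], "|", parsed["operands"])
```

Comment, label and variable lines (which have no raw bytes after the offset) return None, so only real instructions survive.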

.idata:0046F4DC ;
.idata:0046F4DC ; Imports from KERNEL32.DLL
.idata:0046F4DC ;
.idata:0046F4DC ; ===========================================================================
.idata:0046F4DC
.idata:0046F4DC ; Segment type: Externs
.idata:0046F4DC ; _idata
.idata:0046F4DC ; DWORD __stdcall GetCurrentThreadId()
.idata:0046F4DC ?? ?? ?? ?? extrn __imp_GetCurrentThreadId:dword
.idata:0046F4DC ; DATA XREF: .text:0046F66C↓o
.idata:0046F4DC ; GetCurrentThreadId↓r
.idata:0046F4E0 ; BOOL __stdcall WriteFile(HANDLE hFile, LPCVOID lpBuffer, DWORD ...
.idata:0046F4E0 ?? ?? ?? ?? extrn WriteFile:dword ; DATA XREF: .text:00471E4C↓r
.idata:0046F4E4 ; BOOL __stdcall FindNextVolumeA(HANDLE hFindVolume, LPSTR lpszVolumeName, DW ...
.idata:0046F4E4 ?? ?? ?? ?? extrn FindNextVolumeA:dword
.idata:0046F4E4 ; DATA XREF: .text:00471E46↓r
.idata:0046F4E8 ; LPVOID __stdcall VirtualAlloc(LPVOID lpAddress, SIZE_T dwSize, DWORD ...
.idata:0046F4E8 ?? ?? ?? ?? extrn __imp_VirtualAlloc:dword
.idata:0046F4E8 ; DATA XREF: VirtualAlloc↓r
.idata:0046F4EC ; BOOL __stdcall EnumResourceLanguagesA(HMODULE hModule, LPCSTR lpType, LPCSTR ...
.idata:0046F4EC ?? ?? ?? ?? extrn EnumResourceLanguagesA:dword
.idata:0046F4EC ; DATA XREF: .text:00471E70↓r


Annotated on the listing:

import: Imports from KERNEL32.DLL
imported function: __stdcall VirtualAlloc(

My Solution

Features -> Feature Selection -> Model

Byte ngrams -> SelectKBest -> Gradient Boosting Classifier
Instruction ngrams, Named features -> SelectKBest -> Gradient Boosting Classifier
Manual Features -> Gradient Boosting Classifier

Byte ngrams

00401000 00 00 80 40 40 28 00 1C 02 42 00 C4 00 20 04 20
00401010 00 00 20 09 2A 02 00 00 00 00 8E 10 41 0A 21 01
00401020 40 00 02 01 00 90 21 00 32 40 00 1C 01 40 C8 18
00401030 40 82 02 63 20 00 00 09 10 01 02 21 00 82 00 04

Possibilities
1gram: 256
2gram: 65536
3gram: 16777216
4gram: 4294967296

Solution: Hashing
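The counts above are simply 256**n, and hashing sidesteps that growth by mapping every ngram into a fixed number of buckets. The sketch below shows the idea with the standard library only; `hashed_ngram_counts` is a hypothetical helper, and `zlib.crc32` is a stand-in hash chosen for illustration (scikit-learn's HashingVectorizer uses MurmurHash3).

```python
import zlib

# The number of distinct byte ngrams grows as 256**n.
for n in range(1, 5):
    print(f"{n}gram: {256 ** n}")


def hashed_ngram_counts(tokens, max_n=3, n_features=2 ** 16):
    """Count all 1..max_n-grams of `tokens` in a fixed-size hashed table."""
    counts = {}
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            ngram = " ".join(tokens[i:i + n])
            # crc32 is only a stand-in hash for this sketch.
            index = zlib.crc32(ngram.encode("ascii")) % n_features
            counts[index] = counts.get(index, 0) + 1
    return counts


tokens = "00 00 80 40 40 28 00 1C".split()
counts = hashed_ngram_counts(tokens)
print(sum(counts.values()), "ngrams hashed into", len(counts), "buckets")
```

The table never has more than `n_features` columns no matter how large n gets; the price is that different ngrams can collide in the same bucket.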

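Code for extracting the byte ngrams and reducing dimensionality:

from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
from sklearn.pipeline import Pipeline

# alternate_sign=False replaces non_negative=True, which older
# scikit-learn releases used for the same purpose.
vectorizer = HashingVectorizer(
    input="content", lowercase=True, stop_words=None, ngram_range=(1, 3),
    analyzer="word", n_features=2**16, binary=False, norm=None,
    alternate_sign=False)

pipe = Pipeline([
    ("extraction", CustomExtractor(vectorizer=vectorizer)),
    ("sel", VarianceThreshold(threshold=0)),
    ("tfidf", TfidfTransformer(norm="l2", use_idf=True, smooth_idf=True,
                               sublinear_tf=True)),
    ("kbest", SelectKBest(score_func=f_classif, k=500))])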


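import multiprocessing

import scipy.sparse
import toolz
from sklearn.feature_extraction.text import HashingVectorizer
from toolz.curried import filter, map  # curried versions, used inside toolz.pipe


class CustomExtractor():
    def __init__(self, vectorizer=HashingVectorizer()):
        self.vectorizer = vectorizer

    def fit(self, X, y=None):
        return self  # stateless

    def transform(self, X, y=None):
        # Extract features from the files in parallel, 32 files per task.
        with multiprocessing.Pool() as pool:
            rows = pool.map(self.feature_extract, X, 32)
        return scipy.sparse.vstack(list(rows))

    fit_transform = transform

    def feature_extract(self, file_name):
        # Drop the offset at the start of each line and skip unknown bytes.
        with open(file_name, "r") as f:
            clean_bytes = " ".join(toolz.pipe(
                f,
                map(lambda line: line.rstrip().split()[1:]),
                toolz.concat,
                filter(lambda b: b != "??" and b != "?")))
        return self.vectorizer.transform([clean_bytes])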

Why they might be useful: https://github.com/wapiflapi/binglide

[Figures: x86 bytecode; compressed data (jpg)]

[Figures: sample 0A32eTdBKayjCWhZqDOQ, .text section and .data section]
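One way to see why byte ngrams can separate kinds of content is to count byte pairs directly. The sketch below builds a (byte, next byte) histogram with the standard library; `byte_bigram_counts` is a hypothetical helper and the sample bytes are taken from the subroutine listing earlier in the talk. It illustrates the kind of pair data involved and is not binglide's implementation.

```python
from collections import Counter


def byte_bigram_counts(data):
    """Count how often each (byte, next_byte) pair occurs in `data`."""
    return Counter(zip(data, data[1:]))


# Raw bytes of the first few instructions of sub_470050.
sample = bytes.fromhex("55 8B EC 83 C4 98 33 C0 8B 15 7C 10 4B 00 89 55 EC")
counts = byte_bigram_counts(sample)
print(counts.most_common(3))
```

Treating each pair as a coordinate in a 256 x 256 grid gives a picture of the data; different kinds of content would be expected to fill that grid differently.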

Instruction ngrams

Extracted instructions:

push lea push mov call mov mov pop retn
mov jmp
push mov mov call test jz push call add mov pop retn
mov mov mov mov retn
mov lea mov inc test jnz sub retn
mov mov mov push mov push push push push call add mov pop retn
mov mov mov push mov push push push push call add mov pop retn
xor retn
mov retn
mov retn
mov retn
mov mov mov retn
mov test jz mov mov push push call mov mov retn
push push push push call push call mov push push push mov call mov retn
mov mov mov retn
mov test jz mov mov push push call mov mov retn
push push push push call mov push push push mov call push call mov retn

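Code for extracting the instruction ngrams and reducing dimensionality:

from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
from sklearn.pipeline import Pipeline

# alternate_sign=False replaces non_negative=True, which older
# scikit-learn releases used for the same purpose.
vectorizer = HashingVectorizer(
    input="content", lowercase=True, stop_words=None, ngram_range=(1, 2),
    analyzer="word", n_features=2**25, binary=False, norm=None,
    alternate_sign=False)

pipe = Pipeline([
    ("extraction", CustomExtractor(vectorizer=vectorizer)),
    ("sel", VarianceThreshold(threshold=0)),
    ("tfidf", TfidfTransformer(norm="l2", use_idf=True, smooth_idf=True,
                               sublinear_tf=True)),
    ("kbest", SelectKBest(score_func=f_classif, k=500))])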
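To show what ngram_range=(1, 2) with analyzer="word" means for a stream of mnemonics, the sketch below counts unigrams and bigrams by hand. `word_ngrams` is a hypothetical helper written for this example, and exact counting stands in for the hashed counts the vectorizer would produce.

```python
from collections import Counter


def word_ngrams(tokens, ngram_range=(1, 2)):
    """Yield every ngram of `tokens` whose length lies in ngram_range."""
    lo, hi = ngram_range
    for n in range(lo, hi + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


instructions = "push mov mov call test jz push call add mov pop retn"
counts = Counter(word_ngrams(instructions.split()))
print(counts["mov"], counts["push mov"], counts["mov mov"])
```

Each mnemonic acts as a "word", so the text classification tooling applies unchanged once the disassembly is reduced to this kind of string.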

Named Features

Section Names, Imports, Imported Functions.
Extracted these features with regular expressions.
Features were (awkwardly) selected in the same step as instruction ngrams.

import re

# For each named feature: a regex that finds it, a function that extracts
# the name from the match, and a filter applied to the extracted name.
re_features = {
    "imports": {
        "re": re.compile(r"Imports from \w.+"),
        "extract": lambda m: m.group().split()[-1],
        "filter": lambda name: True,
    },
    "imported_functions": {
        "re": re.compile(r"__stdcall \w.+\("),
        "extract": lambda m: m.group().split()[-1][:-1],
        "filter": lambda name: not name.startswith("sub_"),
    },
    "section_names": {
        "re": re.compile(r"^\S+?:"),
        "extract": lambda m: m.group()[:-1],
        "filter": lambda name: True,
    },
}

from toolz import pipe, unique
from toolz.curried import map, filter


def process_re_feature(lines, re_dict):
    # Search every line, keep the matches, extract and filter the names,
    # and return each distinct name once.
    return pipe(
        lines,
        map(re_dict["re"].search),
        filter(lambda m: m is not None),
        map(re_dict["extract"]),
        filter(re_dict["filter"]),
        unique)
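The sketch below applies the same three regular expressions to a few lines from the earlier listings, using only the standard library so it runs without toolz. The helper `unique_matches` and the hand-picked `lines` are assumptions made for illustration; it mirrors the search, extract, filter, unique steps rather than reproducing the talk's code.

```python
import re

IMPORT_RE = re.compile(r"Imports from \w.+")
FUNCTION_RE = re.compile(r"__stdcall \w.+\(")
SECTION_RE = re.compile(r"^\S+?:")

lines = [
    ".idata:0046F4DC ; Imports from KERNEL32.DLL",
    ".idata:0046F4DC ; DWORD __stdcall GetCurrentThreadId()",
    ".idata:0046F4E8 ; LPVOID __stdcall VirtualAlloc(LPVOID lpAddress, SIZE_T dwSize, DWORD ...",
    ".text:00470050 55 push ebp",
]


def unique_matches(lines, pattern, extract, keep=lambda name: True):
    """Search each line, extract a name, filter it, keep first occurrences."""
    seen = []
    for line in lines:
        m = pattern.search(line)
        if m is None:
            continue
        name = extract(m)
        if keep(name) and name not in seen:
            seen.append(name)
    return seen


imports = unique_matches(lines, IMPORT_RE, lambda m: m.group().split()[-1])
functions = unique_matches(
    lines, FUNCTION_RE, lambda m: m.group().split()[-1][:-1],
    keep=lambda name: not name.startswith("sub_"))
sections = unique_matches(lines, SECTION_RE, lambda m: m.group()[:-1])
print(imports, functions, sections)
```

Each resulting name can then be treated as a token, which is how these features end up in the same selection step as the instruction ngrams.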

Manual Features

Sample 0A32eTdBKayjCWhZqDOQ:

{
    "number_of_collapsed_functions": 451,
    "number_of_imported_functions": 101,
    "sample_length": 1201668,
    "number_of_imports": 4,
    "number_of_sections": 4,
    "section_length_0": 979764,
    ...
    "section_length_6": 0,
    "length_of_functions_0": 2706,
    ...
    "length_of_functions_15": 107
}

Final Model

Gradient Boosting Classifier on 1026 features
Grid search optimized parameters
Also tried: LogisticRegression, MultinomialNB, KNeighborsClassifier, RandomForestClassifier

from sklearn.ensemble import GradientBoostingClassifier

clf = GradientBoostingClassifier(
    loss="log_loss",  # called "deviance" before scikit-learn 1.1
    learning_rate=0.1, n_estimators=300, subsample=0.9,
    min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0,
    max_depth=3, init=None, random_state=None, max_features=200,
    max_leaf_nodes=None, warm_start=False, verbose=2)

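Final Model tSNE Plot

from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import TSNE
from sklearn.pipeline import Pipeline

# Reduce to 50 dimensions with truncated SVD, then embed in 2D with t-SNE.
# Note: n_iter is named max_iter in recent scikit-learn releases.
pipe = Pipeline([
    ("tsvd", TruncatedSVD(n_components=50)),
    ("tsne", TSNE(n_components=2, perplexity=40.0, early_exaggeration=4.0,
                  learning_rate=1000.0, n_iter=1000, metric="euclidean",
                  init="random"))])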

Results:

I did OK…
More focused on productization

Winning Strategies

xgboost
malware as an image
compression ratio as a feature
other expanded feature sets
probability calibration
semi supervised learning

Slide labels for these strategies: "usable in a product" and "specific to competitions"
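As an illustration of the "compression ratio as a feature" item, the sketch below computes compressed size over original size with zlib. The slides do not say which compressor or which definition of the ratio the winning entries used, so the function `compression_ratio`, the zlib choice and the ratio direction are all assumptions.

```python
import os
import zlib


def compression_ratio(data, level=9):
    """Compressed size divided by original size (lower = more redundant)."""
    if not data:
        return 0.0
    return len(zlib.compress(data, level)) / len(data)


repetitive = b"\x00" * 4096       # highly redundant content
random_like = os.urandom(4096)    # stands in for packed or encrypted content
print(compression_ratio(repetitive), compression_ratio(random_like))
```

Content that is already packed or encrypted barely compresses, so a single number like this can hint at how a sample was built.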

Using Capstone

from capstone import Cs, CS_ARCH_X86, CS_MODE_32

# Join the hex bytes of every line, dropping the offset and "??" placeholders.
with open("data/sample/0A32eTdBKayjCWhZqDOQ.bytes", "r") as f:
    code = bytes.fromhex("".join(
        "".join(line.split()[1:]).replace("?", "") for line in f))

md = Cs(CS_ARCH_X86, CS_MODE_32)
# disasm_lite yields (address, size, mnemonic, op_str) tuples.
instructions = " ".join(
    t[2] for t in md.disasm_lite(code, 0x1000) if t[2] != "int3")

ida ******************************
CV Scores: [ 0.03800 0.02551 0.05283 0.03953 0.0350 ]
mean: 0.03817940685733493 std: 0.008799619405211161
capstone ******************************
CV Scores: [ 0.05065 0.0451 0.06953 0.05583 0.05089]
mean: 0.05441113231562615 std: 0.008283830117670508

Disassemblers

IDA: not (easily) batch distributable
capstone: single pass produces suboptimal results
radare2: Python scriptable reversing framework
vivisect: pure Python, largely undocumented

Slide label: disassembler and analysis project

Other Projects

pefile: extracts header information from executables
binglide: visualizations of entropy and byte ngrams
cuckoo: automated dynamic analysis
barf: binary analysis framework with code analysis


Thank You