Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Lab for Internet & Security Technology (LIST), Northwestern University

Upload: donat

Post on 13-Jan-2016


Page 1: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Lab for Internet & Security Technology (LIST)
Northwestern University

Page 2: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

The Spread of Sapphire/Slammer Worms

Page 3: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Desired Requirements for Polymorphic Worm Signature Generation

• Network-based signature generation
  – Worms spread at exponential speed, so detecting them at an early stage is crucial. However:
    » At their early stage there are only limited worm samples.
  – A high-speed network router may see more worm samples. But:
    » It needs to keep up with the network speed!
    » It can only use network-level information.

Page 4: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Desired Requirements for Polymorphic Worm Signature Generation

• Noise tolerant
  – Most network flow classifiers suffer false positives.
  – Even host-based approaches can be injected with noise.
• Attack resilience
  – Attackers always try to evade the detection systems.
• Efficient signature matching for high-speed links

No existing work satisfies these requirements!

Page 5: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Outline

• Motivation
• Hamsa Design
• Model-based Signature Generation
• Evaluation
• Related Work
• Conclusion

Page 6: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Choice of Signatures

• Two classes of signatures
  – Content based
    » Token: a substring with reasonable coverage of the suspicious traffic
    » Signature: a conjunction of tokens
  – Behavior based
• Our choice: content based
  – Fast signature matching: an ASIC-based approach can achieve 6 ~ 8 Gb/s
  – Generic, independent of any protocol or server
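
A token-conjunction signature can be checked against a flow in a few lines. The sketch below is illustrative only: the function name `matches_signature` and the two sample flows are assumptions, and the signature is written as a token-to-count mapping in the style of the results table in the evaluation slides.

```python
def matches_signature(flow: bytes, signature: dict[bytes, int]) -> bool:
    """Return True if every token occurs in the flow at least the required number of times."""
    # bytes.count counts non-overlapping occurrences, which is enough for a sketch.
    return all(flow.count(token) >= times for token, times in signature.items())


# Hypothetical signature and flows, for illustration only.
signature = {b".ida?": 1, b"GET /": 1, b"%u": 2}
worm_like = b"GET /default.ida?XXXX%u9090%u6858 HTTP/1.0\r\n"
benign = b"GET /index.html HTTP/1.0\r\n"

print(matches_signature(worm_like, signature))
print(matches_signature(benign, signature))
```

The order of the tokens is ignored; only their presence and multiplicity matter, which is what makes this kind of matching cheap.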

Page 7: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Unique Invariants of Worms

• Protocol frame
  – The code path to the vulnerable part, usually infrequently used
  – Code-Red II: ‘.ida?’ or ‘.idq?’
• Control data: leading to control-flow hijacking
  – Hard-coded value to overwrite a jump target or a function call
• Worm executable payload
  – CLET polymorphic engine: ‘0\x8b’, ‘\xff\xff\xff’ and ‘t\x07\xeb’
• It is possible to have worms with no such invariants, but very hard

Page 8: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Hamsa Architecture

[Figure: Hamsa architecture. Labeled components: Network Tap, Protocol Classifier (per-protocol streams TCP 25, TCP 53, TCP 80, ..., TCP 137, UDP 1434), Known Worm Filter, Worm Flow Classifier, Normal traffic reservoir, Normal Traffic Pool, Suspicious Traffic Pool, Hamsa Signature Generator, Signatures. Edge labels: "Real time", "Policy driven".]

Page 9: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Components from existing work

• Worm flow classifiers
  – Scan based detector [Autograph]
  – Byte spectrum based approach [PAYL]
  – Honeynet/Honeyfarm sensors [Honeycomb]

Page 10: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Hamsa Design

• Key idea: model the uniqueness of worm invariants
  – Greedy algorithm for finding token conjunction signatures
• Highly accurate while much faster
  – Both analytically and experimentally
  – Compared with the latest work, Polygraph
  – Suffix array based token extraction
• Provable attack resilience guarantee
• Noise tolerant

Page 11: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Outline

• Motivation
• Hamsa Design
• Model-based Signature Generation
• Evaluation
• Related Work
• Conclusion

Page 12: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Hamsa Signature Generator

• Core part: Model-based Greedy Signature Generation
• Iterative approach for multiple worms

[Figure: signature generator flowchart. Labels: Suspicious Traffic Pool, Normal Traffic Pool, Token Extractor, Tokens, Token Identification, Core, Signature Refiner, Signature, Filter, decision "Pool size too small?" with branches NO and YES (Quit).]

Page 13: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Problem Formulation

[Figure: the Signature Generator takes the suspicious pool, the normal pool and a false positive bound ρ, and outputs a signature.]

• Maximize the coverage in the suspicious pool
• False positive in the normal pool is bounded by ρ
• With noise: NP-Hard!
• Without noise: can be solved in linear time using token extraction

Page 14: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

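Model Uniqueness of Invariants

[Figure: candidate tokens with false positives (FP) of 21%, 9%, 17% and 5%, with t1 marked; joint FP with t1 of 2%, 0.5% and 1%, with t2 marked.]

• u(1) = upper bound of FP(t1); u(2) = upper bound of FP(t1, t2)
• The total number of tokens is bounded by k*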

Page 15: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Signature Generation Algorithm

[Figure: tokens are extracted from the suspicious pool and ordered by coverage; u(1) = 15%; t1 is marked.]

(COV, FP)
(82%, 50%)
(70%, 11%)
(67%, 30%)
(62%, 15%)
(50%, 25%)
(41%, 55%)
(36%, 41%)
(12%, 9%)

Page 16: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

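Signature Generation Algorithm

[Figure: after t1 is chosen, the remaining tokens are ordered by joint coverage with t1; u(2) = 7.5%; t2 is marked; t1 and t2 together form the signature.]

Ordered by coverage, with t1 marked:

(COV, FP)
(82%, 50%)
(70%, 11%)
(67%, 30%)
(62%, 15%)
(50%, 25%)
(41%, 55%)
(36%, 41%)
(12%, 9%)

Ordered by joint coverage with t1:

(COV, FP)
(69%, 9.8%)
(68%, 8.5%)
(67%, 1%)
(40%, 2.5%)
(35%, 12%)
(31%, 9%)
(10%, 0.5%)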
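
The two steps in this example can be sketched as a small greedy loop. This is a simplified illustration, not the authors' implementation: the function names, the stopping rule (stop once the joint false positive is at or below a final bound `rho`), and the toy pools are all assumptions. At step i it picks the token that maximizes joint coverage in the suspicious pool among those whose joint false positive in the normal pool stays within u(i).

```python
def fraction_matching(pool: list[bytes], tokens: list[bytes]) -> float:
    """Fraction of flows in the pool that contain every token."""
    if not pool:
        return 0.0
    return sum(all(t in flow for t in tokens) for flow in pool) / len(pool)


def greedy_signature(suspicious, normal, tokens, u, rho):
    """Greedy token-conjunction search; u[i] is the FP bound for step i+1."""
    signature = []
    for bound in u:
        best = None  # (coverage, false positive, token)
        for token in tokens:
            if token in signature:
                continue
            candidate = signature + [token]
            fp = fraction_matching(normal, candidate)
            if fp > bound:
                continue  # violates the u-bound for this step
            cov = fraction_matching(suspicious, candidate)
            if best is None or cov > best[0]:
                best = (cov, fp, token)
        if best is None:
            return None  # no token satisfies the bound
        signature.append(best[2])
        if best[1] <= rho:
            return signature  # assumed stopping rule
    return None


# Toy pools, for illustration only.
suspicious = [b"GET /a.ida?x", b"GET /b.ida?y", b"GET /index"]
normal = [b"GET /index", b"GET /home", b"POST /form", b"GET /img"]
print(greedy_signature(suspicious, normal, [b"GET /", b".ida?"], u=[0.8, 0.1], rho=0.05))
```

In the toy run, `GET /` is picked first because it has the highest coverage within the loose first bound, and `.ida?` is added second to bring the joint false positive down.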

Page 17: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Algorithm Runtime Analysis

• Preprocessing needs: O(m + n + T*l + T*(|M|+|N|))
• Running time: O(T*(|M|+|N|))
  – In most cases |M| < |N|, so it reduces to O(T*|N|)

T: the # of tokens
l: the maximum length of tokens
|M|: the # of flows in the suspicious pool
|N|: the # of flows in the normal pool
m: the # of bytes in the suspicious pool
n: the # of bytes in the normal pool

Page 18: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Provable Attack Resilience Guarantee

• Proved the worst-case bound on false negatives given the false positive
• Analytically bound the worst that attackers can do!
• Example: k*=5, u(1)=0.2, u(2)=0.08, u(3)=0.04, u(4)=0.02, u(5)=0.01 and ρ=0.01

Noise ratio   FP upper bound   FN upper bound
5%            1%               1.84%
10%           1%               3.89%
20%           1%               8.75%

• The better the flow classifier, the lower the false negatives

Page 19: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Attack Resilience Assumptions

• Common assumptions for any signature generation system
  1. The attacker cannot control which worm samples are encountered by Hamsa
  2. The attacker cannot control which worm samples encountered will be classified as worm samples by the flow classifier
• Unique assumptions for token-based schemes
  1. The attacker cannot change the frequency of tokens in normal traffic
  2. The attacker cannot control which normal samples encountered are classified as worm samples by the worm flow classifier

Page 20: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Attack Resilience Assumptions

• Attacks on the flow classifier
  – Our approach does not depend on perfect flow classifiers
  – But with 99% noise, no approach can work!
  – High noise injection makes the worm propagate less efficiently.
• Enhance flow classifiers
  – Cluster suspicious flows by return messages
  – Information theory based approaches (DePaul Univ)

Page 21: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Generalizing Signature Generation with noise

• BEST signature = balanced signature
  – Balance the sensitivity with the specificity
  – Introduce a scoring function score(cov, fp, …) to evaluate the goodness of a signature
  – Currently used:
      $\mathrm{score}(COV, FP, LEN) = -\log_{10}(\delta + FP) + a \cdot COV + b \cdot LEN$
    » Intuition: it is better to reduce the coverage by 1/a if the false positive becomes 10 times smaller.
    » Add some weight to the length of the signature (LEN) to break ties between signatures with the same coverage and false positive
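
The trade-off described here can be checked numerically. The sketch below assumes the score has the form −log10(δ + FP) + a·COV + b·LEN; the default values chosen for `a`, `b` and `delta` are placeholders for illustration, not values stated on the slide.

```python
import math


def score(cov: float, fp: float, length: int,
          a: float = 20.0, b: float = 0.01, delta: float = 1e-6) -> float:
    """Balanced-signature score: reward coverage and length, penalize false positives."""
    # delta keeps the logarithm finite when fp is exactly zero.
    return -math.log10(delta + fp) + a * cov + b * length


# With delta = 0, giving up 1/a of coverage for a 10x smaller false positive is score-neutral.
base = score(0.90, 0.01, 5, a=20.0, delta=0.0)
traded = score(0.90 - 1 / 20.0, 0.001, 5, a=20.0, delta=0.0)
print(base, traded)
```

The small weight on LEN only matters when two candidate signatures tie on coverage and false positive, in which case the longer one wins.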

Page 22: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Hamsa Signature Generator

Next: Token extraction and token identification

[Figure: signature generator flowchart. Labels: Suspicious Traffic Pool, Normal Traffic Pool, Token Extractor, Tokens, Token Identification, Core, Signature Refiner, Signature, Filter, decision "Pool size too small?" with branches NO and YES (Quit).]

Page 23: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Token Extraction

• Problem formulation:
  – Input: a set of strings, a minimum length l and a minimum coverage COVmin
  – Output:
    » A set of tokens (substrings) that meet the minimum length and coverage requirements
      • Coverage: the portion of strings containing the token
    » The corresponding sample vector for each token
• Main techniques:
  – Suffix array
  – LCP (Longest Common Prefix) array and LCP intervals
  – Token Extraction Algorithm (TEA)
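
To make the input and output of this problem concrete, here is a brute-force reference in Python. It is deliberately not the suffix-array-based TEA: it enumerates every substring, so it is only suitable for tiny inputs. The function name and the representation of a sample vector as a list of booleans are assumptions.

```python
def extract_tokens(samples: list[bytes], min_len: int, min_cov: float) -> dict[bytes, list[bool]]:
    """Map each qualifying token to its sample vector (which samples contain it)."""
    if not samples:
        return {}
    # Enumerate every substring of at least min_len bytes.
    candidates = set()
    for sample in samples:
        for start in range(len(sample)):
            for end in range(start + min_len, len(sample) + 1):
                candidates.add(sample[start:end])
    tokens = {}
    for token in candidates:
        vector = [token in sample for sample in samples]
        if sum(vector) / len(samples) >= min_cov:
            tokens[token] = vector
    return tokens


result = extract_tokens([b"abrac", b"adabra"], min_len=2, min_cov=1.0)
print(sorted(result))
```

Note that this sketch also reports tokens that are substrings of longer qualifying tokens; it makes no attempt to prune them.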

Page 24: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Suffix Array

• Illustration by an example
  – String1: abrac, String2: adabra
  – Concatenated: abracadabra$
  – All suffixes: a$, ra$, bra$, abra$, dabra$, …
  – Sort all the suffixes (table below)
  – The suffix array takes 4n space
  – Sorting can be done in 4n space and O(n log(n)) time

Sorted suffix   Start position
a               10
abra            7
abracadabra     0
acadabra        3
adabra          5
bra             8
bracadabra      1
cadabra         4
dabra           6
ra              9
racadabra       2
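
The table above can be reproduced with a naive construction. This is a sketch for illustration: it sorts suffix slices directly, which is far slower than the O(n log(n)) construction mentioned on the slide, and the function name is an assumption.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of text, in lexicographic order."""
    # Naive: compares whole suffix slices, fine for short examples only.
    return sorted(range(len(text)), key=lambda i: text[i:])


text = "abrac" + "adabra"  # the two example strings, concatenated
suffix_array = build_suffix_array(text)
for pos in suffix_array:
    print(f"{text[pos:]:<14}{pos}")
```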

Page 25: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

LCP Array and LCP Intervals

Suffixes      sufarr  lcparr  idx  str
a             10      - (0)   0    2
abra          7       1       1    2
abracadabra   0       4       2    1
acadabra      3       1       3    1
adabra        5       1       4    2
bra           8       0       5    2
bracadabra    1       3       6    1
cadabra       4       0       7    1
dabra         6       0       8    2
ra            9       0       9    2
racadabra     2       2       10   1

LCP intervals, written as lcp-[left, right]:

0-[0,10]
  1-[0,4]
    4-[1,2]
  3-[5,6]
  2-[9,10]

LCP intervals => tokens
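
The lcparr column and the step from an LCP interval to a token can be sketched as follows. The helper names are assumptions, the construction is naive, and a real implementation would also need to keep tokens from spanning the boundary between the two concatenated strings.

```python
def build_suffix_array(text: str) -> list[int]:
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_lcp_array(text: str, sa: list[int]) -> list[int]:
    """lcp[k] = length of the common prefix of the suffixes at sa[k-1] and sa[k]; lcp[0] = 0."""
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = text[sa[k - 1]:], text[sa[k]:]
        n = 0
        while n < len(a) and n < len(b) and a[n] == b[n]:
            n += 1
        lcp[k] = n
    return lcp


strings = ["abrac", "adabra"]
text = "".join(strings)
sa = build_suffix_array(text)
lcp = build_lcp_array(text, sa)

# Read the token off the interval 4-[1,2]: suffixes idx 1..2 share a prefix of length 4.
left, right, depth = 1, 2, 4
token = text[sa[left]:sa[left] + depth]
# A suffix starting before len(strings[0]) belongs to string 1, otherwise to string 2.
owners = {1 if sa[k] < len(strings[0]) else 2 for k in range(left, right + 1)}
coverage = len(owners) / len(strings)
print(lcp, token, owners, coverage)
```

The set of owning strings is what gives the token's coverage and its sample vector.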

Page 26: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Token Extraction Algorithm (TEA)

• Find eligible LCP intervals first
• Then find the tokens

Page 27: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Token Extraction Algorithm (TEA)

Page 28: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Token Extraction Algorithm (TEA)

Page 29: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Token Identification

• For normal traffic, pre-compute and store the suffix array offline
• For a given token, a binary search in the suffix array gives the corresponding LCP interval
• O(log(n)) time complexity
  – A more sophisticated O(1) algorithm is possible, but may require more space
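
The binary search can be sketched as two bound searches over the suffix array. This is an illustrative sketch with assumed function names; it returns the half-open range of suffix-array indices whose suffixes start with the token, from which the occurrence positions follow.

```python
def build_suffix_array(text: str) -> list[int]:
    return sorted(range(len(text)), key=lambda i: text[i:])


def occurrence_range(text: str, sa: list[int], token: str) -> tuple[int, int]:
    """Return (start, end) such that sa[start:end] lists every occurrence of token."""
    def prefix(k: int) -> str:
        return text[sa[k]:sa[k] + len(token)]

    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose prefix is >= token
        mid = (lo + hi) // 2
        if prefix(mid) < token:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose prefix is > token
        mid = (lo + hi) // 2
        if prefix(mid) <= token:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


text = "abracadabra"
sa = build_suffix_array(text)
start, end = occurrence_range(text, sa, "abra")
print(start, end, sorted(sa[start:end]))
```

Each comparison inspects at most `len(token)` characters, so a lookup costs O(len(token) · log(n)) character comparisons in this simple form.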

Page 30: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Implementation Details

• Token extraction: extract a set of tokens with minimum length l and minimum coverage COVmin.
  – Polygraph uses a suffix tree based approach: 20n space and time consuming.
  – Our approach: enhanced suffix array, 8n space and much faster (at least 20 times)!
• Calculate the false positive when checking the u-bounds (token identification)
  – Again a suffix array based approach, but for a 300 MB normal pool the 1.2 GB suffix array is still large!
  – Optimization: using MMAP; memory usage: 150 ~ 250 MB
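
One way to read a large suffix array without loading it all into memory is to memory-map the file and decode entries on demand. The sketch below uses Python's standard `mmap` and `struct` modules; the on-disk layout (one little-endian 4-byte offset per suffix) and the class name are assumptions, not the authors' format.

```python
import mmap
import struct

ENTRY = struct.Struct("<I")  # assumed layout: one 4-byte unsigned offset per suffix


def write_suffix_array(path: str, suffix_array: list[int]) -> None:
    """Store the suffix array as a flat binary file."""
    with open(path, "wb") as f:
        for pos in suffix_array:
            f.write(ENTRY.pack(pos))


class MappedSuffixArray:
    """Read-only view of a suffix array file; pages are loaded lazily by the OS."""

    def __init__(self, path: str) -> None:
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self) -> int:
        return len(self._map) // ENTRY.size

    def __getitem__(self, index: int) -> int:
        if not 0 <= index < len(self):
            raise IndexError(index)
        return ENTRY.unpack_from(self._map, index * ENTRY.size)[0]

    def close(self) -> None:
        self._map.close()
        self._file.close()
```

A binary search touches only O(log(n)) entries per lookup, so only a small part of the mapped file needs to be resident at any time.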

Page 31: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Hamsa Signature Generator

Next: signature refinement

[Figure: signature generator flowchart. Labels: Suspicious Traffic Pool, Normal Traffic Pool, Token Extractor, Tokens, Token Identification, Core, Signature Refiner, Signature, Filter, decision "Pool size too small?" with branches NO and YES (Quit).]

Page 32: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Signature Refinement

• Why refinement?
  – Produce a signature with the same sensitivity but better specificity
• How?
  – After using the core algorithm to get the greedy signature, we assume that the samples matched by the greedy signature are all worm samples
  – This reduces the problem to signature generation without noise: do another round of token extraction

Page 33: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Extend to Detect Multiple Worms

• Iteratively use the single-worm detector to detect multiple worms
  – At the first iteration, the algorithm finds the signature for the most popular worm in the suspicious pool.
  – All other worms and normal traffic are treated as noise
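
The iteration can be sketched as a loop that generates one signature, filters out the flows it matches, and repeats until the pool is too small. Everything below is a hedged illustration: `generate_one` stands in for the single-worm generator, and `toy_generator` is a made-up placeholder used only to make the example runnable.

```python
from collections import Counter


def matches(flow: bytes, signature: list[bytes]) -> bool:
    return all(token in flow for token in signature)


def generate_all_signatures(suspicious, generate_one, min_pool_size):
    """Repeatedly run a single-worm generator, filtering matched flows each round."""
    signatures = []
    pool = list(suspicious)
    while len(pool) >= min_pool_size:  # "pool size too small?" check
        signature = generate_one(pool)
        if signature is None:
            break
        remaining = [flow for flow in pool if not matches(flow, signature)]
        if len(remaining) == len(pool):
            break  # signature matched nothing; avoid looping forever
        signatures.append(signature)
        pool = remaining
    return signatures


def toy_generator(pool):
    """Placeholder generator: the most common 4-byte prefix in the pool."""
    prefix, _count = Counter(flow[:4] for flow in pool).most_common(1)[0]
    return [prefix]


flows = [b"AAAA1", b"AAAA2", b"AAAA3", b"BBBB1", b"BBBB2", b"zz"]
print(generate_all_signatures(flows, toy_generator, min_pool_size=2))
```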

Page 34: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Practical Issues on Data Normalization

• Typical cases that need data normalization
  – IP packet fragmentation
  – TCP flow reassembly (defend against fragroute)
  – RPC fragmentation
  – URL obfuscation
  – HTML obfuscation
  – Telnet/FTP evasion by \backspace or \delete keys
• Normalization translates data into the canonical form
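
Two of the cases above are simple enough to sketch. The example below percent-decodes an obfuscated URL using the standard library and applies backspace characters in a Telnet/FTP-style byte stream. It is a minimal illustration under assumed function names, not a complete normalizer.

```python
from urllib.parse import unquote_to_bytes


def normalize_url(raw: bytes) -> bytes:
    """Undo percent-encoding so that '%2e' and '.' look the same to the token matcher."""
    return unquote_to_bytes(raw)


def apply_backspaces(data: bytes) -> bytes:
    """Apply backspace (0x08) characters the way a terminal would."""
    out = bytearray()
    for byte in data:
        if byte == 0x08:
            if out:
                out.pop()  # erase the previous byte
        else:
            out.append(byte)
    return bytes(out)


print(normalize_url(b"GET /default%2Eida%3F"))
print(apply_backspaces(b"roxx\x08\x08ot"))
```

After normalization, differently encoded copies of the same worm map to the same byte string, so they share invariant tokens.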

Page 35: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Practical Issues on Data Normalization (II)

• Hamsa with data normalization works better
• Without or with weak data normalization, Hamsa still works
  – But because the data may have different forms of encoding, it may produce multiple signatures for a single worm
  – It needs sufficient samples for each form of encoding

Page 36: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Outline

• Motivation
• Hamsa Design
• Model-based Signature Generation
• Evaluation
• Related Work
• Conclusion

Page 37: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Experiment Methodology

• Experimental setup:
  – Suspicious pool:
    » Three pseudo polymorphic worms based on real exploits (Code-Red II, Apache-Knacker and ATPhttpd)
    » Two polymorphic engines from the Internet (CLET and TAPiON)
  – Normal pool: 2-hour departmental HTTP trace (326 MB)
• Signature evaluation:
  – False negative: 5000 generated worm samples per worm
  – False positive:
    » 4-day departmental HTTP trace (12.6 GB)
    » 3.7 GB of web crawling, including .mp3, .rm, .ppt, .pdf, .swf etc.
    » /usr/bin of Linux Fedora Core 4

Page 38: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Results on Signature Quality

• Single worm with noise
  – Suspicious pool size: 100 and 200 samples
  – Noise ratio: 0%, 10%, 30%, 50%, 70%
  – Noise samples randomly picked from the normal pool
  – We always get the signatures and accuracy shown below.

Worm          Training FN   Training FP   Evaluation FN   Evaluation FP   Binary evaluation FP
Code-Red II   0             0             0               0               0
CLET          0             0.109%        0               0.06236%        0.268%

Signatures:
Code-Red II: {'.ida?': 1, '%u780': 1, ' HTTP/1.0\r\n': 1, 'GET /': 1, '%u': 2}
CLET: {'0\x8b': 1, '\xff\xff\xff': 1, 't\x07\xeb': 1}

Page 39: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Results on Signature Quality (II)

• Suspicious pool with high noise ratio:
  – For noise ratios of 50% and 70%, we sometimes produce two signatures: one is the true worm signature, another comes solely from noise, due to the locality of the noise.
  – The false positives of these noise signatures have to be very small:
    » Mean: 0.09%
    » Maximum: 0.7%
• Multiple worms with noise give similar results

Page 40: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Experiment: U-bound evaluation

• $u(i) = u(1) \cdot u_r^{\,i-1}, \quad 1 \le i \le k^*$
• To be conservative we chose k* = 15.
  – u(k*) = u(15) = 9.16*10^-6.
• u(1) and ur evaluation
  – We tested u(1) = [0.02, 0.04, 0.06, 0.08, 0.10, 0.20, 0.30, 0.40, 0.5]
  – and ur = [0.20, 0.40, 0.60, 0.8].
  – The minimum (u(1), ur) that works for all our worms was (0.08, 0.20)
  – In practice, we use the conservative value (0.15, 0.5)
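
The quoted value of u(15) can be checked directly, assuming the geometric form u(i) = u(1) · ur^(i−1) for 1 ≤ i ≤ k* and the conservative setting (u(1), ur) = (0.15, 0.5). The function name is an assumption.

```python
def u_bounds(u1: float, ur: float, k_star: int) -> list[float]:
    """Return [u(1), ..., u(k*)] for the geometric bound u(i) = u1 * ur**(i - 1)."""
    return [u1 * ur ** (i - 1) for i in range(1, k_star + 1)]


bounds = u_bounds(0.15, 0.5, 15)
print(f"u(1)  = {bounds[0]}")
print(f"u(15) = {bounds[-1]:.3e}")
```

With these parameters u(15) = 0.15 / 2^14, which agrees with the 9.16*10^-6 on the slide.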

Page 41: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Speed Results

• Implementation in C++/Python
  – 500 samples with 20% noise and a 100 MB normal traffic pool: 15 seconds on a 2.8 GHz Xeon, 112 MB memory consumption
• Speed comparison with Polygraph
  – Asymptotic runtime: O(T) vs. O(|M|^2); when |M| increases, T won’t increase as fast as |M|!
  – Experimental: 64 to 361 times faster (Polygraph vs. ours, both in Python)

[Figure: the number of tokens (y-axis, 0 to 3000) vs. pool size (x-axis, 100 to 400), with one curve each for 20%, 30%, 40% and 50% noise.]

Page 42: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Experiment: Sample requirement

• Coincidental-pattern attack [Polygraph]
• Results
  – For the three pseudo worms, 10 samples give good results
  – CLET and TAPiON need at least 50 samples
• Conclusion
  – For better signatures, to be conservative, at least 100+ samples are needed
  – This requires scalable and fast signature generation!

Page 43: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Token-fit Attack Can Fail Polygraph

• Polygraph: hierarchical clustering to find signatures with the smallest false positives
• With the token distribution of the noise in the suspicious pool, the attacker can make the worm samples look more like noise traffic
  – Different worm samples encode different noise tokens
• Our approach can still work!

Page 44: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Token-fit attack could make Polygraph fail

[Figure: noise samples N1, N2, N3 and worm samples W1, W2, W3 are clustered into Merge Candidate 1, Merge Candidate 2 and Merge Candidate 3.]

• CANNOT merge further!
• NO true signature found!

Page 45: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Experiment: Token-fit attack

• Suspicious pool of 50 samples with 50% noise
• Craft different worm samples to look like different noise samples.
• Results
  – Polygraph: 100% false negative
  – Hamsa can still get the correct signature as before!

Page 46: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Outline

• Motivation
• Hamsa Design
• Model-based Signature Generation
• Evaluation
• Related Work
• Conclusion

Page 47: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Related works

                             Hamsa            Polygraph        CFG              PADS             Nemean             COVERS           Malware Detection
Network or host based        Network          Network          Network          Host             Host               Host             Host
Content or behavior based    Content based    Content based    Behavior based   Content based    Content based      Behavior based   Behavior based
Noise tolerance              Yes              Yes (slow)       Yes              No               No                 Yes              Yes
Multi worms in one protocol  Yes              Yes (slow)       Yes              No               Yes                Yes              Yes
On-line sig matching         Fast             Fast             Slow             Fast             Fast               Fast             Slow
Generality                   General purpose  General purpose  General purpose  General purpose  Protocol specific  Server specific  General purpose
Provable atk resilience      Yes              No               No               No               No                 No               No

Information exploited

Page 48: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Conclusion

• Network-based signature generation and matching are important and challenging
• Hamsa: automated signature generation
  – Fast
  – Noise tolerant
  – Provable attack resilience
  – Capable of detecting multiple worms in a single application protocol
• Proposed a model to describe the worm invariants

Page 49: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Questions ?

Page 50: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Results on Signature Quality (II)

• Suspicious pool with high noise ratio:
  – For noise ratios of 50% and 70%, we sometimes produce two signatures: one is the true worm signature, another comes solely from noise.
  – The false positives of these noise signatures have to be very small:
    » Mean: 0.09%
    » Maximum: 0.7%
• Multiple worms with noise give similar results

Page 51: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Normal Traffic Poisoning Attack

• We found our approach is not sensitive to the normal traffic pool used
• History: a time window of the last 6 months
• The attacker has to poison the normal traffic 6 months ahead!
• In 6 months the vulnerability may have been patched!
• Poisoning a popular protocol is very difficult.

Page 52: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Red Herring Attack

• Hard to implement
• Dynamic updating problem. Again, our approach is fast
• Partial signature matching, in the extended version.

Page 53: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Coincidental Attack

• As mentioned in the Polygraph paper, increase the sample requirement
• Again, our approach is scalable and fast

Page 54: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Model Uniqueness of Invariants

• Let the worm have a set of invariants. Determine their order by:
  – t1: the token with minimum false positive in normal traffic. u(1) is the upper bound of the false positive of t1
      $FP(\{t_1\}) \le FP(\{t_j\}) \quad \forall j$
  – t2: the token with minimum joint false positive with t1. FP({t1,t2}) is bounded by u(2)
      $FP(\{t_1, t_2\}) \le FP(\{t_1, t_j\}) \quad \forall j > 1$
  – ti: the token with minimum joint false positive with {t1, t2, …, ti-1}. FP({t1,t2,…,ti}) is bounded by u(i)
      $FP(\{t_1, \dots, t_i\}) \le FP(\{t_1, \dots, t_{i-1}, t_j\}) \quad \forall j > i-1$
• The total number of tokens is bounded by k*

Page 55: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Problem Formulation

Noisy Token Multiset Signature Generation Problem:

INPUT: Suspicious pool M and normal traffic pool N; value ρ < 1.
OUTPUT: A multi-set of tokens signature S={(t1, n1), . . . (tk, nk)} such that the signature maximizes the coverage in the suspicious pool and the false positive in the normal pool is less than ρ.

• Without noise, a polynomial time algorithm exists
• With noise, NP-Hard

Page 56: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Generalizing Signature Generation with noise

• BEST signature = balanced signature
  – Balance the sensitivity with the specificity
  – But how? Introduce a scoring function score(cov, fp, …) to evaluate the goodness of a signature
  – Currently used:
      $\mathrm{score}(COV, FP, LEN) = -\log_{10}(\delta + FP) + a \cdot COV + b \cdot LEN$
    » Intuition: it is better to reduce the coverage by 1/a if the false positive becomes 10 times smaller.
    » Add some weight to the length of the signature (LEN) to break ties between signatures with the same coverage and false positive

Page 57: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Generalizing Signature Generation with noise

• Algorithm: similar

• Running time: same as previous simple form

• Attack Resilience Guarantee: similar

Page 58: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Extension to multiple worms

• Iteratively use the single-worm detector to detect multiple worms
  – At the first iteration, the algorithm finds the signature for the most popular worm in the suspicious pool. All other worms and normal traffic are treated as noise.
  – Though the analysis for the single worm can apply to multiple worms, the bounds are not very promising. Reason: high noise ratio

Page 59: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Token Extraction

• Extract a set of tokens with minimum length lmin and coverage COVmin, and for each token output the frequency vector.
• Polygraph uses a suffix tree based approach: 20n space and time consuming.
• Our approach:
  – Enhanced suffix array, 4n space
  – Much faster, at least 50(UPDATE) times!
  – Can apply to Polygraph also.

Page 60: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Calculate the false positive

• We need the false positive to check the u-bounds
• Again a suffix array based approach, but for a 300 MB normal pool the 1.2 GB suffix array is still large!
• Improvements
  – Caching
  – MMAP the suffix array. True memory usage: 150 ~ 250 MB.
  – 2-level normal pool
  – Hardware based fast string matching
  – Compress the normal pool and run string matching algorithms directly over the compressed strings

Page 61: Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Future works

• Enhance the flow classifiers
  – Cluster suspicious flows by return messages
  – Malicious flow verification by replaying to Address Space Randomization enabled servers.