Looking for APTs? C&C is a good place to start

Moshe Zioni, @dalmoz_

Upload: moshe-zioni

Post on 13-Apr-2017


TRANSCRIPT

LOOKING FOR APTS? C&C IS A GOOD PLACE TO START
MOSHE ZIONI, @DALMOZ_

ON THE AGENDA
• Why focus on C&C?
• C&C - Landscape
• Trends in C&C implementations
• Traditional Approaches
  – Signature based detection
  – Anomaly based detection
• Our approach
• Limitations
• Proof-of-Concept results
• Takeaways
• Q&A

WHOAMI – CREDITS & KUDOS
• Moshe Zioni ( @dalmoz_ )
• Leading a terrific group of talented researchers at […]
• Researching and developing a cutting-edge, next-generation solution for detecting malicious activity in very large enterprises and ISPs.
• Credit and kudos go to the Research team, especially Michael, Eddie, Maria, Oren and Vadim, and to the Analysis team.

WHY FOCUS ON C&C CHANNELS?
• Always present (almost)
• Network interception is practical, in contrast to other detection methods/layers
• While malware tends to be polymorphic, its communication protocol does not
• An old problem
  – Current detection schemes are not promising at detecting the ‘new’.
  – Traditional tactics rely heavily on somewhat naïve comparison.

C&C LANDSCAPE
[Chart: C&C observed protocol distribution; legend: HTTP, Native, DNS]

Name               Method
Dridex             HTTP -> pastebin + social networks
Nano Locker        ICMP
Poison Ivy         HTTP
FLAME              HTTP
CITADEL            HTTP
Bergard            HTTP
Vawtrack URLZONE   HTTP
BlackMoon          HTTP
Wekby              DNS
GOZ                P2P
DORKBOT            HTTP
DRIDEX             P2P
SIMDA              NATIVE + HTTP
REGIN              NATIVE (TCP + UDP) + ICMP + HTTP
SOUNDFIX-11        HTTP
JAKU               HTTP / NATIVE TCP / DNS

TRENDS IN C&C IMPLEMENTATIONS
• Rapid evolution, fast to respond
• Encryption of transmissions and payload
• Encapsulation of transmissions
• Steganography of messages
• P2P – Forget about SPOF

TRADITIONAL APPROACHES
• Signature based detection
  – Blacklists / known patterns
  – Constantly needs upkeep and maintenance
  – Low false positive rate
  – Forever relies on intelligence and analysis
  – Not suitable at all for finding ‘unknown’ schemes
  – High false negative rate
• Anomaly based detection
  – Markov models
  – ARMA
  – Baseline comparison
  – Assumes that normal traffic differs, in statistical modelling, from malicious traffic, so deviations might reveal novel schemes.
  – This assumption often fails under current trends.
  – High false positive rate
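The contrast between the two traditional approaches can be sketched in a few lines. This is a minimal illustration, not any product's logic: the host names, the byte counts and the 3-standard-deviation threshold are all made-up assumptions.

```python
import statistics

# Hypothetical blocklist of known C&C hosts (illustrative values only).
KNOWN_BAD_HOSTS = {"bad.example.net", "c2.example.org"}

def signature_match(host):
    """Signature-style check: flags only hosts that are already on the list."""
    return host.lower() in KNOWN_BAD_HOSTS

def baseline_anomaly(baseline, value, threshold=3.0):
    """Baseline comparison: flag a value that lies more than `threshold`
    standard deviations away from the mean of previously seen traffic."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold

# Bytes per session observed during a made-up "normal" period.
normal_bytes = [980.0, 1010.0, 1005.0, 995.0, 1020.0, 990.0]

print(signature_match("c2.example.org"))        # host already on the list
print(signature_match("new-c2.example.com"))    # unseen host
print(baseline_anomaly(normal_bytes, 1000.0))   # session that blends in
print(baseline_anomaly(normal_bytes, 5000.0))   # session far from baseline
```

The unseen host passes the signature check and the session that mimics normal volume passes the baseline check, which are the two failure modes the slide points at.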

OUR APPROACH
CHOOSING AN ALTERNATE PATH

WHAT DO WE NEED?
• We need something robust that can “think” of many possibilities.
• It should rely on what we do know and infer further from it.
• Fast (polynomial-time) results.
• MACHINE LEARNING - For The Win!

ENTER MACHINE LEARNING
• Machine Learning is the science of giving a computer the ability to “learn” by example and teach itself to find patterns.
• Evolved from Pattern Recognition and Artificial Intelligence studies.
• There are many ML methods; each has its pros and cons.
• The model ‘learns’ from known, classified data and extrapolates to reach results that are nontrivial even for a human.

SUPERVISED LEARNING
• Relies on labelled training data.
• Collection is key to an optimized model and to reducing error levels
• The sample set should be comprised of encompassing, diverse and relevant data.
• We used supervised learning based on decision trees (Random Forest)
• Resulted in a False Positive Rate (FPR) of ~10^-5
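To put an FPR of about 10^-5 in perspective, here is a small worked calculation. The confusion counts and the daily traffic volume are hypothetical numbers chosen for the arithmetic; they are not results from the talk.

```python
def false_positive_rate(false_positives, true_negatives):
    """FPR = FP / (FP + TN): the share of benign samples flagged as malicious."""
    benign_total = false_positives + true_negatives
    if benign_total == 0:
        raise ValueError("no benign samples to evaluate")
    return false_positives / benign_total

# Hypothetical evaluation counts, chosen so that the rate comes out at 1e-5.
fpr = false_positive_rate(false_positives=10, true_negatives=999_990)
print(f"FPR = {fpr:.0e}")

# Expected false alerts for a made-up daily volume of benign sessions.
benign_sessions_per_day = 50_000_000
expected_false_alerts = fpr * benign_sessions_per_day
print(f"expected false alerts per day: {expected_false_alerts:.0f}")
```

Under these assumed volumes, even a rate this low yields roughly 500 false alerts a day, which is why the FPR matters so much at enterprise and ISP scale.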

FEATURE EXTRACTION
[Figure: example table with columns SUIT, TIE, CHARM, SMILE, BAD-TEETH, CAT, CLASS]

UNDERFITTING, OVERFITTING AND THE “GOOD FIT”

WHY DOES OVERFITTING HAPPEN?
• Noise in collection
• Lack of relevant, representative samples
• Overly complex function generation
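A toy illustration of the first and third causes, using made-up one-dimensional data with a single deliberately mislabeled sample. The nearest-neighbour model stands in for an "overly complex function" and the single cut-off for a simple one; neither is the model used in the talk.

```python
# Toy data: (feature value, label), label 1 = malicious, 0 = benign.
# The underlying rule is "x >= 5 is malicious"; (2, 1) is a noisy label.
TRAIN = [(1, 0), (2, 1), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
TEST = [(1.8, 0), (2.3, 0), (3.5, 0), (4.6, 0),
        (5.5, 1), (6.5, 1), (7.5, 1), (8.8, 1)]

def accuracy(predict, data):
    """Fraction of samples for which the prediction equals the label."""
    return sum(predict(x) == y for x, y in data) / len(data)

def nearest_neighbour_predict(x):
    """Overly flexible model: copy the label of the closest training sample,
    which also memorizes the noisy one."""
    _, label = min(TRAIN, key=lambda sample: abs(sample[0] - x))
    return label

def fit_threshold():
    """Simple model: the single cut-off with the best training accuracy."""
    candidates = [x for x, _ in TRAIN]
    return max(candidates,
               key=lambda t: accuracy(lambda x: int(x >= t), TRAIN))

threshold = fit_threshold()

def threshold_predict(x):
    return int(x >= threshold)

for name, model in [("1-NN", nearest_neighbour_predict),
                    ("threshold", threshold_predict)]:
    print(name, "train:", accuracy(model, TRAIN), "test:", accuracy(model, TEST))
```

The memorizing model is perfect on its training data but drops to 0.75 on the clean test set, while the simple cut-off scores 0.875 on both: a small-scale picture of overfitting versus a "good fit".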

FEATURE SELECTION IN TCP/HTTP
• Time differences
• # of bytes
• # of unique URI calls within the session
• # of “user agent” strings used
• How many file types were downloaded?
• How many requests got an answer?
• What is the average status code?
• How much time did it take to get an answer?
• What is the length of the host name?
• What is the length of the user-agent?
• What is the avg. length of the URI?
• …

[Diagram labels: TCP packets, HTTP Requests, HTTP Sessions, HTTP Transactions]
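Several of the features listed above can be computed from a list of per-transaction records. The sketch below is only an assumption about how such records might look: the `Transaction` fields and the feature names are hypothetical, and it covers a subset of the list.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional

@dataclass
class Transaction:
    """One HTTP request and its optional response (illustrative fields)."""
    timestamp: float                        # request time, in seconds
    host: str
    uri: str
    user_agent: str
    request_bytes: int
    status: Optional[int] = None            # None when no answer arrived
    response_time: Optional[float] = None   # seconds until the answer
    response_bytes: int = 0

def _mean_or_zero(values) -> float:
    values = list(values)
    return mean(values) if values else 0.0

def session_features(txs: List[Transaction]) -> Dict[str, float]:
    """Aggregate one session's transactions into a flat feature vector."""
    if not txs:
        raise ValueError("empty session")
    ordered = sorted(txs, key=lambda t: t.timestamp)
    gaps = [b.timestamp - a.timestamp for a, b in zip(ordered, ordered[1:])]
    answered = [t for t in ordered if t.status is not None]
    return {
        "mean_time_gap": _mean_or_zero(gaps),
        "total_bytes": sum(t.request_bytes + t.response_bytes for t in ordered),
        "unique_uris": len({t.uri for t in ordered}),
        "unique_user_agents": len({t.user_agent for t in ordered}),
        "answered_ratio": len(answered) / len(ordered),
        "mean_status": _mean_or_zero(t.status for t in answered),
        "mean_response_time": _mean_or_zero(
            t.response_time for t in answered if t.response_time is not None),
        "mean_host_length": _mean_or_zero(len(t.host) for t in ordered),
        "mean_user_agent_length": _mean_or_zero(len(t.user_agent) for t in ordered),
        "mean_uri_length": _mean_or_zero(len(t.uri) for t in ordered),
    }

# Made-up session: three requests one minute apart, the last one unanswered.
session = [
    Transaction(0.0, "example.com", "/a.php", "UA-1", 300, 200, 0.12, 1200),
    Transaction(60.0, "example.com", "/a.php", "UA-1", 310, 200, 0.10, 1180),
    Transaction(120.0, "example.com", "/b.php", "UA-1", 305),
]
features = session_features(session)
for name, value in features.items():
    print(f"{name}: {value}")
```

A regular gap between requests, a single user agent and a low answered ratio are the kind of per-session signals a supervised model could then weigh.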

BUT, FIRST
• Appropriate data collection and feature engineering are crucial for a proper, effective model
• Machine learning results are hard to interpret: most of the time the question ‘How did the machine decide that this is malicious traffic?!’ has no straightforward answer.
• Do not succumb to overfitting.

SAMPLE RESULTS

SPAMTORTE – OLD VERSION COMM.
POST /some/uri.php HTTP/1.1

…\r\n\r\nlayer=cXJjb3JtYUJxamNwaW5jcWdwcSxhbW8=&dimm=Pl dRR1A8YG12bGd2XW9ja25ncD4tV1FHUDwIPkxDT0c8IG9ja25ncGBteiA+LUxDT0c 8CD5RV0BIPHFyY28iYG12bGd2ImtsImNhdmttbD4tUVdASDwIPlFATUZbPAhWamtxIm9ncXFjZWcidWNxInFnbHYiZHBtbyJjImFtb3JwbW9rcWdmIm9jYWprbGcsCD4tU UBNRls8&err=1
(Source: Akamai)

SPAMTORTE - VERSION COMPARISON
Old version body contents:

layer=cXJjb3JtYUJxamNwaW5jcWdwcSxhbW8=&dimm=Pl dRR1A8YG12bGd2XW9ja25ncD4tV1FHUDwIPkxDT0c8IG9ja25ncGBteiA+LUxDT0c 8CD5RV0BIPHFyY28iYG12bGd2ImtsImNhdmttbD4tUVdASDwIPlFATUZbPAhWamtxIm9ncXFjZWcidWNxInFnbHYiZHBtbyJjImFtb3JwbW9rcWdmIm9jYWprbGcsCD4tU UBNRls8&err=1
(Source: Akamai)

New version POST request body contents (each parameter name keeps its first letter; the rest is randomized, 2-5 chars each):

ljj=Y24sZXBnZ2xnNTs1OyxjZUJlb2NrbixhbW8hY2hY24sZXdjcGZCbmdjdGt2dixhbW8hY24sZXdY2tuLGFtbyFjbixld2tjcGZCam12b2NrbixkcCFjbixld2tjcGZCbmNybXF2ZyxsZ3YhY24sZXdrYG10a2FqQmVvY2tuLGFtbw3%3D&dhgxbg=PldRR1A8Zm1sY25mcW1sNDQ6Pi1XUUdrcWo9Ij5gcDwiPmBwPCJwZ3JueyJvZyJrZCJ7bXcidW13bmYibmtpZyJ2bSJxZ2cib3sicmptdm1xLCJRZ2cie21&ejv=o
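The renaming can be checked mechanically. The sketch below compares only the parameter names from the two bodies shown above, with the values shortened to "..." for readability; the helper names and the length bounds are illustrative assumptions.

```python
def param_names(body):
    """Return the parameter names of a form-encoded POST body, in order."""
    return [pair.split("=", 1)[0] for pair in body.split("&") if pair]

def looks_like_renamed(old_names, new_names, min_len=3, max_len=6):
    """Heuristic: same parameter count, same first letters, and each new
    name is the kept first letter plus 2-5 further characters."""
    if len(old_names) != len(new_names):
        return False
    return all(
        bool(old) and bool(new)
        and old[0] == new[0]
        and min_len <= len(new) <= max_len
        for old, new in zip(old_names, new_names)
    )

# Values shortened to "..."; only the parameter names matter here.
old_body = "layer=...&dimm=...&err=1"
new_body = "ljj=...&dhgxbg=...&ejv=o"

old = param_names(old_body)
new = param_names(new_body)
print(old, new)
print("literal match on 'layer=':", "layer=" in new_body)
print("first-letter pattern holds:", looks_like_renamed(old, new))
```

A signature keyed on the literal string `layer=` no longer matches the new version, while the overall structure of the request (three parameters, same order, same first letters) is unchanged.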

SPAMTORTE – MALWARE UPGRADES

Filename              MD5                                Size
OLD (32bit version)   1faf27f6b8e8a9cadb611f668a01cf73   47,509
OLD (64bit version)   cb0477445fef9c5f1a5b6689bbfb941e   52,515
NEW (32bit version)   c547177e6f8b2cb8be26185073d64edc   87,875
NEW (64bit version)   d04c492a5b78516a7a36cc2e1e8bf521   95,063

SPAMTORTE – PLOT TWIST!
WE DIDN’T EVEN HAVE THE SAMPLE FOR THE OLD VERSION!

GETTING A HOLD OF THE DETAILS:

• SpamTorte v2: http://cyber.verint.com/spamtorte-version-2/

• Extra! http://cyber.verint.com/nymaim-malware-variant/

FINDING APTS – THE ‘INFY’ CAMPAIGN
• Infy (a.k.a. Prince of Persia) was flagged as malicious by the system in April 2015. Palo Alto Networks detected it in May 2016 and released a report about the campaign.

TIMELINE (STARTING FROM OUR SOLUTION DEPLOYMENT)

WHAT DOES THE FUTURE HOLD?
– TCP layer features
– SSL/TLS negotiation related features
– Collection corrections

We will be happy to share the results next year.

KEY TAKEAWAYS
• Traditional schemes are not relevant for the goal of APT detection
• Machine Learning is the key to uncovering unknown traffic
• Collection is gold and should be considered the most crucial part of the operation; neglecting it may lead to very error-prone models
• Overfitting is the devil
• C&C communications are rapidly becoming encrypted

THANKS

Q&A
MOSHE ZIONI, @DALMOZ_