GITG342

An Enhanced Model for Network Flow Based Botnet Detection

Proceedings of the 38th Australasian Computer Science Conference (ACSC 2015), Sydney, Australia, 27-30 January 2015

Network Forensics class presentation, 48th cohort, 손세웅

Introduction Suggested Mechanism

• Abstract

• A botnet is a group of hijacked computers operated through a command and control (C&C) mechanism administered by a botmaster.

• Botnets evolved from IRC-based centralized designs to common protocols such as HTTP with decentralized architectures, and then to peer-to-peer designs.

• In this paper, we propose techniques to detect botnets by analysing network traffic flows.

• We developed templates for capturing traffic flows with more relevant attributes for botnet detection.

• We also use the IPFIX standard to specify the templates. Hence our techniques are vendor neutral and can detect different bot families with less overhead.

• Keywords: botnet, network security, IPFIX, NetFlow

• Botnets evolved from IRC-based centralized designs to common protocols such as HTTP with decentralized architectures, and even peer-to-peer designs (Gu, Zhang & Lee 2008).

• Botnets also use domain flux (Domain Generation Algorithms), IP flux (single flux or double flux) (Antonakakis et al. 2011), DNS techniques and encryption to evade detection.

• Detecting botnets has become very challenging and requires considerable effort to develop sophisticated monitoring and detection methods.

• IP flow data has been used to detect various kinds of malware, including botnets, in high-speed, large-volume data networks.

• A flow is a group of packets that share common characteristics such as source IP, destination IP, source port, destination port and protocol type.

• More recently, IPFIX (IETF 2007) (Quittek et al. 2008) was developed as a common standard for IP flows.

• IPFIX addresses the transport protocol, security, and IETF and vendor-specific information elements, in addition to the principles of NetFlow v9 (Cisco Inc 2012) such as separate templates and records.

• One of the main concerns in IPFIX is the use of a congestion-aware protocol to transfer data. The Stream Control Transmission Protocol (SCTP, RFC 2960) and its partial reliability extension (PR-SCTP, RFC 3758) are specified for exporting data to the collectors. TCP or UDP can also be used to transport data, but PR-SCTP is preferred.
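The flow definition above (packets sharing the same 5-tuple) can be illustrated with a minimal Python sketch. The packet dictionaries, the `FlowKey` type and the `aggregate_flows` helper are hypothetical names chosen for illustration; a real exporter would also track timestamps and flow expiry.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    """The 5-tuple that identifies a flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IP protocol number: 6 = TCP, 17 = UDP


def aggregate_flows(packets):
    """Group packets by 5-tuple and keep per-flow packet and byte counts."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for pkt in packets:
        key = FlowKey(pkt["src_ip"], pkt["dst_ip"],
                      pkt["src_port"], pkt["dst_port"], pkt["protocol"])
        flows[key]["packets"] += 1
        flows[key]["bytes"] += pkt["length"]
    return dict(flows)


# Hypothetical packets: two belong to one HTTP flow, one is a DNS query.
packets = [
    {"src_ip": "10.0.0.5", "dst_ip": "192.0.2.10", "src_port": 40000,
     "dst_port": 80, "protocol": 6, "length": 60},
    {"src_ip": "10.0.0.5", "dst_ip": "192.0.2.10", "src_port": 40000,
     "dst_port": 80, "protocol": 6, "length": 500},
    {"src_ip": "10.0.0.5", "dst_ip": "192.0.2.53", "src_port": 53000,
     "dst_port": 53, "protocol": 17, "length": 72},
]
flows = aggregate_flows(packets)
```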

• Contribution

• We propose a methodology to enhance botnet detection techniques by leveraging features of flexible network flows.

• Firstly, we identify suitable data set templates with more relevant attributes for botnet detection from IP flows, using IPFIX.

• We also customize each template according to various botnet families and attributes to enhance the effectiveness of the system.

• Secondly, we use IP flow data to detect botnet behaviours in unlabelled traffic. To the best of our knowledge, this is a novel approach and it enhances existing IP flow based botnet detection research.

• We develop a general network flow based botnet detection framework built on flexible IP flows (IPFIX) to overcome the limitations of current network flow based botnet detection systems.

• We create a generic IP flow template that is flexible enough for use with existing network flow based botnet detection systems. This minimises the dataset overhead of capturing flows while keeping all the attributes necessary for detecting bots.

• Model – High level overview

• Generic templates are used for capturing flow information using IPFIX.

• The flow collector is a centralized server that stores and organizes the data captured at different devices using the generic templates.

• Filtering reduces the dataset by removing unwanted data that is not related to botnets.

• Finally, the botnet detection engine correlates flow information using machine learning techniques to find patterns and detect bots.

Figure 1: High-level system overview

• Architecture - Generic templates

• Templates are used for capturing network traffic flows from different devices for detecting bots.

• The IETF introduced an open standard for flexible IP flows named IPFIX.

• IPFIX takes most of its features from NetFlow version 9 to define a standard for IP flows.

• However, unlike NetFlow version 9, IPFIX adds variable-length fields.

• Furthermore, the IPFIX standard lets users select from a wide range of attributes, more than 400 different data points (RFC 7012, 2013). Hence IPFIX enables users to define templates with any of these attributes for traffic classification.

• Customized templates allow us to develop new data templates based on botnet families by adding new features from the IP header, new features from the payload (DNS attributes) and new statistical attributes (average payload size) in order to detect botnets. Introducing DNS attributes into the network flow data set also helps detect DGA and fast flux attacks.
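To make the idea of a customized template concrete, the sketch below encodes an IPFIX Template Set in the wire layout of RFC 7011 (set header, template header, field specifiers). The choice of fields is only an assumption for illustration and is not the 57-attribute template from the paper. The standard element IDs are IANA-registered; the DNS query name field is a hypothetical enterprise-specific element that uses the documentation enterprise number 32473 and the variable-length marker 65535.

```python
import struct

TEMPLATE_SET_ID = 2        # Set ID reserved for Template Sets (RFC 7011)
VARIABLE_LENGTH = 0xFFFF   # field length value that marks a variable-length field
EXAMPLE_PEN = 32473        # enterprise number reserved for documentation examples

# (information element ID, field length in bytes, enterprise number or None)
GENERIC_TEMPLATE = [
    (8, 4, None),    # sourceIPv4Address
    (12, 4, None),   # destinationIPv4Address
    (7, 2, None),    # sourceTransportPort
    (11, 2, None),   # destinationTransportPort
    (4, 1, None),    # protocolIdentifier
    (2, 8, None),    # packetDeltaCount
    (1, 8, None),    # octetDeltaCount
    (6, 2, None),    # tcpControlBits
    (1, VARIABLE_LENGTH, EXAMPLE_PEN),  # hypothetical element: DNS query name
]


def encode_template_set(template_id, fields):
    """Encode one template record wrapped in a Template Set."""
    body = struct.pack("!HH", template_id, len(fields))
    for ie_id, length, pen in fields:
        if pen is None:
            body += struct.pack("!HH", ie_id, length)
        else:
            # Enterprise bit set, followed by the 32-bit enterprise number.
            body += struct.pack("!HHI", 0x8000 | ie_id, length, pen)
    # Set header: Set ID and total set length including this 4-byte header.
    return struct.pack("!HH", TEMPLATE_SET_ID, 4 + len(body)) + body


# Template IDs for data records start at 256.
template_set = encode_template_set(256, GENERIC_TEMPLATE)
```

A different bot family could get its own template simply by passing a different field list, which is the flexibility that the fixed NetFlow v5 record does not offer.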

• Architecture - Flow collector

• Because flexible IP flows come with variable-size field sets and data sets, the flow collector stores data according to the requirements of the flow templates.

• The network devices send their data along with data templates to the flow collector.

• The flow collector software lets security administrators define storage templates that match the flow templates for storing the data.

• The collector then builds a raw dataset based on information from the templates, storing received data in the correct order.

• Data is arranged into directories based on type of traffic, flow direction, year, month, date and time.

• The flow collector also provides additional features such as compression of the data to save storage space and the ability to search the stored data.

• Another important feature of the flow collector is that it supports running various analytical tools on the stored data to eliminate unwanted traffic and detect botnets.
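The directory arrangement described above can be sketched as a small path-building helper. The root directory, the file suffix and the exact directory order are assumptions made for illustration; the slides only state which attributes are used.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def storage_path(root, traffic_type, direction, start_time):
    """Build a collector path: type / direction / year / month / day / time."""
    return (PurePosixPath(root) / traffic_type / direction
            / f"{start_time:%Y}" / f"{start_time:%m}" / f"{start_time:%d}"
            / f"{start_time:%H%M}.ipfix")


# Hypothetical example: outbound DNS flows captured on 27 January 2015, 09:30 UTC.
example_path = storage_path("/var/flows", "dns", "outbound",
                            datetime(2015, 1, 27, 9, 30, tzinfo=timezone.utc))
```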

• Architecture - Filtering engine

• Similar to Strayer et al. (2008), we use a five-step filtering mechanism for the botnet detection system.

• Firstly, Strayer mainly targets detection of IRC-based botnets. Our model targets a range of bot families, so we use both TCP and UDP flows for the detection of bots.

• Secondly, Strayer uses a filter to remove the nuisance port-scanning chaff in order to reduce the dataset. Instead of filtering out unestablished flows, we categorize flows with only SYN or RST flags in order to detect botnets.

• Thirdly, Strayer introduced a filter to remove high-volume data flows, as C&C communication is not expected to generate high-bitrate flows. We use a similar filter to distinguish legitimate peer-to-peer traffic from the C&C channels of P2P or hybrid botnets.

• Fourthly, Strayer uses a filter to remove flows with large IP packets, specifically packets above 300 bytes. However, some botnets use large data packets to send stolen data to their C&C server, so we consider all packets in our analysis.

• Finally, Strayer used a filter to remove flows with two or fewer packets or very brief time windows in order to remove port-scanning flows. In our model we use this filter to identify vulnerability or open-port scanning that botnets perform to find new victims.
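The five steps above can be read as a tagging pass rather than a discarding pass. The following is a minimal sketch under that reading. The `Flow` record, the tag names and the bitrate threshold are assumptions for illustration and do not come from the paper. Step four (packet size) deliberately applies no filter, matching the statement that all packets are considered.

```python
from dataclasses import dataclass

SYN, RST, ACK = 0x02, 0x04, 0x10  # TCP flag bits


@dataclass
class Flow:
    protocol: int      # 6 = TCP, 17 = UDP
    tcp_flags: int     # OR of all TCP flags seen in the flow (0 for UDP)
    packets: int
    octets: int
    duration_s: float


def categorize(flow, max_bitrate_bps=1_000_000, short_flow_packets=2):
    """Tag a flow instead of dropping it; thresholds are illustrative only."""
    # Step 1: keep both TCP and UDP flows.
    if flow.protocol not in (6, 17):
        return {"ignored"}
    tags = set()
    # Step 2: flows carrying only SYN and/or RST flags are kept and labelled.
    if flow.protocol == 6 and flow.tcp_flags and flow.tcp_flags & ~(SYN | RST) == 0:
        tags.add("unestablished")
    # Step 3: high-bitrate flows are unlikely to be C&C channels.
    if flow.duration_s > 0 and flow.octets * 8 / flow.duration_s > max_bitrate_bps:
        tags.add("high_bitrate")
    # Step 5: very short flows may indicate scanning for new victims.
    if flow.packets <= short_flow_packets:
        tags.add("possible_scan")
    return tags or {"candidate"}
```

For example, a single-packet SYN-only flow is tagged both `unestablished` and `possible_scan`, while an established low-rate TCP flow falls through as a `candidate` for the detection engine.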

• Botnet Detection Engine

• The first step of the engine is to classify or cluster the flows in the data set in order to identify similarities between flows.

• Three main behavioural patterns are used in our system: bot behaviour, botnet behaviour and temporal behaviour.

• In bot behaviour, we analyse flows generated by one bot or single machine to identify its C&C communication or attack graphs.

• In botnet behaviour, we analyse flows generated by a group of bots or machines in order to detect botnet activities.

• Finally, in the temporal or vertical method, we analyse flows generated by bots or botnets over a period of time in order to detect patterns.

• We have analysed a range of bot families such as IRC, HTTP, peer-to-peer and hybrid bots and made several observations that enable the detection of bots.

• Botnet life cycle (Feily, Shahrestani & Ramadass 2009): initial infection, secondary infection, connection, malicious command and control, and update and maintenance.

• In the initial infection phase, the attacker exploits vulnerabilities on victims and gains basic control over them.

• The secondary infection phase is then used to download and install further malicious scripts and binaries to get full control of the victim.

• Once secondary infection is complete, the bot connects to its C&C server in order to become a member of the botnet.

• The bot then receives commands from the C&C server to conduct coordinated malicious activities.

• Finally, bots update their binaries to gain more functionality or evade detection.

• Some of the important observations from the analysis of a range of bot families:

• In the case of direct C&C bots such as IRC and HTTP, every bot needs to find its own C&C controller to become a member of the centralized botnet.

• In the case of peer-to-peer and hybrid bots such as Kademlia, Chord and GameOver (Zeus v3), each bot needs to find its servant bot or proxy bot to get C&C instructions and become a member of the P2P botnet.

• To evade discovery and takedown, C&C servers use distinctive methods such as domain flux (Domain Generation Algorithm, DGA) and IP flux to alter their DNS names or the IP addresses associated with an FQDN.

• However, this forces the bots to connect to different C&C servers instead of a specific one. As a result, the bots perform a large number of DNS lookups and scan a large volume of addresses to find the C&C server. These patterns are used in our model to track the botnet.
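The DNS lookup pattern described above could be measured per host roughly as follows. This is a sketch, not the paper's detector. The record layout `(host, query_name, rcode)`, the thresholds and the use of failed lookups (DNS response code 3, NXDOMAIN) as an extra signal are all assumptions added for illustration.

```python
from collections import defaultdict

NXDOMAIN = 3  # DNS response code for a non-existent domain


def dns_lookup_profile(dns_records):
    """Return {host: (distinct names queried, failed lookups)}."""
    names = defaultdict(set)
    failures = defaultdict(int)
    for host, query_name, rcode in dns_records:
        names[host].add(query_name.lower())
        if rcode == NXDOMAIN:
            failures[host] += 1
    return {host: (len(names[host]), failures[host]) for host in names}


def suspicious_hosts(profile, max_names=50, max_failures=20):
    """Hosts whose lookup volume exceeds the (illustrative) thresholds."""
    return sorted(host for host, (n_names, n_failed) in profile.items()
                  if n_names > max_names or n_failed > max_failures)


# Hypothetical records: one host cycles through 60 generated names that all
# fail to resolve, another repeatedly resolves a single name.
records = [("10.0.0.5", f"x{i}.example", NXDOMAIN) for i in range(60)]
records += [("10.0.0.9", "www.example.com", 0)] * 3
profile = dns_lookup_profile(records)
```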

• The second observation related to C&C bots is that the bots need to communicate frequently with the C&C server.

• This is essential for the botmaster to keep control of and update the botnet. Once the C&C server has been discovered, the bots take updates and commands from the botmaster about what type of action to execute.

• The action can be anything from sending spam emails to intensifying a DDoS attack. Another important message type is the keep-alive packet sent from the bot to the C&C server.

• Moreover, some bots (such as Zeus and SpyEye) report back to the C&C server with the information they steal from compromised computers.

• Even though botnet designers work hard to randomize these communications to evade detection, there is still some vertical and/or horizontal correlation in their communication.

• For example, some Zeus bots (Zeus v1.3) send updates at a fixed interval of 20 minutes.
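A fixed reporting interval such as the 20 minutes mentioned above shows up as very regular gaps between flow start times. One simple way to quantify this, offered here as an assumption rather than the paper's method, is the coefficient of variation of the gaps: values near zero indicate periodic communication.

```python
from statistics import mean, pstdev


def interval_regularity(start_times):
    """Return (mean gap, coefficient of variation) of flow start times in seconds.

    Returns None when there are too few flows to judge regularity.
    """
    times = sorted(start_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if len(gaps) < 2:
        return None
    average = mean(gaps)
    if average == 0:
        return None
    return average, pstdev(gaps) / average


# Hypothetical flow start times: a bot checking in about every 1200 s
# (20 minutes), and a host with irregular traffic to the same server.
periodic = interval_regularity([0, 1200, 2400, 3601, 4800])
irregular = interval_regularity([0, 30, 900, 1000, 5000])
```

Randomized check-in intervals raise the coefficient of variation, so in practice this measure would be one feature among several and would need a tuned threshold.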

• All botnets seek to spread across the network and recruit more bots, which increases the strength of the botnet and consequently that of the attacks it carries out.

• Hence the bots scan other machines in their network for vulnerabilities. If a vulnerable machine is found, they run exploits to compromise it.

• When scanning the network for machines to infect, bots generate a burst of small packets.

• This activity causes a sudden increase in the number of packets without a major increase in traffic volume, which can be used to detect bots.
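The scanning signature above (packet count jumps while byte volume does not) can be checked by comparing consecutive time windows for one host. The window representation and both growth factors are assumptions chosen for illustration.

```python
def scan_bursts(windows, packet_factor=5.0, byte_factor=2.0):
    """Return indices of windows that look like a burst of small packets.

    windows: list of (packets, octets) per fixed-length time window for one host.
    A window is flagged when its packet count grows by at least packet_factor
    over the previous window while its byte count grows by less than byte_factor.
    """
    flagged = []
    for i in range(1, len(windows)):
        prev_packets, prev_octets = windows[i - 1]
        cur_packets, cur_octets = windows[i]
        if prev_packets == 0 or prev_octets == 0:
            continue  # no baseline to compare against
        if (cur_packets / prev_packets >= packet_factor
                and cur_octets / prev_octets < byte_factor):
            flagged.append(i)
    return flagged


# Hypothetical host: normal traffic, then a window with about 900 extra
# 60-byte probes, then normal traffic again.
scanning_host = scan_bursts([(100, 80000), (110, 85000), (1000, 134000), (120, 90000)])
# Hypothetical bulk download: packets and bytes grow together, so no flag.
downloading_host = scan_bursts([(100, 80000), (1000, 800000)])
```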

• The bot detection engine also looks for DDoS activities such as outbound TCP SYN packets with an invalid source IP address.

• A large number of such TCP SYN packets could mean that some of the internal hosts in the network are part of a botnet and are participating in a DDoS attack.

• Some botnets (such as Grum, Bobax, Cutwail and Donbot) generate email spam.

• Email spamming involves sending an enormous number of spam emails advertising fake products for financial gain. When hosts in a network are part of a botnet involved in spamming, they send a huge number of emails to the outside world, mostly through an external email server.

• So unusual SMTP activity from the network to the outside is another significant network activity that is used for tracking bots.
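Both outbound indicators above can be counted from flow records seen at the network edge. In this sketch the internal prefix `10.0.0.0/8`, the flow dictionary keys and the rule "a source address outside the internal prefix is invalid for outbound traffic" are assumptions for illustration.

```python
import ipaddress
from collections import Counter

INTERNAL_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]  # hypothetical site prefix
TCP, SYN, SMTP_PORT = 6, 0x02, 25


def is_internal(address):
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in INTERNAL_NETWORKS)


def outbound_anomalies(flows):
    """Count spoofed-source SYN flows and outbound SMTP flows per internal host."""
    spoofed_syn = 0
    smtp_per_host = Counter()
    for flow in flows:
        if is_internal(flow["dst_ip"]):
            continue  # only traffic leaving the network is of interest here
        if flow["protocol"] != TCP:
            continue
        if flow["tcp_flags"] & SYN and not is_internal(flow["src_ip"]):
            spoofed_syn += 1  # leaving the site with a source we do not own
        elif flow["dst_port"] == SMTP_PORT:
            smtp_per_host[flow["src_ip"]] += 1
    return spoofed_syn, smtp_per_host


# Hypothetical flows exported by an edge router.
flows = [
    {"src_ip": "203.0.113.7", "dst_ip": "198.51.100.1", "protocol": 6,
     "tcp_flags": 0x02, "dst_port": 80},
    {"src_ip": "10.0.0.5", "dst_ip": "198.51.100.25", "protocol": 6,
     "tcp_flags": 0x1B, "dst_port": 25},
    {"src_ip": "10.0.0.5", "dst_ip": "198.51.100.26", "protocol": 6,
     "tcp_flags": 0x1B, "dst_port": 25},
    {"src_ip": "10.0.0.6", "dst_ip": "10.0.0.7", "protocol": 6,
     "tcp_flags": 0x1B, "dst_port": 25},
]
spoofed_syn, smtp_per_host = outbound_anomalies(flows)
```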

• Implementation and analysis:

• We obtained two separate datasets: one from the analysis of different botnets in our lab environment and one from a university network over a period of two days (about 5 GB), which contains everyday usage traffic patterns.

• We merged both datasets using a TCP replay tool to produce the test dataset. We used 10-fold cross-validation for training and testing our model and achieved high detection accuracy.

• We used a template with 57 attributes (see Figure 2) for capturing the traffic flows of each bot, then used machine learning techniques to identify the best attributes for detecting each bot family.

Figure 2: Theoretical framework

• We used several machine learning techniques such as Bayesian Network, Neural Network, Support Vector Machine, Gaussian and Nearest Neighbour classifiers to identify the best attributes for each bot, since no single machine learning technique was found to be effective for detecting all the bot families.

• For example, the Decision Tree classifier was found to be effective for detecting peer-to-peer botnets by analysing the flow intervals.

• SVM and Bayesian networks were found to be effective for detecting the C&C communication of centralized botnets such as HTTP and IRC botnets.
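The 10-fold cross-validation mentioned in the implementation can be sketched with the standard library alone. The slides do not say which toolkit was used, so the `k_fold_indices` helper below is a generic illustration of the splitting scheme: each sample is used for testing exactly once and for training in the other nine rounds.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Example: 25 labelled flow records split into 10 train/test rounds.
splits = list(k_fold_indices(25))
```

Any of the classifiers listed above would be trained on `train` and scored on `test` in each round, and the reported accuracy would be the average over the ten rounds.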

Advantages and disadvantages compared with related work

• BotSniffer (Gu, Zhang & Lee 2008) proposed network-based anomaly detection techniques for detecting bots. The system can identify C&C servers as well as infected hosts in a network. Detection is based on the fact that command and control communication within the same botnet most likely exhibits spatial-temporal correlation and similarity, such as coordinated communication, propagation, DDoS and fake activities.

• However, this methodology is limited to detecting centralized botnets (IRC and HTTP).

• BotFinder (Tegeler et al. 2012) proposed a methodology that detects bots in a network using a NetFlow v5 dataset.

• BotFinder leverages the observation that the C&C communication of a particular bot family follows specific regular patterns.

• This technique uses the average time between the start times of two subsequent flows in the trace, the average duration of a connection, the average number of bytes transferred to the destination, the average number of packets transferred to the destination and a Fourier transformation over the flow start times in the trace to detect botnet C&C activities.

• This approach is also limited to detecting centralized botnets (IRC and HTTP).
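The trace features listed for BotFinder can be computed from a list of flows between one source and one destination. The sketch below is a simplified illustration and not BotFinder's implementation. The tuple layout, the binning of start times into a 0/1 signal and the plain discrete Fourier transform are assumptions made to keep the example self-contained.

```python
import cmath
from statistics import mean


def trace_features(flows):
    """flows: list of (start_s, duration_s, bytes_to_dst, packets_to_dst)."""
    flows = sorted(flows)
    starts = [flow[0] for flow in flows]
    gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
    return {
        "avg_interval": mean(gaps),
        "avg_duration": mean(flow[1] for flow in flows),
        "avg_bytes": mean(flow[2] for flow in flows),
        "avg_packets": mean(flow[3] for flow in flows),
    }


def dominant_period(starts, bin_s, n_bins):
    """Estimate the strongest period (seconds) in the flow start times."""
    origin = min(starts)
    signal = [0.0] * n_bins
    for t in starts:
        index = int((t - origin) // bin_s)
        if 0 <= index < n_bins:
            signal[index] = 1.0      # 1 where a flow started, 0 elsewhere
    average = sum(signal) / n_bins
    centered = [x - average for x in signal]
    best_k, best_mag = None, 0.0
    for k in range(1, n_bins // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * n / n_bins)
                    for n, x in enumerate(centered))
        # The small tolerance keeps the lowest frequency when harmonics tie.
        if abs(coeff) > best_mag * (1 + 1e-9):
            best_k, best_mag = k, abs(coeff)
    if best_k is None or best_mag < 1e-9:
        return None
    return n_bins * bin_s / best_k


# Hypothetical trace: eight flows, one every 1200 s, each 2 s long.
trace = [(i * 1200, 2.0, 400, 5) for i in range(8)]
features = trace_features(trace)
period = dominant_period([flow[0] for flow in trace], bin_s=300, n_bins=32)
```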

• The system called DISCLOSURE (Bilge et al. 2012) presents a large-scale (centralized) botnet detection method based on NetFlow v5 data and improved machine learning techniques.

• The authors identify the unavailability of network data sets and terabit-per-second line speeds as obstacles.

• The authors take the following limitations of NetFlow into account in building their system.

• NetFlow does not contain the packet payload, only aggregated metadata about the packet flows.

• Another limitation of NetFlow is that it only considers one direction of the flow. The NetFlow sampling used in large networks is a further limitation for malware detection.

• It is clear that the related techniques rely on the fixed template available in NetFlow v5 and can only detect specific bot families.

• Their approaches also incur high overheads in data set size and CPU utilization when capturing the IP flows.

• We have shown that our model can detect a range of bots with minimal overhead.

• Popular commercial systems such as Arbor-Peakflow, Radware-Defenceflow and Cisco Guard are able to detect botnets in the attack phase (e.g. DDoS). Our system, however, can detect bots during different phases of the botnet life cycle (e.g. infection, command and control, update and attack).

Improvement strategies

• Our model makes use of IPFIX for designing a generic template to detect a range of bot families.

• We have also shown that the generic template has minimal overhead on the routers.

• In future work we will extend this model to deal with new types of botnets.

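Reference material

Paper: excerpt from 블랙 리스트 접근 트래픽 감시를 통한 봇 탐지 방법 (A bot detection method based on monitoring traffic that accesses blacklists) (KNOM Review, Vol. 13, No. 1)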

Reference material

IP Flow Information Export (From Wikipedia, the free encyclopedia)

Similar to the NetFlow Protocol, IPFIX considers a flow to be any number of packets observed in a specific timeslot and sharing a number of properties, e.g. "same source, same destination, same protocol". Using IPFIX, devices like routers can inform a central monitoring station about their view of a potentially larger network.

IPFIX is a push protocol, i.e. each sender will periodically send IPFIX messages to configured receivers without any interaction by the receiver.

The actual makeup of data in IPFIX messages is to a great extent up to the sender. IPFIX introduces the makeup of these messages to the receiver with the help of special Templates. The sender is also free to use user-defined data types in its messages, so the protocol is freely extensible and can adapt to different scenarios.

IPFIX prefers the Stream Control Transmission Protocol as its transport layer protocol, but also allows the use of the Transmission Control Protocol or User Datagram Protocol.

http://en.wikipedia.org/wiki/Stream_Control_Transmission_Protocol

http://en.wikipedia.org/wiki/Transport_layer

http://en.wikipedia.org/wiki/Transmission_Control_Protocol

http://en.wikipedia.org/wiki/User_Datagram_Protocol
