
1

Network Security

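Prof. 黃能富 (Nen-Fu Huang), Department of Computer Science / Institute of Communications Engineering, National Tsing Hua University. E-mail: [email protected]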

2

Agenda

• Introduction to Network Security
• Content Inspection Technologies
• Pattern Matching Algorithms
• Flow Classification by Stateful Mechanism
• Machine Learning Based Application Identification Technologies
• Network Security Research Topics
• Conclusions

3

-- Hackers Are Everywhere --

2000/3: Hackers used DDoS attacks to take down well-known websites such as Yahoo, Amazon, CNN, and eBay.

2001/7: Hackers stole customers' credit card data from Bibliofind, a subsidiary of Amazon.com.

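2002: Sino-US hacker war

2003/1: SQL Slammer attack

2003/4: The Chinese "Liuguang" (流光) backdoor program

2003/8: Blaster worm attack

2003/9: SoBig worm attack

2003/9: Attacks by the Chinese cyber army

2004/3: Netsky worm attack

2004/4: Sasser worm attack

2005/5: Hackers altered data at Taiwan's College Entrance Examination Center

2005/6: A backdoor program planted by the Chinese cyber army stole diplomatic secrets from the Ministry of Foreign Affairs website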

4

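Network Security Concerns

Network attack techniques evolve rapidly, and attack tools are easy to obtain and have simple, intuitive interfaces, so attacks can be launched without advanced skills.

Network attacks are no longer limited to intrusions; many aim to cripple a website's ability to provide service.

Network communication equipment lacks sufficient security: routers and switches can inspect only Layer-3 packet information.

Firewalls focus on inspecting Layer-4 packet information. Antivirus software is increasingly unable to recognize network attacks.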

5

Examples of Network Attack Tools

6


7


8

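Basic Concepts of Network Security

Confidentiality of data, integrity (trustworthiness) of information, availability of information

• Information security policy development

• Information security education and training

• Network vulnerability assessment

• 7x24 security monitoring center

• Security alert notification service

• Security incident emergency response and computer forensics

• Firewall monitoring and management

• Intrusion detection monitoring and management

• Real-time web defacement monitoring and recovery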

Policy

9

Types of Network Attacks

• Denial of Service (DoS), Distributed Denial of Service (DDoS)
• Network Invasion
• Network Scanning
• Network Sniffing
• Trojan Horses and Backdoors
• Worms

10

(1) DoS/DDoS

Prevents legitimate users from using a network connection, or disables a server or service. Examples: “Smurf” and “Fraggle” attacks, “Land”, “Teardrop”, “NewTear”, “Bonk”, “Boink”, SYN flooding, “Ping of death”, IGMP Nuke, buffer overflow.

Caused by protocol flaws or program bugs. It compromises “Availability”.

11

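Common DoS Attacks

Ping Flooding: sends a large number of ICMP echo packets to the victim host to exhaust its system resources.

Ping of Death: the attacker sends an ICMP echo packet of 65,536 bytes, larger than the 65,535-byte maximum IP packet size, to the victim host, which then crashes (a flaw in TCP/IP protocol implementations).

UDP flooding (Chargen): the attacker sends a large number of UDP packets to port 19 (Character Generator) at the broadcast address of the victim network. Every host on that network sends UDP replies, exhausting the network's bandwidth.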

12

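Common DoS Attacks (cont.)

Smurf Attack: a “kill with a borrowed knife” strategy. The attacker sends ICMP echo packets to the broadcast address of some network, filling in the victim host's address as the source address. Every machine on that network then sends an ICMP reply to the victim, which not only congests that network's bandwidth but also exhausts the victim's system resources.

SYN flooding: the attacker sends thousands of SYN packets (used to establish TCP connections) per second to the victim host, filling in forged or nonexistent source addresses. The victim sends SYN-ACKs back to those addresses, which of course never respond. The victim can then no longer accept other TCP connections, so legitimate users cannot log in.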

13

Smurf attack (DoS)

Dangerous attacks:
• Network-based; fills access pipes
• Uses ICMP echo/reply (smurf) or UDP echo (fraggle) packets with broadcast networks to multiply traffic
• Requires the ability to send spoofed packets
• Abuses “bounce sites” to attack victims
• Traffic multiplied by a factor of 50 to 200
• A low-bandwidth source can kill high-bandwidth connections
• Similar to ping flooding and UDP flooding, but more dangerous due to traffic multiplication

14

“Smurf” Attack (cont’d)

[Figure: the perpetrator sends ICMP echo requests, with the victim's address as the spoofed source, across the Internet to an IP broadcast address; every host on that network sends an ICMP echo reply to the victim.]

15

SYN flooding Attack (DoS)

Goal is to deny access to a TCP service running on a host.

Creates a number of half-open TCP connections which fill up a host’s listen queue; host stops accepting connections.

Requires that the TCP service be open to connections from the attacker.
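To make the listen-queue exhaustion concrete, here is a toy Python model, not a real TCP stack: the `ListenQueue` class, its backlog of 4, and the addresses are illustrative assumptions. Spoofed SYNs create half-open entries that are never completed, so a later legitimate SYN finds no free slot.

```python
from dataclasses import dataclass, field


@dataclass
class ListenQueue:
    """Toy model of a server's queue of half-open (SYN_RECEIVED) connections."""
    backlog: int
    half_open: set = field(default_factory=set)
    established: set = field(default_factory=set)

    def on_syn(self, src: str) -> bool:
        """Reserve a slot and (conceptually) send a SYN-ACK; refuse if the queue is full."""
        if len(self.half_open) >= self.backlog:
            return False
        self.half_open.add(src)
        return True

    def on_ack(self, src: str) -> bool:
        """Complete the three-way handshake for a pending half-open connection."""
        if src in self.half_open:
            self.half_open.remove(src)
            self.established.add(src)
            return True
        return False


q = ListenQueue(backlog=4)
# Spoofed SYNs: the SYN-ACKs go to hosts that never sent a SYN, so no ACK returns.
for i in range(4):
    q.on_syn(f"10.0.0.{i}")
legit_accepted = q.on_syn("192.0.2.7")
print(legit_accepted, len(q.half_open))  # False 4
```

Real TCP stacks also expire half-open entries after a timeout, which is why the attacker must keep sending SYNs at a high rate.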

16

SYN flooding (cont’d)

[Figure: the attacker sends spoofed SYNs to the victim; the victim sends SYN-ACKs to the spoofed addresses (the innocents), which never complete the handshake.]

17

DDoS Attack

[Figure: the attacker controls several handlers, each of which controls several agents; the agents send spoofed packets to the victim. Control messages may be encrypted or hidden in normal packets.]

18

DDoS Attack

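The attacker remotely controls many puppet (zombie) machines to launch a massive simultaneous attack against the victim host. The attacks on Yahoo.com, Amazon.com, CNN.com, buy.com, and eBay.com used DDoS.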

[Figure: a hacker communicates with compromised hosts and servers in Organization A and Organization B; these machines then launch a DDoS attack across the Internet against the organization under attack.]

19

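DDoS Attack Examples

Example DDoS attack programs: Trin00 (destructive), Tribe Flood Network (TFN) (destructive), TFN2K, Stacheldraht

Trin00: an attack can be launched from one machine or a group of machines. Once the attack starts, every computer hiding a Trin00 daemon sends UDP packets (carrying four bytes of data) to the victim host, constantly changing the destination port. The victim is kept busy returning ICMP port unreachable messages and cannot properly serve legitimate packets and connections.

TFN: launched the same way as Trin00, but its attacks are more varied: it can send SYN floods, UDP floods, ICMP floods, or Smurf attacks. The latest version of TFN can change the source addresses of its attack packets on its own, making this type of attack even harder for security mechanisms to inspect and filter.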

20

(2) Network Invasion

The goal is to get into the target system and obtain information:
• Account usernames and passwords
• Source code and business-critical information

Usually caused by improper configuration or privilege settings, or by program faults.

Network invasions are diverse; knowledge of attack patterns helps detection, but it is quite hard to detect all attacks.

21

Example of network invasion: IIS Unicode directory traversal

On IIS 5.0 (Windows 2000) without the corresponding security patch, a simple URL such as http://address.of.iis5.system/scripts/..%c1%1c../winnt/system32/cmd.exe?/c+dir+c:\ runs cmd.exe and lists the root directory of drive C:.
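The trick works because the vulnerable decoder accepted ill-formed multi-byte sequences. The Python sketch below is a hypothetical lenient decoder (an assumption about the flawed behavior, not IIS code): it combines the low bits of a two-byte sequence without validating the continuation byte, so `%c1%1c` turns into a backslash after the path check has already been done, while Python's strict UTF-8 decoder rejects the same bytes.

```python
from urllib.parse import unquote_to_bytes


def lenient_utf8_decode(data: bytes) -> str:
    """Hypothetical flawed decoder: masks bits of 2-byte sequences without validation."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if 0xC0 <= b <= 0xDF and i + 1 < len(data):
            # Correct UTF-8 requires the second byte to be 10xxxxxx and forbids the
            # overlong lead bytes 0xC0/0xC1; this decoder checks neither.
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            out.append(chr(b))
            i += 1
    return "".join(out)


raw = unquote_to_bytes("/scripts/..%c1%1c../winnt/system32/cmd.exe")
print(lenient_utf8_decode(raw))  # /scripts/..\../winnt/system32/cmd.exe
try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    print("strict decoder rejects it")
```

This is why a packet normalizer should decode strictly and reject or flag malformed sequences before any signature is matched.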

22

(3) Network Scanning

The goal is generally to find openings by learning the topology of the victim's network:
• Names and addresses of hosts and network devices
• Open services

Common techniques include ICMP scanning, Xmas scans, SYN-FIN scans, and SNMP scans.

Nmap is a powerful automated scanning tool.
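Some of these scans are recognizable from TCP flag combinations that normal traffic never uses. The Python sketch below classifies a TCP flag byte; the helper name and labels are illustrative assumptions, and a real IDS would also track scan rates and targets rather than judge single packets.

```python
# Standard TCP flag bits.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


def classify_tcp_flags(flags: int) -> str:
    """Label flag combinations typical of stealth scans (hypothetical helper)."""
    if (flags & (SYN | FIN)) == (SYN | FIN):
        return "SYN-FIN scan"  # opening and closing a connection at once is contradictory
    if (flags & (FIN | PSH | URG)) == (FIN | PSH | URG):
        return "Xmas scan"     # FIN, PSH and URG all lit up "like a Christmas tree"
    return "normal"


print(classify_tcp_flags(SYN | FIN))        # SYN-FIN scan
print(classify_tcp_flags(FIN | PSH | URG))  # Xmas scan
print(classify_tcp_flags(SYN | ACK))        # normal
```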

23

(4) Sniffing

The goal is generally to capture the content of communications:
• Account usernames, passwords, mail accounts
• Network topology

Usually a program puts an Ethernet adapter into promiscuous mode and saves the captured information for later retrieval.

Hosts running the sniffer program are often first compromised with host attack methods (e.g., NetBus).

24

(5) Backdoors and Trojan Horses

Backdoors and Trojan horses are usually the result of an intrusion or of hostile programs.

They may open a private communication channel and wait for remote commands.

Available toolkits: Subseven, BirdSpy, Dragger

They can be detected by monitoring known control-channel activity, but not with 100% precision.

25

(6) Worm

A worm's chief goals are to propagate and to survive.

It exploits system vulnerabilities to infect a host and then tries to infect every possible target.

It may degrade system performance, leave backdoors, steal confidential information, and so on.

26

P2P/IM Security Threats

• P2P (Peer-to-Peer) file-sharing programs
• IM (Instant Messenger)
• Spyware
• Adware
• Tunneling (private tunnels)

27

P2P: A new paradigm

• Server bottleneck
• Powerful PCs
• Flexible, efficient information sharing
• P2P changes the way the Web (Internet) works


28

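P2P Is About to Break the Existing Security Architecture

Beyond file sharing and instant messaging, P2P has gradually grown into other applications such as SoftEther and Skype. For individual users the benefits outweigh the drawbacks, but for enterprises P2P is a major information-security concern.

P2P applications carry many risks, including:
• Leaking confidential internal information
• Serving as a channel for worm propagation
• Downloading illegal files that infringe copyright
• Consuming large amounts of network bandwidth and disrupting other systems
• Distracting employees and lowering productivity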

29

Famous P2P Examples

BitTorrent, eZpeer, Kuro, eDonkey, eMule, MLdonkey, Gnutella, Kazaa/Morpheus, Shareaza, Direct-connect, Soulseek, Opennap, Worklink, Opennext, Jelawat, PP 點點通, SoftEther, iMESH, MIB, WinMix, WinMule, Skype

30

Instant Messenger (IM)

MSN Yahoo Messenger ICQ YamQQ AIM (AOL IM)

31

Evolution of Network Security Technologies

• Firewall (Layer-4)
• VPN
• SSL VPN
• PKI
• IDS/IPS
• Defense-in-Depth
• Application Firewall (Layer-7)
• UTM (Unified Threat Management)
• NAC (Network Access Control)

32

Intrusion Detection System (IDS)

Intrusion Detection and Prevention System (IPS/IDP)

33

Intrusion Detection System

Intrusion Detection System: a computer system that attempts to detect any set of actions that try to compromise the integrity, confidentiality, or availability of a resource.

An IDS has far more knowledge and more fine-grained detection functions than a common firewall. (Remember that the main function of a firewall is access control.)

34

IDS Types

• Host based vs. Network based
• Misuse detection vs. Anomaly detection
• Active vs. Passive
• Centralized vs. Distributed

35

Host based & Network based IDS

Host based IDS: installed on the target host as a monitoring service. It checks system activity, user privileges, and user behavior.

Network based IDS: installed on a network node, usually in promiscuous mode to listen to all passing traffic. It checks network traffic and the interactions between nodes.

36

Misuse Detection & Anomaly Detection IDS

Misuse detection (signature-based): based on the assumption that intrusion attempts can be characterized by comparing user activities against a database of known attacks.

Anomaly detection (statistics-based): identifies abusive behavior by noting and analyzing audit data that deviates from a predicted norm.
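A minimal sketch of the anomaly-detection idea, assuming the "predicted norm" is simply the mean and standard deviation of a training window: the Python below flags observations more than a chosen number of standard deviations from the mean. The metric (connections per minute), the sample values, and the threshold of 3 are illustrative assumptions.

```python
from statistics import mean, stdev


def build_profile(history):
    """Learn a simple 'normal' profile: mean and sample standard deviation."""
    return mean(history), stdev(history)


def is_anomalous(value, profile, threshold=3.0):
    """Flag a value whose z-score exceeds the threshold."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Connections per minute observed for one host during normal operation.
normal = [52, 48, 50, 47, 53, 49, 51, 50]
profile = build_profile(normal)  # mean 50, standard deviation 2
print(is_anomalous(55, profile), is_anomalous(400, profile))  # False True
```

The trade-off is visible in the threshold: lowering it catches subtler deviations but raises the false-alarm rate.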

37

Active IDS vs. Passive IDS

Active IDS: participates in the system. It not only observes events but also takes part in the necessary operations (e.g., blocking). Also called IPS or IDP (Intrusion Detection and Prevention System).

Passive IDS: works on a monitor or bystander basis.

38

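[Figure: Active IDS vs. Passive IDS. (a) Passive IDS: placed between the LAN and the ISP, it collects packets for analysis through a port mirror, so network intrusion attacks can pass through. (b) Active IDS: placed inline between the LAN and the ISP, it intercepts packets directly for analysis, so network intrusion attacks are blocked.]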

39

Centralized IDS vs. Distributed IDS

Centralized: the sensors are managed by a single analyzer or manager.

Distributed: the sensors are managed by multiple automated analyzers or managers, which can communicate with one another.

40

Comparison between Firewall and Network based active IDS

Same:
• Cannot protect against insider-to-insider attacks.
• Cannot protect against connections that do not go through them.
• Can do ACL and filtering (for active IDS).

Different:
• An IDS can detect new threats.
• An IDS focuses on intrusion, while a firewall focuses on access control and privacy.
• Firewalls use addresses as the passport, while an IDS performs many more checks.

41

The Challenge of IDS

Speed limitation: a NIDS cannot keep pace with network speed (a NIDS needs to check more fields of a packet than a firewall does).

The inability to see all the traffic: switched Ethernet is being widely deployed, and a switch forwards each unicast frame only to its destination port, so a sensor sees only the traffic mirrored or forwarded to it.

Fail-open/fail-closed architecture: a NIDS often fails without notifying the central console of the problem, and a fail-open design then leaves the network “open” (unprotected). A “fail-closed” methodology means the network is out of service until the NIDS is brought back online.

42

IDS False Alarms

43

Content Inspection Technologies

44

A Generic Layer-7 Engine

Packet Normalizer
• Ensures the integrity of incoming packets
• Eliminates ambiguity
• Decodes URI strings if necessary

Pattern-Matching Engine

Policy Engine
• Gathers information from the pattern-matching engine and issues the verdict to allow/drop the packets

[Figure: the packet stream from the network passes through the Packet Normalizer (normalized traffic), the Pattern-Matching Engine (using signatures; matched events), and the Policy Engine (using policies; verdicts), producing filtered traffic plus logs and reports.]

45

Packet Normalizer

• Integrity checking
• IP fragment reassembly
• TCP segment reassembly: segments may arrive out of order, SEQ numbers may fall outside the window, and segments may overlap
• URI decoding: URI hex-code obfuscation (‘a’ = %61); URI Unicode/UTF-8 obfuscation; self-referential directory obfuscation (/././././ = /); directory obfuscation (/abc/a/../a/../a/ = /abc/a)
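The URI decoding steps above can be sketched with Python's standard library. The function name `normalize_uri` and the choice to decode repeatedly until the string stops changing (to defeat double encoding) are assumptions for illustration; a production normalizer must also handle the Unicode/UTF-8 cases and decide how to treat malformed input.

```python
import posixpath
from urllib.parse import unquote


def normalize_uri(path: str, max_rounds: int = 3) -> str:
    """Decode %-escapes repeatedly, then collapse './' and 'dir/../' segments."""
    for _ in range(max_rounds):
        decoded = unquote(path)
        if decoded == path:
            break
        path = decoded
    # normpath removes '.' segments, resolves 'dir/..' pairs and drops trailing slashes.
    return posixpath.normpath(path)


print(normalize_uri("/%61bc"))             # /abc
print(normalize_uri("/././././"))          # /
print(normalize_uri("/abc/a/../a/../a/"))  # /abc/a
print(normalize_uri("/%2561bc"))           # /abc  (double-encoded 'a')
```

Signatures are then matched against the normalized form, so that every obfuscated variant reduces to the same string.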

46

Pattern-Matching Engine

The most computation-intensive task in packet processing: normally the PM engine must process every single byte of the packet payload.

In Snort, the PM routine accounts for 31% of the total execution time

47

Pattern Matching is Expensive!

• ~30 instructions/byte, i.e., 45K instructions per 1500-byte packet

• ~50 instructions per 1500-byte packet

Source: Intel Corp.

48

Content Inspection Technologies

Pattern-Matching Algorithms
• Software based: Boyer-Moore, Aho-Corasick (AC), Wu-Manber
• Hardware based: Bloom filter, reconfigurable hardware (FSM), TCAM-based

49

Pattern Matching Problem Definition

Given an input text $T = t_0 t_1 \ldots t_n$ and a finite set of strings $P = \{P_1, P_2, \ldots, P_r\}$, the string matching problem involves locating and identifying each substring of $T$ that is identical to some $P_j = a^j_0 a^j_1 \ldots a^j_{m-1}$, $1 \le j \le r$, where

$$t_{s+i} = a^j_i, \quad 0 \le i \le m-1.$$

This condition can also be written as

$$t_s \ldots t_{s+m-1} = a^j_0 a^j_1 \ldots a^j_{m-1}.$$

Example: text T = G C A T C G C A G A G A G T A T A C A G T A A G, pattern G C A G A G A G (which occurs at shift s = 5).

50

Aho-Corasick (AC) Algorithm

AC is a classic solution to exact set matching. It runs in O(n + m + z) time, where n is the length of the text T, m is the total length of the patterns, and z is the number of pattern occurrences in T.

AC is based on a refinement of a keyword tree (trie) augmented with failure links.

AC is deterministic: it scans the text in a single pass, so its scanning time does not depend on the number of patterns.
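A compact Python sketch of the Aho-Corasick automaton follows, using the classic pattern set {he, she, his, hers}. It is a textbook goto/failure/output construction written for clarity, not a tuned IDS implementation, and the function names are illustrative.

```python
from collections import deque


def build_ac(patterns):
    """Build the goto, failure and output tables of the Aho-Corasick automaton."""
    goto = [{}]       # goto[state][char] -> next state (the keyword tree)
    output = [set()]  # output[state] -> patterns recognized on reaching this state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(pat)

    fail = [0] * len(goto)
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:                     # breadth-first, so shallower states are done first
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] |= output[fail[nxt]]  # inherit shorter matches ending here
    return goto, fail, output


def ac_search(text, automaton):
    """Scan the text once; return (start_index, pattern) for every occurrence."""
    goto, fail, output = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in output[state]:
            hits.append((i - len(pat) + 1, pat))
    return sorted(hits)


ac = build_ac(["he", "she", "his", "hers"])
print(ac_search("ushers", ac))  # [(1, 'she'), (2, 'he'), (2, 'hers')]
```

All three overlapping matches are reported in one pass over "ushers", which is the property that makes AC attractive for large signature sets.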

51

An Example of AC Algorithm

Example: P = {ab, ba, babb, bb}

52

An example of AC Algorithm

Dashed: failure transitions; those not shown lead to the root.

[Figure: Aho-Corasick automaton for the patterns below, with the output set of each accepting state, including {he, she}, {his}, and {hers}.]

Patterns: he, hers, his, she

53

An example of AC Algorithm

[Figure: the same automaton processing the input text h e i s h i s.]

54

Reconfigurable Hardware (FSM)

Implement the AC FSM in the configurable logic elements (LEs) of an FPGA.

Achieves multi-gigabit performance (depending on the FPGA model).

A powerful FPGA is needed to accommodate thousands of patterns, which makes this approach impractical and not viable in the commercial market.

55

FPGA-based pattern matching


56

Bloom Filter

Given a string X, the Bloom filter computes k hash functions on it, producing k hash values ranging from 1 to m, and sets the corresponding k bits of an m-bit vector to 1. The same procedure is repeated for all members of the pattern set.

The input text is checked by generating k hash values in the same way. If at least one of these k bits is not set, the string is declared impossible to match; if all k bits are set, it is only a possible match and must be verified.

Patterns of length i are grouped into Bi.
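A minimal Python sketch of the length-grouped Bloom filter scheme described above. The k hash functions are derived from SHA-256 with different one-byte salts, and the sizes (m = 1024 bits, k = 4) are arbitrary assumptions, not values from the slides. A hit only means "possibly a pattern", so it must still be confirmed by exact matching.

```python
import hashlib


class BloomFilter:
    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _positions(self, item: bytes):
        # Derive k hash values in [0, m) by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item: bytes) -> bool:
        # A single 0 bit proves the string is not in the pattern set.
        return all(self.bits[p] for p in self._positions(item))


def build_groups(patterns, m=1024, k=4):
    """One Bloom filter B_i per pattern length i."""
    groups = {}
    for pat in patterns:
        groups.setdefault(len(pat), BloomFilter(m, k)).add(pat)
    return groups


def candidate_matches(payload: bytes, groups):
    """Slide a window of every grouped length over the payload; report possible hits."""
    hits = []
    for length, bf in groups.items():
        for start in range(len(payload) - length + 1):
            window = payload[start:start + length]
            if bf.might_contain(window):
                hits.append((start, window))
    return hits


groups = build_groups([b"cmd.exe", b"root", b"passwd"])
hits = candidate_matches(b"GET /etc/passwd HTTP/1.0", groups)
print(hits)  # includes (9, b'passwd'); any other entry would be a false positive
```

In hardware the filters for all lengths are probed in parallel for each byte position, which is what makes the scheme attractive despite the verification step.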

57

Bloom Filter (Cont.)

[Figure: a payload stream A B C D E F G H I J (byte positions 1, 2, 3, …) is fed in parallel to the Bloom filters B2, B3, B4, …, Bw, one per pattern length.]

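False positive rate: the minimum is

$$f_{\min} = (0.5)^k, \quad \text{achieved when } m = \frac{k \times n}{\ln 2},$$

where n is the number of patterns stored in one filter. So the total space for the filters B2, …, Bw is

$$\sum_{i=2}^{w} |B_i| = m \times (w - 1).$$

Examples: k = 1, n = 2048: m = 3072 bits (at least n / ln 2 ≈ 2955); k = 1, n = 3072: m = 4608 bits (at least n / ln 2 ≈ 4432).

k = 4: f = 0.0625; k = 5: f = 0.0313; k = 6: f = 0.0156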

[Figure: each length group Bi (Bloom filters B2, B3, B4, …) is an m-bit vector indexed from 0 to m; the k hash functions H1, H2, …, Hk map a string X to k bit positions, which are set to 1. Signatures are grouped by length: G2(X), G3(X), G4(X).]
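The minimum false-positive rate quoted above follows from the standard Bloom filter analysis. A short derivation, assuming the k hash functions are independent and uniform over the m bits and that n patterns are stored in one filter:

$$\Pr[\text{a given bit is still } 0] = \left(1 - \tfrac{1}{m}\right)^{kn} \approx e^{-kn/m}$$

$$f \approx \left(1 - e^{-kn/m}\right)^{k}$$

$$m = \frac{k\,n}{\ln 2} \;\Rightarrow\; e^{-kn/m} = e^{-\ln 2} = \tfrac{1}{2} \;\Rightarrow\; f = \left(\tfrac{1}{2}\right)^{k}$$

For example, k = 4 gives f = 1/16 = 0.0625. Equivalently, for a given ratio m/n, the rate is minimized by choosing k = (m/n) ln 2.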

58

TCAM fundamental

TCAM stores data with three logic values: ‘0’, ‘1’, ‘X’ (don’t care)

Multiple match modes are needed.
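In hardware a TCAM compares the key against all entries in parallel in one cycle; the Python loop below only emulates the matching semantics. Entries are strings over {'0', '1', 'X'}; returning every matching index models a multiple-match mode, and taking the lowest index models the common priority (first-match) mode. The function name and table contents are illustrative.

```python
def tcam_lookup(entries, key: str):
    """Return the indices of all ternary entries matching key ('X' = don't care)."""
    matches = []
    for idx, entry in enumerate(entries):
        if len(entry) == len(key) and all(e in ("X", k) for e, k in zip(entry, key)):
            matches.append(idx)
    return matches


table = ["10X1", "1XXX", "0XX0"]
all_hits = tcam_lookup(table, "1011")               # multiple-match mode
first_hit = all_hits[0] if all_hits else None       # priority mode
print(all_hits, first_hit)  # [0, 1] 0
```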

59

Policy Engine

Collects the matching events from the Pattern-Matching Engine and clarifies the relationships between matched patterns:
• Ordered: a policy may consist of more than one pattern, which must be matched in order.
• Offset, Depth: the matched position should be within a certain range or location.
• Distance, Within: the distance between two matched patterns must also be taken into account.

Tracks application states: some applications (e.g., P2P) are difficult to identify using only one signature, so the Policy Engine needs to track the connection state as in the following diagram:

[Figure: state diagram S0 → S1 → S2 → S3, with transitions labeled Msg Exchange, Request File, and Data Exchange.]
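The exact semantics of these keywords differ between IDS products. The Python sketch below assumes one simple interpretation, loosely modeled on Snort's content modifiers: offset/depth bound where the first pattern may occur, while distance/within are measured from the end of the previous match. The `Content` class, `rule_matches`, and the sample rule are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Content:
    pattern: bytes
    offset: int = 0               # first pattern: start searching here
    depth: Optional[int] = None   # first pattern: search only this many bytes from offset
    distance: int = 0             # later patterns: skip this many bytes after the previous match
    within: Optional[int] = None  # later patterns: must end within this many bytes of it


def rule_matches(payload: bytes, contents: List[Content]) -> bool:
    """Return True if all patterns match in order under their position constraints."""
    prev_end = None
    for c in contents:
        if prev_end is None:
            start = c.offset
            end = len(payload) if c.depth is None else c.offset + c.depth
        else:
            start = prev_end + c.distance
            end = len(payload) if c.within is None else prev_end + c.within
        pos = payload.find(c.pattern, start, end)  # whole pattern must fit in [start, end)
        if pos < 0:
            return False
        prev_end = pos + len(c.pattern)  # the next pattern is measured from here
    return True


# Hypothetical rule: "USER" in the first 4 bytes, then "root" ending within 6 bytes of it.
rule = [Content(b"USER", offset=0, depth=4), Content(b"root", distance=1, within=6)]
print(rule_matches(b"USER root\r\n", rule))   # True
print(rule_matches(b"XUSER root\r\n", rule))  # False: "USER" is not in bytes 0-3
```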

60

Fast Pattern Matching Algorithms

• A Pattern Matching Coprocessor for Deep and Large Signature Set in Network Security System (IEEE GLOBECOM 2005)
• Hierarchical Matching Algorithm (HMA) for Intrusion Detection Systems (IEEE GLOBECOM 2005)
• A Time and Memory Efficient String Matching Algorithm for Intrusion Detection Systems (IEEE GLOBECOM 2006)
• A non-Computation Intensive Pre-filter for String Pattern Matching in Network Intrusion Detection Systems (IEEE GLOBECOM 2006)
• Smart Architecture for High-speed Intrusion Detection and Prevention Systems (International Conference on Cryptology and Network Security, CANS 2006, acceptance rate < 18%)
• A Deterministic Cost-effective String Matching Algorithm for Network Intrusion Detection Systems (IEEE ICC 2007)
• A Novel Algorithm and Architecture for High Speed Pattern Matching in Resource-limited Silicon Solution (IEEE ICC 2007)
• Flow Digest: A State Synchronization Scheme for Stateful High Availability (IEEE ICC 2007)
• Performing Packet Content Inspection by Longest Prefix Matching Technology (IEEE GLOBECOM 2007)

61

Security SoC

BroadWeb Security SoC
• ARM922 RISC CPU (250 MHz)
• Hardware NAT (400 Mbps)
• Hardware content inspection engine (40 Mbps)
• Two 10/100/1000 RJ-45 ports
• Embedded Linux
• NSS- and ICSA-approved IPS signature database
• IPS/anti-virus functions
• IM/P2P management
• Turn-key solution (ASIC + software module)

Tier-1 customers

62

Security SoC (Cont.)

BroadWeb Security SoC (2nd Generation)
• ARM926EJ RISC CPU (300 MHz)
• Intelligent hardware NAT (1 Gbps)
• Hardware content inspection engine (100 Mbps)
• Embedded GbE smart switch and 4-port GPHY core
• NSS- and ICSA-approved IPS technology
• IPS/anti-virus functions
• IM/P2P management
• Turn-key solution (ASIC + software module)

Tier-1 customers

63

Cisco/Linksys Wireless Security Router

• IEEE 802.11n 108 Mbps EWC wireless LAN
• IPS protection and IM/P2P management
• Firewall/VPN/Routing
• Gigabit Ethernet x 5

64

State Machine Based Technologies

65

State Machine Based Technologies

The FA Example: FTP

[Figure: finite automaton for FTP with states 0 to 7; transitions Login, PASS_Per, PASV_Req/PASV_Ok, ACTIVE_Req/ACTIVE_Ok, LIST_Req/LIST_Ok, File_Req/File_Ok, and Flow_Close.]

Transitions, transport ports, and patterns:
• Login: TCP, dport:21; “PASS”
• PASS_Per: TCP, sport:21; “230”, “User”, “logged in”
• ACTIVE_Req: TCP, dport:21; “PORT”
• ACTIVE_Ok: TCP, sport:21; “200 PORT command successful”
• PASV_Req: TCP, dport:21; “PASV”
• PASV_Ok: TCP, sport:21; “227 Entering Passive Mode”
• LIST_Req: TCP, dport:21; “LIST”
• LIST_Ok: TCP, sport:21; “226 Transfer complete.”
• File_Req: TCP, dport:21; “RETR”
• File_Ok: TCP, sport:21; “226 Transfer complete”
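The transition table above can drive a small automaton per FTP control connection. In the Python sketch below, the patterns come from the table, but the state graph (which state each transition leads to) is a hypothetical reconstruction, since the figure's edges are not recoverable; "c2s" stands for client-to-server packets (TCP dport 21) and "s2c" for server-to-client packets (TCP sport 21). Note how the state disambiguates LIST_Ok from File_Ok even though both match "226 Transfer complete".

```python
# (direction, required substrings) per transition, taken from the table above.
PATTERNS = {
    "Login":      ("c2s", [b"PASS"]),
    "PASS_Per":   ("s2c", [b"230", b"User", b"logged in"]),
    "PASV_Req":   ("c2s", [b"PASV"]),
    "PASV_Ok":    ("s2c", [b"227 Entering Passive Mode"]),
    "ACTIVE_Req": ("c2s", [b"PORT"]),
    "ACTIVE_Ok":  ("s2c", [b"200 PORT command successful"]),
    "LIST_Req":   ("c2s", [b"LIST"]),
    "LIST_Ok":    ("s2c", [b"226 Transfer complete"]),
    "File_Req":   ("c2s", [b"RETR"]),
    "File_Ok":    ("s2c", [b"226 Transfer complete"]),
}

# Hypothetical state graph: state -> {transition: next_state}.
FA = {
    0: {"Login": 1},
    1: {"PASS_Per": 2},
    2: {"PASV_Req": 3, "ACTIVE_Req": 4},
    3: {"PASV_Ok": 5},
    4: {"ACTIVE_Ok": 5},
    5: {"LIST_Req": 6, "File_Req": 7},
    6: {"LIST_Ok": 2},
    7: {"File_Ok": 2},
}


def step(state, direction, payload):
    """Advance the FA if a transition allowed in `state` matches this packet."""
    for name, nxt in FA.get(state, {}).items():
        want_dir, needles = PATTERNS[name]
        if direction == want_dir and all(n in payload for n in needles):
            return nxt, name
    return state, None


session = [
    ("c2s", b"USER anonymous\r\n"),
    ("c2s", b"PASS guest\r\n"),
    ("s2c", b"230 User logged in.\r\n"),
    ("c2s", b"PASV\r\n"),
    ("s2c", b"227 Entering Passive Mode (192,0,2,1,19,137)\r\n"),
    ("c2s", b"RETR report.pdf\r\n"),
    ("s2c", b"226 Transfer complete.\r\n"),
]
state, events = 0, []
for direction, payload in session:
    state, event = step(state, direction, payload)
    if event:
        events.append(event)
print(events)  # ['Login', 'PASS_Per', 'PASV_Req', 'PASV_Ok', 'File_Req', 'File_Ok']
```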

66

The FAs of BitTorrent protocols.

Connect phase: 0 → 1 (Announce) → 2 (Get_Peers)

Download phase: 0 → 1 (Connect_to_Peer)

Transitions and patterns:
• Announce: “GET /announce”
• Get_Peers: “HTTP/1.0 200 OK”, “e5:peers”
• Connect_to_Peer: 0x13, “BitTorrent protocol”

67

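The FAs of Yahoo Messenger protocol.

[Figure: finite automaton with states 0 to 3; Auth_Resp leads from the login phase into the chat phase, where the transitions Msg_Service, P2P_File_Tx, and File_Tx_Status_BRB occur until Flow_Close.]

Login phase (transitions, transport ports, patterns):
• Auth_Resp: TCP, sport:5050; “YMSG”, 0x54

Chat phase (transitions, transport ports, patterns):
• Msg_Service: TCP, sport:5050; “YMSG”, 0x06
• P2P_File_Tx: TCP, sport:5050; “YMSG”, 0x4d
• File_Tx_Status_BRB: TCP, sport:5050; “YMSG”, 0x4d, 0x1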

68

We Can Identify and Manage Over 60 Applications

• IM: MSN, Yahoo Messenger, AIM, QQ, Google Talk, TM, ICQ, iChat, MIRC, Odigo, Rediff, Gadu-Gadu
• Web-IM: Meebo.com, eBuddy.com, iLoveIM.com, MSN, AIM, Yahoo, ICQ
• P2P: eDonkey, BitTorrent, Gnutella, Foxy, FastTrack, Vagaa, Winny, BitComet, DirectConnect, PiGo, PP365, WinMX, POCO, iMesh, ClubBox
• Streaming Media: QQLive, Podcast Bar, PPLive, RealPlayer, Windows Media Player, iTunes, WinAMP, Player 365, QuickTime, FlashMedia Video, TVAnts
• Webmail: Yahoo, Hotmail, Gmail
• VoIP: Skype (3.6)
• File Transfer: FTP, Web File Transfer, Thunder, GetRight, FlashGet
• VPN: VNN, SoftEther, Hamachi, TinyVPN, PacketiX, HTTP-Tunnel, Tor, Ping-Tunnel
• Terminal Control: VNC, PCAnywhere
• Online Game: QQGame, OurGame, Cga.com.cn, QQFO

69

Machine Learning Based Technologies

70

Application Traffic identification

Traffic identification (or traffic classification) has received much attention in recent years because:
• The introduction of P2P applications greatly affects network management.
• Port numbers are no longer an accurate or efficient discriminator for identifying this prevalent traffic.

What about string matching? Accurate! But…
• It cannot identify encrypted traffic.
• Manually maintaining protocol signatures is costly.
• Matching strings on very high-speed networks is costly.
• Privacy issues remain under debate.

71

How to resolve the problem?

Heuristic methods (2004~2005):
• Rules can be constructed from intrinsically different behavior, e.g., if the number of distinct destination IPs equals the number of distinct destination ports, the host is running P2P.
• Used to differentiate P2P from non-P2P traffic.

Machine learning based techniques (2004~):
• Construct “statistical signatures” for different categories/application protocols.
• Most machine learning techniques are employed directly to construct traffic signatures.
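The IP/port heuristic above is easy to state in code. The reasoning is that P2P peers each listen on their own (often random) port, so a P2P host contacts roughly as many ports as IPs, whereas a web client contacts many servers on the same port 80. In this Python sketch the function name, the flow-tuple format, and the `min_peers` threshold are illustrative assumptions.

```python
from collections import defaultdict


def p2p_suspects(flows, min_peers=5):
    """Hosts whose number of distinct destination IPs equals their distinct destination ports.

    flows: iterable of (src_ip, dst_ip, dst_port). min_peers (an assumed threshold)
    skips hosts with too few connections to judge.
    """
    dst_ips, dst_ports = defaultdict(set), defaultdict(set)
    for src, dst_ip, dst_port in flows:
        dst_ips[src].add(dst_ip)
        dst_ports[src].add(dst_port)
    return {
        host for host in dst_ips
        if len(dst_ips[host]) >= min_peers and len(dst_ips[host]) == len(dst_ports[host])
    }


# A P2P host talks to peers on their own ports; a web client uses port 80 everywhere.
flows = [("10.1.1.5", f"203.0.113.{i}", 40000 + i) for i in range(6)]
flows += [("10.1.1.9", f"198.51.100.{i}", 80) for i in range(6)]
print(p2p_suspects(flows))  # {'10.1.1.5'}
```

Strict equality is brittle on real traffic; a tolerance on the IP/port ratio would likely be needed in practice.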

72

The Milestone of Researches on Application Traffic Identification

• Before 2003: string matching and port numbers.
• 2003~2005: heuristics; machine learning methods.
• 2006~: machine learning methods for real-time traffic classification, using the sizes and directions of the first k data packets of a TCP connection, or stage-based classification (statistical data in each stage).

73

Different Objects of Application Traffic Identification

At different levels:
• Category level or QoS class (bulk data transfer such as FTP and P2P, interactive, mail, web, streaming)
• Protocol level (Kazaa, eMule/eDonkey, BitTorrent, MSN, FTP, POP3, SMTP, HTTP, Skype, Winny, Share, ….)
• Behavior level (FTP control, FTP data, MSN file transfer, MSN message chatting, MSN VoIP, Skype chatting, Skype VoIP, Skype file transfer, Skype video conference, …)

All existing research focuses on classification at the protocol or category level.

Application fields:
• Offline: traffic trend analysis.
• Online: traffic shaping, traffic engineering, security management.

74

The Classes of Applied Machine Learning Algorithms

Supervised machine learning: the model of traffic characteristics is constructed from training instances with predefined class labels.

Unsupervised machine learning (clustering): the model of traffic characteristics is constructed from training instances without predefined class labels.

However, the training sets used by all existing work in both classes include pre-classified labels, because each cluster may contain several different classes/protocols, so labels are still needed to map clusters to applications.

75

The Discriminators (Attributes)

The key issues for machine-learning-based traffic identification are:
• What are the most distinguishing characteristics (attributes/discriminators)?
• How can the expensive cost of training be removed?

Different discriminators:
• From the L3/L4 layers: packet inter-arrival time, total packet size, number of packets, etc.
• Combinations of L3/L4 attributes from different perspectives, e.g., the upload/download size ratio.

76

The Milestone of Researches (Applying Machine Learning techniques)

• 2003~2004: [Matthew Roughan, IMC’04] Class-of-Service Mapping for QoS.
• 2005: [Sebastian Zander] Automated Traffic Classification. [Andrew W. Moore] Using Bayesian Analysis Techniques.
• 2006: [Sebastian Zander] Internet Archeology: Estimating Individual Application Trends in Incomplete Historic Traffic Traces. [Laurent Bernaille] Traffic Classification on the Fly (first 5 packets of TCP with k-means clustering). [Jeffrey Erman] Internet Traffic Identification using Machine Learning (k-means, EM clustering).

77

• 2006 (cont.): [Laurent Bernaille] Early Application Identification (first 4 packets of TCP with k-means, GMM, and HMM clustering).
• 2007 (real-time methods): [Zhu Li] Accurate Classification of the Internet Traffic Based on the SVM Method (TCP and UDP flow classification). [Laurent Bernaille] Early Recognition of Encrypted Applications (first 3 packets of TCP with GMM clustering). [Jeffrey Erman] Semi-Supervised Network Traffic Classification (stage-based classification).


78

Class-of-Service Mapping for QoS: A Statistical Signature-based Approach to IP Traffic

ACM SIGCOMM Internet Measurement Conference (IMC '04)

Matthew Roughan (1), Subhabrata Sen (2), Oliver Spatscheck (2), Nick Duffield (2)

(1) School of Mathematical Sciences, University of Adelaide, Australia; (2) AT&T Labs – Research, Florham Park, NJ, USA

79

Introduction

Before this paper:
• Traditional research tried to find models for traditional protocols (FTP, web, mail).
• Most research on modeling the traffic characteristics of P2P and IM consisted of case studies.

Features:
• This paper studied the requirements and proposed a QoS framework, at the QoS-class level, for traffic consisting of both traditional and novel P2P/IM applications.
• Classification uses the statistics of particular applications to form “signatures”.

80

Ideas

The statistical attributes are aggregated by server port and by server IP address, separately.

Machine learning techniques are employed to construct the mapping from server-port aggregates / server-IP aggregates to different QoS classes:
• Nearest Neighbor (NN)
• Linear Discriminant Analysis (LDA)

Then the port numbers of the aggregates that belong to a particular QoS class form a rule.

Disadvantage: applications that require different QoS might use the same server port number (e.g., P2P).

81

Nearest Neighbor

To classify a data point x, find its nearest neighbor! Points with the same property should be close to each other, so the class of the nearest neighbor is assigned to x.

K-Nearest Neighbor: find the k nearest neighbors and let them “vote”.

More information: http://neural.cs.nthu.edu.tw/jang/books/dcpr/4.2-knnr.asp?title=4-2 K-nearest-neighbor Rule
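A minimal k-nearest-neighbor classifier in Python, as a sketch of the voting rule above. The two features (average packet size in bytes, flow duration in seconds), the training points, and the class names are made up for illustration; they are not data from the paper.

```python
import math
from collections import Counter


def knn_classify(x, training, k=3):
    """Return the majority label among the k training points nearest to x (Euclidean)."""
    nearest = sorted(training, key=lambda item: math.dist(x, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical per-flow features: (average packet size in bytes, flow duration in s).
training = [
    ((1400, 120), "bulk"), ((1350, 300), "bulk"), ((1450, 90), "bulk"),
    ((80, 600), "interactive"), ((120, 900), "interactive"), ((60, 400), "interactive"),
]
print(knn_classify((1300, 200), training))  # bulk
print(knn_classify((100, 700), training))   # interactive
```

Because raw Euclidean distance is used, features with larger numeric ranges dominate; in practice the features would usually be normalized first.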

82

Linear Discriminant Analysis

Find a good “projection” of the original points. Linear discriminant analysis finds a linear transformation (“discriminant function”) of the two predictors, X and Y, that yields a new set of transformed values providing more accurate discrimination than either predictor alone: Transformed Target = C1*X + C2*Y

More information: http://www.dtreg.com/lda.htm and http://neural.cs.nthu.edu.tw/jang/books/dcpr/index.asp

[Figure: examples with 3 features and with 2 features.]

83

Evaluation Example

Attributes for this evaluation: the average packet size, flow duration, bytes per flow, packets per flow, and Root Mean Square (RMS) packet size.

84

Internet Traffic Classification Using Bayesian Analysis Techniques

ACM SIGMETRICS'05

Andrew W. Moore (1), Denis Zuev (2)

(1) University of Cambridge, (2) University of Oxford

85

Introduction

Features:
• Only TCP flows are considered.
• Category-level classification.
• Supervised machine learning: naïve Bayesian algorithm.
• Uniquely uses data that has been hand-classified (based on flow content) into one of a number of categories.
• Feature selection was applied to improve accuracy.

86

Ideas

Discriminators: about 248 discriminators for each flow, e.g., packet inter-arrival time (mean, variance, …), payload size (mean, variance, …), Fourier transform of the packet inter-arrival time, TTL value, flow duration, TCP port, etc.

Naïve Bayesian classifier: for a flow with known statistical attributes, which class is most likely? Find the class with the maximum probability Pr(Ci | X), where Ci is the i-th class and X is the attribute vector of the flow to be classified.

Only about 65% accuracy at the flow level was achieved.
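Under the naïve independence assumption, Pr(Ci | X) ∝ Pr(Ci) ∏j Pr(xj | Ci), so the classifier picks the class with the largest prior-weighted product of per-feature likelihoods. The Python sketch below models each Pr(xj | Ci) as a Gaussian, the baseline the next slide says was later replaced by kernel estimation. The two features, the training flows, and the class names are hypothetical.

```python
import math
from collections import defaultdict


def train_gaussian_nb(samples):
    """samples: list of (feature_vector, label). Returns {label: (prior, [(mean, var), ...])}."""
    by_class = defaultdict(list)
    for x, y in samples:
        by_class[y].append(x)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for column in zip(*rows):
            mu = sum(column) / len(column)
            var = sum((v - mu) ** 2 for v in column) / len(column) + 1e-9  # avoid zero variance
            stats.append((mu, var))
        model[label] = (len(rows) / len(samples), stats)
    return model


def predict(model, x):
    """Return the class maximizing log Pr(C) + sum_j log p(x_j | C)."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for v, (mu, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


# Hypothetical flows: (mean payload size in bytes, mean inter-arrival time in ms).
samples = [
    ((1400, 1.0), "BULK"), ((1300, 2.0), "BULK"), ((1450, 1.5), "BULK"),
    ((100, 200.0), "INTERACTIVE"), ((60, 150.0), "INTERACTIVE"), ((90, 300.0), "INTERACTIVE"),
]
model = train_gaussian_nb(samples)
print(predict(model, (1350, 2.0)))  # BULK
print(predict(model, (80, 250.0)))  # INTERACTIVE
```

Working with log-probabilities avoids numerical underflow when many discriminators (such as 248) are multiplied together.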

87

Ideas(cont.)

Improvement: naïve Bayes kernel estimation method.
• Kernel estimation was used instead of the Gaussian distribution model assumed by naïve Bayes.
• Discriminator selection and dimension reduction.
• Accuracy improved up to 95%.

Disadvantages:
• All the discriminators become available only after the flow is closed.
• Only TCP flows are considered for classification.
• Network management might need finer classes (protocol level or behavior level).

88

Evaluation with Training and Test Sets from Traffic Captured at Different Times

FCBF: Fast Correlation-Based Filter

89

Traffic Classification on the Fly

ACM SIGCOMM Computer Communication Review, Volume 36, Issue 2, April 2006

Laurent Bernaille†, Renata Teixeira†, Ismael Akodkenou†, Augustin Soule‡, Kave Salamatian†

† LIP6, Université Pierre et Marie Curie; ‡ Thomson Paris Lab

Paris, France

90

Introduction

Features:
• The first paper to focus on real-time, flow-level application classification.
• Approximately models the L7 protocol handshake.
• Protocol-level classification.
• Unsupervised machine learning: k-means clustering (50 clusters work best).
• Protocol assignment: in each cluster, the protocol with the largest proportion dominates the cluster.

Discriminators: the sizes (payload) and directions of the first q data packets of each TCP connection; q = 5 works best, e.g., (+300, -200, +100, +200, -400).

91

K-means Clustering

For a given number of clusters k, iteratively find the k cluster centers and “partition” all points among these k clusters, until no point's nearest center changes.

Each data point is expressed as a vector, and Euclidean distance is the most common distance function.
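A plain Lloyd's k-means in Python, applied to flow vectors in the style of the previous slide: signed payload sizes of the first packets, with + for client-to-server and - for server-to-client. The six flows and k = 2 are made-up illustrations (the paper used q = 5 packets and 50 clusters).

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means; returns (centers, assignment), one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(iterations):
        new_assignment = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_assignment == assignment:
            break  # no point changed its nearest center: converged
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers, assignment


# Hypothetical flows: signed sizes of the first 3 data packets.
flows = [
    (300, -200, 100), (310, -190, 110), (290, -210, 95),        # request/response style
    (60, -1400, -1400), (55, -1380, -1420), (65, -1410, -1390),  # download style
]
centers, labels = kmeans(flows, k=2)
print(labels)  # two groups: the first three flows share one index, the last three the other
```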

92

Evaluation Result

Above 80% average accuracy can be achieved.

Disadvantages:
• Only TCP connections are considered.
• Protocol assignment results in classification starvation: protocols that don't dominate any cluster are always classified as some other protocol.
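The starvation effect follows directly from majority labeling. In the Python sketch below, each cluster gets the most frequent protocol among its training flows; the cluster assignment and protocol labels are hypothetical. POP3 appears in both clusters but dominates neither, so no cluster ever maps to it.

```python
from collections import Counter, defaultdict


def label_clusters(assignment, true_labels):
    """Map each cluster to the most frequent protocol among its training flows."""
    per_cluster = defaultdict(Counter)
    for cluster, proto in zip(assignment, true_labels):
        per_cluster[cluster][proto] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in per_cluster.items()}


# Hypothetical training result: cluster index per flow and the flow's real protocol.
assignment = [0, 0, 0, 1, 1, 1, 1]
labels = ["HTTP", "HTTP", "POP3", "SMTP", "SMTP", "SMTP", "POP3"]
mapping = label_clusters(assignment, labels)
starved = set(labels) - set(mapping.values())
print(mapping, starved)  # {0: 'HTTP', 1: 'SMTP'} {'POP3'}
```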

93

Early Application Identification

ACM CoNEXT 2006 (International Conference on Emerging Networking Experiments and Technologies), December 2006

Laurent Bernaille, R. Teixeira and K. Salamatian

Université Pierre et Marie Curie, LIP6, CNRS

Paris, France

94

Introduction

Features: three unsupervised machine learning (clustering) algorithms were used to evaluate cluster assignment accuracy and protocol labeling accuracy:
• K-means
• Gaussian Mixture Models (GMM) on a Euclidean space
• Spectral clustering on Hidden Markov Models (HMM, to take the order of packets into account)

Discriminators: size and direction of the first P data packets.

To deal with the starvation problem in each group, a labeling heuristic based on standard server port numbers (e.g., 25 for SMTP, 110 for POP3) is used to classify the protocols in each cluster.

Only TCP flows are considered. A wireless traffic trace was included in the evaluation.

95

Discriminators

Discussion of the discriminators:
• The size and direction of each packet add more information for distinguishing applications than arrival-time-related metrics.
• The range of packet sizes for each application is similar across traces, so these models can be used to classify the same set of applications on another network.

P = 4 packets for the three clustering methods. Number of clusters:
• Kh = 30 for HMM
• Kk = 40 for K-means
• Kg = 45 for GMM

96

Packet size is a better attribute

97

On-line Classification

98

Labeling

std(S) = {FTP, SSH, SMTP, HTTP, POP3, NNTP, HTTPS, POP3S}, the set of standard server ports

99

Labeling Accuracy

100

Features

Pros: easy, fast, and simple!
• Payload size and packet direction of the first P data packets.
• Unsupervised training: an automatic learning mechanism.

Cons: In [Jeffrey Erman, HP TR]: “…is unsuccessful classifying application types with variable-length packets in their protocol handshakes such as Gnutella. Neither of these studies access the byte accuracy of their approaches which makes direct comparisons to our work difficult.”

101

Features

Cons:
• Only TCP is included in the classification. According to the trace descriptions, a non-negligible fraction of flows contain fewer than 4 data packets!
• Control flows might also prevent the identification system from classifying detailed protocol behavior.
• Classification starvation still exists for protocols that don't use standard ports.

102

Early Recognition of Encrypted Applications

Passive and Active Measurement Conference (PAM 2007), April 5-6, 2007

Laurent Bernaille, Renata Teixeira

Université Pierre et Marie Curie, LIP6-CNRS

Paris, France

103

Introduction

Features:
• Classification of SSL-encrypted protocols.
• Two stages: SSL detection and protocol identification.
• First 3 packets and 35 clusters for the Gaussian Mixture Model.

Size of the original packet:
• The most accurate method is to look up the encryption method in the handshake packets and transform the sizes of the application packets accordingly.
• For the five most common ciphers this method is overkill, because the size increase only varies from 21 to 33 bytes.
• Simple heuristic: subtract 21 from the size of the encrypted packet regardless of the cipher.

Extending the Cluster+Port labeling heuristic with SSL-specific ports: 443 for HTTPS, 993 for IMAPS, and 995 for POP3S.

105

Accurate Classification of the Internet Traffic Based on the SVM Method

IEEE ICC 2007

Zhu Li (1), Ruixi Yuan (1), and Xiaohong Guan (1, 2)

(1) Center for Intelligent and Networked Systems (CFINS), Tsinghua University, Beijing 100084, China

(2) SKLMS Lab and MOE Key Lab for Intelligent Networks and Network Security, Xi’an Jiaotong University, Xi’an 710049, China

106

Introduction

Features:
• Category-level classification.
• Supervised machine learning: Support Vector Machine.
• Feature selection (discriminator selection) is employed to select the best set of attributes.
• Both TCP and UDP are considered.

Discriminators: statistical data of flows.

Disadvantage: the discriminators are available only after the flow has finished communicating.

107

Feature Selection

Sequential forward selection: begin with no features chosen; at each step, append the one feature that yields the best classification result.
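Sequential forward selection can be sketched as a greedy loop. The `score` callback is a hypothetical stand-in for "train a classifier on this feature subset and return its accuracy"; any evaluation function works.

```python
from typing import Callable, FrozenSet, List, Sequence

# Hypothetical evaluation callback: feature subset -> classification accuracy.
Scorer = Callable[[FrozenSet[str]], float]


def sequential_forward_selection(features: Sequence[str], score: Scorer,
                                 k: int) -> List[str]:
    """Greedily grow a feature subset, one feature at a time, up to k features."""
    chosen: List[str] = []
    remaining = list(features)
    while remaining and len(chosen) < k:
        # Add the single feature whose inclusion gives the best score.
        best = max(remaining, key=lambda f: score(frozenset(chosen + [f])))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

Once a feature is added it is never removed, which is the weakness the plus-m-minus-r variant addresses.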

Plus-m-minus-r algorithm: begin with no features chosen; at each step, add m features to the chosen set and then remove r features from it (m > r).

Plus-2-minus-1 was used in this paper.
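A plus-m-minus-r sketch, with m=2, r=1 as defaults to match the Plus-2-minus-1 setting. The greedy choice inside each plus and minus step, and the `score` callback, are assumptions about details the slide does not spell out.

```python
from typing import Callable, FrozenSet, List, Sequence

# Hypothetical evaluation callback: feature subset -> classification accuracy.
Scorer = Callable[[FrozenSet[str]], float]


def plus_m_minus_r(features: Sequence[str], score: Scorer, k: int,
                   m: int = 2, r: int = 1) -> List[str]:
    """Plus-m-minus-r selection; m=2, r=1 gives Plus-2-minus-1."""
    if m <= r:
        raise ValueError("plus-m-minus-r requires m > r")
    k = min(k, len(features))
    chosen: List[str] = []
    remaining = list(features)
    while len(chosen) < k:
        # Plus step: greedily add up to m features (stop early at k).
        for _ in range(m):
            if not remaining or len(chosen) >= k:
                break
            best = max(remaining, key=lambda f: score(frozenset(chosen + [f])))
            chosen.append(best)
            remaining.remove(best)
        if len(chosen) >= k:
            break
        # Minus step: drop the r features whose removal hurts the score least.
        for _ in range(r):
            worst = max(chosen, key=lambda f: score(frozenset(chosen) - {f}))
            chosen.remove(worst)
            remaining.append(worst)
    return chosen
```

Each round grows the set by m - r features, so the loop always terminates, and a feature chosen early can still be dropped later.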

108

Feature Selection (Cont.)

109

For the data sample set with the original class proportions of the traffic

Accuracy After Feature Selection

110

Offline/Realtime Traffic Classification Using Semi-Supervised Learning

HP Technical Report, July 13, 2007. Presented at Performance 2007 (2-5 October 2007, Cologne, Germany) and published in the Performance Evaluation journal (special issue on Performance 2007, the Proceedings of IFIP Performance 2007).

Jeffrey Erman, Anirban Mahanti, Martin Arlitt, Ira Cohen, Carey Williamson

Enterprise Systems and Software Laboratory

HP Laboratories Palo Alto

111

Features: Semi-supervised learning techniques

Allows classifiers to be designed from training data that consists of only a few labeled and many unlabeled flows.

Achieves both high byte accuracy and high flow accuracy (i.e., > 90%). Examines traffic over an extended period of time to assess the longevity of the classifiers. Focuses on TCP only.

It would likely be advantageous to have a separate classifier for non-TCP traffic (future work).

Considers the composition of the training set (elephant vs. mice flows) in order to obtain higher byte accuracy.
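Flow accuracy and byte accuracy can diverge sharply, which is why elephant flows matter for the training set. A small sketch, assuming each flow is given as a hypothetical `(true_label, predicted_label, byte_count)` triple: flow accuracy counts correctly labeled flows, byte accuracy counts the bytes they carry.

```python
def flow_and_byte_accuracy(flows):
    """flows: iterable of (true_label, predicted_label, byte_count) triples."""
    flows = list(flows)
    correct = [f for f in flows if f[0] == f[1]]
    flow_acc = len(correct) / len(flows)
    # Bytes carried by correctly labeled flows over all bytes.
    byte_acc = sum(f[2] for f in correct) / sum(f[2] for f in flows)
    return flow_acc, byte_acc
```

For example, misclassifying a single 900-byte elephant among four flows totalling 1,100 bytes still gives 75% flow accuracy but only about 18% byte accuracy.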

Introduction

112

Semi-supervised learning. Hypothesis: if even a few flows are labeled in each cluster, we have a reasonable basis for creating the cluster-to-application-type mapping.

Step 1: Clustering with K-Means. Step 2: Map the clusters to the q known applications (Y) according to the fraction of labeled flows of each application within the cluster.

Clusters that contain no labeled flows remain unlabeled; they are used to represent new or unknown applications. For most experiments, the number of clusters is K = 400.
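The two steps above can be sketched with a plain K-Means followed by a majority-vote mapping (the application with the largest fraction of labeled flows in a cluster wins). The farthest-first initialisation and the `"unknown"` label for clusters without labeled flows are illustrative assumptions; the paper's K = 400 would simply be a larger `k`.

```python
from collections import Counter


def _dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50):
    """Return a cluster index for each point (plain Lloyd iterations)."""
    # Farthest-first initialisation keeps this sketch deterministic.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(_dist2(p, c) for c in centers))
        centers.append(list(far))
    assign = []
    for _ in range(iters):
        # Assignment step: each flow joins its nearest center.
        new_assign = [min(range(k), key=lambda c: _dist2(p, centers[c]))
                      for p in points]
        if new_assign == assign:
            break
        assign = new_assign
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def map_clusters(assign, labels):
    """labels[i] is an application name, or None for an unlabeled flow."""
    votes = {}
    for cluster, label in zip(assign, labels):
        if label is not None:
            votes.setdefault(cluster, Counter())[label] += 1
    # Majority label = application with the largest labeled fraction.
    return {c: cnt.most_common(1)[0][0] for c, cnt in votes.items()}


def classify(assign, mapping):
    """Clusters with no labeled flows represent new or unknown applications."""
    return [mapping.get(c, "unknown") for c in assign]
```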

Introduction

113

Discriminators

11 discriminators (selected by feature selection from 25 discriminators):

• Total number of packets
• Average packet size
• Total bytes
• Total header (transport plus network layer) bytes
• Number of caller-to-callee packets
• Total caller-to-callee bytes
• Total caller-to-callee payload bytes
• Total caller-to-callee header bytes
• Number of callee-to-caller packets
• Total callee-to-caller payload bytes
• Total callee-to-caller header bytes
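The 11 discriminators can all be computed from per-packet header and payload sizes. A sketch, assuming a hypothetical `Packet` record that already knows whether it was sent by the caller (flow initiator) and that packet size is header bytes plus payload bytes:

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical per-packet record for one flow."""
    from_caller: bool   # True if sent by the flow initiator (caller)
    header_bytes: int   # transport + network layer header bytes
    payload_bytes: int  # application payload bytes


def discriminators(packets):
    """Compute the 11 flow discriminators listed above."""
    size = lambda p: p.header_bytes + p.payload_bytes
    fwd = [p for p in packets if p.from_caller]
    bwd = [p for p in packets if not p.from_caller]
    total_bytes = sum(size(p) for p in packets)
    return {
        "total_packets": len(packets),
        "avg_packet_size": total_bytes / len(packets) if packets else 0.0,
        "total_bytes": total_bytes,
        "total_header_bytes": sum(p.header_bytes for p in packets),
        "caller_to_callee_packets": len(fwd),
        "caller_to_callee_bytes": sum(size(p) for p in fwd),
        "caller_to_callee_payload_bytes": sum(p.payload_bytes for p in fwd),
        "caller_to_callee_header_bytes": sum(p.header_bytes for p in fwd),
        "callee_to_caller_packets": len(bwd),
        "callee_to_caller_payload_bytes": sum(p.payload_bytes for p in bwd),
        "callee_to_caller_header_bytes": sum(p.header_bytes for p in bwd),
    }
```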

114

On-line Classification

Online classification: a layered classification system.

A packet milestone is reached when the total number of packets a flow has sent or received (including SYN/SYN-ACK packets) reaches a specific value.

Each layer is an independent model that classifies ongoing flows into one of the many class types using the flow statistics available at the chosen milestone.

Each milestone's classification model is trained using flows that have reached that specific packet milestone.

Flows are reclassified whenever a higher layer (milestone) is reached: when a flow is reclassified, any previously assigned labels are disregarded.
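The layered, milestone-driven reclassification can be sketched as a per-flow dispatcher. The class name, the `models` mapping, and the milestone values in the usage test are hypothetical; each model stands for a classifier trained only on flows that reached that milestone.

```python
from typing import Callable, Dict, Hashable, Optional

# Hypothetical model type: flow statistics at a milestone -> class label.
Model = Callable[[dict], str]


class MilestoneClassifier:
    """Layered online classifier: one independent model per packet milestone."""

    def __init__(self, models: Dict[int, Model]):
        # milestone (packet count) -> model trained on flows that reached it
        self.models = models
        self.labels: Dict[Hashable, str] = {}

    def on_packet(self, flow_id: Hashable, packet_count: int,
                  stats: dict) -> Optional[str]:
        """Call once per packet (SYN/SYN-ACK included) with the flow's running count."""
        model = self.models.get(packet_count)
        if model is not None:
            # A new milestone was reached: reclassify, discarding the old label.
            self.labels[flow_id] = model(stats)
        return self.labels.get(flow_id)
```

Between milestones the flow keeps the label from the most recent layer, so a label is available as soon as the first milestone is reached.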

115

Byte Accuracy

April 13, 9 am trace

78% of the flows had correct labels after classification

116

Features

Pros: The semi-supervised mechanism reduces the cost of preparing a large training data set. Sampling techniques are considered when forming the training set.

Cons: Only TCP flows are included. Are exponentially spaced packet milestones suitable for real-time classification?

117

A High Accurate Machine-Learning Algorithm for Identifying Application Traffic in Early Stage

Nen-Fu Huang+, Gin-Yuan Jai+, and Han-Chieh Chao*

+Department of Computer Science, National Tsing Hua University, Taiwan

*Department of Electronics, National Ilan University, Taiwan

118

Classification in Early Stage

Goal: capture the protocol-handshaking characteristics of each flow from an L7 perspective.

Flow ID: 5-tuple (sip, sport, dip, dport, protocol). Statistical information of each flow is collected over its first k rounds:

Elapsed time, transmitted size, throughput, response time, inter-arrival time.
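A sketch of early-stage feature extraction: packets are grouped by a direction-independent 5-tuple, and statistics are computed over the first k rounds. The slide does not define a "round", so this code assumes one starts each time the initiator sends after the responder has sent; the response-time definition (first responder packet minus first packet) and the packet dict fields are likewise hypothetical.

```python
from collections import defaultdict


def flow_id(p):
    """Direction-independent 5-tuple key (sip, sport, dip, dport, protocol)."""
    a, b = sorted([(p["sip"], p["sport"]), (p["dip"], p["dport"])])
    return (a[0], a[1], b[0], b[1], p["proto"])


def group_flows(packets):
    """Group packets into flows, each in timestamp order."""
    flows = defaultdict(list)
    for p in sorted(packets, key=lambda q: q["ts"]):
        flows[flow_id(p)].append(p)
    return dict(flows)


def early_stats(flow_pkts, k):
    """Statistics over the first k rounds of one flow (time-ordered packets).

    Assumption: a new round starts whenever the initiator sends again after
    the responder has sent something.
    """
    initiator = (flow_pkts[0]["sip"], flow_pkts[0]["sport"])
    rounds, prev_from_init, kept = 1, True, []
    for p in flow_pkts:
        from_init = (p["sip"], p["sport"]) == initiator
        if from_init and not prev_from_init:
            rounds += 1
            if rounds > k:
                break
        kept.append(p)
        prev_from_init = from_init
    times = [p["ts"] for p in kept]
    size = sum(p["size"] for p in kept)
    elapsed = times[-1] - times[0]
    gaps = [b - a for a, b in zip(times, times[1:])]
    first_reply = next((p["ts"] for p in kept
                        if (p["sip"], p["sport"]) != initiator), None)
    return {
        "elapsed_time": elapsed,
        "transmitted_size": size,
        "throughput": size / elapsed if elapsed > 0 else 0.0,
        "response_time": None if first_reply is None else first_reply - times[0],
        "mean_inter_arrival": sum(gaps) / len(gaps) if gaps else 0.0,
    }
```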

120

Rule-based Machine Learning

Rule-based ML (supervised machine learning): the generated rules are suitable for the intrinsic architecture of firewalls and IDS/IPS. Rules generated by the ML algorithm also provide information for understanding the underlying characteristics of application protocols.

Algorithms: One Rule (OneR), PART, Ripple Down, DecisionTable, ConjunctiveRule, Ripper, …

ML Name            Accuracy
PART               85.58%
Ripple Down        82.94%
Ripper             81.8%
One R              69.19%
Conjunctive Rule   9.898%
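To show what a generated rule set looks like, here is a minimal One Rule (OneR) learner: for each attribute it predicts the majority class per attribute value and keeps the attribute with the fewest training errors. It assumes attributes are already discretized (Weka's OneR also bins numeric attributes, which is omitted here), and the attribute names in the test are hypothetical.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """rows: list of dicts of discrete attribute values.

    Returns (attribute, rule, errors), where rule maps value -> predicted class.
    """
    best = None
    for attr in rows[0]:
        by_value = defaultdict(Counter)
        for row, y in zip(rows, labels):
            by_value[row[attr]][y] += 1
        # One rule per value: predict that value's majority class.
        rule = {v: cnt.most_common(1)[0][0] for v, cnt in by_value.items()}
        errors = sum(sum(cnt.values()) - cnt[rule[v]] for v, cnt in by_value.items())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best
```

The resulting `value -> class` table maps directly onto firewall or IDS/IPS match rules, which is the appeal of rule-based learners noted above.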

121

Experiment Architecture

[Figure: experiment architecture. Traffic dump (payload included) → flow preprocessing (with protocol signatures) → flow sets → flow sampling → sample set → random split into 10-fold cross-validation training/test sets 1 to 10 → machine learning → results 1 to 10 → average result]
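The random split and 10-fold cross validation in this pipeline can be sketched as below. The `train_fn` callback is hypothetical: it takes training samples and labels and returns a predictor. Shuffling with a fixed seed is an assumption for reproducibility.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    if n < k:
        raise ValueError("need at least k samples")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # random split
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_validate(samples, labels, train_fn, k=10, seed=0):
    """Average test accuracy over k folds (Result 1..k -> average result)."""
    accs = []
    for train, test in k_fold_indices(len(samples), k, seed):
        model = train_fn([samples[i] for i in train], [labels[i] for i in train])
        correct = sum(model(samples[i]) == labels[i] for i in test)
        accs.append(correct / len(test))
    return sum(accs) / len(accs)
```

Every sample lands in exactly one test fold, so each flow is evaluated once by a model that never saw it during training.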

122

Accuracy Comparison with Respect to Sample Set

L. Bernaille, 2006

123

Accuracy Comparison with Respect to Sample Set (cont.)

Zhu Li, ICC 2007

124

Accuracy After Discriminator Selection

125

Conclusions

Machine-learning-based techniques for identifying network applications are becoming increasingly important.

Focus on the real-time, protocol-level requirements of application traffic classification.

There are no common traffic traces available for comparing performance against the same baseline.

Expensive training is still a problem. Identifying encrypted traffic (e.g., Skype, Winny, encrypted BitTorrent) is a new challenge. Identifying the detailed behaviors of encrypted traffic is an even bigger challenge.
