1 Network Security, Prof. Nen-Fu Huang, Department of Computer Science, National Tsing Hua University /...
Post on 21-Dec-2015
2
Agenda
Introduction of Network Security
Content Inspection Technologies
Pattern Matching Algorithms
Flow Classification by Stateful Mechanism
Machine Learning Based Application Identification Technologies
Network Security Research Topics
Conclusions
3
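-- Hackers Are Everywhere --
2000/3: Hackers used DDoS attacks to paralyze well-known websites including Yahoo, Amazon, CNN, and eBay.
2001/7: Hackers stole customer credit card data from Bibliofind, an Amazon.com subsidiary.
2002: China-US hacker war
2003/1: SQL Slammer attack
2003/4: The mainland Chinese "流光" (Liuguang) backdoor program
2003/8: Blaster worm attack
2003/9: SoBig virus attack
2003/9: Attacks by mainland Chinese cyber forces
2004/3: Netsky virus attack
2004/4: Sasser worm attack
2005/5: Hackers tampered with data at Taiwan's College Entrance Examination Center
2005/6: Backdoor programs planted by mainland Chinese cyber forces stole diplomatic secrets through the Ministry of Foreign Affairs website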
4
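Network Security Concerns
Attack techniques evolve rapidly, and attack tools are easy to obtain. Their interfaces are simple enough that launching an attack requires no advanced skill.
Attacks are no longer limited to intrusion; many aim to disable a site's ability to provide service.
Network equipment offers insufficient security. Routers and switches inspect only Layer-3 packet information.
Firewalls focus on inspecting Layer-4 packet information. Antivirus software is increasingly unable to recognize network attacks.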
8
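Basic Concepts of Network Security
Confidentiality of data, trustworthiness of information, availability of information
• Information security policy making
• Information security education and training
• Network vulnerability assessment
• 7x24 security operations center
• Security alert notification service
• Incident emergency response and computer forensics
• Firewall monitoring and management
• Intrusion detection monitoring and management
• Real-time web page tampering monitoring and recovery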
Policy
9
網路攻擊種類
Denial of Service (DoS), Distributed Denial of Service (DDoS)
Network Invasion
Network Scanning
Network Sniffing
Trojan Horse and Backdoors
Worm
10
(1) DoS/DDoS
Prevent another user from using network connection, or disable server or services: e.g. “Smurf” and “Fraggle” attacks, “Land”, “Teardrop”, “NewTear”, “Bonk”, “Boink”, SYN flooding, “Ping of death”, IGMP Nuke, buffer overflow.
Caused by protocol fault or program fault. It damages the “Availability”.
11
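Common DoS Attacks
Ping Flooding: sends a large number of ICMP echo packets to the victim host to exhaust its system resources.
Ping of Death: the attacker sends the victim an ICMP echo packet carrying 65,536 bytes, which crashes the victim host (a flaw in TCP/IP protocol implementations).
UDP flooding (Chargen): the attacker sends a large number of UDP packets to port 19 (Character Generator) on the broadcast address of the victim network, so every host on that network sends UDP replies and the network bandwidth is exhausted.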
12
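Common DoS Attacks
Smurf Attack: an attack by proxy ("killing with a borrowed knife"). The attacker sends ICMP echo packets to the broadcast address of some network, with the source address set to the intended victim. Every machine on that network then sends an ICMP reply to the victim; the network's bandwidth is congested and the victim's system resources are exhausted.
SYN flooding: the attacker sends thousands of SYN packets per second (the packets used to open a TCP connection) to the victim, with forged or nonexistent source addresses. The victim replies with SYN-ACKs to addresses that never answer, so it can no longer accept other TCP connections and legitimate users cannot log in.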
13
Smurf attack (DoS)
Dangerous attacks
Network-based, fills access pipes
Uses ICMP echo/reply (smurf) or UDP echo (fraggle) packets with broadcast networks to multiply traffic
Requires the ability to send spoofed packets
Abuses “bounce-sites” to attack victims
Traffic multiplied by a factor of 50 to 200
Low-bandwidth source can kill high-bandwidth connections
Similar to ping flooding and UDP flooding, but more dangerous due to traffic multiplication
14
“Smurf” Attack (cont’d)
Internet
Perpetrator
Victim
ICMP echo (spoofed source address of victim)Sent to IP broadcast address
ICMP echo reply
15
SYN flooding Attack (DoS)
Goal is to deny access to a TCP service running on a host.
Creates a number of half-open TCP connections which fill up a host’s listen queue; host stops accepting connections.
Requires the TCP service be open to connections from the victim.
17
DDoS Attack
[Figure: the attacker sends control messages to several handlers, each handler controls several agents, and the agents send spoofed packets to the victim. Control messages may be encrypted or hidden in normal packets.]
18
DDoS Attack
The attacker remotely controls many puppet (zombie) machines so that they attack the victim host heavily at the same time.
The attacks on Yahoo.com, Amazon.com, CNN.com, buy.com, and ebay.com used DDoS.
[Figure: a hacker communicates with compromised hosts and servers in Organization A and Organization B, which launch the DDoS attack across the Internet against the organization under attack.]
19
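DDoS Attack Examples
Examples of DDoS attack programs: Trin00 (destructive), Tribe Flood Network (TFN) (destructive), TFN2K, Stacheldraht
Trin00: can be launched from one machine or a group of machines. Once the attack starts, every computer with a hidden Trin00 daemon sends UDP packets (carrying four bytes of data) to the victim host while continually changing the destination port. The victim is kept busy returning ICMP port unreachable messages and cannot properly serve legitimate packets and connections.
TFN: started in the same way as Trin00, but with more varied attacks. It can send SYN flood, UDP flood, ICMP flood, or Smurf attacks. The latest version of TFN can change the source address of its attack packets by itself, which makes this kind of attack even harder for security mechanisms to inspect and filter.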
20
(2) Network Invasion
Goal is to get into the target system and obtain information
  Account usernames, passwords
  Source code, business critical information
Usually caused by improper configuration or privilege settings, or by program faults.
Network invasion takes many forms; knowledge of attack patterns helps detection, but it is quite hard to detect all attacks.
21
Example of network invasion: IIS unicode buffer overflow
For IIS 5.0 on windows 2000 without this security patch, a simple URL string: http://address.of.iis5.system/scripts/..%c1%1c../winnt/system32/cmd.exe?/c+dir+c:\ will show the information of root directory.
22
(3) Network Scanning
Goal is generally to learn the topology of the victim’s network and find openings for attack.
  The names and addresses of hosts and network devices.
  The open services.
Usually uses techniques such as ICMP scanning, Xmas scan, SYN-FIN scan, and SNMP scan.
There is an automatic and powerful tool: Nmap.
23
(4) Sniffing
Goal is generally to obtain the content of communication
  Account usernames, passwords, mail accounts
  Network topology
Usually a program that places an Ethernet adapter into promiscuous mode and saves information for later retrieval.
Hosts running the sniffer program (e.g. NetBus) are often compromised using host attack methods.
24
(5) Backdoor and Trojan horse
Usually, a backdoor or Trojan horse is the consequence of an invasion or of a hostile program.
It may open a private communication channel and wait for remote commands.
Available toolkits: Subseven, BirdSpy, Dragger
It can be detected by monitoring known control channel activities, but not with 100% precision.
25
(6) Worm
The chief intention of a worm is to propagate and survive.
It takes advantage of system vulnerabilities to infect a host and then tries to infect any other possible targets.
It may reduce system productivity, leave back doors, steal confidential information, and so on.
26
P2P/IM Network Security Threats
P2P (Peer-to-Peer) file-sharing programs
IM (Instant Messenger) instant messaging
Spyware
Adware
Tunneling (private tunnels)
27
P2P: A new paradigm
Bottleneck of Server
Powerful PC
Flexible, efficient information sharing
P2P changes the way of Web (Internet)
28
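P2P Is About to Break the Existing Security Architecture
Beyond file sharing and instant messaging, P2P is growing into other applications such as SoftEther and Skype. For individual users the benefits outweigh the drawbacks, but for enterprises it is a major information security concern.
P2P applications carry many hidden risks, including:
  Leaking confidential internal company information
  Serving as a channel for spreading viruses and worms
  Downloading illegal files and infringing copyright
  Consuming large amounts of network bandwidth and disrupting other systems
  Distracting employees and lowering productivity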
29
Famous P2P Examples
BitTorrent eZpeer Kuro eDonkey eMule MLdonkey Gnutella Kazaa/Morpheus
Shareaza Direct-connect Gnutella Soulseek Opennap Worklink Opennext Jelawat PP 點點通
SoftEther iMESH MIB WinMix WinMule Skype
31
Evolution of Network Security Technologies
Firewall (Layer-4)
VPN
SSL VPN
PKI
IDS/IPS
Defense-in-Depth
Application Firewall (Layer-7)
UTM (Unified Threat Management)
NAC (Network Access Control)
32
Intrusion Detection System (IDS)
Intrusion Detection and Prevention System (IPS/IDP)
33
Intrusion Detection System
Intrusion Detection System: a computer system that attempts to detect any set of actions that try to compromise the integrity, confidentiality, or availability of a resource.
An IDS has much more knowledge and many more delicate detection functions than a common firewall. (Remember that the main function of a firewall is access control.)
34
IDS Types
Host based vs. Network based
Misuse detection vs. Anomaly detection
Active vs. Passive
Centralized vs. Distributed
35
Host based & Network based IDS
Host based IDS: installed on target host as a monitor service. It checks system activity, user privilege, user behavior.
Network based IDS: installed on a network node, usually in promiscuous mode to listen to all passing traffic. It checks network traffic and node interactions.
36
Misuse detection & Anomaly detection IDS
Misuse detection (signature-based): based on the assumption that intrusion attempts can be characterized by comparing user activities against a database of known attacks.
Anomaly detection (statistical-based): identifies abusive behavior by noting and analyzing audit data that deviates from a predicted norm.
37
Active IDS vs. Passive IDS
Active IDS: an active participant in the system. It not only observes events but also takes part in the necessary operations. Also called IPS or IDP (Intrusion Detection and Prevention System).
Passive IDS: work on a monitor or bystander basis.
38
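Active IDS vs. Passive IDS
[Figure: (a) Passive IDS: sits beside the link between the ISP and the LAN and uses port mirroring to collect packets for analysis; intrusion attacks can still pass through. (b) Active IDS: sits inline between the ISP and the LAN and intercepts packets directly for analysis; intrusion attacks are blocked.]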
39
Centralized IDS vs. Distributed IDS
Centralized: The sensors are managed by a single analyzer or manager.
Distributed: The sensors are managed by multiple automated analyzers or managers, which can communicate with each other.
40
Comparison between Firewall and Network based active IDS
Same:
  Cannot protect against insider-to-insider attacks.
  Cannot protect against connections that don’t go through it.
  Can do ACL and filtering. (For Active IDS)
Different:
  IDS has the ability to detect new threats.
  IDS focuses on intrusion while a firewall focuses on access control and privacy.
  Firewalls use the address as the passport, while an IDS does many more checks.
41
The Challenge of IDS
Speed limitation: NIDS cannot keep pace with the network speed. (NIDS need to check more fields of a packet than a firewall does.)
The inability to see all the traffic: The “switched Ethernet” is getting largely deployed.
Fail-open/fail-closed architecture: a NIDS often fails without notifying the central console of the problem, leaving the network “open”. A “fail-closed” methodology means the network is out of service until the NIDS is brought back on-line.
44
A Generic Layer-7 Engine
Packet Normalizer
  Makes sure of the integrity of incoming packets
  Eliminates ambiguity
  Decodes URI strings if necessary
Pattern-Matching Engine
Policy Engine
  Gathers information from the pattern-matching engine and issues the verdict to allow/drop the packets
[Figure: the packet stream from the network enters the Packet Normalizer; normalized traffic goes to the Pattern-Matching Engine (loaded with signatures); matched events go to the Policy Engine (loaded with policies), which issues verdicts and produces logs and reports; filtered traffic returns to the network as a packet stream.]
45
Packet Normalizer
Integrity Checking
IP Fragment Reassembly
TCP Segment Reassembly
  TCP segments may arrive out of order
  SEQ outside the window size
  Segment overlapping
URI Decode
  URI hex code obfuscation (‘a’ = %61)
  URI unicode/UTF-8 obfuscation
  Self-referential directory obfuscation (/././././ = /)
  Directory obfuscation (/abc/a/../a/../a/ = /abc/a)
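The URI decode step can be sketched with the Python standard library. This is a minimal illustration, not the normalizer of any particular product: the function name `normalize_uri_path` is hypothetical, and it performs only one decoding pass, whereas a real normalizer would also need to handle double encoding and non-standard Unicode encodings.

```python
import posixpath
import re
from urllib.parse import unquote


def normalize_uri_path(raw_path: str) -> str:
    """Return a canonical form of a URI path (hypothetical helper)."""
    # Undo hex-code obfuscation, e.g. '%61' -> 'a' and '%2e' -> '.'.
    decoded = unquote(raw_path)
    # Collapse repeated slashes so '//' cannot hide a directory boundary.
    collapsed = re.sub(r"/+", "/", decoded)
    # Resolve '.' and '..' components (self-referential directories).
    return posixpath.normpath(collapsed)
```

After normalization, a single signature written against the canonical path covers all of the obfuscated variants listed above.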
46
Pattern-Matching Engine
The most computation-intensive task in packet processing. Normally the PM engine needs to process every single byte in packet payload.
In Snort, the PM routine accounts for 31% of the total execution time
47
Pattern Matching is Expensive!
•~30 Instructions/ Byte. 45K Instructions/1500 Byte packet
•~50 Instructions/ 1500 Byte packet
Source: Intel Corp.
48
Content Inspection Technologies
Pattern-Matching Algorithms
Software Based
  Boyer-Moore
  Aho-Corasick (AC)
  Wu-Manber
Hardware Based
  Bloom Filter
  Reconfigurable Hardware (FSM)
  TCAM-based
49
Pattern Matching Problem Definition
Given an input text $T = t_0, t_1, \ldots, t_n$ and a finite set of strings $P = \{P_1, P_2, \ldots, P_r\}$, the string matching problem involves locating and identifying the substring of $T$ which is identical to $P_j = a^j_0 a^j_1 \cdots a^j_{m-1}$, $1 \le j \le r$, where
$t_{s+i} = a^j_i$, $0 \le i \le m-1$. This equation can also be written as
$t_s \ldots t_{s+m-1} = a^j_0 a^j_1 \cdots a^j_{m-1}$
Text: G C A T C G C A G A G A G T A T A C A G T A A G
Pattern: G C A G A G A G
Aho-Corasick (AC) Algorithm
AC is a classic solution to exact set matching. It works in time O(n + m + z), where z is the number of pattern occurrences in T.
AC is based on a refinement of a keyword tree.
AC is a deterministic algorithm. That is, the time spent per input byte is independent of the number of patterns.
52
An example of AC Algorithm
[Figure: AC keyword tree with output sets {he, she}, {his}, and {hers}. Dashed edges are fail transitions; those not shown lead to the root.]
Patterns:
hers
his
she
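The keyword tree, fail transitions, and output sets can be sketched in a few lines of Python. This is an illustrative implementation of the textbook algorithm, not code from the slides; the names `build_automaton` and `search` are hypothetical, and the usage note assumes the classic pattern set {he, she, his, hers}, with "he" inferred from the {he, she} output label in the figure.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, fail, and output tables for a set of patterns."""
    goto, fail, out = [{}], [0], [set()]
    for pattern in patterns:                 # phase 1: keyword tree
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pattern)
    queue = deque(goto[0].values())          # phase 2: fail links, breadth-first
    while queue:
        state = queue.popleft()
        for ch, child in goto[state].items():
            queue.append(child)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] |= out[fail[child]]   # inherit matches ending here
    return goto, fail, out


def search(text, automaton):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        hits.extend((i - len(p) + 1, p) for p in out[state])
    return hits
```

For example, `search("ushers", build_automaton(["he", "she", "his", "hers"]))` reports "she" at offset 1 and both "he" and "hers" at offset 2, in a single pass over the text.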
54
Reconfigurable Hardware (FSM)
Implement the AC FSM in the configurable Logic Elements (LEs) of an FPGA.
Achieves multi-gigabit performance (depending on the FPGA model).
A powerful FPGA is necessary to accommodate thousands of patterns, so the approach is not practical or viable in the commercial market.
56
Bloom Filter
Given a string X, the Bloom filter computes k hash functions on it, producing k hash values ranging from 1 to m, and sets the corresponding k bits in an m-bit vector. The same procedure is repeated for all the members of the pattern set.
The input text is checked by generating k hash values in the same way. If at least one of these k bits is found not set, the string is declared impossible to match.
Patterns of length n are grouped into Bloom filter Bn.
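A software sketch of one per-length Bloom filter follows. It is only an illustration under stated assumptions: the class `BloomFilter` and helper `scan` are hypothetical names, the k hash functions are derived from SHA-256 with a one-byte salt (a hardware engine would use much cheaper hashes), and bit positions run from 0 to m-1 rather than 1 to m.

```python
import hashlib


class BloomFilter:
    def __init__(self, m_bits: int, k_hashes: int):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits)        # one byte per bit, for clarity

    def _positions(self, item: bytes):
        # Derive k hash values by salting one strong hash (an assumption).
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: bytes) -> bool:
        # Any unset bit proves the item was never added.
        return all(self.bits[pos] for pos in self._positions(item))


def scan(payload: bytes, bloom: BloomFilter, n: int):
    """Offsets whose n-byte window may match a pattern of length n."""
    return [i for i in range(len(payload) - n + 1)
            if bloom.might_contain(payload[i:i + n])]
```

Because a hit can be a false positive, each reported offset would still need to be confirmed by an exact comparison against the pattern set.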
57
Bloom Filter (Cont.)
[Figure: the payload stream (A B C D E F G H I J ...) is scanned by parallel Bloom filters B2, B3, B4, ..., Bw, one per pattern length; each filter applies the k hash functions H1, H2, ..., Hk to set or test bits in its m-bit vector.]
Group signatures by length: G2(X), G3(X), G4(X), ...
False positive rate: $\min f = (0.5)^k$, when $m = (k \times n) / \ln 2$
So the total space is $\sum_i B_i = m \times (w - 1)$
If k = 1: n = 2048 gives m = 3072 bits; n = 3072 gives m = 4608 bits
If k = 4, f = 0.0625; k = 5, f = 0.0313; k = 6, f = 0.0156
58
TCAM fundamental
TCAM stores data with three logic values: ‘0’, ‘1’, ‘X’ (don’t care)
Multiple match modes are needed.
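Ternary matching can be emulated in software with a value/mask pair per entry. The sketch below is an assumption-laden illustration: `parse_ternary` and `tcam_lookup` are hypothetical names, and the lookup returns only the first (highest-priority) matching entry, which is just one of the match modes a real TCAM may offer.

```python
def parse_ternary(pattern: str):
    """Convert a string such as '10XX' into (value, mask) integers."""
    value = mask = 0
    for ch in pattern:
        value <<= 1
        mask <<= 1
        if ch == "1":
            value |= 1
            mask |= 1
        elif ch == "0":
            mask |= 1                # care bit that must be 0
        elif ch not in "Xx":
            raise ValueError(f"invalid ternary symbol: {ch!r}")
    return value, mask


def tcam_lookup(entries, key: int):
    """Index of the first entry whose care bits equal the key, else None."""
    for index, (value, mask) in enumerate(entries):
        if (key & mask) == value:    # don't-care bits are masked out
            return index
    return None
```

A hardware TCAM compares the key against all entries in parallel in one cycle; the loop here only models the priority order of the result.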
59
Policy Engine
Collect the matching events from the Pattern-Matching Engine.
Clarify the relationship between matched patterns:
  Ordered: a policy may consist of more than one pattern, which should be matched in order.
  Offset, Depth: the matched position should be within a certain range or location.
  Distance, Within: the distance between two matched patterns should also be taken into consideration.
Trace Application States
  Some applications are difficult to identify by using only one signature (e.g. P2P).
  The Policy Engine needs to track the connection state as in the following diagram:
[Figure: state diagram S0, S1, S2, S3 with transitions Msg Exchange, Request File, Data Exchange.]
60
Fast Pattern Matching Algorithms
A Pattern Matching Coprocessor for Deep and Large Signature Set in Network Security System (IEEE GLOBECOM 2005)
Hierarchical Matching Algorithm (HMA) for Intrusion Detection Systems (IEEE GLOBECOM 2005)
A Time and Memory Efficient String Matching Algorithm for Intrusion Detection Systems (IEEE GLOBECOM 2006)
A non-Computation Intensive Pre-filter for String Pattern Matching in Network Intrusion Detection Systems (IEEE GLOBECOM 2006)
Smart Architecture for High-speed Intrusion Detection and Prevention Systems, International Conference on Cryptology and Network Security (CANS 2006, acceptance rate < 18%)
A Deterministic Cost-effective String Matching Algorithm for Network Intrusion Detection Systems (IEEE ICC 2007)
A Novel Algorithm and Architecture for High Speed Pattern Matching in Resource-limited Silicon Solution (IEEE ICC 2007)
Flow Digest: A State Synchronization Scheme for Stateful High Availability (IEEE ICC 2007)
Performing Packet Content Inspection by Longest Prefix Matching Technology (IEEE GLOBECOM 2007)
61
Security SoC
BroadWeb Security SoC
  ARM922 RISC CPU (250 MHz)
  Hardware NAT (400 Mbps)
  Hardware Content Inspection Engine (40 Mbps)
  Two 10/100/1000 RJ-45 ports
  Embedded Linux
  NSS and ICSA approved IPS signature database
  IPS/Anti-virus functions
  IM/P2P management
  Turn-key solution (ASIC + software module)
Tier-1 customers
62
Security SoC (Cont.)
BroadWeb Security SoC (2nd Generation)
  ARM926EJ RISC CPU (300 MHz)
  Intelligent Hardware NAT (1 Gbps)
  Hardware Content Inspection Engine (100 Mbps)
  Embedded GbE Smart Switch and 4-port GPHY core
  NSS and ICSA approved IPS technology
  IPS/Anti-virus functions
  IM/P2P management
  Turn-key solution (ASIC + software module)
Tier-1 customers
63
Cisco/Linksys Wireless Security Router
• IEEE 802.11n 108 Mbps EWC Wireless LAN
• IPS protection and IM/P2P management
• Firewall/VPN/Routing
• Gigabit Ethernet x 5
65
State Machine Based Technologies
The FA Example: FTP
[Figure: FTP state diagram with states 0 to 7 and transitions Login, PASS_Per, PASV_Req, PASV_Ok, ACTIVE_Req, ACTIVE_Ok, LIST_Req, LIST_Ok, File_Req, File_Ok, and Flow_Close.]
Transitions   Trans. ports    Patterns
Login         TCP, dport:21   “PASS”
PASS_Per      TCP, sport:21   “230”, “User”, “logged in”
ACTIVE_Req    TCP, dport:21   “PORT”
ACTIVE_Ok     TCP, sport:21   “200 PORT command successful”
PASV_Req      TCP, dport:21   “PASV”
PASV_Ok       TCP, sport:21   “227 Entering Passive Mode”
LIST_Req      TCP, dport:21   “LIST”
LIST_Ok       TCP, sport:21   “226 Transfer complete.”
File_Req      TCP, dport:21   “RETR”
File_Ok       TCP, sport:21   “226 Transfer complete”
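A stateful classifier of this kind can be sketched as a transition table keyed on the current state, the port field to check, and a payload pattern. The state names ("init", "auth", "ready", and so on) and the function `step` are hypothetical, chosen for readability; the slide's own state numbering is not recoverable from the transcript, and Flow_Close is omitted because no pattern is given for it.

```python
FTP_PORT = 21

# (current state, port field to check, pattern, transition name, next state)
RULES = [
    ("init",     "dport", b"PASS", "Login",      "auth"),
    ("auth",     "sport", b"230",  "PASS_Per",   "ready"),
    ("ready",    "dport", b"PASV", "PASV_Req",   "pasv"),
    ("pasv",     "sport", b"227 Entering Passive Mode", "PASV_Ok", "ready"),
    ("ready",    "dport", b"PORT", "ACTIVE_Req", "active"),
    ("active",   "sport", b"200 PORT command successful", "ACTIVE_Ok", "ready"),
    ("ready",    "dport", b"LIST", "LIST_Req",   "listing"),
    ("listing",  "sport", b"226 Transfer complete", "LIST_Ok", "ready"),
    ("ready",    "dport", b"RETR", "File_Req",   "transfer"),
    ("transfer", "sport", b"226 Transfer complete", "File_Ok", "ready"),
]


def step(state, sport, dport, payload):
    """Advance the per-flow state; return (new_state, transition or None)."""
    for current, field, pattern, name, nxt in RULES:
        port = sport if field == "sport" else dport
        if current == state and port == FTP_PORT and pattern in payload:
            return nxt, name
    return state, None               # no rule fired; stay in the same state
```

Keeping one small state value per flow lets the engine accept a pattern such as "226 Transfer complete" only when it follows the matching request, which reduces false matches compared with stateless signatures.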
66
The FAs of BitTorrent protocols.
Connect Phase
Transitions   Patterns
Announce      “GET /announce”
Get_Peers     “HTTP/1.0 200 OK”, “e5:peers”
Download Phase
Transitions       Patterns
Connect_to_Peer   0x13, “BitTorrent protocol”
67
The FAs of Yahoo Messenger protocol.
Login Phase
Transitions   Trans. ports      Patterns
Auth_Resp     TCP, sport:5050   “YMSG”, 0x54
Chat Phase (trans. ports: TCP, sport:5050)
Transitions          Patterns
Msg_Service          “YMSG”, 0x06
P2P_File_Tx          “YMSG”, 0x4d
File_Tx_Status_BRB   “YMSG”, 0x4d, 0x1
[Figure: state diagram with states 0 to 3 and transitions Auth_Resp, Msg_Service, P2P_File_Tx, File_Tx_Status_BRB, and Flow_Close.]
68
We can identify and manage over 60 applications
IM: MSN, Yahoo Messenger, AIM, QQ, Google Talk, TM, ICQ, iChat, MIRC, Odigo, Rediff, Gadu-Gadu
Web-IM: Meebo.com, eBuddy.com, iLoveIM.com, MSN, AIM, Yahoo, ICQ
P2P: eDonkey, BitTorrent, Gnutella, Foxy, FastTrack, Vagaa, Winny, BitComet, DirectConnect, PiGo, PP365, WinMX, POCO, iMesh, ClubBox
Streaming-Media: QQLive, Podcast Bar, PPLive, RealPlayer, Windows Media Player, iTunes, WinAMP, Player 365, QuickTime, FlashMedia Video, TVAnts
Webmail: Yahoo, Hotmail, Gmail
VoIP: Skype (3.6)
File Transfer: FTP, Web File Transfer, Thunder, GetRight, FlashGet
VPN: VNN, SoftEther, Hamachi, TinyVPN, PacketiX, HTTP-Tunnel, Tor, Ping-Tunnel
Terminal Control: VNC, PCAnywhere
Online Game: QQGame, OurGame, Cga.com.cn, QQFO
70
Application Traffic identification
Traffic identification (or traffic classification) has received attention in recent years because:
  The introduction of P2P applications greatly impacts network management.
  Port number is no longer a good or efficient discriminator for identifying this prevalent traffic.
How about string matching? Accurate! But…
  It cannot identify encrypted traffic.
  Manually maintaining protocol signatures is costly.
  Matching strings on very high speed networks is costly.
  The privacy issue is still under debate.
71
How to resolve the problem?
Heuristic methods (2004~2005)
  Rules can be constructed from intrinsically different behavior.
  E.g. if a host's number of distinct destination IPs equals its number of distinct destination ports, the host is running P2P.
  Used to differentiate P2P from non-P2P traffic.
Machine learning based techniques (2004~)
  Construct “statistical signatures” for different categories/application protocols.
  Most machine learning techniques are employed directly to construct traffic signatures.
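The destination IP/port heuristic can be illustrated with a few lines of Python. This is only a sketch of the rule as stated on the slide: the function name `p2p_suspects` and the `min_peers` threshold (added so that a host with a single connection is not flagged) are assumptions, not part of the source.

```python
from collections import defaultdict


def p2p_suspects(flows, min_peers=3):
    """flows: iterable of (src_host, dst_ip, dst_port) tuples."""
    dst_ips = defaultdict(set)
    dst_ports = defaultdict(set)
    for src, dst_ip, dst_port in flows:
        dst_ips[src].add(dst_ip)
        dst_ports[src].add(dst_port)
    # A client of ordinary services reaches many IPs on a few well-known
    # ports; a P2P peer tends to use a different port for every peer.
    return {host for host in dst_ips
            if len(dst_ips[host]) >= min_peers
            and len(dst_ips[host]) == len(dst_ports[host])}
```

A host that contacts three servers on port 80 is not flagged, while a host that contacts three peers on three different ports is.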
72
The Milestone of Researches on Application Traffic Identification
Before 2003: String matching and port number.
2003~2005:
  Heuristics
  Machine learning methods.
2006~: Machine learning methods for real-time traffic classification.
  Sizes and directions of the first k data packets of a TCP connection.
  Stage-based classification (statistical data in each stage)
73
Different Objects of Application Traffic Identification
At different levels
  Category level or QoS class (bulk data transfer such as FTP and P2P, interactive, mail, web, streaming)
  Protocol level (Kazaa, eMule/eDonkey, BitTorrent, MSN, FTP, POP3, SMTP, HTTP, Skype, Winny, Share, …)
  Behavior level (FTP control, FTP data, MSN file transfer, MSN message chatting, MSN VoIP, Skype chatting, Skype VoIP, Skype file transfer, Skype video conference, …)
All existing research focuses on classification at the protocol or category level.
Application field
  Offline: traffic trend analysis.
  Online: traffic shaping, traffic engineering, security management.
74
The Classes of Applied Machine Learning Algorithms
Supervised machine learning
  The model of traffic characteristics is constructed from training instances with previously defined class labels.
Unsupervised machine learning (clustering)
  The model of traffic characteristics is constructed from training instances without previously defined class labels.
However, the training sets used by both kinds of existing work include pre-classified labels, because each cluster may contain several different classes/protocols.
75
The Discriminators (Attributes)
The key issues for machine-learning based traffic identification are:
  What are the most distinguishing characteristics (attributes/discriminators)?
  How to reduce the high cost of training?
Different discriminators:
  From L3/L4: packet inter-arrival time, total packet size, number of packets, etc.
  Combinations of L3/L4 attributes from different perspectives, e.g. upload/download size ratio.
76
The Milestone of Researches (Applying Machine Learning techniques)
2003~2004:
  [Matthew Roughan, IMC’04] Class-of-Service Mapping for QoS.
2005:
  [Sebastian Zander] Automated Traffic Classification.
  [Andrew W. Moore] Using Bayesian Analysis Techniques.
2006:
  [Sebastian Zander] Internet Archeology: Estimating Individual Application Trends in Incomplete Historic Traffic Traces.
  [Laurent Bernaille] Traffic classification on the fly. (First 5 packets of TCP with k-means clustering.)
  [Jeffrey Erman] Internet Traffic Identification using Machine Learning (k-means, EM clustering).
77
The Milestone of Researches (Applying Machine Learning techniques)
2006 (cont.):
  [Laurent Bernaille] Early Application Identification. (First 4 packets of TCP with k-means, GMM, and HMM clustering.)
2007: Real-time methods
  [Zhu Li] Accurate Classification of the Internet Traffic Based on the SVM Method. (TCP and UDP flow classification)
  [Laurent Bernaille] Early Recognition of Encrypted Application. (First 3 packets of TCP with GMM clustering.)
  [Jeffrey Erman] Semi-Supervised Network Traffic Classification. (Stage-based classification)
78
Class-of-Service Mapping for QoS: A Statistical Signature-based Approach to IP Traffic
ACM SIGCOMM Internet Measurement Conference (IMC '04)
Matthew Roughan1, Subhabrata Sen2, Oliver Spatscheck2, Nick Duffield2
1 School of Mathematical Sciences, University of Adelaide, Australia
2 AT&T Labs – Research, Florham Park, NJ, USA
79
Introduction
Before this paper:
  Traditional research tried to find models for traditional protocols (FTP, web, mail).
  Most research on traffic characteristics modeling for P2P and IM consisted of case studies.
Features:
  This paper studied the requirements and proposed a QoS framework, at the QoS class level, for traffic that mixes traditional and novel P2P/IM applications.
  Classification is based on the statistics of particular applications, which are used to form “signatures”.
80
Ideas
The statistical attributes are aggregated with respect to server ports and server IP addresses, separately.
Machine learning techniques are employed to construct the mapping from server port aggregation/server IP aggregation to different QoS classes:
  Nearest Neighbor (NN)
  Linear Discriminant Analysis (LDA)
Then, the port number of an aggregation that belongs to a particular QoS class can form one rule.
Disadvantage: applications that require different QoS might use the same server port number (e.g. P2P).
81
Nearest Neighbor
To classify a data point x, find its nearest neighbor!
  Points with the same property should lie close together.
  The class of the nearest neighbor is assigned to the data point x.
K-Nearest Neighbor:
  Find the k nearest neighbors and let them “vote”.
More information: http://neural.cs.nthu.edu.tw/jang/books/dcpr/4.2-knnr.asp?title=4-2 K-nearest-neighbor Rule
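The voting rule can be written in a few lines of Python. This is a minimal sketch with Euclidean distance; the function name `knn_classify` and the two-feature sample points in the usage note are hypothetical and are not data from the paper.

```python
import math
from collections import Counter


def knn_classify(train, x, k=3):
    """train: list of (feature_tuple, label); x: feature tuple to classify."""
    # Sort training points by distance to x and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    # Each neighbor votes for its own label; the majority label wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With k = 1 this reduces to the plain nearest-neighbor rule. For instance, given three "web" points near (1, 1) and two "p2p" points near (8, 8), the query (1.5, 1.5) is labeled "web".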
82
Linear Discriminant Analysis
Find a good “projection” of the original points.
Linear discriminant analysis finds a linear transformation ("discriminant function") of the two predictors, X and Y, that yields a new set of transformed values that provides a more accurate discrimination than either predictor alone:
  Transformed Target = C1*X + C2*Y
More information:
  http://www.dtreg.com/lda.htm
  http://neural.cs.nthu.edu.tw/jang/books/dcpr/index.asp
[Figure: projection examples with 3 features and 2 features.]
83
Evaluation Example
Attributes for this evaluation: the average packet size, flow duration, bytes per flow, packets per flow, and Root Mean Square (RMS) packet size.
84
Internet Traffic Classification Using Bayesian Analysis Techniques
ACM SIGMETRICS'05
Andrew W. Moore1, Denis Zuev2
1 University of Cambridge
2 University of Oxford
85
Introduction
Features:
  Only TCP flows are considered.
  Category-level classification.
  Supervised machine learning: Naïve Bayes algorithm.
  Uniquely uses data that has been hand-classified (based upon flow content) into one of a number of categories.
  Feature selection was applied to improve the accuracy.
86
Ideas
Discriminators: about 248 discriminators for each flow.
  E.g. packet inter-arrival time (mean, variance, ...), payload size (mean, variance, ...), Fourier transform of the packet inter-arrival time, TTL value, flow duration, TCP port, etc.
Naïve Bayesian classifier
  For a flow with known statistical attributes, which class is the most likely?
  Find the class that maximizes the probability Pr(Ci | X):
    Ci is the i-th class
    X is the attributes of the flow to be classified.
  Only about 65% flow-level accuracy was achieved.
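The maximization of Pr(Ci | X) can be sketched as a Gaussian Naïve Bayes classifier. This is a generic textbook version, not the authors' code: the function names are hypothetical, each attribute is modeled with an independent per-class Gaussian, and a small constant is added to the variance to avoid division by zero.

```python
import math
from collections import defaultdict


def train_gaussian_nb(samples):
    """samples: list of (feature_tuple, label). Returns the fitted model."""
    by_class = defaultdict(list)
    for features, label in samples:
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for column in zip(*rows):            # one (mean, variance) per attribute
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (len(rows) / len(samples), stats)   # (prior, stats)
    return model


def classify(model, x):
    """Return the class Ci that maximizes log Pr(Ci) + sum_j log Pr(x_j | Ci)."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for value, (mean, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (value - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)
```

Working with log probabilities avoids numeric underflow when hundreds of discriminators are multiplied together.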
87
Ideas(cont.)
Improvement:
  Naïve Bayes with kernel estimation: kernel estimation was used instead of the Gaussian distribution model assumed by Naïve Bayes.
  Discriminator selection and dimension reduction.
  The accuracy was improved up to 95%.
Disadvantages:
  All the discriminators are available only after the flow is closed.
  Only TCP flows are considered for classification.
  Network management might need finer classes (protocol level or behavior level).
88
Evaluation for Train and Test sets from traffic of different time
FCBF: Fast Correlation-Based Filter
89
Traffic Classification on the Fly
ACM SIGCOMM Computer Communication Review Journal, Volume 36, Issue 2, April 2006
Laurent Bernaille†, Renata Teixeira†, Ismael Akodkenou†, Augustin Soule‡, Kave Salamatian†
† LIP6, Université Pierre et Marie Curie, ‡ Thomson Paris Lab
Paris, FRANCE
90
Introduction
Features:
  The first paper focused on real-time flow-level application classification.
  Approximately models the L7 protocol handshake.
  Protocol-level classification.
  Unsupervised machine learning.
    K-means clustering (50 clusters works best).
    Protocol assignment: for each cluster, the protocol with the largest proportion dominates the cluster.
Discriminators: the sizes (payload) and directions of the first q data packets of each TCP connection.
  q = 5 works best. Example: (+300, -200, +100, +200, -400)
91
K-means Clustering
For a given number of clusters k, iteratively find the k cluster centers and “partition” all the points among these k clusters until the nearest center of every point no longer changes.
Each data point is expressed as a vector, and Euclidean distance is the most common distance computation function.
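The iteration can be sketched as follows. This is a plain illustration of the standard k-means (Lloyd) procedure with Euclidean distance; the function name `kmeans` is hypothetical, and the initial centers are passed in explicitly so that the result is deterministic, whereas practical implementations usually pick them at random.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Return (centers, clusters) after iterating until centers stop moving."""
    centers = [tuple(c) for c in centers]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:           # converged
            break
        centers = new_centers
    return centers, clusters
```

In the traffic setting, each point could be the vector of signed sizes of a flow's first few data packets, with the sign encoding the direction as in the (+300, -200, ...) example.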
92
Evaluation Result
Above 80% average accuracy can be achieved.
Disadvantages:
Only TCP connections are considered.
Protocol assignment results in classification starvation.
Protocols that do not dominate any cluster are always classified as some other protocol.
93
Early Application Identification, ACM CoNEXT '06, December 2006
(International Conference On Emerging Networking Experiments And Technologies)
Laurent Bernaille, R. Teixeira and K. Salamatian,
Université Pierre et Marie Curie LIP6, CNRS
Paris, France
94
Introduction
Features:
  Three unsupervised machine learning (clustering) algorithms were used to evaluate cluster assignment accuracy and protocol labeling accuracy:
    K-means
    Gaussian Mixture Models (GMM) on a Euclidean space
    Spectral clustering on Hidden Markov Models (HMM, in order to consider the order of packets)
  Discriminators: size and direction of the first P data packets.
  To deal with the starvation problem in each group, a labeling heuristic based on standard server port numbers (e.g. 25 for SMTP, 110 for POP3) is used to classify protocols within each cluster.
  Only TCP flows are considered.
  Wireless traffic traces were included in the evaluation.
95
Discriminators
Discussion about the discriminators:
  The size and direction of each packet add more information for distinguishing applications than arrival-time related metrics.
  The range of packet sizes for each application is similar across traces.
  These models can be used to classify the same set of applications on another network.
P = 4 packets for the three clustering methods.
Number of clusters:
  Kh = 30 for HMM,
  Kk = 40 for K-Means, and
  Kg = 45 for GMM.
100
Features
Pros: Easy, fast, and simple!
Payload size and packet direction of first P data packets.
Unsupervised training automatic learning mechanism.
Cons:
  In [Jeffrey Erman, HP TR]: “…is unsuccessful classifying application types with variable-length packets in their protocol handshakes such as Gnutella. Neither of these studies assess the byte accuracy of their approaches which makes direct comparisons to our work difficult.”
101
Features
Cons:
  Only TCP is included for classification.
  According to the description of the traces, a non-negligible fraction of flows contain fewer than 4 data packets!
  Also, the control flow might prevent the identification system from classifying detailed protocol behavior.
  Classification starvation still exists for protocols that don’t use standard ports.
102
Early Recognition of Encrypted Applications
Passive and Active Measurement Conference (PAM 2007), April 5-6, 2007
Laurent Bernaille, Renata Teixeira
Université Pierre et Marie Curie - LIP6-CNRS
Paris, France
103
Introduction
Features:
  Classification of SSL-encrypted protocols.
  Two stages: SSL detection and protocol identification.
  First 3 packets and 35 clusters for the Gaussian Mixture Model.
Size of the original packet:
  The most accurate method is to look up the encryption method in the handshake packets and transform the size of application packets accordingly.
  For the five most common ciphers this method is overkill, because the increase varies only from 21 to 33 bytes.
  Simple heuristic: subtract 21 from the size of the encrypted packet regardless of the cipher.
Extending the Cluster+Port labeling heuristic
  SSL-specific ports: 443 for HTTPS, 993 for IMAPS, and 995 for POP3S.
105
Accurate Classification of the Internet Traffic Based on the SVM Method
IEEE ICC 2007
Zhu Li1, Ruixi Yuan1, and Xiaohong Guan1, 2
1 Center for Intelligent and Networked Systems (CFINS), Tsinghua University, Beijing 100084, China
2 SKLMS Lab and MOE Key Lab for Intelligent Networks and Network Security, Xi’an Jiaotong University, Xi’an 710049, China
106
Introduction
Features:
  Category-level classification.
  Supervised machine learning: Support Vector Machine.
  Feature selection (discriminator selection) is employed to select the best set of attributes.
  Both TCP and UDP are considered.
Discriminators: statistical data of flows.
Disadvantages: the discriminators are available only after the flow has finished communicating.
107
Feature Selection
Sequential forward selection
  Begin with no features chosen; at each step, append the one feature that gives the best classification result.
Plus-m-minus-r algorithm
  Begin with no features chosen; at each step, append m features to the chosen set and then remove r features from it (m > r).
  Plus-2-minus-1 was used in this paper.
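The plus-m-minus-r search can be sketched as below. This is an illustrative greedy version under stated assumptions: the function name, the `target_size` stopping rule, and the `score` callback (which would normally train a classifier on the candidate subset and return its accuracy) are all hypothetical, and m must be greater than r for the loop to make progress.

```python
def plus_m_minus_r(features, score, m=2, r=1, target_size=3):
    """Greedy plus-m-minus-r feature selection; score(subset) -> float."""
    chosen = []
    while True:
        # Forward phase: add the single best feature, up to m times.
        for _ in range(m):
            remaining = [f for f in features if f not in chosen]
            if not remaining or len(chosen) >= target_size:
                break
            chosen.append(max(remaining, key=lambda f: score(chosen + [f])))
        if len(chosen) >= target_size or len(chosen) == len(features):
            return chosen
        # Backward phase: drop the feature whose removal hurts least, r times.
        for _ in range(r):
            drop = max(chosen, key=lambda f: score([c for c in chosen if c != f]))
            chosen.remove(drop)
```

Setting m = 1 and r = 0 gives plain sequential forward selection; the backward phase lets the search discard a feature that looked good early but became redundant later.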
109
For the data sample set with respect to original proportion in the traffic
Accuracy After Feature selection
110
Offline/Realtime Traffic Classification Using Semi-Supervised Learning
HP Technical Report, July 13, 2007
Presented at Performance 2007, 2-5 October 2007, Cologne, Germany, and published in the Performance Evaluation journal (special issue on Performance 2007 for the Proceedings of IFIP Performance 2007)
Jeffrey Erman, Anirban Mahanti, Martin Arlitt, Ira Cohen, Carey Williamson
Enterprise Systems and Software Laboratory
HP Laboratories Palo Alto
111
Introduction
Features:
  Semi-supervised learning techniques
    Allow classifiers to be designed from training data that consists of only a few labeled and many unlabeled flows.
  Both high byte accuracy and high flow accuracy (i.e., > 90%).
  Traffic is examined over an extended period of time to assess the longevity of the classifiers.
  Focused on TCP only.
    It would likely be advantageous to have a separate classifier for the non-TCP traffic (future work).
  Consideration of the elements in the training set.
    Elephant vs. mice flows, in order to obtain higher byte accuracy.
112
Introduction
Semi-supervised Learning:
  Hypothesis: if a few flows are labeled in each cluster, we have a reasonable basis for creating the mapping from clusters to application types.
  Step 1: Clustering with K-Means.
  Step 2: Map the clusters to the q different known applications (Y) according to the fraction of labeled application flows within each cluster.
  Clusters are left unlabeled if they contain no labeled flows.
    The unlabeled clusters are used to represent new or unknown applications.
  For most experiments, the number of clusters is K = 400.
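Step 2 of the semi-supervised scheme, mapping clusters to applications from a handful of labeled flows, can be sketched as follows. The function name `label_clusters`, the flow identifiers, and the "unknown" marker are hypothetical; the clustering itself (Step 1) is assumed to have been done already.

```python
from collections import Counter, defaultdict


def label_clusters(assignments, labels):
    """Map each cluster to the majority application among its labeled flows.

    assignments: dict flow_id -> cluster_id (output of the clustering step)
    labels:      dict flow_id -> application, for the few labeled flows only
    """
    counts = defaultdict(Counter)
    for flow, cluster in assignments.items():
        if flow in labels:
            counts[cluster][labels[flow]] += 1
    mapping = {}
    for cluster in set(assignments.values()):
        if counts[cluster]:
            mapping[cluster] = counts[cluster].most_common(1)[0][0]
        else:
            mapping[cluster] = "unknown"     # no labeled flow: new application?
    return mapping
```

A new flow is then classified by finding its nearest cluster and reading that cluster's label, so only a small fraction of the training flows ever needs manual labeling.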
113
Discriminators
11 discriminators (after feature selection from 25 discriminators):
  Total number of packets.
  Average packet size.
  Total bytes.
  Total header (transport plus network layer) bytes.
  Number of caller to callee packets.
  Total caller to callee bytes.
  Total caller to callee payload bytes.
  Total caller to callee header bytes.
  Number of callee to caller packets.
  Total callee to caller payload bytes.
  Total callee to caller header bytes.
114
On-line Classification
Online classification
  Layered classification system.
    A packet milestone is reached when the total number of packets a flow has sent or received (SYN/SYN-ACK packets included) reaches a specific value.
    Each layer is an independent model that classifies ongoing flows into one of the class types using the flow statistics available at the chosen milestone.
    Each milestone's classification model is trained using flows that have reached that specific packet milestone.
  Reclassification happens whenever an upper layer is reached: when a flow is reclassified, any previously assigned labels are disregarded.
116
Features
Pros:
  The semi-supervised mechanism reduces the cost of preparing a large training data set.
  Sampling techniques are considered when forming the training set.
Cons:
  Only TCP is included.
  Is an exponential “packet milestone” suitable for real-time classification?
117
A High Accurate Machine-Learning Algorithm for Identifying Application Traffic in Early Stage
Nen-Fu Huang+, Gin-Yuan Jai+, and Han-Chieh Chao*
+Department of Computer Science, National Tsing Hua University, Taiwan
*Department of Electronics, National Ilan University, Taiwan
118
Classification in Early Stage
Capture the characteristics of each flow's protocol handshake from an L7 perspective.
  Flow ID: the tuple (sip, sport, dip, dport, protocol)
  Statistical information for each flow over its first k rounds:
    Elapsed time, transmitted size, throughput, response time, inter-arrival time.
120
Rule-based Machine Learning
Rule-based ML (supervised machine learning)
  The generated rules suit the intrinsic architecture of firewalls and IDS/IPS.
  Rules generated by an ML algorithm provide information for understanding the potential characteristics of application protocols.
  One Rule, PART, Ripple Down, DecisionTable, ConjunctiveRule, Ripper, …
ML Name            Accuracy
PART               85.58 %
Ripple Down        82.94 %
Ripper             81.8 %
One R              69.19 %
Conjunctive Rule   9.898 %
121
Experiment Architecture
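[Figure: experiment pipeline. A traffic dump (payload included) goes through flow preprocessing, which uses protocol signatures, to produce flow sets; flow sampling yields a sample set; a random split for 10-fold cross validation produces Training Sets 1 to 10 and Test Sets 1 to 10; machine learning on each pair gives Result 1 to Result 10, which are combined into the average result.]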
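The 10-fold cross validation stage of this pipeline can be sketched with the standard library. The function names, the fixed shuffle seed, and the `train_fn`/`accuracy_fn` callbacks are assumptions made for illustration; they stand in for whichever learner and metric the experiment actually used.

```python
import random


def k_fold_splits(samples, k=10, seed=0):
    """Yield (training_set, test_set) pairs for k-fold cross validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)    # random split, reproducible
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test


def cross_validate(samples, train_fn, accuracy_fn, k=10):
    """Average the accuracy obtained on each of the k held-out folds."""
    scores = [accuracy_fn(train_fn(train), test)
              for train, test in k_fold_splits(samples, k)]
    return sum(scores) / len(scores)
```

Every sample is used for testing exactly once and for training k-1 times, so the averaged result is less sensitive to one lucky or unlucky split.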
125
Conclusions
Machine learning based techniques for identifying network applications are becoming more and more important.
Focus on the real-time, protocol-level requirements of application traffic classification.
No common traffic traces exist for comparing performance against the same baseline.
Expensive training is still a problem.
Identifying encrypted traffic (e.g. Skype, Winny, encrypted BT) is a new challenge.
Identifying the detailed behaviors of encrypted traffic is an even bigger challenge.