seven years and one day in the life of internet...
Issues Multiresolution Random Projections Robust Estimation Anomaly Detection Future Appendix
Seven years and one day in the life of Internet:
Multiresolution and random projections for robust monitoring
Patrice ABRY 1,
in collab. with Pierre BORGNAT 1, Guillaume DEWAELE 1, Kensuke FUKUDA 2, Kenjiro CHO 3
1 CNRS – ENS Lyon Physics Lab., Ecole Normale Superieure de Lyon, France – 2 NII, Tokyo, Japan – 3 IIJ Internet Initiative Japan.
Does Fractal Scaling at the IP Level Depend on TCP Flow Arrival Processes ?
Nicolas Hohn, Darryl Veitch (CUBIN, Department of Electrical & Electronic Engineering, University of Melbourne, Australia)
Patrice Abry (CNRS, UMR 5672, Laboratoire de Physique, Ecole Normale Superieure de Lyon, France)
ACM/SIGCOMM Internet Measurement Workshop, November 6-8 2002, Marseille, France
Introduction Classical Analysis Robust Analysis w/ Sketches Longitudinal Study Conclusion
Seven Years and One Day: Sketching the Evolution of Internet Traffic
Pierre BORGNAT 1, Guillaume DEWAELE 1, Kensuke FUKUDA 2, Patrice ABRY 1, Kenjiro CHO 3
1 CNRS – ENS Lyon Physics Lab., Université de Lyon, France – 2 NII, Tokyo, Japan – 3 IIJ Internet Initiative Japan.
INFOCOM 2009
CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE
ECOLE NORMALE SUPERIEURE DE LYON
Internet Traffic: A Longitudinal Analysis
[Context: Passive Monitoring of TCP/IP traffic on a link]
What are the evolutions of traffic over the years?
• Topics in statistical analysis of traffic
  • Distributions of protocols, of packet sizes, of IAT, of flows, ...
  • Aggregated traffic: marginal laws
  • LRD (Long Range Dependence)
  • ...
• Diversity of expected traffic: http, P2P, mail, DNS, ...
• Variety of conditions: used bandwidth, congestion, ...
• Frequent anomalies: scans, viruses & worms, DDoS, ...
• ...
• Intuition: One trace is not enough! (for longitudinal, empirical data analysis)
• MAWI dataset: more than 7 years of daily traces
Multiresolution and Random Projections. for Robust Estimation - P. Abry - TMA Eu. Cost Action - Barcelona, Spain - Oct. 2009 - 1 / 48
Goals
⇒ Discuss issues in robust Internet monitoring
• Internet monitoring :
  - parameter estimation, anomaly detection, traffic classification, ...
• Issues :
  - database ?
  - data ? Information actually available ?
  - level of description (Pkt, Flow, Session, ...) ?
  - aggregation level ?
  - robust monitoring ?
  - objective (scientific) assessment and comparisons ?
• Solutions :
  - Multiresolution analysis,
  - Random projections (sketches).
• Illustrations on the MAWI database :
  - parameter estimation,
  - anomaly detection.
Issue 1 : Data Set
• What data set ?
  - What network ? What link ?
  - What type of traffic ? What usage (academic, companies, commercial, ...) ?
  - What size ? What duration ? How many traces ?
  - Publicly available ? What documentation ?
• Answer : MAWI database
  - WIDE network (AS2500), trans-Pacific (Japan-US) backbone.
  - Sample Point B : 18 Mbps CAR (100 Mbps link),
  - Sample Point F : 100 Mbps, 150 Mbps CAR (1 Gbps link), after 2007,
  - ≃ 1.5 TB of (compressed and anonymized) packet traces,
  - 7 years (2001-2008), each day, 15-min-long traces,
  - A few 24 h traces (One Day in the Life of Internet),
  - Publicly available and (partially) documented at : http://mawi.wide.ad.jp/
Issue 2 : Data
• What data ?
  - What information are you actually allowed/able to use ?
  - What are the rules of the game ?
  - What are the goals of the game ?
  - IP 5-tuple ? Time stamp ? Payload ? Netflow ?
• Answer : MAWI data
  - IP Pkt time stamps and 5-tuple : IPProtocol, IPSrc, IPDst, PtSrc, PtDst,
  - no payload !
  - no bi-directionality !
Issue 3 : Level of Description
• What description level ?
  - Pkt ? Flow (connection) ? Session ?
• Issues :
  - Ability to collect ? Network load ?
  - Sampling ? What sampling ?
  - Storage ? Real-time ? On-line ?
• Answer : IP Pkt level
  - all Pkt, no sampling,
  ⇒ Aggregated Pkt number time series,
  ⇒ Aggregated Byte number (volume) time series.
Issue 4 : Aggregation Level
• What aggregation level ∆ ?
  - Typical Pkt inter-arrival time ≃ 0.1 ms,
  - Typical data collection (or stationarity) ≃ 1 hr,
  - 10^-3 s ≤ ∆ ≤ 10^3 s, choice within 6 orders of magnitude !
• Issues :
  - What goals ?
  - e.g., Anomaly detection : anomaly duration ? volume ?
  - Real-time ? On-line ?
• Answer :
  - do not single out an arbitrary ∆,
  - do the analysis for all ∆ jointly !
  ⇒ Multiresolution analysis and modeling.
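As an illustration of analysing all ∆ jointly, the sketch below bins packet time stamps at a finest level ∆0 and then merges adjacent bins to obtain the series at 2∆0, 4∆0, and so on. The function names (`aggregate_counts`, `coarsen`, `multiresolution`) and the toy time stamps are assumptions made for this example; they do not come from the slides or from any MAWI tooling.

```python
def aggregate_counts(timestamps, delta, duration):
    """Packet counts X_delta(k) in consecutive bins of width delta (seconds)."""
    n_bins = int(round(duration / delta))
    counts = [0] * n_bins
    for t in timestamps:
        k = int(t // delta)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


def coarsen(counts):
    """Merge adjacent non-overlapping bins: X_2delta(k) = X_delta(2k) + X_delta(2k+1)."""
    return [counts[i] + counts[i + 1] for i in range(0, len(counts) - 1, 2)]


def multiresolution(timestamps, delta0, duration, n_scales):
    """Return {delta: count series} for delta = delta0, 2*delta0, 4*delta0, ..."""
    series = {}
    delta, counts = delta0, aggregate_counts(timestamps, delta0, duration)
    for _ in range(n_scales):
        series[delta] = counts
        delta, counts = 2 * delta, coarsen(counts)
    return series


# Toy example (hypothetical data): 4 packets in a 1-second window, finest bin of 250 ms.
scales = multiresolution([0.10, 0.30, 0.35, 0.90], delta0=0.25, duration=1.0, n_scales=3)
for delta, counts in scales.items():
    print(delta, counts)
```

Working on dyadic multiples of ∆0 keeps the cost of the coarser levels negligible, since each level is obtained from the previous one by a single pass of pairwise sums.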
Issue 5 : Robust Analysis
• Normal is wild :
  - Heterogeneity : different traffics, usages, applications, requests, constraints, hardware, software,
  - Superimposition : all mixed in a single trace,
  - Difficult statistics : long memory, heavy tail, non stationarity,
  ⇒ Intrinsic (or natural) large variability,
  ⇒ Normal ? Does it exist ? Anomalies constantly occurring ?
• Issues :
  - What is the size of the confidence interval of a parameter estimate ?
  - How much should you trust an analysis performed on a particular day ? on a particular traffic ?
  - How general ? What credit ?
  ⇒ A single time series is not enough ! Illustrations !
• Answer :
  - Textbook solution : average,
  - On what ? different traces ? different days, same hour ? different hours, same day ?
  ⇒ Random projections (sketches).
Outline
Issues
Multiresolution
  Marginals
  Covariance
Random Projections
Robust Estimation
  Principle
  Seven years ... and One Day
Anomaly Detection
  Principle
  Seven years
Future
Appendix
Gaussian or not Gaussian ?
• Aggregated traffic X_∆(t) : # of packets counted during ∆ (alternatively : # of bytes during ∆)
• Marginal : Poisson ? Exponential ? Gaussian ? It depends on ∆ !
[Figure : empirical marginal distributions of X_∆ for ∆ = 4 ms, ∆ = 32 ms and ∆ = 256 ms]
• Fit/Model : Gamma
$$\Gamma_{\alpha,\beta}(x) = \frac{1}{\beta\,\Gamma(\alpha)} \left(\frac{x}{\beta}\right)^{\alpha-1} \exp\left(-\frac{x}{\beta}\right).$$
Neither exponential, $p(x) = e^{-x/\beta}/\beta$, nor Gaussian, $p(x) = e^{-(x-\mu)^2/2\sigma^2}/(\sqrt{2\pi}\,\sigma)$.
[Scherrer et al. IEEE TDSC’07]
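To make the Gamma fit concrete, here is a minimal sketch that estimates α and β by the method of moments, using mean = αβ and variance = αβ². The slides do not say which estimator was used, so the method of moments is an assumption chosen for simplicity, and the function name `gamma_fit_moments` is made up for this example. The check runs on synthetic samples, not on traffic.

```python
import random
import statistics


def gamma_fit_moments(x):
    """Method-of-moments estimates (alpha, beta) for a Gamma(alpha, beta) law.

    Uses mean = alpha * beta and variance = alpha * beta**2.
    """
    mean = statistics.fmean(x)
    var = statistics.pvariance(x)
    return mean * mean / var, var / mean


# Synthetic check on samples drawn from a known Gamma law (shape 3, scale 2).
rng = random.Random(0)
samples = [rng.gammavariate(3.0, 2.0) for _ in range(20000)]
alpha_hat, beta_hat = gamma_fit_moments(samples)
print(round(alpha_hat, 2), round(beta_hat, 2))
```

Applied to a count series X_∆, a large estimated α indicates a marginal close to Gaussian, while α near 1 indicates a marginal close to exponential.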
Gamma Distributions
$$\Gamma_{\alpha,\beta}(x) = \frac{1}{\beta\,\Gamma(\alpha)} \left(\frac{x}{\beta}\right)^{\alpha-1} \exp\left(-\frac{x}{\beta}\right).$$
[Figure : Gamma densities, one panel with Gamma(α,1) for α = 1, 2, 3, 4, 6, 8, 10 and one panel with Gamma(3,β) for β = 1, 2, 3, 4, 5, 6]
• Shape parameter α : from Gaussian to exponential, 1/α ≃ distance from Gaussian,
• Scale parameter β : multiplicative factor.
Gamma Fits
• Empirical PDFs and Gamma fits, LBL-TCP-3
[Figure : empirical PDFs and Gamma fits for ∆ = 4 ms, ∆ = 32 ms and ∆ = 256 ms]
• Accurately fits data for all aggregation levels ∆,
• Stability under addition : $X_1 \sim \Gamma_{\alpha_1,\beta}$, $X_2 \sim \Gamma_{\alpha_2,\beta}$, $(X_1, X_2)$ independent $\Longrightarrow X_1 + X_2 \sim \Gamma_{\alpha_1+\alpha_2,\beta}$,
• Aggregation : $X_{2\Delta}(k) = X_\Delta(2k) + X_\Delta(2k+1)$.
Parameter Estimation : α∆, β∆
• Stability under addition and independence
$$\Rightarrow \begin{cases} \alpha(\Delta) = \alpha_0 \Delta \\ \beta(\Delta) = \beta_0 \end{cases}$$
[Figure : estimated α and β as functions of ∆]
• α∆, β∆ accommodate correlations !
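The independent case can be checked numerically. The sketch below draws i.i.d. Gamma samples, merges adjacent pairs twice, and refits at each level: under independence the fitted α should roughly double at each merge while β stays put. Real traffic is correlated, so measured α(∆) and β(∆) are expected to depart from this baseline. The helper names, the method-of-moments estimator and the parameter values are all assumptions made for this example.

```python
import random
import statistics


def gamma_fit_moments(x):
    """Method-of-moments estimates (alpha, beta): mean = alpha*beta, var = alpha*beta**2."""
    mean = statistics.fmean(x)
    var = statistics.pvariance(x)
    return mean * mean / var, var / mean


def coarsen(x):
    """Sum adjacent non-overlapping pairs, i.e. go from Delta to 2*Delta."""
    return [x[i] + x[i + 1] for i in range(0, len(x) - 1, 2)]


rng = random.Random(1)
x = [rng.gammavariate(2.0, 1.5) for _ in range(40000)]  # i.i.d. Gamma(shape 2, scale 1.5)
fits = []
for _ in range(3):
    fits.append(gamma_fit_moments(x))
    x = coarsen(x)

for level, (a, b) in enumerate(fits):
    print(f"Delta = {2 ** level} * Delta0: alpha = {a:.2f}, beta = {b:.2f}")
```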
Covariance : the wavelet point of view
• X_∆ stationary stochastic process, with spectrum $f_{X_\Delta}(\nu)$,
• Wavelet coefficients : $d_X(j,k)$,
• Wavelet spectrum : $S(j) = \frac{1}{n_j} \sum_{k=1}^{n_j} |d_{X_\Delta}(j,k)|^2$,
$$\mathbb{E}S(j) = \int f_X(\nu)\, 2^j\, |\Psi_0(2^j \nu)|^2\, d\nu.$$
• Spectral estimation : $f_X(\nu = 2^{-j}\nu_0) = S(j)$.
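A minimal sketch of the wavelet spectrum S(j), assuming the Haar wavelet for simplicity (the slides keep the mother wavelet Ψ0 generic, so Haar is a choice made here, not the authors' stated one). Each level halves the series, stores the average of the squared detail coefficients, and passes the approximation on to the next scale. Function names are hypothetical.

```python
import math


def haar_wavelet_spectrum(x, n_scales):
    """Return [S(1), ..., S(n_scales)] with S(j) = mean over k of |d(j, k)|^2 (Haar wavelet)."""
    approx = list(x)
    spectrum = []
    for _ in range(n_scales):
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx) - 1, 2)]
        if not pairs:
            break  # series too short for this scale
        details = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        spectrum.append(sum(d * d for d in details) / len(details))
    return spectrum


def logscale_diagram(x, n_scales):
    """Pairs (j, log2 S(j)); scales where S(j) = 0 are skipped."""
    spectrum = haar_wavelet_spectrum(x, n_scales)
    return [(j, math.log2(s)) for j, s in enumerate(spectrum, start=1) if s > 0]


print(haar_wavelet_spectrum([1, 3, 5, 7], n_scales=2))
print(logscale_diagram([1, 3, 5, 7], n_scales=2))
```

With the orthonormal (L2) normalization used here, white noise gives a flat S(j), so any systematic trend of log2 S(j) across scales reflects correlation in the series.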
Both Short and Long Range Dependencies
• Log-scale Diagram (LD) : $\log_2 S(j)$ vs. $\log_2 2^j = j$. X_∆, LBL-TCP-3, ∆ = 1 ms
[Figure : log-scale diagram, log2 S(j) vs. scale j, for j = 1 to 15]
• Power law at coarse scales (low frequencies) : ⇒ Long range dependence, LRD
• Short range dependence at fine scales (high frequencies),
• ⇒ Use a FARIMA(P,d,Q) covariance form.
FARIMA(P,d ,Q) covariance
FARIMA = fractionally integrated ARMA.
1. fractional integration with parameter d,
2. ARMA(P,Q) → ARMA(1,1), params. θ, φ.
$$f_{X_\Delta}(\nu) = \sigma_\varepsilon^2 \left|1 - e^{-i2\pi\nu}\right|^{-2d} \frac{\left|1 - \theta e^{-i2\pi\nu}\right|^2}{\left|1 - \phi e^{-i2\pi\nu}\right|^2},$$
• d controls long range dependence, with γ = 2d,
• P, Q control short range dependence.
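The spectral density above is easy to evaluate numerically, which also lets one check that the low-frequency exponent is γ = 2d regardless of the ARMA(1,1) part. The sketch below is only an evaluation of the formula; the function names, the chosen frequencies and the parameter values are assumptions made for this example.

```python
import cmath
import math


def farima_spectrum(nu, d, theta=0.0, phi=0.0, sigma2=1.0):
    """Spectral density of a FARIMA(1, d, 1) process at normalized frequency nu (0 < |nu| <= 0.5)."""
    z = cmath.exp(-2j * math.pi * nu)
    long_memory = abs(1 - z) ** (-2 * d)                              # fractional integration
    short_memory = abs(1 - theta * z) ** 2 / abs(1 - phi * z) ** 2    # ARMA(1,1) part
    return sigma2 * long_memory * short_memory


def low_frequency_exponent(d, theta, phi, nu1=1e-4, nu2=1e-3):
    """Empirical gamma such that f(nu) ~ C |nu|^(-gamma) between nu1 and nu2."""
    f1 = farima_spectrum(nu1, d, theta, phi)
    f2 = farima_spectrum(nu2, d, theta, phi)
    return -math.log(f1 / f2) / math.log(nu1 / nu2)


# Hypothetical parameters: the printed exponent should be close to 2 * d = 0.6.
print(low_frequency_exponent(d=0.3, theta=0.2, phi=0.5))
```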
Empirical LDs and FARIMA(P,d ,Q) Fits LBL-TCP-3
[Figure : empirical LDs (log2 S(j) vs. j) and FARIMA fits for ∆ = 4 ms, ∆ = 32 ms and ∆ = 256 ms ; estimated d, and estimated θ, φ, as functions of log2(∆)]
• Accurately fits data for all aggregation levels ∆,
• LRD is persistent, SRD are cancelled out.
Gamma-Farima Modeling
• Numerical synthesis procedures for bivariate Gamma-Farima processes.
Random Projections or sketches
Sketches = ensemble of outputs of random hash tables
[Muthukrishnan'03, Krishnamurthy'03, ...] [Abry+ SAINT'07, Dewaele+ Sigcomm LSAD'07]
• Random hash functions : h_n
  - y = h(x),
  - M outputs : y ∈ [1, ..., M],
  - k-universal hash functions.
• Hash the traffic :
  - Packet : i-th packet, n-tuple : t_i, PTsrc_i, PTdst_i, IPsrc_i, IPdst_i
  - Choose one specific key : e.g., destination address
  - Hash according to this key : m_i = h(IPdst_i) ∈ [1, ..., M],
  - All packets with the same m_i = one sub-trace, sampled by random projection.
  - Aggregate traffic {t_i, m_i}_{i∈I} into M series X^m_∆(t), bins of ∆ s.
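The hashing step can be sketched as follows. A salted BLAKE2b digest stands in for the k-universal hash family mentioned on the slide; that substitution, the function names and the toy packets are assumptions made for this example. Outputs are indexed from 0 to M-1 rather than 1 to M.

```python
import hashlib


def sketch_index(key, seed, n_outputs):
    """Map a key (e.g. a destination address string) to one of n_outputs sketch outputs."""
    digest = hashlib.blake2b(key.encode(), digest_size=8,
                             salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little") % n_outputs


def sketch_traffic(packets, n_outputs, delta, duration, seed=0):
    """Split packets (timestamp, ip_dst) into n_outputs count series with bins of delta seconds."""
    n_bins = int(round(duration / delta))
    series = [[0] * n_bins for _ in range(n_outputs)]
    for t, ip_dst in packets:
        k = int(t // delta)
        if 0 <= k < n_bins:
            series[sketch_index(ip_dst, seed, n_outputs)][k] += 1
    return series


# Hypothetical packets: (time stamp in seconds, destination address).
packets = [(0.1, "10.0.0.1"), (0.2, "10.0.0.2"), (0.7, "10.0.0.1"),
           (1.4, "10.0.0.3"), (1.9, "10.0.0.2")]
series = sketch_traffic(packets, n_outputs=4, delta=0.5, duration=2.0)
for m, counts in enumerate(series):
    print(m, counts)
```

Because every packet with a given key lands in the same output, each output behaves like a random sample of whole flows, and summing the M outputs bin by bin gives back the total trace.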
Sketched Traffic
• Sketches = M sub-traces representing the total traffic
• Total of outputs = total trace (constrained sampling)
• Each sketched output = random flow-sampling
Robust Estimation : Sketches + Multiresolution
• On each sketch output, for each ∆ :
  - Γ_{α,β}(X_∆) fit and estimation of α^m_∆, β^m_∆.
  - Compute LDs and estimate H^m_∆ (or FARIMA params)
  - Combine estimates over m and ∆
⇒ Adaptivity : the reference is given by the data themselves and not a priori !
⇒ Robustness : the median is a robust average !
⇒ Impact of outliers (anomalies) decreased !
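The robustness argument can be seen on a toy case: if one sketch output is distorted by an anomaly, the median across outputs barely moves while the mean is pulled towards the outlier. The per-sketch values below are hypothetical numbers chosen for illustration, and `combine_sketch_estimates` is a made-up helper name.

```python
import statistics


def combine_sketch_estimates(estimates):
    """Robust reference across sketch outputs: the median (the mean is returned for comparison)."""
    return statistics.median(estimates), statistics.fmean(estimates)


# Hypothetical per-sketch Hurst estimates; the last output is distorted by an anomaly.
h_per_sketch = [0.88, 0.90, 0.91, 0.89, 0.87, 0.90, 0.92, 0.41]
h_median, h_mean = combine_sketch_estimates(h_per_sketch)
print(h_median, h_mean)
```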
MAWI data : B-US2Jp, 2005/07/11
[Figure : byte rate (MiB/s) and packet rate (10^3 Pkt/s) over the 900 s trace ; LD for Byte count, Hg = 0.94, Hm = 0.88 ; LD for Pkt count, Hg = 0.92, Hm = 0.90 ; scales from 2 ms to 64 s]
• All Hm are consistent ! Hm and Hg are consistent !
• LRDs on Bytes or Pkts are consistent !
• Normal traffic : no congestion (no anomaly ?)
MAWI data : B-US2Jp, 2003/06/03, Congestion
[Figure : byte rate (MiB/s) and packet rate (10^3 Pkt/s) over the 900 s trace ; LD for Byte count, Hg = 0.41, Hm = 0.80 ; LD for Pkt count, Hg = 0.89, Hm = 0.83 ; scales from 2 ms to 64 s]
• $H_g^{Byte} \simeq 0.4$ : no variability, no LRD, $H_g^{Byte} \neq H_g^{Pkt}$
• $H_m^{Byte} \simeq 0.9$, flow variability, significant LRD, $H_m^{Byte} \simeq H_m^{Pkt}$
MAWI data : B-Jp2US, 2004/09/21, Anomalies
[Figure : sketched traffic, byte counts / 1 s and packet counts / 1 s over 900 s ; LD of sketched byte counts, H = 0.777 ; LD of sketched packet counts, H = 0.905]
• $H_g^{Byte} \simeq 0.7$ : LD ? ? ?, $H_g^{Pkt} \simeq 1$, ? ? ?
• $H_m^{Byte} \simeq 0.8$, LDs ok, significant LRD, $H_m^{Byte} \simeq H_m^{Pkt}$
Longitudinal study of MAWI backbone dataset
[Figure : packet rate (10^3 packets/s) and byte rate (MiBytes/s) from 2001 to 2008, for Jp2US and US2Jp]
Robust Estimation with Sketches : H
[Figure : H estimated on packet counts and on byte counts from 2001 to 2008, for Jp2US and US2Jp]
• Congestion = global traffic goes to H ≃ 0.5
• However : flows still see relevant LRD : median on sketch's output ∼ usual traffic, H ≃ 0.8 to 0.9
LRD in Pkt or Byte time series ? (Non Robust Global Estimation)
Scatter plots of H(B) (byte) vs. H(P) (packet)
[Figure : Hg(B) vs. Hg(P), both axes from 0.4 to 1.6 ; legend : B normal, B congested, B restricted (Jp2US), B sasser (US2Jp), F]
o : B without congestion ; • : B with congestion ; + : B anomaly (US2Jp) and restricted traffic (Jp2US) ; remaining marker : F.
Left : Jp2US ; Right : US2Jp.
LRD in Pkt or Byte time series ? (Robust Median-sketch Estimation)
Scatter plots of H(B) (byte) vs. H(P) (packet)
[Figure : Hm(B) vs. Hm(P), both axes from 0.4 to 1.6 ; legend : B normal, B congested, B restricted (Jp2US), B sasser (US2Jp), F]
o : B without congestion ; • : B with congestion ; + : B anomaly (US2Jp) and restricted traffic (Jp2US) ; remaining marker : F.
Left : Jp2US ; Right : US2Jp.
More Gaussian or not ?
[Figure : α(j) and α(j)/α(J) from 2001 to 2008, for Jp2US and US2Jp]
Top : indices α_j, as a function of time, j = 2, 4, 6, 8, 9. Bottom : normalized α'_j = α_j/α_J (J = 9). Left : Jp2US ; Right : US2Jp.
and One Day, 2008/03/19 : Global vs. Median
[Figure : H for Pkt counts and H for Byte counts over 24 h (0h to 0h), global vs. median estimates]
and One Day, 2008/03/19 : LRD ?
[Figure : LD for Pkt count, scales from 2 ms to 512 s, LDm (6h) and LDm (15min)]
Robust Estimation
⇒ [Borgnat et al., "Seven Years and One Day : Sketching the Evolution of Internet Traffic", Infocom 2009]
⇒ Find outliers→ Anomaly Detection !
Anomaly Detection ?
• Issue(s) 6 :
  - What traffic ? What data ? What information available ? (e.g., Netflow vs. Pkt, single link vs. network-wide monitoring, ...)
  - What are the rules and goals ? (Computational load, memory, sampling, how precise a location, IP identification, nature of the anomaly ?)
  - Signatures (deterministic) vs. profiles (statistical) ?
• Answers :
  - MAWI database, no anomaly documentation, single link
  - Pkt level (5-tuple), no payload, no bidirectionality
  - Statistical detection : no a priori list of known anomalies, low signal-to-noise ratio, anomalous in a statistical sense (possibly legitimate)
• Example : DDoS
• References : Biblio.
Anomaly Detection : Key Steps of our Contribution
[Sketch based Anomaly Detection, Identification,.... Abry, Borgnat, Dewaele. SAINT'07]
[Extracting Hidden Anomalies using Sketch and Non Gaussian Multiresolution Statistical Detection Procedures. Dewaele, Fukuda, Borgnat, Abry & Cho. LSAD Sigcomm'07]
- Step 1 : Sketches (for adaptive reference, no model, no prediction, no a priori, no learning)
- Step 2 : Multiresolution (to avoid a priori aggregation level choice)
- Step 3a : Gamma parameters (path to Gaussianity instead of Gaussianity itself)
- Step 3b : Farima (long vs. short dependencies)
- Step 4 : Detection : comparison across aggregation levels and across sketches
hey Man, u r late !
Step 1 : Sketches (random projection/sampling)
• Sketch of M outputs × N different choices of hash tables
• Hashing key : IPSource, IPDestination, ...
Step 2 : Multiresolution or Multi-Scale Aggregation Analysis
• Aggregated traffic with scales : 5ms, 10ms, ..., 1s
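Step 3 : Modeling with non-Gaussian statistics
• Gamma laws : parameters α(∆) and β(∆)
$$\Gamma_{\alpha,\beta}(x) = \frac{1}{\beta\,\Gamma(\alpha)} \left(\frac{x}{\beta}\right)^{\alpha-1} \exp\left(-\frac{x}{\beta}\right).$$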
Step 4 : Comparisons across Scales and Sketches
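• Compute median and standard deviation across outputs.
• Anomaly : one output is too far from the average.
• Too far : Mahalanobis distance :
$$D_\alpha = \left[ \frac{1}{J} \sum_{j=1}^{J} \frac{\left|\alpha^n_{\Delta_j} - \alpha^{Ref}_{\Delta_j}\right|^2}{\sigma^2_{\alpha,\Delta_j}} \right]^{1/2} > \text{threshold}.$$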
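The detection rule can be sketched in a few lines. Here the reference at each aggregation level is taken as the median across sketch outputs and σ² as the variance across outputs; that reading of "Ref" and σ, the helper names, the toy values and the threshold of 2 are assumptions made for this example, not the published settings.

```python
import math
import statistics


def sketch_distances(alpha):
    """alpha[n][j]: estimate for sketch output n at aggregation level j.

    Returns D[n] = sqrt( (1/J) * sum_j (alpha[n][j] - ref[j])**2 / var[j] ).
    Assumes var[j] > 0 at every level (the outputs are not all identical).
    """
    n_scales = len(alpha[0])
    columns = [[row[j] for row in alpha] for j in range(n_scales)]
    ref = [statistics.median(col) for col in columns]      # adaptive reference
    var = [statistics.pvariance(col) for col in columns]   # spread across outputs
    return [
        math.sqrt(sum((row[j] - ref[j]) ** 2 / var[j] for j in range(n_scales)) / n_scales)
        for row in alpha
    ]


def anomalous_outputs(alpha, threshold):
    """Indices of the sketch outputs whose distance exceeds the threshold."""
    return [n for n, dist in enumerate(sketch_distances(alpha)) if dist > threshold]


# Hypothetical estimates for 5 sketch outputs at 2 aggregation levels.
alpha = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [1.0, 2.0], [3.0, 6.0]]
print(anomalous_outputs(alpha, threshold=2.0))
```

Averaging over the J aggregation levels means an output is flagged for its behaviour across scales rather than at one arbitrarily chosen ∆.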
Algo. : Sketches + Multiresolution + Gamma statistics
• Enhanced contrast of the anomaly w.r.t. the background,
• Adaptive reference (extracted from traffic, not a priori),
• No a priori on typical time scales,
• Identification of the IP addresses associated with the anomaly.
Identification of IP involved
Use N different sketches (or hash tables) on the same key
• IPs that are not always in anomalous outputs = normal
• IPs that are always in anomalous outputs = anomalies
• Collisions : $\#C = N_{IP}\, M^{-2N} \ll 1 \Rightarrow N > 5$ (with M = 32).
Anomaly Detection : Longitudinal Study
[Figure : anomalies detected from 2001 to 2008, for Jp2US and US2Jp, with Ping and Sasser events labeled]
Red : Definitely attacks : Ping/SYN floods, spoofed, ...
Yellow : Potentially attacks : various mechanisms.
Green : Suspicious traffic : WWW, P2P, GRE, DNS.
• Numerous anomalies, each day (more than 12 large), large varieties (in nature, time scales, goals, impacts, ...)
• Normal traffic barely exists ⇒ Need for sketches
• No ground truth ⇒ Human inspection,
• A heuristic rule-based (port #) classifier
U r really late, Man !
Anomaly Detection versus Traffic Classification
• Rule-based traffic classification (port #, heuristic rules, ...) + Anomaly Detection
• Host-based traffic classification to cross-validate/help Anomaly Detection (in progress)
  - Random projections
  - Minimum Spanning Tree
Conclusions and Perspectives
• Conclusions :
  - Gaussian vs. non-Gaussian, long vs. short memory,
  - Multiresolution analysis and modeling,
  - Random projections (sketches)
  ⇒ Robustness, adaptivity
• Perspectives :
  - Host-based traffic classification, for anomaly detection
  - Scientific comparisons and assessments :
    What (public) data ?
    What rules ? What information allowed to use ?
    What goals ?
    How to compare ? Methodology ? Framework ? Willingness ?
  - Care for history ! POP : Colosseum Example !
References
• Non Gaussian and Long Memory Statistical Modeling of Internet Traffic, Scherrer et al., IEEE TDSC'07.
• Sketch based Anomaly Detection, Identification, ..., Abry, Borgnat, Dewaele, SAINT'07.
• Extracting Hidden Anomalies using Sketch and Non Gaussian Multiresolution Statistical Detection Procedures, Dewaele, Fukuda, Borgnat, Abry & Cho, LSAD Sigcomm'07.
• Seven Years and One Day : Sketching the Evolution of Internet Traffic, Borgnat et al., Infocom 2009.
perso.ens-lyon.fr/patrice.abry
MAWI data : B-US2Jp, 2005/07/11
[Figure : byte rate (MiB/s) and packet rate (10^3 Pkt/s) over the 900 s trace ; PDFs of bytes and packets per bin at ∆ = 8 ms, 32 ms, 64 ms, 128 ms ; LD for Byte count, H = 0.94 ; LD for Pkt count, H = 0.92 ; scales from 2 ms to 64 s]
• Compares well with current knowledge and theory/models
MAWI data : B-US2Jp, 2003/06/03
[Figure. Byte count: time series in MiB/s over 0 to 900 s; PDFs of #MiB per aggregation window Δj for 8, 32, 64 and 128 ms; LD for byte count over scales 2 ms to 64 s, H = 0.41. Packet count: time series in 10^3 Pkt/s over 0 to 900 s; PDFs of #Pkt per Δj for 8, 32, 64 and 128 ms; LD for packet count over scales 2 ms to 64 s, H = 0.89.]
• Congestion.
MAWI data : B-Jp2US, 2004/09/21
[Figure. Byte count: time series in MiB/s over 0 to 900 s; PDFs of #MiB per aggregation window Δj for 8, 32, 64 and 128 ms; LD for byte count over scales 2 ms to 64 s, H = 0.73. Packet count: time series in 10^3 Pkt/s over 0 to 900 s; PDFs of #Pkt per Δj for 8, 32, 64 and 128 ms; LD for packet count over scales 2 ms to 64 s, H = 1.00.]
• Anomalies: network scan, spoofed flooding, attack on a Realserver
Longitudinal study: Pkt Size
[Figure: share (0% to 100%) of small, medium and large packets per year, 2001 to 2008; one panel per direction, Jp2US and US2Jp.]
Long Range Dependence
Definition of Long Range Dependence
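Covariance is a non-summable power law → spectrum $f_{X_\Delta}(\nu)$:
$$f_{X_\Delta}(\nu) \sim C|\nu|^{-\gamma}, \quad |\nu| \to 0, \quad \text{with } 0 < \gamma < 1.$$
Long Range Dependence and Wavelets
$$\mathbb{E}S(j) = \int f_X(\nu)\, 2^j\, |\Psi_0(2^j\nu)|^2 \, d\nu \simeq f_X(\nu = 2^{-j}\nu_0).$$
$$\text{LRD} \Longrightarrow \mathbb{E}S(j) \sim C\, 2^{j(\gamma-1)}, \quad 2^j \to +\infty.$$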
Wavelet Transform
• Let $\psi_0$ denote an elementary mother wavelet,
• Shifted and dilated templates of $\psi_0$: $\psi_{j,k}(t) = 2^{-j/2}\psi_0(2^{-j}t - k)$,
• Wavelet coefficients: $d_{X_\Delta}(j,k) = \langle \psi_{j,k}, X_\Delta \rangle$.
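As an illustration of the definitions above, here is a minimal sketch that computes wavelet coefficients and the per-octave energy S(j) = mean over k of d(j,k)². It assumes the Haar wavelet as ψ0 and treats the samples as the finest-scale approximation; the slides do not say which mother wavelet was used, and the function names `haar_details` and `logscale_energy` are hypothetical.

```python
import numpy as np

def haar_details(x, n_octaves):
    """Return [d_1, ..., d_J]: Haar detail coefficients for octaves j = 1..J.

    Uses the L2 normalization 2^{-j/2} psi_0(2^{-j} t - k) via the usual
    pyramid: differences give details, sums give the next approximation.
    """
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(n_octaves):
        approx = approx[: len(approx) // 2 * 2]  # drop a trailing odd sample
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return details

def logscale_energy(x, n_octaves):
    """S(j): empirical mean of squared wavelet coefficients at each octave."""
    return np.array([np.mean(d ** 2) for d in haar_details(x, n_octaves)])

# Toy input (an assumption, not MAWI data): unit-variance white noise.
rng = np.random.default_rng(0)
white_noise = rng.standard_normal(2 ** 16)
S = logscale_energy(white_noise, n_octaves=6)
```

For uncorrelated unit-variance input, S(j) stays close to 1 at every octave; a sustained trend of log2 S(j) against j at coarse scales is what the LD plots look for.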
Anomaly Detection: Some references
• Dimension reduction:
  - PCA, subspaces [Lakhina ’04] → “normality” in time
  - Sketches [Muthukrishnan ’03], [Krishnamurthy ’03]
• Model + prediction in time:
  - Anomaly = observation is different from prediction
  - [Brutlag ’00], [Barford ’02], [Zhang ’05]...
• Our contribution?
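To make the sketch idea concrete, the following is a minimal illustration of a hash-based random projection: packets are split into a small number of buckets by hashing a key (here the source address), and each bucket yields its own aggregated count series. The hashing scheme, the choice of key, and the names `bucket_of` and `sketch_counts` are assumptions for illustration, not the procedure of the cited works.

```python
import hashlib

def bucket_of(key, seed, n_buckets):
    """Map a key to a bucket with a seeded hash (one seed = one random projection)."""
    digest = hashlib.blake2b(
        key.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % n_buckets

def sketch_counts(packets, seed, n_buckets, n_bins, bin_width):
    """Aggregate packets into per-bucket count series.

    packets: iterable of (timestamp_seconds, source_address) pairs.
    Returns n_buckets lists, each holding n_bins packet counts.
    """
    series = [[0] * n_bins for _ in range(n_buckets)]
    for timestamp, source in packets:
        time_bin = int(timestamp // bin_width)
        if 0 <= time_bin < n_bins:
            series[bucket_of(source, seed, n_buckets)][time_bin] += 1
    return series

# Toy packets (hypothetical addresses and timestamps).
packets = [(0.1, "10.0.0.1"), (0.2, "10.0.0.2"), (1.5, "10.0.0.1"), (2.7, "10.0.0.3")]
series = sketch_counts(packets, seed=7, n_buckets=4, n_bins=3, bin_width=1.0)
```

All packets sharing a key fall into the same bucket, so an anomalous host perturbs one bucket's series while the others keep their usual statistics; repeating with a different seed gives an independent projection.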
Anomalies in Internet Traffic – Detection?
Overview of strategies for anomaly detection
• Methods based on signatures
  • recognition of packets
  • advantage: robust
  • drawbacks: limited to known anomalies with specific signatures; scalability with increasing number of anomalies?
• Methods based on anomalies or statistical profile
  • use statistical properties of traffic: normal vs. abnormal
  • advantage: versatile, indifferent to number of signatures
  • drawbacks: variability of traffic
  • statistics → false alarm vs. detection prob. trade-off
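The false alarm vs. detection trade-off can be illustrated with a toy model. The sketch below assumes, purely for illustration, that a detection statistic is N(0,1) under normal traffic and N(shift,1) under an anomaly, and that an alarm is raised when the statistic exceeds a threshold; neither the Gaussian model nor the names `gaussian_tail` and `operating_point` come from the slides.

```python
import math

def gaussian_tail(z):
    """P(Z > z) for a standard Gaussian Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def operating_point(threshold, shift):
    """Return (false alarm probability, detection probability) for one threshold."""
    p_false_alarm = gaussian_tail(threshold)        # normal traffic exceeds threshold
    p_detection = gaussian_tail(threshold - shift)  # anomalous traffic exceeds threshold
    return p_false_alarm, p_detection

# Sweep a few thresholds for an anomaly that shifts the statistic by 2 std. dev.
roc = [(t, *operating_point(t, shift=2.0)) for t in (0.0, 1.0, 2.0, 3.0)]
```

Raising the threshold lowers the false alarm probability but also lowers the detection probability, so the threshold choice fixes one point on this curve rather than removing the trade-off.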
Anomaly Detection: DDoS
Schematic scenario of DDoS
• Attack with packets without specific signatures
• Objective: detection in low SNR = close to the source
Results: Longitudinal analysis of traffic + anomalies
MAWI dataset: 15 min per day, trans-Pacific backbone
[Figure: traffic breakdown by application (0% to 100%) per year, 2001 to 2008, one panel each for Jp2US and US2Jp; annotated regions: HTTP, peer to peer, suspected P2P, ping flood, and (US2Jp) Sasser worm.]
• Bottom to top: Ping, DNS, common services, MS vulnerabilities, Sasser, HTTP, broadcast, suspected P2P, identified P2P, other TCP/UDP, INLSP (left) / GRE (right).
• Large proportion of hidden P2P, and of anomalies!
Synthesis of a Γ-farima process
Procedure.
• Mapping – 1st order stat.: if the $Y_j(k)$ are independent zero-mean Gaussian r.v. with variance $\beta/2$, then
$$X(k) = \sum_{j=1}^{2\alpha} Y_j(k)^2 \qquad (1)$$
  is a $\Gamma_{\alpha,\beta}$ r.v.
• Mapping – 2nd order stat.: as a consequence,
$$\gamma_Y(k) = \sqrt{\gamma_X(k)/(4\alpha)}. \qquad (2)$$
• Procedure: generate $2\alpha$ Gaussian processes with covariance $\gamma_Y$ derived with (2) from the farima covariance, then obtain $X$ from (1).
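The two mappings can be checked numerically. The sketch below follows the procedure with one substitution, labeled as an assumption: the Gaussian components are AR(1) processes rather than farima, because only the relations (1) and (2) are being illustrated. It requires 2α to be an integer, and `synthesize_gamma_process` and `autocov` are hypothetical names.

```python
import numpy as np

def synthesize_gamma_process(alpha, beta, phi, n, rng):
    """Sum of squares of 2*alpha independent Gaussian AR(1) processes.

    Each component has stationary variance beta/2 and covariance
    gamma_Y(k) = (beta/2) * phi**k (AR(1) stand-in for farima, an assumption).
    """
    n_comp = int(round(2 * alpha))
    if n_comp < 1 or abs(n_comp - 2 * alpha) > 1e-12:
        raise ValueError("2*alpha must be a positive integer")
    sigma = np.sqrt(beta / 2.0)
    y = np.empty((n, n_comp))
    y[0] = sigma * rng.standard_normal(n_comp)  # start in the stationary law
    innov = sigma * np.sqrt(1.0 - phi ** 2) * rng.standard_normal((n, n_comp))
    for t in range(1, n):
        y[t] = phi * y[t - 1] + innov[t]
    return np.sum(y ** 2, axis=1)  # equation (1)

def autocov(x, lag):
    """Empirical autocovariance of x at the given lag."""
    xc = x - x.mean()
    return float(np.mean(xc[: len(xc) - lag] * xc[lag:]))

alpha, beta, phi = 2.0, 1.5, 0.6
x = synthesize_gamma_process(alpha, beta, phi, 100_000, np.random.default_rng(1))
gamma_y1 = (beta / 2.0) * phi                # gamma_Y(1) of each component
predicted_cov1 = 4.0 * alpha * gamma_y1 ** 2  # equation (2) solved for gamma_X(1)
```

With these settings the empirical mean and variance of `x` should sit near αβ and αβ², the first two moments of a Γ law with shape α and scale β, and `autocov(x, 1)` should sit near `predicted_cov1`.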
POP : Colosseum Example