botgraph: large scale spamming botnet detection

26
BotGraph: Large Scale Spamming Botnet Detection Yao Zhao, Yinglian Xie, Fang Yu, Qifa Ke, Yuan Yu, Yan Chen, and Eliot Gillum Speaker: 林林林

Upload: roddy

Post on 08-Jan-2016

43 views

Category:

Documents


0 download

DESCRIPTION

BotGraph: Large Scale Spamming Botnet Detection. Yao Zhao, Yinglian Xie, Fang Yu, Qifa Ke, Yuan Yu, Yan Chen, and Eliot Gillum. Speaker: 林佳宜. References. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: BotGraph: Large Scale Spamming Botnet Detection

BotGraph: Large Scale Spamming Botnet Detection   

Yao Zhao, Yinglian Xie, Fang Yu, Qifa Ke, Yuan Yu, Yan Chen, and Eliot Gillum

Speaker:林佳宜

Page 2: BotGraph: Large Scale Spamming Botnet Detection

References

Yao Zhao, Yinglian Xie, Fang Yu, Qifa Ke, Yuan Yu, Yan Chen, and Eliot Gillum, BotGraph: Large Scale Spamming Botnet Detection , in The 6th USENIX Symposium on Networked Systems Design and Implementation (NSDI '09), USENIX, April 2009

Page 3: BotGraph: Large Scale Spamming Botnet Detection

3

Outline

IntroductionBotGraph ArchitectureRandom Graph TheoryHierarchical algorithmFalse Positive AnalysisConclusion

Page 4: BotGraph: Large Scale Spamming Botnet Detection

4

Introduction

Design and implement a novel system called BotGraph to detect a new type of botnet spamming attacks targeting major Web email providers.

Two months of Hotmail log containing over 500 million users.

Identified over 26 million botnetcreated user accounts with a low false positive rate.

Page 5: BotGraph: Large Scale Spamming Botnet Detection

5

Date and Environment

Each record in the input log data contains three fields: UserID, IPAddress, and Login Timestamp.

The implementation is based on the existing distributed computing models such as MapReduce and DryadLINQ

Using the same 240-machine cluster in the experiments.

Page 6: BotGraph: Large Scale Spamming Botnet Detection

6

BotGraph Architecture

BotGraph has two components: aggressive sign-up detection stealthy botuser detection based on their login

activities

Page 7: BotGraph: Large Scale Spamming Botnet Detection

7

Detection of Aggressive Signups

A sudden increase of signup activities is suspicious.

EWMA algorithm to detect sudden changes in signup activities.

Page 8: BotGraph: Large Scale Spamming Botnet Detection

8

Detection of Stealthy Bot accounts

The sharing of one IP address Multiple bot-users must log in from a

common bot

The sharing of multiple IP addresses Each account needs to be assigned to

different bots

Multiple shared IP addresses in the same Autonomous System (AS) are only counted as one shared IP address.

Page 9: BotGraph: Large Scale Spamming Botnet Detection

9

Graph-Based Bot-User Detection

Use random graph models to analyze the user-user graph , and design a hierarchical algorithm to extract such components formed by bot-users.

Page 10: BotGraph: Large Scale Spamming Botnet Detection

10

Random Graph Theory

G(n, p) as the random graph modeln-vertex graph by simply assigning an edge to each pair of vertices with probability p ∈ [0, 1]

G(n, p) has average degree d = n · p If d < 1, high probability the largest component

in the graph has size less than O(log n). If d > 1, high probability the

largest , component in the graph has size O(n).

Page 11: BotGraph: Large Scale Spamming Botnet Detection

11

spammers for assigning bot-accounts to bots

Consider the following three typical strategies1. Bot-user accounts are randomly assigned

to bots

2. The spammer assigns k available bot-users for bot request. a bot makes only one request for k bot-users each day

3. no limit on the number of bot-users a bot can request for one day and that k = 1

Page 12: BotGraph: Large Scale Spamming Botnet Detection

12

Simulate assigning strategies

Simulate the above typical spamming strategies and construct the corresponding user-user graph

model1:10000 acount 500 bot model2:pick k = 20 model3:assume the bots go online with a

Poisson arrival distribution and the length of bot online time fits a exponential distribution

Page 13: BotGraph: Large Scale Spamming Botnet Detection

13

Result

1. T is a transition point.2. Model 2 has a transition value of T = 2.3. Model 1 and 3 have the same transition value of T = 3.4. Normal users usually cannot form large components with

more than 100 nodes.

Page 14: BotGraph: Large Scale Spamming Botnet Detection

14

Extracting the graph components

From the user-user graph generated with some predefined threshold T

Need to handle the following issues Hard to choose a single fixed threshold of T Bot-users from different bot-user groups

may be in thesame connected component Exist connected components of normal

users

Page 15: BotGraph: Large Scale Spamming Botnet Detection

15

Partitioned data by IP addresses

Page 16: BotGraph: Large Scale Spamming Botnet Detection

16

Partitioned data by user IDs

Page 17: BotGraph: Large Scale Spamming Botnet Detection

17

Hierarchical algorithm

Page 18: BotGraph: Large Scale Spamming Botnet Detection

18

Bot-user groups

Page 19: BotGraph: Large Scale Spamming Botnet Detection

19

Confidence measures

BotGraph computes two histograms from a 30-day email log:

h1: the numbers of emails sent per day per user. h2: the sizes of emails.

Computes two statistics, s1 and s2, from the normalized histograms to quantify their differences:

s1: the percentage of users who sent more than 3 emails per day;

s2: the areas of peaks in the normalized email-size histogram, or the percentage of users who sent out emails with a similar size.

Both s1 and s2 are in the range of [0, 1] and can be used as confidence measures

Page 20: BotGraph: Large Scale Spamming Botnet Detection

20

Bot-User Pruning

Page 21: BotGraph: Large Scale Spamming Botnet Detection

21

Performance Evaluation[1/2]

Page 22: BotGraph: Large Scale Spamming Botnet Detection

22

Performance Evaluation[2/2]

Page 23: BotGraph: Large Scale Spamming Botnet Detection

23

False Positive Analysis

Naming Patterns User-name template. such names are

‘w9168d4dc8c5c25f9” and ‘x9550a21da4e456a2”.

Only 0.44% of the identified bot-users do not strictly follow the naming templates.

Signup Dates Only 0.08% bot-users were signed up before

year 2007 About 59.1% of bot-account of all the input

dataset signed up before 2007 Adjust the false positive rate to be

0.08%/59.1% = 0.13%

Page 24: BotGraph: Large Scale Spamming Botnet Detection

24

Naming pattem score

Page 25: BotGraph: Large Scale Spamming Botnet Detection

25

Conclution

BotGraph is implemented as a parallel Dryad/DryadLINQ application running on a large-scale computer cluster

Using two-month Hotmail logs, BotGraph successfully detected more than 26 million botnet accounts

The experience will be useful to a wide category of applications for constructing and analyzing large graphs

Page 26: BotGraph: Large Scale Spamming Botnet Detection

26

Thank you