B@bel:Leveraging Email Delivery for Spam Mitigation
Usenix Security 2012
Gianluca Stringhini, Manuel Egele, Apostolis Zarras, Thorsten Holz,
Christopher Kruegel, and Giovanni Vigna
University of California, Santa Barbara; Ruhr-University Bochum
Outline
Introduction
Background
Approach
Evaluation
Conclusion
Introduction
KASPERSKY LAB. Spam Report: April 2012.
Email spam accounts for more than 77% of all email traffic
https://www.securelist.com/en/analysis/204792230/Spam_Report_April_2012
SYMANTEC CORP. State of spam & phishing report
http://www.symantec.com/business/theme.jsp?themeid=state_of_spam
About 85% of world-wide spam traffic is sent by botnets
Traditional spam detection systems
1. Content analysis
2. Origin-based (e.g., blacklists)
============new way==========
Focus on the email delivery mechanism
(How messages are sent by spammers)
Background
SMTP
Mail user agent (MUA), e.g., Outlook
Mail transfer agent (MTA), e.g., msa.hinet
Example mail provider: Hotmail
(Diagram from Wikipedia)
SMTP Conversation
Reply:220 msr5.hinet.net ESMTP Sendmail 8.14.2/8.14.2; Sun, 29 Jul 2012 17:38:35 +0800 (CST)
EHLO adl.com
Reply:250-msr5.hinet.net Hello 114-34-35-96.HINET-IP.hinet.net [114.34.35.96], pleased to meet you
MAIL FrOm:<[email protected]>
Reply:250 2.1.0 <[email protected]>... Sender ok
rCpt tO: <[email protected]>
Reply:250 2.1.5 <[email protected]>... Recipient ok
Data
Reply:354 Enter mail, end with "." on a line by itself
SubJECT : HI i am dada
YOYOYO
test !!!`~~~~
...
.
Reply:250 2.0.0 q6T9cZtc012399 Message accepted for delivery
SMTP
The SMTP RFC defines 14 commands.
Each command code consists of four case-insensitive alphabetic characters.
One or more spaces separate the command code from its arguments.
All commands are terminated by the line terminator (<CR><LF>).
SMTP replies: three-digit status code + space + description
(a single line, e.g., 250 OK)
RFC 821
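The command and reply rules above can be checked mechanically. A minimal Python sketch, assuming each line arrives as a complete string ending in <CR><LF>; the helper names `parse_command` and `parse_reply` are hypothetical, and the `-` marker used by multi-line replies (as in the 250- reply to EHLO in the transcript) is handled alongside the single-line form.

```python
import re

# Hypothetical helpers following the RFC 821 rules summarized above:
# a 4-letter case-insensitive command code, optional space-separated
# arguments, and a mandatory <CR><LF> terminator.
COMMAND_RE = re.compile(r"^([A-Za-z]{4})(?: +(.*))?\r\n$")
# Reply: 3-digit code, then ' ' (last line) or '-' (more lines follow), then text.
REPLY_RE = re.compile(r"^(\d{3})([ -])(.*)\r\n$")


def parse_command(line: str) -> tuple[str, str]:
    """Return (CODE, arguments); the code is upper-cased because it is case-insensitive."""
    m = COMMAND_RE.match(line)
    if m is None:
        raise ValueError(f"not an RFC 821 command line: {line!r}")
    # Arguments are returned untouched, so unusual casing/spacing is preserved.
    return m.group(1).upper(), m.group(2) or ""


def parse_reply(line: str) -> tuple[int, bool, str]:
    """Return (status code, is_last_line, text)."""
    m = REPLY_RE.match(line)
    if m is None:
        raise ValueError(f"not an SMTP reply line: {line!r}")
    return int(m.group(1)), m.group(2) == " ", m.group(3)
```

For example, `parse_command("rCpt tO: <bob@example.com>\r\n")` yields the code `RCPT` while leaving the argument text `tO: <bob@example.com>` as sent, so the client's odd casing stays available for dialect analysis.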
Approach
SMTP Dialects
Different clients might implement the SMTP protocol in slightly different ways.
1. RFCs do not always prescribe a single format (e.g., EHLO vs. HELO)
2. When using different extensions, clients might add different parameters
3. Servers accept commands that do not comply with the strict SMTP definitions
Learning Dialects
Passive observation
A set of SMTP conversations
Each conversation is a sequence of <reply, command> pairs
E.g., <220 hinet.net, EHLO adl.com>
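Turning a recorded transcript into <reply, command> pairs is straightforward. A Python sketch, assuming the conversation has already been split into server ("S") and client ("C") events (a hypothetical input format in which the whole DATA payload counts as one client event). Unanswered replies and commands sent without a preceding reply are kept with `None`, so such irregularities remain visible.

```python
from typing import Optional

Event = tuple[str, str]                     # ("S", reply) or ("C", command)
Pair = tuple[Optional[str], Optional[str]]  # (reply, command)


def to_pairs(events: list[Event]) -> list[Pair]:
    """Pair every client command with the server reply that preceded it.

    A command with no preceding reply gets reply None; a reply the client
    never answered gets command None.
    """
    pairs: list[Pair] = []
    pending_reply: Optional[str] = None
    for side, text in events:
        if side == "S":
            if pending_reply is not None:
                # Two replies in a row: the first one was never answered.
                pairs.append((pending_reply, None))
            pending_reply = text
        elif side == "C":
            pairs.append((pending_reply, text))
            pending_reply = None
        else:
            raise ValueError(f"unknown side: {side!r}")
    if pending_reply is not None:
        pairs.append((pending_reply, None))
    return pairs
```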
Active probing
Send specially crafted replies to a client
and observe its responses
Active probing
Standard SMTP replies (e.g., send an error)
Additional SMTP replies (e.g., send a reply twice)
Out-of-order SMTP replies
Missing replies (never send a reply to a command)
Compliant but unusual replies (e.g., mixed case: hOsT)
Incorrect replies (e.g., 9999)
Incorrectly terminated replies (e.g., <CR><CR>)
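Several of these probe types can be produced by rewriting a single reply. A Python sketch; the function names and category keys are hypothetical, and out-of-order replies are left out because they depend on the state of the whole conversation rather than on one reply.

```python
def alternate_case(text: str) -> str:
    """Alternate lower/upper case, e.g. 'host' -> 'hOsT'."""
    return "".join(c.upper() if i % 2 else c.lower() for i, c in enumerate(text))


def make_probes(code: int, text: str) -> dict[str, bytes]:
    """Build variants of one SMTP reply to send to a client under test."""
    line = f"{code} {text}"
    return {
        "standard": f"{line}\r\n".encode(),              # pass e.g. code=550 to send an error
        "additional": f"{line}\r\n{line}\r\n".encode(),  # the same reply sent twice
        "missing": b"",                                  # send nothing at all
        "mixed_case": f"{code} {alternate_case(text)}\r\n".encode(),
        "incorrect": f"9999 {text}\r\n".encode(),        # invalid four-digit status code
        "bad_termination": f"{line}\r\r".encode(),       # <CR><CR> instead of <CR><LF>
    }
```

A prober would send one variant per connection and record which command the client sends next (or whether it disconnects); those reactions become additional <reply, command> pairs for the dialect.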
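Regular expressions
Variable values (addresses, hostnames, domains) are replaced by generic tokens; case and spacing, which distinguish dialects, are kept
MAIL FROM:<[email protected]>, MAIL FROM:[email protected] >> MAIL FROM:<email-addr>
Mail From :[email protected] >> Mail From :<email-addr>
E.g., <220 hinet.net, EHLO adl.com> >> <220 hostname, EHLO domain>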
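The generalization step can be sketched with a few substitution rules. This Python version is an assumption, not B@bel's actual rule set: the placeholder tokens `$EMAIL`, `$HOST` and `$DOMAIN` are hypothetical, chosen so that they cannot be confused with literal angle brackets in the original commands.

```python
import re

# Hypothetical rules: only variable values are replaced, so case, spacing,
# and angle brackets (the parts that carry the dialect) survive.
EMAIL = (re.compile(r"[\w.+-]+@[\w.-]+"), "$EMAIL")
REPLY_RULES = [EMAIL, (re.compile(r"^(220 +)\S+"), r"\1$HOST")]
COMMAND_RULES = [EMAIL, (re.compile(r"^((?i:ehlo|helo) +)\S+"), r"\1$DOMAIN")]


def generalize(reply: str, command: str) -> tuple[str, str]:
    """Map a concrete <reply, command> pair to its generalized form."""
    for pattern, repl in REPLY_RULES:
        reply = pattern.sub(repl, reply)
    for pattern, repl in COMMAND_RULES:
        command = pattern.sub(repl, command)
    return reply, command
```

The two MAIL FROM variants stay distinct after generalization (`MAIL FROM:<$EMAIL>` vs. `Mail From :$EMAIL`), which is exactly the kind of difference a dialect is made of.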
(Figure credit: Wikipedia)
State machine
Each <reply, command> pair labels a transition between states
E.g., <220 hostname, EHLO domain>
Example state machines shown: a spambot and Gmail
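A dialect's state machine can be approximated by a prefix tree built from its generalized conversations. The Python sketch below is an assumption, not B@bel's learner: it merges shared prefixes but does not generalize beyond the conversations it has seen. The class name `DialectMachine` is hypothetical.

```python
Pair = tuple[str, str]


class DialectMachine:
    """Prefix-tree state machine whose transitions are <reply, command> pairs.

    Every observed conversation becomes a path from the start state (0)
    to an accepting state.
    """

    def __init__(self, conversations: list[list[Pair]]) -> None:
        self.transitions: dict[tuple[int, Pair], int] = {}
        self.accepting: set[int] = set()
        next_state = 1
        for conversation in conversations:
            state = 0
            for pair in conversation:
                key = (state, pair)
                if key not in self.transitions:
                    self.transitions[key] = next_state
                    next_state += 1
                state = self.transitions[key]
            self.accepting.add(state)

    def accepts(self, conversation: list[Pair]) -> bool:
        """True if the conversation follows a known path all the way to its end."""
        state = 0
        for pair in conversation:
            nxt = self.transitions.get((state, pair))
            if nxt is None:
                return False
            state = nxt
        return state in self.accepting
```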
Decision State Machine
Uses Wolf's algorithm for collapsing state machines
Wolf, W. An Algorithm for Nearly-Minimal Collapsing of Finite-State Machine Networks. In Proc. ICCAD, 1990.
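To see what the decision machine has to capture, here is a naive Python sketch that merges the conversations of all dialects into one prefix tree and records, at each node, which dialects are still possible. This is not Wolf's algorithm: it performs no collapsing or minimization and only illustrates the structure that the cited algorithm makes compact. Function names, dialect names and pair strings are hypothetical.

```python
Pair = tuple[str, str]


def build_decision_tree(dialects: dict[str, list[list[Pair]]]) -> dict:
    """Merge all dialects into one prefix tree; each node stores the set of
    dialects still consistent with the pairs seen so far."""
    root: dict = {"labels": set(dialects), "next": {}}
    for name, conversations in dialects.items():
        for conversation in conversations:
            node = root
            for pair in conversation:
                node = node["next"].setdefault(pair, {"labels": set(), "next": {}})
                node["labels"].add(name)
    return root


def candidates(tree: dict, conversation: list[Pair]) -> set[str]:
    """Dialects that could have produced this (possibly partial) conversation."""
    node = tree
    for pair in conversation:
        node = node["next"].get(pair)
        if node is None:
            return set()  # no learned dialect matches
    return node["labels"]
```

Once the candidate set shrinks to a single dialect, the remaining pairs are no longer needed, which is why a compact decision machine can classify a client early in the conversation.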
Making a decision
The decision machine follows the observed <reply, command> pairs, e.g.:
<220 hostname, EHLO domain> ...
<220 hostname, HELO domain>, <250 OK, MAIL FROM:<email-addr>> ...
<220 hostname, HELO domain>, <250 OK, RSET> ...
Each path ends in a state labeled with a dialect (e.g., C2, C3) or "unknown"
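Walking a decision machine can be sketched in Python with nested dictionaries: inner dicts map a <reply, command> pair to the next node, and strings are leaf verdicts. The pairs follow the slide, but which path leads to which label (C2, C3) is invented for illustration, as are the names `MACHINE` and `decide`.

```python
Pair = tuple[str, str]

# Hypothetical decision machine; the label assigned to each path is made up.
MACHINE: dict = {
    ("220 $HOST", "EHLO $DOMAIN"): "C3",
    ("220 $HOST", "HELO $DOMAIN"): {
        ("250 OK", "MAIL FROM:<$EMAIL>"): "C2",
        ("250 OK", "RSET"): "C3",
    },
}


def decide(machine: dict, conversation: list[Pair]) -> str:
    """Follow the conversation through the machine and return a verdict."""
    node = machine
    for pair in conversation:
        if isinstance(node, str):
            break                # a leaf was reached; later pairs are not needed
        node = node.get(pair)
        if node is None:
            return "unknown"     # no learned dialect takes this path
    return node if isinstance(node, str) else "undecided"
```

Keeping `unknown` separate from `undecided` distinguishes "no dialect matches" from "the conversation ended too early to tell".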
The Botnet Feedback Mechanism
Some spammers take server feedback into account
e.g., "recipient address does not exist"
Cutwail: 35% of the email addresses did not exist [38]
Providing False Responses to Spam Emails.
[38]http://www.iseclab.org/papers/cutwail-LEET11.pdf
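One way to poison this feedback, sketched under the assumption that the server has already classified the client (e.g., by its dialect): answer RCPT TO from suspected bots as if the mailbox did not exist, so that a botmaster who prunes "invalid" addresses removes real ones. The function name and exact reply texts are hypothetical; 550 with enhanced status code 5.1.1 is the conventional "bad destination mailbox" response.

```python
# Hypothetical policy for answering RCPT TO once the client has been classified.
USER_UNKNOWN = "550 5.1.1 User unknown"
RECIPIENT_OK = "250 2.1.5 Recipient ok"


def rcpt_reply(client_is_spambot: bool, mailbox_exists: bool) -> str:
    """Tell suspected bots every address is invalid; answer everyone else truthfully."""
    if client_is_spambot or not mailbox_exists:
        return USER_UNKNOWN
    return RECIPIENT_OK
```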
Evaluation
Environment
B@bel
1. Virtual machine zoo
2. Gateway
3. Learner => decision FSM =>
4. Decision maker
Evaluating Dialects for Classification
Run B@bel on a training set (13 legitimate clients, 91 malware samples)
Legitimate MUAs and MTAs speak dialects distinct from those of bots
Each legitimate MUA and MTA speaks its own dialect (except Outlook Express and Windows Live Mail, which share one)
91 malware samples: 48 dialects; samples that share a dialect belong to the same family
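Counting dialects, and checking that samples sharing one come from the same family, amounts to grouping samples by identical dialect. A Python sketch, assuming each sample's dialect is represented as a frozen set of generalized conversations (a hypothetical representation; the name `group_by_dialect` is also hypothetical). Family labels would then be compared within each group.

```python
from collections import defaultdict

Pair = tuple[str, str]
Dialect = frozenset[tuple[Pair, ...]]  # a set of generalized conversations


def group_by_dialect(samples: dict[str, Dialect]) -> dict[Dialect, list[str]]:
    """Group sample names whose learned dialects are identical."""
    groups: dict[Dialect, list[str]] = defaultdict(list)
    for name, dialect in samples.items():
        groups[dialect].append(name)
    return dict(groups)
```

The number of resulting groups is the dialect count; for the 91 malware samples, the slide reports 48.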
Evaluating Dialects for Spam Detection
Run B@bel
SMTP conversations for 621,919 email messages (40 days)
7,114 bot samples [4] >> bad dialects
MUAs + MTAs + webmail >> good dialects
Passive spam detection
Decision machine does not recognize the conversation >> mark as spam
Evaluating Dialects for Spam Detection
621,919 emails in total
260,074 spam, 218,675 ham, 143,170 unclassified
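The three categories account for every message, and their shares (spam, ham, remaining) follow directly from the counts above:

```latex
260{,}074 + 218{,}675 + 143{,}170 = 621{,}919

\frac{260{,}074}{621{,}919} \approx 41.8\%,\qquad
\frac{218{,}675}{621{,}919} \approx 35.2\%,\qquad
\frac{143{,}170}{621{,}919} \approx 23.0\%
```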
Verification
True positives: checked against 30 IP blacklists plus domain resolution
99.32% true positives
False negatives: 21%
(spam sent through misused webmail accounts or dedicated MTAs;
about half came from legitimate MTAs)
Limitations and Evasion
Evading dialects detection:
Use an existing SMTP engine (e.g., an open-source library, or Windows CDO, the Collaboration Data Objects library)
But spambots are built for performance
Bagle (a spambot): 20 ms per email
CDO (Windows): 200 ms per email
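A back-of-the-envelope comparison makes the performance argument concrete, assuming a single bot thread sends emails back to back at the per-email times above:

```latex
\frac{3600\ \mathrm{s/h}}{0.02\ \mathrm{s/email}} = 180{,}000\ \text{emails/h (Bagle)}
\qquad
\frac{3600\ \mathrm{s/h}}{0.2\ \mathrm{s/email}} = 18{,}000\ \text{emails/h (CDO)}
```

Under that assumption, switching to CDO would cut such a bot's sending rate by a factor of ten.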
Conclusion
Introduced a novel way to detect and mitigate spam emails
Studied how the feedback mechanism used by botnets can be poisoned
Empirical results confirm that our approach can be used to detect and mitigate spam emails.
THANKS