Yin, H., Song, D., Egele, M., Kruegel, C., Kirda, E.
In Proc. of the 14th ACM Conference on Computer and Communications Security,
October 2007.
112/04/19 1
Outline
  Introduction
  Panorama System Overview
  Taint Graphs
  Malware Detection
  Experiment Results
Introduction
  Malicious software (malware) creeps into users’ computers, collects users’ private information, wreaks havoc on the Internet, and causes millions of dollars in damage.
  Even software provided by reputable vendors may contain code that performs undesirable actions that may violate users’ privacy.
    E.g., Google Desktop, Sony Media Player
Malware Detection
  Signature-based detection
    cannot detect new malware or new variants
  Heuristics-based detection
    often relies on heuristics such as monitoring modifications to the registry and the insertion of hooks into certain library or system interfaces
    incurs high false positive and false negative rates
  Malware can easily evade detection
New Approach for Malware Detection
  Numerous malware categories share a similar fundamental characteristic, which lies in their malicious or suspicious information access and processing behavior.
  They access, tamper with, and (in some cases) leak sensitive information that was not intended for their consumption.
  Based on this observation, the authors designed and developed an end-to-end system (Panorama) to automatically identify this fundamental trait of malicious or suspicious information access and processing.
System Overview
Components of the System
  Test Engine
    runs a series of automated tests on a sample (which may be benign or malicious)
  Taint Engine
    performs whole-system, fine-grained information flow tracking
  Taint Graph
    a graph representation that depicts the system-wide information behavior
  Malware Detection Engine
    detects malware among unknown samples
  Malware Analysis Engine
    examines the taint graphs for detailed analysis information
Design and Implementation
  Hardware-level Taint Tracking
  OS-Aware Taint Tracking
  Automated Testing and Taint Graph Generation
Hardware-level Taint Tracking
  Since the source code for commodity software such as the Windows operating system and applications is usually not available, they monitor the whole system execution in a processor emulator and dynamically instrument code to keep track of how tainted data propagates during program execution.
  Shadow Memory
    stores the taint status of each byte of the physical memory, the CPU’s general-purpose registers, the hard disk, and the network interface buffer
  Taint Sources from hardware
    Panorama supports taint input from hardware, such as the keyboard, network interface, and hard disk.
  Taint Propagation
    monitors each CPU instruction and DMA operation that manipulates tainted data
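The shadow-memory idea can be illustrated with a small sketch. This is not Panorama's implementation (which instruments an emulator); it only assumes a byte-granular map from addresses to taint labels and two simplified propagation rules. The class name, function names, addresses, and the "keyboard" label are all hypothetical.

```python
class ShadowMemory:
    """Byte-granular taint map: address -> set of taint-source labels."""

    def __init__(self):
        self._taint = {}  # only tainted bytes are stored

    def set_taint(self, addr, size, labels):
        for a in range(addr, addr + size):
            if labels:
                self._taint[a] = set(labels)
            else:
                self._taint.pop(a, None)  # writing clean data clears the taint

    def get_taint(self, addr, size):
        labels = set()
        for a in range(addr, addr + size):
            labels |= self._taint.get(a, set())
        return labels


def propagate_mov(shadow, dst, src, size):
    """Copy: each destination byte inherits the taint of its source byte."""
    for i in range(size):
        shadow.set_taint(dst + i, 1, shadow.get_taint(src + i, 1))


def propagate_binop(shadow, dst, src1, src2, size):
    """Arithmetic/logic: the result carries the union of both operands' taint."""
    labels = shadow.get_taint(src1, size) | shadow.get_taint(src2, size)
    shadow.set_taint(dst, size, labels)


shadow = ShadowMemory()
shadow.set_taint(0x1000, 1, {"keyboard"})            # a keystroke byte arrives
propagate_mov(shadow, 0x2000, 0x1000, 1)             # copied to another buffer
propagate_binop(shadow, 0x3000, 0x2000, 0x4000, 1)   # combined with clean data
propagate_mov(shadow, 0x2000, 0x4000, 1)             # buffer overwritten with clean data
```

After these four steps the byte at 0x3000 still carries the keyboard label, while the overwritten buffer at 0x2000 is clean again. A real tracker would apply such rules per CPU instruction and DMA transfer, and would also cover registers, disk, and network buffers.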
OS-Aware Taint Tracking
  Resolving process and module information
  Resolving filesystem and network information
    when tainted data is written to the hard disk or sent over the network
  Identifying the code under analysis and its actions
Automated Testing and Taint Graph Generation
  Automated Testing
    Without human intervention, Panorama executes a number of test cases that mimic common tasks that a user might perform.
    E.g., editing text in an editor, visiting several websites, and so on
  Taint Graph Generation
    The system-wide propagation of tainted input introduced by the test engine forms a graph over the processes/program modules and OS resources.
Taint Graph
  A taint graph can be represented as g = (V, E), where
    V is a set of vertices, each of which represents an operating system object (such as a process or module), an OS resource (such as a file), or a taint source (such as keyboard or network input with the appropriate labels)
    E is a set of directed edges; an edge connects two vertices when tainted data propagates from the entity that corresponds to the first vertex to the entity that corresponds to the second
    g.root represents the root node of graph g (i.e., the taint source)
  Currently, Panorama defines the following nine types of taint sources: text, password, HTTP, HTTPS, ICMP, FTP, UDP, document, and directory
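As a rough illustration of the definition g = (V, E) with a root, the structure could be modelled as below. This is a minimal sketch rather than Panorama's data model; the `Vertex` and `TaintGraph` classes, the `reachable` helper, and the example names (`editor.exe`, `notes.txt`) are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Vertex:
    kind: str  # e.g. "source", "process", "module", "file"
    name: str


@dataclass
class TaintGraph:
    root: Vertex                                  # g.root: the taint source
    vertices: set = field(default_factory=set)    # V
    edges: set = field(default_factory=set)       # E: (from_vertex, to_vertex) pairs

    def __post_init__(self):
        self.vertices.add(self.root)

    def add_edge(self, src, dst):
        """Record that tainted data propagated from src to dst."""
        self.vertices.update((src, dst))
        self.edges.add((src, dst))

    def reachable(self):
        """Return every vertex that tainted data from the root has reached."""
        seen, stack = {self.root}, [self.root]
        while stack:
            current = stack.pop()
            for src, dst in self.edges:
                if src == current and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen


text = Vertex("source", "text")
proc = Vertex("process", "editor.exe")   # hypothetical process name
log = Vertex("file", "notes.txt")        # hypothetical file name
g = TaintGraph(root=text)
g.add_edge(text, proc)
g.add_edge(proc, log)
```

Keeping one graph per taint source, as the root definition suggests, makes it easy to ask which processes and files a given input eventually reached.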
Taint Graph Example
  1. A user process A reads the character that corresponds to the keystroke.
  2. This process later writes the character into a file F.
  3. File F is then read by process B.
  We can then establish a link from process A to file F, and subsequently from file F to process B.
  text -> A -> F -> B
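The three steps of this example can be replayed as a short sketch that turns observed events into graph edges. The event format and helper names are hypothetical, and the sketch assumes every listed event carries tainted data and that the result is a simple acyclic chain; a real tracker would first check the taint status of the data involved.

```python
from collections import defaultdict

# Hypothetical event log: (actor, operation, object), in the order observed.
events = [
    ("A", "read", "text"),   # 1. process A reads the tainted keystroke
    ("A", "write", "F"),     # 2. process A writes the character to file F
    ("B", "read", "F"),      # 3. process B reads file F
]


def build_edges(events):
    """A read moves taint object -> actor; a write moves it actor -> object."""
    edges = defaultdict(set)
    for actor, op, obj in events:
        if op == "read":
            edges[obj].add(actor)
        elif op == "write":
            edges[actor].add(obj)
    return edges


def path_from(edges, start):
    """Follow the chain of edges from start (assumes an acyclic chain)."""
    path = [start]
    while edges.get(path[-1]):
        path.append(sorted(edges[path[-1]])[0])
    return path


edges = build_edges(events)
chain = " -> ".join(path_from(edges, "text"))
print(chain)
```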
Taint-Graph-Based Malware Detection
  Anomalous information access behavior
    For some information sources, a simple access performed by the samples under analysis is suspicious.
  Anomalous information leakage behavior
    For some other information sources, it is acceptable for the samples to access them locally, but unacceptable to leak the information to third parties.
  Excessive information access behavior
    For some information sources, benign samples may access them occasionally, while malicious samples access them excessively to achieve their malicious intent.
Test cases and policies
  They specify the following policies:
    text, password, FTP, UDP and ICMP inputs cannot be accessed by the samples
    URL, HTTP, HTTPS and document inputs cannot be leaked by the samples
    directory inputs cannot be accessed excessively by the samples
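One way to picture how these three policies could be checked is sketched below. The observation format, the function name, and especially the numeric threshold are assumptions: the slides do not say how "excessively" is quantified, so `EXCESSIVE_THRESHOLD` is an arbitrary placeholder.

```python
NO_ACCESS = {"text", "password", "FTP", "UDP", "ICMP"}
NO_LEAK = {"URL", "HTTP", "HTTPS", "document"}
NO_EXCESSIVE_ACCESS = {"directory"}
EXCESSIVE_THRESHOLD = 10  # assumption: the source does not quantify "excessively"


def violations(observations):
    """observations: {source: {"accesses": int, "leaked": bool}} for one sample."""
    found = []
    for source, obs in observations.items():
        if source in NO_ACCESS and obs["accesses"] > 0:
            found.append((source, "anomalous access"))
        if source in NO_LEAK and obs["leaked"]:
            found.append((source, "anomalous leakage"))
        if source in NO_EXCESSIVE_ACCESS and obs["accesses"] > EXCESSIVE_THRESHOLD:
            found.append((source, "excessive access"))
    return found


# Made-up observations for two hypothetical samples.
keylogger_like = {
    "text": {"accesses": 3, "leaked": False},
    "password": {"accesses": 1, "leaked": False},
}
browser_like = {
    "HTTP": {"accesses": 5, "leaked": False},
    "directory": {"accesses": 2, "leaked": False},
}
```

Here `violations(keylogger_like)` reports an anomalous access for both text and password, while `violations(browser_like)` returns an empty list. A sample would be flagged as soon as any policy is violated.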
Automatic Policy Generation
  It is possible to automatically generate policies by using machine learning techniques.
  First, they can gather a representative collection of malware and benign samples as a training set.
  Based on the feature vectors for the benign and malicious samples, standard classification algorithms can be applied to determine a model.
  Using this model, novel samples can then be classified.
  The authors plan to explore this approach further in future work.
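The train-then-classify idea can be sketched with a nearest-centroid classifier, used here only as a stand-in for the unspecified "standard classification algorithms". The feature layout and all training vectors are invented for illustration; the slides do not describe which features would be extracted.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train(benign, malicious):
    """The 'model' is simply one centroid per class."""
    return {"benign": centroid(benign), "malicious": centroid(malicious)}


def classify(model, vector):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(model[label], vector))


# Invented features: [accessed password?, leaked HTTP data?, directory accesses]
benign = [[0, 0, 1], [0, 0, 2], [0, 0, 0]]
malicious = [[1, 0, 0], [1, 1, 8], [0, 1, 12]]
model = train(benign, malicious)

label_a = classify(model, [1, 1, 9])   # closer to the malicious centroid
label_b = classify(model, [0, 0, 1])   # coincides with the benign centroid
```

In practice the features would need scaling and a much larger training set; the sketch only shows the shape of the workflow.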
Malware Detection Example
This graph reflects the procedure for Windows user authentication.
While a password thief is running in the background, it catches the password and saves it to its log file “c:\ginalog.log”.
Detection results against malware and benign samples
Limitation
  The taint-graph-based detection approach can only identify the information access and processing behavior of a given sample, but not its intent.
  In practice, the taint graphs are invaluable for human analysts, as they help them quickly determine whether an unknown sample is indeed malicious or is benign software that exhibits malware-like behavior.
Comment
  It is too arbitrary to assess a behavior as malicious or benign using only a few policies.
    A probabilistic model may help
  Automatic policy generation is important
    False positive issues