Yin, H., Song, D., Egele, M., Kruegel, C., Kirda, E.

In Proc. of the 14th ACM conference on Computer and communications security,

October 2007.

112/04/19 1

Outline

Introduction

Panorama System Overview

Taint Graphs

Malware Detection

Experiment Results

Introduction

Malicious software (i.e., malware) creeps into users' computers, collecting users' private information, wreaking havoc on the Internet, and causing millions of dollars in damage.

Even software provided by reputable vendors may contain code that performs undesirable actions which may violate users' privacy, e.g., Google Desktop, Sony Media Player.

Malware Detection

Signature-based detection: cannot detect new malware or new variants.

Heuristics-based detection: often based on heuristics such as monitoring modifications to the registry and the insertion of hooks into certain library or system interfaces; incurs high false positive and false negative rates.

Malware can easily evade detection.

New Approach for Malware Detection

Numerous malware categories share a similar fundamental characteristic, which lies in their malicious or suspicious information access and processing behavior.

They access, tamper with, and (in some cases) leak sensitive information that was not intended for their consumption.

Based on this observation, the authors have designed and developed an end-to-end system (Panorama) to automatically identify this fundamental trait of malicious or suspicious information access and processing.

System Overview

Components of the System

Test Engine: runs a series of automated tests against the sample (which may be benign or malicious).

Taint Engine: performs whole-system, fine-grained information flow tracking.

Taint Graph: a graph representation that depicts the system-wide information behavior.

Malware Detection Engine: detects malware among unknown samples.

Malware Analysis Engine: examines the taint graphs to provide detailed analysis information.

Design and Implementation

Hardware-level taint tracking

OS-aware taint tracking

Automated testing and taint graph generation

Hardware-Level Taint Tracking

Since the source code for commodity software such as the Windows operating system and its applications is usually not available, the authors monitor whole-system execution in a processor emulator and dynamically instrument code to keep track of how tainted data propagates during program execution.

Shadow memory: stores the taint status of each byte of physical memory, the CPU's general-purpose registers, the hard disk, and the network interface buffer.

Taint sources: Panorama supports taint input from hardware, such as the keyboard, network interface, and hard disk.

Taint propagation: monitors each CPU instruction and DMA operation that manipulates tainted data.
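The interplay of shadow memory, taint sources, and taint propagation can be illustrated with a minimal Python sketch. It is not Panorama's implementation: the class `ShadowState`, its method names, the three modeled instruction kinds, and the rule that storing clean data clears a byte's taint are all assumptions chosen for illustration. The hard disk and network buffer are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class ShadowState:
    """Hypothetical shadow state: taint labels for memory bytes and registers."""
    mem: dict = field(default_factory=dict)   # physical address -> set of labels
    regs: dict = field(default_factory=dict)  # register name -> set of labels

    def taint_input(self, addr, size, label):
        """A hardware source (e.g. the keyboard) wrote `size` bytes at `addr`."""
        for a in range(addr, addr + size):
            self.mem[a] = {label}

    def load(self, reg, addr):
        """Models `mov reg, [addr]`: the register inherits the byte's taint."""
        self.regs[reg] = set(self.mem.get(addr, set()))

    def binary_op(self, dst, src):
        """Models `add dst, src`: the result depends on both operands."""
        self.regs[dst] = self.regs.get(dst, set()) | self.regs.get(src, set())

    def store(self, addr, reg):
        """Models `mov [addr], reg`: the byte inherits the register's taint."""
        labels = self.regs.get(reg, set())
        if labels:
            self.mem[addr] = set(labels)
        else:
            # Assumed rule: overwriting with clean data clears the taint.
            self.mem.pop(addr, None)


state = ShadowState()
state.taint_input(0x1000, 1, "keyboard")  # a keystroke arrives in memory
state.load("eax", 0x1000)                 # a program reads the keystroke
state.binary_op("ebx", "eax")             # and computes with it
state.store(0x2000, "ebx")                # the derived value is written back
print(state.mem[0x2000])
```

Keeping a set of labels per byte, rather than a single tainted bit, lets the sketch record which source a value came from.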

OS-Aware Taint Tracking

Resolving process and module information

Resolving filesystem and network information: needed when tainted data is written to the hard disk or sent over the network

Identifying the code under analysis and its actions

Automated Testing and Taint Graph Generation

Automated testing: without human intervention, Panorama executes a number of test cases that mimic common tasks that a user might perform, e.g., editing text in an editor, visiting several websites, and so on.

Taint graph generation: the system-wide propagation of tainted input introduced by the test engine forms a graph over the processes/program modules and OS resources.

Taint Graph

A taint graph can be represented as g = (V, E), where:

V is a set of vertices, each representing either an operating system object (such as a process or module), an OS resource (such as a file), or a taint source (such as keyboard or network input with the appropriate labels).

E is a set of directed edges; an edge connects two vertices when tainted data is propagated from the entity corresponding to the first vertex to the entity corresponding to the second.

g.root represents the root node of graph g (i.e., the taint source).

Currently, Panorama defines the following nine different types of taint sources: text, password, HTTP, HTTPS, ICMP, FTP, document, and directory.

Taint Graph Example

1. A user process A reads the character that corresponds to a keystroke.

2. Process A later writes the character into a file F.

3. File F is then read by process B.

We can therefore establish a link from process A to file F, and subsequently from file F to process B:

text -> A -> F -> B
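The example above can be reproduced with a small Python sketch of g = (V, E). The class `TaintGraph`, its methods `add_flow` and `reachable`, and the `(kind, name)` tuples used as vertices are hypothetical choices for illustration and are not taken from Panorama.

```python
from collections import defaultdict


class TaintGraph:
    """Hypothetical taint graph g = (V, E) rooted at a taint source."""

    def __init__(self, root):
        self.root = root                 # g.root, the taint source
        self.vertices = {root}           # V
        self.edges = defaultdict(set)    # E, stored as adjacency sets

    def add_flow(self, src, dst):
        """Record that tainted data propagated from src to dst."""
        self.vertices.update((src, dst))
        self.edges[src].add(dst)

    def reachable(self):
        """Return every vertex that data from the root has reached."""
        seen, stack = set(), [self.root]
        while stack:
            vertex = stack.pop()
            for successor in self.edges.get(vertex, ()):
                if successor not in seen:
                    seen.add(successor)
                    stack.append(successor)
        return seen


text = ("source", "text")
proc_a, file_f, proc_b = ("process", "A"), ("file", "F"), ("process", "B")

g = TaintGraph(text)
g.add_flow(text, proc_a)    # 1. process A reads the keystroke
g.add_flow(proc_a, file_f)  # 2. process A writes it into file F
g.add_flow(file_f, proc_b)  # 3. process B reads file F
print(sorted(g.reachable()))
```

The traversal shows that process B handled keyboard input even though it never read the keyboard directly, which is the kind of indirect flow a per-process view would miss.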

Taint-Graph-Based Malware Detection

Anomalous information access behavior: for some information sources, a simple access performed by the samples under analysis is suspicious.

Anomalous information leakage behavior: for some other information sources, it is acceptable for the samples to access them locally, but unacceptable to leak the information to third parties.

Excessive information access behavior: for some information sources, benign samples may access some of them occasionally, while malicious samples will access them excessively to achieve their malicious intent.

Test cases and policies

The authors specify the following policies:

Text, password, FTP, UDP, and ICMP inputs cannot be accessed by the samples.

URL, HTTP, HTTPS, and document inputs cannot be leaked by the samples.

Directory inputs cannot be accessed excessively by the samples.
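One way to picture how these three policies could be checked is the Python sketch below. The source-type sets come from the policies above. Everything else is an assumption for illustration: the `Observation` record, the function `check_policies`, and in particular the value 10 for `excessive_threshold`, since the slides do not say what counts as "excessively".

```python
from dataclasses import dataclass

# Source types taken from the three policies listed above.
NO_ACCESS = {"text", "password", "FTP", "UDP", "ICMP"}
NO_LEAK = {"URL", "HTTP", "HTTPS", "document"}
NO_EXCESSIVE_ACCESS = {"directory"}


@dataclass
class Observation:
    """Hypothetical summary of what a sample did with one taint source."""
    source: str     # taint source type, e.g. "password"
    accesses: int   # how many times the sample accessed the tainted data
    leaked: bool    # whether the sample sent the data to a third party


def check_policies(observations, excessive_threshold=10):
    """Return (source, violation) pairs; the threshold is an assumed value."""
    violations = []
    for obs in observations:
        if obs.source in NO_ACCESS and obs.accesses > 0:
            violations.append((obs.source, "accessed"))
        if obs.source in NO_LEAK and obs.leaked:
            violations.append((obs.source, "leaked"))
        if obs.source in NO_EXCESSIVE_ACCESS and obs.accesses > excessive_threshold:
            violations.append((obs.source, "accessed excessively"))
    return violations


sample = [
    Observation("password", 1, False),    # any access is a violation
    Observation("HTTP", 5, False),        # local access only, allowed
    Observation("directory", 50, False),  # far above the assumed threshold
]
print(check_policies(sample))
```

In a full system the observations would be derived from the taint graphs rather than written by hand.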

Automatic Policy Generation

It is possible to generate policies automatically by using machine learning techniques.

First, a representative collection of malware and benign samples is gathered as the training set.

Based on the feature vectors for the benign and malicious samples, standard classification algorithms can be applied to determine a model.

Using this model, novel samples can then be classified. The authors plan to explore this approach further in future work.
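The train-then-classify workflow can be sketched in Python with a nearest-centroid classifier, picked here only because it is one of the simplest standard classification algorithms. The slides name neither a specific algorithm nor a feature set, so the three-element feature layout and all the numbers below are made-up toy values, not measurements.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(benign, malicious):
    """Build the model: one centroid per class."""
    return {"benign": centroid(benign), "malicious": centroid(malicious)}


def classify(model, vector):
    """Assign the label of the closest class centroid."""
    return min(model, key=lambda label: squared_distance(model[label], vector))


# Assumed feature layout: [password accesses, HTTP leaks, directory accesses].
# All values are invented for illustration only.
benign_training = [[0, 0, 1], [0, 0, 2]]
malicious_training = [[3, 1, 0], [2, 1, 1]]

model = train(benign_training, malicious_training)
print(classify(model, [4, 1, 0]))
print(classify(model, [0, 0, 1]))
```

Any other standard classifier could take the place of `train` and `classify` without changing the overall workflow.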

Malware Detection Example

This graph reflects the procedure for Windows user authentication.

While a password thief is running in the background, it captures the password and saves it to its log file “c:\ginalog.log”.

Detection results against malware and benign samples

Limitation

The taint-graph-based detection approach can only identify the information access and processing behavior of a given sample, but not its intent.

In practice, the taint graphs are invaluable for human analysts, as they help them quickly determine and understand whether an unknown sample is indeed malicious, or whether it is benign software that exhibits malware-like behavior.

Comment

Assessing a behavior as malicious or benign using only a few policies is too arbitrary; a probabilistic model may help.

Automatic policy generation is important.

False positives are an issue.
