recuperabit: forensic file system reconstruction given partially corrupted metadata

24
RecuperaBit: Forensic File System Reconstruction Given Partially Corrupted Metadata CANDIDATE Andrea Lazzarotto SUPERVISOR Ch. Prof. Riccardo Focardi Ca’ Foscari Dorsoduro 3246 30123 Venezia Università Ca’Foscari Venezia

Upload: andrea-lazzarotto

Post on 10-Feb-2017

1.099 views

Category:

Software


8 download

TRANSCRIPT

RecuperaBit: Forensic File SystemReconstruction Given PartiallyCorrupted Metadata

C A N D I D AT E Andrea LazzarottoS U P E RV I S O R Ch. Prof. Riccardo Focardi

Ca’ FoscariDorsoduro 324630123 Venezia

UniversitàCa’FoscariVenezia

F I L E S Y S T E M R E C O N S T R U C T I O N

Work:

• File system reconstruction from damaged metadata

• Detection of partition geometry

• Tests against similar software

Motivation:

• File system analysis is used in many investigations

• Carving does not provide context

• File systems may be damaged

C O N T E N T S

1. F O R E N S I C F I L E S Y S T E M A N A LY S I S

Problem definition and N T F S features

2. F I L E S Y S T E M R E C O N S T R U C T I O N A L G O R I T H M

Tree reconstruction and partition detection

3. S O F T WA R E I M P L E M E N TAT I O N

Test results

F O R E N S I C F I L E S Y S T E M

A N A LY S I S

P R O B L E M D E F I N I T I O N

Problem (Forensic File System Reconstruction). Develop an algo-rithm that reconstructs the directory structure of one or moretypes of file systems.

I N P U T

1. Bitstream copy of drive

2. File system types to search

O U T P U T

Files divided in Root and Lost Files, for each detected file system.

FILE SYSTEM STRUCTURE

5 Root

0 $MFT

1 $MFTMirr

2 $LogFile

3 $Volume

4 $AttrDef

6 $Bitmap

7 $Boot

8 $BadClus

8:$Bad $BadClus:$Bad

9:$SDS $Secure:$SDS

9 $Secure

10 $UpCase

11 $Extend

25 $ObjId

24 $Quota

26 $Reparse

66 bbb.txt

64 interesting

65 aaa.txt

−1 LostFiles

67 Dir_67

68 another

N T F S

Interesting artifacts:

• B O O T S E C T O R S → partition geometry

• M F T E N T R I E S → identifier, name, timestamps of files

• I N D E X R E C O R D S → contents of directories

C O R R U P T E D M E TA D ATA ( E X A M P L E )

Hard drive

New file system

Old file system

Boot sector

MFT MFT mirror

Backup boot sector

Result

F I L E S Y S T E M R E C O N S T R U C T I O N

A L G O R I T H M

D I S K S C A N N I N G

• The disk is S C A N N E D for artifacts (metadata carving)

• File records are C L U S T E R E D in partitions

• For N T F S: p = y− sx where s = 2

Hard drive

Sector y

Entry number x

3014 3016 3018 3020 3022 3024 3026 3028

29 30 31 32 33 102 103 104

Value of p2956 2956 2956 2956 2956 2820 2820 2820

D I R E C T O RY T R E E R E C O N S T R U C T I O N

Each node is linked to its parent (bottom-up reconstruction).

When the parent is not available, a ghostentry is created under Lost Files.

linked to parent↑

PA R T I T I O N G E O M E T RY

Needed for extracting file contents and accessing external at-tributes in N T F S (including index records).

Parameters:

• S P C (Sectors per Cluster)

• C B (Cluster Base)→ where the file system starts

I N F E R E N C E O F PA R T I T I O N G E O M E T RY

Procedure:

1. Fingerprinting index records

2. Generation of text (from disk)

3. Generation of patterns (from partitions and S P C

enumeration)

4. Matching

T E X T G E N E R AT I O N ( E X A M P L E )

The following index records are found on disk:

S E C T O R O W N E R I D

54 1462 2378 14

The resulting T E X T is:

. . . ∅ ∅ 14 ∅ ∅ ∅ ∅ ∅ ∅ ∅ 23 ∅ ∅ ∅ ∅ ∅ ∅ ∅ ∅ ∅ ∅ ∅ ∅ ∅ ∅ ∅ 1454 62 78

PAT T E R N G E N E R AT I O N ( E X A M P L E )

Given the file records:

M F T E N T R YP O I N T E R S T O

R E C O R D S ( R U N L I S T )

14 11, 1723 13

S P C = 1 → ∅ ∅ ∅ ∅ ∅ ∅ ∅ ∅ ∅ ∅ ∅ 14 ∅ 23 ∅ ∅ ∅ 140 11 13 17

S P C = 2 → . . . ∅ ∅ 14 ∅ ∅ ∅ 23 ∅ ∅ ∅ ∅ ∅ ∅ ∅ 1422 26 34

A P P R O X I M AT E S T R I N G M AT C H I N G

Each pattern is matched against the text.

The best match provides both the C B and S P C parameters.

We use an optimized version of the Baeza-Yates–Perleberg algo-rithm for approximate string matching.

S O F T WA R E I M P L E M E N TAT I O N

R E C U P E R A B I T

RecuperaBit is the software implementation of ourreconstruction algorithm:

• Modular program written in Python

• Full implementation for N T F S reconstruction

• Extensible by adding additional plug-ins

E X P E R I M E N T S

Test results:

• RecuperaBit was tested against 9 existing programs

• 4 different hard drive images were considered

• The final test involves increasing damage on one drive

F I L E S Y S T E M D E T E C T I O N

S O F T WA R E # 1 # 2 # 3 # 4

Gpart OK OK Nothing PartialTestDisk OK OK Nothing OK (+1)Autopsy OK Partial Nothing OKScrounge-NTFS OK OK Nothing OKRestorer Ultimate OK OK OK OKDMDE OK OK OK OK (+3)Recover It All Now OK Nothing Nothing OKGetDataBack OK OK Nothing OK (+1)SalvageRecovery OK OK Nothing OK (+1)RecuperaBit OK OK OK OK× 2 (+302)

D I R E C T O RY T R E E A C C U R A C Y

S O F T WA R E # 1 # 2 # 3 # 4

TestDisk Perfect Error — ErrorAutopsy Perfect No files — GoodScrounge-NTFS Partial Terrible Terrible TerribleRestorer Ultimate Perfect Partial Perfect GoodDMDE Perfect Error Perfect GoodRecover It All Now Terrible — — No filesGetDataBack Perfect Good — GoodSalvageRecovery Perfect Terrible — PerfectRecuperaBit Perfect Perfect Perfect Perfect

R E C O V E R E D F I L E C O N T E N T S

S O F T WA R E S PA R S E C O M P R E S S E D E N C R Y P T E D

TestDisk OK OK EmptyAutopsy Empty OK OKScrounge-NTFS OK Unsupported OKRestorer Ultimate OK OK OKDMDE OK OK UnsupportedRecover It All Now OK Wrong OKGetDataBack Empty OK OKSalvageRecovery Empty Wrong OKRecuperaBit OK Unsupported OK

O U T P U T Q U A L I T Y V S C O R R U P T I O N L E V E L

0% 20% 40% 60% 80% 100%

Damaged sectors

0

5000

10000

15000

19399

Num

ber

offil

es

All detected filesUnreachable from Root

C O N C L U S I O N

Contributions:

• Generic bottom-up reconstruction algorithm

• Strategy for partition geometry detection (N T F S)

Results:

• Successful reconstruction in all tested cases

• Sometimes better than commercial programs