Extracting and Parsing the MFT

14 hours · Module 1 · Free

Operational Objective

The previous six subsections taught you what the MFT contains at the binary level: record headers, attributes, timestamps, data storage, directory indexes, and sequence numbers. This subsection bridges the gap between that binary knowledge and practical forensic work: how to extract the MFT from a live system or forensic image, how to parse it with MFTECmd into analysis-ready output, how to verify that the extraction captured the complete MFT without corruption, and how to validate the parser's output against the raw data for critical records. The extraction method matters because an incomplete or corrupted MFT extraction produces incomplete or incorrect forensic results, and the examiner may not realize data is missing unless they verify the extraction. The parsing configuration matters because MFTECmd's default output can leave out values that are critical for timestomping detection and timeline accuracy (for example, $FN timestamps are left blank by default when they match the $SI values). This subsection establishes the extraction-to-analysis workflow that you will use throughout the remainder of this course.

Deliverable: A validated MFT extraction and parsing workflow: KAPE collection targeting the MFT, MFTECmd parsing with optimal flags for forensic analysis, output verification (record count, entry range, hash integrity), and a method for validating parser output against raw hex for critical evidence files.

Estimated completion: 40 minutes

Figure WF1.7 — The four-step MFT extraction and parsing workflow. Collection with hash verification, structural validation of the raw MFT file, parsing with MFTECmd including $FN timestamps and deleted entries, and analysis in Timeline Explorer with raw hex verification for critical evidence files.

Extraction methods

The MFT is a file — specifically, it is the file represented by MFT entry 0, stored at a location defined in the NTFS boot sector. Unlike regular files, the MFT cannot be copied through normal file operations while Windows is running because NTFS holds an exclusive lock on it. Extracting the MFT requires a tool that can read locked files or access the raw disk.

KAPE (Kroll Artifact Parser and Extractor) is the standard collection tool for DFIR triage. KAPE's $MFT target extracts the Master File Table using a raw disk read that bypasses the NTFS lock. The command kape.exe --tsource C: --tdest D:\Evidence\%m --target $MFT collects the MFT from the C: volume. KAPE handles the raw read, hashing, and evidence logging automatically. The output is a file named $MFT in the target destination directory.

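FTK Imager can extract the MFT from a live system or a forensic image. On a live system: add the volume as an evidence item (File → Add Evidence Item → Logical Drive), browse to the volume's [root] folder, and export the $MFT file. Note that File → Obtain Protected Files collects registry hives, not the MFT. From a forensic image: open the image as an evidence item, navigate to the root of the NTFS volume, and export the $MFT file. Record a hash of the exported file at export time, for example with FTK Imager's Export File Hash List option.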

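Raw disk access tools (dd on Linux, or a hex editor against an image mounted with Arsenal Image Mounter) can extract the MFT by reading the appropriate clusters. The MFT's starting cluster is defined in the NTFS boot sector at offset 0x30 (8 bytes, little-endian, the logical cluster number of the $MFT file). To find the byte offset, multiply the starting cluster by the cluster size, which is bytes per sector (offset 0x0B, 2 bytes) times sectors per cluster (offset 0x0D, 1 byte). That offset is relative to the start of the NTFS volume, so add the partition's starting offset when working against a full-disk image. Then read from that offset for the file's total size (defined in the $DATA attribute of MFT entry 0). A single contiguous read is only correct when the MFT is not fragmented; if entry 0's $DATA attribute lists more than one data run, read each run in order.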
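The offset arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not a collection tool: the function name mft_byte_offset is hypothetical, and the sketch assumes the common case where the sectors-per-cluster byte is a plain count (values above 0x80 use a different encoding that is not handled here).

```python
import struct

def mft_byte_offset(boot_sector: bytes) -> int:
    """Return the byte offset of the $MFT from the start of the NTFS volume."""
    # The OEM ID at offset 3 identifies an NTFS boot sector.
    if boot_sector[3:11] != b"NTFS    ":
        raise ValueError("not an NTFS boot sector")
    bytes_per_sector = struct.unpack_from("<H", boot_sector, 0x0B)[0]
    sectors_per_cluster = boot_sector[0x0D]
    if sectors_per_cluster > 0x80:
        # Assumption: large-cluster encoding is out of scope for this sketch.
        raise ValueError("large-cluster encoding not handled in this sketch")
    # 8-byte little-endian logical cluster number of the $MFT.
    mft_start_cluster = struct.unpack_from("<Q", boot_sector, 0x30)[0]
    return mft_start_cluster * bytes_per_sector * sectors_per_cluster
```

To use it, read the first 512 bytes of the volume (for example, from a raw volume image opened in binary mode) and pass them to mft_byte_offset. The result is where entry 0 begins, provided the MFT's first data run starts at that cluster.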

For forensic images (E01, raw/dd), mount the image read-only with Arsenal Image Mounter and then use KAPE or FTK Imager to extract the MFT from the mounted volume. The image's write-protection ensures the extraction doesn't modify the evidence.

Compliance Myth: "Extracting the MFT from a live system modifies the evidence — you should only work with disk images"

Extracting the MFT from a live system using KAPE or FTK Imager is a read-only operation on the MFT itself — the tool reads the raw MFT data without modifying it. However, the act of running any tool on a live system causes incidental modifications: the tool's executable is loaded (Prefetch entry created or updated), the tool reads the MFT (potentially updating the MFT's own accessed timestamp), and the output file is written to a destination volume (modifying that volume's MFT and USN Journal). These incidental modifications are expected and documented as part of the collection methodology. They do not invalidate the collected MFT data. Best practice: document the tool used, the version, the command executed, and the timestamp of the collection. Hash the output immediately. These steps establish the integrity of the collected artifact and account for the incidental modifications. Full disk imaging is preferred when time permits, but live MFT extraction is a valid and commonly accepted collection method for triage and rapid response.

Verifying the extraction

Before parsing, verify the extracted MFT is complete and uncorrupted. Four checks:

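Hash verification. Compare the hash of the extracted file against the hash recorded in the collection log, using the same algorithm the log uses. KAPE records a hash for each copied file automatically; its copy log typically uses SHA-1, so check the column header before comparing it against a SHA256 value. If the hashes don't match, the file was altered or corrupted after collection, so re-extract from the source.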

Size verification. The MFT file size must be an exact multiple of the MFT record size, which is 1,024 bytes on almost all volumes (the boot sector value at offset 0x40 defines it, and some 4K-native volumes use 4,096). If the file size is not a multiple of the record size, the extraction was truncated or padded. Calculate the expected entry count: file size ÷ 1,024. This gives you the total number of MFT entries including reserved entries (0–23) and all user file entries.

Signature spot-check. Open the raw MFT in HxD and navigate to offset 0x000 (entry 0). The first four bytes should be 46 49 4C 45 ("FILE"). Navigate to offset 0x400 (entry 1) — same signature. Navigate to the last entry (file size - 1,024) — same signature. If any entry lacks the "FILE" signature, the MFT is damaged at that location.

Known entry verification. Entry 0 should be the $MFT file itself. Entry 5 should be the root directory. Entry 2 should be $LogFile. Verify by reading the $FILE_NAME attribute in each entry and confirming the expected system filename. If these known entries contain unexpected data, the extraction offset was wrong — the file may start from an incorrect position.
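The four checks above can be scripted so they run the same way on every case. The following is a minimal sketch under stated assumptions: all function names are hypothetical, the record size is taken to be 1,024 bytes, and the $FILE_NAME lookup returns only the first name it finds in a record.

```python
import hashlib
import os
import struct

RECORD_SIZE = 1024  # assumption: confirm against the boot sector for the volume

def file_digest(path, algorithm="sha256"):
    """Hash the file in chunks so a large $MFT is never held fully in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def entry_count(path, record_size=RECORD_SIZE):
    """Return the number of records, or raise if the size is not a whole multiple."""
    entries, remainder = divmod(os.path.getsize(path), record_size)
    if remainder:
        raise ValueError(f"{remainder} stray bytes: extraction truncated or padded")
    return entries

def bad_signatures(path, entries, record_size=RECORD_SIZE):
    """Return the entry numbers whose record does not start with b'FILE'."""
    bad = []
    with open(path, "rb") as mft:
        for entry in entries:
            mft.seek(entry * record_size)
            if mft.read(4) != b"FILE":
                bad.append(entry)
    return bad

def first_file_name(record):
    """Return the name in the record's first $FILE_NAME (0x30) attribute, or None."""
    rec = bytearray(record)
    # Undo the fixups: the real last two bytes of each 512-byte sector
    # are stored in the update sequence array.
    usa_offset, usa_count = struct.unpack_from("<HH", rec, 0x04)
    for i in range(1, usa_count):
        rec[i * 512 - 2:i * 512] = rec[usa_offset + 2 * i:usa_offset + 2 * i + 2]
    offset = struct.unpack_from("<H", rec, 0x14)[0]  # offset of the first attribute
    while offset + 0x18 <= len(rec):
        attr_type, attr_len = struct.unpack_from("<II", rec, offset)
        if attr_type == 0xFFFFFFFF or attr_len == 0:
            return None
        if attr_type == 0x30:
            content = offset + struct.unpack_from("<H", rec, offset + 0x14)[0]
            chars = rec[content + 0x40]  # name length in UTF-16 characters
            name = bytes(rec[content + 0x42:content + 0x42 + 2 * chars])
            return name.decode("utf-16-le")
        offset += attr_len
    return None
```

A typical run would call file_digest with the algorithm used by the collection log, call entry_count to get the total, pass entries 0, 1, and the last entry to bad_signatures, and then read the records for entries 0, 2, and 5 and confirm that first_file_name returns $MFT, $LogFile, and "." respectively.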

Parsing with MFTECmd

MFTECmd is Eric Zimmerman's MFT parser — the standard tool for converting raw MFT data into analysis-ready CSV output. The basic command:

MFTECmd.exe -f "D:\Evidence\C\$MFT" --csv "D:\Analysis" --csvf "mft_output.csv"

This produces a CSV file with one row per MFT entry (more accurately, one row per reported $FILE_NAME attribute: entries with multiple $FN attributes, such as hard links or 8.3 short names when those are included in the output, produce multiple rows). The key output columns include:

EntryNumber: the MFT entry number (0-indexed).
SequenceNumber: the current sequence number for the entry.
InUse: whether the entry is currently allocated (True) or freed (False).
ParentEntryNumber and ParentSequenceNumber: the parent directory reference from $FN.
ParentPath: the resolved path to the parent directory.
FileName: the filename from $FN.
NameType: the $FN namespace (Win32, DOS, Win32+DOS, POSIX).

Created0x10: $SI Created timestamp.
LastModified0x10: $SI Modified timestamp.
LastRecordChange0x10: $SI Entry Modified timestamp.
LastAccess0x10: $SI Accessed timestamp.

Created0x30: $FN Created timestamp.
LastModified0x30: $FN Modified timestamp.
LastRecordChange0x30: $FN Entry Modified timestamp.
LastAccess0x30: $FN Accessed timestamp.

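The "0x10" columns are $STANDARD_INFORMATION timestamps (the ones attackers modify when timestomping). The "0x30" columns are $FILE_NAME timestamps (set by the NTFS driver in the kernel and rarely modified by attackers). Both sets are essential for timestomping detection. By default MFTECmd writes a 0x30 timestamp only when it differs from the matching 0x10 value, so a blank 0x30 cell means "same as $SI"; add the --at switch when you want every $FN timestamp written out explicitly.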

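Additional useful flags: --de dumps full details for a single record, given as an entry number or an entry-sequence pair (for example, --de 5 or --de 624-5). --rs recovers slack space from FILE records. --body generates a bodyfile for timeline creation with tools like mactime and requires --bdl, which sets the drive letter used for paths in the bodyfile output (for example, --bdl C).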

Decision point

You are parsing a large MFT (420,000+ entries) from an enterprise workstation. MFTECmd completes without errors and produces a CSV with 418,723 rows. The raw MFT file is 432,013,312 bytes. 432,013,312 ÷ 1,024 = 421,888 entries. The parser output has 3,165 fewer rows than expected entries.

Your options:

(A) The discrepancy is normal: some MFT entries are reserved system entries that MFTECmd excludes from output.

(B) The discrepancy may indicate MFTECmd skipped entries with damaged or unusual attribute structures. Spot-check a sample of the missing entries by calculating their offsets in the raw MFT and examining them in HxD. If they have valid "FILE" signatures and parseable attributes, the parser may have a bug with those specific records, so manually examine them and document the parser limitation.

The correct approach is B, with a caveat. Some discrepancy is expected: MFTECmd may fold extension records (entries that only hold overflow attributes for another entry) into the base entry's row, and some freed entries with completely zeroed content may be omitted. But a gap of 3,165 entries warrants investigation. Because one entry can produce several rows, work from distinct EntryNumber values, not the raw row count. Calculate the entry numbers of the missing records by comparing the EntryNumber column in the CSV against the expected range (0 to 421,887). For a random sample of missing entries, navigate to their raw offset (entry number × 1,024) and check the signature. If they have valid "FILE" signatures and non-zero content, the parser skipped them, so examine them manually and assess whether the skipped entries are forensically relevant to your investigation.
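Finding the missing entry numbers by hand is tedious at this scale, so the comparison is worth scripting. The sketch below is one way to do it, assuming a 1,024-byte record size and the EntryNumber column described earlier; the function name missing_entries is hypothetical.

```python
import csv

def missing_entries(csv_file, mft_size, record_size=1024):
    """Return (entry_number, raw_offset) pairs for entries absent from the CSV.

    csv_file is an open text file (or any iterable of CSV lines) with a
    header row that contains an EntryNumber column.
    """
    # A set collapses entries that appear on more than one row.
    seen = {int(row["EntryNumber"]) for row in csv.DictReader(csv_file)}
    expected = mft_size // record_size
    return [(n, n * record_size) for n in range(expected) if n not in seen]
```

Open the MFTECmd CSV with open(path, newline="", encoding="utf-8-sig") so that a byte-order mark, if present, does not corrupt the first column name. Each returned offset can be pasted directly into HxD's Go To dialog for the signature spot-check.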

Loading output in Timeline Explorer

Timeline Explorer is Eric Zimmerman's CSV analysis tool designed for forensic data. Load the MFTECmd output CSV: File → Open → select the CSV. Timeline Explorer handles large files (400,000+ rows) and provides filtering, sorting, conditional formatting, and column management.

Essential first actions when loading MFT output:

Sort by InUse to separate active and deleted entries. Deleted entries (InUse = False) are your recovery candidates. Active entries (InUse = True) represent the current filesystem state.

Add conditional formatting for timestamp comparisons. Highlight rows where Created0x10 (SI Created) differs significantly from Created0x30 (FN Created) — these are timestomping candidates. Timeline Explorer supports conditional formatting rules that can flag these discrepancies visually.

Filter by ParentPath to focus on directories of interest. In the insider threat scenario, filter for the suspect's profile directory, the staging directories, and the USB volume paths. In the ransomware scenario, filter for temp directories, ProgramData, and the directories where the ransomware was deployed.

Save your column layout. Timeline Explorer allows you to save column configurations. Create a layout that includes both $SI and $FN timestamps, the entry number, sequence number, InUse status, filename, parent path, file size, and flags. Save this layout for reuse across investigations.
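The Created0x10 versus Created0x30 comparison described above can also be run outside Timeline Explorer as a cross-check. This is a minimal sketch, not a detection rule: the function names and the one-second threshold are illustrative assumptions, the timestamp format is assumed to be YYYY-MM-DD HH:MM:SS followed by up to seven fractional digits, and rows with a blank $FN value are skipped on the assumption that blank means no separate value was reported.

```python
import csv
from datetime import datetime

def parse_timestamp(text):
    """Return (datetime, ticks) where ticks are 100 ns units, or None if blank."""
    text = (text or "").strip()
    if not text:
        return None
    main, _, fraction = text.partition(".")
    whole = datetime.strptime(main, "%Y-%m-%d %H:%M:%S")
    # Keep all seven fractional digits; datetime alone stops at microseconds.
    ticks = int((fraction + "0000000")[:7])
    return whole, ticks

def created_gap_seconds(row):
    """Return $SI Created minus $FN Created in seconds, or None if either is blank."""
    si = parse_timestamp(row.get("Created0x10"))
    fn = parse_timestamp(row.get("Created0x30"))
    if si is None or fn is None:
        return None
    return (si[0] - fn[0]).total_seconds() + (si[1] - fn[1]) / 10_000_000

def timestomp_candidates(csv_file, threshold_seconds=1.0):
    """Return (EntryNumber, FileName, gap) for rows whose gap exceeds the threshold."""
    hits = []
    for row in csv.DictReader(csv_file):
        gap = created_gap_seconds(row)
        if gap is not None and abs(gap) > threshold_seconds:
            hits.append((row["EntryNumber"], row["FileName"], gap))
    return hits
```

A negative gap means the $SI Created value is earlier than the $FN Created value. Treat every hit as a candidate for raw verification, not as a finding, since legitimate file operations can also produce differences between the two sets.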

Try it: Complete MFT extraction and parsing workflow

Using the lab evidence provided with this module (or your own lab VM):

1. Extract: If using a live VM, run KAPE with the $MFT target. If using a forensic image, mount it with Arsenal Image Mounter and extract the $MFT with FTK Imager.
2. Verify: Check the file size is a multiple of 1,024. Calculate the entry count. Open in HxD and verify the "FILE" signature at offsets 0x000, 0x400, and the last entry. Record the SHA256 hash.
3. Parse: Run MFTECmd with --csv and --csvf flags. Note the record count reported by MFTECmd. Compare against your calculated entry count.
4. Load: Open the CSV in Timeline Explorer. Sort by EntryNumber. Verify entry 0 is $MFT, entry 5 is the root directory.
5. Validate: Pick three files you know exist on the system (any documents, executables, or system files). Find them in the MFTECmd output. Note the Created0x10 and Created0x30 timestamps. Open the raw MFT in HxD, navigate to the entry offset (EntryNumber × 1,024), locate the $SI and $FN attributes, and manually extract the timestamps. Compare to MFTECmd's output. They should match.

If all five steps complete successfully, your extraction and parsing workflow is validated. Document the hash, entry count, and validation results — this documentation becomes part of your methodology evidence for the investigation.
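For step 5, a short script can serve as a second opinion on your manual hex reading. The sketch below decodes the Created timestamps from the first $SI (0x10) and $FN (0x30) attributes of one raw record. The function names are hypothetical, and the output format (seven fractional digits, no time zone suffix) is an assumption chosen to make comparison against the CSV easy.

```python
import struct
from datetime import datetime, timedelta

def filetime_to_text(filetime):
    """Convert a FILETIME (100 ns ticks since 1601-01-01 UTC) to text."""
    seconds, ticks = divmod(filetime, 10_000_000)
    stamp = datetime(1601, 1, 1) + timedelta(seconds=seconds)
    return f"{stamp:%Y-%m-%d %H:%M:%S}.{ticks:07d}"

def created_timestamps(record):
    """Return the Created values from the first $SI and $FN attributes."""
    rec = bytearray(record)
    # Undo the fixups stored in the update sequence array.
    usa_offset, usa_count = struct.unpack_from("<HH", rec, 0x04)
    for i in range(1, usa_count):
        rec[i * 512 - 2:i * 512] = rec[usa_offset + 2 * i:usa_offset + 2 * i + 2]
    found = {}
    offset = struct.unpack_from("<H", rec, 0x14)[0]  # offset of the first attribute
    while offset + 0x18 <= len(rec):
        attr_type, attr_len = struct.unpack_from("<II", rec, offset)
        if attr_type == 0xFFFFFFFF or attr_len == 0:
            break
        if attr_type in (0x10, 0x30):
            # Both attributes are always resident; content offset is at +0x14.
            content = offset + struct.unpack_from("<H", rec, offset + 0x14)[0]
            if attr_type == 0x10 and "Created0x10" not in found:
                # $SI: Created is the first 8 bytes of the content.
                value = struct.unpack_from("<Q", rec, content)[0]
                found["Created0x10"] = filetime_to_text(value)
            elif attr_type == 0x30 and "Created0x30" not in found:
                # $FN: Created follows the 8-byte parent reference.
                value = struct.unpack_from("<Q", rec, content + 0x08)[0]
                found["Created0x30"] = filetime_to_text(value)
        offset += attr_len
    return found
```

To use it, seek to EntryNumber × 1,024 in the raw MFT, read 1,024 bytes, and pass them to created_timestamps. If the script, your hex reading, and MFTECmd all agree, the validation for that file is complete.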

When MFTECmd output differs from raw data

MFTECmd is highly accurate — across hundreds of thousands of MFT records, the parser correctly interprets the vast majority of entries. But edge cases exist, and recognizing them is the purpose of the raw-first methodology taught in this course.

Path resolution errors. When a parent directory's MFT entry has been reallocated (sequence number mismatch), MFTECmd may resolve the path to the current directory at that entry rather than the original directory. The resolved path is wrong, but there's no indication in the CSV that the resolution failed. Verify by checking that the ParentSequenceNumber in the output matches the current sequence number of the parent entry.
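The sequence number comparison can be applied to the whole CSV at once. This is a minimal sketch assuming the column names listed earlier in this subsection; the function name stale_parent_references is hypothetical.

```python
def stale_parent_references(rows):
    """Return (EntryNumber, FileName, ParentEntryNumber) for rows whose
    ParentSequenceNumber differs from the parent's current SequenceNumber."""
    rows = list(rows)
    # Map each entry number to the sequence number currently in its record.
    current = {int(r["EntryNumber"]): int(r["SequenceNumber"]) for r in rows}
    flagged = []
    for r in rows:
        if not r["ParentEntryNumber"] or not r["ParentSequenceNumber"]:
            continue  # no parent reference reported for this row
        parent = int(r["ParentEntryNumber"])
        if parent in current and current[parent] != int(r["ParentSequenceNumber"]):
            flagged.append((int(r["EntryNumber"]), r["FileName"], parent))
    return flagged
```

Pass it the rows from csv.DictReader. A flagged row means the ParentPath shown for that file deserves manual review. It is not proof of reallocation on its own, because a parent entry that was freed but not reused also carries a sequence number that no longer matches the reference held by its former children.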

Timestamp precision. MFTECmd outputs timestamps with seven fractional digits, which matches the 100-nanosecond resolution of NTFS timestamps. Excel and some CSV viewers truncate or round the fractional seconds when loading the CSV, losing the precision needed for timestomping detection. Always use Timeline Explorer (which preserves full precision) rather than Excel for timestamp analysis.

Resident data omission. MFTECmd's default CSV output does not include the content of resident $DATA attributes. The data exists in the raw MFT but is not exported as part of the CSV. To recover resident file content, use MFTECmd's --dr switch (which dumps resident files to a Resident subdirectory of the output directory), inspect a single record with --de, or extract the content manually from the raw MFT at the calculated offset.

ADS handling. MFTECmd reports Alternate Data Streams as separate rows with the same entry number. The ADS name appears in the output, but the ADS content (like Zone.Identifier data) may not be fully extracted in all output modes. For ADS content extraction, manual analysis of the raw MFT record may be necessary.

You extract the MFT from a forensic image and the file size is 268,435,456 bytes. You run MFTECmd and it reports processing 262,144 entries. You calculate 268,435,456 ÷ 1,024 = 262,144 entries. The counts match exactly. You load the CSV in Timeline Explorer and find 267,891 rows. What explains the difference between 262,144 entries and 267,891 rows?

MFTECmd is duplicating some entries — the raw MFT only has 262,144 records so the CSV should have exactly 262,144 rows. The extra 5,747 rows are parser errors.

MFT entries with multiple $FILE_NAME attributes (one Win32 long name and one DOS 8.3 short name) produce multiple rows in MFTECmd output when short names are included in the output: one row per $FN attribute. The 5,747 additional rows represent files that have both a long name and a short name, each output as a separate row with the same EntryNumber but different FileName and namespace values. This is expected behavior, not an error. When counting files for your report, use distinct EntryNumber values rather than total row count.

The extra rows are ADS entries — 5,747 files on the volume have Alternate Data Streams that each produce an additional row.

The MFT was growing during extraction — NTFS added 5,747 new entries between when the extraction started and when it completed, and MFTECmd captured the additional entries.

Operational Artifact — MFT Extraction and Parsing Standard Operating Procedure

This subsection provides the complete MFT extraction, verification, and parsing workflow. Use this as your standard operating procedure for every investigation that involves MFT analysis. The workflow has four steps: collect (KAPE or FTK Imager with hash), verify (size check, signature spot-check, known entry verification), parse (MFTECmd with --csv including both $SI and $FN timestamps), and load (Timeline Explorer with forensic column layout). Document each step in your case notes: extraction tool and version, source hash, destination hash, entry count, MFTECmd version, and the raw hex validation results for key evidence files. This documentation establishes the reliability of your MFT analysis and demonstrates that you verified the extraction and parser accuracy — a methodology standard that will be expected by opposing counsel, regulators, and insurance assessors.
