Tool Validation — When EZ Tools Gets It Wrong
Figure WF0.8 — Four categories of parser errors. Structure errors produce wrong values. Encoding errors produce garbled text. Version errors produce wrong interpretations. Boundary errors produce skipped or corrupted records. Each category has a different detection method — raw hex comparison, visual inspection, OS version verification, or record count comparison.
Why good tools produce bad output
A forensic parser is a software implementation of a specification. The specification (NTFS on-disk format, Prefetch file format, EVTX record format) defines how the binary data should be structured. The parser reads the binary data and interprets it according to the specification. When the data conforms to the specification — which is the overwhelming majority of the time — the parser produces correct output.
Errors occur when the data deviates from what the parser expects. This happens in four scenarios: the data is structured in a valid but uncommon way that the parser doesn't handle (structure errors), the data uses a character encoding or data representation that the parser misinterprets (encoding errors), the data is from an OS version with a different format than the parser assumes (version errors), or the data is corrupted, truncated, or deliberately malformed (boundary errors).
The critical insight is that parser errors are silent. The tool does not flag the output as uncertain or potentially incorrect. The CSV row containing an incorrect timestamp looks identical to the CSV row containing a correct timestamp. The only detection mechanism is the examiner's knowledge — knowing when to validate, how to validate, and what a correct value looks like.
Structure misinterpretation: attribute ordering
The most common structure error in MFT parsing involves attribute ordering. The NTFS specification does not mandate a specific order for attributes within an MFT record. The common order is $STANDARD_INFORMATION, $FILE_NAME (long name), $FILE_NAME (short name), $SECURITY_DESCRIPTOR or $SECURITY reference, $DATA — and most parsers assume this ordering. But NTFS does not enforce it. After certain operations — disk repair with chkdsk, file system migration, backup/restore sequences, and some third-party backup tools — the attribute order within an MFT record can change.
When a parser assumes that the first $FILE_NAME attribute is the long filename and extracts it as the file's display name, it works correctly when the long name comes first. When the short name comes first (a valid but uncommon arrangement), the parser reports the 8.3 filename as the file's name. The file path in the CSV output shows CONFID~1.DOC instead of Confidential_Board_Report_Q3.docx. The examiner who trusts the tool output reports the wrong filename.
The validation is straightforward: when a filename looks like an 8.3 short name in tool output, check the raw MFT record for additional $FILE_NAME attributes. If multiple $FN attributes exist, identify the one with the WIN32 or WIN32_DOS namespace flag — that is the long filename. MFTECmd generally handles this correctly in current versions, but edge cases persist in unusual MFT configurations, and the examiner should know the underlying issue.
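As a sketch of what that raw check automates, the following Python walks the attribute list of a single raw MFT record and prints every $FILE_NAME attribute with its namespace. This is a minimal illustration, not MFTECmd's implementation: it assumes a standard 1,024-byte record extracted to a file (the mft_record.bin name is hypothetical) and skips the update sequence (fixup) correction, so a value straddling a sector boundary could read incorrectly.

```python
import struct

# NTFS $FILE_NAME namespace values
NAMESPACES = {0: "POSIX", 1: "WIN32", 2: "DOS", 3: "WIN32_DOS"}

def list_filename_attributes(record: bytes):
    """Walk a raw 1,024-byte MFT record and yield every $FILE_NAME attribute.

    Fixup (update sequence array) correction is omitted for brevity; apply it
    first if an attribute straddles a sector boundary.
    """
    assert record[:4] == b"FILE", "not an MFT FILE record"
    offset = struct.unpack_from("<H", record, 0x14)[0]  # first attribute offset
    while offset < len(record) - 8:
        attr_type, attr_len = struct.unpack_from("<II", record, offset)
        if attr_type == 0xFFFFFFFF or attr_len == 0:    # end-of-attributes marker
            break
        if attr_type == 0x30:                           # $FILE_NAME
            content_off = struct.unpack_from("<H", record, offset + 0x14)[0]
            body = offset + content_off
            name_len = record[body + 0x40]              # length in UTF-16 characters
            namespace = record[body + 0x41]
            name = record[body + 0x42 : body + 0x42 + name_len * 2].decode("utf-16-le")
            yield NAMESPACES.get(namespace, f"unknown({namespace})"), name
        offset += attr_len

with open("mft_record.bin", "rb") as f:  # hypothetical extracted record
    for ns, name in list_filename_attributes(f.read(1024)):
        print(f"{ns:10s} {name}")
```

If the output shows a DOS-namespace entry alongside a WIN32 or WIN32_DOS entry, the latter carries the long filename that belongs in the report.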
Encoding errors: when filenames break
Windows filenames are stored in UTF-16LE encoding in NTFS. Most forensic tools handle UTF-16LE correctly for Latin characters. Problems arise with:
- filenames containing characters outside the Basic Multilingual Plane (supplementary characters requiring surrogate pairs in UTF-16)
- filenames with embedded null characters (technically valid in NTFS but unusual)
- filenames with characters that have special meaning in CSV output (embedded commas, quotes, newlines)
- filenames created by non-Windows systems that wrote to NTFS volumes using different encoding assumptions
The practical impact is typically visual — garbled characters in the filename column of CSV output. The forensic impact is attribution: if you cannot correctly identify the filename, you cannot correctly attribute the file to a specific investigation artifact. A file named with Chinese, Arabic, or emoji characters that renders as ????.docx in tool output needs raw verification to determine the actual filename. Open the MFT record in a hex editor, locate the $FILE_NAME attribute, and read the UTF-16LE encoded bytes directly to identify the correct filename.
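For that verification, a short Python sketch: assuming you have copied the raw name bytes out of the $FILE_NAME attribute (the run starting at offset 0x42 of the attribute body), Python's utf-16-le codec decodes them, surrogate pairs included. The hex string below is a hypothetical example of a name containing a supplementary (emoji) character.

```python
# Hypothetical name bytes copied out of a $FILE_NAME attribute in a hex
# editor: a surrogate pair (U+1F60A) followed by ".docx" in UTF-16LE.
raw = bytes.fromhex("3dd80ade2e0064006f0063007800")

name = raw.decode("utf-16-le")           # surrogate pairs handled natively
print(name)                              # the actual filename
print(name.encode("unicode_escape"))     # unambiguous escaped form for notes

# If the bytes are genuinely malformed rather than just non-Latin,
# errors="replace" shows how much of the name is recoverable:
print(raw.decode("utf-16-le", errors="replace"))
```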
Version mismatches: the Shimcache problem
Version-specific behavior is the most dangerous parser error category because the tool reports a technically correct value that the examiner incorrectly interprets. The tool did not make an error — the examiner applied the wrong interpretation.
The Shimcache is the canonical example. On Windows XP and Windows 7, forensic practitioners correctly cited Shimcache entries as execution evidence — research and testing confirmed that entries appeared when programs were executed. On Windows 10 and 11, the behavior changed: entries are created during the Application Compatibility lookup, which occurs when an executable file's attributes are accessed (not necessarily when it is executed). A Shimcache entry with the execution flag set to TRUE on Windows 10 does indicate execution. An entry without the flag (or on a system where the flag is not populated) indicates only that the file was evaluated — browsing a directory containing an executable, copying it, or antivirus scanning it can create a Shimcache entry.
ShimCacheParser correctly reports the execution flag when it is present. The error occurs when the examiner treats presence in the Shimcache as execution proof without checking: (a) the Windows version of the evidence system, and (b) the value of the execution flag. This is an interpretation error, not a parser error — but it is caused by the examiner treating tool output as self-interpreting rather than as data requiring version-specific analysis.
Scenario: MFTECmd output shows a file with a Created timestamp of "0001-01-01 00:00:00" — a null timestamp. The file is an attacker tool whose creation date you need to establish. The MFT entry exists and the file is on disk.
Your options:
(A) Report that the creation date is unknown — the null timestamp indicates the data is unavailable.
(B) Open the raw MFT record in a hex editor. Navigate to the $STANDARD_INFORMATION attribute (offset from the first attribute header) and read the 8-byte FILETIME value at the Created timestamp position. If the raw bytes are all zeros, the timestamp was genuinely null (possibly set by the attacker). If the raw bytes contain a valid FILETIME that MFTECmd failed to parse, you have recovered the real timestamp. Also check the $FILE_NAME attribute's timestamps — $FN Created is set at file creation and rarely zeroed by attackers.
The correct approach is B. A null timestamp in tool output could be a genuine null OR a parsing failure. Raw verification distinguishes the two. The $FN timestamp is a secondary source that the attacker may not have thought to zero.
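To make option B concrete: a FILETIME is a little-endian 64-bit count of 100-nanosecond ticks since 1601-01-01 UTC, stored as the first 8 bytes of the $STANDARD_INFORMATION body (Created) and at offset 0x08 of the $FILE_NAME body, after the parent directory reference. Below is a minimal Python sketch of the comparison, with hypothetical byte values standing in for what you read off the hex editor.

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_utc(raw8: bytes):
    """Convert 8 little-endian FILETIME bytes to a UTC datetime (None if null)."""
    ticks = struct.unpack("<Q", raw8)[0]   # 100 ns intervals since 1601-01-01
    if ticks == 0:
        return None                        # genuinely null, not a parse failure
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)

# Hypothetical bytes read off the hex editor for $SI and $FN Created
si_created = bytes.fromhex("0000000000000000")  # all zeros in $SI
fn_created = bytes.fromhex("00d0129bf525dc01")  # a non-zero value surviving in $FN

print("$SI Created:", filetime_to_utc(si_created))  # None -> timestamp was zeroed
print("$FN Created:", filetime_to_utc(fn_created))  # the surviving creation time
```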
Try It — Validate a Prefetch Timestamp Against the Raw File
This exercise demonstrates the raw validation workflow using a Prefetch file — one of the simpler artifact formats to validate manually.
Step 1: Parse with PECmd. Run PECmd on a Prefetch file from your collection:

```powershell
.\PECmd.exe -f "C:\Evidence\Prefetch\CMD.EXE-89305D47.pf" --csv "C:\Evidence\Output" --csvf prefetch_cmd.csv
```
Note the "LastRun" timestamp from the CSV output.
Step 2: Open the Prefetch file in HxD. The Prefetch file format (versions 30 and 31 on Windows 10/11) starts with a compressed (MAM) header. For the raw exercise, use an uncompressed Prefetch file from Windows 7, or use the PECmd --json output, which includes the raw parsed values.
For a Windows 10 Prefetch file (compressed format 30): PECmd decompresses the file internally. The decompressed structure contains the last execution timestamps at known offsets within the file header. The exact offset varies by version:
- Version 17 (XP): one timestamp at offset 0x78
- Version 23 (Vista/7): one timestamp at offset 0x80
- Version 26 (8/8.1): eight timestamps starting at offset 0x80
- Version 30 (10/11): eight timestamps starting at offset 0x80 (after decompression)
Step 3: Compare values. PECmd's CSV output should show up to 8 "LastRun" timestamps (for Windows 8+ evidence). Convert the raw FILETIME value from the hex editor to a human-readable timestamp using a FILETIME converter and compare it to PECmd's reported value. They should match exactly.
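As a cross-check on the manual conversion, here is a minimal Python sketch that pulls the last-run FILETIMEs straight from a decompressed Prefetch file, using the per-version offsets listed above. It assumes the file has already been decompressed; a raw Windows 10 file beginning with the MAM signature must be decompressed first (PECmd does this internally, or use a standalone LZXPRESS-Huffman decompressor).

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def prefetch_last_runs(path: str):
    """Yield the last-run timestamps from a decompressed Prefetch file."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:3] == b"MAM":
        raise ValueError("still MAM-compressed -- decompress before parsing")
    assert data[4:8] == b"SCCA", "not a Prefetch file"
    version = struct.unpack_from("<I", data, 0)[0]  # 17, 23, 26, or 30/31
    count, offset = (8, 0x80) if version >= 26 else (1, 0x80 if version == 23 else 0x78)
    for i in range(count):
        ticks = struct.unpack_from("<Q", data, offset + i * 8)[0]
        if ticks:  # unused slots in the 8-entry array are zero
            yield FILETIME_EPOCH + timedelta(microseconds=ticks // 10)

for ts in prefetch_last_runs("CMD.EXE-89305D47.pf"):  # path from the exercise
    print(ts.strftime("%Y-%m-%d %H:%M:%S.%f"))
```

Each printed value should match a "LastRun" entry in PECmd's CSV output exactly.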
When to do this in practice: You do not validate every Prefetch file. You validate the Prefetch files that are central to your findings — the malware executable's Prefetch, the lateral movement tool's Prefetch, the exfiltration tool's Prefetch. If the investigation has 5 critical Prefetch findings, you validate those 5.
When to validate and when to trust
Validating every artifact in every investigation is impractical and unnecessary. The validation framework follows a risk-based approach: validate artifacts that support critical findings, artifacts that show unusual or suspicious characteristics, and artifacts from evidence sources with known version-specific behaviors.
Always validate: Any timestamp that establishes a critical timeline point (first access, malware deployment, exfiltration window start/end). Any artifact that is the sole source for a critical finding (without corroboration). Any artifact that will be cited in testimony or a legal-facing report. Any artifact that shows characteristics inconsistent with its context (a timestamp that seems wrong for the file's location, an execution count that seems high or low for the program type).
Validate when suspicious: Filenames that appear truncated, garbled, or as 8.3 short names. Timestamps with zero nanoseconds (potential timestomping or programmatic setting). Records with unusual field values (very large run counts, impossibly early or future timestamps, null values in required fields). Tool output that reports errors, warnings, or skipped records during parsing.
Trust without validation: Routine artifact records that are used for context but do not support critical findings. Bulk data used for pattern analysis (thousands of MFT records establishing normal file system activity patterns). Artifacts from well-understood, stable format types on current OS versions with current tool versions.
The myth: Running two different tools on the same artifact — MFTECmd and X-Ways, or PECmd and a commercial forensic suite — and getting the same result validates the finding. If two independent tools agree, the output must be correct.
The reality: Multiple tools can agree on incorrect output if they share the same parsing logic, the same assumptions, or the same specification interpretation. Many forensic tools use common libraries or implement the same published specification — if the specification is ambiguous or incomplete on a particular edge case, multiple tools may make the same incorrect assumption. The definitive validation is comparing tool output against the raw binary data in a hex editor, not comparing tool output against other tool output. Two tools agreeing that a timestamp is "2026-01-15 09:00:00" means two parsers read the same bytes and produced the same result — it does not mean the result is correct for the forensic question being asked (is this a genuine timestamp or a timestomped one?). Raw validation is the only mechanism that answers the forensic question rather than the parsing question.
Troubleshooting
"I don't know hex well enough to read raw MFT records." You will by the end of WF1. This course teaches raw analysis progressively — WF0 introduces the concept, WF1 provides the MFT record walkthrough at the byte level, and subsequent modules cover each artifact format's binary structure. The investment in learning hex interpretation for a few key structures (MFT record header, $SI attribute, $FN attribute, USN record, EVTX record header) is manageable and pays returns across every investigation.
"What if I find a tool error — what do I do?" Document the error precisely: tool name and version, input artifact, expected output (from raw analysis), actual output (from the tool), and the specific discrepancy. Report the issue to the tool developer (Eric Zimmerman has a GitHub Issues page for each tool, and he is responsive to well-documented bug reports). In your forensic report, use the raw-validated value — not the tool's output. Note the discrepancy in your methodology section if it is relevant to the finding.
"Tool versions update frequently — how do I keep track of which versions have which known issues?" Use the current version. Update before each engagement. The EZ Tools changelog documents every fix — skim it when you update to understand what was corrected. For court testimony, document the exact tool version used in your examination notes. If a defense expert challenges a finding by citing a bug that existed in an earlier version, you can demonstrate you used a version where the bug was fixed.
You've built the foundations of artifact-level forensic analysis.
WF0 gave you the taxonomy, NTFS architecture, and the five-step methodology. WF1 took you inside the MFT at the binary level — every attribute, every timestamp, every edge case. From here, every artifact category gets the same raw-first treatment.
- WF2–WF10: every major Windows artifact decoded at binary level — USN Journal, Prefetch, Amcache, Shimcache, ShellBags, LNK, Jump Lists, SRUM, Event Logs, and the Registry hives
- INC-NE-2026-0915 (WF13) — Insider data exfiltration capstone. Work the complete investigation from USB history to OneDrive exfiltration evidence
- INC-NE-2026-1022 (WF14) — Ransomware capstone. Three-host triage (FIN01 → IT03 → FS01) across the 72-hour attack chain
- The lab pack — 25+ realistic evidence files in 10 formats, simulated KAPE triage pre-populated, both capstones deployable to your own VM
- Anti-forensic detection methodology — defeat timestomping, log clearing, and Prefetch deletion with cross-artifact correlation