
Tool Validation — When EZ Tools Gets It Wrong

Module 0 · Free
Operational Objective
Eric Zimmerman's tools are the gold standard for Windows forensic artifact parsing. MFTECmd, PECmd, AmcacheParser, JLECmd, LECmd, SBECmd, RECmd, EvtxECmd, and the rest of the suite are free, actively maintained, widely validated, and used by forensic examiners globally.

They are also software. Software has bugs, edge cases, and version-specific behaviors that produce incorrect output — and because the tools are trusted, incorrect output is rarely questioned. This is not a criticism of the tools. It is a statement about the nature of parsing complex binary formats: any parser that handles millions of records across hundreds of artifact variations will encounter edge cases that produce errors.

The examiner's responsibility is not to distrust the tools but to understand when validation is needed, how to perform it, and what to do when the tool output and the raw artifact disagree. This subsection covers specific, documented cases where forensic tools have produced incorrect output, the categories of errors that parsers are susceptible to, and the validation workflow that detects these errors before they enter a forensic report.
Deliverable: Understanding of the four categories of parser errors (structure misinterpretation, encoding errors, version mismatches, boundary conditions), specific documented examples of tool errors, the raw validation workflow for critical findings, and the decision framework for when validation is necessary vs when tool output can be trusted without additional verification.
Estimated completion: 30 minutes
FOUR CATEGORIES OF PARSER ERRORS
  • STRUCTURE (binary format misinterpretation): attribute ordering assumptions, variable-length field miscalculation, nested structure parsing errors, incorrect offset calculation. Impact: wrong value reported. Detection: hex comparison.
  • ENCODING (character/data representation): Unicode filename truncation, non-ASCII path handling, timestamp format conversion, registry value type misread. Impact: garbled or wrong text. Detection: visual inspection.
  • VERSION (OS/format version mismatch): Prefetch format version (17–30), Amcache hive restructure, Shimcache execution semantics, new EVTX event fields. Impact: wrong interpretation. Detection: OS version check.
  • BOUNDARY (edge cases and limits): records spanning chunk boundaries, maximum field length exceeded, corrupt or truncated records, malformed input from anti-forensics. Impact: skipped or wrong record. Detection: record count comparison.

Figure WF0.8 — Four categories of parser errors. Structure errors produce wrong values. Encoding errors produce garbled text. Version errors produce wrong interpretations. Boundary errors produce skipped or corrupted records. Each category has a different detection method — raw hex comparison, visual inspection, OS version verification, or record count comparison.

Why good tools produce bad output

A forensic parser is a software implementation of a specification. The specification (NTFS on-disk format, Prefetch file format, EVTX record format) defines how the binary data should be structured. The parser reads the binary data and interprets it according to the specification. When the data conforms to the specification — which is the overwhelming majority of the time — the parser produces correct output.

Errors occur when the data deviates from what the parser expects. This happens in four scenarios: the data is structured in a valid but uncommon way that the parser doesn't handle (structure errors), the data uses a character encoding or data representation that the parser misinterprets (encoding errors), the data is from an OS version with a different format than the parser assumes (version errors), or the data is corrupted, truncated, or deliberately malformed (boundary errors).

The critical insight is that parser errors are silent. The tool does not flag the output as uncertain or potentially incorrect. The CSV row containing an incorrect timestamp looks identical to the CSV row containing a correct timestamp. The only detection mechanism is the examiner's knowledge — knowing when to validate, how to validate, and what a correct value looks like.

Structure misinterpretation: attribute ordering

The most common structure error in MFT parsing involves attribute ordering. The NTFS specification does not mandate a specific order for attributes within an MFT record. The common order is $STANDARD_INFORMATION, $FILE_NAME (long name), $FILE_NAME (short name), $SECURITY_DESCRIPTOR or $SECURITY reference, $DATA — and most parsers assume this ordering. But NTFS does not enforce it. After certain operations — disk repair with chkdsk, file system migration, backup/restore sequences, and some third-party backup tools — the attribute order within an MFT record can change.

When a parser assumes that the first $FILE_NAME attribute is the long filename and extracts it as the file's display name, it works correctly when the long name comes first. When the short name comes first (a valid but uncommon arrangement), the parser reports the 8.3 filename as the file's name. The file path in the CSV output shows CONFID~1.DOC instead of Confidential_Board_Report_Q3.docx. The examiner who trusts the tool output reports the wrong filename.

The validation is straightforward: when a filename looks like an 8.3 short name in tool output, check the raw MFT record for additional $FILE_NAME attributes. If multiple $FN attributes exist, identify the one with the WIN32 or WIN32_DOS namespace flag — that is the long filename. MFTECmd generally handles this correctly in current versions, but edge cases persist in unusual MFT configurations, and the examiner should know the underlying issue.
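As a sketch of that check, assuming you have the raw 1024-byte MFT record already extracted to bytes, the $FILE_NAME attributes and their namespace flags can be enumerated as follows. The offsets follow the published NTFS on-disk layout; this is an illustration of the verification step, not a replacement for MFTECmd:

```python
import struct

NAMESPACES = {0: "POSIX", 1: "WIN32", 2: "DOS", 3: "WIN32_DOS"}

def filename_attributes(record: bytes):
    """Yield (namespace, filename) for each resident $FILE_NAME (type 0x30)
    attribute in a raw MFT record, so multiple $FN attributes are visible."""
    if record[:4] != b"FILE":          # MFT record signature
        return
    # Offset of the first attribute header is a uint16 at record offset 0x14.
    offset = struct.unpack_from("<H", record, 0x14)[0]
    while offset + 8 <= len(record):
        attr_type, attr_len = struct.unpack_from("<II", record, offset)
        if attr_type == 0xFFFFFFFF or attr_len == 0:   # end-of-attributes marker
            break
        if attr_type == 0x30 and record[offset + 8] == 0:  # resident $FILE_NAME
            # Resident content offset is a uint16 at attribute offset 0x14.
            content = offset + struct.unpack_from("<H", record, offset + 0x14)[0]
            name_len = record[content + 0x40]     # name length in UTF-16 chars
            namespace = record[content + 0x41]    # 1=WIN32 is the long name
            name = record[content + 0x42:content + 0x42 + name_len * 2]
            yield NAMESPACES.get(namespace, str(namespace)), name.decode("utf-16-le")
        offset += attr_len
```

Running this over the suspect record lists every $FN attribute with its namespace; the WIN32 (or WIN32_DOS) entry is the long filename regardless of attribute order.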

Expand for Deeper Context

A related structure issue involves MFT records that use extension records. When a file has many attributes (multiple hard links creating multiple $FILE_NAME attributes, multiple alternate data streams creating multiple $DATA attributes, or an extremely long filename requiring additional space), the attributes may not fit in a single 1024-byte MFT record. NTFS allocates extension records — additional MFT entries linked to the base record via an $ATTRIBUTE_LIST attribute in the base record. The $ATTRIBUTE_LIST maps each attribute type and name to the MFT record that contains it.

Parsers must follow $ATTRIBUTE_LIST references to find all attributes for a file. If the parser fails to follow a reference (because the extension record is in an unusual location, or the $ATTRIBUTE_LIST is malformed), it may report incomplete data for the file — missing timestamps, missing data run information, or missing alternate data stream references. Current versions of MFTECmd handle extension records correctly, but earlier versions had edge cases with deeply nested extension chains.

Encoding errors: when filenames break

Windows filenames are stored in UTF-16LE encoding in NTFS. Most forensic tools handle UTF-16LE correctly for Latin characters. Problems arise with: filenames containing characters outside the Basic Multilingual Plane (supplementary characters requiring surrogate pairs in UTF-16), filenames with embedded null characters (technically valid in NTFS but unusual), filenames with characters that have special meaning in CSV output (commas, quotes, newlines embedded in filenames), and filenames created by non-Windows systems that wrote to NTFS volumes using different encoding assumptions.

The practical impact is typically visual — garbled characters in the filename column of CSV output. The forensic impact is attribution: if you cannot correctly identify the filename, you cannot correctly attribute the file to a specific investigation artifact. A file named with Chinese, Arabic, or emoji characters that renders as ????.docx in tool output needs raw verification to determine the actual filename. Open the MFT record in a hex editor, locate the $FILE_NAME attribute, and read the UTF-16LE encoded bytes directly to identify the correct filename.
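The decode itself is mechanical once the name bytes are located. The hex string below is a fabricated example (an emoji filename) showing that a UTF-16 surrogate pair decodes correctly when the raw bytes are read as UTF-16LE:

```python
# Name bytes as copied from the $FILE_NAME attribute's name field in a hex
# editor. 0x3D 0xD8 0x0A 0xDE is the UTF-16LE surrogate pair for U+1F60A;
# the remaining bytes are ".docx".
raw = bytes.fromhex("3dd80ade2e0064006f0063007800")
print(raw.decode("utf-16-le"))  # → 😊.docx
```

If the tool rendered this name as ????.docx, the raw decode recovers the actual filename for the report.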

Version mismatches: the Shimcache problem

Version-specific behavior is the most dangerous parser error category because the tool reports a technically correct value that the examiner incorrectly interprets. The tool did not make an error — the examiner applied the wrong interpretation.

The Shimcache is the canonical example. On Windows XP and Windows 7, forensic practitioners correctly cited Shimcache entries as execution evidence — research and testing confirmed that entries appeared when programs were executed. On Windows 10 and 11, the behavior changed: entries are created during the Application Compatibility lookup, which occurs when an executable file's attributes are accessed (not necessarily when it is executed). A Shimcache entry with the execution flag set to TRUE on Windows 10 does indicate execution. An entry without the flag (or on a system where the flag is not populated) indicates only that the file was evaluated — browsing a directory containing an executable, copying it, or antivirus scanning it can create a Shimcache entry.

ShimCacheParser correctly reports the execution flag when it is present. The error occurs when the examiner treats presence in the Shimcache as execution proof without checking: (a) the Windows version of the evidence system, and (b) the value of the execution flag. This is an interpretation error, not a parser error — but it is caused by the examiner treating tool output as self-interpreting rather than as data requiring version-specific analysis.
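The version check in (a) and the flag check in (b) can be captured as a small decision helper. This is a simplified sketch of the interpretation logic described above; the function name and the coarse version split are illustrative assumptions, and real casework needs per-build verification:

```python
def shimcache_claim(windows_major: int, exec_flag) -> str:
    """What a Shimcache entry supports as a forensic claim, per the
    version-specific semantics in the text. Simplified sketch: treats
    pre-Windows-10 systems under the XP/7-era semantics."""
    if windows_major < 10:
        # XP/7-era research: entries appeared on execution.
        return "entry consistent with execution (XP/7-era semantics)"
    if exec_flag is True:
        # Windows 10/11 with the execution flag populated and set.
        return "execution indicated (flag set)"
    # Flag absent or unset: only an AppCompat attribute lookup is shown.
    return "file attributes evaluated; execution NOT established"
```

The point of the helper is the shape of the reasoning: the same artifact yields a different claim depending on OS version and flag state.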

Decision point

MFTECmd output shows a file with Created timestamp "0001-01-01 00:00:00" — a null timestamp. The file is an attacker tool that you need to establish a creation date for. The MFT entry exists and the file is on disk.

Your options: (A) Report that the creation date is unknown — the null timestamp indicates the data is unavailable. (B) Open the raw MFT record in a hex editor. Navigate to the $STANDARD_INFORMATION attribute (offset from the first attribute header) and read the 8-byte FILETIME value at the Created timestamp position. If the raw bytes are all zeros, the timestamp was genuinely null (possibly set by the attacker). If the raw bytes contain a valid FILETIME that MFTECmd failed to parse, you have recovered the real timestamp. Also check the $FILE_NAME attribute's timestamps — $FN Created is set at file creation and rarely zeroed by attackers.

The correct approach is B. A null timestamp in tool output could be a genuine null OR a parsing failure. Raw verification distinguishes the two. The $FN timestamp is a secondary source that the attacker may not have thought to zero.
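The FILETIME decode in option B is mechanical once the 8 bytes are located. A minimal sketch: the epoch constant is the documented FILETIME origin, and the sample value in the usage lines is the well-known FILETIME for the Unix epoch, used only to demonstrate the conversion:

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_dt(raw8: bytes):
    """Convert 8 little-endian bytes of 100-ns ticks since 1601-01-01 UTC.
    Returns None for an all-zero value, i.e. a genuinely null timestamp."""
    ticks = int.from_bytes(raw8, "little")
    if ticks == 0:
        return None  # the raw bytes really are zero: the null is genuine
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)

print(filetime_to_dt(bytes(8)))                                    # → None
print(filetime_to_dt((116444736000000000).to_bytes(8, "little")))  # → 1970-01-01 00:00:00+00:00
```

A None result confirms the tool's null; any other result means you have recovered a timestamp the CSV did not show.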

Try It — Validate a Prefetch Timestamp Against the Raw File

This exercise demonstrates the raw validation workflow using a Prefetch file — one of the simpler artifact formats to validate manually.

Step 1: Parse with PECmd. Run PECmd on a Prefetch file from your collection:

.\PECmd.exe -f "C:\Evidence\Prefetch\CMD.EXE-89305D47.pf" --csv "C:\Evidence\Output" --csvf prefetch_cmd.csv

Note the "LastRun" timestamp from the CSV output.

Step 2: Open the Prefetch file in HxD. The Prefetch file format (versions 30 and 31 on Windows 10/11) starts with a compressed (MAM) header. For the raw exercise, use an uncompressed Prefetch file from Windows 7 or use the PECmd --json output which includes the raw parsed values.

For a Windows 10 Prefetch file (compressed format 30): PECmd decompresses the file internally. The decompressed structure contains the last execution timestamp at a known offset within the file header. The exact offset varies by version:
  • Version 17 (XP): one timestamp at offset 0x78
  • Version 23 (Vista/7): one timestamp at offset 0x80
  • Version 26 (8/8.1): eight timestamps starting at offset 0x80
  • Version 30 (10/11): eight timestamps starting at offset 0x80 (after decompression)
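Those offsets can also be checked programmatically. A sketch for uncompressed Prefetch files, where the version-to-offset table mirrors the version list above; Windows 10/11 files carry a MAM-compressed wrapper and would need LZXPRESS Huffman decompression first, which this sketch does not attempt:

```python
import struct

# Last-run timestamp offset and count by Prefetch format version
# (uncompressed files only).
LASTRUN_OFFSETS = {17: (0x78, 1), 23: (0x80, 1), 26: (0x80, 8), 30: (0x80, 8)}

def last_run_filetimes(pf: bytes):
    """Extract the raw last-run FILETIME values from an uncompressed
    Prefetch file. Raises on compressed (MAM) input, which lacks the
    'SCCA' signature at offset 4."""
    if pf[4:8] != b"SCCA":
        raise ValueError("not an uncompressed Prefetch file")
    version = struct.unpack_from("<I", pf, 0)[0]   # format version at offset 0
    offset, count = LASTRUN_OFFSETS[version]
    return [struct.unpack_from("<Q", pf, offset + 8 * i)[0] for i in range(count)]
```

Converting each returned value with a FILETIME converter gives the timestamps to compare against PECmd's CSV.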

Step 3: Compare values. PECmd's CSV output should show up to 8 "LastRun" timestamps (for Windows 8+ evidence). Convert the raw FILETIME value from the hex editor to a human-readable timestamp using a FILETIME converter and compare it to PECmd's reported value. They should match exactly.

When to do this in practice: You do not validate every Prefetch file. You validate the Prefetch files that are central to your findings — the malware executable's Prefetch, the lateral movement tool's Prefetch, the exfiltration tool's Prefetch. If the investigation has 5 critical Prefetch findings, you validate those 5.

When to validate and when to trust

Validating every artifact in every investigation is impractical and unnecessary. The validation framework follows a risk-based approach: validate artifacts that support critical findings, artifacts that show unusual or suspicious characteristics, and artifacts from evidence sources with known version-specific behaviors.

Always validate: Any timestamp that establishes a critical timeline point (first access, malware deployment, exfiltration window start/end). Any artifact that is the sole source for a critical finding (without corroboration). Any artifact that will be cited in testimony or a legal-facing report. Any artifact that shows characteristics inconsistent with its context (a timestamp that seems wrong for the file's location, an execution count that seems high or low for the program type).

Validate when suspicious: Filenames that appear truncated, garbled, or as 8.3 short names. Timestamps with zero nanoseconds (potential timestomping or programmatic setting). Records with unusual field values (very large run counts, impossibly early or future timestamps, null values in required fields). Tool output that reports errors, warnings, or skipped records during parsing.

Trust without validation: Routine artifact records that are used for context but do not support critical findings. Bulk data used for pattern analysis (thousands of MFT records establishing normal file system activity patterns). Artifacts from well-understood, stable format types on current OS versions with current tool versions.

Compliance Myth: "Using multiple tools on the same artifact guarantees accuracy"

The myth: Running two different tools on the same artifact — MFTECmd and X-Ways, or PECmd and a commercial forensic suite — and getting the same result validates the finding. If two independent tools agree, the output must be correct.

The reality: Multiple tools can agree on incorrect output if they share the same parsing logic, the same assumptions, or the same specification interpretation. Many forensic tools use common libraries or implement the same published specification — if the specification is ambiguous or incomplete on a particular edge case, multiple tools may make the same incorrect assumption. The definitive validation is comparing tool output against the raw binary data in a hex editor, not comparing tool output against other tool output. Two tools agreeing that a timestamp is "2026-01-15 09:00:00" means two parsers read the same bytes and produced the same result — it does not mean the result is correct for the forensic question being asked (is this a genuine timestamp or a timestomped one?). Raw validation is the only mechanism that answers the forensic question rather than the parsing question.

Troubleshooting

"I don't know hex well enough to read raw MFT records." You will by the end of WF1. This course teaches raw analysis progressively — WF0 introduces the concept, WF1 provides the MFT record walkthrough at the byte level, and subsequent modules cover each artifact format's binary structure. The investment in learning hex interpretation for a few key structures (MFT record header, $SI attribute, $FN attribute, USN record, EVTX record header) is manageable and pays returns across every investigation.

"What if I find a tool error — what do I do?" Document the error precisely: tool name and version, input artifact, expected output (from raw analysis), actual output (from the tool), and the specific discrepancy. Report the issue to the tool developer (Eric Zimmerman has a GitHub Issues page for each tool, and he is responsive to well-documented bug reports). In your forensic report, use the raw-validated value — not the tool's output. Note the discrepancy in your methodology section if it is relevant to the finding.

"Tool versions update frequently — how do I keep track of which versions have which known issues?" Use the current version. Update before each engagement. The EZ Tools changelog documents every fix — skim it when you update to understand what was corrected. For court testimony, document the exact tool version used in your examination notes. If a defense expert challenges a finding by citing a bug that existed in an earlier version, you can demonstrate you used a version where the bug was fixed.

You parse the $MFT from a Windows 10 evidence system with MFTECmd and examine the CSV output for a suspicious file. The tool reports: FileName = "MALWAR~1.EXE", ParentPath = "Users\j.morrison\AppData\Local\Temp". You open the raw MFT record in HxD and find two $FILE_NAME attributes. The first (at offset 0x98) contains the filename "MALWAR~1.EXE" with the namespace flag 0x02 (DOS). The second (at offset 0x140) contains the filename "MalwareDropper_Stage2.exe" with the namespace flag 0x01 (WIN32). What is the correct assessment?
(A) MFTECmd correctly reported the filename because the DOS 8.3 filename is the canonical name on NTFS — the WIN32 long filename is an alias stored for backward compatibility. The examiner should use "MALWAR~1.EXE" in the report as the file's actual name.
(B) MFTECmd extracted the first $FILE_NAME attribute (DOS namespace, offset 0x98) as the display filename rather than the second (WIN32 namespace, offset 0x140) which contains the actual long filename. The correct filename is "MalwareDropper_Stage2.exe." This is an attribute ordering edge case — the DOS name appeared before the WIN32 name in the MFT record, and the parser selected the first $FN attribute. The examiner should report the WIN32 filename and note the discrepancy in examination notes. Current versions of MFTECmd typically handle this correctly by checking the namespace flag, so also verify you are running the latest version.
(C) Both filenames are equally valid — NTFS stores both the 8.3 and long filename and either can be used in the report. The examiner should include both names to be comprehensive: "MALWAR~1.EXE (also known as MalwareDropper_Stage2.exe)."
(D) The presence of two different $FILE_NAME attributes suggests the file was renamed — the original name was "MALWAR~1.EXE" and it was later renamed to "MalwareDropper_Stage2.exe." The examiner should report the rename and investigate when it occurred.

You've built the foundations of artifact-level forensic analysis.

WF0 gave you the taxonomy, NTFS architecture, and the five-step methodology. WF1 took you inside the MFT at the binary level — every attribute, every timestamp, every edge case. From here, every artifact category gets the same raw-first treatment.

  • WF2–WF10: every major Windows artifact decoded at binary level — USN Journal, Prefetch, Amcache, Shimcache, ShellBags, LNK, Jump Lists, SRUM, Event Logs, and the Registry hives
  • INC-NE-2026-0915 (WF13) — Insider data exfiltration capstone. Work the complete investigation from USB history to OneDrive exfiltration evidence
  • INC-NE-2026-1022 (WF14) — Ransomware capstone. Three-host triage (FIN01 → IT03 → FS01) across the 72-hour attack chain
  • The lab pack — 25+ realistic evidence files in 10 formats, simulated KAPE triage pre-populated, both capstones deployable to your own VM
  • Anti-forensic detection methodology — defeat timestomping, log clearing, and Prefetch deletion with cross-artifact correlation