In this module
The Artifact Analysis Methodology
Figure WF0.12 — The five-step artifact analysis methodology. Each step has defined inputs, outputs, and a quality gate that must be passed before proceeding. The methodology ensures systematic, reproducible, and defensible analysis.
Step 1: Identify — which artifacts answer this question?
Every analysis begins with a question — not with a tool. The investigation question determines which artifacts are relevant, which are primary, and which provide corroboration. Opening MFTECmd output before defining what you're looking for leads to data browsing. Defining the question first focuses the analysis.
The Identify step maps the investigation question to specific artifact categories using the artifact selection matrix from WF0.2 and the investigation scope from WF0.10. For the question "did the subject copy files to USB between March 1 and March 15?" the artifact mapping is:
- Primary sources: USN Journal (FILE_CREATE events with parent references to removable media), MFT timestamps (files on the USB volume or copied from the network to local storage), and SYSTEM registry USBSTOR (device connection history with timestamps).
- Corroborating sources: ShellBags (did the user navigate to the USB drive?), LNK files (did the user open files from the USB?), SRUM (data transfer volumes to the removable media device), and Event Logs (USB device connection events 20001/20003).
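In code form, the Identify mapping can be captured as a small selection matrix. This is an illustrative sketch: the data structure and the `passes_identify_gate` helper are hypothetical conveniences, not a standard format.

```python
# Hypothetical sketch of an artifact selection matrix for one
# investigation question. Artifact names mirror the mapping above;
# the structure itself is illustrative, not a standard.
QUESTION = "Did the subject copy files to USB between March 1 and March 15?"

ARTIFACT_MATRIX = {
    "primary": {
        "USN Journal": "FILE_CREATE events with parent refs to removable media",
        "$MFT": "timestamps for files on the USB volume or copied locally",
        "SYSTEM USBSTOR": "device connection history with timestamps",
    },
    "corroborating": {
        "ShellBags": "folder navigation to the USB drive",
        "LNK files": "files opened from the USB volume",
        "SRUM": "data transfer volumes to the removable device",
        "Event Logs": "device connection events 20001/20003",
    },
}

def passes_identify_gate(matrix):
    """Quality gate: at least two artifact sources identified."""
    return len(matrix["primary"]) + len(matrix["corroborating"]) >= 2
```

Writing the matrix down before parsing anything is the point: the gate can be checked, and documented, in advance.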
The quality gate for this step: at least two independent artifact sources must be identified as potentially answering the question before analysis proceeds. If only one source exists, the finding will be limited to a single-source confidence level — which should be documented in advance, not discovered at the end.
Step 2: Extract — collect the raw artifacts
The Extract step collects the specific artifacts identified in Step 1 from the evidence source. If the evidence is a KAPE collection, the artifacts may already be extracted — the $MFT, registry hives, Prefetch files, and Event Logs are in the KAPE output directory. If the evidence is a forensic image, mount the image read-only and extract the needed artifacts.
Document the extraction: what was extracted, from where (image name, partition, path), when (UTC), the extraction tool and version, and the hash of the extracted file. This documentation is your provenance chain — it traces every analysis result back to a specific location in the original evidence.
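A minimal provenance logger along these lines might look as follows. The field names and the JSON-lines format are illustrative; adapt them to your lab's examination-log conventions.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def record_extraction(artifact_path, source_image, partition, tool, log_path):
    """Append one provenance entry for an extracted artifact.

    Field names are illustrative, not a standard schema. The SHA-256 of
    the extracted file ties every later analysis result back to a
    specific location in the original evidence.
    """
    data = Path(artifact_path).read_bytes()
    entry = {
        "artifact": str(artifact_path),
        "source_image": source_image,
        "partition": partition,
        "extracted_utc": datetime.now(timezone.utc).isoformat(),
        "tool": tool,
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry
```

Re-hashing the file at analysis time and comparing against the logged value is also how the Step 2 quality gate (a verified hash for every extracted artifact) gets checked.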
The quality gate: every extracted artifact file has a verified hash recorded in the examination log. If an artifact file is corrupted (bad hash, truncated, zero bytes), document the corruption and note which analysis capabilities are affected.
Step 3: Parse — process and validate
The Parse step processes raw artifacts into human-readable form using forensic tools, then validates critical findings against the raw data.
Run the appropriate tool: MFTECmd for the $MFT, PECmd for Prefetch files, SBECmd for ShellBags, EvtxECmd for Event Logs, RECmd or Registry Explorer for registry hives, SrumECmd for the SRUM database. Output to the case's output\ directory in CSV format for analysis in Timeline Explorer.
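For repeatability, the parse jobs can be scripted rather than typed ad hoc. The flags below reflect common usage of the Zimmerman tools (`-f`/`-d` for the input, `--csv` for the output directory) but should be verified against the versions in your toolkit; all paths are hypothetical.

```python
import subprocess
from pathlib import Path

# Hypothetical case output directory and mounted-evidence paths.
OUTPUT = Path(r"C:\Cases\output")

PARSE_JOBS = [
    ["MFTECmd.exe", "-f", r"E:\C\$MFT", "--csv", str(OUTPUT)],
    ["PECmd.exe", "-d", r"E:\C\Windows\prefetch", "--csv", str(OUTPUT)],
    ["EvtxECmd.exe", "-d", r"E:\C\Windows\System32\winevt\Logs",
     "--csv", str(OUTPUT)],
]

def run_parsers(jobs, dry_run=True):
    """Print each command line; execute only when dry_run=False so the
    exact commands can be reviewed and copied into the examination log
    before anything runs."""
    for cmd in jobs:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

The printed command lines double as the tool-and-version documentation your notes need anyway.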
After parsing, identify the records that answer the investigation question. These are the critical records — the specific MFT entries, Prefetch files, ShellBag entries, or Event Log records that will support findings in the report. For each critical record, perform raw validation: open the original artifact in a hex editor, navigate to the record, and confirm the tool's interpretation matches the raw data. Document the validation result.
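Raw validation of a timestamp comes down to decoding the bytes yourself. NTFS stores timestamps as 64-bit FILETIME values (100-nanosecond ticks since 1601-01-01 UTC, little-endian on disk), so a minimal decoder for bytes copied out of a hex editor is a few lines:

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH_1601 = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_utc(raw8: bytes) -> datetime:
    """Decode 8 little-endian bytes as a FILETIME: 100 ns ticks since
    1601-01-01 UTC. Sub-microsecond precision is truncated, which is
    fine for validating a tool's displayed timestamp."""
    ticks, = struct.unpack("<Q", raw8)
    return EPOCH_1601 + timedelta(microseconds=ticks // 10)

def utc_to_filetime(dt: datetime) -> bytes:
    """Inverse helper: compute the byte pattern you expect to find at
    the record offset in the hex editor for a given instant."""
    delta = dt - EPOCH_1601
    ticks = (delta.days * 86400 + delta.seconds) * 10_000_000 \
            + delta.microseconds * 10
    return struct.pack("<Q", ticks)
```

If the bytes at the timestamp offset decode to the same instant the tool reports, the tool's interpretation is validated; record that result in the examination log.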
The quality gate: every critical finding has been raw-validated. Non-critical findings (contextual data, pattern analysis across thousands of records) do not require individual validation but should be spot-checked for consistency.
Step 4: Correlate — cross-reference across sources
The Correlate step is where single-source data points become multi-source evidence. For each critical finding from Step 3, identify the corroborating evidence from independent artifact sources and verify consistency.
Correlation works at the temporal level (do the timestamps from different sources agree within expected tolerances?), the entity level (do the file references, paths, and user identifiers from different sources refer to the same file, folder, or user?), and the logical level (do the findings from different sources tell a consistent narrative, or are there contradictions that need investigation?).
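Temporal-level correlation can be expressed as a simple tolerance check across sources. The 10-second default and the source names below are illustrative choices, not a standard.

```python
from datetime import datetime, timedelta

def temporally_consistent(timestamps, tolerance_seconds=10):
    """Check that timestamps for the same event, drawn from independent
    sources, agree within an expected tolerance. The default tolerance
    is an illustrative choice; pick one appropriate to the artifacts
    involved and document it."""
    ts = sorted(timestamps.values())
    return (ts[-1] - ts[0]) <= timedelta(seconds=tolerance_seconds)

# Illustrative values for one event seen by three sources.
sources = {
    "$MFT Created":      datetime(2026, 3, 28, 2, 13, 2),
    "USN FILE_CREATE":   datetime(2026, 3, 28, 2, 13, 2),
    "Prefetch last run": datetime(2026, 3, 28, 2, 12, 55),
}
```

A failed check is not automatically an error: as the next paragraph notes, a conflict between sources is itself a finding to investigate.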
Conflicts between sources are informative. If the MFT timestamp says a file was created on January 15 but the USN Journal shows a FILE_CREATE entry on March 28, the conflict itself is a finding — it indicates either timestomping, a copy operation with timestamp preservation, or (rarely) a parsing error. Investigate the conflict before concluding.
The quality gate: every critical finding in the report is supported by at least two independent artifact sources, and any conflicts between sources have been investigated and resolved (or documented as unresolved with the impact on finding confidence stated).
Step 5: Conclude — state the finding with confidence
The Conclude step produces the finding statement: a precise claim about what the evidence proves, at what confidence level, supported by what sources, subject to what limitations, and having considered what alternative explanations.
A well-formed finding has five components: the claim (what the evidence proves — stated precisely), the confidence level (high, moderate-high, moderate, low — with rationale based on source reliability and corroboration), the evidence sources (specific artifacts, specific records, specific fields), the limitations (what the evidence does not prove, what artifacts were unavailable, what assumptions were made), and the alternative explanations (other interpretations of the evidence that were considered and assessed).
The quality gate: every finding has documented alternative explanations. This is not pro forma — the examiner genuinely considers how else the evidence could be interpreted and documents why the stated conclusion is more supported than the alternatives. This practice prevents confirmation bias and prepares the examiner for cross-examination, where the first question will be "is there an alternative explanation for this evidence?"
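The five components and the Conclude quality gate can be captured in a simple structure. This is a sketch of the idea, not a reporting standard; the field names are illustrative.

```python
from dataclasses import dataclass

CONFIDENCE_LEVELS = ("high", "moderate-high", "moderate", "low")

@dataclass
class Finding:
    claim: str          # what the evidence proves, stated precisely
    confidence: str     # one of CONFIDENCE_LEVELS
    rationale: str      # why that level: source reliability, corroboration
    sources: list       # specific artifacts, records, fields
    limitations: list   # what the evidence does not prove
    alternatives: list  # explanations considered and assessed

    def passes_conclude_gate(self) -> bool:
        # Quality gate: alternatives documented, confidence well-formed.
        return bool(self.alternatives) and self.confidence in CONFIDENCE_LEVELS
```

An empty `alternatives` list fails the gate by design: a finding with no documented alternative explanations is not ready for the report.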
You are at the "Correlate" step of the five-step methodology (Identify → Extract → Parse → Correlate → Conclude). Your MFT analysis shows a file created at 02:13:02. Your USN analysis shows the same file's FILE_CREATE at 02:13:02. Your Prefetch shows the tool that created it executed at 02:12:55 — 7 seconds before the file appeared on disk. All three sources are consistent.
But your Event Log 4688 shows NO process creation event for the tool at 02:12:55. Process creation auditing was enabled on this system.
Your options: (A) Three sources agree (MFT, USN, Prefetch) — the missing Event Log entry is an anomaly but doesn't change the conclusion. (B) Investigate the gap. Three sources confirming and one source silent requires explanation. Possible causes: the process was spawned by a technique that evades 4688 logging (direct syscall, process hollowing into an existing process), the Event Log was selectively manipulated (check for record ID gaps around 02:12:55), or the process creation event was in a different log (Sysmon Event 1 if deployed). The missing 4688 entry is itself a finding — it may indicate an evasion technique that warrants documentation.
The correct approach is B. Correlation means reconciling ALL sources, including absences. A missing record where one is expected is as informative as a present record where none is expected.
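One concrete check from option B, scanning for EventRecordID gaps around the time of interest, can be sketched over a parsed Event Log export:

```python
def record_id_gaps(record_ids):
    """Return (last_before, first_after) pairs for every gap in a
    sequence of EventRecordID values from a parsed log export.

    A gap around the time of interest can indicate selective record
    deletion, but gaps also occur for benign reasons (log rollover),
    so treat a hit as a lead to investigate, not a conclusion.
    """
    ids = sorted(record_ids)
    gaps = []
    for prev, cur in zip(ids, ids[1:]):
        if cur - prev > 1:
            gaps.append((prev, cur))
    return gaps
```

Feed it the record IDs from the window around 02:12:55: a contiguous sequence points toward logging evasion rather than log tampering, and narrows the investigation accordingly.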
Try It — Apply the Methodology to a Single Question
Apply the five-step methodology to the question: "Did David Chen (INC-NE-2026-0915) use 7-Zip to archive files on his workstation between August 1 and September 15, 2026?"
Step 1 — Identify: Which artifacts answer this question? Map to specific artifact types. (Hint: execution artifacts for 7z.exe, filesystem artifacts for .7z file creation, user activity artifacts for 7-Zip interaction.)
Step 2 — Extract: What would you extract from the KAPE collection? (Hint: Prefetch files, Amcache.hve, NTUSER.DAT for UserAssist, $MFT for .7z file creation, USN Journal for .7z file operations.)
Step 3 — Parse: What tools would you run, and what records would you look for in the output? (Hint: PECmd for 7Z.EXE-.pf, AmcacheParser for 7z.exe SHA1, MFTECmd filtered for .7z files, MFTECmd USN Journal filtered for .7z extensions.)
Step 4 — Correlate: How would you cross-reference the execution evidence with the file creation evidence? (Hint: Prefetch last-run timestamp should correlate with MFT creation timestamp for .7z files and USN Journal FILE_CREATE entries.)
Step 5 — Conclude: Write the finding statement with confidence level, sources, limitations, and alternative explanations.
Compare your approach with the module's analysis methodology. The methodology in this exercise is the same one you will apply in WF3 (execution artifacts), WF4 (user activity artifacts), and WF13 (the complete insider threat investigation).
The myth: Experienced forensic examiners don't need a formal methodology — they know what to look for based on experience. Methodology documentation is bureaucratic overhead that slows down analysis without improving outcomes.
The reality: Methodology documentation is what separates a professional forensic examination from an expert's opinion. Without a documented methodology:
- another examiner cannot reproduce the analysis (reproducibility failure)
- the examiner cannot explain their process in testimony beyond "I looked at the evidence and reached this conclusion" (defensibility failure)
- there is no quality assurance mechanism to ensure completeness (quality failure)
- there is no way to determine, after the fact, whether a specific artifact was analyzed or overlooked (accountability failure)
The five-step methodology takes no longer to execute than unstructured analysis — it takes longer to document. That documentation is the deliverable that courts, regulators, and insurance assessors evaluate. An unstructured analysis that produces a correct finding is less valuable than a documented methodology that produces the same finding, because the documented version can be verified, reproduced, and defended.
Troubleshooting
"The five-step process seems slow for triage situations." The methodology scales to the context. For SOC triage: Identify (which artifact answers "is this a true positive?" — 30 seconds), Extract (the artifact is already in the EDR or SIEM — 0 seconds), Parse (query or tool output — 1 minute), Correlate (check one additional source — 2 minutes), Conclude (true positive/false positive — 30 seconds). Total: 4 minutes. For a court-facing examination: each step takes hours. The methodology is the same; the depth scales to the consequence of the findings.
"What if I find something unexpected during Step 3 that changes the investigation questions?" Good — that's how real investigations work. Document the new finding, add the new question to the investigation scope, and loop back to Step 1 for the new question. The methodology is iterative, not linear. A ransomware investigation may start with "how did the attacker get in?" and, during MFT analysis, discover evidence of data exfiltration that adds "was personal data exfiltrated?" to the scope. Document the scope change and continue.
"How detailed should the examination documentation be?" Detailed enough that a different examiner can reproduce your analysis from your notes. This means: the specific tool commands you ran (or screenshots), the specific filters you applied, the specific records you examined, the raw validation results, and the correlation logic. If the finding is "Prefetch proves 7z.exe executed on March 15 at 14:22:18" your notes should include the PECmd command, the relevant CSV row, the raw validation of the timestamp, and the corroborating USN Journal entry. For non-critical contextual analysis, a summary is sufficient.
You've built the foundations of artifact-level forensic analysis.
WF0 gave you the taxonomy, NTFS architecture, and the five-step methodology. WF1 took you inside the MFT at the binary level — every attribute, every timestamp, every edge case. From here, every artifact category gets the same raw-first treatment.
- WF2–WF10: every major Windows artifact decoded at binary level — USN Journal, Prefetch, Amcache, Shimcache, ShellBags, LNK, Jump Lists, SRUM, Event Logs, and the Registry hives
- INC-NE-2026-0915 (WF13) — Insider data exfiltration capstone. Work the complete investigation from USB history to OneDrive exfiltration evidence
- INC-NE-2026-1022 (WF14) — Ransomware capstone. Three-host triage (FIN01 → IT03 → FS01) across the 72-hour attack chain
- The lab pack — 25+ realistic evidence files in 10 formats, simulated KAPE triage pre-populated, both capstones deployable to your own VM
- Anti-forensic detection methodology — defeat timestomping, log clearing, and Prefetch deletion with cross-artifact correlation