The Artifact Analysis Methodology

Module 0
Operational Objective
Forensic artifact analysis without a methodology is data browsing. The examiner opens MFTECmd output, scrolls through rows, notices something interesting, follows it, notices something else, follows that — and produces a report that reflects what caught their attention rather than what the investigation required. Key artifacts are missed because the examiner's attention was captured by a different finding. Anti-forensic indicators are overlooked because the examiner was not systematically checking for them. Findings lack corroboration because the examiner did not cross-reference across artifact sources. The report cannot explain the methodology because there was no methodology — only exploration. This subsection defines the five-step artifact analysis methodology used in every module of this course: Identify → Extract → Parse → Correlate → Conclude. Each step has specific inputs, outputs, and quality gates. The methodology ensures that analysis is systematic (nothing is skipped), reproducible (another examiner following the same steps reaches the same conclusions), and defensible (the methodology can be explained in testimony).
Deliverable: The five-step artifact analysis methodology with specific procedures for each step, the quality gate between each step that prevents premature conclusions, the documentation requirements that make the methodology reproducible, and a worked example applying the methodology to a single forensic question.
Estimated completion: 30 minutes
THE FIVE-STEP ARTIFACT ANALYSIS METHODOLOGY

1. IDENTIFY: which artifacts answer this investigation question?
   Input: investigation question
   Output: artifact source list
   Gate: ≥2 independent sources identified before proceeding

2. EXTRACT: collect the raw artifacts, preserving integrity
   Input: artifact source list
   Output: extracted artifacts + hashes
   Gate: hash verification on all extracted files

3. PARSE: process with tools, validate critical findings against raw data
   Input: extracted artifacts
   Output: parsed data + validated findings
   Gate: critical findings raw-validated before proceeding

4. CORRELATE: cross-reference findings across independent sources
   Input: parsed data from ≥2 sources
   Output: corroborated findings
   Gate: critical findings have ≥2 independent sources

5. CONCLUDE: state the finding with confidence level and limitations
   Input: corroborated findings
   Output: finding + confidence + limits
   Gate: alternative explanations documented and assessed

Figure WF0.12 — The five-step artifact analysis methodology. Each step has defined inputs, outputs, and a quality gate that must be passed before proceeding. The methodology ensures systematic, reproducible, and defensible analysis.

Step 1: Identify — which artifacts answer this question?

Every analysis begins with a question — not with a tool. The investigation question determines which artifacts are relevant, which are primary, and which provide corroboration. Opening MFTECmd output before defining what you're looking for leads to data browsing. Defining the question first focuses the analysis.

The Identify step maps the investigation question to specific artifact categories using the artifact selection matrix from WF0.2 and the investigation scope from WF0.10. For the question "did the subject copy files to USB between March 1 and March 15?" the artifact mapping is: primary sources are USN Journal (FILE_CREATE events with parent references to removable media), MFT timestamps (files on the USB volume or copied from network to local), and SYSTEM registry USBSTOR (device connection history with timestamps). Corroborating sources are ShellBags (did the user navigate to the USB drive?), LNK files (did the user open files from the USB?), SRUM (data transfer volumes to the removable media device), and Event Logs (USB device connection events 20001/20003).

The quality gate for this step: at least two independent artifact sources must be identified as potentially answering the question before analysis proceeds. If only one source exists, the finding will be limited to a single-source confidence level — which should be documented in advance, not discovered at the end.
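A minimal sketch of how this gate can be enforced mechanically, using the USB question above. The structure and names are illustrative, not a course-provided tool:

```python
# Minimal sketch of the Step 1 gate: map the investigation question to
# artifact sources and refuse to proceed with fewer than two independent ones.
question = "Did the subject copy files to USB between March 1 and March 15?"

artifact_map = {
    "primary": ["USN Journal", "MFT timestamps", "SYSTEM registry USBSTOR"],
    "corroborating": ["ShellBags", "LNK files", "SRUM", "Event Logs 20001/20003"],
}
sources = artifact_map["primary"] + artifact_map["corroborating"]

# Quality gate: at least two independent sources before analysis proceeds.
# A single-source question is allowed only with the confidence limitation
# documented in advance.
if len(sources) < 2:
    raise RuntimeError(f"Step 1 gate failed for {question!r}: "
                       "document single-source confidence before proceeding")
print(f"Step 1 gate passed: {len(sources)} candidate sources identified")
```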

Step 2: Extract — collect the raw artifacts

The Extract step collects the specific artifacts identified in Step 1 from the evidence source. If the evidence is a KAPE collection, the artifacts may already be extracted — the $MFT, registry hives, Prefetch files, and Event Logs are in the KAPE output directory. If the evidence is a forensic image, mount the image read-only and extract the needed artifacts.

Document the extraction: what was extracted, from where (image name, partition, path), when (UTC), the extraction tool and version, and the hash of the extracted file. This documentation is your provenance chain — it traces every analysis result back to a specific location in the original evidence.

The quality gate: every extracted artifact file has a verified hash recorded in the examination log. If an artifact file is corrupted (bad hash, truncated, zero bytes), document the corruption and note which analysis capabilities are affected.
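A minimal sketch of the extraction record, assuming a hypothetical case layout. It hashes each artifact in chunks and emits the provenance fields listed above:

```python
# Minimal sketch of Step 2 documentation: hash an extracted artifact and
# build a provenance record. Paths, image name, and tool string are examples.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def record_extraction(artifact: Path, image: str, source_path: str, tool: str) -> dict:
    """Hash an extracted artifact in 1 MiB chunks and return a provenance record."""
    sha256 = hashlib.sha256()
    with artifact.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha256.update(chunk)
    entry = {
        "extracted_file": str(artifact),
        "source_image": image,           # image name
        "source_path": source_path,      # partition and path inside the image
        "extracted_utc": datetime.now(timezone.utc).isoformat(),
        "tool": tool,
        "sha256": sha256.hexdigest(),
    }
    # Gate: corruption (zero bytes, truncation) is documented, not silently skipped.
    if artifact.stat().st_size == 0:
        entry["note"] = "zero-byte file; affected analysis capabilities must be noted"
    return entry

print(json.dumps(record_extraction(
    Path(r"C:\case\extract\$MFT"),       # hypothetical case layout
    image="WKSTN-042.E01",               # hypothetical image name
    source_path=r"Partition 2, \$MFT",
    tool="KAPE (version as used)",
), indent=2))
```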

Step 3: Parse — process and validate

The Parse step processes raw artifacts into human-readable form using forensic tools, then validates critical findings against the raw data.

Run the appropriate tool: MFTECmd for the $MFT, PECmd for Prefetch files, SBECmd for ShellBags, EvtxECmd for Event Logs, RECmd or Registry Explorer for registry hives, SrumECmd for the SRUM database. Output to the case's output\ directory in CSV format for analysis in Timeline Explorer.

After parsing, identify the records that answer the investigation question. These are the critical records — the specific MFT entries, Prefetch files, ShellBag entries, or Event Log records that will support findings in the report. For each critical record, perform raw validation: open the original artifact in a hex editor, navigate to the record, and confirm the tool's interpretation matches the raw data. Document the validation result.

The quality gate: every critical finding has been raw-validated. Non-critical findings (contextual data, pattern analysis across thousands of records) do not require individual validation but should be spot-checked for consistency.
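Raw validation is mechanical once the bytes are copied out of the hex editor. A minimal sketch that decodes an NTFS FILETIME (a little-endian 64-bit count of 100-nanosecond ticks since 1601-01-01 UTC) and checks it against the tool's CSV value; the byte string here is illustrative:

```python
# Minimal sketch of raw validation for a single timestamp field.
import struct
from datetime import datetime, timedelta, timezone

def filetime_to_utc(raw8: bytes) -> datetime:
    """Decode 8 little-endian FILETIME bytes (100 ns ticks since 1601-01-01 UTC)."""
    (ticks,) = struct.unpack("<Q", raw8)
    # Integer division truncates the sub-microsecond (100 ns) remainder.
    return datetime(1601, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=ticks // 10)

# Illustrative bytes as copied from a hex editor; they decode to 2026-10-20 02:17:43 UTC.
raw = bytes.fromhex("80bd41303960dd01")
decoded = filetime_to_utc(raw)

# The value the tool reported for the same field (sub-second precision is
# often truncated in CSV output, hence the one-second comparison window).
tool_value = datetime(2026, 10, 20, 2, 17, 43, tzinfo=timezone.utc)

if abs((decoded - tool_value).total_seconds()) < 1:
    print(f"raw-validated: {decoded.isoformat()}")
else:
    print(f"MISMATCH: raw={decoded.isoformat()} tool={tool_value.isoformat()} - investigate")
```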

Step 4: Correlate — cross-reference across sources

The Correlate step is where single-source data points become multi-source evidence. For each critical finding from Step 3, identify the corroborating evidence from independent artifact sources and verify consistency.

Correlation works at the temporal level (do the timestamps from different sources agree within expected tolerances?), the entity level (do the file references, paths, and user identifiers from different sources refer to the same file, folder, or user?), and the logical level (do the findings from different sources tell a consistent narrative, or are there contradictions that need investigation?).

Conflicts between sources are informative. If the MFT timestamp says a file was created on January 15 but the USN Journal shows a FILE_CREATE entry on March 28, the conflict itself is a finding — it indicates either timestomping, a copy operation with timestamp preservation, or (rarely) a parsing error. Investigate the conflict before concluding.

The quality gate: every critical finding in the report is supported by at least two independent artifact sources, and any conflicts between sources have been investigated and resolved (or documented as unresolved with the impact on finding confidence stated).
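A minimal sketch of temporal correlation with conflict flagging, applied to the $SI-versus-USN scenario above; the specific clock times are illustrative:

```python
# Minimal sketch: compare one event's timestamp across sources and flag
# anything outside the expected agreement window.
from datetime import datetime, timezone

observations = {
    "MFT $SI Created": datetime(2026, 1, 15, 10, 0, 0, tzinfo=timezone.utc),
    "MFT $FN Created": datetime(2026, 3, 28, 2, 17, 43, tzinfo=timezone.utc),
    "USN FILE_CREATE": datetime(2026, 3, 28, 2, 17, 43, tzinfo=timezone.utc),
}
TOLERANCE_S = 60  # expected agreement window; justify per artifact pair

reference = observations["USN FILE_CREATE"]  # journal records are append-only
for name, ts in observations.items():
    delta = abs((ts - reference).total_seconds())
    status = ("agrees" if delta <= TOLERANCE_S
              else "CONFLICT: investigate (timestomping? copy with preserved timestamps?)")
    print(f"{name:16} {ts:%Y-%m-%d %H:%M:%S}Z  delta={delta:>10.0f}s  {status}")
```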

Deeper Context

The correlation step is where this course's approach differs most from tool-centric forensic practice. A tool-centric approach runs MFTECmd, reads the output, and reports what it found. This course's approach runs MFTECmd, validates critical records, then cross-references those records against the USN Journal (did the USN record the file operation at the same time?), the Prefetch (did the program that created the file execute at the same time?), the Event Logs (did the user authenticate at the same time?), and the ShellBags (did the user navigate to the relevant directory?). Each additional source that corroborates the finding increases confidence. Each source that contradicts reveals something the single-source analysis would have missed.

The time investment for correlation is proportional to the number of critical findings, not the number of artifacts. A typical investigation has 5-15 critical findings. Correlating each against 2-3 sources adds a few hours to the analysis. The return is findings that are substantially stronger — the difference between "MFTECmd says the file was created on March 28" (single source, tool-dependent) and "the file was created on March 28 as confirmed by MFT $FN timestamps, USN Journal FILE_CREATE entry, and Amcache first-execution record" (three independent sources, raw-validated).

Step 5: Conclude — state the finding with confidence

The Conclude step produces the finding statement: a precise claim about what the evidence proves, at what confidence level, supported by what sources, subject to what limitations, and having considered what alternative explanations.

A well-formed finding has five components: the claim (what the evidence proves — stated precisely), the confidence level (high, moderate-high, moderate, low — with rationale based on source reliability and corroboration), the evidence sources (specific artifacts, specific records, specific fields), the limitations (what the evidence does not prove, what artifacts were unavailable, what assumptions were made), and the alternative explanations (other interpretations of the evidence that were considered and assessed).

The quality gate: every finding has documented alternative explanations. This is not pro forma — the examiner genuinely considers how else the evidence could be interpreted and documents why the stated conclusion is more supported than the alternatives. This practice prevents confirmation bias and prepares the examiner for cross-examination, where the first question will be "is there an alternative explanation for this evidence?"
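A minimal sketch of a finding record that encodes the five components and the Step 5 gate; the field names mirror the prose above and are not a standard schema:

```python
# Minimal sketch of a well-formed finding and its quality gate.
from dataclasses import dataclass, field

@dataclass
class Finding:
    claim: str                  # what the evidence proves, stated precisely
    confidence: str             # high / moderate-high / moderate / low
    rationale: str              # why that confidence level
    sources: list = field(default_factory=list)       # specific artifacts and records
    limitations: list = field(default_factory=list)   # what the evidence does NOT prove
    alternatives: list = field(default_factory=list)  # considered and assessed

    def passes_gate(self) -> bool:
        # Gates: >=2 independent sources (from Step 4) and documented alternatives.
        return len(self.sources) >= 2 and len(self.alternatives) >= 1

finding = Finding(
    claim="The executable was placed on disk at 2026-10-20 02:17:43 UTC",
    confidence="high",
    rationale="three independent, raw-validated sources agree",
    sources=["MFT $FN Created", "USN FILE_CREATE", "Amcache entry"],
    limitations=["no Sysmon deployed; process lineage unavailable"],
    alternatives=["copy with preserved timestamps: rejected, USN FILE_CREATE agrees"],
)
assert finding.passes_gate(), "Step 5 gate failed: corroboration or alternatives missing"
```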

Decision point

You are at the "Correlate" step of the five-step methodology (Identify → Extract → Parse → Correlate → Conclude). Your MFT analysis shows a file created at 02:13:02. Your USN analysis shows the same file's FILE_CREATE at 02:13:02. Your Prefetch shows the tool that created it executed at 02:12:55 — 7 seconds before the file appeared on disk. All three sources are consistent.

But your Event Log 4688 shows NO process creation event for the tool at 02:12:55. Process creation auditing was enabled on this system.

Your options: (A) Three sources agree (MFT, USN, Prefetch) — the missing Event Log entry is an anomaly but doesn't change the conclusion. (B) Investigate the gap. Three sources confirming and one source silent requires explanation. Possible causes: the process was spawned by a technique that evades 4688 logging (direct syscall, process hollowing into an existing process), the Event Log was selectively manipulated (check for record ID gaps around 02:12:55), or the process creation event was in a different log (Sysmon Event 1 if deployed). The missing 4688 entry is itself a finding — it may indicate an evasion technique that warrants documentation.

The correct approach is B. Correlation means reconciling ALL sources, including absences. A missing record where one is expected is as informative as a present record where none is expected.
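The record-ID check from option B can be scripted. A minimal sketch that scans parsed Event Log output for gaps in the record-number sequence; the column name is an assumption based on typical EvtxECmd CSV exports, so adjust it to your actual headers:

```python
# Minimal sketch: find EventRecordID gaps that may indicate selective
# log manipulation. The "RecordNumber" column name is an assumption.
import csv

def record_gaps(csv_path: str) -> list:
    """Return (last_seen, next_seen) pairs where record numbers are not contiguous."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = sorted(int(r["RecordNumber"]) for r in csv.DictReader(f))
    return [(a, b) for a, b in zip(rows, rows[1:]) if b != a + 1]

for last, nxt in record_gaps(r"C:\case\output\Security.csv"):  # hypothetical path
    print(f"gap: records {last + 1}..{nxt - 1} missing - check TimeCreated around this range")
```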

Try It — Apply the Methodology to a Single Question

Apply the five-step methodology to the question: "Did David Chen (INC-NE-2026-0915) use 7-Zip to archive files on his workstation between August 1 and September 15, 2026?"

Step 1 — Identify: Which artifacts answer this question? Map to specific artifact types. (Hint: execution artifacts for 7z.exe, filesystem artifacts for .7z file creation, user activity artifacts for 7-Zip interaction.)

Step 2 — Extract: What would you extract from the KAPE collection? (Hint: Prefetch files, Amcache.hve, NTUSER.DAT for UserAssist, $MFT for .7z file creation, USN Journal for .7z file operations.)

Step 3 — Parse: What tools would you run, and what records would you look for in the output? (Hint: PECmd for 7Z.EXE-<hash>.pf, AmcacheParser for the 7z.exe SHA1, MFTECmd filtered for .7z files, MFTECmd USN Journal filtered for the .7z extension.)

Step 4 — Correlate: How would you cross-reference the execution evidence with the file creation evidence? (Hint: Prefetch last-run timestamp should correlate with MFT creation timestamp for .7z files and USN Journal FILE_CREATE entries.)

Step 5 — Conclude: Write the finding statement with confidence level, sources, limitations, and alternative explanations.
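To sanity-check your Step 4 reasoning, here is a minimal sketch of the execution-to-creation correlation; the timestamps are placeholders for values you would pull from the PECmd and MFTECmd CSVs:

```python
# Minimal sketch: does any 7z.exe run time fall shortly before a .7z file
# creation? All timestamps below are illustrative placeholders.
from datetime import datetime, timedelta, timezone

prefetch_runs = [datetime(2026, 8, 14, 9, 2, 11, tzinfo=timezone.utc)]   # from PECmd
mft_7z_created = [datetime(2026, 8, 14, 9, 2, 40, tzinfo=timezone.utc)]  # from MFTECmd
WINDOW = timedelta(minutes=5)  # archive creation expected shortly after launch

for created in mft_7z_created:
    hits = [run for run in prefetch_runs if timedelta(0) <= created - run <= WINDOW]
    verdict = "corroborated by execution evidence" if hits else "no matching run - investigate"
    print(f".7z created {created:%Y-%m-%d %H:%M:%S}Z: {verdict}")
```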

Compare your approach with the module's analysis methodology. The methodology in this exercise is the same one you will apply in WF3 (execution artifacts), WF4 (user activity artifacts), and WF13 (the complete insider threat investigation).

Compliance Myth: "A forensic methodology is just paperwork — the real skill is in the analysis"

The myth: Experienced forensic examiners don't need a formal methodology — they know what to look for based on experience. Methodology documentation is bureaucratic overhead that slows down analysis without improving outcomes.

The reality: Methodology documentation is what separates a professional forensic examination from an expert's opinion. Without a documented methodology: another examiner cannot reproduce the analysis (reproducibility failure), the examiner cannot explain their process in testimony beyond "I looked at the evidence and reached this conclusion" (defensibility failure), there is no quality assurance mechanism to ensure completeness (quality failure), and there is no way to determine, after the fact, whether a specific artifact was analyzed or overlooked (accountability failure). The five-step methodology takes no longer to execute than unstructured analysis — it takes longer to document. That documentation is the deliverable that courts, regulators, and insurance assessors evaluate. An unstructured analysis that produces a correct finding is less valuable than a documented methodology that produces the same finding, because the documented version can be verified, reproduced, and defended.

Troubleshooting

"The five-step process seems slow for triage situations." The methodology scales to the context. For SOC triage: Identify (which artifact answers "is this a true positive?" — 30 seconds), Extract (the artifact is already in the EDR or SIEM — 0 seconds), Parse (query or tool output — 1 minute), Correlate (check one additional source — 2 minutes), Conclude (true positive/false positive — 30 seconds). Total: 4 minutes. For a court-facing examination: each step takes hours. The methodology is the same; the depth scales to the consequence of the findings.

"What if I find something unexpected during Step 3 that changes the investigation questions?" Good — that's how real investigations work. Document the new finding, add the new question to the investigation scope, and loop back to Step 1 for the new question. The methodology is iterative, not linear. A ransomware investigation may start with "how did the attacker get in?" and, during MFT analysis, discover evidence of data exfiltration that adds "was personal data exfiltrated?" to the scope. Document the scope change and continue.

"How detailed should the examination documentation be?" Detailed enough that a different examiner can reproduce your analysis from your notes. This means: the specific tool commands you ran (or screenshots), the specific filters you applied, the specific records you examined, the raw validation results, and the correlation logic. If the finding is "Prefetch proves 7z.exe executed on March 15 at 14:22:18" your notes should include the PECmd command, the relevant CSV row, the raw validation of the timestamp, and the corroborating USN Journal entry. For non-critical contextual analysis, a summary is sufficient.

Decision point

You are at Step 4 (Correlate) of your analysis for the question "when was the ransomware executable first placed on the patient zero workstation?" You have three findings from Step 3: (1) MFT $FN Created timestamp for the ransomware executable shows 2026-10-20 02:17:43 UTC. (2) USN Journal FILE_CREATE entry for the same file shows 2026-10-20 02:17:43 UTC. (3) Amcache entry for the executable shows a timestamp of 2026-10-20 02:19:11 UTC. What is the correct correlation assessment?

(A) The three sources conflict because the Amcache timestamp (02:19:11) differs from the MFT and USN timestamps (02:17:43) by 88 seconds. This conflict undermines the finding — the examiner should report the discrepancy as unresolved and assign low confidence to the file creation time.

(B) The three sources are consistent and corroborate the finding. The MFT $FN Created and USN Journal FILE_CREATE agree precisely (02:17:43), confirming the file was created on disk at that time — two independent sources at high confidence. The Amcache timestamp is 88 seconds later (02:19:11), which is expected: Amcache is populated when the executable is first run or processed by the Application Compatibility subsystem, not when the file is created on disk. The 88-second gap between file creation and first execution is consistent with the attacker staging the file then executing it. The finding is: the ransomware executable was placed on disk at 02:17:43 UTC and first executed approximately 88 seconds later at 02:19:11 UTC — corroborated by three independent sources at high confidence.

(C) The MFT and USN timestamps are the same because they are derived from the same source (NTFS writes both simultaneously), so they count as one source, not two. The Amcache provides the only independent source. The finding has single-source confidence for creation time and needs additional corroboration — check Prefetch for execution timing.

(D) The Amcache timestamp is the most reliable because Amcache records are created by a protected system service. The MFT $FN timestamp could have been affected by file tunneling, and the USN Journal entry could have been injected. Use the Amcache timestamp as the authoritative file creation time.

The correct assessment is B. The MFT and the USN Journal are separate structures with separate manipulation modes, so they count as independent sources, and an Amcache timestamp later than on-disk creation reflects first execution, not a conflict.

You've built the foundations of artifact-level forensic analysis.

WF0 gave you the taxonomy, NTFS architecture, and the five-step methodology. WF1 took you inside the MFT at the binary level — every attribute, every timestamp, every edge case. From here, every artifact category gets the same raw-first treatment.

  • WF2–WF10: every major Windows artifact decoded at binary level — USN Journal, Prefetch, Amcache, Shimcache, ShellBags, LNK, Jump Lists, SRUM, Event Logs, and the Registry hives
  • INC-NE-2026-0915 (WF13) — Insider data exfiltration capstone. Work the complete investigation from USB history to OneDrive exfiltration evidence
  • INC-NE-2026-1022 (WF14) — Ransomware capstone. Three-host triage (FIN01 → IT03 → FS01) across the 72-hour attack chain
  • The lab pack — 25+ realistic evidence files in 10 formats, simulated KAPE triage pre-populated, both capstones deployable to your own VM
  • Anti-forensic detection methodology — defeat timestomping, log clearing, and Prefetch deletion with cross-artifact correlation