
TH1.15 Quality Assurance: Peer Review and Hunt Validation

3-4 hours · Module 1 · Free
Operational Objective
A hunt is only as good as its weakest query. A single misconfigured filter, a missed join condition, or a baseline contaminated by attacker activity can produce a false negative that masks a compromise. Quality assurance — reviewing hunts before they close and validating that the methodology was followed correctly — catches these errors before they become undetected intrusions. This subsection defines a lightweight QA process that improves hunt quality without creating bureaucratic overhead.
Deliverable: A peer review checklist for hunt records, a validation process for detection rules produced by hunts, and the discipline to review every hunt before marking it complete.
⏱ Estimated completion: 20 minutes

The error you do not catch

A hunt for inbox rule manipulation queries CloudAppEvents for New-InboxRule operations. The query returns zero results. The analyst concludes: "No malicious inbox rules in the last 30 days." The hunt record is filed. The detection rule is deployed.

But the attacker used the Graph API to create the inbox rule, not the Outlook client. The Graph API creation path produces a different operation name in CloudAppEvents — or may appear only in MicrosoftGraphActivityLogs. The query was correct for one creation method and blind to another. The conclusion was a false negative. The detection rule has the same blind spot.

// Detection rule validation: does the new rule cover what the hunt query covered?
// Compare the hunt query's results against the rule's results over the same time
// window. A minimal sketch: the table and operation names follow the inbox rule
// example above, and the "svc-" exclusion is a hypothetical stand-in for the
// rule's real exclusions.
let HuntWindow = 30d;
let HuntResults = CloudAppEvents
    | where Timestamp > ago(HuntWindow)
    | where ActionType == "New-InboxRule";
let RuleResults = CloudAppEvents
    | where Timestamp > ago(HuntWindow)
    | where ActionType == "New-InboxRule"
    | where not(AccountDisplayName startswith "svc-");  // hypothetical exclusion
// If the rule returns fewer results than the hunt query, its filters or
// exclusions may be too aggressive. If it returns more, the rule is less
// precise and may need additional exclusions from the FP analysis.
print HuntCount = toscalar(HuntResults | count),
      RuleCount = toscalar(RuleResults | count)

A 5-minute peer review would have caught this. "Did you check all inbox rule creation paths — Outlook, OWA, PowerShell, EWS, and Graph API?" The question identifies the gap. The analyst runs an additional query. The conclusion changes.

Three review points

Review point 1: Before the hunt — hypothesis and scope review. Before the first query runs, a second analyst (or the SOC lead) reviews the hypothesis and scope definition for 5 minutes.

Review questions: Is the hypothesis testable with the scoped data sources? Are all relevant data tables included? Is the time window appropriate for the technique? Is the population correctly defined? Are there technique variants that the scope should cover but does not?

This review catches the inbox rule example above — the reviewer asks about creation paths before the hunt starts, not after it concludes.
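To make the creation-path question concrete, here is a minimal sketch of the variant check in query form. The CloudAppEvents operation names (New-InboxRule, Set-InboxRule, UpdateInboxRules) and the Graph request pattern are assumptions to verify in your tenant, not an exhaustive list.

// Variant coverage check, run during scope review. Outlook, OWA, and
// PowerShell creation paths surface in CloudAppEvents (operation names
// below are assumptions to verify):
CloudAppEvents
| where Timestamp > ago(30d)
| where ActionType in ("New-InboxRule", "Set-InboxRule", "UpdateInboxRules")
| summarize Events = count() by ActionType, Application

// The Graph API path may appear only in MicrosoftGraphActivityLogs
// (rule creation is a POST to a messageRules endpoint; the URI pattern
// is an assumption):
MicrosoftGraphActivityLogs
| where TimeGenerated > ago(30d)
| where RequestMethod == "POST" and RequestUri contains "messageRules"
| summarize Requests = count() by UserId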

Review point 2: Before closing — hunt record review. Before the hunt is marked complete, a second analyst reviews the hunt record for completeness and methodology adherence.

Review questions: Were all four query funnel steps executed (orientation, indicator, enrichment, pivot)? Were false positives analyzed and documented? Were exclusions justified with specific evidence? Is the conclusion supported by the analysis? If the conclusion is "refuted" (no finding), does the scope cover the technique adequately — or could the technique have been present but invisible due to scope gaps?

Review point 3: Before deploying — detection rule review. Before the hunt-derived detection rule goes into production, a second analyst reviews the rule for correctness, exclusion appropriateness, and threshold calibration.

Review questions: Does the rule's time window and frequency avoid gaps? Are the exclusions from the FP analysis documented and justified? Could any exclusion be exploited by an attacker (e.g., excluding an IP range that an attacker could route through)? Is the entity mapping correct? Is the severity appropriate?
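One way to ground the exclusion questions is to measure what each exclusion actually suppresses. A minimal sketch follows, in which both exclusion conditions are hypothetical placeholders for whatever the rule really excludes.

// Count how many events each exclusion suppresses so the reviewer can judge
// whether it is justified, and whether an attacker could hide inside it.
// Both conditions below are hypothetical placeholders.
CloudAppEvents
| where Timestamp > ago(30d)
| where ActionType == "New-InboxRule"
| extend ExclusionHit = case(
    IPAddress startswith "10.20.", "corp-vpn-range",             // hypothetical
    AccountDisplayName startswith "svc-", "service-accounts",    // hypothetical
    "not excluded")
| summarize Events = count() by ExclusionHit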

The peer review checklist

THREE REVIEW POINTS — LIGHTWEIGHT QA FOR EVERY HUNT

  • BEFORE HUNTING — Hypothesis + scope review. All technique variants covered? All data sources included? ~5 minutes with a peer.
  • BEFORE CLOSING — Hunt record review. All funnel steps completed? Conclusion supported by evidence? ~10 minutes reviewing the record.
  • BEFORE DEPLOYING — Detection rule review. Exclusions justified? Threshold and mapping correct? ~5 minutes reviewing the rule.

Total QA overhead: ~20 minutes per hunt. Cost of missing an error: an undetected compromise.

Figure TH1.15 — Three review points for hunt quality assurance. Total time: approximately 20 minutes per hunt. The investment prevents false negatives that undermine the entire program.

For solo hunters

If your team has only one analyst who hunts, peer review is not available. Three adaptations:

Self-review with a checklist. Before closing a hunt, walk through the checklist below. Check each item honestly. The checklist compensates for the absence of a second perspective.

Time-delayed review. Complete the hunt. Wait 24 hours. Re-read the hunt record with fresh eyes. The overnight gap provides cognitive distance that catches errors you missed during the hunt session.

Periodic batch review. Every 3 months, review the last quarter's hunt records as a batch. Look for patterns — are you consistently missing certain data sources? Are your exclusions getting more permissive over time? Are your conclusions consistently reaching one outcome (always refuted, never confirmed)? Patterns in your own work reveal systematic biases that individual self-review misses.

Try it yourself

Exercise: Review your first hunt record

If you completed the exercises from TH1.1 through TH1.7, you have a hunt record. Walk through the peer review checklist below against your own record.

For each checklist item, answer honestly: did the hunt do this? If any item is unchecked, consider whether the gap could have affected the conclusion. If it could have, add a note to the hunt record documenting the gap and its potential impact.

This self-review exercise builds the QA habit that becomes automatic after a few campaigns.

Peer review in practice

Hunt peer review follows the same principle as code review: a second analyst reviews the hunt methodology, queries, and conclusions before the findings are published or acted upon. The reviewer checks: does the hypothesis match the data source queried? Do the KQL queries correctly implement the detection logic described in the hypothesis? Are the findings supported by the query results, or is the conclusion a stretch? Were alternative explanations considered? At NE, every hunt producing a positive finding goes through peer review before escalation. This prevents two failure modes: false confidence (the hunter sees what they expected to find rather than what the data shows) and missed context (the reviewer may know about a legitimate business process that explains the anomalous activity).

Peer review also builds team capability through knowledge transfer. The reviewer learns the hunter's analytical approach — the specific KQL patterns, the data source selection logic, the interpretation methodology. Over time, the team develops a shared analytical vocabulary and a library of validated patterns that any member can apply. This knowledge distribution is a resilience mechanism: when the team's primary hunter is unavailable, other analysts can execute hunts using the same validated methodology.

⚠ Compliance Myth: "Peer review slows down hunting — we should hunt fast and review later"

The myth: QA adds overhead that reduces the number of hunts completed. The priority is volume — more hunts means more coverage improvement.

The reality: A hunt with a methodology error — a missed data source, a contaminated baseline, an overly narrow scope — produces a false negative that is worse than no hunt at all. The false negative creates documented (but incorrect) assurance that a technique was searched for and not found. Future analysts may deprioritize that technique based on the false negative, allowing the compromise to persist. Twenty minutes of QA per hunt prevents errors that undermine the program's credibility and operational value. The priority is not volume — it is accuracy. Ten accurate hunts per year outperform twenty flawed hunts.

Extend this process

As the hunting program matures, the peer review process naturally evolves. Initial reviews focus on methodology compliance (did you follow the Hunt Cycle?). Mature reviews focus on analytical quality (did you interpret the data correctly? Did you miss a correlation? Could you have enriched further?). The highest-level reviews focus on strategic questions (are we hunting the right techniques? Is the backlog prioritized correctly? Are we improving coverage in the areas that matter most?). TH15 (Phase 3) covers mature review practices for established hunting programs.


References Used in This Subsection

  • Course cross-references: TH1.2 (scope — technique variant coverage), TH1.7 (hunt record template), TH1.6 (detection rule conversion), TH15 (mature review practices)

NE environmental considerations

NE's detection environment includes specific factors that influence how hunts are scoped and how hunt-derived detection rules operate:

Device diversity: 768 P2 corporate workstations with full Defender for Endpoint telemetry, 58 P1 manufacturing workstations with basic cloud-delivered protection, and 3 RHEL rendering servers with Syslog-only coverage. Rules targeting DeviceProcessEvents operate with full fidelity on P2 devices but may have reduced visibility on P1 devices. Manufacturing workstations in Sheffield and Sunderland represent a detection gap for endpoint-level detections.


Network topology: 11 offices connected via Palo Alto SD-WAN with full-mesh connectivity. The SD-WAN firewall logs feed CommonSecurityLog in Sentinel. Cross-site lateral movement generates firewall allow events that correlate with DeviceLogonEvents — enabling multi-source detection that single-table rules cannot achieve.
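A minimal sketch of that correlation, assuming Palo Alto CEF events populate DeviceAction in CommonSecurityLog and that a five-minute join window fits NE's log latency:

// Correlate SD-WAN firewall allows with network logons that follow them.
// The vendor string, action value, and 5-minute window are assumptions.
let FwAllows = CommonSecurityLog
    | where TimeGenerated > ago(1d)
    | where DeviceVendor == "Palo Alto Networks" and DeviceAction == "allow"
    | project FwTime = TimeGenerated, SourceIP, DestinationIP;
DeviceLogonEvents
| where Timestamp > ago(1d)
| where LogonType == "Network"
| join kind=inner (FwAllows) on $left.RemoteIP == $right.SourceIP
| where abs(datetime_diff("second", Timestamp, FwTime)) <= 300
| project Timestamp, DeviceName, AccountName, RemoteIP, DestinationIP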

User population: 810 users with distinct behavioral profiles — office workers (predictable hours, consistent applications), field engineers (variable hours, travel patterns), IT administrators (elevated privilege, broad access patterns), and manufacturing operators (fixed shifts, limited application access). Each user population has different detection baselines.
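Per-population baselining can be expressed directly in KQL. A minimal sketch, assuming a hypothetical Sentinel watchlist named UserPopulations that maps each UPN to one of the four populations:

// Sign-in activity by hour for each user population. The watchlist name and
// its Population column are hypothetical; substitute your own mapping.
let Populations = _GetWatchlist("UserPopulations")
    | project AccountUpn = tolower(tostring(SearchKey)), Population = tostring(Population);
SigninLogs
| where TimeGenerated > ago(30d)
| extend AccountUpn = tolower(UserPrincipalName)
| join kind=inner (Populations) on AccountUpn
| summarize SignIns = count() by Population, Hour = hourofday(TimeGenerated)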

Decision points

You have time for one hunt this quarter. Do you hunt for the threat in the latest advisory or for the gap in your ATT&CK coverage matrix?

Hunt the coverage gap. Advisories describe threats that are CURRENT but may not target NE. Coverage gaps describe techniques that COULD target NE and would succeed undetected. The coverage gap hunt produces a detection rule (closing the gap permanently). The advisory-driven hunt produces a point-in-time assessment (confirming the specific threat is not present today). Both are valuable — but the coverage gap hunt has a longer-lasting impact because it produces a permanent detection improvement.

A hunt query returns 200 results. You have 4 hours remaining in the hunt window. You can investigate 20 results thoroughly or review all 200 superficially. Which approach produces better hunt outcomes?

  • Review all 200 — you might miss a critical finding in the 180 you skip.
  • Investigate 20 thoroughly.
  • Investigate 20 — but only if they are from the most recent 24 hours.
  • Neither — refine the query first to reduce the result set below 50.

Investigate 20 thoroughly. A superficial review of 200 results produces 200 'looked at it, seemed okay' assessments that provide no investigative value and no documentation for future reference. A thorough investigation of 20 results produces: confirmed findings (true positives requiring remediation), confirmed benign patterns (documented baselines for future comparison), and inconclusive results (flagged for monitoring). Prioritise the 20 by highest anomaly score, highest-value assets involved, and highest-risk users involved, as in the ranking sketch below. Document why the remaining 180 were not investigated and recommend a follow-up hunt with refined query criteria to reduce the result set.
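A minimal ranking sketch for that prioritisation. Every name here is hypothetical: HuntResults stands for the 200-row result set, and AnomalyScore, DeviceTag, and UserRiskLevel stand for whatever scoring and tagging your results actually carry.

// Rank the full result set and take the top 20 for thorough investigation.
// All field names are hypothetical placeholders.
HuntResults
| extend Priority = AnomalyScore
    + iff(DeviceTag == "high-value", 50.0, 0.0)
    + iff(UserRiskLevel == "high", 50.0, 0.0)
| top 20 by Priority desc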

You understand the detection gap and the hunt cycle.

TH0 showed you what detection rules fundamentally cannot catch. TH1 gave you the hypothesis-driven methodology that closes that gap. Now you run the hunts.

  • 10 complete hunt campaigns — from hypothesis through KQL execution through finding disposition, each campaign based on a real TTP
  • 70 production hunt queries — every one mapped to MITRE ATT&CK and tested against realistic telemetry
  • Advanced KQL for hunting — UEBA composite risk scoring, retroactive IOC sweeps, and hunt management metrics
  • Hypothesis-Driven Hunt Toolkit lab pack — 30 days of realistic M365 and endpoint telemetry with multiple attack patterns seeded in
  • TH16 — Scaling hunts across a team — the operating model for a production hunt program