TH0.14 Hunting Program Metrics Dashboard

3-4 hours · Module 0 · Free
Operational Objective
A hunting program without metrics is a program that cannot demonstrate value, cannot identify where it is improving, and cannot justify continued investment. This subsection provides the KQL queries for a hunting program metrics dashboard — deployable in a Sentinel workbook on day one — that tracks the four metrics from TH0.7 plus operational health indicators.
Deliverable: A set of production-ready KQL queries that track hunting program effectiveness, deployable as a Sentinel workbook or run individually for quarterly reporting.
⏱ Estimated completion: 30 minutes

Measure what matters

TH0.7 defined four metrics: detection coverage gap closure rate, hunt discovery rate, dwell time compression, and MTTD trend. This subsection provides the KQL for each, plus three operational health metrics that tell you whether the program itself is functioning.

Metric 1: Detection coverage trend

Track this quarterly. The numerator comes from your Sentinel analytics rules with ATT&CK mappings. The denominator is your relevant technique set (defined once in TH3, updated annually).

// Detection coverage trend — run quarterly and record
// Numerator: distinct ATT&CK techniques with at least one rule
SecurityAlert
| where TimeGenerated > ago(90d)
| where ProviderName == "ASI Scheduled Alerts"
| extend TechArray = parse_json(tostring(
    parse_json(ExtendedProperties)["Techniques"]))
| where array_length(TechArray) > 0
| mv-expand Technique = TechArray
| summarize CoveredTechniques = dcount(tostring(Technique))
// Record this number quarterly alongside your denominator
// Plot: Q1: 22/95 = 23% → Q2: 28/95 = 29% → Q3: 34/95 = 36%
// The upward trend is directly attributable to hunt-derived rules
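
If you would rather have the query emit the percentage directly, wrap the same logic with your denominator. A minimal sketch, assuming the 95-technique relevant set from the plot above (substitute the count from your own TH3 technique set):

// Coverage percentage in one query
// RelevantTechniqueCount is a placeholder; use your TH3 set size
let RelevantTechniqueCount = 95;
SecurityAlert
| where TimeGenerated > ago(90d)
| where ProviderName == "ASI Scheduled Alerts"
| extend TechArray = parse_json(tostring(
    parse_json(ExtendedProperties)["Techniques"]))
| where array_length(TechArray) > 0
| mv-expand Technique = TechArray
| summarize CoveredTechniques = dcount(tostring(Technique))
| extend CoveragePct = round(100.0 * CoveredTechniques / RelevantTechniqueCount, 1)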

Metric 2: Hunt-derived detection rules

Count the detection rules that exist because a hunt produced them. This is the tangible output of the program.

// Hunt-derived detection rule inventory
SecurityAlert
| where TimeGenerated > ago(365d)
| where ProviderName == "ASI Scheduled Alerts"
| where AlertName startswith "HUNT-"
| summarize
    TotalAlerts = count(),
    FirstAlert = min(TimeGenerated),
    LastAlert = max(TimeGenerated),
    TruePositiveEstimate = countif(
        AlertSeverity in ("High", "Medium"))
    by AlertName
| extend DaysActive = datetime_diff(
    'day', now(), FirstAlert)
| sort by FirstAlert asc
// Each row = one detection rule produced by hunting
// DaysActive shows how long each rule has been in production
// TruePositiveEstimate (High+Medium alerts) indicates rule quality
// Target: +1 rule per month, growing over the program lifetime

Metric 3: Hunt discovery rate

What percentage of incidents were discovered through proactive hunting versus automated detection? This metric requires consistent labeling — tag hunt-discovered incidents with a "HUNT-" title prefix or a "hunt-discovered" label when escalating.

// Hunt discovery rate — proportion of incidents found by hunting
// SecurityIncident logs one row per update; keep the latest per incident
let ClosedIncidents = SecurityIncident
    | where TimeGenerated > ago(180d)
    | where Status == "Closed"
    | summarize arg_max(LastModifiedTime, *) by IncidentNumber;
let Total = toscalar(ClosedIncidents | count);
ClosedIncidents
| extend Discovery = iff(
    Title has "HUNT-" or tostring(Labels) has "hunt",
    "Proactive Hunt", "Automated Detection")
| summarize Count = count() by Discovery
| extend Percentage = round(100.0 * Count / Total, 1)
// Even 5% hunt discovery = 5% of incidents invisible without hunting
// Track quarterly — rate should remain stable or increase as
//   hunts target techniques with no automated detection

Metric 4: Dwell time by discovery source

Compare dwell time for hunt-discovered incidents versus rule-detected incidents. If hunting is working, hunt-discovered incidents should have shorter dwell times on average — because hunting found them before they would have been detected by other means.

// Dwell time comparison: hunt-found vs rule-found
SecurityIncident
| where TimeGenerated > ago(365d)
| where Status == "Closed"
// Keep the latest row per incident; the table logs every update
| summarize arg_max(LastModifiedTime, *) by IncidentNumber
| where isnotempty(FirstActivityTime)
| extend DwellDays = datetime_diff(
    'day', CreatedTime, FirstActivityTime)
| where DwellDays >= 0 and DwellDays < 365
| extend Discovery = iff(
    Title has "HUNT-" or tostring(Labels) has "hunt",
    "Proactive Hunt", "Automated Detection")
| summarize
    MedianDwell = percentile(DwellDays, 50),
    P90Dwell = percentile(DwellDays, 90),
    Count = count()
    by Discovery
// The comparison tells the story:
// Automated: median X days, P90 Y days
// Hunting: median A days, P90 B days
// If A < X, hunting is compressing dwell time as expected

Operational health: program cadence

Is the hunting program actually executing on schedule? Track hunts completed per month.

// Hunt program cadence — are hunts happening on schedule?
// This query assumes hunt-derived rules are named HUNT-THx-NNN
SecurityAlert
| where TimeGenerated > ago(365d)
| where ProviderName == "ASI Scheduled Alerts"
| where AlertName startswith "HUNT-"
| summarize FirstAlert = min(TimeGenerated) by AlertName
| extend RuleDeployMonth = startofmonth(FirstAlert)
| summarize NewRulesDeployed = dcount(AlertName)
    by RuleDeployMonth
| sort by RuleDeployMonth asc
// Each row = new hunt-derived rules deployed that month
// Target: ≥1 per month for a monthly cadence program
// Months with 0 = hunting did not produce a rule (or did not execute)
// Consistent gaps indicate the program is stalling

HUNTING PROGRAM METRICS DASHBOARD — SEVEN INDICATORS

VALUE METRICS (report to leadership)
1. Coverage trend: ___% → ___% (quarterly)
2. Hunt-derived rules: ___ deployed (cumulative)
3. Hunt discovery rate: ___% of incidents (6-month rolling)
4. Dwell time compression: ___ days hunt vs ___ days auto

HEALTH METRICS (internal monitoring)
5. Cadence adherence: ___ hunts/month vs target
6. Backlog depth: ___ hypotheses queued
7. Rule deployment rate: ___ days from hunt to rule

Value metrics justify the program to leadership. Health metrics tell you if the program is running. Track all seven. Report value metrics quarterly. Monitor health metrics monthly.

Figure TH0.14 — Hunting program metrics. Four value metrics for leadership reporting. Three health metrics for internal program management.
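
Health metrics 6 and 7 have no native table behind them. One option, if you keep the hypothesis backlog in a Sentinel watchlist, is to count it directly. A sketch, assuming a custom watchlist named HuntBacklog with a Status column (both names are assumptions, not built-ins):

// Backlog depth — counts queued hypotheses in a custom watchlist
// 'HuntBacklog' and its 'Status' column are assumed, not built-in
_GetWatchlist('HuntBacklog')
| where Status == "Queued"
| count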

Try it yourself

Exercise: Establish your baseline metrics

Run each of the five core metric queries in this subsection (coverage, rule inventory, discovery rate, dwell time, cadence) against your Sentinel workspace. Record the results as your baseline — the starting point before the hunting program begins (or the current state if hunting has already started).

If hunt-derived rules (HUNT-* naming) do not exist yet, metrics 2–5 will return empty results. That is your HMM0/HMM1 baseline. After executing your first three campaigns, re-run and compare.

If you want to deploy these as a persistent dashboard, create a Sentinel workbook with each query as a separate visualization. TH16 covers workbook creation in detail, but the queries above are ready to paste into workbook query tiles today.

Metrics that prove ROI

Three metrics justify hunting program investment to leadership: (1) Unique findings — threats discovered by hunting that no existing detection rule caught. Each unique finding represents a gap in automated detection that the hunting program filled. (2) Detection rules created from hunt findings — hunts that produce new analytics rules provide permanent defensive improvement, not just a one-time finding. (3) Mean time to detection improvement — measure MTTD for threats in categories where hunting operates versus categories without hunting coverage. If hunted threat categories show faster detection, the hunting program demonstrably accelerates the SOC's response capability.
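
A sketch of metric (3), assuming incidents carry ATT&CK techniques in AdditionalData and that HuntedTechniques lists the techniques your campaigns cover; both are assumptions worth verifying in your workspace before relying on the output:

// MTTD comparison: hunted vs. non-hunted technique categories
// HuntedTechniques is a placeholder list; substitute your campaign coverage
let HuntedTechniques = dynamic(["T1078", "T1110", "T1021"]);
SecurityIncident
| where TimeGenerated > ago(180d)
| where Status == "Closed"
| summarize arg_max(LastModifiedTime, *) by IncidentNumber
| where isnotempty(FirstActivityTime)
// the 'techniques' field inside AdditionalData is an assumption; verify your schema
| mv-expand Technique = AdditionalData.techniques
| extend Hunted = iff(tostring(Technique) in (HuntedTechniques),
    "Hunted category", "No hunting coverage")
| extend MTTDDays = datetime_diff('day', CreatedTime, FirstActivityTime)
| summarize MedianMTTDDays = percentile(MTTDDays, 50),
    Incidents = dcount(IncidentNumber)
    by Hunted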

The dashboard should avoid vanity metrics: hunts conducted per quarter (measures activity, not outcomes), hours spent hunting (measures input, not output), and data sources queried (measures scope, not effectiveness). A team conducting 20 hunts per quarter with 0 unique findings is less effective than a team conducting 4 hunts with 2 unique findings that became production detection rules.

The queries developed during this exercise become reusable templates in your personal hunting library. Parameterise the hardcoded values (user names, IP addresses, time windows) and add a header comment explaining the hypothesis each query tests. A mature hunting program maintains 50-100 parameterised query templates that any team member can execute — reducing the per-hunt preparation time from hours to minutes and ensuring consistent methodology across analysts.
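
A minimal template skeleton in that style; the rule ID, hypothesis, and thresholds below are illustrative, not a prescribed format:

// HUNT-TH2-003 | Hypothesis: a compromised account stages data via
// anomalous off-hours file writes (T1074). All values below are examples.
let lookback = 14d;              // parameter: hunt window
let target_user = "REPLACE_ME";  // parameter: account under investigation
let hours_start = 7;             // parameter: local business hours
let hours_end = 19;
DeviceFileEvents
| where TimeGenerated > ago(lookback)
| where InitiatingProcessAccountName == target_user
| where hourofday(TimeGenerated) !between (hours_start .. hours_end)
| summarize Files = count(), TotalBytes = sum(FileSize)
    by DeviceName, bin(TimeGenerated, 1h)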

Dashboard refresh cadence

The hunting metrics dashboard should update automatically from Sentinel data — not require manual data entry. Build the dashboard as a Sentinel workbook with parameterised KQL queries that pull hunt results, finding counts, and detection rule conversion rates directly from the workspace. Manual dashboards decay because the analyst responsible for updating them inevitably falls behind. Automated dashboards reflect reality because they query the same data the analysts work with every day.
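
In a workbook tile, the fixed ago() windows can give way to the workbook's time-range parameter, which is substituted at run time. The cadence query in that form, assuming you named the parameter TimeRange:

// Workbook tile version; {TimeRange} expands to a between() filter
SecurityAlert
| where TimeGenerated {TimeRange}
| where ProviderName == "ASI Scheduled Alerts"
| where AlertName startswith "HUNT-"
| summarize FirstAlert = min(TimeGenerated) by AlertName
| summarize NewRulesDeployed = dcount(AlertName)
    by RuleDeployMonth = startofmonth(FirstAlert)
| sort by RuleDeployMonth asc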

⚠ Compliance Myth: "Hunting metrics are only for internal use — auditors do not care about them"

The myth: Hunting program metrics are operational data. Auditors want policies and procedures, not KQL query outputs.

The reality: Auditors want evidence that controls are operating effectively. A hunting program with documented metrics — hunts completed, coverage improved, incidents discovered, detection rules produced — provides stronger evidence of proactive monitoring than a policy that says "we will conduct threat hunting" without proof of execution. The quarterly metrics report is audit evidence. The hunt records referenced by those metrics are audit evidence. The detection rules deployed from hunts are audit evidence. Metrics are not operational overhead — they are the proof that the program exists beyond a document.

Extend this dashboard

The metrics here are the minimum viable set. Organizations with mature hunting programs often add: hypothesis source distribution (which of the six sources generates the most productive hypotheses?), false positive rate for hunt-derived rules (are hunt-based rules better tuned than non-hunt rules?), analyst skill development tracking (which analysts produce the most findings per hunt hour?), and technique recurrence (do techniques found by hunting reappear after remediation?). Add these as the program matures and the baseline metrics stabilize. Start with the seven described here.
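
For example, the false positive rate for hunt-derived rules can come straight from incident closure classifications, assuming the HUNT- naming carries through to incident titles:

// FP rate for hunt-derived rules — relies on closure classifications
SecurityIncident
| where TimeGenerated > ago(180d)
| where Status == "Closed"
| summarize arg_max(LastModifiedTime, *) by IncidentNumber
| where Title has "HUNT-"
| summarize Closed = count(),
    FalsePositives = countif(Classification == "FalsePositive")
| extend FPRatePct = round(100.0 * FalsePositives / Closed, 1)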


NE environmental considerations

NE's detection environment includes specific factors that influence the detection rules these metrics measure:

Device diversity: 768 P2 corporate workstations with full Defender for Endpoint telemetry, 58 P1 manufacturing workstations with basic cloud-delivered protection, and 3 RHEL rendering servers with Syslog-only coverage. Rules targeting DeviceProcessEvents operate with full fidelity on P2 devices but may have reduced visibility on P1 devices. Manufacturing workstations in Sheffield and Sunderland represent a detection gap for endpoint-level detections.

Network topology: 11 offices connected via Palo Alto SD-WAN with full-mesh connectivity. The SD-WAN firewall logs feed CommonSecurityLog in Sentinel. Cross-site lateral movement generates firewall allow events that correlate with DeviceLogonEvents — enabling multi-source detection that single-table rules cannot achieve.
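
A sketch of that correlation; the vendor string, logon type, and five-minute window are assumptions to verify against your own CommonSecurityLog data:

// Cross-site lateral movement: firewall allows joined to network logons
CommonSecurityLog
| where TimeGenerated > ago(1d)
| where DeviceVendor == "Palo Alto Networks"   // assumed vendor string
| project FwTime = TimeGenerated, SourceIP, DestinationIP
| join kind=inner (
    DeviceLogonEvents
    | where TimeGenerated > ago(1d)
    | where LogonType == "Network"
    | project LogonTime = TimeGenerated, RemoteIP, DeviceName, AccountName
  ) on $left.SourceIP == $right.RemoteIP
| where datetime_diff('second', LogonTime, FwTime) between (0 .. 300)
| project FwTime, LogonTime, SourceIP, DestinationIP, DeviceName, AccountName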

User population: 810 users with distinct behavioral profiles — office workers (predictable hours, consistent applications), field engineers (variable hours, travel patterns), IT administrators (elevated privilege, broad access patterns), and manufacturing operators (fixed shifts, limited application access). Each user population has different detection baselines.
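
A sketch of what those baselines can look like, assuming a custom watchlist (here called UserPopulations, hypothetical) that maps each account to one of the four groups:

// Logon-hour profile per user population
// 'UserPopulations' watchlist with AccountName and Population columns is assumed
let Populations = _GetWatchlist('UserPopulations')
    | project AccountName, Population;
DeviceLogonEvents
| where TimeGenerated > ago(30d)
| where ActionType == "LogonSuccess"
| join kind=inner Populations on AccountName
| summarize Logons = count()
    by Population, HourOfDay = hourofday(TimeGenerated)
| sort by Population asc, HourOfDay asc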

Decision point

Your hunt found a confirmed threat. The finding should become a detection rule. Do you build the rule yourself or hand it to the detection engineering team?

Hand it to the detection engineering team with a complete handoff: the KQL query, the entity mapping, the expected FP patterns you observed during the hunt, the severity recommendation, and the suggested response action. The hunter's expertise is in hypothesis generation and data exploration. The detection engineer's expertise is in rule optimization, FP management, and production deployment. The handoff template ensures the detection engineer has everything they need to build a production-quality rule without re-investigating the finding.

A hunt query returns 200 results. You have 4 hours remaining in the hunt window. You can investigate 20 results thoroughly or review all 200 superficially. Which approach produces better hunt outcomes?

  • Review all 200 — you might miss a critical finding in the 180 you skip.
  • Investigate 20 thoroughly.
  • Investigate 20 — but only if they are from the most recent 24 hours.
  • Neither — refine the query first to reduce the result set below 50.

Investigate 20 thoroughly. A superficial review of 200 results produces 200 'looked at it, seemed okay' assessments that provide no investigative value and no documentation for future reference. A thorough investigation of 20 results produces: confirmed findings (true positives requiring remediation), confirmed benign patterns (documented baselines for future comparison), and inconclusive results (flagged for monitoring). Prioritise the 20 by: highest anomaly score, highest-value assets involved, and highest-risk users involved. Document why the remaining 180 were not investigated and recommend a follow-up hunt with refined query criteria to reduce the result set.

You understand the detection gap and the hunt cycle.

TH0 showed you what detection rules fundamentally cannot catch. TH1 gave you the hypothesis-driven methodology that closes that gap. Now you run the hunts.

  • 10 complete hunt campaigns — from hypothesis through KQL execution through finding disposition, each campaign based on a real TTP
  • 70 production hunt queries — every one mapped to MITRE ATT&CK and tested against realistic telemetry
  • Advanced KQL for hunting — UEBA composite risk scoring, retroactive IOC sweeps, and hunt management metrics
  • Hypothesis-Driven Hunt Toolkit lab pack — 30 days of realistic M365 and endpoint telemetry with multiple attack patterns seeded in
  • TH16 — Scaling hunts across a team — the operating model for a production hunt program