DE0.8 Measuring What Matters

2-3 hours · Module 0 · Free
Operational Objective
The Metrics Problem: Detection engineering programs fail when they cannot demonstrate value. The CISO asks "are we better than last quarter?" The CFO asks "what did the detection investment produce?" Without metrics, the answers are anecdotal. This subsection defines the six metrics that quantify detection program effectiveness, explains how to measure each one in a Microsoft environment, and shows how to construct the quarterly report that justifies continued investment.
Deliverable: The six detection program metrics, how to calculate each from Sentinel data, and the reporting template for leadership communication.
⏱ Estimated completion: 20 minutes

The six metrics

Detection engineering produces measurable outcomes. Unlike many security investments that require faith (“we deployed this tool and nothing bad happened — was it the tool or were we not targeted?”), detection engineering provides concrete before-and-after measurements. The six metrics below are the minimum set that a detection program should track.

DETECTION PROGRAM METRICS — THE SIX THAT MATTER

1. Coverage %: techniques detected / relevant techniques. NE baseline: 10.3% raw (30% filtered); target: 80% filtered. Measures breadth of detection capability.
2. MTTD: mean time to detect (technique execution to alert). NE baseline: unknown; target: <5 min for critical, <30 min for medium. Measures speed of detection.
3. TP rate: true positives / total alerts per rule. NE baseline: ~40-60% on templates; target: >70%. Measures accuracy of detection.
4. FP rate: false positives / total alerts per rule. NE baseline: ~40-60%; target: <30%. Measures noise (the analyst fatigue driver).
5. Rules per analyst: active rules / SOC analyst headcount. NE baseline: 23/2 = 11.5; sustainable: ~30-50. Measures operational sustainability.
6. Cost per detection: Sentinel monthly cost / TP incidents detected. NE: $2,700/mo ingestion ÷ TP count = $/detection. Measures value for money.

Top row (coverage, speed, accuracy) answers "Are we detecting threats?" Bottom row (noise, capacity, cost) answers "Is the program sustainable?"

Figure DE0.5 — The six detection program metrics. Top row measures detection effectiveness. Bottom row measures operational sustainability. Both are required for a complete program assessment.

Metric 1: Detection coverage percentage

Coverage is the headline metric. It answers the question every CISO and every auditor wants answered: “What proportion of relevant attack techniques can we detect?”

The calculation is straightforward: distinct ATT&CK techniques with at least one confirmed detection rule divided by the total relevant technique set for your environment. Northgate Engineering’s baseline: 15 confirmed techniques / 145 relevant = 10.3%.
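The arithmetic can be sketched in a few lines, using Northgate's baseline numbers from this subsection:

```python
# Coverage %: confirmed techniques divided by the relevant (filtered) technique set.
# Counts are Northgate Engineering's baseline from this subsection.
confirmed_techniques = 15
relevant_techniques = 145

coverage_pct = 100 * confirmed_techniques / relevant_techniques
print(f"Coverage: {coverage_pct:.1f}%")  # prints "Coverage: 10.3%"
```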

The target is not 100%. Full ATT&CK coverage is neither achievable nor necessary. Some techniques are prevented by controls (conditional access blocks the sign-in before detection is needed). Some techniques are not relevant to your environment. Some techniques cannot be detected with available telemetry. A mature detection program targets 70-80% coverage of the RELEVANT technique set (the filtered subset identified through threat modeling in DE2), with concentration in the tactics that cause the most damage: Initial Access, Credential Access, Persistence, and Lateral Movement. Against the full ATT&CK matrix, this typically corresponds to 25-35% — a number that sounds low but represents comprehensive coverage of the techniques that actually matter for your environment.

Coverage is measured quarterly using the ATT&CK Navigator. Export your Sentinel analytics rules, map each to its ATT&CK technique, and generate a Navigator layer. The visual shows where you are strong and where the gaps remain. The quarterly report to leadership includes this visualization with a comparison to the previous quarter.

// Coverage assessment: list all analytics rules with ATT&CK mappings
SecurityAlert
| where TimeGenerated > ago(90d)
| where ProviderName == "ASI Scheduled Alerts"
| extend Techniques = parse_json(tostring(parse_json(ExtendedProperties).Techniques))
| mv-expand Technique = Techniques
| summarize
    RuleCount = dcount(AlertName),
    Rules = make_set(AlertName)
    by tostring(Technique)
| sort by RuleCount desc
// Compare output against your relevant technique set
// Techniques with RuleCount > 0 = covered
// Techniques absent = gap

Metric 2: Mean Time to Detect (MTTD)

MTTD measures how quickly the detection library identifies a threat after it begins. The measurement starts when the attack technique executes (the telemetry timestamp) and ends when the alert fires (the alert creation timestamp). The difference is the detection delay.

For scheduled rules, MTTD includes the rule’s query frequency. A rule that runs every 5 minutes with a 5-minute lookback has a maximum detection delay of 5 minutes (the event occurred just after the last query ran) and an average delay of 2.5 minutes. A rule that runs every hour has an average delay of 30 minutes. NRT rules have an average delay under 1 minute.
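The expected scheduling delay follows directly from the query frequency: an event lands uniformly at random within the query interval, so the mean wait before the next run is half the frequency. A minimal sketch:

```python
def avg_detection_delay(frequency_minutes: float) -> float:
    """Expected scheduling delay for a scheduled rule: events arrive
    uniformly within the query interval, so the mean wait is half
    the frequency. (Excludes query execution and triage time.)"""
    return frequency_minutes / 2

print(avg_detection_delay(5))   # 5-minute rule -> 2.5 minutes average
print(avg_detection_delay(60))  # hourly rule   -> 30.0 minutes average
```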

MTTD varies by rule type and should be measured per severity. Critical severity rules (pre-encryption ransomware indicators, confirmed credential compromise) should target MTTD under 5 minutes — which requires NRT rules or 5-minute scheduled rules. Medium severity rules (suspicious inbox rule, anomalous sign-in properties) can tolerate 15-30 minute MTTD.

Metric 3: True positive rate

The true positive rate measures detection accuracy: of all alerts generated by a rule, what percentage were actual security events requiring action? A rule with a 90% TP rate generates 9 actionable alerts for every 1 false positive. A rule with a 30% TP rate generates 3 actionable alerts for every 7 false positives — the analyst spends more time dismissing noise than investigating threats.

TP rate is measured per rule, not as a program average. A program average of 70% TP rate could mean every rule performs at 70% (healthy) or it could mean half the rules are at 95% and the other half are at 45% (the bottom half needs tuning). Per-rule TP rate identifies which specific rules need attention.

The minimum acceptable TP rate depends on the rule’s severity and the cost of missing the technique. A high-severity rule detecting ransomware pre-encryption indicators can tolerate a 50% TP rate because every true positive prevents catastrophic damage. A low-severity rule detecting “unusual PowerShell execution” at a 50% TP rate is generating unsustainable noise. Target 70%+ for medium severity and 50%+ for high/critical severity.
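Those severity-dependent thresholds can be expressed as a small tuning check. The alert counts below are hypothetical, purely illustrative:

```python
def tp_rate(tp: int, fp: int) -> float:
    """Fraction of a rule's alerts that were actionable."""
    return tp / (tp + fp)

# Minimum acceptable TP rate by severity, per the targets in the text:
# 70%+ for medium, 50%+ for high/critical.
MIN_TP = {"medium": 0.70, "high": 0.50, "critical": 0.50}

def needs_tuning(tp: int, fp: int, severity: str) -> bool:
    """Flag a rule whose per-rule TP rate falls below its severity floor."""
    return tp_rate(tp, fp) < MIN_TP[severity]

# Hypothetical tallies: a high-severity rule at ~55% passes,
# a medium-severity rule at ~45% is a tuning candidate.
print(needs_tuning(6, 5, "high"))     # False
print(needs_tuning(10, 12, "medium")) # True
```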

Metric 4: False positive rate

The complement of the TP rate, tracked separately because the remediation action is different. TP rate tells you how effective the rule is. FP rate tells you how much analyst time the rule wastes.

FP rate drives tuning decisions. A rule with a 60% FP rate is a tuning priority — it wastes more analyst time than it saves. A rule with a 10% FP rate is healthy. The monthly tuning review (DE9) identifies rules with FP rates above threshold and schedules them for refinement.

FP classification matters. Not all false positives are the same. A “benign true positive” (the rule correctly detected the activity, but the activity was authorized — an IT admin running recon commands during a patching operation) requires a watchlist exclusion for the IT admin’s account. An “environmental FP” (the rule fires on normal activity unique to your environment — manufacturing USB usage triggering removable media alerts) requires a filter specific to your environment. A “logic FP” (the KQL is wrong — the threshold is too low or the join condition is too broad) requires a rule rewrite. DE9 teaches this classification in depth.

⚠ Compliance Myth: "A high false positive rate means the rule is too sensitive — we should raise the threshold"

The myth: False positives are solved by increasing thresholds. If a password spray rule fires at 5 failed logins per hour and generates FPs, raise it to 20.

The reality: Raising the threshold reduces false positives AND true positives. An attacker doing a low-and-slow spray at 8 attempts per hour now falls below the threshold. The correct approach depends on the FP type. If the FPs are benign true positives (IT admins triggering the rule), the fix is a watchlist exclusion for IT admin accounts — not a threshold increase. If the FPs are environmental (a specific application generates authentication failures during health checks), the fix is a filter for that application’s service principal. Threshold adjustment is the fix for logic FPs where the base rate of the event is higher than expected. DE9 teaches how to diagnose which fix applies.
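The blind spot in the myth is easy to show numerically. Using the example figures from the callout (thresholds of 5 and 20 failed sign-ins per hour, an attacker pacing at 8):

```python
# Raising a password-spray threshold trades false positives for a blind spot.
old_threshold = 5      # failed sign-ins per hour (original rule)
new_threshold = 20     # "less sensitive" rule after the myth's fix
low_and_slow_rate = 8  # attacker pacing from the example above

fires_before = low_and_slow_rate >= old_threshold  # True: spray detected
fires_after = low_and_slow_rate >= new_threshold   # False: spray now invisible
print(fires_before, fires_after)  # prints "True False"
```

The right fix targets the FP source (watchlist exclusion, environmental filter) so true positives at low rates keep firing.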

Metrics 5 and 6: Sustainability and cost

Rules per analyst measures whether the detection library is operationally sustainable. Each rule that fires generates alerts that require triage. A team of 2 SOC analysts (Northgate’s L1 staff: Tom and Priya) can sustainably triage alerts from approximately 30-50 well-tuned rules (TP rate >70%) during a standard shift. If the detection library grows to 100 rules but the SOC team stays at 2, either alert fatigue degrades response quality or lower-severity rules get deprioritized. This metric ensures the detection library grows in proportion to the team’s capacity to operate it.

Cost per detection divides the monthly Sentinel ingestion cost by the number of true positive incidents detected. Northgate’s ingestion is approximately $2,700/month. If the detection library produces 30 true positive incidents per month, the cost per detection is $90. If the same ingestion produces 5 true positive incidents, the cost per detection is $540. This metric demonstrates value — the CFO understands “$90 per threat detected” better than “we deployed 15 new analytics rules.” As the detection library grows and produces more true positives from the same ingestion volume, cost per detection decreases — the investment becomes more efficient over time.
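The same division, using the Northgate figures above:

```python
# Cost per detection: monthly ingestion spend over true positive incidents.
MONTHLY_INGESTION_USD = 2700  # Northgate's approximate Sentinel ingestion cost

def cost_per_detection(tp_incidents: int) -> float:
    return MONTHLY_INGESTION_USD / tp_incidents

print(cost_per_detection(30))  # 30 TPs/month -> $90.00 per threat detected
print(cost_per_detection(5))   # 5 TPs/month  -> $540.00 per threat detected
```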

Try it yourself

Exercise: Calculate your baseline metrics

If you have access to a Sentinel workspace with analytics rules in production:

1. Count your active analytics rules. Map each to ATT&CK techniques. Calculate coverage %.

2. Run the following query to measure alert volume by rule over the past 30 days:

SecurityAlert
| where TimeGenerated > ago(30d)
| summarize AlertCount = count() by AlertName
| sort by AlertCount desc

3. For the top 5 noisiest rules, estimate the FP rate by reviewing the last 10 alerts from each rule. How many were true positives? How many were false positives?

If you do not have access, use the Northgate Engineering baselines from this subsection for the course exercises.

The quarterly report

Every quarter, the detection engineer produces a report for the CISO (and through the CISO, for the CFO and board). The report contains: coverage percentage with ATT&CK heatmap (before and after), new rules deployed this quarter, rules tuned this quarter, MTTD by severity tier, TP rate by rule category, FP rate trend, cost per detection, and the detection backlog status (what is planned for next quarter).

This report is the deliverable that sustains the program. Without it, detection engineering is invisible work — rules get built, alerts fire, analysts triage, but nobody outside the security team sees the improvement. With it, the detection engineering investment has a measurable return and a clear trajectory.

DE11 (the capstone) produces this report for Northgate Engineering. You will build it from the rules you develop across the course, populate it with metrics from the alert-simulator exercises, and present it as the program’s 90-day outcome.

Check your understanding

Your CISO asks: "We spent $32,400 on Sentinel ingestion last year. What did we get for that?" How do you answer using detection program metrics?

Answer: "In the past 12 months, our detection library generated [X] true positive incidents — threats that were identified and investigated because our rules detected them. That is a cost per detection of $32,400 / [X] = $[Y] per threat detected. Our coverage expanded from [A]% to [B]% of relevant ATT&CK techniques. Our MTTD improved from [C] to [D] minutes for critical alerts. Without these rules, these threats would have gone undetected until damage occurred — at an average incident cost of $[Z] per breach for our industry. The detection investment prevented estimated losses of $[X × Z] against a cost of $32,400." The answer uses metrics to translate security investment into business value.

Troubleshooting: “We do not track TP/FP rates”

This is normal at Level 1 maturity. Most organizations do not track TP/FP rates per rule because they do not have a systematic triage process that classifies alert outcomes. Analysts close alerts without recording whether they were TP, FP, or benign TP.

The minimum viable tracking: Add a comment or tag to each Sentinel incident when closing it: “TP”, “FP”, or “BTP” (benign true positive). This takes 2 seconds per incident and provides the data needed to calculate rates per rule. After 30 days, run a query against SecurityIncident that groups by AlertProductNames and classification to produce per-rule rates.
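The per-rule rate calculation from those closure tags is simple tallying. A minimal sketch, with a hypothetical closure log standing in for the tagged Sentinel incidents:

```python
from collections import Counter

# Hypothetical closure log: (rule name, closing tag) pairs an analyst
# recorded over 30 days. Tags: TP, FP, BTP (benign true positive).
closures = [
    ("Password spray detected", "TP"),
    ("Password spray detected", "FP"),
    ("Password spray detected", "BTP"),
    ("Suspicious inbox rule", "TP"),
    ("Suspicious inbox rule", "TP"),
]

# Tally classifications per rule.
tallies: dict[str, Counter] = {}
for rule, tag in closures:
    tallies.setdefault(rule, Counter())[tag] += 1

# Report per-rule rates.
for rule, counts in tallies.items():
    total = sum(counts.values())
    print(f"{rule}: TP {counts['TP'] / total:.0%}, "
          f"FP {counts['FP'] / total:.0%}, BTP {counts['BTP'] / total:.0%}")
```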

The better approach (DE10): Automate classification tracking via automation rules that prompt the analyst for classification at incident closure and log the classification to a custom table. The monthly metrics query runs against this table automatically.


References used in this subsection

  • Course cross-references: DE2 (coverage assessment methodology), DE9 (tuning and FP management), DE10 (program operations and reporting), DE11 (capstone — full quarterly report)

You're reading the free modules of Detection Engineering

The full course continues with advanced topics, production detection rules, worked investigation scenarios, and deployable artifacts. Premium subscribers get access to all courses.
