Is Your Detection Program Effective? And How Would You Know?

Ask a security team whether their detection program is effective and you'll get one of three answers. "We have 47 active rules" — which tells you nothing about what those rules catch. "We haven't had a breach" — which is either luck, good detection, or an undetected compromise. "I think so?" — which is the honest answer for most organizations, and the one that should concern you.

An effective detection program is not one that has rules. It's one that can prove, with data, that it catches the attacks that matter for its specific environment.

What an effective detection program actually looks like

Three characteristics separate a program from a rule collection.

It's measured. The team knows their ATT&CK coverage percentage — not their rule count. They know which techniques they detect, which they don't, and why the gaps exist. When the CISO asks "are we protected against the attack that hit our competitor?" the answer is a coverage map, not a reassurance.

It's maintained. A monthly tuning cadence classifies every false positive by root cause and applies targeted fixes. Rules that haven't fired in 90 days get investigated — is the technique absent, or is the data source broken?
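The 90-day question comes down to comparing two timestamps per rule: when the rule last fired, and when its data source last ingested anything. Below is a minimal sketch. It assumes you can export both timestamps into plain records. The `SilentRule` shape, the 24-hour staleness threshold, and the labels are illustrative choices, not a standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class SilentRule:
    name: str
    last_fired: Optional[datetime]          # None if the rule has never fired
    source_last_ingest: Optional[datetime]  # newest event in the rule's data source


def triage(rule: SilentRule, now: datetime,
           silent_after: timedelta = timedelta(days=90),
           stale_after: timedelta = timedelta(hours=24)) -> str:
    """Answer the tuning question: is the data source broken, or the technique absent?"""
    if rule.last_fired is not None and now - rule.last_fired < silent_after:
        return "active"
    # No recent events in the source table means the rule cannot fire at all.
    if rule.source_last_ingest is None or now - rule.source_last_ingest > stale_after:
        return "data source broken"
    # Data is flowing but nothing matched. Confirm with an attack test before
    # concluding the technique is absent rather than the rule logic being wrong.
    return "technique absent"


now = datetime(2026, 1, 31)
print(triage(SilentRule("Password spray", None, now - timedelta(days=40)), now))
# data source broken
```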

It's engineered. Every rule starts with a hypothesis, is documented in a specification, is tested against attack data, and is deployed through a versioned pipeline. Detection-as-code is the operational model that makes this sustainable. It has been championed by Florian Roth (co-creator of Sigma and author of widely used YARA rule sets) and by the SpecterOps team.

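Here's a quick test. Run it against your SIEM, whether that's Sentinel, Splunk, Elastic, or something else; the concept is the same on every platform. It counts the distinct ATT&CK techniques tagged on alerts from the last 90 days. Treat the result as a lower bound, because a rule that exists but never fired won't appear. This is the Sentinel KQL version: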

KQL — Sentinel

// How many distinct ATT&CK techniques have your rules alerted on in the last 90 days?
SecurityAlert
| where TimeGenerated > ago(90d)
| where Status != "Dismissed"
| mv-expand technique = todynamic(ExtendedProperties).["Techniques"]
| where isnotempty(technique)
| summarize CoveredTechniques = dcount(tostring(technique))

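The Splunk equivalent, assuming your notable events carry a mitre_technique field (the field name depends on how your correlation searches are annotated):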

SPL — Splunk

index=notable earliest=-90d
| mvexpand mitre_technique
| stats dc(mitre_technique) as CoveredTechniques

If your number is under 20, you're in the majority. Most organizations that have calculated their coverage land between 5% and 15%. The ATT&CK Enterprise matrix contains 222 techniques across 14 tactics. Your relevant set, filtered for your industry, your technology stack, and your threat actors, is typically 80–150 techniques. A coverage number of 15 against a relevant set of 145 is 10.3%.

That number is not a failing grade. It's a starting point. And it's more than most organizations have — because most organizations have never calculated it.

The skills that make detection programs work

Detection engineering is not a tool skill. Knowing KQL, SPL, or Lucene is necessary but not sufficient — the same way knowing Python doesn't make someone a software engineer. Five skills separate an effective detection engineer from someone who writes queries.

Threat modeling for detection

Which attack techniques are relevant to your organization? Not all 222 — the subset your threat actors use, your technology stack enables, and your critical assets make impactful. The MITRE ATT&CK framework provides the taxonomy. Threat intelligence from Mandiant, CrowdStrike, CISA, and sector ISACs provides the actor profiles. The detection engineer bridges the two.
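At its simplest, bridging the two is set arithmetic. Take the union of the techniques attributed to the actors that target you, then keep only the ones that apply to platforms you actually run. This is a minimal sketch. The actor names, the technique-to-platform mapping, and the platform labels are hypothetical placeholders for data you would pull from ATT&CK and your threat intelligence feeds.

```python
# Hypothetical inputs: in practice these come from ATT&CK data and your CTI.
ACTOR_TECHNIQUES: dict[str, set[str]] = {
    "actor_a": {"T1110.003", "T1114.003", "T1078"},
    "actor_b": {"T1566", "T1078", "T1003.001"},
}
# Technique -> platforms it applies to (placeholder labels, not ATT&CK's exact strings).
TECHNIQUE_PLATFORMS: dict[str, set[str]] = {
    "T1110.003": {"identity", "m365"},
    "T1114.003": {"m365"},
    "T1078": {"identity", "windows", "linux"},
    "T1566": {"m365"},
    "T1003.001": {"windows"},
}


def relevant_set(actors: list[str], our_platforms: set[str]) -> set[str]:
    """Techniques used by our actors that can run on platforms we operate."""
    candidates = set().union(*(ACTOR_TECHNIQUES.get(a, set()) for a in actors))
    return {t for t in candidates if TECHNIQUE_PLATFORMS.get(t, set()) & our_platforms}


# A cloud-only M365 tenant drops the Windows-only LSASS technique.
print(sorted(relevant_set(["actor_a", "actor_b"], {"identity", "m365"})))
# ['T1078', 'T1110.003', 'T1114.003', 'T1566']
```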

Signal identification

The hardest skill. An attacker creates an inbox rule. A legitimate user creates an inbox rule. The telemetry looks identical. The detection engineer identifies the distinguishing characteristic — the forwarding destination, the timing relative to a sign-in anomaly, the rule naming patterns attackers use. This can't be templated because the legitimate baseline differs at every organization.
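Those distinguishing characteristics can be written as explicit checks layered on top of the raw event. This is a minimal sketch. It assumes a flattened inbox-rule event that carries a forwarding address, a creation time, and the user's most recent risky sign-in. The field names, the trusted-domain list, the 60-minute window, and the naming heuristic are all hypothetical, and each would need tuning against your own baseline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical: domains this organization legitimately forwards mail to.
TRUSTED_DOMAINS = {"contoso.com"}


@dataclass
class InboxRuleEvent:
    user: str
    rule_name: str
    forward_to: Optional[str]              # forwarding address, if the rule forwards
    created: datetime
    last_risky_signin: Optional[datetime]  # most recent anomalous sign-in for the user


def signals(e: InboxRuleEvent, window: timedelta = timedelta(minutes=60)) -> list[str]:
    """Return the distinguishing signals present on one inbox-rule creation."""
    found = []
    if e.forward_to and e.forward_to.rsplit("@", 1)[-1].lower() not in TRUSTED_DOMAINS:
        found.append("external_forwarding")
    if e.last_risky_signin and timedelta(0) <= e.created - e.last_risky_signin <= window:
        found.append("follows_risky_signin")
    # Illustrative heuristic: names that are empty or punctuation only, such as "..".
    if len(e.rule_name.strip(" .,;")) <= 1:
        found.append("suspicious_name")
    return found


t = datetime(2026, 1, 15, 9, 0)
print(signals(InboxRuleEvent("a@contoso.com", "..", "x@evil.example", t, t - timedelta(minutes=20))))
# ['external_forwarding', 'follows_risky_signin', 'suspicious_name']
```

Any single signal will also appear in legitimate activity. Requiring several signals together is one way to turn the combination into the signal.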

Engineering discipline

Every rule gets a specification before the query is written. Here's the portable core of such a specification in Sigma, the vendor-agnostic detection format that converts to KQL, SPL, Lucene, and other query languages:

Sigma

title: Password Spray via Failed Auth from Single IP
id: 8c4d35c9-7b3d-4a17-9e7e-1b6d0c3f8a2e
status: stable
description: |
  Detects a single IP failing auth against more than 10 distinct users
  within 10 minutes, the structural constraint of credential spray.
  The distinct-user count is the stable signal. IP reputation is not.
author: Detection Engineering Team
tags:
  - attack.credential_access
  - attack.t1110.003
logsource:
  product: azure
  service: signinlogs
detection:
  selection:
    ResultType: '50126'  # Entra ID: invalid username or password
  timeframe: 10m
  # Legacy Sigma aggregation: count(field) counts distinct values per group.
  # Sigma 2.0 expresses this as a separate value_count correlation rule.
  condition: selection | count(UserPrincipalName) by IpAddress > 10
level: high
falsepositives:
  - Shared VPN exit IPs with 15+ users authenticating simultaneously
  - NAT gateways at branch offices

The Sigma rule captures the detection logic portably. But the engineering context — why this threshold, what the expected false positives are, what the analyst should do when it fires, how it fits the threat model — lives in the specification document alongside it. Tools like Sigma, Elastic Detection Rules, and Splunk Security Content all codify detection logic in shareable, testable, versionable formats.
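Because the logic and its context live in versioned files, a pipeline can refuse to deploy a rule whose specification is incomplete. This is a minimal sketch of such a gate. It assumes each rule is loaded into a dict that combines Sigma fields with hypothetical specification keys (`hypothesis`, `threshold_rationale`, `response`). The status values it accepts are the ones the Sigma specification defines.

```python
import re

SIGMA_STATUSES = {"stable", "test", "experimental", "deprecated", "unsupported"}
# Hypothetical spec keys that carry the engineering context next to the Sigma logic.
SPEC_KEYS = ("hypothesis", "threshold_rationale", "response")
TECHNIQUE_TAG = re.compile(r"^attack\.t\d{4}(\.\d{3})?$")


def spec_problems(rule: dict) -> list[str]:
    """List the reasons a rule should not pass the deployment gate."""
    problems = []
    if rule.get("status") not in SIGMA_STATUSES:
        problems.append(f"invalid status: {rule.get('status')!r}")
    # Tactic tags such as attack.credential_access don't make a rule measurable.
    if not any(TECHNIQUE_TAG.match(t) for t in rule.get("tags", [])):
        problems.append("no ATT&CK technique tag")
    if not rule.get("falsepositives"):
        problems.append("no documented false positives")
    problems += [f"missing spec field: {k}" for k in SPEC_KEYS if not rule.get(k)]
    return problems


print(spec_problems({"status": "production", "tags": ["attack.credential_access"]}))
```

In CI, a non-empty list would fail the build. The same check then runs on every pull request, not only at initial review.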

Adversarial thinking

Before deploying a rule, the detection engineer asks: "How would I bypass this?" If the rule detects spray by counting failures from a single IP, the attacker distributes across 50 residential proxies. The engineer who anticipates this targets the behavioral pattern — many distinct users failing from any source — rather than the infrastructure. This thinking, practiced through purple team exercises and frameworks like Atomic Red Team (open-source from Red Canary), is what makes rules resilient.
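The difference between the two approaches is what gets grouped. This is a minimal sketch over failed-authentication events given as (user, source IP, timestamp). It uses a 10-minute window and the "more than 10 distinct users" threshold from the Sigma rule above. Tumbling windows and the sample proxy addresses are simplifications for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

Failure = tuple[str, str, datetime]  # (user, source_ip, time) for one failed sign-in


def bucket(ts: datetime, window: timedelta) -> int:
    """Tumbling-window index (a simplification of a sliding window)."""
    return int(ts.timestamp() // window.total_seconds())


def per_ip_spray(events: list[Failure], window=timedelta(minutes=10), threshold=10):
    """Infrastructure-bound: distinct users per (IP, window). Evaded by proxies."""
    users = defaultdict(set)
    for user, ip, ts in events:
        users[(ip, bucket(ts, window))].add(user)
    return {k for k, v in users.items() if len(v) > threshold}


def any_source_spray(events: list[Failure], window=timedelta(minutes=10), threshold=10):
    """Behavioral: distinct failing users per window, regardless of source IP."""
    users = defaultdict(set)
    for user, _ip, ts in events:
        users[bucket(ts, window)].add(user)
    return {k for k, v in users.items() if len(v) > threshold}


# 30 users sprayed in under 5 minutes, each attempt from a different proxy IP.
start = datetime(2026, 1, 15, 9, 0, tzinfo=timezone.utc)
spray = [(f"user{i}@contoso.com", f"203.0.113.{i}", start + timedelta(seconds=10 * i))
         for i in range(30)]
print(len(per_ip_spray(spray)))      # 0: no single IP exceeds the threshold
print(len(any_source_spray(spray)))  # 1: one window with 30 distinct failing users
```

The behavioral version trades one problem for another. Large tenants see many legitimate failures per window, so its threshold has to come from your own baseline failure rate.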

Communication

The CISO needs a coverage percentage and a trend line. The board needs a risk reduction narrative. The SOC needs a response procedure for every rule. The detection engineer who builds excellent rules but can't demonstrate their value runs a program that loses funding. The metrics in the next section are the communication language.

The four numbers — with benchmarks

Most detection programs have zero of these. The organizations that have all four can answer "is it effective?" with data.

Coverage: what proportion of relevant attacks can you detect?

Distinct techniques with active detection, divided by the relevant technique set. Kyle Bailey's Detection Engineering Maturity Matrix and Kaspersky's Threat Profile Alignment metric both formalize this measurement.
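Written out, the numerator is an intersection, not a raw count: a detection for a technique outside your relevant set does not raise coverage. This is a minimal sketch that uses synthetic technique IDs to reproduce the 15-of-145 example from earlier.

```python
def coverage(covered: set[str], relevant: set[str]) -> float:
    """Percent of the relevant technique set with at least one active detection.

    Detections for techniques outside the relevant set don't count.
    """
    if not relevant:
        raise ValueError("relevant technique set is empty")
    return 100 * len(covered & relevant) / len(relevant)


relevant = {f"T{n}" for n in range(1000, 1145)}                  # 145 synthetic IDs
covered = {f"T{n}" for n in range(1000, 1015)} | {"T9999"}       # 15 in scope, 1 out
print(f"{coverage(covered, relevant):.1f}%")  # 10.3%
```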

Industry Benchmarks

No detection engineering function: 5–15%. Established program (6–12 months): 40–65%. The gap between 10% and 50% is the work of one focused detection engineer operating for two quarters.

Speed: how fast do you detect after execution?

MTTD: minutes from attack execution to first alert. The 2026 benchmarks show self-detected incidents at hours to days and externally reported incidents at weeks to months. MTTD only measures rules that fire. At 10% coverage, the 90% of relevant techniques with no rule have infinite MTTD.

The real number

A 5-minute MTTD across 10% coverage means 90% of attacks are never detected at all, regardless of speed. Moving techniques from "infinite" to "minutes" is the most impactful improvement a detection program delivers.
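One way to keep MTTD honest is to report it alongside the share of tested techniques that produced no alert, rather than averaging only the ones that fired. This is a minimal sketch. It assumes purple-team or Atomic Red Team results recorded as minutes to first alert per technique, with `None` where nothing fired. The input format and the synthetic IDs are hypothetical.

```python
from statistics import median
from typing import Optional


def speed_report(results: dict[str, Optional[float]]) -> dict[str, float]:
    """Summarize detection speed without hiding undetected techniques.

    results maps technique ID -> minutes from execution to first alert,
    or None when no alert fired (effectively infinite MTTD).
    """
    if not results:
        raise ValueError("no techniques tested")
    detected = [m for m in results.values() if m is not None]
    return {
        "median_mttd_min": median(detected) if detected else float("inf"),
        "undetected_pct": 100 * (len(results) - len(detected)) / len(results),
    }


# Ten tested techniques, one alert after 5 minutes: fast, but 90% undetected.
results_by_technique = {"T1110.003": 5.0, **{f"T9{i:03d}": None for i in range(9)}}
print(speed_report(results_by_technique))  # {'median_mttd_min': 5.0, 'undetected_pct': 90.0}
```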

Accuracy: can analysts trust the alert queue?

False positive rate. What percentage of alerts are noise that the SOC has learned to ignore?

Industry Benchmarks

Untuned libraries: 40–60% FP. After 6 months of monthly tuning: 15–25%. Above 40%, the SOC has been trained to ignore alerts — every false positive is a teaching moment in the wrong direction.
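Measured per rule, the same number shows the monthly tuning session where to start. This is a minimal sketch. It assumes each closed alert carries an analyst verdict; Sentinel incidents, for example, record a classification on closure. The `"tp"`/`"fp"` labels are placeholders for whatever your case management uses, and alerts without a verdict are excluded from the denominator.

```python
from collections import Counter


def fp_rates(alerts: list[tuple[str, str]]) -> tuple[dict[str, float], float]:
    """Per-rule and overall false positive rate, in percent.

    `alerts` holds (rule_name, verdict) pairs. Verdicts other than the
    placeholder labels "tp" and "fp" (e.g. undetermined) are skipped.
    """
    total, fps = Counter(), Counter()
    for rule, verdict in alerts:
        if verdict not in ("tp", "fp"):
            continue
        total[rule] += 1
        if verdict == "fp":
            fps[rule] += 1
    per_rule = {rule: 100 * fps[rule] / n for rule, n in total.items()}
    n_all = sum(total.values())
    overall = 100 * sum(fps.values()) / n_all if n_all else float("nan")
    return per_rule, overall


per_rule, overall = fp_rates([("Spray", "fp"), ("Spray", "tp"), ("Spray", "fp"),
                              ("Spray", "fp"), ("Forwarding", "tp")])
# Tuning queue: noisiest rules first.
print(sorted(per_rule.items(), key=lambda kv: kv[1], reverse=True), overall)
```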

Health: how many of your rules are actually working?

Healthy rules divided by total active rules. A rule that hasn't fired in 90 days because its data source disconnected gives false confidence. A rule that fires 50 times a day and gets auto-closed is noise occupying a slot.

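Run this to check rule health on any Sentinel workspace. It can only classify rules that fired at least once in the 90-day window. A rule that stayed silent the whole time never appears in SecurityAlert, so compare the output against your list of enabled rules.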

KQL — Sentinel

// Rule health classification — which rules are actually working?
SecurityAlert
| where TimeGenerated > ago(90d)
| summarize
    LastFired = max(TimeGenerated),
    AlertCount = count(),
    DailyAvg = round(count() / 90.0, 1)
    by AlertName = DisplayName
| extend Health = case(
    DailyAvg > 5, "Noisy",
    LastFired < ago(60d), "Dormant",
    AlertCount < 3, "Low-volume",
    "Healthy")
| summarize count() by Health
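A health ratio over all active rules needs the full rule inventory as its denominator, because silent rules leave no rows in SecurityAlert. This is a minimal sketch of that left join. It assumes you export the query's per-rule alert count and last-fired time, plus a list of enabled rule names. The thresholds mirror the KQL above. The record shapes and the "Silent" label are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

WINDOW_DAYS = 90


def classify(alert_count: int, last_fired: Optional[datetime], now: datetime) -> str:
    """Mirror the KQL case() ordering, plus a bucket for rules with no alerts."""
    if alert_count == 0 or last_fired is None:
        return "Silent"  # absent from SecurityAlert: check data source and logic
    if round(alert_count / WINDOW_DAYS, 1) > 5:
        return "Noisy"
    if last_fired < now - timedelta(days=60):
        return "Dormant"
    if alert_count < 3:
        return "Low-volume"
    return "Healthy"


def health_report(inventory: list[str], summary: dict[str, tuple[int, datetime]],
                  now: datetime) -> dict[str, str]:
    """Classify every enabled rule, including ones the alert table never saw."""
    return {rule: classify(*summary.get(rule, (0, None)), now) for rule in inventory}


def healthy_pct(report: dict[str, str]) -> float:
    """Healthy rules divided by total active rules, in percent."""
    return 100 * sum(v == "Healthy" for v in report.values()) / len(report)


now = datetime(2026, 1, 31, tzinfo=timezone.utc)
summary = {"Spray": (40, now - timedelta(days=2)), "Noisy VPN": (900, now - timedelta(hours=1))}
report = health_report(["Spray", "Noisy VPN", "Never fired"], summary, now)
print(report)  # {'Spray': 'Healthy', 'Noisy VPN': 'Noisy', 'Never fired': 'Silent'}
```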

Industry Benchmarks

Without maintenance: 30–50% healthy. With monthly tuning cadence: above 80%. The difference is someone checking whether rules are firing, data sources are ingesting, and thresholds still match the environment.

How to start

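You don't need a detection engineering program to calculate these numbers. You need the queries above for coverage and health, your incident queue's timestamps and closure classifications for speed and accuracy, and 15 minutes.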

If your rules don't have ATT&CK technique tags, that's your first finding: the detection library can't be measured because nobody tagged the rules. The fix takes 2–3 hours and immediately makes coverage calculable.
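The first pass of that tagging audit can be automated. This is a minimal sketch. It assumes the rules have already been loaded into a mapping from rule name to tags in Sigma's `attack.tNNNN` convention; how you load them, whether from Sigma YAML or a SIEM export, is left out. It returns the covered technique IDs, which feed the coverage numerator, and the rules that carry no technique tag at all.

```python
import re

TECHNIQUE_TAG = re.compile(r"^attack\.(t\d{4}(?:\.\d{3})?)$", re.IGNORECASE)


def split_by_tagging(rules: dict[str, list[str]]) -> tuple[set[str], list[str]]:
    """Return (technique IDs covered by tagged rules, names of untagged rules).

    Tactic tags such as attack.credential_access don't count as technique tags.
    """
    covered: set[str] = set()
    untagged: list[str] = []
    for name, tags in rules.items():
        ids = {m.group(1).upper() for t in tags if (m := TECHNIQUE_TAG.match(t))}
        if ids:
            covered |= ids
        else:
            untagged.append(name)
    return covered, untagged


covered, untagged = split_by_tagging({
    "Spray": ["attack.credential_access", "attack.t1110.003"],
    "Legacy rule": ["attack.credential_access"],
})
print(sorted(covered), untagged)  # ['T1110.003'] ['Legacy rule']
```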

Once you have the four numbers, you have the baseline — the "before" that makes every subsequent improvement demonstrable. A program that takes coverage from 10% to 50% over six months can prove it. Without the baseline, the improvement is invisible — and invisible improvements don't get funded.

The discipline, not the tool

Detection engineering is a discipline, not a product. The ecosystem is open and collaborative: Sigma for vendor-agnostic rules, ATT&CK Navigator for coverage visualization, Atomic Red Team for detection testing, Kyle Bailey's Detection Engineering Maturity Matrix for program assessment, and the awesome-detection-engineering community list for the broader ecosystem.

The organizations that build effective detection programs invest in the engineering function: a person or team whose job is to model threats, build rules, test them, deploy them, tune them, measure the result, and communicate the value. The tool stack matters — Sentinel, Splunk, Elastic, CrowdStrike, whatever your environment runs — but the discipline is the same everywhere.

If you can't answer "is your detection program effective?" with four numbers and a trend line, the problem isn't your tools. It's the absence of the engineering function that measures, maintains, and improves what you've built.

Next week: KQL queries for Entra ID sign-in log analysis — what ResultType != 0 actually tells you, and the queries SOC analysts paste into Google.

