TH0.4 Why Detection Engineering Is Not Enough

3-4 hours · Module 0 · Free
Operational Objective
Detection engineering is valuable, necessary, and structurally insufficient. It catches what you anticipated. It misses what you did not. This subsection explains the five structural limitations that prevent even the best detection engineering program from providing complete threat visibility — and why hunting is the operational complement, not the competitor, to detection.
Deliverable: The ability to articulate — to leadership, to peers, and to your own planning — exactly why detection rules alone cannot close the gap, regardless of how many you build or how well you tune them.
⏱ Estimated completion: 25 minutes

The limits are structural, not staffing

If detection engineering’s shortcomings were a resource problem — not enough engineers, not enough time, not enough budget — the answer would be simple: hire more, spend more, build more. But the limitations are architectural. They are properties of how detection rules work, and they persist regardless of how much investment you make.

This is not an argument against detection engineering. It is an argument for understanding what it can do, what it cannot do, and what fills the gap.

Limitation 1: Rules encode anticipation

A detection rule is a statement of anticipation: “If an attacker does X, then the data will contain Y, and this query will find it.” The rule works when the attacker does exactly what the rule author anticipated. The rule fails silently when the attacker does something the author did not anticipate — even if that “something” is a minor variation of the anticipated technique.

Consider a Sentinel analytics rule designed to detect inbox rule creation for email forwarding. The rule monitors for New-InboxRule operations with ForwardTo or RedirectTo actions. A BEC operator who knows this detection exists — because it is published in Microsoft’s community detection rules on GitHub — can use MoveToFolder instead of ForwardTo, redirecting financial emails to the RSS Feeds folder where the legitimate user will never look. Same objective (hiding emails from the user), same attack category (inbox rule abuse), same ATT&CK technique (T1564.008). But the rule does not fire because the action type is different from what the author anticipated.

This is not a failure of the detection engineer. The rule is correct for the pattern it specifies. The limitation is structural: the rule can only catch what it was designed to catch. Every variation it does not specify is a gap.
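A hunt over the same data can afford to be broader than the rule. The following sketch, which assumes the OfficeActivity table is ingested into the workspace, returns every inbox rule creation or modification for human review rather than matching only forwarding actions:

```kusto
// Hunt sketch: surface ALL inbox rule operations, not only ForwardTo/RedirectTo.
// Assumes OfficeActivity is ingested; Parameters holds the rule settings as JSON text.
OfficeActivity
| where TimeGenerated > ago(30d)
| where Operation in ("New-InboxRule", "Set-InboxRule")
| project TimeGenerated, UserId, ClientIP, Operation, Parameters
| order by TimeGenerated desc
// A reviewer scanning Parameters can spot MoveToFolder, DeleteMessage,
// and MarkAsRead variants that a forwarding-only detection rule never fires on.
```

The query trades precision for coverage: it will return legitimate rule creations too, which is acceptable in a hunt because a human, not an alert queue, consumes the results.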

Attackers know this. Offensive tool developers specifically study published detection rules and engineer their tools to avoid the exact patterns those rules match. EvilGinx developers modified their AiTM proxy to avoid triggering Microsoft’s anomalousToken risk detection. PowerShell obfuscation techniques evolve specifically to bypass AMSI detection patterns. Cobalt Strike operators configure their malleable C2 profiles to match the network characteristics of legitimate cloud services. The detection rule is published; the evasion is engineered.

Limitation 2: Rules require known telemetry

A detection rule can only query data that exists in your SIEM. If the telemetry is not ingested, the rule cannot fire — no matter how well-written it is.

This sounds obvious, but the practical implications are significant. Most M365 environments ingest a subset of available telemetry into Sentinel. The decisions about what to ingest are driven by cost (every GB/day costs money), compliance requirements (certain logs must be retained), and perceived value (the security team prioritizes the logs they already know how to use).

The result is blind spots. Common ones in M365 environments:

AADServicePrincipalSignInLogs. Service principal authentication — applications signing in with their own credentials. If you do not ingest this table, you cannot detect a compromised application credential being used by an attacker. The entire OAuth persistence technique (TH6) is invisible.

AADProvisioningLogs. User provisioning events from automated identity lifecycle workflows. If you do not ingest this table, you cannot detect an attacker creating backdoor accounts through a compromised provisioning pipeline.

CloudAppEvents with Defender for Cloud Apps enabled. Detailed SaaS application activity — file access, sharing events, OAuth consent. Without this data source, the shadow IT hunt (TH11) and large portions of the email manipulation hunt (TH5) have no data to query.

MicrosoftGraphActivityLogs. Graph API call activity — what applications and users are doing through the API. This was introduced in 2024 and many organizations have not enabled it. Without it, you cannot detect an attacker using the Graph API to access mailbox data, enumerate the directory, or exfiltrate files — even though the attacker’s activity is recorded by Microsoft.

Detection engineering cannot overcome the absence of data. If the logs are not there, the rules are blind. Hunting — when combined with a data source review — surfaces these blind spots. The first time you try to run a hunt and discover that the table you need is not ingested, you have identified a logging gap that has been undermining your detection capability without anyone noticing.
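A data source review can start with a single query. This sketch uses the Usage table, which Log Analytics maintains for every workspace, to list which data types actually received events recently (Quantity is recorded in MB):

```kusto
// Data source review sketch: which tables received data in the last 30 days?
// The Usage table records per-data-type ingestion volume in a Log Analytics workspace.
Usage
| where TimeGenerated > ago(30d)
| summarize IngestedGB = sum(Quantity) / 1024.0, LastSeen = max(TimeGenerated) by DataType
| order by IngestedGB desc
// Compare this list against the tables your hunts require:
// AADServicePrincipalSignInLogs, AADProvisioningLogs, CloudAppEvents,
// MicrosoftGraphActivityLogs. Any table absent from the output is a blind spot.
```

Running this before a hunting campaign turns "the table I need is not there" from a mid-hunt surprise into a documented logging gap.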

Limitation 3: False positive economics

Every detection rule operates at a point on the sensitivity-specificity tradeoff. Make the rule more sensitive (catch more true positives) and it also catches more false positives. Make it more specific (reduce false positives) and it misses more true positives. There is no rule configuration that is simultaneously maximally sensitive and maximally specific — this is a mathematical certainty, not an engineering constraint.

In practice, detection engineers tune toward specificity because false positives have an immediate operational cost: every false positive consumes analyst time, erodes trust in the alert system, and contributes to alert fatigue. A rule that fires 50 times per day with a 2% true positive rate generates 49 wasted investigations and 1 real detection. The SOC lead will ask for the rule to be tuned, and the detection engineer will raise the threshold, add exclusions, or narrow the scope until the false positive rate drops to an acceptable level.

That tuning is operationally correct. But it creates gaps. Every exclusion is a potential hiding place. Every raised threshold is a volume below which the attacker can operate without detection. The attacker who exfiltrates 95 files from SharePoint when the rule threshold is 100 passes undetected — not because the rule is broken, but because the threshold was set to avoid the legitimate users who download 80 files before a business trip.

Hunting does not operate on the sensitivity-specificity tradeoff because hunting does not fire alerts. A hunting query that returns 500 results does not generate 500 incidents. It generates a dataset that a human analyst reviews, enriches with context, and makes judgments about. The analyst can tolerate a noisy dataset because they are investigating, not triaging. They can examine the 500 results, identify the 3 that are suspicious based on contextual factors (user role, time of day, recent sign-in anomaly), and investigate those 3 in depth. The 497 legitimate results are not wasted alerts — they are context that helps the analyst understand what normal looks like in this environment.

This is why hunting can find threats that detection rules cannot: hunting can afford to operate at sensitivity levels that would be operationally destructive as automated alerts.
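As a concrete sketch of that difference, the following query (assuming OfficeActivity captures SharePoint and OneDrive file activity) returns the full per-user download distribution with no threshold at all:

```kusto
// Hunt sketch: review the whole download distribution instead of alerting above a threshold.
// Assumes OfficeActivity is ingested with SharePoint/OneDrive file events.
OfficeActivity
| where TimeGenerated > ago(14d)
| where Operation in ("FileDownloaded", "FileSyncDownloadedFull")
| summarize Downloads = count(), DistinctFiles = dcount(OfficeObjectId)
    by UserId, bin(TimeGenerated, 1d)
| order by Downloads desc
// Nothing fires here. The analyst reviews the ranked distribution, so the
// attacker who stays at 95 files against a 100-file alert threshold is
// still visible, alongside the context of what everyone else downloads.
```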

Limitation 4: Rules are point-in-time

A detection rule is authored at a specific point in time, against a specific understanding of the attack technique, using specific telemetry that is available at that moment. The rule does not adapt as the technique evolves, as the environment changes, or as the attacker adjusts their approach.

TH0.1 covered detection decay — the process by which rule effectiveness degrades as techniques evolve, environments drift, and rules atrophy. The underlying issue is that detection rules are static. They do what they were built to do, indefinitely, without self-evaluation.

There is no mechanism in Sentinel or Defender XDR for a detection rule to evaluate its own effectiveness. A rule that has not fired in 12 months cannot tell you whether it is a well-targeted detection for a rare event or a broken rule that would not fire even if the attack occurred. A rule that fires frequently cannot tell you whether it is catching real threats or generating noise from a legitimate workflow that was deployed after the rule was written.

Hunting provides the evaluation mechanism. When you hunt for a technique that should be covered by an existing detection rule and find evidence the rule missed, you have simultaneously identified a compromise and a detection failure. When you hunt for a technique and find no evidence, you have validated that either the technique is not present or your hunting methodology needs refinement — both are useful findings. The hunt is the test that the detection rule cannot perform on itself.

The following query identifies your detection rules that have not fired in 90 days — the candidates for validation through hunting:

// Detection rules that may have decayed: candidates for hunt validation
let activeRules = SecurityAlert
| where TimeGenerated > ago(90d)
| where ProviderName == "ASI Scheduled Alerts"
| distinct AlertName;
// Rules that HAVE fired in 90 days are confirmed active
// Rules in your analytics workspace that do NOT appear in this list
// have not generated an alert in 90 days
// For each missing rule: is it working, or is it broken?
// The only way to answer that question is to hunt for the technique
// the rule was designed to detect, and see if evidence exists
// that the rule should have caught
activeRules
| summarize RulesThatFired = count()
// Compare this count to your total deployed rule count
// The difference = rules with no confirmed efficacy in 90 days

Limitation 5: Rules do not generate context

A detection rule fires an alert. The alert says “this event matched this pattern.” It does not say why the event occurred, whether the event is part of a larger attack chain, or what the analyst should investigate next.

Context is what transforms an alert from a notification into an investigation. And context requires understanding the broader activity around the alerted event: what else was this user doing? What else happened on this device? What other accounts accessed this resource? How does this event relate to the sign-in anomaly from yesterday?

Detection rules generate alerts. Hunting generates understanding.

When you run a hunting campaign across 30 days of authentication data, you do not just find the anomalous sign-in. You understand the baseline — what normal authentication looks like for each user in your environment. You understand the distribution — how many users sign in from new devices in a typical week. You understand the false positive landscape — which users legitimately travel, which use VPNs that change IP addresses, which have roles that involve accessing unusual resources.
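A baseline of that kind can be sketched in a single query. This example assumes SigninLogs is ingested from Entra ID and summarizes 30 days of successful authentication per user:

```kusto
// Baseline sketch: what does normal authentication look like for each user?
// Assumes SigninLogs is ingested; ResultType "0" indicates a successful sign-in.
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == "0"
| summarize
    DistinctIPs  = dcount(IPAddress),
    DistinctApps = dcount(AppDisplayName),
    Countries    = make_set(Location, 10),
    SignInCount  = count()
    by UserPrincipalName
| order by DistinctIPs desc
// The output is not an alert list. It is the reference that tells you which
// users legitimately roam across IPs and for which users a single new
// location would be genuinely anomalous.
```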

That understanding makes your detection engineering better because you can write rules with informed thresholds and meaningful exclusions rather than guessing at what “normal” looks like. It makes your incident response better because you can quickly contextualize an alert against a known baseline. And it makes your anomaly detection better because the baselines you construct during hunting campaigns are the reference points that anomaly detection uses to flag deviations.

Detection rules do not build this understanding. They match patterns. Hunting builds the understanding that makes pattern matching effective.

FIVE STRUCTURAL LIMITATIONS OF DETECTION ENGINEERING

1. ANTICIPATION. Rules catch what the author predicted. Variants pass through. Hunting: tests hypotheses.
2. TELEMETRY. Rules need data. Missing tables = invisible techniques. Hunting: surfaces data gaps.
3. FALSE POSITIVES. Rules tuned to reduce noise create thresholds attackers exploit. Hunting: tolerates noise.
4. STATIC. Rules do not adapt. Cannot evaluate their own effectiveness. Hunting: validates rules.
5. NO CONTEXT. Rules fire alerts. They do not explain what is happening. Hunting: builds understanding.

HUNTING ADDRESSES ALL FIVE. Not by replacing detection, but by filling the gaps detection creates. The limitations are architectural. They persist regardless of budget, tooling, or team size. The answer is not more rules. It is more layers.

Figure TH0.4 — Five structural limitations of detection engineering and how hunting addresses each. Detection engineering and hunting are complementary — not competing — capabilities.

The complement, not the competitor

This subsection makes the case against detection engineering as a sole strategy. It does not make the case against detection engineering. The distinction matters because the organizational politics of security operations sometimes frame hunting and detection engineering as competing for the same budget. They are not.

Detection engineering handles the known-known layer efficiently, automatically, and at scale. It operates 24/7 without analyst fatigue. It catches the majority of threats by volume. Hunting cannot replace it.

Hunting handles the known-unknown layer effectively, contextually, and with human judgment. It finds what rules miss. It validates what rules claim to detect. It produces the rules that detection engineering deploys. Detection engineering cannot replace it.

The argument is not “detection engineering does not work.” The argument is “detection engineering works, and it is not enough.” The gap between what rules catch and what attackers do is structural. Hunting fills the gap.

Try it yourself

Exercise: Identify your specific detection limitations

Pick one detection rule from your Sentinel analytics workspace — preferably a high-severity rule that has been deployed for at least 6 months.

Anticipation: List two variations of the technique this rule would miss. What would an attacker change to evade it?

Telemetry: What data table does the rule query? Is that table ingested with full fidelity, or are there filtering or sampling configurations that might cause events to be dropped?

False positives: What exclusions are configured? Each exclusion is a potential hiding place for an attacker who matches the excluded pattern.

Static: When was the rule last updated? Has the technique it detects evolved since the rule was written?

Context: When this rule fires, does the alert contain enough information for an analyst to begin investigating — or do they need to run additional queries to understand what happened?
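For the context question, a quick check of what the rule's recent alerts actually carry can be sketched as follows. The AlertName value is a placeholder you replace with the rule chosen for this exercise:

```kusto
// Context check sketch: how much investigative context do this rule's alerts carry?
// "<your rule name>" is a placeholder; substitute the rule you picked above.
SecurityAlert
| where TimeGenerated > ago(30d)
| where AlertName == "<your rule name>"
| extend EntityCount = array_length(todynamic(Entities))
| project TimeGenerated, AlertName, AlertSeverity, EntityCount, Entities
// Alerts with zero or one mapped entity force the analyst to run
// additional queries before any real investigation can begin.
```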

This exercise reveals the specific gaps in one rule. Multiply those gaps across every rule in your environment. That is the hunting surface.

⚠ Compliance Myth: "If we had more detection engineers, we would not need threat hunting"

The myth: The detection gap is a staffing problem. Hire more detection engineers, build more rules, and the gap closes.

The reality: More detection engineers produce more rules, which increases coverage of the known-known layer. That is valuable. But it does not address the five structural limitations. More rules still encode anticipation (limitation 1), still require ingested telemetry (limitation 2), still trade sensitivity for specificity (limitation 3), still degrade over time (limitation 4), and still generate alerts without context (limitation 5). A detection engineering team of 50 with 1,000 rules still has a known-unknown layer that no rule covers. The limitations are properties of the method, not the team. Hunting is the method that addresses what detection engineering structurally cannot.

Extend this analysis

The five structural limitations apply to every detection technology, not only SIEM analytics rules. EDR behavioral rules face the same anticipation constraint — they catch the behaviors they were programmed to identify. Network detection and response (NDR) rules face the same telemetry constraint — they can only inspect traffic they can see, and encrypted traffic to legitimate cloud services is opaque. Email security gateways face the same false positive constraint — they must balance blocking malicious email against delivering legitimate email. In every case, the automated detection layer has structural limits that only human-driven investigation can complement.

