
DE0.9 The Detection Engineering Discipline

8-10 hours · Module 0 · Free
What you already know

Section 1 introduced the six-stage lifecycle. This section goes deeper into each stage — the thinking behind it, the artifacts it produces, and how the stages connect into a continuous improvement cycle. You'll also learn the adversarial mindset that separates effective detection engineers from competent rule writers.

The six stages in practice

Scenario

You write a detection rule on Monday. The KQL looks right. You deploy it directly to production. By Wednesday, the SOC has auto-closed 40 false positive alerts from your rule. By Friday, the SOC lead asks you to disable it. Your rule detected a real attack on Thursday — but the analyst who received it had already learned to ignore alerts from that rule name. Which stage of the lifecycle did you skip, and what did it cost?

Section 1 listed the stages: hypothesize, design, build, test, deploy, tune. Here you'll see what each stage actually produces and why skipping any stage degrades the program.

Estimated time: 35 minutes.

[Figure: the detection engineering lifecycle in six stages. Hypothesize (threat → signal → data source); Design (specification + FP analysis); Build (KQL + entity map + rule config); Test (attack data + noise validation); Deploy (report-only → production); Tune (monthly FP review cadence). Module bar, as extracted: DE2, DE1, DE3-DE8, DE9, DE10.]

Figure DE0.9 — The six-stage detection engineering lifecycle. Each stage produces a specific artifact. Tuning feeds back to hypothesizing — the cycle is continuous. The bottom bar shows which course module teaches each stage.

Hypothesize — start with a threat, not a query

Every detection rule starts with a hypothesis. The hypothesis connects a specific threat to a specific detection signal in a specific data source. "If an attacker uses AiTM phishing to capture a session token, SigninLogs will show a non-interactive token refresh with a different DeviceDetail than the interactive sign-in that created the session, within 30 minutes."

The hypothesis comes from somewhere concrete — a threat advisory, an ATT&CK technique, an incident post-mortem, a hunt finding, or a gap in the coverage map. It does not come from browsing the Sentinel content hub for interesting templates.

A good hypothesis has three properties. It's specific enough to implement in KQL: you can translate the words directly into operators and filters. It's specific enough to test: you can determine whether it fires on attack data and whether it fires on legitimate noise. And it's specific enough to be wrong: if testing shows the signal doesn't exist or doesn't distinguish attack from legitimate activity, the hypothesis fails and you learn something for the next attempt.

"Detect credential attacks" is not a hypothesis. "Detect more than 15 failed authentication attempts from a single IP against more than 5 distinct user accounts within 10 minutes in SigninLogs, where the IP is not in the corporate or VPN range" is a hypothesis. The first is a wish. The second is an engineering specification.

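To show how directly a well-formed hypothesis translates into logic, here is a minimal Python sketch of the password-spray hypothesis above, run against in-memory sign-in records rather than SigninLogs. The thresholds come from the hypothesis text; the record layout, the function names, and the trusted network ranges are assumptions made for illustration, and a production rule would be written in KQL.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from ipaddress import ip_address, ip_network

# Hypothetical corporate and VPN ranges; replace with your own.
TRUSTED_RANGES = [ip_network("10.0.0.0/8"), ip_network("203.0.113.0/24")]
MIN_FAILURES = 15               # "more than 15 failed authentication attempts"
MIN_USERS = 5                   # "more than 5 distinct user accounts"
WINDOW = timedelta(minutes=10)  # "within 10 minutes"


def is_trusted(ip):
    addr = ip_address(ip)
    return any(addr in net for net in TRUSTED_RANGES)


def spray_candidates(events):
    """Return the set of source IPs that satisfy the hypothesis.

    Each event is a dict with keys: ip, user, success (bool), time (datetime).
    """
    failures = defaultdict(list)
    for e in events:
        if not e["success"] and not is_trusted(e["ip"]):
            failures[e["ip"]].append((e["time"], e["user"]))

    flagged = set()
    for ip, rows in failures.items():
        rows.sort()
        start = 0
        for end in range(len(rows)):
            # Slide the left edge until the window spans at most WINDOW.
            while rows[end][0] - rows[start][0] > WINDOW:
                start += 1
            window = rows[start:end + 1]
            users = {user for _, user in window}
            if len(window) > MIN_FAILURES and len(users) > MIN_USERS:
                flagged.add(ip)
                break
    return flagged


# Sixteen failures against six accounts in under eight minutes.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    {"ip": "198.51.100.7", "user": f"user{i % 6}", "success": False,
     "time": t0 + timedelta(seconds=30 * i)}
    for i in range(16)
]
print(spray_candidates(sample))  # {'198.51.100.7'}
```

Every clause of the hypothesis maps to one line of logic, which is the practical meaning of "specific enough to implement". The vague version, "detect credential attacks", gives you nothing to write.
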
Design — document before you build

The rule specification captures everything about the rule that isn't in the KQL itself. The hypothesis. The ATT&CK technique mapping. The data source and the fields queried.

The detection logic in plain language. The entity mapping rationale — which entities to extract and why they matter for investigation. The severity rationale — why this rule is high severity instead of medium, grounded in business impact. The expected false positive sources — what legitimate activity will match, identified before deployment rather than discovered after. The analyst response procedure — what the SOC analyst should do when this rule fires, step by step.

The specification serves three purposes. First, it forces complete thinking before implementation — listing expected false positive sources during design is cheaper than discovering them through 50 false alerts in the first week.

Second, it makes the rule reviewable — another engineer can evaluate the logic without reading the KQL. Third, it makes the rule maintainable — when the original engineer leaves, the specification tells their successor everything they need to tune, troubleshoot, or retire the rule.

Here is the rule specification template — the engineering document every rule in this course produces:

JSON
{
  "rule_id": "DE3-003",
  "technique": "T1110.003",
  "technique_name": "Brute Force: Password Spraying",
  "hypothesis": "More than 10 distinct users failing auth from a single IP within 30 minutes indicates credential spray, not legitimate lockout",
  "data_source": "SigninLogs",
  "fields": ["IPAddress", "UserPrincipalName", "ResultType", "TimeGenerated"],
  "severity": "High",
  "severity_rationale": "Credential spray is the entry point for CHAIN-MESH. Successful spray leads to lateral movement within 2-4 hours.",
  "entity_mapping": {
    "IP": "IPAddress",
    "Account": "UserPrincipalName"
  },
  "expected_fp_sources": [
    "Shared VPN exit IPs: 15+ users authenticate from the same IP legitimately",
    "NAT gateways at branch offices: multiple users share one public IP"
  ],
  "tuning_plan": "Exclude IPs in named location 'Corporate VPN' and 'Branch Office NAT'. Monitor FP rate weekly for first month.",
  "response_procedure": "1. Identify successful auth from spray IP. 2. Check if compromised account has MFA. 3. Revoke sessions. 4. Reset password. 5. Check for post-compromise activity."
}

Every field matters. The hypothesis is testable. The expected FP sources are identified before the rule reaches production. The tuning plan is written up front, not reactively. The response procedure tells the SOC analyst exactly what to do, so they don't have to figure it out while the attacker is active. Module 1 teaches the full specification process. Every rule in DE3-DE8 starts with this document.
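Because the specification is structured data, it can be checked mechanically before a human reviews it. The following is a rough sketch of such a check, not part of the course tooling: the function name `validate_spec` and the specific checks are assumptions, and the allowed severity values reflect the four levels Sentinel offers for analytics rules.

```python
# Field names taken from the specification template above.
REQUIRED_FIELDS = {
    "rule_id", "technique", "technique_name", "hypothesis", "data_source",
    "fields", "severity", "severity_rationale", "entity_mapping",
    "expected_fp_sources", "tuning_plan", "response_procedure",
}
ALLOWED_SEVERITIES = {"Informational", "Low", "Medium", "High"}


def validate_spec(spec):
    """Return a list of problems; an empty list means the spec passes."""
    missing = REQUIRED_FIELDS - set(spec)
    problems = [f"missing field: {name}" for name in sorted(missing)]

    if "severity" in spec and spec["severity"] not in ALLOWED_SEVERITIES:
        problems.append(f"unknown severity: {spec['severity']}")

    # An empty FP list usually means the analysis was skipped, not that
    # the rule is noise-free.
    if "expected_fp_sources" in spec and not spec["expected_fp_sources"]:
        problems.append("no expected false positive sources listed")

    # Every mapped entity must come from a column the query returns.
    queried = set(spec.get("fields", []))
    for entity, column in spec.get("entity_mapping", {}).items():
        if column not in queried:
            problems.append(f"entity {entity} maps to unqueried column {column}")

    return problems
```

A check like this cannot judge whether the hypothesis is good, but it does catch a specification that was filled in only halfway.
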

Build — KQL as implementation

Write the query. Map entities. Configure the scheduled rule: frequency (how often it runs), lookback window (how far back it searches), trigger threshold (how many results create an alert), alert grouping (how multiple alerts are combined into incidents), severity, and MITRE technique assignment. Module 1 teaches rule architecture in detail. Modules 3-8 build 71 rules.
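The scheduling settings listed above interact, and one mistake is easy to make: a lookback window shorter than the run frequency leaves stretches of time that no run ever queries. The sketch below models the settings as a plain Python dataclass and checks for that gap. The class and field names are illustrative assumptions, not Sentinel's API schema.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ScheduledRuleConfig:
    name: str
    frequency: timedelta      # how often the query runs
    lookback: timedelta       # how far back each run searches
    trigger_threshold: int    # alert when the result count exceeds this
    severity: str
    technique: str            # MITRE ATT&CK technique ID


def config_warnings(cfg):
    """Return human-readable warnings for risky scheduling settings."""
    warnings = []
    if cfg.lookback < cfg.frequency:
        gap = cfg.frequency - cfg.lookback
        warnings.append(
            f"lookback is shorter than frequency: {gap} of every cycle is never queried"
        )
    if cfg.trigger_threshold < 0:
        warnings.append("trigger threshold must be zero or greater")
    return warnings


cfg = ScheduledRuleConfig(
    name="DE3-003",
    frequency=timedelta(hours=1),
    lookback=timedelta(minutes=30),
    trigger_threshold=0,
    severity="High",
    technique="T1110.003",
)
for warning in config_warnings(cfg):
    print(warning)
```
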

The build stage is where platform-specific skill matters. You need to know KQL's join semantics, time-series functions, dynamic column handling, and performance considerations.

You need to know Sentinel's entity mapping schema, alert grouping logic, and rule scheduling constraints. These are learnable skills with concrete specifications — they're taught in Module 1 and practiced in every subsequent module.

Test — verify it works

Two questions, two test methods. Does the rule fire on the attack it claims to detect? Import the lab pack's attack data and verify the rule produces an alert with the correct entities and severity. Does the rule fire on legitimate activity that resembles the attack? Run the rule against a 30-day window of production data and examine every result.

A rule that passes the first test and fails the second needs tuning before deployment. A rule that fails the first test needs redesign — the hypothesis may be wrong, the KQL may not correctly express the hypothesis, or the data source may not contain the expected signal. A rule that passes both tests is ready for report-only deployment.
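The three outcomes described above form a small decision table. As a sketch, with names invented for illustration:

```python
from enum import Enum


class NextStep(Enum):
    REDESIGN = "redesign: revisit the hypothesis, the KQL, or the data source"
    TUNE = "tune before deployment"
    REPORT_ONLY = "ready for report-only deployment"


def next_step(fires_on_attack, fires_on_noise):
    """Map the two test results to the action the lifecycle prescribes."""
    if not fires_on_attack:
        # A rule that misses its own attack is broken regardless of noise.
        return NextStep.REDESIGN
    if fires_on_noise:
        return NextStep.TUNE
    return NextStep.REPORT_ONLY
```

Note the ordering: the attack test is evaluated first, because tuning a rule that never detects its target only produces a quiet rule that still detects nothing.
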

Testing is where most ad-hoc rule writing fails. Rules are deployed directly to production without testing, and the first time anyone evaluates the rule is when it fires on something — which may be an attack, may be noise, and may be nothing if the data source is empty.

The detection engineering lifecycle requires testing before deployment because every false positive that reaches the SOC erodes trust. A rule that produces ten false positives in its first day teaches the SOC to ignore that rule — a lesson that's hard to unlearn even after the rule is tuned.

The lab pack provides structured testing data for both test types. Evidence data contains attack events for every technique the course covers — import the relevant events and verify the rule fires.

Background noise data contains 14 days of simulated legitimate activity: run the rule against it and verify it doesn't fire. Together, the two datasets validate both sensitivity (the rule fires on attacks) and specificity (it doesn't fire on noise). Module 9 teaches the testing methodology in detail, including how to create your own test data for custom rules.
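When a rule is tested against several attack scenarios and several noise windows, the two datasets yield two separate scores: the share of attack scenarios that fired (often called sensitivity) and the share of noise windows that stayed quiet (specificity). A minimal sketch of that arithmetic, assuming each test run has already been reduced to a boolean "did the rule fire":

```python
def rate(hits, total):
    if total == 0:
        raise ValueError("no test cases supplied")
    return hits / total


def score_rule(attack_results, noise_results):
    """Return (sensitivity, specificity) for one rule.

    attack_results: list of bools, True if the rule fired on an attack scenario.
    noise_results:  list of bools, True if the rule fired on a noise window.
    """
    sensitivity = rate(attack_results.count(True), len(attack_results))
    specificity = rate(noise_results.count(False), len(noise_results))
    return sensitivity, specificity


# Three of four attack scenarios detected; one noisy day out of fourteen.
sens, spec = score_rule([True, True, False, True], [False] * 13 + [True])
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```
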

Deploy — through a pipeline, not the portal

In a mature detection engineering practice, rules deploy through a detection-as-code pipeline — Git repository, pull request, code review, automated testing, CI/CD deployment to Sentinel. The pipeline provides version history (what changed when and by whom), peer review (another engineer evaluates the logic before it reaches production), rollback capability (a broken rule can be reverted in one command), and automated validation (the CI pipeline runs the rule against test data before deployment, catching errors before they affect the SOC).

In organizations building their first detection engineering program, the pipeline may not exist yet; Module 10 teaches you to build it from scratch. The interim process is report-only deployment: configure the rule in Sentinel with all settings correct but with incident creation turned off, so the rule only logs alerts.

The rule runs against live data, generates alerts visible in the SecurityAlert table, but doesn't create incidents in the SOC queue. The SOC doesn't see the rule's output during report-only mode, so false positives during the validation period don't affect operational trust.

After 7-14 days of report-only validation, review the alerts the rule produced. If the precision is acceptable (primarily true positives, few false positives), promote the rule to full incident creation — it now generates incidents that appear in the SOC queue.

If the precision is poor, tune the rule in report-only mode until it's acceptable. This staged deployment process — report-only validation before production — is the minimum deployment discipline. The detection-as-code pipeline adds version control, peer review, and automation on top of it.
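The promotion decision can be written down as a small function so that it is applied the same way to every rule. This is a sketch under stated assumptions: the 7-day minimum comes from the text, while the 0.8 precision bar is a hypothetical value you would set for your own SOC.

```python
def precision(true_positives, false_positives):
    """Share of triaged alerts that were real; None if nothing fired."""
    total = true_positives + false_positives
    return None if total == 0 else true_positives / total


def promotion_decision(true_positives, false_positives, days_observed,
                       min_days=7, min_precision=0.8):
    if days_observed < min_days:
        return "keep validating"
    p = precision(true_positives, false_positives)
    if p is None:
        # Silence is not evidence of quality; confirm the rule can fire at all.
        return "no alerts: re-test against attack data"
    if p >= min_precision:
        return "promote to incident creation"
    return "tune in report-only"
```
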

Tune — monthly, evidence-based, documented

Rules in production encounter real-world data. Some alerts are true positives. Some are false positives. Some are benign positives — legitimate but suspicious activity that's authorized. Tuning reviews each alert from the previous month, classifies it by root cause, and applies the targeted fix.
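A monthly review of this kind reduces to counting: how many alerts fell into each class, and which root causes account for most of the false positives. A minimal sketch, assuming each triaged alert is a dict with a `classification` of `TP`, `FP`, or `BP` and a free-text `root_cause` (both field names are illustrative):

```python
from collections import Counter


def tuning_summary(alerts):
    """Summarize one month of triaged alerts for a single rule."""
    by_class = Counter(a["classification"] for a in alerts)
    fp_causes = Counter(
        a["root_cause"] for a in alerts if a["classification"] == "FP"
    )
    total = len(alerts)
    return {
        "counts": dict(by_class),
        "fp_rate": by_class["FP"] / total if total else 0.0,
        # The most frequent causes are the first candidates for a targeted fix.
        "top_fp_causes": fp_causes.most_common(3),
    }


alerts = (
    [{"classification": "TP", "root_cause": "password spray"}] * 2
    + [{"classification": "FP", "root_cause": "shared VPN exit IP"}] * 5
    + [{"classification": "FP", "root_cause": "branch office NAT"}] * 2
    + [{"classification": "BP", "root_cause": "authorized pen test"}]
)
print(tuning_summary(alerts))
```
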

The monthly cadence matters. Weekly tuning is too frequent — you don't accumulate enough data to identify patterns. Quarterly is too infrequent — noisy rules erode SOC trust for three months before anyone addresses them. Monthly provides enough data to classify false positive sources reliably and enough frequency to keep the FP rate under control.

Tuning feeds back into hypothesizing. A false positive reveals something about the environment that the original hypothesis didn't account for. A rule that produces unexpected benign positives reveals authorized activity that future hypotheses should exclude from the start. The cycle from tune back to hypothesize is what makes the detection program a learning system.

The adversarial mindset

One habit separates effective detection engineers from competent rule writers: after building every rule, they ask "how would I bypass this?"

The inbox rule template checks for forwarding rules. The CHAIN-HARVEST attacker used a folder-move rule instead. A detection engineer builds the folder-move detection and immediately asks: "If I were the attacker and knew this rule existed, would I use a transport rule instead? A Power Automate flow? A client-side Outlook rule that doesn't appear in OfficeActivity?"

This adversarial thinking doesn't produce perfect rules — no rule is evasion-proof. It produces resilient rules that cover common variants and degrade gracefully against novel evasion. It also builds a mental model of adversary behavior that improves every subsequent rule.

The adversarial mindset extends beyond individual rules. When the detection engineer reviews their coverage map, they think like an attacker: "If I needed to move laterally in this environment and I could see the detection rules, which lateral movement technique would I use? The one that's covered or the one that isn't?" The answer identifies the next rule to build.

Consider the credential dumping detection from CHAIN-MESH. The detection engineer builds a rule for T1003.001 — processes accessing LSASS memory with PROCESS_VM_READ rights. The adversarial question: "How would I dump credentials without accessing LSASS directly?" Answers include: T1003.003 (NTDS.dit extraction from Active Directory), T1003.004 (LSA Secrets from the registry), T1558.003 (Kerberoasting to extract service account password hashes), and T1552.006 (Group Policy Preferences containing cached credentials).

Each answer is a new hypothesis. Each hypothesis is a new rule. The adversarial mindset doesn't produce one rule — it produces a family of rules that covers the technique category, not just the specific variant the first hypothesis addressed.
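The "family of rules" idea can be made concrete as a set difference between the techniques an attacker could choose from and the techniques your deployed rules cover. The technique IDs below are the ones named in the credential dumping example; the rule inventory format and the rule ID are assumptions for illustration.

```python
# Alternatives an attacker has once LSASS access is detected.
CREDENTIAL_FAMILY = {
    "T1003.001": "LSASS memory",
    "T1003.003": "NTDS.dit extraction",
    "T1003.004": "LSA Secrets",
    "T1558.003": "Kerberoasting",
    "T1552.006": "Group Policy Preferences",
}


def uncovered(family, deployed_rules):
    """Return technique IDs in the family that no deployed rule maps to."""
    covered = {rule["technique"] for rule in deployed_rules}
    return sorted(t for t in family if t not in covered)


deployed = [{"rule_id": "EXAMPLE-001", "technique": "T1003.001"}]  # hypothetical ID
for technique in uncovered(CREDENTIAL_FAMILY, deployed):
    print(technique, CREDENTIAL_FAMILY[technique])
```

Each line printed is a candidate hypothesis for the next rule.
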

You'll develop this mindset throughout the course. DE3-DE8 include explicit evasion analysis for every technique — what the attacker does when the first detection works. DE9 teaches you to test your own rules for bypass potential. By the capstone, adversarial evaluation of your own detections becomes automatic.

Measurement as the discipline's accountability

The engineering practices described later in this section (version control, code review, automated testing) make the detection program maintainable. Measurement makes it accountable. Without measurement, the program has no evidence that it's working. With measurement, the program produces its own proof.

The four metrics from Section 5 (coverage, MTTD, FP rate, rule health) are calculated monthly. The trend lines tell the story. Coverage increasing month over month means new rules are closing gaps.

FP rate decreasing means tuning is working. MTTD improving means high-impact rules are being moved to near-real-time (NRT) scheduling. Rule health staying above 80% means the monthly maintenance cadence is keeping rules functional.

When a metric moves in the wrong direction, it triggers investigation. Coverage dropped this month — did a data source disconnect, making previously covered techniques undetectable?

FP rate spiked — did a new application deployment introduce legitimate behavior that matches an existing rule's pattern? Rule health declined — did a platform update change a table schema that multiple rules depend on? Each metric movement has a root cause, and identifying that root cause is part of the monthly tuning cadence.
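The monthly check described above is a comparison of two snapshots: for each metric, did it move in the wrong direction, and is rule health still above its floor? A minimal sketch follows. The metric key names and the snapshot format are assumptions; the 80% floor comes from the text.

```python
# +1 means higher is better, -1 means lower is better.
DESIRED_DIRECTION = {
    "coverage": +1,
    "mttd_minutes": -1,
    "fp_rate": -1,
    "rule_health": +1,
}
RULE_HEALTH_FLOOR = 0.80


def regressions(previous, current):
    """Return {metric: delta} for metrics that moved the wrong way."""
    moved_wrong = {}
    for metric, direction in DESIRED_DIRECTION.items():
        delta = current[metric] - previous[metric]
        if delta * direction < 0:
            moved_wrong[metric] = delta
    return moved_wrong


def needs_investigation(previous, current):
    """List metrics that warrant a root-cause look this month."""
    flagged = sorted(regressions(previous, current))
    if current["rule_health"] < RULE_HEALTH_FLOOR and "rule_health" not in flagged:
        flagged.append("rule_health")
    return flagged


last_month = {"coverage": 0.42, "mttd_minutes": 95, "fp_rate": 0.30, "rule_health": 0.88}
this_month = {"coverage": 0.39, "mttd_minutes": 80, "fp_rate": 0.35, "rule_health": 0.90}
print(needs_investigation(last_month, this_month))  # ['coverage', 'fp_rate']
```

The function only says where to look. Finding out why coverage dropped or FP rate spiked remains the engineer's job.
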

The discipline of measurement distinguishes detection engineering from every other approach to SIEM rule management. Template deployment has no measurement — rules are enabled and forgotten. Ad-hoc rule writing has no measurement — rules are created in response to incidents and never evaluated.

Managed SOC partnerships typically measure alert volume and response time, not detection coverage or rule health. Detection engineering measures the things that determine whether the program actually detects the attacks that matter. That measurement is what makes the board report credible and the investment case defensible.

Engineering practices from software engineering

Detection engineering borrows three practices from software engineering that distinguish it from ad-hoc rule writing.

Version control

Every rule is stored in a Git repository as code — KQL file, specification document, test data. Changes are tracked with commit history. A rule that broke after last week's update can be reverted to the previous version in one command.

A rule that was retired can be examined in the commit history to understand why it was retired, what it detected, and whether it should be reinstated for a new threat. The commit log becomes the institutional memory of the detection program — it records every decision, every change, and every rationale in a format that survives personnel changes.

Code review

Before a rule deploys, another engineer reviews the specification and the KQL. They evaluate the hypothesis for gaps (does it account for the common evasion variants?), the detection logic for correctness (does the KQL actually express the hypothesis?), the entity mapping for completeness (will the SOC have the information they need to investigate?), and the false positive analysis for realism (are the identified FP sources based on actual environment knowledge?).

Code review catches errors that the author's familiarity with their own work makes invisible — the false positive source they didn't think of, the entity field they forgot to map, the time window that's too narrow for the scheduled frequency.

In a solo detection engineering practice (one person), code review can be replaced by structured self-review against a checklist — the rule specification template from Module 1 includes the review criteria. The checklist is less effective than a second pair of eyes but more effective than no review.

Automated testing

The CI/CD pipeline runs the rule against test data before deployment. The attack test data verifies the rule fires on the technique it claims to detect. The noise test data verifies the rule doesn't fire on legitimate activity.

If the rule doesn't fire on the attack data or fires on the noise data, the pipeline fails and the rule doesn't deploy. Automated testing prevents broken rules from reaching production — a problem that manual deployment processes can't prevent consistently because human attention is finite and the number of rules grows faster than the time available to test them manually.
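The pass/fail logic of such a pipeline stage can be sketched in a few lines. Here each rule is stood in for by a Python callable that takes a list of events and returns the alerts it would raise; in a real pipeline that callable would execute the rule's KQL against the test datasets. All names and the toy rules are hypothetical.

```python
def gate(cases, noise_events):
    """Return {rule_id: [reasons]} for every rule that must not deploy.

    cases maps rule_id -> (detect, attack_events), where detect is a
    callable(events) -> list of alerts.
    """
    failures = {}
    for rule_id, (detect, attack_events) in cases.items():
        reasons = []
        if not detect(attack_events):
            reasons.append("did not fire on attack data")
        if detect(noise_events):
            reasons.append("fired on noise data")
        if reasons:
            failures[rule_id] = reasons
    return failures


# Toy stand-ins for three rules.
def spray_rule(events):
    return [e for e in events if e.get("failed_users", 0) > 10]

def noisy_rule(events):
    return list(events)   # alerts on everything

def blind_rule(events):
    return []             # never alerts

attack = [{"failed_users": 25}]
noise = [{"failed_users": 2}]
failures = gate(
    {"spray": (spray_rule, attack),
     "noisy": (noisy_rule, attack),
     "blind": (blind_rule, attack)},
    noise,
)
print(failures)
# A CI job would exit non-zero when failures is non-empty, blocking the deploy.
```
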

These practices are taught in Module 10 (detection-as-code) and used throughout the capstone.

Detection Engineering Principle

Every stage of the lifecycle exists because skipping it has a specific cost. Skipping the hypothesis produces a rule without a threat model. Skipping the specification produces a rule nobody else can maintain. Skipping testing produces false positives that erode SOC trust. Skipping tuning lets that erosion compound. The lifecycle is not bureaucracy — it is the engineering discipline that makes detection reliable.

Next

Section 10 covers the tools and capabilities you'll use throughout the course — Microsoft Sentinel, Defender XDR Advanced Hunting, KQL, the lab pack, and the detection engineering toolkit.
