DE1.10 Automation Rules and Response Integration

2-3 hours · Module 1 · Free

Operational Objective

The Response Gap: A detection rule that fires an alert and waits for human triage has a response latency equal to the analyst's triage time — typically 15-30 minutes during business hours, potentially hours during off-hours or overnight. For critical detections (ransomware pre-encryption, confirmed credential compromise), that response latency may exceed the attacker's time to complete their objective. Automation rules and playbooks bridge this gap by executing predefined actions within seconds of alert creation — without waiting for human triage. This subsection teaches the automation architecture, the decision framework for what to automate, and the NE automation strategy.

Deliverable: An automation strategy that accelerates response for high-confidence detections while avoiding the risks of automating low-confidence rules.

⏱ Estimated completion: 25 minutes

The automation architecture

Sentinel automation operates in two layers. Automation rules are lightweight triggers: when an incident is created (or updated) that matches specified conditions (severity, rule name, entity values), execute one or more actions. Actions include: assign to an analyst, change severity, add tags, change status, and trigger a playbook (Logic App).

Playbooks (Logic Apps) are full workflow engines: API calls, conditional logic, email notifications, Teams messages, Entra ID actions (disable account, revoke sessions), Defender actions (isolate device), ServiceNow ticket creation, and any other API-accessible action. Playbooks execute the actual containment and notification actions that automation rules trigger.

The separation is intentional. Automation rules define WHEN to act (conditions). Playbooks define WHAT to do (actions). This two-layer architecture lets you change the conditions without rebuilding the workflow, and reuse the same playbook across multiple automation rules.
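The WHEN/WHAT split can be sketched in a few lines of Python. This is a conceptual model only, not Sentinel's API: `AutomationRule`, `isolate_device`, `run_automation`, and the host name are hypothetical names chosen for illustration. It shows two rules with different conditions reusing one playbook.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical in-memory model; none of these names are Sentinel API objects.
Incident = dict


def isolate_device(incident: Incident) -> str:
    """Stand-in for a playbook: the WHAT (the response action)."""
    return f"isolate {incident['host']}"


@dataclass
class AutomationRule:
    name: str
    condition: Callable[[Incident], bool]  # WHEN to act
    playbook: Callable[[Incident], str]    # WHAT to do


# Two rules with different conditions share the same playbook.
rules = [
    AutomationRule("ransomware", lambda i: i["rule"] == "Ransomware pre-encryption", isolate_device),
    AutomationRule("c2", lambda i: i["rule"] == "Active C2 beacon", isolate_device),
]


def run_automation(incident: Incident) -> list[str]:
    """Run the playbook of every rule whose condition matches the incident."""
    return [r.playbook(incident) for r in rules if r.condition(incident)]


print(run_automation({"rule": "Active C2 beacon", "host": "WS-042"}))
```

Changing a condition means editing one entry in `rules`; the playbook function stays untouched, which mirrors the reuse benefit described above.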

Figure DE1.10 — Two-layer automation architecture. Automation rules trigger on incident conditions. Playbooks execute the response workflow. NE uses a tiered approach: auto-contain for high-confidence critical rules, auto-enrich for everything else.

The automation decision framework

The fundamental question: should the detection rule trigger automated containment (disable account, isolate device, revoke sessions) or automated enrichment (add context, assign analyst, send notification)?

Automate containment when ALL three conditions are met:

Detection confidence exceeds 90% TP rate. If the rule fires, it is almost certainly a true positive. The ransomware pre-encryption NRT rule meets this — non-SYSTEM processes deleting shadow copies is malicious in virtually all cases.
The response action is reversible. Revoking sessions (the user re-authenticates with MFA), isolating a device (IT reconnects it after investigation), and disabling an account (re-enable it if the alert is a false positive) can all be undone. Do NOT automate irreversible actions: data deletion, password resets, or permanent account deletion.
Delayed response causes disproportionate damage. Ransomware encryption proceeds at machine speed: every minute of delay means more encrypted files. After a credential dump, every minute the stolen credentials or tokens remain valid gives the attacker more time to escalate. By contrast, a detection that fires at Medium severity, where a 2-hour triage delay does not meaningfully increase damage, fails this condition.

Automate enrichment for everything else. When the detection fires, the automation rule triggers an enrichment playbook that extracts the Account entity from the alert, queries Entra ID for the user’s department, manager, and employment status, checks device compliance state, and queries the last 24 hours of sign-in risk for the user. The playbook adds all of this context as comments on the incident. The incident is then tagged with the relevant attack chain code (CHAIN-HARVEST, CHAIN-MESH) and assigned to the on-call analyst. The analyst receives a fully contextualized incident instead of a raw alert that requires 10 minutes of manual enrichment.
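The decision framework above reduces to a single boolean check. The following is a minimal sketch under stated assumptions: `RuleProfile` and `automation_mode` are hypothetical names, and the 0.97 TP rate for the ransomware rule is an illustrative value, not a measured one.

```python
from dataclasses import dataclass

TP_THRESHOLD = 0.90  # from the text: containment requires a TP rate above 90%


@dataclass
class RuleProfile:
    name: str
    tp_rate: float        # fraction of alerts classified as true positive
    reversible: bool      # can the response action be undone?
    time_critical: bool   # does delay cause disproportionate damage?


def automation_mode(rule: RuleProfile) -> str:
    """Return 'contain' only when ALL three conditions hold, else 'enrich'."""
    if rule.tp_rate > TP_THRESHOLD and rule.reversible and rule.time_critical:
        return "contain"
    return "enrich"


# Illustrative profiles; the TP rates here are assumed values.
ransomware = RuleProfile("Ransomware pre-encryption", 0.97, True, True)
mfa_registration = RuleProfile("Suspicious MFA registration", 0.65, True, True)

print(automation_mode(ransomware))
print(automation_mode(mfa_registration))
```

Failing any single condition drops the rule to enrichment, so a highly accurate rule with an irreversible response still never auto-contains.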

NE automation strategy

For Northgate Engineering, the automation strategy is tiered:

Tier 1 — Auto-contain (3-5 rules): Ransomware pre-encryption → playbook isolates the device via Defender API and sends a Teams message to the SOC channel. LSASS credential dump → playbook revokes all sessions for the affected user via Graph API and disables the account pending investigation. Active C2 beacon → playbook isolates the device.

Tier 2 — Auto-enrich + assign (15-25 rules): AiTM token theft, suspicious inbox rule, PIM activation anomaly, lateral RDP to new server, bulk file access. The playbook enriches the incident with user context (department, manager, employment status, recent risk signals) and assigns to Tom or Priya (NE’s L1 SOC analysts) based on the on-call rotation.

Tier 3 — Assign only (remaining rules): Low and Medium severity rules. The automation rule assigns the incident to the SOC queue and tags it with the relevant category. No enrichment — the analyst reviews during batch triage.

// KQL to verify automation rule coverage
SecurityIncident
| where TimeGenerated > ago(30d)
// SecurityIncident logs a new row on every incident update; keep only the latest state per incident
| summarize arg_max(TimeGenerated, *) by IncidentNumber
// Tags added by automation (assumes analysts do not add tags manually)
| extend HasAutomation = coalesce(array_length(Labels), 0) > 0
| summarize
    TotalIncidents = count(),
    AutomatedIncidents = countif(HasAutomation),
    ManualIncidents = countif(not(HasAutomation))
| extend AutomationRate = round(AutomatedIncidents * 100.0 / TotalIncidents, 1)
// Target: >80% of incidents processed by at least Tier 3 automation

⚠ Compliance Myth: "Automated response means we can reduce SOC headcount"

The myth: If automation handles containment for critical detections and enrichment for high-severity detections, the SOC needs fewer analysts.

The reality: Automation reduces RESPONSE LATENCY, not analyst WORKLOAD. The analyst still investigates every incident (what happened, what was the scope, what needs remediation), reviews automated actions (was the containment appropriate, did it affect legitimate users), manages exceptions (the automation blocked the CEO during a board presentation — now the CEO is locked out and calling IT), and handles detections that do not have automation (Medium/Low severity, novel patterns). The analyst’s role shifts from “triage and respond” (reactive, time-pressured) to “investigate and remediate” (analytical, thoroughness-focused). This requires the same headcount with different skills — not fewer people.

Additionally, automated containment actions generate their own incidents: “device isolated by playbook” requires someone to verify the isolation was appropriate and eventually reconnect the device. Each automated action creates follow-up work.

Try it yourself

Exercise: Design your automation tiers

List your current analytics rules by severity. For each Critical and High severity rule, assess: (1) Is the TP rate above 90%? (2) Is the desired response reversible? (3) Does delayed response cause disproportionate damage? Rules meeting all three conditions are Tier 1 (auto-contain) candidates. All other Critical and High severity rules are Tier 2 (auto-enrich). Medium and Low severity rules are Tier 3 (assign only).

If you do not yet have TP rate data (you have not tracked classifications), start with Tier 2 and Tier 3 only. Add Tier 1 automation after 30 days of tracking confirms which rules exceed 90% TP rate.

Check your understanding

A detection rule for "suspicious MFA registration from new device" has a 65% TP rate after 30 days in production. Your SOC lead proposes automating account disabling when this rule fires. Should you approve?

Answer: No. A 65% TP rate means 35% of the time the rule fires, it will disable a legitimate user's account — a user who is simply registering MFA on a new phone or after a device replacement. Over 30 days, if the rule fires 50 times, 17-18 legitimate users are locked out of their accounts. The helpdesk tickets, user frustration, and executive escalations will erode trust in the security team faster than the automation improves response time. At 65% TP, automate Tier 2 (enrich + assign): add user context, recent sign-in history, and device compliance to the incident, then assign to the analyst for manual triage. Automate containment only after tuning raises the TP rate above 90%.
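The lockout arithmetic in this answer can be reproduced directly. This is a small sketch; `expected_false_containments` is a hypothetical helper name, and the 90% comparison uses the Tier 1 threshold from this subsection.

```python
def expected_false_containments(alerts: int, tp_rate: float) -> float:
    """Expected number of legitimate users contained: alerts x false positive rate."""
    return alerts * (1.0 - tp_rate)


# The scenario from the question: 50 alerts in 30 days at a 65% TP rate.
at_65 = expected_false_containments(50, 0.65)
# The same alert volume at the Tier 1 threshold of 90%.
at_90 = expected_false_containments(50, 0.90)

print(f"65% TP: about {at_65:.1f} legitimate users locked out")
print(f"90% TP: about {at_90:.1f} legitimate users locked out")
```

Even at the 90% threshold some legitimate users are still contained, which is why Tier 1 also requires the action to be reversible.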

Troubleshooting: Automation issues

“Automation rule is configured but the playbook does not run.” Check: (1) the playbook (Logic App) is in “Enabled” state, (2) the automation rule’s conditions match the incident properties, (3) Microsoft Sentinel has permission to run the playbook (the Microsoft Sentinel Automation Contributor role on the resource group that contains the Logic App), and (4) the playbook’s trigger type matches the automation rule’s trigger (incident trigger vs alert trigger).

“Playbook runs but the containment action fails.” Check the Logic App run history for the specific error. Common causes: insufficient Graph API permissions for the managed identity (needs User.ReadWrite.All for account disable, Directory.ReadWrite.All for session revocation), Defender API permission missing (Machine.Isolate for device isolation), or the entity mapping produced an unexpected value (empty AccountObjectId because the query output used UPN instead of ObjectId).
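One way to catch the empty or mis-mapped AccountObjectId before a containment call is to validate the value's shape first. The sketch below assumes the object ID arrives as a GUID string; `is_object_id` and the sample values are hypothetical, and a real playbook would express the same check as a Logic App condition.

```python
import uuid


def is_object_id(value) -> bool:
    """True only if value is a GUID string, the shape of an Entra ID object ID."""
    if not value:
        return False  # empty or missing entity mapping
    try:
        uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False  # for example, a UPN was mapped instead of an object ID
    return True


# Illustrative values only.
print(is_object_id("3f2b8c1e-7a4d-4e9b-9c0a-1d2e3f4a5b6c"))
print(is_object_id("j.morrison@northgate.example"))
print(is_object_id(""))
```

When the check fails, the safer behaviour is to skip containment, add a comment to the incident explaining why, and leave the decision to the analyst.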

“Automated containment locked out a legitimate user (false positive).” This is why Tier 1 requires >90% TP rate and reversible actions. Re-enable the account immediately. Add the false positive pattern to the detection rule’s exclusion list. If this is the second FP from the same rule, demote it from Tier 1 (auto-contain) to Tier 2 (auto-enrich) until the TP rate is restored through tuning.
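The demotion policy can be made mechanical so that it does not depend on someone remembering the previous false positive. This is a minimal sketch: `TierTracker` is a hypothetical class, and the counter is never reset here, whereas in practice it would be cleared once tuning restores the TP rate.

```python
from collections import Counter

FP_DEMOTION_THRESHOLD = 2  # from the text: the second FP from the same rule triggers demotion


class TierTracker:
    """Tracks false positives per rule and demotes Tier 1 rules to Tier 2."""

    def __init__(self, tier1_rules):
        self.tiers = {name: 1 for name in tier1_rules}
        self.false_positives = Counter()

    def record_false_positive(self, rule_name: str) -> int:
        """Record one FP and return the rule's tier after applying the policy."""
        self.false_positives[rule_name] += 1
        if (self.tiers[rule_name] == 1
                and self.false_positives[rule_name] >= FP_DEMOTION_THRESHOLD):
            self.tiers[rule_name] = 2  # auto-contain becomes auto-enrich
        return self.tiers[rule_name]


tracker = TierTracker(["LSASS credential dump"])
print(tracker.record_false_positive("LSASS credential dump"))  # first FP: stays Tier 1
print(tracker.record_false_positive("LSASS credential dump"))  # second FP: demoted
```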

📋 Operational Artifact — Automation Strategy

Tier 1 — Auto-contain rules (>90% TP, reversible, time-critical):
Rule: ___ → Action: ___ | Playbook: ___
Rule: ___ → Action: ___ | Playbook: ___
Tier 2 — Auto-enrich rules (High severity, context acceleration):
Enrichment playbook: ___ (adds: department, compliance, risk history)
Assignment logic: ___ (on-call analyst, round-robin, skill-based)
Tier 3 — Assign only (Medium/Low severity):
Assignment: SOC queue | Tag: rule category
Automation coverage target: >80% of incidents processed by automation

References used in this subsection

Microsoft Sentinel automation rules documentation
Course cross-references: DE1.5 (severity — determines automation tier), DE1.6 (NRT + automation for Critical), DE1.8 (Defender native response as alternative), DE9 (TP rate tracking prerequisite for Tier 1)

NE operational context

This detection operates within NE’s 18 GB/day Sentinel ingestion environment across 20 connected data sources. The rule’s alert volume, TP rate, and SOC triage burden are calibrated for NE’s 3-person SOC team handling 7-16 incidents per day. The detection engineer (Rachel) reviews this rule’s health during the monthly tuning review (DE9.9) and adjusts thresholds, exclusions, and entity mapping as the environment evolves.

The rule’s position in the overall detection library means it correlates with rules from adjacent kill chain phases — an alert from this rule gains significance when combined with alerts from earlier or later phases targeting the same entity.

Integration with the NE detection library

This rule operates within the 66-rule detection library, contributing to NE’s cumulative ATT&CK coverage. The SOC triages alerts from this rule alongside adjacent kill chain detections — correlation across modules transforms individual alerts into attack chain narratives. Monthly health monitoring (DE9.8) ensures this rule maintains its target TP rate as the environment evolves.

This detection contributes to NE’s systematic coverage across the ATT&CK framework, correlating with adjacent-phase rules to identify multi-stage attacks. The monthly tuning review monitors its operational effectiveness.
