0.6 NE's Automation Landscape
Scenario
Rachel Okafor asks for a complete inventory of NE's automation capabilities. She wants to know what runs automatically today, what works, what is broken, and what is missing. The SOC team's answer is "Defender does some stuff" and "we have a VirusTotal playbook somewhere." That answer is the problem. You cannot build a 90-day automation program on top of automation you haven't inventoried.
The invisible automation you already have
Every M365 E5 tenant ships with automation that runs before anyone on the security team configures a single rule. Defender XDR attack disruption is the most consequential. It correlates signals across endpoints, identities, email, and cloud apps to identify high-confidence attack patterns, then automatically contains the compromised assets. The patterns it targets include adversary-in-the-middle (AiTM) phishing, business email compromise (BEC), and human-operated ransomware. These are the attack types most likely to cause catastrophic damage if containment is delayed by even thirty minutes.
Attack disruption operates in three stages. The correlation engine aggregates signals from multiple Defender products into a single high-confidence incident. The response engine selects the containment action: device containment through Defender for Endpoint, user disablement through Defender for Identity, or IP containment for undiscovered devices. The action executes automatically, typically within three minutes of initial signal correlation. IDC's 2025 XDR assessment confirmed that three-minute containment window as the benchmark for Defender XDR deployments.
NE enabled Defender for Endpoint and Defender for Identity during the initial E5 deployment in 2023. Nobody reviewed whether the attack disruption prerequisites were met. The Contain User action requires Defender for Endpoint agent version v10.8470 or later, and only 812 of 865 endpoints meet that requirement. The other fifty-three devices are all on the manufacturing floor and run an older agent. They cannot enforce Contain User, so attack disruption cannot contain a compromised user's activity on them. The SOC team doesn't know this because nobody audited the capability.
Figure 0.6a: NE's automation landscape. Most of the running automation is invisible to the SOC team. The configured automation is partially broken. The capabilities the SOC actually needs don't exist.
Auditing Defender XDR: the Action Center
The Defender portal Action Center (security.microsoft.com → Actions & submissions → Action center) is the single pane for every automated action Defender XDR has taken. NE's Action Center shows 847 actions in the last 90 days. Breaking that number down reveals the scale of the visibility gap.
Routine actions dominate: 614 quarantined messages (Safe Attachments detonation and ZAP remediation), 189 blocked URLs (Safe Links time-of-click), and 32 file blocks (Defender for Endpoint). Those 835 actions are working correctly and require no SOC intervention. The remaining 12 are account containment actions from attack disruption. Defender for Identity disabled 12 user accounts during suspected AiTM and BEC attacks.
Those 12 containment actions are the audit finding. Three of the disabled accounts belonged to NE's finance team. One was the CFO's executive assistant. Nobody on the SOC team knew the actions occurred. Nobody verified whether the containment was appropriate. Nobody checked whether the affected users were locked out of business-critical applications for hours while the automation silently resolved the threat.
The audit methodology here is straightforward. Open the Action Center, filter to the last 90 days, and sort by action type. Separate the routine actions (email, URL, file) from the containment actions (device isolate, user disable, IP block). Containment actions require human review because they affect user productivity. A quarantined email has near-zero blast radius. A disabled account has significant blast radius, especially for VIP users and service accounts.
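Here is a minimal Python sketch of that triage. It assumes the Action Center history has been exported into a list of records. The `action_type` labels, the `entity` field, and the VIP set are hypothetical placeholders, not the portal's export schema. Map them to whatever your export actually contains.

```python
from collections import Counter

# Hypothetical action-type labels; rename to match your export.
ROUTINE_ACTIONS = {"quarantine_email", "block_url", "block_file"}
CONTAINMENT_ACTIONS = {"isolate_device", "disable_user", "contain_ip"}


def triage_actions(actions, vip_entities):
    """Split exported Action Center records for human review.

    Returns (counts per action type, containment actions to review,
    records whose type is in neither list). VIP entities are sorted to the
    front of the review list because they carry the largest blast radius.
    """
    counts = Counter(a["action_type"] for a in actions)
    review = [a for a in actions if a["action_type"] in CONTAINMENT_ACTIONS]
    # False sorts before True, so VIP records come first; the sort is stable.
    review.sort(key=lambda a: a["entity"] not in vip_entities)
    unknown = [a for a in actions
               if a["action_type"] not in ROUTINE_ACTIONS | CONTAINMENT_ACTIONS]
    return counts, review, unknown
```

The `unknown` bucket matters. It catches action types that match neither list, so nothing falls through the audit unnoticed.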
Auditing Sentinel automation rules
Sentinel automation rules sit in the Defender portal under Microsoft Sentinel → Configuration → Automation. NE has four. One is valuable. One is redundant. One is harmful. One is fragile.
Rule 1: AiTM severity override. Changes severity from Medium to High for any alert containing "AiTM" in the title. This is the only rule that adds operational value. AiTM detections from Entra ID Protection sometimes fire at Medium severity when the risk score is moderate, but AiTM is never moderate. The technique involves real-time credential relay, which means the attacker has active access to the session. Overriding to High ensures the alert surfaces at the top of the queue. Keep this rule. Document why it exists.
Rule 2: Entra connector routing. Assigns all incidents from the Microsoft Entra ID Protection connector to the SOC queue. This does nothing. Incidents from all connectors route to the SOC queue by default when no other assignment rule exists. The rule creates the illusion of automation without changing any behavior. Delete it. False confidence in non-functional automation is worse than having no automation at all.
Rule 3: Informational auto-close. Closes all incidents with Informational severity after 72 hours. This is actively harmful. Correlation can merge a later, higher-severity alert on the same entity into an open Informational incident, which raises the incident to High. If the rule closes the Informational incident before that second alert arrives, it breaks attack chain reconstruction. Consider a multi-stage attack where the initial reconnaissance fires as Informational and the credential access fires as High four days later. It will appear as two unrelated incidents instead of a single correlated attack. Delete this rule immediately.
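A toy correlation model reproduces the timing problem. This is a deliberately simplified sketch, not Sentinel's or Defender XDR's actual correlation logic. It makes three assumptions. Alerts merge into any open incident for the same entity. Incident severity is the highest alert severity. The auto-close rule closes an incident that is still Informational a fixed number of hours after creation.

```python
SEVERITY_RANK = {"Informational": 0, "Low": 1, "Medium": 2, "High": 3}


def correlate(alerts, auto_close_hours=None):
    """Group (hour, entity, severity) alerts, sorted by hour, into incidents.

    If auto_close_hours is set, an incident still at Informational severity
    counts as closed once that many hours have passed since its creation,
    and closed incidents accept no new alerts.
    """
    incidents = []
    for hour, entity, severity in alerts:
        target = None
        for inc in incidents:
            closed = (
                auto_close_hours is not None
                and inc["severity"] == "Informational"
                and hour - inc["created"] >= auto_close_hours
            )
            if inc["entity"] == entity and not closed:
                target = inc
        if target is None:
            incidents.append({"entity": entity, "created": hour,
                              "severity": severity, "alerts": 1})
        else:
            target["alerts"] += 1
            # Incident severity tracks the most severe alert it contains.
            if SEVERITY_RANK[severity] > SEVERITY_RANK[target["severity"]]:
                target["severity"] = severity
    return incidents
```

Suppose an Informational alert is followed 96 hours later by a High alert on the same host. With no auto-close rule, the two alerts form one High incident. With a 72-hour auto-close, they become two separate incidents.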
Rule 4: Single-analyst assignment. Assigns all High severity incidents to j.morrison. When Morrison is available, this works. When Morrison is on leave, sick, or in a meeting, High severity incidents sit unassigned until someone manually checks the queue. At NE, an unassigned High incident goes unnoticed for four hours on average, roughly a morning shift before anyone reviews the queue. Replace this rule with round-robin assignment or team distribution logic implemented in a Logic App playbook.
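The selection logic behind round-robin assignment is small. This sketch assumes the playbook can obtain the set of currently available analysts from a schedule or presence check, which is not shown. `a.patel` and `s.chen` are hypothetical names added alongside j.morrison.

```python
from itertools import cycle


class RoundRobinAssigner:
    """Assign incidents to the next available analyst in a fixed rotation."""

    def __init__(self, analysts):
        self.analysts = list(analysts)
        self._rotation = cycle(self.analysts)

    def assign(self, available):
        # Check each analyst at most once per call, so an empty roster ends the loop.
        for _ in range(len(self.analysts)):
            candidate = next(self._rotation)
            if candidate in available:
                return candidate
        return None  # nobody available: escalate rather than leave it unassigned
```

When nobody is available, `assign` returns `None` instead of picking a default owner. That gives the playbook an explicit signal to escalate, which is the failure Rule 4 hides.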
The dead playbook problem
Six months ago, an analyst built a Logic App playbook that queried VirusTotal's API for file hashes extracted from Defender for Endpoint alerts. The playbook ran on an incident trigger, extracted the SHA256 hash from the alert entities, called the VirusTotal v3 API, and appended the reputation result to the incident comments.
The playbook worked for two weeks. Then VirusTotal changed its free-tier API rate limit from 4 requests per minute to 2 requests per minute. During a surge of file-based alerts from a software deployment, the playbook exceeded the new limit, and API calls began returning HTTP 429 (Too Many Requests). The Logic App retry policy was left at its default exponential backoff, so it retried each failed call four times. Every retry counted against the quota. The playbook burned through its daily allowance within hours, and the calls that followed kept failing.
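Here is the arithmetic behind the quota burn as a sketch. When every call fails, a policy with four retries sends five requests per lookup. The backoff base and cap below are illustrative values, not claimed Logic Apps defaults. Treating HTTP 429 as non-retryable is one design choice among several. Honoring a `Retry-After` header, where the API sends one, is another.

```python
def requests_sent(lookups, retries_per_failure, calls_fail):
    """Total requests for a batch of hash lookups under a fixed retry policy."""
    per_lookup = 1 + (retries_per_failure if calls_fail else 0)
    return lookups * per_lookup


def backoff_schedule(base_seconds, retries, cap_seconds):
    """Exponential backoff delays (base, 2x, 4x, ...), each capped at cap_seconds."""
    return [min(base_seconds * 2 ** i, cap_seconds) for i in range(retries)]


def should_retry(status_code, attempt, max_attempts):
    """Retry transient server errors only; a 429 means the quota is spent."""
    if status_code == 429:
        return False  # each retry would consume more of the same exhausted quota
    return 500 <= status_code < 600 and attempt < max_attempts
```

If the policy retries 429s, 100 failing lookups cost 500 requests against the same quota. Backoff spaces those requests out, but it does not reduce how many are sent.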
Nobody noticed for the next five and a half months because the playbook had no health monitoring. The SentinelHealth table records whether a playbook was triggered successfully, but it doesn't record whether the playbook's internal actions succeeded. The Logic App launched. The VirusTotal API call inside it failed. From Sentinel's perspective, the playbook ran. From the analyst's perspective, no enrichment data appeared in the incident comments. Nobody was looking for it, because nobody had documented what the playbook was supposed to produce.
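A launched playbook is not the same as a working playbook, so the health check has to test the playbook's documented output. This sketch assumes incidents can be exported with their creation time and comment text. It also assumes the playbook writes a fixed marker string, such as the hypothetical `[VT-ENRICH]` header, at the top of its comment.

```python
from datetime import datetime, timedelta, timezone


def missing_enrichment(incidents, marker, grace=timedelta(minutes=10), now=None):
    """Return IDs of incidents older than `grace` that lack the playbook's output.

    incidents: dicts with 'id', 'created' (timezone-aware datetime) and
    'comments' (list of str). A playbook that launched but failed inside
    never writes `marker`, so its incidents show up here.
    """
    now = now or datetime.now(timezone.utc)
    return [
        inc["id"]
        for inc in incidents
        if now - inc["created"] > grace
        and not any(marker in comment for comment in inc["comments"])
    ]
```

The grace window keeps the check from flagging incidents the playbook simply hasn't reached yet. A non-empty result is the alert that nobody received in NE's case.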
This is the pattern that kills SOC automation programs. A single playbook fails silently, someone says "we tried automation and it didn't work," and the team reverts to manual processes for the next eighteen months. The failure was never the automation concept. The failure was deploying automation without monitoring, without a runbook, and without an owner. You need all three before the playbook goes live.
Anti-Pattern
Deploying playbooks without lifecycle governance
The dead playbook pattern repeats in every organization that treats automation as a project instead of a capability. An analyst builds a playbook during a quiet week. It works. Nobody documents it. Nobody monitors it. The analyst changes roles. The playbook breaks. Nobody notices. Months later, someone audits the environment and finds three dead playbooks, two redundant automation rules, and zero documentation. The conclusion is always "automation is unreliable" when the real conclusion is "automation without governance is unreliable." Every playbook needs an owner, a runbook, health monitoring, and a scheduled review before it enters production.
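Those four requirements can be enforced as a go-live gate. The sketch below uses a hypothetical `PlaybookRecord` type. Where the record is actually kept, such as a wiki or a watchlist, is up to the team.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PlaybookRecord:
    name: str
    owner: Optional[str]
    runbook_url: Optional[str]
    health_monitored: bool
    next_review: Optional[date]


def production_blockers(pb: PlaybookRecord, today: date) -> list:
    """List the governance requirements a playbook fails before go-live."""
    blockers = []
    if not pb.owner:
        blockers.append("no owner")
    if not pb.runbook_url:
        blockers.append("no runbook")
    if not pb.health_monitored:
        blockers.append("no health monitoring")
    if pb.next_review is None or pb.next_review <= today:
        blockers.append("no future review date")
    return blockers
```

A playbook enters production only when the list comes back empty. NE's VirusTotal playbook would have failed all four checks.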
The gap inventory
The current-state audit shows that NE's automation covers only the two ends of the framework from Section 0.3. The platform handles some Tier 3 containment automatically (attack disruption) and some Tier 1 enrichment partially (Safe Attachments, Safe Links). Everything between those extremes is manual.
Zero enrichment playbooks. When an AiTM alert fires, the analyst manually queries sign-in history, IP reputation, device compliance, alert history, and geographic anomaly data. That is five queries at two minutes each, or ten minutes per alert, repeated for every alert. At 500 alerts per day, an average of 144 reach human triage (the other 71% are never touched). At ten minutes each, that is 1,440 minutes, or 24 hours, of manual enrichment daily. NE's three-analyst SOC has 24 operational hours per day. Enrichment alone consumes 100% of available capacity, leaving zero time for investigation.
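Here is the capacity arithmetic as a worked check. The 144 triaged alerts and the five two-minute queries come from the paragraph above. The eight-hour shift per analyst is an assumption, implied by three analysts providing 24 operational hours.

```python
def enrichment_load(triaged_alerts, queries_per_alert, minutes_per_query,
                    analysts, hours_per_analyst):
    """Return (enrichment hours needed, analyst hours available, utilization)."""
    hours_needed = triaged_alerts * queries_per_alert * minutes_per_query / 60
    capacity = analysts * hours_per_analyst
    return hours_needed, capacity, hours_needed / capacity


# 144 alerts x 5 queries x 2 minutes = 1,440 minutes = 24 hours of a 24-hour budget.
needed, available, utilization = enrichment_load(144, 5, 2, 3, 8)
```

Utilization comes out at exactly 1.0. Any growth in alert volume pushes manual enrichment past the team's total capacity.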
Zero evidence collection playbooks. Volatile evidence (active session tokens, running processes, network connections, mailbox rule state) disappears within minutes to hours of an alert. Microsoft Entra ID access tokens have a default lifetime of 60 to 90 minutes, counted from when they were issued, not from when the alert fired. An analyst who triages an AiTM alert 45 minutes after it fires may find that the attacker's session token has already expired. The forensic evidence of what that session accessed during the active compromise is in the audit logs, but the real-time session state is gone.
Zero notification playbooks. The SOC notifies stakeholders by manually composing emails during active incidents. The analyst switches context from the investigation to Outlook, writes a summary, addresses it to the correct distribution list, and switches back. Context switching during incident response adds fifteen minutes per notification and increases the probability of missing a critical finding.
Zero identity or endpoint containment playbooks. All containment is manual: open the Entra admin center, find the user, revoke sessions, remove the compromised MFA method, check for inbox rules, check for mailbox delegates, and document every action. This takes 15 to 25 minutes per incident. During that window, the attacker may be adding persistence mechanisms that the containment procedure doesn't address.
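The manual sequence maps onto concrete calls. This sketch builds an ordered plan and calls nothing. The Microsoft Graph v1.0 paths are real endpoints, and the function name is hypothetical. Several things are out of scope: authentication, deciding which MFA method is the attacker's, and running the Exchange Online PowerShell step.

```python
GRAPH = "https://graph.microsoft.com/v1.0"


def containment_plan(user_id, mfa_method_type, mfa_method_id):
    """Ordered containment steps for one compromised user as (kind, target) pairs.

    mfa_method_type is a Graph authentication method segment such as
    "microsoftAuthenticatorMethods"; the caller decides which method to remove.
    """
    user = f"{GRAPH}/users/{user_id}"
    return [
        ("POST", f"{user}/revokeSignInSessions"),
        ("DELETE", f"{user}/authentication/{mfa_method_type}/{mfa_method_id}"),
        ("GET", f"{user}/mailFolders/inbox/messageRules"),
        # Full Access delegates are checked with Exchange Online PowerShell here.
        ("PS", f"Get-MailboxPermission -Identity {user_id}"),
        ("LOG", f"record every containment action for {user_id}"),
    ]
```

Writing the plan down as data makes the order explicit and reviewable. It is also the shape a Month 3 containment playbook would execute step by step.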
The 90-day target architecture
The audit tells you where you are. The target architecture tells you where the course takes you. This is not an aspirational roadmap. It is the specific set of playbooks, automation rules, and governance mechanisms that the remaining modules of this course build, test, and deploy.
Month 1, Tier 1: enrichment and triage acceleration. Six enrichment playbooks covering the data sources every incident requires: IP reputation from threat intelligence feeds, user risk score from Entra ID Protection, device compliance state from Intune, alert correlation history from Sentinel, geographic anomaly analysis from SigninLogs, and mailbox rule audit from Exchange Online. Each playbook fires on incident creation via automation rule, runs in under 30 seconds, and appends structured enrichment data to the incident comments. The analyst opens an investigation-ready incident instead of a raw alert.
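The 30-second budget is easiest to meet when enrichment sources run concurrently. A slow or failing source should be recorded in the output rather than allowed to block the rest. The Python sketch below shows that pattern, assuming each enrichment source is wrapped as an async function. The production playbooks are Logic Apps, and the source names in the test are hypothetical.

```python
import asyncio
import json


async def run_enrichments(incident, enrichers, budget_seconds=30.0):
    """Run enrichment coroutines concurrently and build one structured comment.

    enrichers: mapping of name -> async function(incident) -> dict.
    A source that raises or exceeds the budget is recorded as an error
    instead of holding up the other sources.
    """
    async def guarded(name, fn):
        try:
            data = await asyncio.wait_for(fn(incident), budget_seconds)
            return name, {"ok": True, "data": data}
        except asyncio.TimeoutError:
            return name, {"ok": False, "error": "timeout"}
        except Exception as exc:  # one broken source must not sink the others
            return name, {"ok": False, "error": repr(exc)}

    results = await asyncio.gather(*(guarded(n, f) for n, f in enrichers.items()))
    return json.dumps(dict(results), indent=2, sort_keys=True)
```

Each source gets its own timeout, so the whole run finishes in roughly the budget even if one source hangs. The analyst sees which sources failed instead of a missing comment.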
Month 2, Tier 2: evidence collection and notification. Three evidence collection playbooks (AiTM identity evidence, endpoint volatile state, email header and transport data) that capture time-sensitive artifacts at alert time, before analyst triage delay allows evidence to expire. A notification pipeline with three channels: Teams adaptive card to the SOC channel for all incidents, email to the CISO for Critical severity, and ServiceNow ticket creation for tracking. After-hours escalation routing based on incident severity and the on-call schedule.
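The routing rules reduce to a small decision function. In this sketch, the channel names are assumptions for illustration. So are the 08:00 to 18:00 business window and the choice to escalate only High and Critical incidents after hours.

```python
from datetime import time


def notification_channels(severity, local_time, business_hours=(time(8), time(18))):
    """Return the notification channels for one incident."""
    channels = ["teams_soc_card", "servicenow_ticket"]  # every incident
    if severity == "Critical":
        channels.append("email_ciso")
    start, end = business_hours
    after_hours = not (start <= local_time < end)
    if after_hours and severity in ("High", "Critical"):
        channels.append("page_on_call")  # route via the on-call schedule
    return channels
```

Keeping the rules in one function means the monthly review changes a single place when thresholds or hours shift.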
Month 3, Tier 3: semi-automated containment. Identity containment for confirmed AiTM: session revocation plus MFA method reset, gated by the composite confidence threshold from Section 0.4 (97% minimum) and the blast radius assessment from Section 0.5. Endpoint containment for workstations with confirmed malware or ransomware indicators, gated by device role classification (workstation auto-contain, server routes to approval). VIP watchlist integration for both playbooks — VIP entities route to human approval regardless of confidence score, with rollback playbooks for every containment action.
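The gates compose in a fixed order. VIP status comes first because it overrides confidence, followed by the confidence threshold, then blast radius, then device role. This sketch assumes the blast radius assessment from Section 0.5 has already been reduced to a pass/fail flag. The return labels are hypothetical.

```python
def containment_decision(entity_kind, confidence, is_vip, blast_radius_ok,
                         device_role=None, threshold=0.97):
    """Decide how one entity is handled: 'auto_contain', 'approval' or 'analyst_triage'."""
    if is_vip:
        return "approval"  # VIPs always go to a human, whatever the score
    if confidence < threshold:
        return "analyst_triage"  # below the gate: no automated containment
    if not blast_radius_ok:
        return "approval"
    if entity_kind == "device" and device_role != "workstation":
        return "approval"  # servers and unclassified devices need sign-off
    return "auto_contain"
```

Only one path reaches `auto_contain`, so every exception defaults to a human decision rather than an automated action.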
Ongoing: governance layer. Automation health dashboard using SentinelHealth and AzureDiagnostics tables, monitoring playbook success rate, execution latency, and containment action counts. Monthly review: tune confidence thresholds using the 30-day false action rate, retire underperforming playbooks, promote validated enrichments to broader trigger conditions. Every playbook has a runbook, an owner, and a next-review date.
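Once run records are projected out of SentinelHealth and the Logic Apps diagnostics, the dashboard metrics are simple aggregates. The record fields in this sketch are assumptions. The false action rate is taken here as reversed containment actions divided by all containment actions in the 30-day window, which may differ from how Section 0.4 defines it.

```python
from statistics import median


def playbook_health(runs):
    """Summarize runs per playbook: success rate, median latency, containment count.

    runs: dicts with 'playbook', 'succeeded' (bool), 'latency_s' (float)
    and 'containment' (bool).
    """
    summary = {}
    for r in runs:
        s = summary.setdefault(r["playbook"],
                               {"runs": 0, "ok": 0, "latencies": [], "containments": 0})
        s["runs"] += 1
        s["ok"] += r["succeeded"]
        s["latencies"].append(r["latency_s"])
        s["containments"] += r["containment"]
    return {
        name: {
            "success_rate": s["ok"] / s["runs"],
            "median_latency_s": median(s["latencies"]),
            "containment_actions": s["containments"],
        }
        for name, s in summary.items()
    }


def false_action_rate(containment_actions_30d, reversed_30d):
    """Share of containment actions later rolled back as false positives."""
    return reversed_30d / containment_actions_30d if containment_actions_30d else 0.0
```

A falling success rate or rising latency flags a playbook for the monthly review. The false action rate feeds threshold tuning.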
Automation Principle
Audit before you build. Every automation program starts with an inventory of what already runs, what works, what is broken, and what is missing. The inventory exposes the gaps the framework from Sections 0.1 through 0.5 is designed to fill. Without the audit, you are building automation on top of automation you don't understand, and the first failure you can't diagnose will set the program back by months.