SA0.5 The Blast Radius Assessment
Figure SA0.5 — Blast radius categories and the pre-automation assessment checklist. Every Tier 3 playbook must assess blast radius before executing containment.
Every containment action breaks something
This is the truth that automation enthusiasts skip: every containment action has side effects. Session revocation forces re-authentication — if the user is presenting to a client via Teams, the call drops. Endpoint isolation cuts network access — if the endpoint is a developer’s workstation running a deployment pipeline, the release fails. Account disable prevents authentication — if the account is a service account running the payroll batch job at midnight, 810 employees do not get paid on Friday.
The question is not whether containment has side effects. The question is whether the side effects are acceptable given what the attacker would do without containment.
A useful mental model: containment is surgery. It is necessary, it is immediate, and it causes pain. But unnecessary surgery — operating on the wrong patient, or performing an aggressive procedure when a conservative one would suffice — causes harm without benefit. The blast radius assessment is the diagnostic that precedes the surgery.
The four blast radius categories
Low blast radius (1 user or 1 non-critical device). Session revocation affects one user for one re-authentication cycle. MFA method removal requires one user to re-register their authenticator. Password reset requires one user to create a new password. Workstation isolation takes one non-critical device offline. Recovery time: 5 minutes. The user experiences brief disruption and the helpdesk may receive one ticket. For low-blast actions, auto-containment is appropriate at 95%+ confidence with no additional safeguards beyond the standard VIP check.
Medium blast radius (team or service). Account disable prevents the user from authenticating to any service. If the user manages a shared mailbox, a Teams channel, or a departmental SharePoint site, the team loses everything that depends on that manager’s access and permissions until the account is re-enabled. OAuth app revocation affects every user who consented to the application. IP block affects every user behind that IP (potentially a NAT gateway with hundreds of users). Recovery time: 15-30 minutes for account re-enable, potentially longer for OAuth re-consent. For medium-blast actions, auto-containment should include a confirmation check: is this a shared account? Does this IP map to a known corporate egress point?
High blast radius (department or production service). Server isolation takes an entire application offline. If SRV-NGE-DB01 (the ERP database server) is isolated, the ERP application stops working for all 810 users. Conditional access policy changes that affect all users can lock everyone out of corporate resources. Subnet firewall blocks can cut an entire office from the network. Service account disable can break automated processes that the organisation depends on — backup jobs, monitoring agents, integration pipelines. Recovery time: 1-4 hours, often requiring change management processes to reverse. For high-blast actions, auto-containment should be gated by human approval except in extreme cases (active ransomware encryption).
Critical blast radius (entire organisation). DNS sinkhole redirects all organisational DNS, potentially breaking every internet-dependent service. Global conditional access lockdown blocks all authentication except emergency break-glass accounts. KRBTGT password reset (double reset) invalidates every Kerberos ticket in the domain, forcing every user and service to re-authenticate. Network-wide isolation disconnects the entire organisation. Recovery time: 4-24 hours, requires coordinated effort across infrastructure, security, and operations teams. These actions are never fully automated. They are initiated by the IR lead with management approval, executed via pre-tested runbooks, and coordinated with all affected teams.
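The four categories and their gating rules can be sketched as a small lookup function. This is a minimal illustration, not the course's implementation: the names `BlastRadius` and `containment_mode` are hypothetical, and applying the 95% confidence bar to medium-blast actions is an assumption (the text states that threshold only for low-blast actions).

```python
from enum import Enum


class BlastRadius(Enum):
    LOW = "low"            # 1 user or 1 non-critical device
    MEDIUM = "medium"      # team or service
    HIGH = "high"          # department or production service
    CRITICAL = "critical"  # entire organisation


def containment_mode(radius, confidence, *, confirmation_checks_passed=False,
                     active_ransomware=False):
    """Return 'auto', 'approval', or 'manual' for a proposed containment action."""
    if radius is BlastRadius.CRITICAL:
        # Never automated: IR lead, management approval, pre-tested runbook.
        return "manual"
    if radius is BlastRadius.HIGH:
        # Human approval unless encryption is actively in progress.
        return "auto" if active_ransomware else "approval"
    if confidence < 0.95:
        # Assumption: below the 95% bar, the action goes to a human.
        return "approval"
    if radius is BlastRadius.MEDIUM:
        # Shared account? Known corporate egress IP? Both must be ruled out.
        return "auto" if confirmation_checks_passed else "approval"
    # LOW at 95%+ confidence; the standard VIP check still applies upstream.
    return "auto"


print(containment_mode(BlastRadius.LOW, 0.97))
print(containment_mode(BlastRadius.HIGH, 0.99))
```

Keeping the rules in one function makes the policy reviewable in a single place, which matters more than the exact thresholds chosen here.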
Dynamic blast radius assessment in playbooks
Static blast radius categories help you design automation. Dynamic blast radius checks make the automation context-aware at runtime.
The difference: a static assessment says “endpoint isolation is medium blast radius.” A dynamic assessment queries, at the moment the playbook fires, whether this specific endpoint is a workstation (low blast) or a server (high blast), whether it is the only device providing a specific service, whether it is currently running a critical job, and whether the user is a VIP.
Dynamic checks in a containment playbook look like this (conceptual Logic App flow):
- Step 1: Trigger fires on high-confidence endpoint alert.
- Step 2: Query the device: is it a server or workstation? (Check device type in MDE device inventory.)
- Step 3: If server, route to approval gate. If workstation, continue to auto-isolate.
- Step 4: Check VIP watchlist: is the device’s primary user on the VIP list?
- Step 5: If VIP, route to approval gate regardless of device type.
- Step 6: If workstation, non-VIP: execute isolation, trigger evidence collection, post notification to SOC channel.
The Logic App makes a different decision for SRV-NGE-DB01 (server → approval gate) than for DESKTOP-NGE042 (workstation → auto-isolate). Both are endpoint isolation, but the blast radius is different, and the playbook adapts.
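The same routing logic can be written as a few lines of Python to make the branches explicit. This is a sketch of the decision only, not a Logic App definition. The `Device` type, the `route_isolation` name, the watchlist entry, and the pairing of d.chen with DESKTOP-NGE042 are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    device_type: str   # "server" or "workstation", e.g. from device inventory
    primary_user: str


def route_isolation(device, vip_watchlist):
    """Mirror steps 2-6: return 'approval_gate' or 'auto_isolate'."""
    if device.device_type.lower() != "workstation":
        # Servers go to a human. Treating unrecognised device types the same
        # way (failing closed) is an assumption, not part of the steps above.
        return "approval_gate"
    if device.primary_user.lower() in vip_watchlist:
        # VIP users always get human review, regardless of device type.
        return "approval_gate"
    return "auto_isolate"


vips = {"cfo@northgateeng.com"}  # hypothetical watchlist, lowercase entries
server = Device("SRV-NGE-DB01", "server", "svc_sql@northgateeng.com")
desktop = Device("DESKTOP-NGE042", "workstation", "d.chen@northgateeng.com")
print(route_isolation(server, vips))   # approval_gate
print(route_isolation(desktop, vips))  # auto_isolate
```

Note that the function defaults to the approval gate and only auto-isolates when every check passes, so a missing or malformed inventory record cannot trigger automatic isolation.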
SA6 implements this decision tree in a production Logic App. This subsection establishes the concept; the implementation follows there.
The myth: Automated isolation of production servers would violate the SLA committed to customers or internal business units. If the automation incorrectly isolates a production server (false positive), the downtime counts against the SLA.
The reality: Your SLA almost certainly includes an exception for security incidents. Review the specific language — most SLAs exclude “planned or unplanned maintenance required for security purposes” or “service interruptions caused by security threats.” If yours does not include this exception, add it at the next renewal.
More importantly: the attacker does not respect your SLA. A ransomware encryption event that takes the production server offline causes far more downtime than a 30-minute containment-and-verify cycle. The choice is not between “uptime” and “containment.” The choice is between “brief, controlled containment” and “uncontrolled attacker-driven outage.” Frame the blast radius conversation with this comparison: isolation causes 30 minutes of downtime with controlled recovery. Ransomware causes 2-14 days of downtime with uncertain recovery.
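The comparison is easier to argue with the arithmetic written out. The sketch below uses only the figures quoted above (30 minutes versus 2-14 days) and treats downtime as the only cost, which is a simplifying assumption; `downtime_ratio` is a hypothetical helper name.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_ratio(containment_minutes, outage_days):
    """How many containment cycles fit into one ransomware outage."""
    return (outage_days * MINUTES_PER_DAY) / containment_minutes


for days in (2, 14):
    ratio = downtime_ratio(30, days)
    print(f"{days}-day outage = {ratio:.0f} containment cycles of 30 minutes")
```

This works out to 96 cycles for a 2-day outage and 672 for a 14-day outage. Put differently, under these figures a playbook could wrongly isolate the same server dozens of times and still cost less downtime than a single short ransomware outage.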
Blast radius applied to NE’s critical systems
Walk through Northgate Engineering’s infrastructure and assess the blast radius for containment of each critical system:
SRV-NGE-DC01 / DC02 (Domain Controllers). Isolation blast radius: CRITICAL. Isolating a domain controller breaks authentication for every domain-joined device and user. NEVER auto-isolate a domain controller. Containment for compromised DCs is a manual, coordinated process involving KRBTGT reset, DC rebuild, and forest recovery — not endpoint isolation.
SRV-NGE-DB01 (ERP Database). Isolation blast radius: HIGH. The ERP application serves all 810 users. Isolation breaks ERP access completely. Auto-isolate only on confirmed ransomware (VSS deletion + encryption activity). For credential theft or lateral movement indicators, route to approval gate with context: “SRV-NGE-DB01 serves the ERP application for 810 users. Isolating will take ERP offline. Approve / Reject / Delay 30 min.”
DESKTOP-NGE042 (Standard Workstation). Isolation blast radius: LOW. One user loses network access. Their work is interrupted for the duration of isolation. Auto-isolate on high-confidence endpoint threats. The user calls helpdesk, helpdesk sees the Sentinel incident, and the SOC team either confirms the containment or releases the device.
SRV-NGE-FS01 (File Server). Isolation blast radius: HIGH. Shared file access for multiple departments. Auto-isolate only on active ransomware encryption. For other threats, contain the user account that accessed the server rather than the server itself — this stops the attacker’s access without taking the server offline.
d.chen@northgateeng.com (Standard User). Session revocation blast radius: LOW. One user re-authenticates. Account disable blast radius: MEDIUM (d.chen manages no shared resources, so medium is conservative — it is effectively low for this specific user). Auto-revoke sessions on AiTM confirmation. Auto-disable only with VIP check.
svc_sql@northgateeng.com (Service Account). Disable blast radius: HIGH. This account runs the SQL Server integration service. Disabling it breaks the data pipeline between the ERP and the reporting system. NEVER auto-disable service accounts without checking their dependencies. SA5 builds the service account dependency check into the containment playbook.
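For the high-blast servers above, the rule "auto-isolate only on confirmed ransomware, otherwise ask a human with context" might look like the following sketch. The indicator names (`vss_deletion`, `encryption_activity`) and the function name are hypothetical labels, not fields from any product. The prompt wording is taken from the SRV-NGE-DB01 example.

```python
def server_isolation_decision(server, service, users_affected, indicators):
    """Auto-isolate only on confirmed ransomware; otherwise build an approval prompt."""
    # Confirmed ransomware here means BOTH signals are present, following the
    # "VSS deletion + encryption activity" rule for SRV-NGE-DB01.
    confirmed = {"vss_deletion", "encryption_activity"} <= set(indicators)
    if confirmed:
        return {"action": "auto_isolate", "prompt": None}
    prompt = (
        f"{server} serves the {service} application for {users_affected} users. "
        f"Isolating will take {service} offline. "
        "Approve / Reject / Delay 30 min."
    )
    return {"action": "approval_gate", "prompt": prompt}


decision = server_isolation_decision("SRV-NGE-DB01", "ERP", 810, ["credential_theft"])
print(decision["action"])
print(decision["prompt"])
```

Putting the user count and the consequence in the prompt means the approver does not have to look up what the server does before deciding.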
Blast Radius Assessment Template — Northgate Engineering
| System | Type | Blast Category | Auto-Containment | Approval Required | Notes |
|---|---|---|---|---|---|
| SRV-NGE-DC01/02 | Domain Controller | CRITICAL | Never | Always (IR Lead + CTO) | KRBTGT reset only |
| SRV-NGE-DB01 | ERP Database | HIGH | Ransomware only | Yes (SOC Lead) | ERP offline for 810 users |
| SRV-NGE-FS01 | File Server | HIGH | Ransomware only | Yes (SOC Lead) | Prefer user containment |
| DESKTOP-NGE* | Workstation | LOW | Yes (95%+ confidence) | No | VIP check only |
| Standard users | Identity | LOW | Yes (session revoke) | No | VIP check only |
| Service accounts | Identity | HIGH | Never (auto-disable) | Yes (IR Lead) | Check dependencies first |
| VIP users | Identity | MEDIUM | Session revoke only | Yes (SOC Lead) | No auto-disable |
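A playbook needs this template in machine-readable form. One way to hold the device rows is a pattern list matched against the hostname, sketched below with Python's standard `fnmatch` module. The `POLICY` structure and `lookup` name are hypothetical, and the fallback for unlisted systems (treat as unknown and send to the SOC Lead) is an assumption rather than something the template specifies.

```python
from fnmatch import fnmatchcase

# (hostname pattern, blast category, auto-containment, approver) per device row
POLICY = [
    ("SRV-NGE-DC0[12]", "CRITICAL", "never", "IR Lead + CTO"),
    ("SRV-NGE-DB01", "HIGH", "ransomware only", "SOC Lead"),
    ("SRV-NGE-FS01", "HIGH", "ransomware only", "SOC Lead"),
    ("DESKTOP-NGE*", "LOW", "yes (95%+ confidence)", None),
]


def lookup(hostname):
    """Return the first matching policy row for a hostname."""
    for pattern, category, auto, approver in POLICY:
        # Upper-case the hostname so matching is case-insensitive in practice.
        if fnmatchcase(hostname.upper(), pattern):
            return {"category": category, "auto": auto, "approver": approver}
    # Assumption: a system missing from the table is never auto-contained.
    return {"category": "UNKNOWN", "auto": "never", "approver": "SOC Lead"}


print(lookup("DESKTOP-NGE042"))
print(lookup("srv-nge-dc02"))
```

Because the first match wins, more specific patterns should be listed before wildcard ones if the table grows.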
Decision point: An AiTM alert fires for the CFO (VIP watchlist). The enrichment confirms AiTM with 97% confidence — MFA claim in token, Amsterdam IP, python-requests user agent. The playbook reaches the containment step. VIP check fires: this is the CFO. The playbook routes to the approval gate: “AiTM confirmed for CFO. Sessions active from 203.0.113.45. Approve session revocation?” The SOC analyst sees the adaptive card in Teams, reads the enrichment data, and clicks Approve. Sessions are revoked in under 2 minutes — not the 45 minutes of full manual triage, but not the 30-second auto-containment either. The VIP check adds a 1-2 minute delay but prevents the scenario where the CFO’s session drops during a board presentation without human judgment confirming it is necessary.
Try it: Build your blast radius table
Create a table like the NE example above for your environment. For each critical system and account type:
- What is the system’s role? (What breaks if it goes offline?)
- What blast radius category? (Low / Medium / High / Critical)
- Can any containment action be automated? Which ones?
- What approval is required?
- Who needs to be notified before execution?
If you cannot determine the blast radius for a system, that is a gap in your asset inventory — and a gap in your IR readiness. You cannot automate containment for systems you do not understand.
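Once the table exists as data, the gaps can be found mechanically. The sketch below flags any system whose record is missing an answer to one of the five questions above. The field names, `find_gaps`, the `SRV-NGE-APP07` host, and the sample values are all hypothetical and exist only to show the check.

```python
REQUIRED = ("role", "blast_category", "automatable_actions", "approval", "notify")


def find_gaps(inventory):
    """Map each system name to the required fields that are absent or blank."""
    gaps = {}
    for name, record in inventory.items():
        # An empty list is a valid answer ("nothing may be automated"),
        # so only None and empty strings count as missing.
        missing = [f for f in REQUIRED if record.get(f) in (None, "")]
        if missing:
            gaps[name] = missing
    return gaps


inventory = {
    "SRV-NGE-DB01": {
        "role": "ERP database",
        "blast_category": "HIGH",
        "automatable_actions": ["isolate on confirmed ransomware"],
        "approval": "SOC Lead",
        "notify": "SOC channel",  # hypothetical value
    },
    "SRV-NGE-APP07": {"role": "unknown application server"},  # hypothetical host
}
print(find_gaps(inventory))
```

Any system that appears in the output is one for which containment should stay manual until the missing answers are filled in.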
Where this goes deeper. SA6 implements the dynamic blast radius assessment in a production endpoint containment playbook — with Logic App conditions that query device type, check service dependencies, and route to the appropriate approval path. SA7 builds cross-environment blast radius assessment that evaluates the combined impact of simultaneous containment across identity + endpoint + network. The Incident Triage course (TR8) covers the manual blast radius assessment process that this automation replaces.