SA0.5 The Blast Radius Assessment

5 hours · Module 0 · Free
BLAST RADIUS: CONTAINMENT ACTION IMPACT ASSESSMENT

LOW BLAST (1 user / 1 device): Session revocation, single endpoint isolate, password reset, MFA method removal. Recovery: 5 minutes.
MEDIUM BLAST (team / service): Account disable, OAuth app revoke, IP block (single), group removal. Recovery: 15-30 minutes.
HIGH BLAST (department / org): Server isolation, CA policy (all users), subnet firewall block, service account disable. Recovery: 1-4 hours.
CRITICAL BLAST (entire organization): DNS sinkhole, global CA lockdown, KRBTGT reset, network-wide isolation. Recovery: 4-24 hours.

BLAST RADIUS ASSESSMENT: PRE-AUTOMATION CHECKLIST

  1. WHO is affected? One user, a team, a department, the entire organization?
  2. WHAT breaks? Which applications, services, or workflows stop working?
  3. HOW LONG to recover? Can the action be undone in 5 minutes or 5 hours?
  4. WHO needs to know? Helpdesk, management, the affected user, the MSSP?
  5. IS THERE A SAFER ALTERNATIVE? Session revoke instead of account disable?
  6. WHAT IS THE COST OF NOT ACTING? If you don't contain, what does the attacker do next?

Answer all 6 before deploying any Tier 3 automation. Embed questions 1-3 as dynamic checks in the playbook.

Figure SA0.5 — Blast radius categories and the pre-automation assessment checklist. Every Tier 3 playbook must assess blast radius before executing containment.

Operational Objective
Automated containment is only valuable if the containment does not cause more damage than the attack. Isolating a file server that runs the ERP application for 810 users is a containment action — it stops the attacker. It also stops the business. The blast radius assessment quantifies the operational impact of every containment action BEFORE the action executes, and builds the impact assessment into the automation itself so the playbook makes context-aware decisions at runtime.
Deliverable: The Blast Radius Assessment Template — a 6-question framework for evaluating the operational impact of containment actions, applied to NE's critical systems, with Logic App implementation patterns for dynamic blast radius checks.
⏱ Estimated completion: 30 minutes

Every containment action breaks something

This is the truth that automation enthusiasts skip: every containment action has side effects. Session revocation forces re-authentication — if the user is presenting to a client via Teams, the call drops. Endpoint isolation cuts network access — if the endpoint is a developer’s workstation running a deployment pipeline, the release fails. Account disable prevents authentication — if the account is a service account running the payroll batch job at midnight, 810 employees do not get paid on Friday.

The question is not whether containment has side effects. The question is whether the side effects are acceptable given what the attacker would do without containment.

A useful mental model: containment is surgery. It is necessary, it is immediate, and it causes pain. But unnecessary surgery — operating on the wrong patient, or performing an aggressive procedure when a conservative one would suffice — causes harm without benefit. The blast radius assessment is the diagnostic that precedes the surgery.

The four blast radius categories

Low blast radius (1 user or 1 non-critical device). Session revocation affects one user for one re-authentication cycle. MFA method removal requires one user to re-register their authenticator. Password reset requires one user to create a new password. Workstation isolation takes one non-critical device offline. Recovery time: 5 minutes. The user experiences brief disruption and the helpdesk may receive one ticket. For low-blast actions, auto-containment is appropriate at 95%+ confidence with no additional safeguards beyond the standard VIP check.

Medium blast radius (team or service). Account disable prevents the user from authenticating to any service. If the user manages a shared mailbox, a Teams channel, or a departmental SharePoint site, the team loses whatever depends on that user’s access and permissions until the account is re-enabled. OAuth app revocation affects every user who consented to the application. IP block affects every user behind that IP (potentially a NAT gateway with hundreds of users). Recovery time: 15-30 minutes for account re-enable, potentially longer for OAuth re-consent. For medium-blast actions, auto-containment should include a confirmation check: is this a shared account? Does this IP map to a known corporate egress point?

High blast radius (department or production service). Server isolation takes an entire application offline. If SRV-NGE-DB01 (the ERP database server) is isolated, the ERP application stops working for all 810 users. Conditional access policy changes that affect all users can lock everyone out of corporate resources. Subnet firewall blocks can cut an entire office from the network. Service account disable can break automated processes that the organisation depends on — backup jobs, monitoring agents, integration pipelines. Recovery time: 1-4 hours, often requiring change management processes to reverse. For high-blast actions, auto-containment should be gated by human approval except in extreme cases (active ransomware encryption).

Critical blast radius (entire organisation). DNS sinkhole redirects all organisational DNS, potentially breaking every internet-dependent service. Global conditional access lockdown blocks all authentication except emergency break-glass accounts. KRBTGT password reset (double reset) invalidates every Kerberos ticket in the domain, forcing every user and service to re-authenticate. Network-wide isolation disconnects the entire organisation. Recovery time: 4-24 hours, requiring coordinated effort across infrastructure, security, and operations teams. These actions are never fully automated. They are initiated by the IR lead with management approval, executed via pre-tested runbooks, and coordinated with all affected teams.
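The four categories and their gating rules can be sketched as a lookup plus one small function. This is a minimal illustration, not the course's implementation. The action keys, the return labels, and the `gate` function are hypothetical names. Applying the 95% threshold to medium-blast actions, and sending below-threshold alerts to approval, are assumptions; the text states the threshold only for low-blast actions.

```python
from enum import Enum


class Blast(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Static category per containment action, as listed in the four categories.
# The dictionary keys are illustrative names, not real API identifiers.
STATIC_CATEGORY = {
    "session_revocation": Blast.LOW,
    "single_endpoint_isolate": Blast.LOW,
    "password_reset": Blast.LOW,
    "mfa_method_removal": Blast.LOW,
    "account_disable": Blast.MEDIUM,
    "oauth_app_revoke": Blast.MEDIUM,
    "ip_block_single": Blast.MEDIUM,
    "group_removal": Blast.MEDIUM,
    "server_isolation": Blast.HIGH,
    "ca_policy_all_users": Blast.HIGH,
    "subnet_firewall_block": Blast.HIGH,
    "service_account_disable": Blast.HIGH,
    "dns_sinkhole": Blast.CRITICAL,
    "global_ca_lockdown": Blast.CRITICAL,
    "krbtgt_reset": Blast.CRITICAL,
    "network_wide_isolation": Blast.CRITICAL,
}


def gate(action: str, confidence: float, active_ransomware: bool = False) -> str:
    """Return 'auto', 'auto_with_checks', 'approval', or 'manual_runbook'."""
    category = STATIC_CATEGORY[action]
    if category is Blast.CRITICAL:
        # Critical actions are never fully automated.
        return "manual_runbook"
    if category is Blast.HIGH:
        # Human approval, except for active ransomware encryption.
        return "auto" if active_ransomware else "approval"
    if confidence < 0.95:
        # Assumption: below-threshold alerts go to a human.
        return "approval"
    # Medium-blast actions still need confirmation checks
    # (shared account? known corporate egress IP?).
    return "auto" if category is Blast.LOW else "auto_with_checks"


print(gate("session_revocation", 0.97))
print(gate("server_isolation", 0.99))
print(gate("krbtgt_reset", 1.0, active_ransomware=True))
```

Keeping the category table separate from the gating function means a reclassified action changes one dictionary entry and leaves the routing logic alone.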

Dynamic blast radius assessment in playbooks

Static blast radius categories help you design automation. Dynamic blast radius checks make the automation context-aware at runtime.

The difference: a static assessment says “endpoint isolation is medium blast radius.” A dynamic assessment queries, at the moment the playbook fires, whether this specific endpoint is a workstation (low blast) or a server (high blast), whether it is the only device providing a specific service, whether it is currently running a critical job, and whether the user is a VIP.

Dynamic checks in a containment playbook look like this (conceptual Logic App flow):

  1. Trigger fires on high-confidence endpoint alert.
  2. Query the device: is it a server or workstation? (Check device type in MDE device inventory.)
  3. If server, route to approval gate. If workstation, continue to auto-isolate.
  4. Check VIP watchlist: is the device’s primary user on the VIP list?
  5. If VIP, route to approval gate regardless of device type.
  6. If workstation, non-VIP: execute isolation, trigger evidence collection, post notification to SOC channel.

The Logic App makes a different decision for SRV-NGE-DB01 (server → approval gate) than for DESKTOP-NGE042 (workstation → auto-isolate). Both are endpoint isolation, but the blast radius is different, and the playbook adapts.
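The same routing logic can be expressed as a short function, which makes the branch order easy to check before building it as Logic App conditions. This is a sketch under stated assumptions, not the SA6 playbook. The `Device` fields, the `route_isolation` name, the watchlist contents, and the primary users assigned to each device are all hypothetical. Sending any device type other than a workstation to the approval gate is also an added fail-safe assumption; the flow above names only servers.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    device_type: str   # e.g. "Server" or "Workstation" (hypothetical field)
    primary_user: str  # UPN of the device's primary user (hypothetical field)


# Hypothetical VIP watchlist contents, for illustration only.
VIP_WATCHLIST = {"cfo@northgateeng.com"}


def route_isolation(device: Device, vip_watchlist: set) -> str:
    """Return 'approval_gate' or 'auto_isolate' for an isolation request."""
    # Steps 2-3: only workstations may continue toward auto-isolation.
    # Assumption: unknown or missing device types fail safe to approval.
    if device.device_type.strip().lower() != "workstation":
        return "approval_gate"
    # Steps 4-5: a VIP primary user always routes to approval.
    if device.primary_user.strip().lower() in vip_watchlist:
        return "approval_gate"
    # Step 6: non-VIP workstation. The caller would then execute isolation,
    # trigger evidence collection, and notify the SOC channel.
    return "auto_isolate"


db = Device("SRV-NGE-DB01", "Server", "svc_sql@northgateeng.com")
ws = Device("DESKTOP-NGE042", "Workstation", "d.chen@northgateeng.com")
print(db.name, route_isolation(db, VIP_WATCHLIST))
print(ws.name, route_isolation(ws, VIP_WATCHLIST))
```

The function returns a routing decision and performs no side effects, so the decision tree can be tested against an inventory export without touching a real device.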

SA6 implements this decision tree in a production Logic App. This subsection establishes the concept; the implementation follows in SA6.

⚠ Compliance Myth: "We cannot automate containment of production servers because our SLA guarantees 99.9% uptime"

The myth: Automated isolation of production servers would violate the SLA committed to customers or internal business units. If the automation incorrectly isolates a production server (false positive), the downtime counts against the SLA.

The reality: Your SLA almost certainly includes an exception for security incidents. Review the specific language — most SLAs exclude “planned or unplanned maintenance required for security purposes” or “service interruptions caused by security threats.” If yours does not include this exception, add it at the next renewal.

More importantly: the attacker does not respect your SLA. A ransomware encryption that takes the production server offline causes far more downtime than a 30-minute containment-and-verify cycle. The choice is not between “uptime” and “containment.” The choice is between “brief, controlled containment” and “uncontrolled attacker-driven outage.” Frame the blast radius conversation with this comparison: isolation causes 30 minutes of downtime with controlled recovery. Ransomware causes 2-14 days of downtime with uncertain recovery.
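The comparison can be put into numbers. The sketch below assumes the 99.9% uptime target is measured over a 30-day window, which the text does not specify. The 30-minute isolation and the 2-14 day ransomware outage are the figures quoted above. The function and constant names are illustrative.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(uptime_pct: float, period_days: int) -> float:
    """Downtime allowed by an uptime SLA over a measurement period."""
    return period_days * MINUTES_PER_DAY * (1 - uptime_pct / 100)


ISOLATION_MINUTES = 30                        # containment-and-verify cycle
RANSOMWARE_MIN_MINUTES = 2 * MINUTES_PER_DAY   # 2 days
RANSOMWARE_MAX_MINUTES = 14 * MINUTES_PER_DAY  # 14 days

# Assumption: the SLA is measured per 30-day month.
budget = downtime_budget_minutes(99.9, 30)

print(f"Monthly downtime budget at 99.9%: {budget:.1f} minutes")
print(f"One isolation uses {ISOLATION_MINUTES / budget:.0%} of that budget")
print(
    "A ransomware outage is "
    f"{RANSOMWARE_MIN_MINUTES // ISOLATION_MINUTES}x to "
    f"{RANSOMWARE_MAX_MINUTES // ISOLATION_MINUTES}x longer than one isolation"
)
```

Under these assumptions the monthly budget is about 43.2 minutes, so a single false-positive isolation consumes roughly 69% of it. That is why the security exception in the SLA matters. The ransomware outage, at 96 to 672 times the length of one isolation, exceeds the budget many times over regardless.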

Blast radius applied to NE’s critical systems

Walk through Northgate Engineering’s infrastructure and assess the blast radius for containment of each critical system:

SRV-NGE-DC01 / DC02 (Domain Controllers). Isolation blast radius: CRITICAL. Isolating a domain controller breaks authentication for every domain-joined device and user. NEVER auto-isolate a domain controller. Containment for compromised DCs is a manual, coordinated process involving KRBTGT reset, DC rebuild, and forest recovery — not endpoint isolation.

SRV-NGE-DB01 (ERP Database). Isolation blast radius: HIGH. The ERP application serves all 810 users. Isolation breaks ERP access completely. Auto-isolate only on confirmed ransomware (VSS deletion + encryption activity). For credential theft or lateral movement indicators, route to approval gate with context: “SRV-NGE-DB01 serves the ERP application for 810 users. Isolating will take ERP offline. Approve / Reject / Delay 30 min.”

DESKTOP-NGE042 (Standard Workstation). Isolation blast radius: LOW. One user loses network access. Their work is interrupted for the duration of isolation. Auto-isolate on high-confidence endpoint threats. The user calls helpdesk, helpdesk sees the Sentinel incident, and the SOC team either confirms the containment or releases the device.

SRV-NGE-FS01 (File Server). Isolation blast radius: HIGH. Shared file access for multiple departments. Auto-isolate only on active ransomware encryption. For other threats, contain the user account that accessed the server rather than the server itself — this stops the attacker’s access without taking the server offline.

d.chen@northgateeng.com (Standard User). Session revocation blast radius: LOW. One user re-authenticates. Account disable blast radius: MEDIUM (d.chen manages no shared resources, so medium is conservative — it is effectively low for this specific user). Auto-revoke sessions on AiTM confirmation. Auto-disable only with VIP check.

svc_sql@northgateeng.com (Service Account). Disable blast radius: HIGH. This account runs the SQL Server integration service. Disabling it breaks the data pipeline between the ERP and the reporting system. NEVER auto-disable service accounts without checking their dependencies. SA5 builds the service account dependency check into the containment playbook.

Blast Radius Assessment Template — Northgate Engineering

System | Type | Blast Category | Auto-Containment | Approval Required | Notes
SRV-NGE-DC01/02 | Domain Controller | CRITICAL | Never | Always (IR Lead + CTO) | KRBTGT reset only
SRV-NGE-DB01 | ERP Database | HIGH | Ransomware only | Yes (SOC Lead) | ERP offline for 810 users
SRV-NGE-FS01 | File Server | HIGH | Ransomware only | Yes (SOC Lead) | Prefer user containment
DESKTOP-NGE* | Workstation | LOW | Yes (95%+ confidence) | No | VIP check only
Standard users | Identity | LOW | Yes (session revoke) | No | VIP check only
Service accounts | Identity | HIGH | Never (auto-disable) | Yes (IR Lead) | Check dependencies first
VIP users | Identity | MEDIUM | Session revoke only | Yes (SOC Lead) | No auto-disable
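The device rows of this template can be encoded as data that a playbook consults at runtime. The following is a minimal sketch, not the course's implementation. The `Rule` structure, the `lookup` and `can_auto_isolate` names, and the glob patterns are hypothetical. Refusing automation for hosts that match no rule, and naming the SOC Lead as the fallback approver, are assumptions that follow the principle of not automating containment for systems you do not understand. The VIP check is not modelled here.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    pattern: str             # glob pattern matched against the hostname
    blast: str
    auto: str                # "never", "ransomware_only", or "yes"
    approver: Optional[str]


# Device rows from the NE template, in match order.
RULES = [
    Rule("SRV-NGE-DC0[12]", "CRITICAL", "never", "IR Lead + CTO"),
    Rule("SRV-NGE-DB01", "HIGH", "ransomware_only", "SOC Lead"),
    Rule("SRV-NGE-FS01", "HIGH", "ransomware_only", "SOC Lead"),
    Rule("DESKTOP-NGE*", "LOW", "yes", None),
]

# Assumption: hosts missing from the template are never auto-contained.
UNKNOWN = Rule("*", "UNKNOWN", "never", "SOC Lead")


def lookup(hostname: str) -> Rule:
    """Return the first rule whose pattern matches the hostname."""
    for rule in RULES:
        if fnmatchcase(hostname.upper(), rule.pattern):
            return rule
    return UNKNOWN


def can_auto_isolate(hostname: str, confirmed_ransomware: bool,
                     confidence: float) -> bool:
    """Decide whether isolation may run without human approval."""
    rule = lookup(hostname)
    if rule.auto == "yes":
        return confidence >= 0.95
    if rule.auto == "ransomware_only":
        return confirmed_ransomware
    return False


for host in ("SRV-NGE-DC01", "SRV-NGE-DB01", "DESKTOP-NGE042", "PRINTER-7"):
    rule = lookup(host)
    print(host, rule.blast, can_auto_isolate(host, False, 0.97))
```

Any host that falls through to `UNKNOWN` is also a signal of an asset inventory gap worth recording.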

Decision point: An AiTM alert fires for the CFO (VIP watchlist). The enrichment confirms AiTM with 97% confidence — MFA claim in token, Amsterdam IP, python-requests user agent. The playbook reaches the containment step. VIP check fires: this is the CFO. The playbook routes to the approval gate: “AiTM confirmed for CFO. Sessions active from 203.0.113.45. Approve session revocation?” The SOC analyst sees the adaptive card in Teams, reads the enrichment data, and clicks Approve. Sessions are revoked in under 2 minutes — not the 45 minutes of full manual triage, but not the 30-second auto-containment either. The VIP check adds a 1-2 minute delay but prevents the scenario where the CFO’s session drops during a board presentation without human judgment confirming it is necessary.

Try it: Build your blast radius table

Create a table like the NE example above for your environment. For each critical system and account type:

  1. What is the system’s role? (What breaks if it goes offline?)
  2. What blast radius category? (Low / Medium / High / Critical)
  3. Can any containment action be automated? Which ones?
  4. What approval is required?
  5. Who needs to be notified before execution?

If you cannot determine the blast radius for a system, that is a gap in your asset inventory — and a gap in your IR readiness. You cannot automate containment for systems you do not understand.

Check your understanding: Your automation playbook is about to isolate an endpoint. The dynamic blast radius check queries the MDE device inventory and discovers the device is tagged as "Server" with the label "Manufacturing SCADA Interface." What should the playbook do?

  A. Isolate the server: the alert is high confidence and containment must be fast.
     Feedback: Isolating a SCADA interface server without human approval could halt manufacturing operations. Speed does not override blast radius assessment for critical infrastructure.

  B. Route to human approval with full context: device role, blast radius (manufacturing impact), enrichment data, and recommended action.
     Feedback: The SOC analyst or IR lead evaluates the production impact against the security risk and makes the containment decision. The playbook provides the context; the human provides the judgment.

  C. Isolate the server but immediately notify the manufacturing team.
     Feedback: Notifying after isolation does not reduce the blast radius, because the manufacturing line is already disrupted. The approval must happen BEFORE isolation for high-blast-radius systems.

  D. Skip isolation and only revoke the user account that accessed the server.
     Feedback: This may be the correct alternative action, but the playbook should present both options (isolate server vs contain user) to the human approver with blast radius context for each, not unilaterally choose the lower-impact action. The attacker may have persistence on the server that user containment does not address.

Where this goes deeper. SA6 implements the dynamic blast radius assessment in a production endpoint containment playbook — with Logic App conditions that query device type, check service dependencies, and route to the appropriate approval path. SA7 builds cross-environment blast radius assessment that evaluates the combined impact of simultaneous containment across identity + endpoint + network. The Incident Triage course (TR8) covers the manual blast radius assessment process that this automation replaces.
