SA0.2 The Automation Spectrum
Figure SA0.2 — The automation spectrum. Risk increases from left to right. SOCs should automate left-to-right, building confidence at each tier before advancing to higher-risk actions.
Automation is not binary
The question is not “should we automate?” The question is “what should we automate, and how far along the spectrum should each action go?”
Security automation is a spectrum with five positions. Each position represents a different type of action, a different risk level, and a different set of safeguards.
Position 1: Enrich. Add context to an alert without changing anything. Query an IP against threat intelligence feeds. Check a user’s risk score in Entra ID Protection. Look up a device’s compliance status in Intune. Pull the user’s last 10 sign-ins. None of these actions modify anything. They are read-only operations that transform a raw alert into an investigation-ready incident. There is no blast radius. An incorrect enrichment result does not disable a user, isolate an endpoint, or block a network connection — it adds incorrect context that the analyst can verify and discard. Automate everything at this position without hesitation.
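The zero-blast-radius property can be made concrete in code. The sketch below is illustrative, not a production Sentinel playbook: the enricher functions are stubs standing in for real lookups (threat intel, Entra ID Protection, Intune), and the key point is that each lookup runs independently and a failure records a note instead of blocking triage.

```python
from typing import Callable, Dict

# Hypothetical read-only enrichers. In production these would query
# threat intelligence, Entra ID Protection, Intune, and SigninLogs;
# here they are stubs that stand in for those lookups.
def ip_reputation(alert: Dict) -> str:
    return "internal" if alert["ip"].startswith("10.") else "unknown"

def user_risk(alert: Dict) -> str:
    return alert.get("risk_score", "none")

ENRICHERS: Dict[str, Callable[[Dict], str]] = {
    "ip_reputation": ip_reputation,
    "user_risk": user_risk,
}

def enrich(alert: Dict) -> Dict[str, str]:
    """Run every enricher independently. A failed lookup records an
    error note instead of aborting: the blast radius is zero, so the
    worst case is context the analyst verifies and discards."""
    context = {}
    for name, fn in ENRICHERS.items():
        try:
            context[name] = fn(alert)
        except Exception as exc:
            context[name] = f"lookup failed: {exc}"
    return context
```

Because nothing here modifies the environment, this whole pipeline can run on every alert without an approval gate.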
Position 2: Collect. Capture evidence automatically at alert time. Export the user’s SigninLogs for the last 24 hours. Trigger a Defender for Endpoint investigation package. Download the mailbox audit log for the last 7 days. Collect the user’s group memberships and OAuth consent grants. These are read operations that produce files — they do not modify the environment. The risk is minimal: the worst case is collecting unnecessary data (false positive) or failing to collect (playbook error). Neither causes an outage. The value is enormous: evidence captured at alert time preserves volatile data that may be gone by the time an analyst opens the incident.
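One way to make collection repeatable is a declarative manifest of what to capture at alert time. This is a minimal sketch: the log-source names follow the examples above, but the manifest structure itself is an assumption, not a Sentinel feature.

```python
from datetime import datetime, timedelta

def collection_manifest(user: str, alert_time: datetime) -> list:
    """Evidence to capture the moment the alert fires, before volatile
    data ages out. Each entry is a read-only export job."""
    return [
        {"source": "SigninLogs", "user": user,
         "start": alert_time - timedelta(hours=24), "end": alert_time},
        {"source": "MailboxAuditLog", "user": user,
         "start": alert_time - timedelta(days=7), "end": alert_time},
        # An MDE investigation package is triggered rather than windowed.
        {"source": "MDEInvestigationPackage", "user": user,
         "start": alert_time, "end": alert_time},
    ]
```

Keeping the windows in one place also documents the evidence-retention decision (24 hours of sign-ins, 7 days of mailbox audit) for later review.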
Position 3: Notify. Alert the right people through the right channels. Send a Teams adaptive card to the SOC channel with incident summary and entity details. Email the CISO for Critical severity incidents. Create a ServiceNow ticket. Escalate to the on-call analyst if the incident is High severity and it is outside business hours. Notify BlueVoyant (the managed SOC partner) for coordinated response. The risk here is notification fatigue — too many notifications, sent to too many people, for too many low-severity incidents, and everyone learns to ignore them. The safeguard is severity-based routing and deduplication.
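Severity-based routing and deduplication can be sketched as a small routing table plus a suppression window. The channel names and the 4-hour window are assumptions for illustration; the real values belong in your notification policy.

```python
from datetime import datetime, timedelta

# Illustrative routing table; channel names are assumptions.
ROUTES = {
    "Critical": ["teams:soc", "email:ciso", "ticket:servicenow"],
    "High": ["teams:soc", "ticket:servicenow"],
    "Medium": ["teams:soc"],
    "Low": [],  # log only: no notification, to avoid fatigue
}

_sent = {}  # (alert_type, entity) -> time of last notification

def route(severity: str, alert_type: str, entity: str,
          now: datetime, window: timedelta = timedelta(hours=4)) -> list:
    """Return the channels to notify; suppress repeats for the same
    alert type and entity inside the deduplication window."""
    key = (alert_type, entity)
    last = _sent.get(key)
    if last is not None and now - last < window:
        return []  # deduplicated
    channels = ROUTES.get(severity, [])
    if channels:
        _sent[key] = now
    return channels
```

The design choice worth noting: deduplication is keyed on (alert type, entity), not on the incident ID, so ten incidents about the same compromised user produce one notification, not ten.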
Position 4: Contain. Stop the attacker by modifying the environment. Revoke the user’s sessions. Isolate the endpoint from the network. Disable the compromised account. Add the attacker’s IP to the firewall block list. Reset the user’s MFA methods. Each of these actions has blast radius — it affects a real user or system. If the detection is a false positive, the automation disrupts a legitimate user or takes a production system offline. This position requires confidence thresholds: only auto-contain when the detection confidence is high enough that the false positive risk is acceptable. For lower-confidence detections, the automation enriches, collects, notifies, and presents the containment action as a recommendation for human approval.
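The confidence-threshold logic reduces to a three-way gate: auto-execute, recommend for approval, or hold. The numeric thresholds below are placeholders; in practice they come from your measured false positive rates, not from this sketch.

```python
from dataclasses import dataclass

@dataclass
class ContainmentDecision:
    action: str          # e.g. "revoke_sessions", "isolate_endpoint"
    auto_execute: bool   # True only above the auto-containment threshold
    reason: str

# Placeholder thresholds; derive real values from measured FP rates.
AUTO_THRESHOLD = 0.95
RECOMMEND_THRESHOLD = 0.70

def decide_containment(action: str, confidence: float) -> ContainmentDecision:
    """Gate a containment action on detection confidence."""
    if confidence >= AUTO_THRESHOLD:
        return ContainmentDecision(action, True, "high confidence: auto-execute")
    if confidence >= RECOMMEND_THRESHOLD:
        return ContainmentDecision(
            action, False, "medium confidence: human approval required")
    return ContainmentDecision(
        action, False, "low confidence: enrich, collect, and notify only")
```

The medium band is the important one: the playbook still does all the Position 1-3 work and presents the containment action as a one-click recommendation, so the analyst approves rather than builds.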
Position 5: Remediate. Restore the environment to a known-good state. Re-enable the account with a new password. Release the endpoint from isolation after confirming it is clean. Remove the attacker’s persistence mechanisms (inbox rules, OAuth grants, scheduled tasks). Patch the vulnerability that was exploited. Each remediation action assumes the investigation is complete and the attacker is fully contained. Premature remediation — removing an inbox rule while the attacker still has session access — allows the attacker to recreate the persistence. Most remediation is semi-automated: the playbook executes the mechanical steps, but a human confirms the investigation is complete before triggering it.
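The "human confirms before triggering" rule can be enforced as an explicit precondition check. A minimal sketch, assuming the incident record carries these three flags (the flag names are illustrative):

```python
def safe_to_remediate(incident: dict) -> bool:
    """Every gate must hold before the mechanical remediation steps run.
    Premature remediation lets an attacker with live session access
    simply recreate the persistence that was removed."""
    return (
        incident.get("investigation_complete", False)  # scope fully known
        and incident.get("sessions_revoked", False)    # attacker contained
        and incident.get("analyst_confirmed", False)   # human sign-off
    )
```

Defaulting every missing flag to False means an incomplete incident record fails closed, which is the right failure mode for Position 5.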
Why left-to-right matters
The progression from enrichment to remediation is not arbitrary. Each position builds operational confidence that makes the next position safer.
When you deploy enrichment automation and it runs successfully for 30 days, you learn three things. First, you understand how Sentinel triggers work — what fires the playbook, what data is available in the trigger payload, and how entity extraction behaves (it fails more often than the documentation suggests). Second, you develop monitoring habits — you check the Logic App run history daily, you notice when a playbook fails, and you fix it before the failure compounds. Third, you build trust with the SOC team — the analysts see enrichment data appearing automatically on their incidents and they start relying on it. They suggest additional enrichment sources. They become advocates for automation instead of skeptics.
That operational confidence — the understanding of how the automation behaves in production, the monitoring habits, and the team trust — is the prerequisite for containment automation. A team that has never deployed a playbook should not start with auto-isolation. A team that has run enrichment successfully for 90 days has the operational muscle to deploy containment safely.
The teams that skip the progression — building a containment playbook as their first automation — consistently hit the same failure mode. The playbook works in testing. It deploys to production. It fires on a false positive that the team did not anticipate because they have never seen how false positives manifest in their environment at scale. The containment action disrupts a legitimate user or system. The automation is disabled. The team returns to manual operations and refuses to try again.
Classifying automation actions
Every proposed automation action maps to a position on the spectrum. The classification determines the safeguards required.
Ask three questions about any automation candidate:
Does it change anything? If it only sends a message, it is notification (Position 3): apply deduplication and routing logic. If it changes nothing at all, it is enrichment or collection (Positions 1-2): automate freely. If it modifies the environment, it is containment or remediation (Positions 4-5): apply safeguards.
What happens if it fires on a false positive? For enrichment: nothing harmful — incorrect context is visible and correctable. For collection: wasted storage — evidence collected for a non-incident is deleted during triage. For notification: alert fatigue — manageable with severity routing. For containment: user disruption or service outage — requires confidence thresholds and approval gates. For remediation: evidence destruction or premature changes — requires investigation completion confirmation.
What is the blast radius? One user (session revocation), one device (endpoint isolation), one network segment (firewall rule), all users (conditional access policy change), all endpoints (fleet-wide indicator block), the entire organization (DNS sinkhole, KRBTGT reset). The wider the blast radius, the higher the confidence threshold and the more safeguards required.
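The three questions above compose into a mechanical classifier. This is a sketch of the decision order, not an official taxonomy; the safeguard lists are examples keyed to blast radius.

```python
def classify(changes_environment: bool, sends_message: bool,
             restores_state: bool = False, produces_files: bool = False) -> int:
    """Map an automation candidate to a spectrum position (1-5)
    by answering the three classification questions in order."""
    if changes_environment:
        return 5 if restores_state else 4   # remediation vs. containment
    if sends_message:
        return 3                            # notification
    return 2 if produces_files else 1       # collection vs. enrichment

# Example safeguard escalation by blast radius (illustrative labels).
SAFEGUARDS = {
    "one_user":        ["confidence threshold"],
    "one_device":      ["confidence threshold"],
    "network_segment": ["confidence threshold", "human approval"],
    "organization":    ["confidence threshold", "human approval",
                        "CAB pre-approval"],
}
```

For example, session revocation classifies as `classify(True, False)` (Position 4, one user), while exporting SigninLogs is `classify(False, False, produces_files=True)` (Position 2, no safeguards needed).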
The myth: ITIL change management requires that all modifications to production systems go through a change advisory board (CAB) review. Automated containment — isolating an endpoint, disabling an account, modifying a firewall rule — modifies production systems without CAB approval. This violates the change management policy and could result in audit findings.
The reality: ITIL explicitly defines “emergency changes” as changes required to resolve major incidents or security threats. Emergency changes bypass the standard CAB process and are documented retrospectively. Automated containment actions during an active security incident are emergency changes by definition. The key requirement is documentation: every automated action must be logged with timestamps, trigger conditions, and the entities affected. Sentinel playbooks provide this automatically — every Logic App run is logged with full execution history. The audit trail for automated containment is actually more complete and reliable than the audit trail for a human executing containment manually while under incident pressure.
Document your automated containment actions in your change management policy as “pre-approved emergency changes for security incidents.” Define the conditions, the actions, and the approval mechanism (automated for high-confidence detections, human approval for medium-confidence). Have the CAB review and approve this category. Problem solved.
The spectrum applied to NE’s top 5 alert types
Northgate Engineering’s top 5 alert types by volume, and where each sits on the automation spectrum:
1. AiTM credential phishing (35 alerts/week). Enrichment: auto-query SigninLogs for MFA claim analysis, check IP reputation, check user risk. Collection: auto-export last 24h SigninLogs + AuditLogs + mailbox audit. Notification: Teams card to SOC channel + email to IR lead. Containment: auto-revoke sessions + require MFA re-registration (high confidence, confirmed AiTM pattern). This alert type moves all the way to Position 4 because the detection is high-fidelity — the MFA-claim-in-token pattern has a very low false positive rate.
2. Impossible travel (120 alerts/week). Enrichment: auto-check VPN exit nodes, known travel patterns (watchlist), device compliance. Collection: auto-export SigninLogs. Notification: none for low risk scores. Containment: none — too many legitimate scenarios (VPN, travel, mobile networks). This alert stays at Positions 1-2 because the false positive rate is high.
3. Suspicious inbox rule creation (20 alerts/week). Enrichment: auto-query the rule details, check if forwarding external, compare against known-good patterns. Collection: auto-export mailbox audit. Notification: Teams card if external forwarding detected. Containment: auto-remove the rule only if it matches the malicious pattern exactly (forward to external + mark as read + contains financial keywords). Partial containment — the rule is removed but the account is not disabled.
4. Suspicious process creation on endpoint (200 alerts/week). Enrichment: auto-query process chain, parent-child relationships, file hash against TI. Collection: auto-trigger MDE investigation package. Notification: Teams card for High severity. Containment: auto-isolate only for ransomware indicators (VSS deletion + service creation). Human approval for everything else. The high volume makes containment automation dangerous — 200 false positive isolations per week would paralyze the organization.
5. Failed sign-in attempts (50 alerts/week). Enrichment: auto-check source IP, count failed attempts, check if any succeeded. Collection: none needed. Notification: none — low severity. Containment: none — failed attempts are unsuccessful attacks. Auto-close if all attempts failed and no successful logon follows within 4 hours. This alert type uses automation for triage acceleration (auto-close known-false-positive patterns), not containment.
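The auto-close rule for alert type 5 is worth pinning down in code, because the subtle part is the observation window: the incident cannot close until 4 hours have passed since the last failure with no success. A minimal sketch (the event-tuple shape is an assumption):

```python
from datetime import datetime, timedelta

def can_auto_close(signins: list, now: datetime,
                   window: timedelta = timedelta(hours=4)) -> bool:
    """signins: list of (timestamp, succeeded) for the source IP/user.
    Close only if every attempt failed, and the 4-hour window after
    the last failure has elapsed with no successful logon observed."""
    if not signins or any(ok for _, ok in signins):
        return False  # any success anywhere means analyst review
    last_failure = max(t for t, _ in signins)
    return now - last_failure >= window  # window must fully elapse
```

Note the fail-closed behavior: before the window elapses the function returns False, so the playbook simply re-evaluates later rather than closing early.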
Decision point: You are building automation for Northgate Engineering’s AiTM alerts. The enrichment playbook works: it checks MFA claims, IP reputation, and user risk, and adds all results as an incident comment. The SOC team loves it. Now someone suggests adding auto-containment: session revocation + MFA reset when the enrichment confirms AiTM. The question is not whether to add it — the question is whether the enrichment has run long enough to validate the false positive rate. If the enrichment playbook has been running for 60 days and confirmed 100% of its AiTM classifications against analyst review, the false positive risk for auto-containment is quantified and acceptable. If the enrichment playbook has been running for 5 days with 12 total fires and no analyst review of accuracy, you do not have enough data to trust auto-containment. The decision is: automate containment when you have 30+ days of validated enrichment data confirming the classification accuracy.
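The decision-point criteria translate directly into a readiness check. A sketch under the assumptions in the scenario above: 30+ days of runs, analyst-reviewed, with the accuracy bar set at 100% as in the AiTM example (your bar may reasonably be lower for reversible actions).

```python
from datetime import date

def containment_ready(first_fire: date, today: date,
                      reviewed: int, confirmed: int,
                      min_days: int = 30, min_accuracy: float = 1.0) -> bool:
    """Gate auto-containment on validated enrichment history: enough
    elapsed days, at least one analyst-reviewed fire, and classification
    accuracy at or above the required bar."""
    days = (today - first_fire).days
    if days < min_days or reviewed == 0:
        return False  # unquantified FP risk: do not auto-contain
    return confirmed / reviewed >= min_accuracy
```

Applied to the scenario: 60 days with 100% confirmed classifications passes; 5 days with 12 unreviewed fires fails on both the day count and the review count.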
Try it: Map your alert types to the spectrum
Take your top 5 alert types by volume. For each one, determine:
- What enrichment would help? (IP, user, device, TI, history)
- What evidence should be collected at alert time?
- Who needs to be notified, and at what severity?
- Can any containment action be automated? If so, at what confidence level?
- What is the false positive rate for this alert type? (If you don’t know, you cannot safely automate containment.)
Map each alert type to the five spectrum positions. The alert types that reach Position 4 (containment) should be rare — typically only those with very high detection confidence and clear, reversible containment actions.
Where this goes deeper. SA2 builds the enrichment pipeline in detail — every enrichment type (IP, user, device, TI, history, geo) with production Logic App configurations. SA5-SA7 build containment automation with the confidence thresholds and safeguards that make Position 4 safe. The Incident Triage course (TR2-TR5) teaches the manual triage methodology that automation accelerates — understanding what the analyst does manually is prerequisite to automating it correctly.