SA0.3 The Three Automation Tiers
Figure SA0.3 — The three automation tiers. Every automation action in this course maps to one of these tiers. The tier determines the safeguards required before deployment.
Tier 1 — Enrichment: the automation you deploy this week
Tier 1 is every read-only action that adds context to an alert or incident. The defining characteristic is that nothing in the environment changes. The user does not lose access. The endpoint does not go offline. The firewall does not block a connection. The automation reads data, processes it, and attaches the results to the incident.
Enrichment actions include: querying an IP address against threat intelligence feeds, checking a user’s risk score in Entra ID Protection, looking up a device’s compliance status in Intune, searching for the user’s recent sign-in history, checking whether the user or IP has appeared in previous incidents, calculating impossible travel distance and speed, querying VirusTotal for a file hash, and pulling the user’s group memberships and role assignments.
The output of Tier 1 automation is an enriched incident. Instead of opening a raw alert that says “Unfamiliar sign-in properties for d.chen,” the analyst opens an incident with comments showing: d.chen’s last 10 sign-ins, d.chen’s current risk level (High), the source IP’s TI status (no hits on AbuseIPDB, flagged by one Sentinel TI feed as proxy infrastructure), d.chen’s device compliance (non-compliant — the sign-in came from an unregistered device), and d.chen’s previous incident history (zero prior incidents in 12 months). The analyst reads the enrichment, makes a judgment call in 60 seconds, and moves to investigation or closes the alert.
Without enrichment automation, the analyst runs each of those queries manually. They open a new browser tab, navigate to Sentinel, type a KQL query, wait for results, copy the relevant finding, paste it into the incident comment, and repeat for each enrichment source. Five enrichment queries × 1-2 minutes each = 5-10 minutes before the analyst has enough context to make a triage decision. Multiply by 500 alerts per day and you understand why the queue never empties.
Why Tier 1 is always safe. The worst case for a Tier 1 automation is that it returns incorrect information. The IP reputation query returns “clean” for a known-malicious IP because the TI feed has not yet classified it. The user risk score shows “None” because Entra ID Protection has not processed the latest sign-in. The device compliance shows “Compliant” because the Intune sync is delayed. Each of these failures is visible to the analyst — they see the enrichment result and can verify it against their own judgment. Incorrect enrichment does not change the environment. It does not disable anyone’s access. It does not isolate any system. It provides a data point that the analyst incorporates into their decision.
Deploy Tier 1 first. Start here. Build one enrichment playbook. Deploy it. Watch it run for a week. Fix the errors (there will be errors — entity extraction fails on certain incident types, API calls time out on high-volume days, the managed identity needs an additional Graph permission). Learn how playbooks behave in production. Build the next enrichment. Within 30 days, you have a comprehensive enrichment pipeline that transforms every incident from a raw alert into an investigation-ready package. That pipeline is the foundation for everything that follows.
Tier 2 — Notification and collection: the automation you deploy next month
Tier 2 encompasses two categories of action: alerting humans and capturing evidence. Both have low blast radius but require more careful design than Tier 1.
Notification automation sends messages to people. Teams adaptive cards in the SOC channel. Email notifications to the CISO for Critical incidents. ServiceNow ticket creation for every High/Critical incident. On-call analyst escalation via PagerDuty or Opsgenie integration. MSSP notification for incidents requiring coordinated response.
The risk in notification is fatigue. If every incident generates a Teams message, the SOC channel becomes noise and everyone mutes it. If every Medium severity alert emails the CISO, the CISO stops reading security emails. If tickets are created for Low severity alerts that are auto-closed 30 minutes later, the ticketing system fills with resolved-before-read tickets.
The safeguards for notification are severity-based routing and deduplication. Critical incidents notify the CISO, the IR lead, and the SOC channel. High incidents notify the SOC channel and create a ticket. Medium incidents are visible in the Sentinel queue but do not generate external notifications. Low incidents are auto-enriched and auto-triaged with no notification. This matrix is defined once and automated — the playbook checks the incident severity and routes the notification accordingly.
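The routing matrix above reduces to a simple lookup. The sketch below is illustrative — the action names and severity labels are assumptions for this example, and a real playbook would invoke the Teams, email, and ticketing connectors rather than return strings.

```python
# Hypothetical sketch of the severity-based notification matrix.
# Action names are illustrative labels, not connector API calls.
ROUTING = {
    "Critical": ["notify_ciso", "notify_ir_lead", "post_soc_channel", "create_ticket"],
    "High":     ["post_soc_channel", "create_ticket"],
    "Medium":   [],   # visible in the Sentinel queue, no external notification
    "Low":      [],   # auto-enriched and auto-triaged silently
}

def route_notifications(severity: str) -> list[str]:
    """Return the notification actions for an incident severity."""
    return ROUTING.get(severity, [])

print(route_notifications("High"))  # ['post_soc_channel', 'create_ticket']
```

Defining the matrix as data rather than nested conditions keeps the policy reviewable in one place — the 30-minute Tier 2 review described later amounts to reading this table.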
Collection automation captures evidence at alert time. This is Tier 2 because it writes data — exporting logs, triggering investigation packages, and storing evidence files. But it does not modify the environment in ways that affect users or systems. The worst case is unnecessary evidence collection (wasted storage and processing for a false positive) or failed collection (the playbook errors and evidence is not captured — the analyst collects it manually later).
The value of collection automation is preservation. Session tokens expire. Sign-in logs roll after 30 days. Processes terminate. Network connections close. Evidence that exists at alert time may be gone by investigation time. A collection playbook that fires on incident creation and exports the last 24 hours of SigninLogs, AuditLogs, and OfficeActivity for the affected user captures volatile evidence automatically. When the investigator opens the case hours later, the evidence is waiting — complete, timestamped, and attached to the incident with chain-of-custody metadata.
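A collection playbook of this kind runs a KQL export scoped to the affected user. The sketch below only assembles the query text; the table names are the real Sentinel tables named above, but the column mapping and function are simplifying assumptions (a real playbook would submit the query via the Azure Monitor Logs connector, and user attribution differs per table).

```python
# Build the KQL text a collection playbook might run to export the
# last 24 hours of activity for the affected user. Assembling the
# query as a string is illustrative; column names are simplified.
TABLES = ["SigninLogs", "AuditLogs", "OfficeActivity"]

def build_export_query(table: str, upn: str, hours: int = 24) -> str:
    # Each table attributes activity to the user differently; this
    # mapping is an assumption for the sketch.
    user_column = {"SigninLogs": "UserPrincipalName",
                   "AuditLogs": "InitiatedBy",
                   "OfficeActivity": "UserId"}[table]
    return (f"{table} "
            f"| where TimeGenerated > ago({hours}h) "
            f"| where {user_column} contains '{upn}'")

for t in TABLES:
    print(build_export_query(t, "d.chen@contoso.com"))
```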
Tier 3 — Containment and response: the automation you deploy with confidence
Tier 3 is every action that modifies the environment to stop an attacker. These actions have direct impact on users and systems.
Session revocation forces the user to re-authenticate. If the user is in the middle of a presentation, their SharePoint stops working, their Teams call drops, and their email closes. If the user is an attacker replaying a stolen token, the revocation terminates their access and they must re-phish or re-steal credentials to return.
Endpoint isolation cuts the device off from the network. If the device is a compromised workstation, isolation stops C2 communication, lateral movement, and data exfiltration. If the device is a production server running the ERP application, isolation takes the ERP offline for 810 users.
Account disable prevents all authentication. If the account is compromised, the attacker cannot sign in. If the account belongs to the on-call network engineer during a network outage, nobody can fix the outage until the account is re-enabled.
Each Tier 3 action has clear value (stop the attacker) and clear risk (disrupt legitimate operations). The question is never “should we automate containment?” The question is “under what conditions is the detection confidence high enough that the risk of false positive disruption is acceptable?”
Confidence thresholds. Define a confidence level for each detection type. AiTM detection with MFA-claim-in-token analysis: 95%+ confidence — the false positive rate for this specific pattern is near zero. Auto-contain. Impossible travel: 22% confidence (78% false positive rate) — do not auto-contain. Ransomware pre-encryption indicators (VSS deletion + service creation + SMB spreading): 90%+ confidence — auto-isolate the endpoint. Generic “suspicious process creation”: 40% confidence — enrich and notify only.
The confidence threshold is not a guess. It is measured. You deploy the detection rule, you enrich alerts for 30+ days, and you track the analyst’s classification (TP, FP, BTP) for every alert. After 30 days, you know the false positive rate. You set the containment threshold based on the measured rate. If the FP rate drops below 5%, you enable auto-containment. If it remains above 15%, you keep containment manual.
Safeguards for Tier 3 automation. Beyond confidence thresholds, Tier 3 playbooks include:
VIP watchlist check — before disabling an account, check if the user is on the VIP watchlist (C-suite, board members, critical service accounts). If they are, route to human approval instead of auto-executing. The CEO account is never auto-disabled.
Time-of-day check — endpoint isolation during business hours affects productivity immediately and generates helpdesk tickets. During off-hours, isolation has less immediate impact. Some teams use time-of-day gates: auto-isolate overnight, require approval during business hours.
Blast radius calculation — before isolating a device, check its role. Is it a workstation (low impact) or a server (high impact)? Is it the only device providing a specific service? Are other devices in the same attack dependent on this device remaining accessible? SA7 builds dynamic blast radius assessment into the containment playbook.
Rollback playbook — every Tier 3 action has a corresponding undo action. Account disabled → account re-enabled + password reset + user notified. Endpoint isolated → endpoint released from isolation + verification scan. The rollback playbook is tested and ready before the containment playbook is deployed. When the false positive occurs (and it will — even at 95% confidence, 1 in 20 fires is wrong), the rollback executes in minutes, not hours.
Post-containment verification — after executing containment, verify that it worked. Did the session revocation actually terminate the attacker’s active sessions? (Check: are there new sign-ins from the same IP after revocation?) Did the endpoint isolation actually cut network access? (Check: is the device still communicating with the C2 IP?) Verification is automated as the final step in the containment playbook.
The myth: Deploying automation requires a formal risk assessment, management sign-off, and a complete impact analysis before any playbook can go live. This process takes 6-12 weeks per playbook.
The reality: Tier 1 automation (enrichment) requires no risk assessment because it has zero blast radius. Deploy it, monitor it, improve it. Tier 2 automation (notification + collection) requires a brief review — confirm the notification routing matches the escalation policy, confirm evidence storage meets retention requirements. This is a 30-minute conversation, not a 6-week process.
Tier 3 automation (containment) does warrant a structured review — but not a 6-week formal assessment. The review covers: what triggers the containment, what safeguards are in place, what is the measured confidence level, and what is the rollback procedure. If the confidence level is based on 30+ days of measured false positive data, the review is straightforward. The three-tier model makes the review proportional to the risk — not a blanket bureaucratic process applied equally to “add an incident comment” and “disable a user account.”
Three-Tier Automation Decision Framework
For every automation candidate, answer these questions:
Step 1: What tier is this action?
- Does it read data without changing anything? → Tier 1 (enrichment)
- Does it alert humans or capture evidence? → Tier 2 (notification/collection)
- Does it modify the environment? → Tier 3 (containment/response)
Step 2: What safeguards are required?
- Tier 1: Rate limiting, error handling, API timeout management
- Tier 2: Severity routing, deduplication, storage limits
- Tier 3: Confidence threshold, VIP check, blast radius, approval gate, rollback, verification
Step 3: What approval is needed to deploy?
- Tier 1: SOC team review. Deploy and monitor.
- Tier 2: SOC lead approval. Test notifications with sample incidents.
- Tier 3: IR lead + SOC manager approval. 30 days of measured confidence data. Staging workspace test. Rollback playbook tested. Runbook documented.
Step 4: What monitoring is required post-deployment?
- All tiers: Logic App run success/failure rate, execution latency, error patterns
- Tier 2: Notification volume tracking, ticket creation rate
- Tier 3: Containment action count, false positive rate, rollback frequency, blast radius incidents
Decision point: An analyst builds a playbook that auto-disables any user account that triggers 3 or more High severity alerts within 10 minutes. The logic seems sound — 3 high-severity signals in 10 minutes is strong evidence of compromise. But you check the current analytics rules and discover that one rule fires on “multiple failed MFA attempts” (high severity), and this rule often fires 3 times in sequence when a user misremembers their authenticator code and retries. The analyst’s “3 alerts in 10 minutes” containment rule would disable every user who fumbles their MFA. The decision: containment automation must account for alert correlation, not just alert count. Three alerts from three different detection rules (sign-in anomaly + inbox rule creation + OAuth consent) is stronger evidence than three alerts from the same rule (3 MFA failures). The playbook needs correlation logic, not just counting logic.
Try it: Classify your current automation
If your SOC has any existing automation (Sentinel automation rules, Defender AIR, custom playbooks), classify each one by tier:
- List every automation rule, playbook, and auto-response currently active
- For each one, assign a tier (1, 2, or 3)
- For every Tier 3 action, check: does it have a confidence threshold? A VIP check? A rollback procedure?
- For every Tier 2 action, check: is notification routing based on severity? Is deduplication configured?
If you find Tier 3 actions without safeguards — that is your highest-priority fix. If you find no Tier 1 actions — that is your highest-priority build.
Where this goes deeper. SA5 builds Tier 3 identity containment with every safeguard: confidence thresholds, VIP watchlists, meeting-time checks, sole-admin checks, and rollback playbooks. SA6 builds Tier 3 endpoint containment with server-vs-workstation decision logic and multi-endpoint coordination. SA7 builds cross-environment Tier 3 automation that coordinates identity + endpoint + network containment simultaneously. The framework from this sub-module is the foundation that every containment module builds on.