0.3 The Three Automation Tiers

5 hours · Module 0 · Free

What you already know

Section 0.2 mapped the automation spectrum from fully manual to fully autonomous and introduced the left-to-right sequencing principle. This section defines the three tiers that classify every automation workflow by what it does to the environment: enrichment (zero blast radius), collection and notification (low blast radius), and containment (high blast radius). Every playbook you build in this course maps to one of these tiers.

Scenario

An AiTM phishing alert fires at 02:17. The overnight analyst is handling two other incidents. By the time they reach this alert at 03:45, 88 minutes later, the attacker has created an inbox rule forwarding all email to an external address, consented to an OAuth application with Mail.ReadWrite permissions, and downloaded 340 files from SharePoint. Every one of those post-compromise actions was recorded in the audit logs as it happened. An enrichment playbook would have surfaced the suspicious activity within 30 seconds of the alert. An evidence collection playbook would have preserved the sign-in context and inbox rule state. A containment playbook with sufficient confidence would have revoked the session before the attacker created the forwarding rule.

Classifying by what the action does

The three tiers classify automation by what the playbook does to the environment, not by how complex the playbook is to build. A sophisticated enrichment pipeline with 12 API calls across 6 data sources is still Tier 1 because every call is read-only. A single-step playbook that disables a user account is Tier 3 because the consequence of incorrect execution is an immediate business disruption.

This distinction matters because it determines the governance model. Complexity determines how long a playbook takes to build and test. Blast radius determines what approvals, exclusions, and confidence thresholds the playbook needs before it runs in production. Confusing the two leads organizations to over-govern Tier 1 enrichment (slowing deployment of zero-risk automation) or under-govern Tier 3 containment (deploying high-risk automation without adequate safeguards).
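As a minimal sketch of this rule, the tier can be derived from the most impactful effect any step has on the environment, ignoring the number of steps entirely. The effect labels and the `classify_playbook` helper are hypothetical names chosen for illustration; they are not part of any Sentinel or Logic Apps API.

```python
from enum import IntEnum


class Tier(IntEnum):
    ENRICHMENT = 1                # zero blast radius
    COLLECTION_NOTIFICATION = 2   # low blast radius
    CONTAINMENT = 3               # high blast radius


# Hypothetical effect labels: what a single step does to the environment.
EFFECT_TIER = {
    "read": Tier.ENRICHMENT,
    "collect": Tier.COLLECTION_NOTIFICATION,
    "notify": Tier.COLLECTION_NOTIFICATION,
    "modify_security_state": Tier.CONTAINMENT,
}


def classify_playbook(step_effects):
    """Return the tier of the highest-impact step; step count is irrelevant."""
    if not step_effects:
        raise ValueError("a playbook needs at least one step")
    return max(EFFECT_TIER[effect] for effect in step_effects)


enrichment_pipeline = ["read"] * 12            # 12 read-only API calls
disable_account = ["modify_security_state"]    # one step, high impact

print(classify_playbook(enrichment_pipeline).name)
print(classify_playbook(disable_account).name)
```

The 12-step pipeline classifies as Tier 1 and the single-step account disablement as Tier 3, which is the point of the distinction: governance follows the effect, not the build effort.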

Figure 0.3 — The three automation tiers. Risk increases left to right. Each tier's governance scales with its blast radius.

Tier 1: enrichment

Enrichment playbooks query data sources and attach the results to the incident. They fire at incident creation, run in parallel across multiple APIs, and complete before any analyst opens the incident. When an analyst reaches the queue, every incident already has context: the source IP's threat intelligence verdict, the affected user's risk score, their recent sign-in history, their device compliance state, and whether they appear on the VIP watchlist.

A production enrichment pipeline for AiTM alerts calls five to eight APIs. Microsoft Graph provides the user's current risk level from Entra ID Protection (/identityProtection/riskyUsers/{id}), their recent sign-in events (/auditLogs/signIns), and their group memberships for VIP classification. The Defender for Endpoint API provides the device compliance state. Threat intelligence connectors provide IP reputation scores. A Sentinel watchlist lookup checks whether the IP appears on the known-safe CDN ranges list. All of these calls are GET requests against read-only endpoints.
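The fan-out pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the course's implementation: the three `stub_*` functions stand in for real API calls and return invented data, and `run_enrichment` is a hypothetical helper. A source that raises an error yields an error marker for its own field instead of failing the whole run.

```python
from concurrent.futures import ThreadPoolExecutor


def run_enrichment(entity, sources):
    """Call every enrichment source in parallel and merge results by source name."""
    def safe_call(fetch):
        try:
            return fetch(entity)
        except Exception as exc:  # one failing source must not block the others
            return {"error": str(exc)}

    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(safe_call, fetch) for name, fetch in sources.items()}
        return {name: future.result() for name, future in futures.items()}


# Stubs standing in for read-only GET requests (all values are made up).
def stub_user_risk(entity):
    return {"riskLevel": "high"}


def stub_ip_reputation(entity):
    return {"ti_sources_flagging": 2}


def stub_device_compliance(entity):
    raise LookupError("device not found")


sources = {
    "user_risk": stub_user_risk,
    "ip_reputation": stub_ip_reputation,
    "device_compliance": stub_device_compliance,
}
result = run_enrichment({"user": "j.doe@example.com", "ip": "203.0.113.10"}, sources)
print(result)
```

Keeping each source behind a plain callable makes it straightforward to add or remove an API without touching the orchestration logic.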

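The managed identity permissions model for Tier 1 is straightforward and narrowly scoped. The Logic App's system-assigned managed identity needs these Graph API application permissions: User.Read.All (read user profiles), IdentityRiskyUser.Read.All (read user risk levels from Entra ID Protection), AuditLog.Read.All (read sign-in logs and audit events), and SecurityEvents.Read.All (read security alerts). Reading individual risk detections additionally requires IdentityRiskEvent.Read.All. It also needs Microsoft Sentinel Reader on the workspace resource group for reading incident data, entity details, and watchlists, plus Microsoft Sentinel Responder if the playbook itself writes the enrichment comment to the incident. None of these permissions lets the playbook modify a user, a device, or a tenant setting.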

Because the blast radius is zero, Tier 1 governance is lightweight. You build the playbook, test it against 10 sample incidents to verify the API calls return data and the output formatting is readable, and deploy it. If the enrichment query returns an error on a specific incident (a user not found in Entra ID, for example), the worst outcome is a missing enrichment field on one incident. No user, device, or service is affected. SA2 builds the complete Tier 1 pipeline.

Tier 2: notification and evidence collection

Tier 2 playbooks go beyond reading data. They preserve volatile evidence that would decay before an analyst reaches the incident, and they notify stakeholders through the appropriate channel at the appropriate urgency.

Evidence collection matters because cloud telemetry has retention limits. Advanced Hunting in Defender XDR retains raw event data for 30 days. Sentinel's default log retention varies by table: SecurityAlert retains for 90 days, but SigninLogs may retain for 30 days depending on the workspace configuration. An AiTM alert that sits untriaged for 72 hours has already used up 10% of its 30-day sign-in window (3 of 30 days). A collection playbook captures the sign-in context (Conditional Access evaluation, MFA claim details, token type, device compliance state at authentication time) and stores it as a structured JSON comment on the incident. That evidence then stays on the incident record for as long as the incident itself is retained, regardless of when the analyst investigates.
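The retention arithmetic and the structured comment can be sketched as below. The field names in the input record (`conditionalAccessStatus`, `mfaDetail`, `tokenType`, `deviceCompliant`), the sample values, and both helper functions are assumptions made for illustration; a real playbook would map whatever its sign-in source actually returns.

```python
import json
from datetime import datetime, timedelta, timezone


def retention_consumed(untriaged: timedelta, retention: timedelta) -> float:
    """Fraction of the retention window used up while the alert waited."""
    return min(1.0, untriaged / retention)


def build_evidence_comment(sign_in: dict, captured_at: datetime) -> str:
    """Serialize the sign-in context as JSON text for an incident comment."""
    evidence = {
        "capturedAt": captured_at.isoformat(),
        "conditionalAccess": sign_in.get("conditionalAccessStatus"),
        "mfaDetail": sign_in.get("mfaDetail"),
        "tokenType": sign_in.get("tokenType"),
        "deviceCompliant": sign_in.get("deviceCompliant"),
    }
    return json.dumps(evidence, sort_keys=True)


lost = retention_consumed(timedelta(hours=72), timedelta(days=30))
comment = build_evidence_comment(
    {
        "conditionalAccessStatus": "success",
        "mfaDetail": {"authMethod": "Authenticator app"},
        "tokenType": "primaryRefreshToken",
        "deviceCompliant": False,
    },
    datetime(2024, 1, 15, 2, 17, 30, tzinfo=timezone.utc),  # sample timestamp
)
print(f"{lost:.0%} of the retention window consumed")
print(comment)
```

Sorting the keys keeps the comment stable across runs, which makes later diffing or parsing of incident comments easier.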

Notification routing determines who learns about the incident and when. A high-severity identity alert generates a Teams adaptive card in the SOC channel with enrichment summary and action buttons. A critical incident involving a VIP triggers an email to the SOC manager with a different, executive-friendly template. An incident matching a regulatory pattern (data exfiltration involving personal data) generates a notification to the privacy officer with a compliance-specific format and a countdown to the 72-hour GDPR notification window. Without automated routing, the analyst has to make notification decisions manually for every incident, which adds 2 to 5 minutes per alert and is inconsistent across shifts.
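The three routing rules in this paragraph can be expressed as a small decision function. This is a sketch only: the `Incident` fields, the channel names, and the template names are hypothetical labels, and a real deployment would likely load the rules from configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    severity: str                      # "low", "medium", "high", "critical"
    category: str                      # e.g. "identity", "data_exfiltration"
    involves_vip: bool = False
    involves_personal_data: bool = False


def route_notifications(incident):
    """Return (channel, template) pairs; an incident can match several rules."""
    routes = []
    if incident.severity == "high" and incident.category == "identity":
        routes.append(("teams_soc_channel", "adaptive_card_with_enrichment"))
    if incident.severity == "critical" and incident.involves_vip:
        routes.append(("email_soc_manager", "executive_summary"))
    if incident.category == "data_exfiltration" and incident.involves_personal_data:
        routes.append(("privacy_officer", "gdpr_72h_countdown"))
    return routes


print(route_notifications(Incident("high", "identity")))
print(route_notifications(Incident("critical", "data_exfiltration", True, True)))
```

Encoding the rules once removes the per-incident judgment call, so routing is identical on every shift.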

The blast radius is low but real. A misconfigured notification playbook that sends 500 Teams messages per hour creates operational noise and erodes analyst trust in automation. A collection playbook that triggers Defender for Endpoint investigation packages on every low-severity alert consumes API quota (the machine action APIs enforce per-tenant rate limits; at the time of writing, Microsoft documents 100 calls per minute and 1,500 calls per hour for the collect investigation package action) and may slow down collection for legitimate high-severity investigations. These are operational problems, not security incidents, but they justify validation testing before production deployment.
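One way to keep a collection playbook inside an API quota is a client-side sliding-window limiter, sketched below. The class name and the limit of 3 calls per 60 seconds are placeholders for the demonstration, not a documented quota; substitute the limit that applies to your tenant. Time is passed in explicitly so the behavior is easy to test.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window_seconds-long interval."""

    def __init__(self, max_calls, window_seconds):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._calls = deque()  # timestamps of accepted calls, oldest first

    def try_acquire(self, now):
        # Forget calls that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window_seconds:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            self._calls.append(now)
            return True
        return False  # caller should queue or defer the request


limiter = SlidingWindowLimiter(max_calls=3, window_seconds=60.0)
decisions = [limiter.try_acquire(t) for t in (0.0, 1.0, 2.0, 3.0, 60.0)]
print(decisions)
```

A rejected request would typically be queued and retried, with high-severity incidents given priority over low-severity ones.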

Governance for Tier 2 adds testing requirements. You deploy the playbook against a subset of alert types (one analytics rule at a time), verify that notification routing reaches the correct channels, confirm that evidence collection completes within the expected timeframe (under 60 seconds for most API calls), and monitor API rate limits for the first week. SA3 builds the complete evidence auto-collection stack. SA4 builds the notification and escalation framework.

Tier 3: containment and response

Tier 3 playbooks modify security state. They revoke sessions, disable accounts, isolate devices, block network addresses, and remove OAuth consent grants. Each of these actions directly affects a user, a device, or a service, and each has a different severity of impact if executed incorrectly.

The distinction within Tier 3 matters as much as the distinction between tiers. Session revocation forces the user to re-authenticate. If the alert was a false positive, the user experiences a brief interruption: they re-enter their credentials and continue working. No data is lost, no configuration changes, no permanent impact. Account disablement blocks all access across M365: email stops, Teams calls disconnect, SharePoint becomes unavailable, and any application using Entra ID as the identity provider fails authentication. Reversing account disablement requires an administrator action, which takes minutes to hours depending on help desk staffing and time of day.

Automation Tier Assessment

Action A — Revoke sessions: User re-authenticates in 30 seconds. False positive cost: minor inconvenience. Appropriate at 85% confidence for semi-automated, 95% for fully automated.

Action B — Disable account: All M365 access blocked until admin re-enables. False positive cost: hours of lost productivity, help desk ticket, potential executive escalation. Appropriate at 95% confidence for semi-automated with VIP exclusion, 99% for fully automated on non-VIP accounts only.

Action C — Isolate device: Device removed from network until admin releases. False positive cost: user cannot work at all until IT responds. Appropriate at 95% confidence for semi-automated, fully automated only for confirmed malware with high-fidelity EDR detection.

Design decision: Deploy session revocation first. It has the lowest false-positive cost and the fastest reversal. Graduate to account disablement and device isolation only after revocation has run in approval mode for 30+ days with tracked approval rates.
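The thresholds in this assessment can be encoded as a lookup plus a decision function. A sketch under stated assumptions: `execution_mode` and the mode names are hypothetical; "semi-automated" is read as "pause for analyst approval"; VIP accounts are treated as excluded from automated account disablement and never eligible for fully automated action; and fully automated isolation is assumed to also require meeting the semi-automated confidence level.

```python
# (semi-automated threshold, fully automated threshold) per action.
# None means there is no confidence-only path to full automation.
THRESHOLDS = {
    "revoke_sessions": (0.85, 0.95),
    "disable_account": (0.95, 0.99),
    "isolate_device": (0.95, None),
}


def execution_mode(action, confidence, *, is_vip=False, confirmed_malware=False):
    """Map an action and its confidence to how the playbook may run."""
    semi, full = THRESHOLDS[action]
    if action == "disable_account" and is_vip:
        return "manual_review"  # VIP exclusion for account disablement
    if action == "isolate_device":
        # Full automation only for confirmed malware from a high-fidelity detection.
        full_ok = confirmed_malware and confidence >= semi
    else:
        full_ok = confidence >= full
    if full_ok and not is_vip:
        return "fully_automated"
    if confidence >= semi:
        return "approval_required"
    return "manual_review"


print(execution_mode("revoke_sessions", 0.90))
print(execution_mode("disable_account", 0.96))
print(execution_mode("isolate_device", 0.99, confirmed_malware=True))
```

Keeping the thresholds in one table makes the graduation path explicit: raising an action from approval mode to full automation is a data change that can be reviewed, not a code change.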

The permissions model for Tier 3 is strict. The Logic App managed identity needs write permissions: User.RevokeSessions.All for session revocation, User.ReadWrite.All for account disablement, and Machine.Isolate for device isolation via the Defender for Endpoint API. These permissions allow the playbook to modify production accounts and devices. If the managed identity is compromised, an attacker can use those same permissions to disable accounts or isolate devices. Tier 3 managed identities should be scoped as narrowly as possible, and the Logic App should be in a dedicated, access-controlled resource group.

Policy Specification

Policy: Tier 3 Automation Deployment Requirements

Confidence threshold: Quantified from the triggering analytics rule's true positive rate over the previous 90 days. Minimum 85% for semi-automated, 95% for fully automated.

Blast radius assessment: Entity checked against VIP watchlist, service account list, and critical infrastructure registry before execution. Any match routes to approval gate regardless of confidence.

Approval mode: All new Tier 3 playbooks run in approval mode for minimum 30 days. Promotion to fully automated requires documented approval rate above 95% and zero false-action incidents.

Rollback: Every Tier 3 action has a documented reversal procedure tested quarterly. Session revocation: no reversal needed (user re-authenticates). Account disable: re-enable via Graph API or admin portal. Device isolation: release via Defender for Endpoint console.
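The numeric parts of this policy can be checked mechanically. The sketch below uses hypothetical function names and invented sample figures; it assumes the rule's confidence is simply true positives divided by closed incidents over the 90-day window.

```python
def rule_confidence(true_positives, closed_incidents):
    """True positive rate of the triggering rule over the lookback window."""
    if closed_incidents == 0:
        return 0.0  # no history means no basis for automated action
    return true_positives / closed_incidents


def requires_approval_gate(entity, vip_watchlist, service_accounts, critical_infrastructure):
    """Any list match routes to the approval gate regardless of confidence."""
    return (
        entity in vip_watchlist
        or entity in service_accounts
        or entity in critical_infrastructure
    )


def eligible_for_full_automation(days_in_approval_mode, approvals, requests, false_actions):
    """30+ days in approval mode, approval rate above 95%, zero false actions."""
    if requests == 0:
        return False
    return (
        days_in_approval_mode >= 30
        and approvals / requests > 0.95
        and false_actions == 0
    )


print(rule_confidence(171, 180))
print(requires_approval_gate("svc-payroll", set(), {"svc-payroll"}, set()))
print(eligible_for_full_automation(45, 97, 100, 0))
```

Note that the approval-rate test is strictly greater than 95%, matching the policy wording "above 95%".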

SA5, SA6, and SA7 build the complete Tier 3 stack: identity containment, endpoint containment, and cross-environment coordinated response with all of these governance controls.

Anti-Pattern

One containment playbook for all entity types

A single playbook that disables every affected account on every incident regardless of entity type will eventually disable a service account that runs payroll, a shared mailbox that processes customer orders, or a break-glass emergency access account. Tier 3 requires differentiated response: different containment actions for user accounts versus service accounts, different confidence thresholds for standard users versus VIPs, and exclusion lists that protect entities where a false action causes more damage than a delayed response.
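A differentiated alternative might look like the sketch below. Everything here is illustrative: the entity kinds, the action names, and the two threshold values are assumptions, not values prescribed by the course.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str          # "user", "service_account", "shared_mailbox", ...
    is_vip: bool = False


STANDARD_THRESHOLD = 0.95   # illustrative value
VIP_THRESHOLD = 0.99        # illustrative value


def plan_containment(entities, exclusions, confidence):
    """Choose a per-entity response instead of one action for everything."""
    plan = {}
    for entity in entities:
        if entity.name in exclusions:
            plan[entity.name] = "escalate_to_human"   # never auto-contain
        elif entity.kind != "user":
            plan[entity.name] = "request_approval"    # service accounts, mailboxes
        else:
            threshold = VIP_THRESHOLD if entity.is_vip else STANDARD_THRESHOLD
            plan[entity.name] = (
                "revoke_sessions" if confidence >= threshold else "request_approval"
            )
    return plan


entities = [
    Entity("j.doe", "user"),
    Entity("cfo", "user", is_vip=True),
    Entity("svc-payroll", "service_account"),
    Entity("breakglass-01", "user"),
]
plan = plan_containment(entities, exclusions={"breakglass-01"}, confidence=0.96)
print(plan)
```

With the same incident and the same confidence score, the four entities receive three different responses, which is exactly what a single blanket playbook cannot do.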

The tier dependency chain

Each tier depends on the infrastructure the previous tier builds. Tier 2 evidence collection cannot preserve meaningful context unless Tier 1 enrichment has already identified what context to collect. A collection playbook that blindly captures every SigninLog entry for a user produces noise: hundreds of routine sign-in records mixed with the three that matter. A collection playbook that uses the Tier 1 enrichment (which identified the anomalous IP and the time window of suspicious activity) can capture only the relevant sign-in records and the specific inbox rules created during that window. The enrichment output shapes the collection scope.
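Scoping collection with enrichment output can be sketched as a simple filter. The record shape (an `ipAddress` string and a `createdDateTime` datetime), the sample data, and the `scope_sign_ins` helper are assumptions for illustration.

```python
from datetime import datetime, timezone


def scope_sign_ins(sign_ins, anomalous_ip, window_start, window_end):
    """Keep only sign-ins from the flagged IP inside the suspicious time window."""
    return [
        record for record in sign_ins
        if record["ipAddress"] == anomalous_ip
        and window_start <= record["createdDateTime"] <= window_end
    ]


def at(hour, minute):
    """Build a UTC timestamp on an arbitrary sample date."""
    return datetime(2024, 1, 15, hour, minute, tzinfo=timezone.utc)


sign_ins = [
    {"ipAddress": "198.51.100.7", "createdDateTime": at(1, 30)},   # routine
    {"ipAddress": "203.0.113.10", "createdDateTime": at(2, 15)},   # suspicious
    {"ipAddress": "203.0.113.10", "createdDateTime": at(2, 40)},   # suspicious
    {"ipAddress": "203.0.113.10", "createdDateTime": at(9, 5)},    # outside window
]
relevant = scope_sign_ins(sign_ins, "203.0.113.10", at(2, 0), at(4, 0))
print(len(relevant), "of", len(sign_ins), "records collected")
```

The anomalous IP and the window boundaries are the inputs that only Tier 1 enrichment can supply, which is why the collection playbook depends on it.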

Tier 3 containment cannot calculate confidence without Tier 1 enrichment data. A session revocation playbook that fires on every high-severity AiTM alert will produce false actions because not every high-severity alert is a true positive. High severity means the rule author assessed the potential impact as significant; it does not mean the alert is confirmed. A session revocation playbook that checks the Tier 1 enrichment (user risk score elevated, IP flagged by two or more TI sources, impossible travel detected, no VIP match) and calculates a composite confidence score from those signals can distinguish between alerts that warrant automated action and alerts that need human review.
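A composite score built from those enrichment signals might be sketched as follows. The point weights, the enrichment field names, the reading of "elevated" as medium or high risk, and the 0.95 default threshold are all assumptions for illustration; they are not the course's formula.

```python
# Hypothetical weights in points out of 100, chosen only for the example.
SIGNAL_POINTS = {
    "user_risk_elevated": 40,
    "ip_flagged_by_two_ti_sources": 35,
    "impossible_travel": 25,
}


def extract_signals(enrichment):
    """Turn raw Tier 1 enrichment fields into boolean signals."""
    return {
        "user_risk_elevated": enrichment.get("user_risk_level") in ("medium", "high"),
        "ip_flagged_by_two_ti_sources": enrichment.get("ti_sources_flagging_ip", 0) >= 2,
        "impossible_travel": bool(enrichment.get("impossible_travel")),
    }


def composite_confidence(enrichment):
    """Sum the points of every signal that is present, scaled to 0.0-1.0."""
    signals = extract_signals(enrichment)
    return sum(points for name, points in SIGNAL_POINTS.items() if signals[name]) / 100


def decide(enrichment, threshold=0.95):
    """A VIP match always goes to a human; otherwise compare against the threshold."""
    if enrichment.get("vip_match"):
        return "human_review"
    if composite_confidence(enrichment) >= threshold:
        return "automated_action"
    return "human_review"


strong = {"user_risk_level": "high", "ti_sources_flagging_ip": 3,
          "impossible_travel": True, "vip_match": False}
weak = {"user_risk_level": "high", "ti_sources_flagging_ip": 1,
        "impossible_travel": False, "vip_match": False}
print(composite_confidence(strong), decide(strong))
print(composite_confidence(weak), decide(weak))
```

Both sample alerts could carry the same high severity, yet only the one with corroborating signals reaches the automated path.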

The dependency is not just architectural. It is operational. Organizations that deploy Tier 3 containment before Tier 1 enrichment is stable encounter a specific failure pattern: the containment playbook lacks the signal data it needs to make accurate confidence calculations, so it either fires too aggressively (false actions, business disruption, program shutdown) or sets confidence thresholds so conservatively that it never fires at all (zero value, abandoned). Both outcomes kill the automation program for different reasons. Enrichment-first avoids both by generating the data that makes containment decisions reliable.

Deploy in tier order. Bring Tier 1 to production first, starting with your highest-volume alert types to maximize the immediate capacity benefit. Accumulate operational data for at least two weeks to validate enrichment accuracy and API reliability. Deploy Tier 2 once enrichment is stable and you have confidence in the data quality. Deploy Tier 3 only after both Tier 1 and Tier 2 are reliable and you have enough historical data to calculate meaningful confidence thresholds from the enrichment signals.

Automation Principle

Classify every automation action by what it does to the environment, not by how complex it is to build. A 12-step enrichment pipeline is Tier 1 because it reads data. A one-step session revocation is Tier 3 because it modifies security state. Complexity determines build effort. Blast radius determines governance. Never confuse the two.

Section 0.4 defines the confidence threshold problem: how to calculate the confidence score that determines whether a Tier 3 playbook executes autonomously or pauses for human approval. Without a quantified confidence threshold, automated containment is either too aggressive or too conservative.
