2.7 Automated Investigation and Response (AIR)

10-14 hours · Module 2

SC-200 Exam Objective

Domain 1 — Manage a SOC Environment: "Manage automated investigation and response capabilities in Microsoft Defender XDR" and "Configure automatic attack disruption in Microsoft Defender XDR."

Introduction

Automated Investigation and Response is Microsoft Defender for Endpoint's (MDE) built-in investigation engine. When an alert fires, AIR does not wait for an analyst; it immediately begins examining related processes, files, network connections, registry modifications, and user activities. It follows the same investigation methodology a human analyst would, but at machine speed and scale. A human analyst might take 30 minutes to analyze an alert; AIR completes the same analysis in seconds.

AIR exists because SOC alert volume exceeds human capacity. A mid-size organization generates hundreds of endpoint alerts per week. If every alert requires 30 minutes of manual investigation, the SOC needs dozens of analysts just to keep up with the queue. AIR handles the routine investigation work — the alerts that follow predictable patterns with clear remediation steps — freeing human analysts to focus on the complex, ambiguous, multi-stage attacks that require judgment and creativity.

This subsection teaches you how AIR works architecturally, the four automation levels and when to use each, what AIR investigates and what remediation actions it recommends, how to review and approve pending actions, how to configure automation levels per device group, and how attack disruption extends AIR’s capabilities to automatically contain active attacks.


How AIR works

When a detection triggers an alert, AIR launches an automated investigation. The investigation follows a structured process that mirrors human investigation methodology.

[Diagram: AUTOMATED INVESTIGATION AND RESPONSE (AIR), FIVE-STEP PROCESS: ① Alert analysis (read detection context) → ② Collect artifacts (processes, files, network) → ③ Correlate (TI + ML + baselines) → ④ Verdict (malicious / suspicious / clean) → ⑤ Remediate (auto or pending approval)]
Figure 2.9: AIR's five-step process mirrors human investigation methodology but executes in seconds. The critical configuration decision is step 5: whether remediation executes automatically (full automation) or waits for analyst approval (semi-automation).

Step 1: Alert analysis. AIR reads the alert metadata — the detection rule that fired, the MITRE ATT&CK technique, the severity, and the entities involved (device, user, file, process, IP).

Step 2: Artifact collection. AIR gathers related artifacts from the device: the process tree connected to the alerted process, files written or modified by those processes, network connections made by those processes, registry keys created or modified, and other alerts on the same device or involving the same user.

Step 3: Evidence correlation. AIR cross-references the collected artifacts against threat intelligence, known-malicious indicators, behavioral patterns, and the device’s historical baseline. It determines whether each artifact is malicious, suspicious, benign, or unknown.

Step 4: Verdict determination. Based on the evidence analysis, AIR classifies the overall investigation as one of: malicious (confirmed threat requiring remediation), suspicious (probable threat requiring analyst review), no threats found (the alert was a false positive or the threat was already remediated), or partially investigated (some artifacts could not be fully analyzed — typically because the device was offline or data was incomplete).

Step 5: Remediation recommendation. For malicious verdicts, AIR recommends specific remediation actions: quarantine a malicious file, stop a malicious process, remove a persistence mechanism (scheduled task, registry key), undo a mailbox rule change, or reset user credentials. Whether these actions execute automatically or wait for analyst approval depends on the automation level.

The investigation process and its findings are visible in the Defender portal under the Investigation tab of the incident. Each step shows what AIR examined, what it found, and why it reached its verdict. This transparency is important — you need to be able to verify that AIR’s analysis is correct before approving remediation actions.
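Beyond the Investigation tab, the same alert and evidence records are queryable in Advanced Hunting. The sketch below joins `AlertInfo` to `AlertEvidence` to list the entities each AIR-sourced alert touched; the tables and columns are standard schema, but the 7-day window is an arbitrary choice for illustration:

```
// Recent AIR-sourced alerts and the evidence entities they involve
AlertInfo
| where Timestamp > ago(7d)
| where DetectionSource == "Automated investigation"
| join kind=inner AlertEvidence on AlertId
| project Timestamp, Title, Severity, EntityType, EvidenceRole,
    FileName, SHA1, RemoteIP, DeviceName
| order by Timestamp desc
```

This gives a flat, exportable view of what AIR examined, which is useful when you want to cross-check an investigation's evidence against your own hunting results.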


The four automation levels

The automation level determines how much autonomy AIR has over remediation. You configure this per device group, which means different device populations can have different levels of automation — aggressive automation for standard workstations, conservative automation for critical servers.

AIR Automation Levels — Comparison
| Level | AIR investigates? | Remediation | Best for |
| --- | --- | --- | --- |
| Full | ✓ Automatic | Automatic — no approval | Standard workstations with known baselines |
| Semi (any remediation) | ✓ Automatic | Pending — analyst approves all | Initial deployment, critical servers |
| Semi (core folders) | ✓ Automatic | Auto except system directories | Intermediate — auto for user dirs |
| Semi (non-temp folders) | ✓ Automatic | Auto only in temp directories | Conservative — auto for temp only |
| No automated response | ✗ None | ✗ None | Not recommended |
Recommendation: Start all device groups at "Semi — require approval for any remediation." After 30-60 days of reviewing AIR recommendations and confirming a 95%+ approval rate, upgrade to Full for standard workstations. Keep critical servers at Semi permanently.

Full — remediate threats automatically. AIR investigates and remediates without analyst intervention. When a malicious file is detected, it is quarantined automatically. When a persistence mechanism is found, it is removed automatically. When a malicious process is running, it is terminated automatically. No analyst approval is required.

This is the most efficient level — threats are contained in seconds rather than minutes or hours. The risk is that an incorrect classification leads to automatic remediation of a legitimate file or process. In practice, AIR’s false positive rate for remediation recommendations is very low, but it is not zero. Use full automation for device groups where you have high confidence in AIR’s accuracy (typically standard workstations with well-known software baselines) and where the impact of a false positive is recoverable (the quarantined file can be restored).

Semi — require approval for any remediation. AIR investigates fully but all remediation actions are placed in the Pending tab of the Action Center, waiting for analyst approval. This is the recommended starting configuration for organizations deploying AIR for the first time. It gives you the full investigation benefit (AIR’s analysis is immediate) while maintaining human control over remediation. After 30-60 days of reviewing AIR’s recommendations and confirming their accuracy, upgrade high-confidence device groups to full automation.

Semi — require approval for core folders remediation. AIR remediates automatically for most locations but requires approval for actions involving Windows system directories (core folders). This provides a middle ground for organizations that want automated remediation of malware in user directories and temp folders but want to manually verify before modifying system files.

Semi — require approval for non-temp folders remediation. Similar to the above but only auto-remediates in temporary directories. Files in any other location require approval.

No automated response. AIR does not investigate or remediate. Alerts are generated but no automated analysis occurs. This effectively disables AIR and is not recommended — even if you want manual control over remediation, the investigation analysis that AIR provides is valuable for analyst efficiency.

Configure automation per device group, not globally

Different device populations have different risk profiles and different tolerance for automated remediation. Standard workstations with predictable software baselines can safely use full automation. Critical servers that run custom applications should use semi-auto to prevent AIR from quarantining a file that is actually a custom application component. Domain controllers should use semi-auto with analyst review because any incorrect remediation on a DC can disrupt authentication for the entire domain.


What AIR remediates

AIR can take the following remediation actions. Each action is reversible through the Action Center — if AIR quarantines a file that turns out to be legitimate, you can restore it.

Quarantine a file. Moves a malicious file to the Defender quarantine, preventing execution. The file remains on disk in an encrypted quarantine store and can be restored if the quarantine was incorrect.

Stop and quarantine a process. Terminates a running malicious process and quarantines its executable. This handles in-memory threats — the process is killed immediately and the binary is prevented from running again.

Remove a scheduled task. Deletes a scheduled task that was identified as a persistence mechanism. AIR cross-references task creation time against the investigation timeline to determine whether the task was created by the attacker.

Remove a registry value. Deletes a registry auto-start entry (Run key, RunOnce key, Winlogon modification) identified as attacker persistence.

Block a URL or IP. Creates an indicator that blocks network communication with a malicious destination identified during the investigation.

Each recommended action in the Action Center shows the specific artifact, the evidence that supports the recommendation, and the confidence level. Review this evidence before approving — particularly for registry and scheduled task removals, which can affect system behavior if AIR incorrectly identifies a legitimate item as attacker persistence.
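Before approving a URL or IP block, it can help to scope how widely the destination is used in your environment. A minimal hunting sketch, assuming you have the IP from the pending action (the address below is a placeholder from the TEST-NET-3 documentation range):

```
// Which devices recently communicated with the destination AIR wants to block?
let suspectIp = "203.0.113.45"; // placeholder — substitute the IP from the pending action
DeviceNetworkEvents
| where Timestamp > ago(7d)
| where RemoteIP == suspectIp
| summarize ConnectionCount = count(), LastSeen = max(Timestamp)
    by DeviceName, InitiatingProcessFileName
| order by ConnectionCount desc
```

If dozens of devices reach the destination from a known business application, investigate before approving; if only the alerted device reaches it from an unfamiliar process, the block is likely safe.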


Reviewing and managing pending actions

When your automation level includes approval requirements, pending actions accumulate in the Action Center. Managing these pending actions is a daily operational task.

Navigate to the Action Center → Pending tab. Each pending action shows the investigation it belongs to, the recommended action, the target artifact, the evidence summary, and the time the recommendation was created. Sort by age (oldest first) to ensure no action sits pending indefinitely.

Review process for each pending action:

Read the investigation summary. What alert triggered the investigation? What did AIR find? Is the verdict reasonable given the evidence?

Examine the specific artifact. For a file quarantine: is the file known? What is its prevalence in your organization (how many other devices have it)? Is it signed? What is the VirusTotal detection ratio? For a persistence removal: when was the entry created? Does it align with the investigation timeline? Does the entry path look legitimate or suspicious?

Approve or reject. If the evidence supports the recommendation, approve. If you disagree with AIR’s assessment (the file is a legitimate application, the scheduled task is a known business process), reject with a reason. Rejections contribute to AIR’s learning — they help refine future investigations.
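The prevalence question in the review steps above can be answered with a quick Advanced Hunting query. A sketch, assuming you have the file's SHA1 from the pending action (the hash below is a placeholder):

```
// Organization-wide prevalence of a file pending quarantine
let suspectSha1 = "0000000000000000000000000000000000000000"; // placeholder — substitute the real hash
DeviceFileEvents
| where Timestamp > ago(30d)
| where SHA1 == suspectSha1
| summarize DeviceCount = dc(DeviceId),
    FirstSeen = min(Timestamp),
    LastSeen = max(Timestamp),
    SamplePaths = make_set(FolderPath, 5)
```

A file present on hundreds of devices for months is more likely a legitimate application; a file first seen today on a single device alongside the alert timeline supports the quarantine recommendation.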

Operational target: clear all pending actions within 30 minutes of arrival. Pending actions represent identified threats that are waiting for containment. Every minute a pending action sits unapproved is a minute the threat remains active. If you consistently find yourself approving 95% or more of pending actions for a device group, that device group is a candidate for upgrading to full automation.


Attack disruption

Attack disruption is the most aggressive form of automated response in the Defender XDR ecosystem. When the AI model detects a high-confidence attack pattern — ransomware encryption activity, business email compromise, or coordinated credential theft — it automatically takes containment actions without waiting for AIR’s standard investigation cycle or analyst approval.

Attack disruption can automatically isolate a device (cutting network access immediately), disable a user account (preventing the compromised identity from accessing any resource), and contain a user (blocking the identity at the endpoint level across all onboarded devices). These actions execute within seconds of the AI model’s confidence threshold being met.

The key difference from standard AIR: attack disruption operates at the incident level, not the alert level. It requires cross-product signal correlation (endpoint + identity + email + cloud apps) to reach the confidence threshold. A single endpoint alert does not trigger disruption. A coordinated pattern — phishing email delivered, credentials stolen, attacker sign-in detected, inbox rules created, lateral movement attempted — triggers disruption because the AI model recognizes the multi-stage attack pattern.

Attack disruption actions are logged in the incident timeline with a “Disrupted” badge. Each action can be reviewed and reversed if the disruption was a false positive — but in practice, the confidence threshold is set high enough that false positive rates are extremely low. Microsoft reports that attack disruption reduces the time from detection to containment from an average of 2-4 hours (manual SOC response) to under 3 minutes (automated disruption).

Configuring attack disruption: Attack disruption is enabled by default in Defender XDR. It does not require separate configuration beyond having the individual products (MDE, Defender for Identity, MDO, MDA) operational. The AI model uses signals from all connected products to build the correlation confidence. The more products you have active, the more accurate the disruption decisions become.


Tuning AIR for your environment

AIR’s effectiveness depends on your environment configuration. Out of the box, AIR works well for common threat patterns. To optimize it for your specific environment, consider these tuning strategies.

Reduce false positives through exclusions. If AIR repeatedly investigates the same legitimate process or file (a custom application that triggers behavioral detection), add it to the appropriate exclusion list. This reduces unnecessary investigation cycles and prevents the Pending tab from filling with actions you always reject.

Review investigation efficiency metrics. In the portal, review the statistics for completed automated investigations: how many resulted in remediation, how many found no threats, how many were partially investigated. A high “no threats found” rate suggests your detection rules are generating too many false positive alerts that AIR wastes cycles investigating. Tune the underlying detections (subsection 2.4) to reduce noise.

Gradual automation escalation. Start every device group at semi-auto. After 30 days of reviewing and approving AIR recommendations, calculate your approval rate. If you approve 95%+ of recommendations for a device group, upgrade it to full automation. If certain action types always require approval (registry modifications on specific servers), keep those at semi-auto while upgrading the rest.

Monitor AIR through Advanced Hunting. Query the automated investigation data to track AIR performance:

// AIR-sourced alerts over the last 30 days, trended by day and category.
// Note: remediation outcomes and pending-approval status live in the
// Action Center rather than in Advanced Hunting tables.
AlertInfo
| where Timestamp > ago(30d)
| where DetectionSource == "Automated investigation"
| summarize InvestigationCount = count() by Category, bin(Timestamp, 1d)
| render timechart

Try it yourself

Navigate to the Action Center in your Defender portal. Review the Pending tab (if any items exist) and the History tab (to see past automated actions). Then navigate to Settings → Endpoints → Device groups and check the automation level configured for each group. In your lab, the default device group is likely set to "Full" or "Semi." If it is Full, consider changing it to Semi for learning purposes — this lets you see AIR's recommendations in the Pending tab rather than having them execute automatically, giving you the opportunity to review AIR's investigation reasoning before approving.

What you should observe

The Action Center History tab shows every automated action taken in your tenant, including the investigation ID, the action type, the target device, and the result. Each entry links to the full investigation details where you can review what AIR examined and why it recommended that specific action. Understanding this workflow is critical because during a real incident, the Action Center is where you verify that AIR's automated response was appropriate and complete.


Knowledge check

Check your understanding

1. Your organization has just deployed MDE with AIR enabled. The SOC team has no prior experience with automated investigation. What automation level should you configure?

Semi — require approval for any remediation. This gives the SOC team the investigation benefits of AIR (immediate automated analysis) while requiring human approval for all remediation actions. Over 30-60 days, the team reviews AIR's recommendations, builds confidence in its accuracy, and identifies any false positive patterns specific to the environment. After this learning period, device groups with consistently high approval rates can be upgraded to full automation.
Full automation — get maximum protection immediately
No automated response — the team should investigate everything manually first
Full for workstations, semi for everything else

2. The Action Center shows 15 pending actions that have been waiting for 3 days. What operational problem does this indicate?

The SOC is not reviewing pending actions as part of their shift routine. Each pending action represents an identified threat that AIR has analyzed and recommended remediation for — but no analyst has approved the remediation. For 3 days, those threats have remained active in the environment. This is a process failure: the shift-start routine (Module 1.7) should include reviewing and clearing all pending actions. If the SOC consistently cannot keep up with pending actions, the device groups should be upgraded to full automation.
AIR is misconfigured and generating false recommendations
The pending actions are informational and do not require attention
15 pending actions over 3 days is normal operational volume

3. Attack disruption automatically isolates a device and disables a user account during what appears to be a ransomware attack. The affected user is the CFO and the device is their primary workstation. The CFO's assistant calls demanding the device be restored immediately. What do you do?

Review the attack disruption evidence in the incident timeline first. Attack disruption triggers at a high confidence threshold with cross-product correlation. If the evidence supports the ransomware classification (encryption behavior detected, suspicious process chain, C2 communication), the disruption was correct and the device must remain isolated until remediation is complete. Explain to the assistant that the device was automatically quarantined to prevent data loss and that restoration depends on the investigation outcome. If the evidence is weak or the disruption appears to be a false positive (very rare), reverse the actions after documenting your reasoning.
Restore access immediately — the CFO's work takes priority over security
Disable attack disruption to prevent future false positives
Escalate to the CISO and wait for instructions