1.8 Testing Automation Safely

5 hours · Module 1 · Free

What you already know

You have built an enrichment playbook (Section 1.4), configured its permissions (Section 1.5), connected entity extraction (Section 1.6), and added error handling (Section 1.7). Before this playbook runs on real incidents, you need to validate that it works. This section teaches the testing methodology that scales from low-risk enrichment playbooks through high-risk containment playbooks. The methodology applies to every playbook built in this course.

Scenario

The automation engineer finishes a containment playbook that disables compromised accounts. They test it by running it against a real High-severity incident in the production queue. The playbook works correctly: it extracts the account entity, evaluates the risk, and disables the account. The account belongs to the Director of Finance. The Director's session drops during a board presentation. The incident was a true positive, but the containment was premature. No analyst had reviewed the alert. No one had verified the classification. The playbook did exactly what it was designed to do. The testing methodology was what failed.

Why testing rigor scales with blast radius

Enrichment playbooks read data. They add comments. They update tags. If an enrichment playbook runs on a test incident and produces incorrect output, the worst case is a misleading comment on an incident you close immediately. The blast radius is a single incident record.

Containment playbooks disable accounts, isolate devices, revoke sessions, and block IP addresses. If a containment playbook runs on the wrong incident or executes incorrectly, the blast radius extends to real users, real systems, and real business operations. A disabled service account breaks application integrations. An isolated endpoint blocks a remote worker. A revoked session interrupts a customer-facing demo.

The testing methodology for each tier reflects this risk gradient. Tier 1 enrichment is safe to test in production with manual incidents. Tier 3 containment requires dedicated test accounts, a dry-run validation period, and explicit promotion to live execution.

Figure 1.8a: Testing requirements by automation tier. Read-only operations can be tested in production. Write operations that affect real users and systems require isolated test targets and a dry-run validation period.

Creating test incidents

You need test incidents to trigger your playbooks. Sentinel provides three methods for creating test incidents: the portal UI, the API, and the Logic Apps connector.

The portal method is the fastest for one-off testing. In the Defender portal, navigate to Microsoft Sentinel → Incidents → Create incident (Preview). Set a title that identifies the test ("SA-TEST: Brute Force — Test Enrichment"), select High severity, set status to New, and add entities manually. The Account entity needs at minimum a Name and UPNSuffix that match an account your playbook can enrich. The IP entity needs an Address field. Create the incident, and it appears in the queue immediately. Automation rules evaluate it on creation, and any playbook connected to a matching automation rule fires.

The API method is better for repeated testing because you can script it. The Incidents API endpoint accepts PUT requests with the incident properties in JSON. You create a named test incident, trigger the playbook, validate the results, then delete the incident via the API. Script the entire cycle and rerun it whenever you modify the playbook.

PowerShell

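# Create a test incident via the Sentinel API
# Requires: Az module authenticated (Connect-AzAccount), Microsoft Sentinel Responder role
$subscriptionId = "your-subscription-id"
$resourceGroup  = "rg-sentinel-prod"
$workspace      = "law-sentinel-prod"
$incidentId     = [guid]::NewGuid().ToString()
# $($incidentId) uses a subexpression so PowerShell does not read "?" as part of the variable name
$uri = "https://management.azure.com/subscriptions/$subscriptionId/resourceGroups/$resourceGroup/providers/Microsoft.OperationalInsights/workspaces/$workspace/providers/Microsoft.SecurityInsights/incidents/$($incidentId)?api-version=2023-11-01"
$body = @{
    properties = @{
        title       = "SA-TEST: AiTM Token Theft - Enrichment Validation"
        severity    = "High"
        status      = "New"
        description = "Automated test incident for playbook validation. Safe to close."
        labels      = @(@{ labelName = "sa-test"; labelType = "User" })
    }
} | ConvertTo-Json -Depth 5
$response = Invoke-AzRestMethod -Uri $uri -Method PUT -Payload $body
# Invoke-AzRestMethod returns the HTTP response rather than throwing, so check the status code explicitly
if ($response.StatusCode -notin 200, 201) {
    throw "Incident creation failed ($($response.StatusCode)): $($response.Content)"
}
Write-Output "Created test incident $incidentId"
# After testing, delete the incident (deletion may require the Microsoft Sentinel Contributor role):
# Invoke-AzRestMethod -Uri $uri -Method DELETE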
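Every test incident created by the script above carries both the "SA-TEST:" title prefix and the "sa-test" label, so you can script cleanup too. The sketch below uses a hypothetical helper, `Select-TestIncident`, which is not part of any module. It filters a list-incidents response down to test incidents. It requires both markers, so a real incident is never selected for deletion.

The sketch makes two assumptions:

- The response has the shape `value[].name` (the incident ID) and `value[].properties.title` / `labels`.
- Only the first page of results is needed, so paging through `nextLink` is not handled.

```powershell
# Hypothetical cleanup helper: return the IDs (the 'name' field) of test incidents only.
# Assumes objects shaped like the Sentinel list-incidents response:
#   { name, properties: { title, labels: [ { labelName, labelType } ] } }
function Select-TestIncident {
    param(
        [Parameter(Mandatory)] [AllowEmptyCollection()] [object[]] $Incidents,
        [string] $TitlePrefix = 'SA-TEST:',
        [string] $Label = 'sa-test'
    )
    foreach ($incident in $Incidents) {
        $props = $incident.properties
        $labelNames = @($props.labels | Where-Object { $_ } | ForEach-Object { $_.labelName })
        # Require BOTH markers: a real incident that merely mentions "SA-TEST" in its
        # title, or carries the label by mistake, is never selected for deletion.
        if ($props.title -like "$TitlePrefix*" -and $labelNames -contains $Label) {
            $incident.name
        }
    }
}

# Usage sketch (reuses $subscriptionId, $resourceGroup, $workspace from the creation script;
# reads only the first page of results):
# $base = "https://management.azure.com/subscriptions/$subscriptionId/resourceGroups/$resourceGroup/providers/Microsoft.OperationalInsights/workspaces/$workspace/providers/Microsoft.SecurityInsights/incidents"
# $list = (Invoke-AzRestMethod -Uri "$($base)?api-version=2023-11-01" -Method GET).Content | ConvertFrom-Json
# foreach ($id in Select-TestIncident -Incidents $list.value) {
#     Invoke-AzRestMethod -Uri "$base/$($id)?api-version=2023-11-01" -Method DELETE
# }
```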

One limitation: incidents created through the portal and API do not automatically carry entity data the way incidents generated by analytics rules do. Analytics rules populate the entity collection from entity mapping (Section 1.6). For manually created incidents, you must add entities through the portal's entity picker or include them in the API payload. If your playbook depends on entity extraction and the test incident has no entities, the For Each loop iterates zero times. The run reports success even though it never exercised the enrichment path. This is one of the most common testing mistakes in Sentinel automation.
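A cheap guard against the zero-iteration trap is to check the test incident before reading any run history. Confirm that it carries every entity kind the playbook loops over. `Get-MissingEntityKind` below is a hypothetical helper, not a real cmdlet. It assumes each entity is an object with a `kind` property such as `Account`, `Ip`, or `Host`, following the Sentinel entity model. Confirm that shape for the API version you use.

```powershell
# Hypothetical pre-flight check: which required entity kinds are absent from a test incident?
# Assumes each entity object exposes a 'kind' property (for example "Account", "Ip", "Host").
function Get-MissingEntityKind {
    param(
        [Parameter(Mandatory)] [AllowEmptyCollection()] [object[]] $Entities,
        [Parameter(Mandatory)] [string[]] $RequiredKinds
    )
    $present = @($Entities | Where-Object { $_ } | ForEach-Object { $_.kind })
    # Every kind returned here is one the playbook's For Each loop would iterate zero times over.
    $RequiredKinds | Where-Object { $present -notcontains $_ }
}

# Usage sketch ($incidentEntities is a placeholder for the incident's entity list):
# $missing = @(Get-MissingEntityKind -Entities $incidentEntities -RequiredKinds 'Account', 'Ip')
# if ($missing.Count -gt 0) { throw "Test incident lacks entity kinds: $($missing -join ', ')" }
```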

For API-created incidents, add entities after the incident exists by using the portal entity picker. For scripted testing, the Sentinel connector's "Create Incident" Logic Apps action includes entity fields that accept Account, IP, and Host entities as structured JSON. The advantage of the Logic Apps action over the raw REST API is that the connector formats entity data into the correct schema automatically. If you use the raw API, you must construct each entity object with the correct kind, properties, and entity mapping format. If the schema is wrong, the entities exist in the incident payload, but the playbook's Get Accounts action returns nothing because they do not match what the extraction action expects.
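Whichever route you use, it helps to fail when the entity is built rather than when extraction silently returns nothing. The hypothetical `New-TestEntity` helper below emits an entity object with an explicit `kind`. It refuses to emit one whose identifying properties are missing or empty.

The property names (`accountName`, `upnSuffix`, `address`, `hostName`) are assumptions based on the Microsoft.SecurityInsights entity model. Verify them against the schema your extraction action expects. The check catches one class of mismatch, a missing or misnamed identifier, not every schema error.

```powershell
# Hypothetical helper: build an entity object with an explicit kind, refusing to emit one
# whose identifying properties are absent or empty.
# Property names are assumptions based on the Microsoft.SecurityInsights entity model.
function New-TestEntity {
    param(
        [Parameter(Mandatory)] [ValidateSet('Account', 'Ip', 'Host')] [string] $Kind,
        [Parameter(Mandatory)] [hashtable] $Properties
    )
    $required = @{
        Account = @('accountName', 'upnSuffix')
        Ip      = @('address')
        Host    = @('hostName')
    }
    $missing = @($required[$Kind] | Where-Object {
        -not $Properties.ContainsKey($_) -or [string]::IsNullOrWhiteSpace([string]$Properties[$_])
    })
    if ($missing.Count -gt 0) {
        # A misnamed key (for example 'ipAddress' instead of 'address') also lands here.
        throw "Entity of kind '$Kind' is missing required properties: $($missing -join ', ')"
    }
    [ordered]@{ kind = $Kind; properties = $Properties }
}

# Usage sketch:
# $entities = @(
#     New-TestEntity -Kind Account -Properties @{ accountName = 'test-containment-01'; upnSuffix = 'northgateeng.com' }
#     New-TestEntity -Kind Ip -Properties @{ address = '203.0.113.10' }
# )
# $entities | ConvertTo-Json -Depth 5
```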

Reading the Logic App run history

The run history is your primary debugging tool. After triggering a playbook on a test incident, navigate to the Logic App in the Azure portal → Overview → Run history. Click the run to open the execution trace. Each action shows its status (Succeeded, Failed, Skipped), start time, duration, inputs, and outputs.

The inputs and outputs are the critical data. Click any action to expand its detail. The inputs show exactly what the action received: the HTTP request body, the KQL query text, the entity values. The outputs show exactly what the action returned: the HTTP response code, the query results, the parsed JSON. When an action fails, the outputs contain the error message and HTTP status code.

When an action is Skipped, it usually means an action it depends on did not succeed, so the default Run After condition (requires Succeeded) prevented execution. Actions in a Condition branch that was not taken also show as Skipped. The distinction matters because a skipped action is not a successful one. A run in which the error handling scope catches a failure can show a top-level status of Succeeded and still contain skipped enrichment actions. The overall run status does not tell you whether the playbook produced the correct output.
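The top-level status can read Succeeded while enrichment actions were skipped, so check action statuses individually. The hypothetical `Get-RunActionProblem` helper below takes the action list of one run. It reports every action that neither succeeded nor was expected to be skipped, such as an error-handling scope that runs only on failure.

It assumes the shape returned by the Logic Apps "list workflow run actions" REST operation (`value[].name`, `value[].properties.status`). Actions inside a For Each loop may need their per-iteration repetitions checked separately.

```powershell
# Hypothetical run-history audit: list actions that did not succeed, regardless of the
# run's overall status. Assumes objects shaped like { name, properties: { status } }.
function Get-RunActionProblem {
    param(
        [Parameter(Mandatory)] [AllowEmptyCollection()] [object[]] $Actions,
        # Names of actions that are legitimately Skipped on a healthy run,
        # such as an error-handling scope whose Run After is set to Failed.
        [string[]] $ExpectedSkipped = @()
    )
    foreach ($action in $Actions) {
        $status = $action.properties.status
        if ($status -eq 'Succeeded') { continue }
        if ($status -eq 'Skipped' -and $ExpectedSkipped -contains $action.name) { continue }
        [pscustomobject]@{ Action = $action.name; Status = $status }
    }
}

# Usage sketch ($logicAppName and $runName are placeholders; the run name comes from the run history):
# $uri = "https://management.azure.com/subscriptions/$subscriptionId/resourceGroups/$resourceGroup/providers/Microsoft.Logic/workflows/$logicAppName/runs/$runName/actions?api-version=2016-06-01"
# $actions = ((Invoke-AzRestMethod -Uri $uri -Method GET).Content | ConvertFrom-Json).value
# Get-RunActionProblem -Actions $actions -ExpectedSkipped 'Handle_errors'
```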

For playbooks with parallel branches, each branch shows its execution independently. If one branch enriches from the Graph API and another enriches from a KQL query, the run history shows both branches with separate timelines. A failure in one branch does not necessarily affect the other. Expand both branches and verify each one individually.

For the enrichment playbook from Section 1.4, check these outputs in sequence: the incident trigger shows the full incident payload with entities. The Get Accounts action shows the parsed Account entities. The KQL query action shows the query text (with dynamic values substituted) and the results. The HTTP action shows the Graph API request and response. The Add Comment action shows the comment text. Walk through each action's inputs and outputs, comparing what was sent against what was returned. This is how you find misconfigurations: a KQL query that uses the wrong column name, a Graph API call that targets the wrong endpoint, an entity field reference that resolves to null.

The dry-run pattern for containment

Containment playbooks need a validation period in which the playbook evaluates every condition and makes the containment decision but does not execute the destructive action. The dry-run pattern provides this. Add a Logic App parameter called "DryRun" with a default value of "true". At the containment decision point, add a Condition action. If DryRun equals "true", skip the containment action and post a diagnostic comment. If DryRun equals "false", execute the containment action.
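One design choice is worth making explicit: test for the value that enables containment, not the value that disables it. Suppose the Condition checks whether DryRun equals "true" and routes everything else to containment. Then a typo such as "True " silently turns containment on.

The hypothetical function below models the safer gate in PowerShell so the rule can be unit-tested. In the Logic App, the same rule would be a Condition that sends only an exact "false" to the containment branch. The case-sensitive comparison is this sketch's own choice.

```powershell
# Hypothetical model of the DryRun gate as a pure function.
# Fail safe: only the exact string "false" enables live containment. Any other value,
# including an empty or misspelled one, keeps the playbook in dry-run mode.
function Get-ContainmentMode {
    param([string] $DryRun)
    if ($DryRun -ceq 'false') {
        'Execute'         # run the containment action
    } else {
        'DryRunComment'   # skip containment and post the diagnostic comment
    }
}

# Usage sketch:
# Get-ContainmentMode -DryRun 'true'    # DryRunComment
# Get-ContainmentMode -DryRun 'false'   # Execute
```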

Analyst Decision

DRY RUN — SA-Contain-AccountDisable

Incident: NE-2026-04315 — Impossible Travel with Token Replay

Target account: d.chen@northgateeng.com

Risk score: High (Graph API IdentityRiskEvents: unfamiliarFeatures + anomalousToken)

VIP check: Not on VIP watchlist

Service account check: Not a service account

Decision: WOULD HAVE disabled account and revoked all sessions

Action taken: None (DryRun = true). Manual review required.

The dry-run comment validates the entire decision chain:

- Entity extraction produced the correct account.
- The Graph API risk query returned the expected risk events.
- The VIP watchlist check correctly found that the account is not on the watchlist.
- The service account check correctly classified it.

Every step except the final containment action was exercised. Run dry-run mode for one to two weeks, reviewing every dry-run comment against the analyst's manual classification. When the dry-run decisions consistently match the analyst's decisions, change the parameter to "false" and the playbook begins executing containment.
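You can model the decision chain that the dry-run comment exercises in the same way, which makes it testable outside the Logic App. This hypothetical sketch uses two illustrative thresholds, not rules stated by the playbook:

- Only a High risk level qualifies for containment.
- VIP and service accounts are always left for manual review.

The function returns the decision, whether anything was executed, and comment text modeled on the example comment above.

```powershell
# Hypothetical model of the containment decision chain behind the dry-run comment.
# Assumption: only 'High' risk qualifies; VIP and service accounts are never auto-contained.
function Get-ContainmentDecision {
    param(
        [Parameter(Mandatory)] [string] $Account,
        [Parameter(Mandatory)] [ValidateSet('None', 'Low', 'Medium', 'High')] [string] $RiskLevel,
        [Parameter(Mandatory)] [bool] $IsVip,
        [Parameter(Mandatory)] [bool] $IsServiceAccount,
        [bool] $DryRun = $true
    )
    $wouldContain = ($RiskLevel -eq 'High') -and (-not $IsVip) -and (-not $IsServiceAccount)
    $executed = $wouldContain -and (-not $DryRun)

    if (-not $wouldContain) {
        $decision = 'No containment; route to analyst'
        $taken = 'None'
    } elseif ($DryRun) {
        $decision = 'WOULD HAVE disabled account and revoked all sessions'
        $taken = 'None (DryRun = true). Manual review required.'
    } else {
        $decision = 'Disable account and revoke all sessions'
        $taken = 'Containment executed'
    }

    # Comment lines mirror the diagnostic comment format shown above.
    $comment = @(
        "Target account: $Account"
        "Risk score: $RiskLevel"
        "VIP check: $(if ($IsVip) { 'On VIP watchlist' } else { 'Not on VIP watchlist' })"
        "Service account check: $(if ($IsServiceAccount) { 'Service account' } else { 'Not a service account' })"
        "Decision: $decision"
        "Action taken: $taken"
    ) -join "`n"

    [pscustomobject]@{ WouldContain = $wouldContain; Executed = $executed; Comment = $comment }
}
```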

The dry-run period also discovers watchlist gaps. The playbook processes real incidents at production volume. If a service account is not on the exclusion watchlist and the playbook would have disabled it, the dry-run comment surfaces that gap before the containment action could cause an outage.

Testing containment playbooks against real accounts

Create dedicated test accounts in Entra ID for containment testing. Name them explicitly: test-containment-01@northgateeng.com, test-containment-02@northgateeng.com. Add them to a "Test Accounts" security group. Configure MFA on both (so MFA reset testing works). Sign in as each test account to create active sessions (so session revocation testing works). Enroll a test VM in Defender for Endpoint if your playbooks include device isolation. The test infrastructure exists solely for automation testing. When you test the containment playbook, it disables the test account, isolates the test VM, and revokes the test session. Then you run the rollback and verify restoration. Both containment and rollback are tested before production deployment.
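During containment testing, it is worth adding a guard so the script driving the test refuses any target outside the test population. The hypothetical `Assert-ContainmentTestTarget` below checks the UPN against the naming convention above. Checking membership of the "Test Accounts" group would be a stronger alternative.

The commented usage shows Microsoft Graph PowerShell cmdlets that could perform the containment and rollback steps. It assumes `Connect-MgGraph` has already run with sufficient permissions. The required Graph permissions depend on your tenant and are not specified here.

```powershell
# Hypothetical guard for containment tests: refuse any target that is not a dedicated
# test account. The pattern mirrors the example naming convention; adjust for your tenant.
function Assert-ContainmentTestTarget {
    param(
        [Parameter(Mandatory)] [string] $UserPrincipalName,
        [string] $Pattern = '^test-containment-\d{2}@northgateeng\.com$'
    )
    if ($UserPrincipalName -notmatch $Pattern) {
        throw "Refusing to run containment test against '$UserPrincipalName': not a dedicated test account."
    }
    $true
}

# Usage sketch (Microsoft Graph PowerShell, after Connect-MgGraph):
# $upn = 'test-containment-01@northgateeng.com'
# Assert-ContainmentTestTarget -UserPrincipalName $upn | Out-Null
# Update-MgUser -UserId $upn -AccountEnabled:$false                    # containment
# Revoke-MgUserSignInSession -UserId $upn                              # session revocation
# Update-MgUser -UserId $upn -AccountEnabled                           # rollback
# (Get-MgUser -UserId $upn -Property AccountEnabled).AccountEnabled    # verify restoration
```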

The four-step validation hierarchy

Every new playbook goes through four validation steps before it runs on production incidents. Each step catches a different category of error.

Step 1: Desk check. The automation engineer reads through the Logic App designer action by action. Does the trigger match the intended incident type? Does the entity extraction reference the correct entity types? Does each HTTP action call the correct API endpoint? Does the error handling cover expected failure modes? The desk check catches configuration errors: wrong API URLs, missing parameters, incorrect entity references. Five minutes, zero risk, catches the most common mistakes.
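Part of the desk check can be mechanized. The hypothetical helper below walks an exported Logic App definition, for example the JSON from the designer's code view. It lists every HTTP action with its method and URI, including actions nested in scopes, loops, and condition branches, so a wrong endpoint stands out in one table.

The helper makes two assumptions and has one gap:

- The definition was parsed with `ConvertFrom-Json`, so actions are PSCustomObjects.
- HTTP actions have the type `Http`.
- Only `actions` and `else.actions` are traversed, so Switch cases are not covered.

```powershell
# Hypothetical desk-check aid: inventory HTTP actions (type 'Http') in a parsed workflow
# definition, recursing into nested 'actions' and condition 'else' branches.
function Get-HttpActionInventory {
    param(
        [object] $Actions,     # the 'actions' object of a definition parsed by ConvertFrom-Json
        [string] $Path = ''
    )
    if ($null -eq $Actions) { return }
    foreach ($prop in $Actions.PSObject.Properties) {
        $name = if ($Path) { "$Path/$($prop.Name)" } else { $prop.Name }
        $action = $prop.Value
        if ($action.type -eq 'Http') {
            [pscustomobject]@{ Action = $name; Method = $action.inputs.method; Uri = $action.inputs.uri }
        }
        # Containers such as Scope, Foreach, Until, and If hold child actions under 'actions'.
        if ($action.actions) { Get-HttpActionInventory -Actions $action.actions -Path $name }
        # If actions keep their false branch under 'else.actions'.
        if ($action.'else' -and $action.'else'.actions) {
            Get-HttpActionInventory -Actions $action.'else'.actions -Path "$name/else"
        }
    }
}

# Usage sketch:
# $definition = Get-Content .\workflow.json -Raw | ConvertFrom-Json
# Get-HttpActionInventory -Actions $definition.actions | Format-Table
```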

Step 2: Manual test. Create a test incident with the expected entities. Manually trigger the playbook from the incident page (Actions → Run playbook). Watch the execution in real time via Logic App run history. Verify each action's inputs and outputs. Check the incident comment. Does the output match expectations? Fifteen minutes, catches logic errors that the desk check misses because they only appear at runtime.

Step 3: Dry-run period. For containment playbooks, deploy with DryRun set to "true" and let it process real incidents for one to two weeks. Review every dry-run comment. Compare the playbook's decision against the analyst's classification. Identify watchlist gaps, edge cases, and entity mapping issues that only appear at production volume. Track disagreements: if the playbook would have contained an account that the analyst classified as a false positive, that is a tuning issue in the analytics rule, not the playbook. If the playbook would have skipped containment on an account the analyst manually contained, that is a gap in the decision logic. Both outcomes inform changes before you switch to live execution. This step is unique to containment. Enrichment playbooks go from Step 2 directly to Step 4.
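Once each dry-run decision is paired with the analyst's outcome, you can tally the two kinds of disagreement described above automatically. In this hypothetical sketch, the record fields (`Incident`, `WouldContain`, `AnalystVerdict`, `AnalystContained`) are placeholders for however you export that pairing. A disagreement that fits neither category is flagged for manual review.

```powershell
# Hypothetical comparison of dry-run decisions with analyst outcomes.
# Record fields are placeholders: Incident, WouldContain (bool), AnalystVerdict, AnalystContained (bool).
function Compare-DryRunDecision {
    param([Parameter(Mandatory)] [AllowEmptyCollection()] [object[]] $Records)
    foreach ($r in $Records) {
        if ($r.WouldContain -and $r.AnalystVerdict -eq 'FalsePositive') {
            $category = 'AnalyticsRuleTuning'   # playbook would contain a false positive
        } elseif (-not $r.WouldContain -and $r.AnalystContained) {
            $category = 'DecisionLogicGap'      # analyst contained what the playbook would skip
        } elseif ([bool]$r.WouldContain -eq [bool]$r.AnalystContained) {
            $category = 'Agree'
        } else {
            $category = 'ReviewManually'        # e.g. playbook would contain a true positive the analyst left alone
        }
        [pscustomobject]@{ Incident = $r.Incident; Category = $category }
    }
}

# Usage sketch: summarize the review period before switching DryRun to "false".
# Compare-DryRunDecision -Records $records | Group-Object Category | Select-Object Name, Count
```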

Step 4: Production canary. Deploy to production (or switch DryRun to "false" for containment) and monitor the first five executions manually. If all five produce correct results, the playbook is validated. If any execution is incorrect, disable the playbook, investigate, fix, and restart the canary. The canary catches environment-specific issues: permission differences between test and production, data format variations, timing dependencies. Keep the monitoring window open for 48 hours after promotion. Some issues only surface at volume or when the playbook encounters incident types that did not appear during the dry-run period. Document the canary results alongside the dry-run analysis so the next engineer who modifies the playbook understands the validation history.
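The canary rule is simple enough to encode so that nobody promotes early by accident. Five correct executions validate the playbook; any incorrect one stops it and restarts the count. The hypothetical function below evaluates results in execution order. It covers only the first five runs, not the 48-hour monitoring window that follows.

```powershell
# Hypothetical canary evaluator. $Results holds $true (correct) or $false (incorrect)
# for each post-promotion execution, in order.
function Get-CanaryStatus {
    param(
        [Parameter(Mandatory)] [AllowEmptyCollection()] [bool[]] $Results,
        [int] $Required = 5
    )
    $window = [Math]::Min($Results.Count, $Required)
    for ($i = 0; $i -lt $window; $i++) {
        if (-not $Results[$i]) {
            # Any incorrect canary run: disable the playbook, fix it, and restart the count.
            return 'DisableAndInvestigate'
        }
    }
    if ($Results.Count -ge $Required) { return 'Validated' }
    "Monitoring ($($Results.Count)/$Required)"
}
```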

Automation Principle

The cost of testing is measured in minutes. The cost of an untested containment playbook disabling the wrong account during a board meeting is measured in credibility. Every playbook in this course specifies its testing tier. Enrichment playbooks are tested in production with manual incidents. Containment playbooks go through the full four-step hierarchy including a dry-run period. Do not skip steps to save time. The time you save is the time you spend explaining why automation broke production.

Section 1.9 covers monitoring automation health: how to detect silent playbook failures before analysts discover them, the analytics rules that watch Logic App execution status, and the health dashboard that gives the automation owner visibility into the entire playbook fleet.
