1.11 Guided Walkthrough — Connecting the Automation Chain

5 hours · Module 1 · Free

What you already know

Sections 1.1 through 1.10 built individual skills: automation rule configuration, playbook architecture, managed identity permissions, entity extraction, error handling, testing methodology, monitoring queries, and cost estimation. Each section treated its topic in isolation. This walkthrough tests the integration points between those topics. Consider one chain: an automation rule triggers a playbook whose entity extraction step comes back empty because the analytics rule does not map account entities. That should invoke the error handling path, which should be visible in the monitoring query, and the run should be costed differently because the For Each loop executes zero iterations instead of two. That chain crosses six sections. You cannot validate it by reading any one of them.

Scenario

Northgate Engineering's SOC has approved three automation rules and one enrichment playbook for production. Before deployment, the security architect must demonstrate that the components work together: that rules trigger the playbook on the correct incidents, that entity extraction handles incidents with and without account entities, that error handling produces visible incident comments when enrichment fails, and that monitoring detects both full failures and partial failures. The walkthrough takes you through six exercises that trace data from an incident through the entire automation chain.

Figure 1.11a: The six-exercise walkthrough traces data from incident creation through the entire automation chain. Each exercise validates an integration point between sections.

Prerequisites: Sentinel workspace with at least one enabled analytics rule. The Sentinel connector's incident trigger requires an active workspace. You do not need live incidents for most exercises. Where an exercise requires a test incident, the instructions show you how to create one.

Exercise 1: Automation rule execution order and interaction

This exercise connects Section 1.1 (automation rules) with Section 1.3 (your first automation rule). You build two rules that operate on the same incident and observe whether execution order produces the expected outcome.

Create two automation rules in your Sentinel workspace. The first rule (order 100) triggers on incident created, matches incidents where the title contains "AiTM," and performs two actions: change severity to High and add the tag auto-severity-override. The second rule (order 200) triggers on incident created, matches incidents where severity equals High, and adds the tag high-severity-review.

The question: does the second rule see the severity change made by the first rule?

Create a test incident with the title "AiTM phishing detection" and an initial severity of Medium. After both rules execute, check the incident. If execution order works as documented, the incident should have High severity and both tags. If the second rule evaluated severity before the first rule's change took effect, the incident would have only the auto-severity-override tag and would be missing high-severity-review.

KQL

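// Verify both automation rules executed on the test incident
SecurityIncident
| where CreatedTime > ago(1h)
| where Title has "AiTM"
// SecurityIncident stores one row per update; keep the latest state of each incident
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| project Title, Severity, Labels,
    TagCount = array_length(Labels)
// Labels is a dynamic array, so cast it to a string before matching tag names
| extend HasSeverityTag = tostring(Labels) has "auto-severity-override",
    HasReviewTag = tostring(Labels) has "high-severity-review"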

The expected result: both tags present. Automation rules execute sequentially in order number. Rule 100 completes (severity changes to High, tag applied) before rule 200 evaluates its condition. Rule 200 sees the updated severity and matches. This is the sequential execution model from Section 1.1. If you find only one tag, check the execution order values. Two rules with the same order number execute in an undefined sequence, and the second rule's condition evaluation may not see the first rule's changes.
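The sequential model can be sketched in a few lines of Python. This is an illustrative simulation under stated assumptions, not Sentinel's implementation: the `Rule` class, the `run_rules` function, and the dictionary shape of the incident are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    order: int
    condition: Callable[[dict], bool]
    action: Callable[[dict], None]


def run_rules(incident: dict, rules: list[Rule]) -> dict:
    """Apply rules in ascending order; each rule sees earlier rules' changes."""
    for rule in sorted(rules, key=lambda r: r.order):
        if rule.condition(incident):
            rule.action(incident)
    return incident


def override_severity(incident: dict) -> None:
    incident["severity"] = "High"
    incident["tags"].append("auto-severity-override")


def tag_for_review(incident: dict) -> None:
    incident["tags"].append("high-severity-review")


# Listed out of order on purpose: the order value decides the sequence.
rules = [
    Rule(200, lambda i: i["severity"] == "High", tag_for_review),
    Rule(100, lambda i: "AiTM" in i["title"], override_severity),
]

incident = {"title": "AiTM phishing detection", "severity": "Medium", "tags": []}
result = run_rules(incident, rules)
print(result["severity"], result["tags"])
```

Because rule 200 evaluates its condition against the incident after rule 100 has modified it, the simulated incident ends with High severity and both tags, which is the outcome the KQL check looks for.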

Record the execution latency. From incident creation to both tags appearing, how many seconds elapsed? Automation rules typically complete within 5 to 10 seconds for simple property changes. This baseline matters because in Exercise 5 a third rule at order 300 triggers the playbook, and the playbook adds 15 to 45 seconds of execution time on top of the automation rule latency.

Exercise 2: Managed identity permission boundaries

This exercise connects Section 1.5 (authentication and permissions) with the enrichment queries from Section 1.4 (your first playbook). You test what happens when a managed identity lacks the permissions a playbook action requires.

Open your enrichment playbook in the Logic App designer. Locate the HTTP action that queries the Microsoft Graph API for user risk data. The action calls https://graph.microsoft.com/v1.0/identityProtection/riskyUsers and requires the IdentityRiskyUser.Read.All Graph API permission on the managed identity.

Before running this exercise, verify the current permissions assigned to the playbook's managed identity:

PowerShell

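# Get the managed identity's current Graph API permissions
# Assumes Connect-AzAccount and Connect-MgGraph have already been run
$playbookName = "SA-Playbook-Enrichment"
# Confirm the playbook exists before looking up its identity
$playbook = Get-AzResource -Name $playbookName `
    -ResourceType "Microsoft.Logic/workflows" -ErrorAction Stop
# A system-assigned identity's service principal shares the Logic App's name
$spId = (Get-AzADServicePrincipal `
    -DisplayName $playbook.Name).Id
# List all app role assignments for this service principal
# AppRoleId is a GUID; match it to the resource's app roles for the permission name
Get-MgServicePrincipalAppRoleAssignment `
    -ServicePrincipalId $spId |
    Select-Object AppRoleId, ResourceDisplayName, CreatedDateTime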

If the managed identity has IdentityRiskyUser.Read.All, the risk query succeeds and the playbook adds a formatted enrichment comment to the incident. If it does not, the HTTP action returns a 403 Forbidden response, and the behavior depends entirely on whether the error handling from Section 1.7 is in place.

Test both cases. First, confirm the playbook succeeds with the permission present. Then temporarily remove the Graph permission (or create a second test playbook without it) and trigger the playbook again. With proper Scope-based error handling, the playbook should catch the 403, add an incident comment documenting the failure ("Risk enrichment unavailable: insufficient Graph API permissions"), and continue to the Teams notification. Without error handling, the entire playbook fails and the incident receives no enrichment at all.

CLI Output

Run History — SA-Playbook-Enrichment
Run ID:        08585698-3a7b-4b2c-9e1f-abc123def456
Status:        Succeeded (with partial failure)
Start:         2026-05-22T09:14:22Z
Duration:      18.4 seconds
Actions:
  ✓ Microsoft Sentinel Incident (trigger)         0.8s
  ✓ Entities - Get Accounts                       1.2s
  ✓ For Each Account
    ├─ Account: tom.ashworth@northgate.co.uk
    │   ✓ Run KQL Query — SigninLogs              3.1s
    │   ✗ HTTP — Get User Risk (403 Forbidden)    0.4s
    │   ✓ Scope Error Handler                     0.2s
    │   ✓ Add Comment — Partial Enrichment        2.8s
    └─ Account: priya.sharma@northgate.co.uk
        ✓ Run KQL Query — SigninLogs              2.9s
        ✗ HTTP — Get User Risk (403 Forbidden)    0.3s
        ✓ Scope Error Handler                     0.1s
        ✓ Add Comment — Partial Enrichment        2.7s
  ✓ Post to Teams                                 1.4s

The key observation: the overall run status is "Succeeded" even though the Graph API calls failed. The Scope-based error handler caught the 403, documented it, and allowed the playbook to continue. The monitoring query from Section 1.9 should still detect this as a partial failure because the action-level status is "Failed." Verify this by running the monitoring query and confirming the failed HTTP actions appear in the results even though the overall run succeeded.

This exercise demonstrates why least-privilege permission scoping from Section 1.5 and Scope-based error handling from Section 1.7 are interdependent. If you scope permissions tightly (correct), some enrichment sources will occasionally fail when permissions change or when a new enrichment source requires an additional Graph scope. Error handling catches those failures gracefully. Tight permissions without error handling produce brittle playbooks that fail completely when any single enrichment source is unavailable.
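The catch-and-continue behavior can be sketched in Python as a rough analogy to a Scope with a failure handler. Everything here is hypothetical scaffolding: `GraphForbidden`, `get_user_risk`, and `enrich` are stand-ins for the example, and no real Graph call is made.

```python
class GraphForbidden(Exception):
    """Stands in for a 403 Forbidden response from the risk lookup."""


def get_user_risk(upn: str, has_permission: bool) -> str:
    # Hypothetical stub: a real playbook would call Graph riskyUsers here.
    if not has_permission:
        raise GraphForbidden("403 Forbidden")
    return "none"


def enrich(accounts: list[str], has_permission: bool) -> dict:
    comments = []
    failed_actions = 0
    for upn in accounts:
        try:
            risk = get_user_risk(upn, has_permission)
            comments.append(f"{upn}: risk level {risk}")
        except GraphForbidden:
            # Analogous to the Scope error handler: record the failure, continue.
            failed_actions += 1
            comments.append(
                f"{upn}: Risk enrichment unavailable: "
                "insufficient Graph API permissions"
            )
    # The notification step runs whether or not the risk lookup succeeded.
    return {
        "run_status": "Succeeded",
        "failed_actions": failed_actions,
        "comments": comments,
        "teams_posted": True,
    }


run = enrich(
    ["tom.ashworth@northgate.co.uk", "priya.sharma@northgate.co.uk"],
    has_permission=False,
)
print(run["run_status"], run["failed_actions"], run["teams_posted"])
```

The run reports success while still counting two failed lookups, which is why monitoring has to look at action-level results rather than the run status alone.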

Exercise 3: Entity extraction across incident types

This exercise connects Section 1.6 (entity extraction) with Section 1.7 (error handling). You test entity extraction against two incident types that produce different entity mappings.

Sentinel analytics rules map entities based on the KQL query's output columns. A rule that detects suspicious sign-ins maps Account entities from the UserPrincipalName column and IP entities from the IPAddress column. A rule that detects port scanning maps only IP entities because the detection query operates on network flow data with no user identity column.

Trigger your enrichment playbook against both incident types. For the sign-in incident, the "Entities - Get Accounts" action returns one or more account objects, and the For Each loop executes normally. For the port scan incident, "Entities - Get Accounts" returns an empty array.

What happens to a For Each loop when the input array is empty? Logic Apps skips the loop entirely. No iterations execute. No enrichment comment is added. No error is thrown. The playbook run status is "Succeeded" because nothing failed. The incident simply receives no enrichment. The monitoring query from Section 1.9 does not flag this as a failure because no action failed.

This is the silent gap that Section 1.6 warned about. The fix is a Condition action after entity extraction that checks the array length:

JSON

{
  "type": "If",
  "expression": {
    "and": [
      {
        "greater": [
          "@length(body('Entities_-_Get_Accounts')?['Accounts'])",
          0
        ]
      }
    ]
  },
  "actions": {
    "For_Each_Account": {
      "type": "Foreach",
      "foreach": "@body('Entities_-_Get_Accounts')?['Accounts']"
    }
  },
  "else": {
    "actions": {
      "Add_Comment_No_Entities": {
        "type": "ApiConnection",
        "inputs": {
          "body": {
            "incidentArmId": "@triggerBody()?['object']?['id']",
            "message": "Automation note: No account entities mapped for this incident. Manual enrichment required. Analytics rule may need entity mapping update."
          }
        }
      }
    }
  }
}

After adding the Condition action, trigger the playbook against the port scan incident again. The incident should now receive a comment documenting the absence of account entities and recommending manual enrichment. The monitoring query can also be extended to detect "no entity" runs by looking for incidents where the playbook ran but no enrichment comment was added (only the "no entities" comment).
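The same guard reads as follows in Python. This is a minimal sketch of the branching logic only; `process_accounts` and the comment strings are hypothetical and do not call the Sentinel connector.

```python
NO_ENTITIES_COMMENT = (
    "Automation note: No account entities mapped for this incident. "
    "Manual enrichment required."
)


def process_accounts(accounts: list[str]) -> list[str]:
    """Return the comments the playbook would write for this incident."""
    if len(accounts) > 0:
        # True branch: one enrichment comment per account entity.
        return [f"Enrichment for {upn}" for upn in accounts]
    # False branch: make the gap visible on the incident instead of staying silent.
    return [NO_ENTITIES_COMMENT]


sign_in = process_accounts(["tom.ashworth@northgate.co.uk"])
port_scan = process_accounts([])
print(sign_in)
print(port_scan)
```

The design point is that every run writes at least one comment, so an incident with no comment at all indicates that the playbook did not run.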

Estimate the cost difference between the two runs. The sign-in incident with two account entities executes approximately 10 actions (Section 1.10 calculation). The port scan incident with zero entities executes approximately 4 actions: trigger, Get Accounts, Condition (false branch), Add Comment. The cost per run drops by 60%. For environments where network-based incidents outnumber identity-based incidents, this difference changes the monthly cost projection significantly.
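The arithmetic can be checked with a short Python sketch. The action counts are the approximate figures from the paragraph above; `blended_actions` and the 25% identity share are illustrative assumptions, not measurements.

```python
SIGN_IN_ACTIONS = 10   # two account entities (approximate, from the text)
PORT_SCAN_ACTIONS = 4  # zero account entities (approximate, from the text)


def reduction_percent(before: int, after: int) -> float:
    """Percentage drop in actions per run."""
    return (before - after) * 100 / before


def blended_actions(identity_share: float) -> float:
    """Average actions per run for a given share of identity-based incidents."""
    return (identity_share * SIGN_IN_ACTIONS
            + (1 - identity_share) * PORT_SCAN_ACTIONS)


print(reduction_percent(SIGN_IN_ACTIONS, PORT_SCAN_ACTIONS))  # 60.0
print(blended_actions(0.25))
```

With a hypothetical mix of one identity-based incident for every three network-based incidents, the average falls well below the 10-action figure, which is why the incident mix belongs in the projection.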

Exercise 4: Monitoring coverage for partial failures

This exercise connects Section 1.9 (monitoring) with the error scenarios from Exercises 2 and 3. You verify that your monitoring queries detect all three failure modes: complete failure, partial failure, and silent no-entity runs.

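Run the health monitoring query from Section 1.9 against your workspace. It should surface the complete and partial failures from the previous exercises. The silent no-entity runs will not appear, because no action failed in them; detecting those is covered after the query.

KQL

// Extended health monitoring: classify each run as complete failure, partial failure, or success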
let lookback = 1d;
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.LOGIC"
| where resource_workflowName_s has "SA-Playbook"
| where TimeGenerated > ago(lookback)
| where Category == "WorkflowRuntime"
| where OperationName has "workflowActionCompleted"
| extend ActionName = resource_actionName_s,
    ActionStatus = status_s,
    ErrorCode = code_s
| summarize
    TotalActions = count(),
    FailedActions = countif(ActionStatus == "Failed"),
    FailedActionNames = make_set_if(ActionName, ActionStatus == "Failed")
    by resource_runId_s, resource_workflowName_s
| extend FailureCategory = case(
    FailedActions == TotalActions, "Complete Failure",
    FailedActions > 0, "Partial Failure",
    "Success")
| where FailureCategory != "Success"
| project resource_runId_s, resource_workflowName_s,
    FailureCategory, FailedActions, TotalActions,
    FailedActionNames

If your monitoring query only checks for run-level "Failed" status, it misses partial failures entirely. The playbook run with 403 errors on the Graph API call completed successfully at the run level. Only the action-level query catches those failures. This is the monitoring gap that Section 1.9 described. Verify that your implementation catches it.
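The classification step of the query can be mirrored in Python to check the logic against known runs. This is a sketch: `failure_category` is a hypothetical helper, and the status lists are reconstructed by hand from the earlier exercises rather than read from diagnostic logs.

```python
def failure_category(action_statuses: list[str]) -> str:
    """Mirror of the case() expression: classify one run by its action statuses."""
    total = len(action_statuses)
    failed = sum(1 for status in action_statuses if status == "Failed")
    if total > 0 and failed == total:
        return "Complete Failure"
    if failed > 0:
        return "Partial Failure"
    return "Success"


# Exercise 2 run: two Graph calls failed, every other action succeeded.
exercise_2 = ["Succeeded"] * 9 + ["Failed"] * 2
# Exercise 3 port scan run before the Condition fix: nothing failed.
exercise_3 = ["Succeeded"] * 3

print(failure_category(exercise_2))
print(failure_category(exercise_3))
```

The second result shows why the no-entity run slips through: a run with no failed actions is classified as a success, so the query never reports it.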

The no-entity scenario from Exercise 3 is harder to detect through diagnostic logs because no action failed. Detecting no-entity runs requires a different approach: query the SecurityIncident table for incidents that have a playbook automation tag but no enrichment comment, or build a custom tracking mechanism where the playbook always writes at least one comment (either the enrichment data or the "no entities" notification).
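The first approach amounts to a simple filter, sketched here in Python. The tag name `playbook-run`, the `Enrichment:` comment prefix, and the incident dictionaries are all hypothetical; substitute whatever markers your playbook actually writes.

```python
PLAYBOOK_TAG = "playbook-run"      # hypothetical tag added when the playbook runs
ENRICHMENT_PREFIX = "Enrichment:"  # hypothetical prefix on enrichment comments


def missing_enrichment(incidents: list[dict]) -> list[int]:
    """Incident numbers where the playbook ran but wrote no enrichment comment."""
    flagged = []
    for inc in incidents:
        ran = PLAYBOOK_TAG in inc["tags"]
        enriched = any(c.startswith(ENRICHMENT_PREFIX) for c in inc["comments"])
        if ran and not enriched:
            flagged.append(inc["number"])
    return flagged


incidents = [
    {"number": 101, "tags": ["playbook-run"],
     "comments": ["Enrichment: 14 sign-ins, risk none"]},
    {"number": 102, "tags": ["playbook-run"], "comments": []},
    {"number": 103, "tags": [], "comments": []},
]
print(missing_enrichment(incidents))
```

Only the incident that carries the playbook tag without an enrichment comment is flagged; the untagged incident is ignored because the playbook never ran on it.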

Exercise 5: End-to-end validation with a realistic incident

This exercise assembles the full chain: automation rule fires, sets severity and tags, triggers the playbook, the playbook extracts entities, queries sign-in history, queries user risk, formats the enrichment comment, posts to Teams. You create a test incident that exercises every component.

Create a test incident with the following properties: title containing "AiTM," Medium severity, and at least one Account entity mapped. If your workspace has a test analytics rule, trigger it against a test sign-in record. If not, use the Sentinel API to create a test incident with mapped entities.

After the automation chain completes (allow 30 to 60 seconds for the full sequence), verify the following outcomes against the incident:

Analyst Decision

Automation Rule Validation: Severity changed from Medium to High (Rule 100). Tags include auto-severity-override (Rule 100) and high-severity-review (Rule 200). Both rules executed within 10 seconds of incident creation.

Playbook Trigger Validation: Playbook triggered by Rule 300 (severity equals High). Run history shows the trigger input contains the incident ARM ID, severity High, and the mapped entity list.

Entity Extraction Validation: Get Accounts action returned at least one account object. The account UPN matches the entity mapped by the analytics rule. For Each loop iterated once per account.

Enrichment Query Validation: KQL query returned sign-in records from the past 7 days. HTTP action to Graph riskyUsers returned a 200 response with the user's risk level (or "none" if the user has no risk detections).

Output Validation: Incident comment contains formatted enrichment data: sign-in count, most recent sign-in location, authentication method, user risk level. Teams message posted to the SOC channel with incident title, severity, and enrichment summary.

Monitoring Validation: AzureDiagnostics shows the run with all actions in "Succeeded" status. No partial failures. Run duration under 45 seconds.

If any validation point fails, the run history in the Logic App designer is your primary debugging tool. Click the failed run, then click each action to see its inputs and outputs. The most common failures in this exercise: the playbook trigger does not fire because the automation rule's "Run playbook" action is not configured with the correct playbook resource ID, the KQL query returns empty results because the sign-in data for the test user is not in the workspace, or the Teams notification fails because the managed identity does not have permission to post to the target channel.
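The validation points can also be written as an explicit checklist so that each failure is named rather than noticed by eye. This is a sketch under assumptions: `validate_chain` and the `incident` and `run` dictionaries are hypothetical shapes you would populate by hand or from your own queries.

```python
REQUIRED_TAGS = ("auto-severity-override", "high-severity-review")


def validate_chain(incident: dict, run: dict) -> list[str]:
    """Return a list of failed validation points; an empty list means all passed."""
    failures = []
    if incident["severity"] != "High":
        failures.append("severity not raised to High")
    for tag in REQUIRED_TAGS:
        if tag not in incident["tags"]:
            failures.append(f"missing tag: {tag}")
    if not incident["comments"]:
        failures.append("no enrichment comment")
    if any(status != "Succeeded" for status in run["action_statuses"]):
        failures.append("one or more actions did not succeed")
    if run["duration_seconds"] >= 45:
        failures.append("run duration not under 45 seconds")
    return failures


incident = {
    "severity": "High",
    "tags": ["auto-severity-override"],
    "comments": ["Enrichment: 14 sign-ins, risk none"],
}
run = {"action_statuses": ["Succeeded"] * 9, "duration_seconds": 31.0}
print(validate_chain(incident, run))
```

In this hypothetical record the second tag is missing, so the checklist points at rule 200 as the place to start debugging.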

Exercise 6: Cost projection for your environment

This exercise connects Section 1.10 (cost management) with the actual execution data from Exercises 1 through 5. Instead of estimating costs from theoretical action counts, you use the real run data from your test runs.

KQL

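// Count actual actions per run from diagnostic logs
// Counts action events only; trigger executions are logged under a separate
// workflowTriggerCompleted operation, so allow for that when comparing estimates
AzureDiagnostics
| where TimeGenerated > ago(1d)  // limit to the test runs from this walkthrough
| where ResourceProvider == "MICROSOFT.LOGIC"
| where resource_workflowName_s has "SA-Playbook"
| where OperationName has "workflowActionCompleted"
| summarize ActionsPerRun = count()
    by resource_runId_s
| summarize
    AvgActions = round(avg(ActionsPerRun), 1),
    MinActions = min(ActionsPerRun),
    MaxActions = max(ActionsPerRun),
    TotalRuns = count()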

The query returns the actual action count per run, averaged across all your test executions. Compare this number with the theoretical estimate from Section 1.10. If the estimate assumed 2 entities per incident and your test incidents had 1 entity, the actual action count is lower. If your error handling path executed additional actions (the Scope error handler, the fallback comment), the actual count is higher than the theoretical estimate.

Multiply the average actions per run by your workspace's actual incident volume (query SecurityIncident | where CreatedTime > ago(30d) | summarize dcount(IncidentNumber), which counts each incident once even though the table stores a row per update) and the connector action rate ($0.000125 per action) to produce a monthly cost projection grounded in real execution data rather than estimates. This is the number you put in the automation business case: not a theoretical model, but a measured projection from your own environment's data.
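The projection is a single multiplication, sketched here in Python with `Decimal` to avoid floating-point drift on small currency values. The input figures (8.5 average actions, 1,200 incidents) are hypothetical placeholders for your measured values, and pricing every action at the connector rate is a simplifying assumption that will tend to overstate the cost if some actions are billed at a lower rate.

```python
from decimal import Decimal

CONNECTOR_ACTION_RATE = Decimal("0.000125")  # USD per action, from the text


def monthly_cost(avg_actions_per_run: Decimal, incidents_per_month: int) -> Decimal:
    """Projected monthly cost, assuming one playbook run per incident."""
    return avg_actions_per_run * incidents_per_month * CONNECTOR_ACTION_RATE


# Hypothetical inputs: replace with AvgActions and your 30-day incident count.
cost = monthly_cost(Decimal("8.5"), 1200)
print(f"Projected monthly cost: ${cost:.3f}")
```

If only a subset of incidents triggers the playbook, use the count of matching incidents rather than the full 30-day volume.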

Testing components in isolation and assuming they integrate

The automation rule works. The playbook works. The monitoring query works. So the team deploys to production, and the first incident with no mapped entities receives no enrichment comment at all. The monitoring query does not detect the gap because it only checks for failed runs, not missing output. Each component passed its individual test. The integration failed because nobody tested the chain. Exercises 3 through 5 exist specifically to catch this class of failure: the entity edge case that produces a successful run with useless output.

Automation Principle

Validation is integration testing. Individual sections teach individual skills. The walkthrough tests whether those skills work together. An automation chain where entity extraction, error handling, monitoring, and cost estimation interact correctly is more valuable than any single component working in isolation. Test the chain, not the parts.

Section 1.12 summarizes the module: the automation rule and playbook foundations, the authentication model, the entity extraction patterns, the error handling and monitoring layers, and the cost estimation method. It connects forward to SA2 where the single enrichment query from this module expands into a full enrichment pipeline.
