
Check My Knowledge

5 hours · Module 1 · Free

Scenario 1. Northgate Engineering's SOC receives 120 incidents per day. Forty of those are AiTM phishing alerts that always require senior analyst review. The SOC lead wants to automatically set the severity to High, assign the incidents to the senior analyst, and add a triage-priority tag. The security architect is deciding between an automation rule and a playbook. Which approach is correct, and why?

a. Build a Logic App playbook with three actions: Update Incident (severity), Update Incident (owner), Update Incident (tag). Playbooks provide a run history that documents every change for audit purposes.
Playbooks do provide run history, but that benefit does not justify the overhead. These three actions are simple incident property changes that automation rules handle natively. A playbook consumes Logic App resources ($0.000125 per connector action) and adds 15 to 30 seconds of latency. The automation rule performs the same changes in under 5 seconds at zero cost. Run history for automation rules is available in the Automation blade's execution log.
b. Build a playbook because automation rules cannot change severity, assign owners, and add tags in a single rule. You would need three separate automation rules, one for each action.
A single automation rule supports multiple actions. You can change severity, assign an owner, and add tags all within one rule's action configuration. There is no need for three separate rules or a playbook to combine these actions.
c. Build a single automation rule. The trigger is incident created, the condition matches titles containing "AiTM," and the actions are change severity to High, assign owner, and add tag. Automation rules handle simple property changes without Logic App overhead, execute in under 10 seconds, and cost nothing.
Correct. Section 1.1 established the decision boundary: automation rules for incident property changes (severity, status, owner, tags), playbooks for multi-step workflows requiring external API calls. All three actions are property changes. The automation rule completes faster, costs nothing, and requires no Logic App infrastructure. Reserve playbooks for enrichment, containment, and notification workflows that require HTTP calls or connector integrations.
d. Skip both and create a custom analytics rule that outputs AiTM alerts at High severity with the senior analyst as the default owner. That way triage is handled at detection time.
Coupling severity and assignment to the analytics rule means you must modify the detection rule every time triage routing changes. Section 1.1 explained why separation matters: detection rules define what to detect, automation rules define how to route and classify. Changing the analyst on call should not require editing an analytics rule. Automation rules keep those decisions independent.
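The decision boundary from Scenario 1 can be sketched as a small classifier. This is only an illustration of the rule of thumb, not a Sentinel API: the action names and the `choose_mechanism` function are hypothetical labels invented for this example.

```python
# Hypothetical action names; they model the decision rule, not a real Sentinel API.
PROPERTY_CHANGES = {"change_severity", "change_status", "assign_owner", "add_tag"}


def choose_mechanism(actions):
    """Return 'automation rule' when every action is an incident property
    change, otherwise 'playbook'."""
    if not actions:
        raise ValueError("at least one action is required")
    if set(actions) <= PROPERTY_CHANGES:
        return "automation rule"
    return "playbook"


# The AiTM triage from the scenario: three property changes, one rule.
aitm_triage = ["change_severity", "assign_owner", "add_tag"]
print(choose_mechanism(aitm_triage))

# Adding an external call (hypothetical name) moves the work into a playbook.
print(choose_mechanism(aitm_triage + ["http_call_to_graph"]))
```

Anything outside the property-change set (enrichment, containment, notification) falls through to the playbook branch, which mirrors the boundary the feedback describes.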

Scenario 2. A playbook queries the Microsoft Graph API for user risk data using the managed identity. The HTTP action returns 403 Forbidden. The security architect checks the managed identity permissions and finds Microsoft Sentinel Responder and Log Analytics Reader assigned, but no Graph API permissions. What is the correct fix?

a. Assign the Global Reader role to the managed identity. Global Reader provides read access to all Graph API resources including identity protection data.
Global Reader is an Entra ID directory role, not a Graph API application permission. Even if it were, it grants far broader access than needed. Section 1.5 established the principle: grant the specific Graph API permission the playbook needs, not a broad directory role. The 403 requires a targeted Graph API permission, not an administrative role.
b. Grant the IdentityRiskyUser.Read.All application permission to the managed identity through the Microsoft Graph PowerShell SDK or a direct Microsoft Graph API call. This is the minimum permission for reading user risk data from the Identity Protection API.
Correct. Section 1.5 covered the permission model: workspace-scoped roles (Sentinel Responder, Log Analytics Reader) control Sentinel and Log Analytics access, but Graph API endpoints require separate application permissions on the managed identity. IdentityRiskyUser.Read.All is the minimum scope for the riskyUsers endpoint. The permission is an application permission (not delegated) because managed identities authenticate as the application, not as a user.
c. Replace the managed identity with a service principal that has a client secret, because service principals support Graph API permissions and managed identities do not.
Managed identities fully support Graph API application permissions. The 403 is caused by missing permissions, not by the authentication method. Section 1.5 explained why managed identity is preferred: it eliminates secret management entirely. A service principal with a client secret creates a credential that must be rotated, stored securely, and monitored for exposure. Managed identity avoids all of that.
d. Add the User.ReadWrite.All permission because the read-only IdentityRiskyUser scope does not cover the v1.0 API endpoint. The v1.0 riskyUsers endpoint requires write permissions.
The v1.0 identityProtection/riskyUsers endpoint requires only IdentityRiskyUser.Read.All for read operations. User.ReadWrite.All grants write access to user profiles, which the enrichment playbook does not need and should not have. Section 1.5 emphasized: enrichment playbooks get read-only permissions. Write permissions belong to containment playbooks with separate managed identities.
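A rough sketch of what granting the permission in Scenario 2 involves: an application permission is assigned by creating an app role assignment on the managed identity's service principal (`POST /servicePrincipals/{id}/appRoleAssignments` in Microsoft Graph). The code below only builds the request body and makes no network call. The helper names and all GUIDs in the sample data are placeholders, and the role id should be looked up from the Graph service principal's `appRoles` list rather than trusted from this example.

```python
# Well-known appId of the Microsoft Graph service principal.
GRAPH_APP_ID = "00000003-0000-0000-c000-000000000000"


def find_app_role_id(graph_app_roles, permission_value):
    """Look up an application permission's id in the 'appRoles' list of the
    Microsoft Graph service principal object."""
    for role in graph_app_roles:
        if (role.get("value") == permission_value
                and "Application" in role.get("allowedMemberTypes", [])):
            return role["id"]
    raise LookupError(f"application permission not found: {permission_value}")


def build_app_role_assignment(identity_object_id, graph_sp_object_id, app_role_id):
    """Body for POST /servicePrincipals/{identity_object_id}/appRoleAssignments."""
    return {
        "principalId": identity_object_id,  # the playbook's managed identity
        "resourceId": graph_sp_object_id,   # the Microsoft Graph service principal
        "appRoleId": app_role_id,           # the permission being granted
    }


# Placeholder data standing in for a real 'appRoles' response.
sample_roles = [
    {"id": "11111111-1111-1111-1111-111111111111",
     "value": "IdentityRiskyUser.Read.All",
     "allowedMemberTypes": ["Application"]},
]
role_id = find_app_role_id(sample_roles, "IdentityRiskyUser.Read.All")
body = build_app_role_assignment(
    "22222222-2222-2222-2222-222222222222",  # placeholder identity object id
    "33333333-3333-3333-3333-333333333333",  # placeholder Graph SP object id
    role_id,
)
print(body)
```

Filtering on `allowedMemberTypes` containing `Application` reflects the point made in the feedback: a managed identity authenticates as the application, so a delegated scope with the same name would not help.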

Scenario 3. An enrichment playbook works correctly on AiTM incidents but produces no enrichment for firewall alert incidents. The playbook run history shows Status: Succeeded for both incident types. No actions failed. What is the most likely cause?

a. The firewall analytics rule does not map Account entities. The playbook's Get Accounts action returns an empty array. The For Each loop receives an empty input and Logic Apps skips it entirely. No actions execute inside the loop, so no actions fail, and the overall run succeeds with no enrichment output.
Correct. Section 1.6 described this exact failure pattern: empty entity arrays cause For Each loops to execute zero iterations, producing a successful run with no output. The fix is a Condition action that checks array length before the loop and adds a "no entities available" comment in the false branch. The monitoring query from Section 1.9 needs to detect these silent successes, which requires checking for incidents with automation tags but no enrichment comments.
b. The managed identity does not have permissions to access firewall incidents. Sentinel permissions are scoped by incident source, and firewall incidents require a different role assignment.
Sentinel workspace permissions are not scoped by incident source. If the managed identity has Sentinel Responder on the workspace, it can read and modify all incidents in that workspace regardless of which analytics rule or data connector created them. Section 1.5 covered the permission model: permissions are workspace-scoped, not incident-scoped.
c. The automation rule that triggers the playbook has a condition that filters for AiTM incidents only. The firewall incident does not match the trigger condition, so the playbook never runs.
The scenario states that the playbook run history shows Succeeded for both incident types. If the automation rule filtered out firewall incidents, the playbook would not appear in the run history for those incidents at all. The playbook did run. It produced no enrichment because the entity extraction returned empty results, not because it was never triggered.
d. Logic Apps rate limited the playbook because both incidents triggered simultaneously. The firewall incident run was throttled and its enrichment actions were silently dropped.
Logic Apps concurrency controls may delay execution, but they do not silently drop actions. A throttled run would show "Waiting" status or would fail with a specific throttling error code. Section 1.7 covered retry behavior for rate-limited actions. The scenario shows Succeeded status with no failed actions, which points to an empty iteration set, not throttling.
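The silent-success pattern in Scenario 3 and its guard can be modeled in a few lines. This is a plain Python simulation rather than Logic Apps workflow code, and the tag name, comment prefix, and incident dictionaries are assumptions made up for the sketch.

```python
# Hypothetical tag and comment prefix, chosen for this sketch only.
AUTOMATION_TAG = "auto-enriched"
COMMENT_PREFIX = "Enrichment:"


def enrich_accounts(accounts, enrich_one):
    """Return the incident comments one playbook run would add."""
    if len(accounts) == 0:
        # Equivalent of the Condition's false branch: document the gap
        # instead of letting the loop run zero iterations silently.
        return [f"{COMMENT_PREFIX} no entities available"]
    return [f"{COMMENT_PREFIX} {enrich_one(account)}" for account in accounts]


def find_silent_successes(incidents):
    """Return ids of incidents tagged by automation that carry no enrichment comment."""
    return [
        incident["id"]
        for incident in incidents
        if AUTOMATION_TAG in incident["tags"]
        and not any(c.startswith(COMMENT_PREFIX) for c in incident["comments"])
    ]


incidents = [
    {"id": "aitm-1", "tags": [AUTOMATION_TAG],
     "comments": enrich_accounts(["user@example.com"], lambda a: f"risk data for {a}")},
    # An unguarded playbook ran on this firewall incident and wrote nothing.
    {"id": "fw-1", "tags": [AUTOMATION_TAG], "comments": []},
    # A guarded playbook ran on this one and documented the empty entity list.
    {"id": "fw-2", "tags": [AUTOMATION_TAG], "comments": enrich_accounts([], str)},
]
print(find_silent_successes(incidents))
```

Only the unguarded run is flagged, because the guarded run leaves a comment explaining why no enrichment was produced.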

Scenario 4. A playbook enriches incidents from six sources in parallel branches. One branch queries VirusTotal and fails with a 429 rate limit error. The retry policy is exponential with 3 retries and a 10-second initial interval. All three retries also receive 429 responses. What should happen next in a correctly designed playbook?

a. The entire playbook should fail and the incident should receive no enrichment. A failure in any enrichment source means the enrichment data is incomplete and should not be added to the incident.
Section 1.7 established the partial enrichment principle: five successful enrichment sources provide more investigative value than zero sources. Terminating the entire playbook because one source failed discards all the enrichment data that other branches successfully retrieved. The correct approach is to document the failure and continue.
b. The playbook should retry indefinitely until VirusTotal responds successfully. Rate limits are temporary, and the enrichment data is valuable enough to justify waiting.
Indefinite retries block the playbook run and accumulate Logic App action costs for every retry attempt. If the rate limit persists for hours (which happens when daily quotas are exhausted), the playbook run stays open consuming resources while the incident sits unenriched. Section 1.7 specified bounded retry policies with a defined maximum retry count precisely to prevent this scenario.
c. The playbook should switch to a cached version of the VirusTotal data from a previous lookup. If the same IP was queried recently, the cached result is still useful.
Enrichment caching is a valid optimization for high-volume playbooks, but it is not an error handling mechanism. The playbook does not inherently maintain a cache unless you build one (typically using Azure Table Storage or a KV store). Section 1.7 focused on Scope-based error handling that catches the failure, documents it, and lets the playbook continue. Caching is a performance optimization covered in later modules, not a substitute for error handling.
d. The Scope error handler around the VirusTotal branch catches the failure after all retries are exhausted. The handler adds an incident comment documenting which source failed and why. The other five enrichment branches complete normally. The incident receives enrichment from five of six sources with a clear note about the VirusTotal gap.
Correct. Section 1.7 designed Scope-based error handling for exactly this scenario. Each enrichment branch runs inside its own Scope. If the branch fails (after all retries), the Scope's "Configure run after" catches the failure and executes the error handler. The error handler adds an incident comment identifying the failed source. The other Scope branches are unaffected because parallel branches in Logic Apps execute independently. Partial enrichment with documented gaps is the correct outcome.
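The behavior described in Scenario 4 (bounded retries, then a per-branch handler that records the gap) can be approximated in Python. This is a simplified model and not how Logic Apps is implemented: the branches run sequentially here instead of in parallel, the backoff simply doubles each time, and every name in the block is hypothetical.

```python
import time


class RateLimited(Exception):
    """Stands in for an HTTP 429 response from an enrichment source."""


def call_with_retries(action, retries=3, initial_interval=10.0, sleep=time.sleep):
    """Run action once, then retry up to `retries` times with a doubling delay."""
    delay = initial_interval
    for attempt in range(retries + 1):  # first attempt plus bounded retries
        try:
            return action()
        except RateLimited:
            if attempt == retries:
                raise  # retries exhausted: hand over to the branch handler
            sleep(delay)
            delay *= 2  # simplified exponential backoff


def run_enrichment(branches, sleep=time.sleep):
    """Run every branch in its own 'scope'; a failed branch becomes a comment."""
    results, comments = {}, []
    for source, action in branches.items():
        try:
            results[source] = call_with_retries(action, sleep=sleep)
        except Exception as exc:
            comments.append(f"Enrichment source '{source}' failed: {exc}")
    return results, comments


def virustotal_lookup():
    raise RateLimited("429 Too Many Requests")


# Five healthy placeholder sources plus one that is always rate limited.
branches = {f"source_{n}": (lambda n=n: f"data {n}") for n in range(1, 6)}
branches["virustotal"] = virustotal_lookup

# A no-op sleep keeps the demonstration fast.
results, comments = run_enrichment(branches, sleep=lambda seconds: None)
print(len(results), comments)
```

The incident ends up with five results and one comment naming the failed source, which is the partial-enrichment outcome the correct answer describes.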

Scenario 5. The security architect is ready to deploy the first enrichment playbook to production. The playbook has been tested in a staging workspace with synthetic incidents. The staging tests all passed. The SOC manager asks: what is the deployment plan? What is the correct approach?

a. Deploy the playbook directly to the production workspace. The staging tests validated the playbook's behavior. Production and staging use the same Logic App, so there is no additional deployment step.
Production and staging typically use different Logic Apps in different resource groups or subscriptions. A playbook tested in staging needs to be recreated or deployed to the production resource group with production-specific configuration: production workspace ID in KQL queries, production managed identity with production permissions, production Teams channel webhooks. Section 1.8 covered the staging-to-production promotion process.
b. Deploy the playbook and the automation rule simultaneously, targeting all incident types. Maximize coverage from day one and monitor the run history for any failures during the first week.
Deploying to all incident types simultaneously makes debugging difficult. If the playbook works on some incident types but fails on others (different entity mappings, different data volumes), diagnosing the root cause across dozens of incident types is harder than testing against a narrow, well-understood set first. Section 1.8 recommended a phased rollout starting with 3 to 5 high-volume, well-understood alert types.
c. Deploy the playbook to the production resource group with production-specific configuration. Create the automation rule targeting only AiTM incidents (the most common, best-understood alert type). Monitor run history for one week. Verify enrichment quality, action success rates, and execution timing against real production data. Then expand to additional incident types in week two.
Correct. Section 1.8 established the phased deployment methodology: deploy to a narrow, well-understood scope first, validate with real production data, then expand. AiTM incidents are ideal as the first target because they consistently map Account entities (the playbook's primary extraction type), they occur frequently enough to generate meaningful run data within a week, and the enrichment output is immediately useful for analyst triage. The week-one monitoring period catches issues that staging cannot replicate: production data volumes, real entity mapping variations, and Graph API permission behavior against production Entra ID.
d. Deploy the playbook to production but keep it disconnected from any automation rule. Instead, run it manually against selected incidents for two weeks to build confidence before connecting the trigger.
Manual execution provides controlled testing, but two weeks of manual operation delays the automation value the SOC needs. Section 1.8 balanced risk and value: the phased approach connects the playbook to a narrow automation rule scope from day one so the SOC receives immediate value while risk is contained. Manual execution for extended periods treats production as a second staging environment instead of a phased rollout.

Scenario 6. The monitoring dashboard shows that the enrichment playbook has a 94% success rate over the past 30 days. The remaining 6% are runs where individual enrichment actions failed but the overall run succeeded. The SOC manager considers 94% acceptable and does not want to investigate. Should the security architect agree?

a. Yes, 94% is above industry benchmarks for automation success rates. The 6% failure rate is within acceptable tolerances for a first-generation playbook. Focus engineering effort on building the next playbook rather than optimizing the current one.
The 6% number is meaningless without understanding what is failing and why. Section 1.9 explained why action-level monitoring matters: if the same enrichment source fails on every run (for example, a Graph API permission that was accidentally revoked), the 6% represents a systematic issue that will persist or worsen. If the failures are distributed randomly across different sources due to transient network errors, they are genuinely acceptable. The success rate number alone does not answer the question.
b. No. The 6% failure rate requires investigation before acceptance. Query the diagnostic logs to identify which specific actions are failing, whether the same source fails repeatedly, and whether the failures correlate with specific incident types or time windows. A consistent failure in one enrichment source indicates a permission, configuration, or API issue that can be fixed. Random, distributed failures across sources indicate transient conditions that may genuinely be acceptable.
Correct. Section 1.9 built the monitoring model specifically to answer this question. The extended health monitoring query categorizes failures by action name and run ID. If the same action (HTTP to VirusTotal, KQL query, Graph API call) appears in every failure, the root cause is identifiable and usually fixable. Accepting a failure rate without knowing its composition means accepting a potentially solvable problem as a permanent cost. Investigate first, then decide whether the residual rate is acceptable.
c. No. The target should be 100% success. Every failed enrichment action means an analyst receives an incomplete picture. Disable the playbook until all failure modes are resolved, then redeploy.
100% success is unrealistic for playbooks that call external APIs. Network timeouts, API rate limits, and transient service outages will always produce occasional failures. Section 1.7 designed the partial enrichment model precisely because some failure is inevitable. Disabling the playbook removes all enrichment for all incidents, which is worse than delivering enrichment with a documented 6% gap. The correct response is to investigate and fix systematic failures, not to demand perfection or disable the automation entirely.
d. Yes, but add a weekly report that shows the failure rate trend. If the rate increases above 10%, escalate. Otherwise, the current rate is acceptable as-is.
Trend monitoring is useful but does not replace root cause analysis. The failure rate could remain stable at 6% for months while a fixable permission issue persists, costing the SOC enrichment data on every affected incident. Section 1.9 emphasized that monitoring quantifies the problem; investigation identifies whether it is fixable. Setting a threshold without investigating the composition of failures means waiting for a problem to get worse before addressing a potentially simple fix.
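The investigation that Scenario 6 calls for amounts to grouping failed actions by name and checking whether one source dominates. The sketch below assumes the failures have already been exported from the diagnostic logs as `(run_id, action_name)` pairs. The sample data, the action names, and the 50% threshold are all illustrative assumptions, not values from the course.

```python
from collections import Counter


def summarize_failures(failed_actions, systematic_share=0.5):
    """Group failed actions by name and flag any action that accounts for
    at least `systematic_share` of all failures."""
    counts = Counter(action for _run_id, action in failed_actions)
    total = sum(counts.values())
    systematic = sorted(
        action for action, n in counts.items()
        if total and n / total >= systematic_share
    )
    return {"total": total, "by_action": dict(counts), "systematic": systematic}


# Placeholder export: three of four failures come from the same HTTP action.
failures = [
    ("run-01", "HTTP_VirusTotal"),
    ("run-02", "HTTP_VirusTotal"),
    ("run-03", "HTTP_VirusTotal"),
    ("run-04", "KQL_SignIns"),
]
print(summarize_failures(failures))
```

A non-empty `systematic` list points to a fixable root cause in one source, while an empty list suggests the failures are spread thinly enough to be transient.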

Scenario 7. The enrichment playbook runs inside a For Each loop that iterates over account entities. Most incidents have 1 to 2 accounts. A new analytics rule generates incidents with 25 account entities (a bulk sign-in anomaly detection). After the rule goes live, the monthly Logic App cost increases from $8 to $47. What is the root cause, and what is the architectural fix?

a. The For Each loop executes all enrichment actions once per entity. With 25 entities per incident, every enrichment action runs 25 times instead of 2. The cost scales linearly with entity count. The fix is to add a cap on the For Each loop using the take() expression to process only the first 5 entities, or to filter entities by relevance before the loop using a Condition that selects only the primary account from the incident.
Correct. Section 1.10 identified unbounded For Each loops as the primary runaway cost scenario. The calculation: 25 entities multiplied by 6 enrichment actions per entity equals 150 connector actions per incident. At 50 incidents per day with the bulk detection rule, that is 225,000 actions per 30-day month (150 × 50 × 30) instead of the original 18,000 (2 entities × 6 actions × 50 × 30). The take() expression or a filter Condition bounds the loop input. Most investigations need enrichment for the primary compromised account, not for all 25 entities in a bulk detection. Enriching the first 3 to 5 provides investigative value; enriching all 25 wastes API calls on accounts that share the same anomaly pattern.
b. The For Each loop should be replaced with a single enrichment action that accepts an array of UPNs. Querying all 25 accounts in a single Graph API call eliminates the per-entity cost multiplication.
The Graph API riskyUsers endpoint does support filtering by multiple UPNs in a single call, but the KQL enrichment query and the incident comment formatting still need per-entity execution. Batching the Graph call reduces one action per entity but does not eliminate the loop. The fundamental issue is that 25 iterations of the full enrichment pipeline produce more actions than the cost model anticipated. Capping the entity count is the architectural fix because it limits all actions inside the loop, not just the Graph call.
c. Switch from the Consumption plan to the Standard plan. The Standard plan has a fixed monthly cost that does not increase with action volume, so the 25-entity incidents would not affect the bill.
The Standard plan starts at approximately $150 per month for the base workflow service plan. The current Consumption cost is $47, so switching to Standard would increase costs rather than reduce them. Section 1.10 explained that Standard plan pricing only makes sense when Consumption costs consistently exceed the Standard base price. The correct fix is to reduce the action count by capping entity iteration, not to change the billing model.
d. Disable the bulk sign-in analytics rule. The cost increase is caused by the detection rule, not the playbook. Removing the high-entity-count incidents eliminates the cost spike.
Disabling a detection rule to avoid automation costs defeats the purpose of the automation program. The detection rule identifies genuine anomalies that need investigation. Section 1.10 established the principle: optimize the automation to handle the detection, not the other way around. The fix is an entity cap on the playbook's For Each loop, not removal of the analytics rule.
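The action-count arithmetic in Scenario 7 is easy to check. The sketch below reuses the figures from the feedback (6 enrichment actions per entity, 50 incidents per day, a 30-day month) and models the cap with a list slice, which plays the role of the take() expression. The function names are hypothetical, and no dollar figures are computed because per-action pricing depends on the connector.

```python
def cap_entities(entities, limit):
    """Keep only the first `limit` entities, like take(entities, limit)."""
    return entities[:limit]


def monthly_actions(entities_per_incident, actions_per_entity=6,
                    incidents_per_day=50, days=30, entity_cap=None):
    """Connector actions per month for a For Each loop over incident entities."""
    entities = entities_per_incident
    if entity_cap is not None:
        entities = min(entities, entity_cap)
    return entities * actions_per_entity * incidents_per_day * days


print(monthly_actions(2))                  # typical incidents before the new rule
print(monthly_actions(25))                 # bulk sign-in rule, unbounded loop
print(monthly_actions(25, entity_cap=5))   # same rule with a cap of 5 entities

# Placeholder account names to show the cap applied to an entity list.
accounts = [f"user{n}@example.com" for n in range(25)]
print(len(cap_entities(accounts, 5)))
```

With the cap in place, the monthly action count is bounded by the cap rather than by whatever entity count a new analytics rule happens to produce.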

Scenario 8. The security architect is building the automation business case for leadership. The enrichment playbook saves analysts an estimated 8 minutes per incident by pre-populating sign-in history, user risk, and device compliance. The workspace processes 1,500 incidents per month. The Logic App cost is $12 per month. The SOC manager asks: how do you quantify the value? What is the strongest metric to present?

a. Present the Logic App cost savings: $12 per month is cheaper than any alternative enrichment tool. The low cost is the primary selling point for automation investment.
The $12 Logic App cost is a real number but it does not quantify value. Leadership does not approve automation programs because Logic Apps are cheap. They approve them because the analyst time recovered exceeds the investment. Section 1.10 framed the business case around analyst time, not platform cost. The Logic App cost is a line item in the business case, not the headline metric.
b. Present the number of incidents enriched per month: 1,500 incidents with automated enrichment demonstrates scale. Include a chart showing enrichment volume trending upward as more analytics rules are onboarded.
Volume metrics show activity but not value. Enriching 1,500 incidents means nothing if the enrichment does not change analyst behavior. A leadership audience needs to see time recovered, not incidents processed. Section 1.10 built the business case on analyst minutes saved, which translates to either headcount efficiency or faster response times, both of which leadership can evaluate against other investments.
c. Present the playbook success rate: 94% of enrichment runs succeed, demonstrating reliability. Pair with the monitoring dashboard showing consistent uptime over 90 days.
Reliability metrics matter operationally but do not quantify business value. A 94% success rate tells leadership the automation works. It does not tell them what it is worth. The business case needs a value metric (time saved, cost avoided) supported by a reliability metric (success rate). Leading with reliability answers "does it work" but not "should we invest more."
d. Present analyst time recovered: 8 minutes per incident across 1,500 incidents equals 200 analyst hours per month. At a blended analyst cost of $55 per hour, that is $11,000 per month in recovered capacity for a $12 Logic App investment. The 94% success rate and monitoring dashboard are supporting evidence that the time savings are real and sustainable.
Correct. Section 1.10 established the business case structure: lead with time recovered (the metric leadership cares about), convert to cost equivalent (the metric finance cares about), and support with reliability data (the metric operations cares about). The 200 hours per month represents capacity that can be redirected to investigations, threat hunting, or detection engineering. The $11,000 versus $12 comparison makes the ROI immediately obvious without requiring leadership to understand Logic App pricing or Sentinel architecture.
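The business-case arithmetic in Scenario 8 can be reproduced directly from the scenario's inputs. The function name and return shape are hypothetical; the figures are the ones given above.

```python
def recovered_value(minutes_per_incident, incidents_per_month,
                    hourly_cost, platform_cost):
    """Return (hours recovered, monetary value, value per platform dollar)."""
    hours = minutes_per_incident * incidents_per_month / 60
    value = hours * hourly_cost
    return hours, value, value / platform_cost


hours, value, ratio = recovered_value(
    minutes_per_incident=8,
    incidents_per_month=1500,
    hourly_cost=55,
    platform_cost=12,
)
print(f"{hours:.0f} analyst hours, ${value:,.0f} recovered, about {ratio:.0f}x the platform cost")
```

Keeping the three outputs separate matches the structure of the business case: hours for leadership, the dollar equivalent for finance, and the ratio as the headline comparison.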