The Defender XDR portal is your workspace. Knowing where the buttons are is necessary but not sufficient. What matters is how you use the portal operationally: what you check first when you start your shift, how you triage the queue without wasting time on false positives, when you investigate versus when you escalate, how you document your work so the next analyst can pick up where you left off, and how you maintain your effectiveness across hundreds of alerts per week.
This subsection describes the daily workflow practiced by analysts in operational SOCs. It is based on real shift patterns, not theoretical frameworks. The specific times and sequences described here are guidelines that you should adapt to your environment, team size, and alert volume — but the underlying principles are universal.
---
Shift start routine
Every shift starts the same way, regardless of what happened on the previous shift. This routine takes approximately 15 minutes and ensures you have situational awareness before you start working the queue.
Check the incident queue (5 minutes). Open the Defender portal and navigate to Incidents. Filter to Status = New, sort by Severity descending. Count the new incidents since the last shift. Read the incident names and severities without opening them yet — you are building a mental map of what needs attention, not investigating. If any incident is marked High or Critical severity, note it. If any incident was auto-assigned by an automation rule but not yet acknowledged, note that too. The goal of this step is to answer one question: is anything on fire right now?
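If your incidents also flow into a Microsoft Sentinel workspace (the pipeline check below already allows for one), the same at-a-glance count can be pulled with a short KQL sketch. SecurityIncident and its columns are standard Sentinel schema, but adjust the time window and Status filter to match your shift length and automation rules:

```kusto
// Sketch: count new incidents by severity since the start of your shift.
// Assumes incidents are synchronized into a Microsoft Sentinel workspace.
SecurityIncident
| summarize arg_max(LastModifiedTime, *) by IncidentNumber   // keep only the latest record per incident
| where Status == "New" and CreatedTime > ago(12h)            // adjust 12h to your shift length
| summarize NewIncidents = count() by Severity
```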
Read the shift handover (3 minutes). If your SOC uses a handover document — a Teams channel, a shared OneNote, a wiki page, or even an email — read it. The previous shift's handover should tell you: what incidents were actively being investigated when they left, what actions are pending (waiting for user response, waiting for management approval to isolate a device, waiting for a vendor to respond), and whether any data pipeline issues were detected. If there is no formal handover document, check the most recent incident comments in the queue — analysts who follow good practices leave comments on the incidents they were working on.
Check data pipeline health (5 minutes). This is the step most analysts skip and most SOC managers wish they would not. If a data connector stopped flowing, your detection rules are blind — alerts that should fire will not fire, and you will have a false sense of security from an empty queue. Run a connector health check by navigating to Microsoft Sentinel → Data connectors (if your org uses Sentinel) or by running a quick Advanced Hunting query against each critical table:
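A minimal KQL sketch for that freshness check, assuming the five tables below are the ones your detections depend on (swap in whichever tables matter in your environment):

```kusto
// Sketch: report the newest event per critical table and how old it is.
// Table list is illustrative; include the tables your detections rely on.
union withsource = TableName
    IdentityLogonEvents, DeviceProcessEvents, EmailEvents, CloudAppEvents, AlertEvidence
| summarize LastEvent = max(Timestamp) by TableName
| extend DataAgeMinutes = datetime_diff('minute', now(), LastEvent)
| order by DataAgeMinutes desc
```

Run against your tenant, the output looks something like the table that follows.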
| Table | LastEvent | DataAge (minutes) |
| --- | --- | --- |
| IdentityLogonEvents | 2026-03-21 08:42 | 3 |
| DeviceProcessEvents | 2026-03-21 08:41 | 4 |
| EmailEvents | 2026-03-21 08:40 | 5 |
| CloudAppEvents | 2026-03-21 08:38 | 7 |
| AlertEvidence | 2026-03-21 08:35 | 10 |
Healthy pipeline: all tables show recent data. Normal ingestion latency for Defender XDR tables is 5-15 minutes, so a DataAge in that range is expected. If DeviceProcessEvents is stale, your endpoint detections are blind; if EmailEvents is stale, phishing alerts will not fire.
If any table shows a data age greater than 60 minutes, this is your first priority — not the incident queue. A silent data pipeline is more dangerous than a noisy queue. Escalate connector issues to your engineering team or Microsoft support immediately.
Check Threat Analytics (2 minutes). Navigate to Threat Analytics in the Defender portal and look for new threat reports that show impact on your environment. Microsoft publishes these reports when new campaigns emerge or vulnerabilities are being actively exploited. If a report shows that your environment has exposed or impacted assets, it takes priority over the standard queue — you may need to run hunting queries or apply emergency protections before working general incidents.
---
Triage methodology
After the shift start routine, you work through the incident queue. Triage is the process of quickly assessing each new incident to determine whether it requires investigation, and if so, how urgently. Effective triage is the difference between a SOC that catches real attacks early and a SOC that drowns in false positives while real attacks slip through.
The 5-minute triage rule. For each new incident, spend no more than 5 minutes on initial triage. In those 5 minutes, you need to reach one of four classifications:
True Positive (TP) — the alert accurately describes malicious or unauthorized activity. Action: assign to yourself (or the appropriate Tier 2 analyst), begin investigation or escalation.
False Positive (FP) — the alert fired on legitimate activity. Action: close the incident with a comment explaining why it is a false positive. If this alert pattern fires repeatedly on the same legitimate activity, create a suppression rule or tuning recommendation. Do not simply close it silently — the next analyst who sees the same pattern needs your reasoning.
Benign True Positive (BTP) — the alert accurately describes the activity, but the activity is authorized. Example: a penetration test triggers lateral movement alerts. Action: close with comment documenting the authorization (reference the change request number or pen test scope document).
Informational / Unknown — the evidence is insufficient for classification in 5 minutes. Action: assign to yourself, flag for deeper investigation during dedicated investigation time.
What to check during the 5-minute triage:
First, read the incident name and severity. The auto-generated incident name in Defender XDR usually describes the core alert — "Multi-stage incident involving phishing and credential theft on one endpoint" tells you a lot in one sentence.
Second, check the entities involved. How many users? How many devices? How many mailboxes? A single-user, single-device incident is likely contained. An incident involving 15 users across 8 devices is potentially widespread. (One quick way to measure this spread is sketched after this list.)
Third, read the first alert. Open the highest-severity alert in the incident. Read the alert description, check the MITRE ATT&CK mapping, and look at the evidence (process tree for endpoint alerts, email details for email alerts, sign-in details for identity alerts).
Fourth, check automated investigation status. If Defender XDR's automated investigation has already analyzed the alert and taken remediation actions, your triage is faster — review the automated findings and decide whether you agree with the classification.
Fifth, classify and act. Based on the evidence, classify as TP/FP/BTP/Unknown and take the corresponding action.
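When the entity spread is not obvious from the incident page, a short Advanced Hunting sketch against AlertEvidence can measure it. The AlertId values below are placeholders for the alerts in the incident you are triaging:

```kusto
// Sketch: how many distinct devices and users does this incident touch?
// Replace the placeholder AlertId values with the alerts from the incident.
AlertEvidence
| where AlertId in ("<alert-id-1>", "<alert-id-2>")
| summarize
    DistinctDevices = dcountif(DeviceId, isnotempty(DeviceId)),
    DistinctUsers   = dcountif(AccountUpn, isnotempty(AccountUpn)),
    EntityTypes     = make_set(EntityType)
```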
The queue is a triage list, not a task list
A common mistake for new analysts is treating every incident as a task to be completed. The queue is a triage list — your job is to quickly sort incidents by urgency and impact, then dedicate investigation time to the ones that matter. If you spend 45 minutes on a false positive that could have been classified in 3 minutes, you have lost 42 minutes that could have been spent on a real attack. Speed in triage is not about cutting corners — it is about pattern recognition that develops with experience.
---
Priority-based investigation
After triaging the queue, you shift to investigation. Investigation time should be structured, not reactive. Work incidents in priority order:
Priority 1 — Active attacks in progress. Indicators: attack disruption actions triggered, ransomware-related alerts, alerts showing active data exfiltration, alerts showing ongoing lateral movement. These incidents get your full attention immediately. If necessary, interrupt other work. (A fast hunting check for these signals is sketched after this list.)
Priority 2 — High-severity confirmed true positives. Incidents classified as TP during triage with High or Critical severity. These need investigation within the current shift. Do not defer to the next shift unless you have documented your progress in the incident comments.
Priority 3 — Medium-severity true positives and unknowns. Incidents that require investigation but are not actively progressing. These should be investigated within 24 hours.
Priority 4 — Low-severity true positives and operational items. Policy violations, informational alerts, and configuration issues. These can be batched and handled during quiet periods. Do not let these accumulate indefinitely — schedule a recurring block (30-60 minutes per shift) for clearing low-priority items.
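For a fast, portal-independent check on Priority 1 signals, a sketch like the following surfaces recent high-urgency alerts. The category values are illustrative; align them with the categories your alerts actually carry:

```kusto
// Sketch: recent alerts that suggest an active attack in progress.
// Category values are examples; adjust to the categories seen in your tenant.
AlertInfo
| where Timestamp > ago(1h)
| where Severity == "High"
    or Category in~ ("Ransomware", "Exfiltration", "LateralMovement")
| project Timestamp, Title, Category, Severity, ServiceSource
| order by Timestamp desc
```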
---
Documentation standards
Every incident you touch should have comments that allow another analyst to understand the current state without asking you. This is not bureaucracy — it is operational necessity. Analysts go on leave, shift patterns rotate, and incidents that span multiple days need continuity.
What to document in incident comments:
Your classification and reasoning. "Classified as FP. Alert fired on legitimate admin PowerShell activity by admin.t.clark running a scheduled compliance script. Script is in the approved automation list (ref: CHANGE-2026-0142)."
Your investigation progress. "Investigated the process chain. Confirmed malicious macro in invoice.docx delivered via email from compromised external account. User j.morrison's device (DESKTOP-NGE042) isolated. Investigation package collected. Pending: decode the Base64 PowerShell payload and check if the file hash appears on other devices."
Actions taken and pending. "Actions taken: device isolated, user sessions revoked, inbox rules checked (none found). Pending: password reset requires manager approval per IR policy — ticket INC-NE-2026-0321 raised. Handover to next shift if not approved by 17:00."
Escalation notes. "Escalated to Tier 2 — incident involves 12 devices and potential data exfiltration. Tier 2 lead: S. Patel, notified via Teams at 14:30."
The golden rule of documentation: another analyst should be able to read your comments and continue the investigation without contacting you. If they need to ask you questions to understand where you left off, your documentation is insufficient.
---
Shift handover
At the end of each shift, write a handover that covers three things:
Active incidents. Which incidents are you currently investigating? What is the current status? What actions are pending?
Pipeline and environment issues. Were there any data connector issues during your shift? Did any automation rules malfunction? Are there any environment-wide concerns (patch deployment in progress, scheduled maintenance windows, pen test running)?
Notable observations. Did you see any patterns in the queue that might indicate an emerging campaign? Did you create any new suppression rules? Did you identify any tuning opportunities for noisy detection rules?
Keep the handover concise — five to ten bullet points, not a two-page report. The next analyst needs a 2-minute briefing, not a novel.
---
Managing alert fatigue
Alert fatigue is the gradual degradation of an analyst's attention and response quality caused by exposure to high volumes of alerts, most of which are false positives or low-value true positives. It is the single biggest threat to SOC effectiveness and is the root cause of most "how did we miss that" post-incident reviews.
Recognizing alert fatigue in yourself: You start closing incidents without fully reading the alert details. You classify ambiguous alerts as FP without investigating because "it's probably another false positive." You stop checking the process tree and rely solely on the alert title. You skip the data pipeline health check because it has been healthy for weeks. If you notice these behaviors, take a break, switch to a different task (threat hunting, rule tuning, documentation), and return to the queue with fresh attention.
Organizational countermeasures: The most effective countermeasure is aggressive false positive reduction. Every week, review the top 10 most common alert types in your queue. For each one that is predominantly false positive, create a suppression rule, tune the detection threshold, or add an exclusion. A SOC that receives 500 alerts per week with a 90% false positive rate (50 real alerts buried in 450 noise alerts) is less effective than a SOC that receives 100 alerts per week with a 50% false positive rate (50 real alerts in 50 noise alerts). The real attack count is identical — but the analyst's ability to find them is vastly different.
Rotation and variety. Analysts who spend every shift working the same queue on the same alerts experience faster fatigue. Rotate between queue triage, threat hunting, detection engineering (writing new rules), and investigation to maintain engagement. If your team is too small for formal rotation, allocate 20-30% of each shift to non-queue activities.
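For the weekly false positive review described above, a short Advanced Hunting sketch can pull the noisiest alert titles. Note that the analyst classification is not exposed in the AlertInfo schema, so the FP count has to come from your incident or ticketing data:

```kusto
// Sketch: the most common alert titles over the last 30 days.
// FP counts are not available in AlertInfo; join them in from your case data.
AlertInfo
| where Timestamp > ago(30d)
| summarize AlertCount = count() by Title
| top 10 by AlertCount
```

Combined with your classification data, the result looks something like the table below.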
| Title | AlertCount | FPCount | AvgSeverity |
| --- | --- | --- | --- |
| Suspicious PowerShell command line | 187 | 142 | 2.1 |
| Email messages containing malicious URL removed after delivery | 93 | 12 | 1.8 |
| Suspicious process injection observed | 67 | 51 | 2.4 |
Tuning opportunity: "Suspicious PowerShell command line" fired 187 times in 30 days with 142 false positives (76% FP rate). This single alert type consumed approximately 15 hours of analyst triage time (187 × 5 min). Review the false positives to identify a common pattern (specific script, specific user, specific device) and create a suppression rule. Reducing this alert's FP rate from 76% to 20% would save ~10 hours of analyst time per month.
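To find that common pattern, a sketch like the one below groups the alert's evidence by device, account, and file. The title is taken from the example above, and the grouping columns are just a starting point:

```kusto
// Sketch: look for a common source behind a noisy alert title so it can be
// suppressed or tuned. Adjust the grouping columns to whatever varies in your FPs.
AlertEvidence
| where Timestamp > ago(30d)
| where Title == "Suspicious PowerShell command line"
| summarize Alerts = dcount(AlertId) by DeviceName, AccountUpn, FileName
| top 15 by Alerts
```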
---
The Action Center
The Action Center is the unified view of all remediation actions — both automated and manual — taken across your environment. It is reachable from the Defender portal's main navigation and has two tabs: Pending (actions awaiting analyst approval) and History (completed actions).
Pending actions appear when your automation level is set to "Semi — require approval for any remediation" or "Semi — require approval for core folders remediation." In these modes, automated investigation identifies remediation actions but waits for an analyst to approve them before executing. Common pending actions include quarantining a malicious file, removing a persistence mechanism (scheduled task, registry key), and stopping a malicious process.
Check the Pending tab at least once per shift. Pending actions that sit unapproved for days mean your automated investigation is identifying threats but you are not allowing it to remediate them — you are getting the detection benefit but losing the response benefit. If you consistently approve the same types of pending actions, consider upgrading your automation level to "Full — remediate threats automatically" for those action types.
History shows every completed action with timestamps, the analyst who approved it (or "Automated" for fully automated actions), the device and file affected, and the result (successful or failed). Use the History tab during incident reviews to verify that all remediation actions completed successfully and to document the response timeline for incident reports.