TR0.8 The Triage Scorecard

3-4 hours · Module 0 · Free

What you already know

Section 0.2 introduced the four-outcome classification (TP, FP, BTP, Indeterminate) and the confidence levels that govern each. Section 0.5 taught blast radius and asset prioritization. Section 0.7 drew the boundary between triage and investigation. This section gives you the operational tool that ties them together — the 8-question scorecard that produces a consistent, defensible classification for any alert within 15 minutes.

Scenario

Priya picks up a "Suspicious sign-in from Tor network" alert at 10:14. No playbook exists for this alert type. She checks the sign-in logs, sees the user authenticated from a Tor exit node, and instinctively classifies it as a true positive. Her colleague Tom picks up an identical alert at 10:22 for a different user, checks the same logs, sees the user is a pentester running authorized testing, and classifies it as a benign true positive. Both analysts are right — for their specific alerts. But their reasoning is invisible. Neither documented why they classified the way they did, and neither could defend the classification to an auditor beyond "I checked and it looked right." When a third identical alert fires at 10:35, the next analyst has no framework to follow — just two conflicting precedents.

Why intuition fails on novel alerts

Experienced analysts triage common alert patterns by recognition — they've seen the pattern before, they know the answer, they classify in seconds. This works for the 80% of alerts that match familiar patterns. It fails on the 20% that don't. Novel attack techniques, subtle indicators that resemble legitimate activity, and cross-environment incidents that span cloud and endpoint all defeat pattern recognition because there is no pattern to match.

Breaches aren't found in the 80% of alerts your team handles well. They hide in the 20% that don't match a familiar pattern — the one alert that looks like every other false positive but is actually an attacker mimicking legitimate behavior. The scorecard exists for those alerts. On familiar patterns, it takes 2–3 minutes and confirms what the analyst already knows. On unfamiliar patterns, it prevents the analyst from defaulting to "it's probably fine" when the evidence says otherwise.

Figure TR0.8 — The 8-question triage scorecard. Questions 1–3 carry the most diagnostic weight (3 points each). Questions 4–7 refine severity and urgency. Question 8 is the confidence gate — it overrides the score when the analyst's certainty doesn't match the number.

The diagnostic questions

The eight questions are ordered by diagnostic value. Questions 1–3 most strongly differentiate true positives from false positives. Questions 4–7 refine the severity and urgency. Question 8 is the confidence override that prevents false certainty.

Q1 — Evidence beyond the alert? (+3 / 0 / +1 unknown). The most powerful triage question. A single alert in isolation could be noise. A single alert accompanied by corroborating evidence from a different data source is almost certainly real.

When the CHAIN-HARVEST AiTM alert fired, the 5-minute triage query revealed three corroborating signals: the source IP was a known Tor exit node, the same user had a simultaneous legitimate session from Bristol, and the authentication method was token replay rather than interactive. Three signals from SigninLogs — each independently suspicious, together definitive.

Q2 — Scope beyond a single entity? (+3 / 0 / +2 unknown). An alert affecting one user is an incident. A pattern affecting 24 users is a campaign. CHAIN-HARVEST's password spray targeted 24 accounts — the triage response included checking all 24 for successful authentication, not just the one that fired the alert. Scope changes the severity classification and the containment approach.

Q3 — Active or historical? (+3 / 0 / +2 uncertain). An active threat requires immediate containment. A historical threat (discovered after the fact) requires investigation but not emergency response. The distinction determines the Triage Trinity sequence: active threats may require containment before preservation. Historical threats allow the standard order.

Q4 — Sensitive data at risk? (+2 / 0 / +1 probable). Data exposure shifts the incident from "security event" to "potential data breach." A compromised account that accessed a SharePoint library containing customer PII may trigger a GDPR Article 33 notification assessment. The same account accessing a public marketing site does not.

Q5 — Containment urgency? (+2 immediate / +1 soon / 0 standard). Active ransomware encryption, ongoing data exfiltration, or a live BEC email about to send — these demand containment within minutes. A dormant persistence mechanism discovered during a review allows the responder to complete classification and preservation first.

Q6 — Business impact? (+2 high / +1 medium / 0 low). A compromised test account in a developer tenant has different impact than a compromised CFO account with wire transfer authorization. The business impact score scales the response and determines whether management notification is immediate or routine.

Q7 — Regulatory trigger? (+2 / 0 / +1 uncertain). If the alert involves personal data, essential service disruption, financial system compromise, or health data exposure, the triage responder flags this in the report. The responder doesn't make the notification decision — that's legal and management. The responder ensures the trigger is visible so the notification assessment can start within the regulatory timeline.

Q8 — Confidence override. Not a point score. A decision gate. High confidence on any score means acting on the number. Low confidence on a score of 8 or above means escalate regardless — you don't close uncertain alerts with high scores. Low confidence on a score below 8 means document the uncertainty, set a 24-hour watchlist, and re-triage if new evidence surfaces.
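The point values and the Q8 gate described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the course tooling: the names `POINTS`, `total_score`, and `confidence_gate` and the lower-case answer labels are assumptions made for the example. The gate text only defines high and low confidence, so the sketch refuses anything else rather than guessing.

```python
# Point values per answer, copied from the question descriptions above.
# The answer labels (lower-case strings) are an illustrative choice.
POINTS = {
    "Q1": {"yes": 3, "no": 0, "unknown": 1},
    "Q2": {"yes": 3, "no": 0, "unknown": 2},
    "Q3": {"active": 3, "historical": 0, "uncertain": 2},
    "Q4": {"yes": 2, "no": 0, "probable": 1},
    "Q5": {"immediate": 2, "soon": 1, "standard": 0},
    "Q6": {"high": 2, "medium": 1, "low": 0},
    "Q7": {"yes": 2, "no": 0, "uncertain": 1},
}


def total_score(answers):
    """Sum the points for Q1-Q7. Q8 is a gate, not a score."""
    return sum(POINTS[question][answer] for question, answer in answers.items())


def confidence_gate(score, confidence):
    """Apply the Q8 gate: high confidence acts on the number, low confidence
    escalates at 8 or above and goes to the watchlist below 8."""
    if confidence == "high":
        return "act on score"
    if confidence == "low":
        if score >= 8:
            return "escalate"
        return "document uncertainty, 24-hour watchlist, re-triage on new evidence"
    raise ValueError(f"gate behaviour not defined for confidence {confidence!r}")


# Hypothetical alert: several answers are still uncertain.
answers = {
    "Q1": "unknown", "Q2": "yes", "Q3": "uncertain", "Q4": "no",
    "Q5": "soon", "Q6": "medium", "Q7": "uncertain",
}
score = total_score(answers)
print(score, confidence_gate(score, "low"))
```

Keeping the point table as data rather than burying it in `if` statements means a threshold change agreed in calibration is a one-line edit.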

Answering Q1–Q3 with data

Questions 1 through 3 are data-intensive — you need to query your environment to answer them. The following KQL answers all three for a cloud identity alert by checking for corroborating sign-in anomalies, entity scope, and whether the threat is currently active:

KQL

// Scorecard Q1-Q3: corroboration, scope, active status
let AlertIP = "185.220.101.42";
let AlertUser = "j.morrison@northgate-eng.com";
let AlertTime = datetime(2026-03-15T08:14:00Z);
SigninLogs
// Sign-ins within one hour either side of the alert
| where TimeGenerated between (AlertTime - 1h .. AlertTime + 1h)
// Rows from the alert IP (any user) or for the alert user (any IP)
| where IPAddress == AlertIP
    or UserPrincipalName == AlertUser
| summarize
    // Q1: corroborating evidence
    DistinctIPs = dcount(IPAddress),
    AuthMethods = make_set(AuthenticationMethodsUsed),
    RiskLevels = make_set(RiskLevelDuringSignIn),
    // Q2: scope beyond single entity
    AffectedUsers = dcount(UserPrincipalName),
    AffectedApps = dcount(AppDisplayName),
    // Q3: active: any successful sign-in in the last 15 min?
    // ago() is relative to query time, so run this soon after the alert fires
    MostRecentActivity = max(TimeGenerated),
    // ResultType is a string column in SigninLogs; "0" means success
    ActiveSessions = countif(
        TimeGenerated > ago(15m)
        and ResultType == "0")

The output gives you all three answers in one query. DistinctIPs greater than 1, with a legitimate session running at the same time, means Q1 is YES (+3). AffectedUsers greater than 1 means other accounts signed in from the alert IP, so Q2 is YES (+3). ActiveSessions greater than 0 means Q3 is ACTIVE (+3).
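The mapping from query output to points can be written down explicitly. The sketch below assumes you have copied the summary row into plain values. The function name `score_q1_to_q3` and the `concurrent_legit_session` flag are hypothetical: the flag stands for the analyst's judgement that a legitimate session overlapped, which the query does not compute. For simplicity it scores anything short of a clear YES as 0 and leaves the partial "unknown" values to the analyst.

```python
def score_q1_to_q3(distinct_ips, concurrent_legit_session, affected_users, active_sessions):
    """Translate the query's summary row into Q1-Q3 points.

    concurrent_legit_session is an analyst judgement, not a query column.
    """
    q1 = 3 if distinct_ips > 1 and concurrent_legit_session else 0
    q2 = 3 if affected_users > 1 else 0
    q3 = 3 if active_sessions > 0 else 0
    return {"Q1": q1, "Q2": q2, "Q3": q3}


# Hypothetical summary row: two IPs, one user, one live session.
result = score_q1_to_q3(
    distinct_ips=2,
    concurrent_legit_session=True,
    affected_users=1,
    active_sessions=1,
)
print(result)
```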

Answering the environmental questions

Questions 4–7 don't come from queries. They come from environmental knowledge — the context you built in Section 0.5 when you mapped your hybrid environment and classified your assets.

Q4 — Sensitive data at risk. You're looking at what the compromised identity can reach, not what has been confirmed accessed. A mailbox containing PII or financial approvals scores YES even if you haven't confirmed exfiltration. The risk is potential exposure, not proven exposure. If the compromised account is a service principal with Graph API permissions to read all mailboxes, Q4 is YES regardless of what the attacker has done so far.

Q5 — Containment urgency. This distinguishes between active and historical threats. An active session where the attacker can still operate demands immediate containment — revoke sessions, block the IP, disable the account. A historical event where the session has expired and no persistence is visible can tolerate a measured response. The difference is whether you contain first and investigate second, or investigate first and contain if needed.

Q6 — Business impact. A compromised intern account on a test tenant is not the same as a compromised finance director account with SAP access. The business impact assessment comes from the asset classification you built: Tier 1 assets score HIGH, Tier 2 score MEDIUM, Tier 3 score LOW. You don't need to calculate this during triage — you need to have already classified your environment so the answer is a lookup, not a judgment call.

If your organization hasn't classified its assets, every Q6 answer becomes a debate, and the scorecard loses its speed advantage.
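Treating Q6 as a lookup can be as simple as a dictionary keyed by asset. The sketch below is illustrative only: the `ASSET_TIERS` entries are hypothetical placeholders, and a real inventory would come from the asset classification you built earlier. An unclassified asset returns no score, which makes the gap visible instead of hiding it behind a default.

```python
# Tier-to-impact mapping as described above: Tier 1 HIGH, Tier 2 MEDIUM, Tier 3 LOW.
TIER_TO_IMPACT = {1: ("HIGH", 2), 2: ("MEDIUM", 1), 3: ("LOW", 0)}

# Hypothetical asset inventory; replace with your own classification.
ASSET_TIERS = {
    "cfo@example.com": 1,
    "helpdesk@example.com": 2,
    "test-account@example.com": 3,
}


def q6_business_impact(asset):
    """Return (label, points) for Q6, or ("UNCLASSIFIED", None) if the
    asset has no tier and the answer would otherwise be a debate."""
    tier = ASSET_TIERS.get(asset)
    if tier is None:
        return ("UNCLASSIFIED", None)
    return TIER_TO_IMPACT[tier]


print(q6_business_impact("cfo@example.com"))
print(q6_business_impact("contractor@example.com"))
```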

Q7 — Regulatory trigger. Section 0.6 covered the specific thresholds: GDPR 72 hours, NIS2 24 hours for the early warning, DORA 4 hours initial. If the compromised account has access to personal data and the evidence suggests the attacker accessed it, regulatory notification clocks start when you have reasonable certainty of a breach. This is not a legal determination. It is a triage flag that tells you to preserve evidence and escalate to your legal or DPO function immediately rather than completing investigation first.

Q8 — Confidence in classification. This is the analyst's self-assessment. HIGH confidence means multiple independent indicators point the same direction. MODERATE means the evidence is consistent but relies on a single indicator or the alert context is ambiguous. LOW means the analyst can't classify with certainty — the correct action is to escalate or tag as Indeterminate rather than guessing.

The confidence override exists for one specific situation: when the analyst's domain expertise says the score is wrong. A scorecard total of 6 normally maps to "Probable FP." But if the analyst recognizes the activity pattern from a threat intelligence report published that morning, the override to "Escalate — possible campaign" is legitimate. The override must be documented with the specific reasoning, and the team lead reviews all overrides during monthly calibration.

Overrides should be rare — if an analyst overrides more than 10% of their scorecard results, either the scorecard thresholds need adjustment or the analyst is reverting to intuitive classification under a different label.
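The 10% guideline is easy to check from a log of scorecard results. The sketch below assumes each result is recorded as a dictionary with a boolean `overridden` field; that record shape and the function names are assumptions made for the example.

```python
def override_rate(results):
    """Fraction of scorecard results where the analyst overrode the score."""
    if not results:
        return 0.0
    overridden = sum(1 for result in results if result["overridden"])
    return overridden / len(results)


def needs_review(results, threshold=0.10):
    """True when overrides exceed the threshold (more than 10% by default)."""
    return override_rate(results) > threshold


# Hypothetical month: 20 scorecards, 3 overrides.
month = [{"overridden": i < 3} for i in range(20)]
print(override_rate(month), needs_review(month))
```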

Automating Q1–Q3 with PowerShell

For environments without Sentinel, the same three questions can be answered from Entra ID sign-in logs via the Graph API:

PowerShell — Scorecard Q1–Q3 via Graph

# Scorecard Q1-Q3: pull sign-in context for the alert user + IP
# Assumes an existing Graph session, e.g. Connect-MgGraph -Scopes "AuditLog.Read.All"
$AlertUser = "j.morrison@northgate-eng.com"
$AlertIP   = "185.220.101.42"
# Convert to UTC before formatting, because the format string appends a literal Z
$NowUtc   = (Get-Date).ToUniversalTime()
$Window   = $NowUtc.AddHours(-2).ToString("yyyy-MM-ddTHH:mm:ssZ")
# Q1+Q2: corroboration and scope (-All pages through every matching result)
$Signs = Get-MgAuditLogSignIn -All -Filter "createdDateTime ge $Window and (ipAddress eq '$AlertIP' or userPrincipalName eq '$AlertUser')"
# @() keeps .Count reliable when zero or one value comes back
$UniqueIPs    = @($Signs.IpAddress | Sort-Object -Unique).Count
$UniqueUsers  = @($Signs.UserPrincipalName | Sort-Object -Unique).Count
$UniqueApps   = @($Signs.AppDisplayName | Sort-Object -Unique).Count
# Q3: any successful session in last 15 minutes?
# Compare DateTime values in UTC rather than a DateTime against a string
$Recent = $NowUtc.AddMinutes(-15)
$Active = @($Signs | Where-Object {
    $_.CreatedDateTime.ToUniversalTime() -ge $Recent -and $_.Status.ErrorCode -eq 0
})
Write-Host "Q1 - Distinct IPs: $UniqueIPs | Q2 - Affected users: $UniqueUsers, Apps: $UniqueApps | Q3 - Active sessions: $($Active.Count)"

The output maps directly to the scorecard. Multiple IPs associated with the same user confirm Q1. Multiple affected users confirm Q2. Active sessions within 15 minutes confirm Q3. The same logic applies regardless of whether you query Sentinel or the Graph API — the questions are platform-agnostic.

Scoring the CHAIN-HARVEST AiTM alert

The scorecard applied to the CHAIN-HARVEST alert from Section 0.4:

Worked Example — CHAIN-HARVEST Scorecard

Alert: AiTM Token Replay — j.morrison (185.220.101.42)

Q1 — Evidence beyond alert? YES (+3). Tor exit node, simultaneous Bristol session, token replay auth method.

Q2 — Scope beyond single entity? YES (+3). Password spray hit 24 accounts. j.morrison confirmed compromised.

Q3 — Active or historical? ACTIVE (+3). Token replay session currently valid.

Q4 — Sensitive data at risk? YES (+2). Mailbox contains financial approvals and engineering proposals.

Q5 — Containment urgency? IMMEDIATE (+2). Live session = attacker can read/send email now.

Q6 — Business impact? HIGH (+2). Access to financial approval workflows. BEC risk.

Q7 — Regulatory trigger? UNCERTAIN (+1). If personal data in email was accessed, GDPR applies.

Q8 — Confidence? HIGH. Three independent corroborating indicators.

Total: 16/17 — CONFIRMED TRUE POSITIVE

Action: Full Triage Trinity. Revoke session, preserve evidence, escalate to IR, notify management.

The scorecard took 8 minutes to complete. The KQL query for Q1–Q3 ran in under 30 seconds. Questions 4–7 were answered from environmental knowledge (j.morrison's role, the data accessible from that mailbox, the regulatory context). The classification is defensible — any auditor can trace the score to eight documented evidence points.
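One way to keep the score traceable is to store each answer next to its evidence, so the total is always derived from the documented points rather than typed in by hand. The sketch below records the CHAIN-HARVEST answers from the worked example. The `Answer` record and its field names are assumptions for illustration, not a course-defined format.

```python
from typing import NamedTuple


class Answer(NamedTuple):
    question: str
    points: int
    evidence: str


# Points and evidence copied from the worked example above.
chain_harvest = [
    Answer("Q1", 3, "Tor exit node, simultaneous Bristol session, token replay auth method"),
    Answer("Q2", 3, "Password spray hit 24 accounts; j.morrison confirmed compromised"),
    Answer("Q3", 3, "Token replay session currently valid"),
    Answer("Q4", 2, "Mailbox contains financial approvals and engineering proposals"),
    Answer("Q5", 2, "Live session: attacker can read/send email now"),
    Answer("Q6", 2, "Access to financial approval workflows; BEC risk"),
    Answer("Q7", 1, "GDPR applies if personal data in email was accessed"),
]
confidence = "HIGH"  # Q8 is a gate and carries no points

total = sum(answer.points for answer in chain_harvest)
print(f"Total: {total} (confidence {confidence})")
for answer in chain_harvest:
    print(f"{answer.question} +{answer.points}: {answer.evidence}")
```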

Calibration across analysts

The scorecard is a decision support tool, not an automated classifier. Two analysts scoring the same alert may produce different scores and both be defensible. The goal is consistent classifications: if one analyst scores 12 and another scores 14, both classify as probable TP and both initiate the same containment actions. The scores differ but the outcome is the same.

Monthly calibration at NE works like this: Rachel presents 5 anonymized alerts from the previous month. Each analyst independently scores all 5. The team compares results. Discrepancies of 1–2 points per question are normal, because different analysts weigh ambiguous evidence differently. A 3-point discrepancy, which is only possible on Q1–Q3 and means one analyst scored the question at full weight while another scored it at zero, indicates a gap in how that evidence category is interpreted, and that gap maps to a specific training module.
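The comparison step can be sketched as a per-question spread: the highest score any analyst gave minus the lowest. The example below is a rough illustration with hypothetical analysts and scores. The function names are assumptions, and the threshold of 3 follows the calibration rule described above.

```python
def question_spreads(scores_by_analyst):
    """Return the max-minus-min points per question across analysts."""
    questions = next(iter(scores_by_analyst.values())).keys()
    return {
        question: max(scores[question] for scores in scores_by_analyst.values())
        - min(scores[question] for scores in scores_by_analyst.values())
        for question in questions
    }


def interpretation_gaps(scores_by_analyst, threshold=3):
    """Questions whose spread reaches the threshold and need discussion."""
    spreads = question_spreads(scores_by_analyst)
    return sorted(q for q, spread in spreads.items() if spread >= threshold)


# Hypothetical scores for one calibration alert (first four questions only).
scores = {
    "analyst_a": {"Q1": 3, "Q2": 0, "Q3": 3, "Q4": 2},
    "analyst_b": {"Q1": 3, "Q2": 3, "Q3": 2, "Q4": 1},
    "analyst_c": {"Q1": 1, "Q2": 3, "Q3": 3, "Q4": 2},
}
print(question_spreads(scores))
print(interpretation_gaps(scores))
```

Here only Q2 is flagged: one analyst scored scope at zero while the others scored it at full weight, which points to a disagreement about what counts as "beyond a single entity" rather than a difference in weighing ambiguous evidence.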

The calibration session produces two outputs. First, a consensus score range for each alert. This becomes the benchmark for new analysts — they score the same 5 alerts during onboarding, and their results are compared against the team consensus. Second, a list of questions where the team disagrees. If three analysts scored Q4 differently on the same alert, the team discussion isn't about who was right. It's about what evidence each analyst used and whether the asset classification is clear enough to produce consistent answers.

Calibration also catches scorecard drift. Over time, analysts develop shortcuts — they stop answering Q7 because "we never trigger regulatory notification" or they default Q6 to MEDIUM because "everything is medium priority." Monthly calibration forces every question to be answered deliberately, and the discussion surfaces assumptions that have become invisible. A team that skips calibration for three months will discover that their scorecard has quietly degraded into the same intuitive classification it was designed to replace.

"The scorecard slows experienced analysts down — they can classify faster by instinct"

On common alert patterns, experienced analysts classify in seconds regardless of the scorecard. The scorecard adds 2–3 minutes on familiar alerts. But the scorecard doesn't exist for familiar alerts. It exists for the 20% of alerts that don't match a known pattern: the novel technique, the subtle indicator, the attacker deliberately mimicking legitimate behavior. The 3 minutes the scorecard adds to that alert is the 3 minutes that catches the breach everyone else missed. Breaches often trace back to an alert that was triaged incorrectly or ignored entirely. The scorecard is insurance against the one time pattern recognition fails.

Investigation Principle

The scorecard doesn't replace analyst judgment — it structures it. Eight questions, documented answers, a defensible classification. The number isn't the decision. The reasoning behind each answer is the decision. The number is how you communicate that reasoning consistently across a team.

Section 0.9 teaches the triage report template — the structured document that captures the scorecard output, the evidence inventory, the containment record, and the outstanding questions for the investigation team. The report is the handoff artifact that makes everything you've learned operational.
