TR0.7 Triage vs Investigation — Where Triage Ends

3-4 hours · Module 0 · Free

What you already know

Sections 0.1 through 0.6 taught the triage methodology: the 60-minute window, the four-outcome classification, the Triage Trinity, the CHAIN-HARVEST timeline, environment mapping, and the legal framework. This section draws the line between triage and investigation — when your triage work is complete, what the investigation team needs from you, and why crossing that boundary during triage degrades both phases.

Scenario

Tom classifies an AiTM alert on j.morrison as a true positive at 08:26. He revokes the session, captures the sign-in log snapshot, and starts writing his triage report. Then he pauses — the memory dump from DESKTOP-NGE042 is sitting in the evidence folder, and he's curious what the beacon configuration reveals. He opens Volatility, starts reconstructing the C2 infrastructure, and 90 minutes later has deep forensic findings but hasn't handed off the triage report, hasn't returned to the alert queue, and three new high-severity alerts have gone untriaged. His investigation was excellent. His triage was abandoned.

What triage answers

Triage answers five questions. Each produces a specific deliverable that the investigation team needs. Once all five are answered, triage is complete — regardless of how much more you could learn by continuing.

Is this a true incident? The deliverable is a classification: TP, FP, BTP, or Indeterminate with a confidence level. This is the scorecard output from Section 0.8. The classification tells the investigation team whether they are starting from a confirmed compromise or evaluating an ambiguous signal.

What is the severity? The deliverable is a severity tier — Critical, High, Medium, or Low — based on the asset classification and blast radius from Section 0.5. Severity determines response urgency. A Critical finding means the IR team mobilizes immediately. A Low finding means documentation now and a scheduled follow-up within the SLA window.

What volatile evidence must be preserved? The deliverable is a list of evidence preservation actions with timestamps, integrity hashes, and chain-of-custody metadata. Memory dumps captured, log snapshots taken, process states recorded — everything documented against the framework from Section 0.6.

What immediate containment is needed? The deliverable is a record of containment actions executed and verified. Session revoked, endpoint isolated, account disabled. Each action timestamped with confirmation that it succeeded. Containment that isn't verified isn't containment.

What does the investigation team need to start? The deliverable is the triage report itself — classification, severity, evidence inventory, containment record, and outstanding scope questions. When the investigation team reads it, they begin deep analysis immediately rather than spending their first hour reconstructing what you already knew.

Figure TR0.7a — Triage produces the classification, evidence, and containment. Investigation starts from the handoff. The arrow is the triage report.

What triage deliberately leaves unanswered

Triage leaves several critical questions for the investigation phase. These aren't failures — they're boundaries. Trying to answer investigation questions during triage delays the handoff and degrades both phases.

Who is the attacker? Triage identifies the compromised account and the entry vector. Investigation determines attribution — threat actor group, campaign infrastructure, whether this is opportunistic or targeted. Attribution matters for long-term defense posture, but you don't need it to revoke a session or isolate an endpoint. The containment decision doesn't change based on whether the attacker is APT29 or a commodity phishing kit.

How far did they get? Triage identifies the environments touched based on initial evidence (Section 0.4's boundary crossings). Investigation determines the full scope — every account accessed, every system pivoted to, every persistence mechanism planted. This is the question that most often triggers triage creep. You see evidence of lateral movement in the sign-in logs and your instinct is to follow it. That's investigation instinct, not triage instinct.

During triage, you document the boundary crossings you observed and hand them to the investigation team as scope questions. They have the hours and the forensic tooling to chase every pivot.

What data was accessed or exfiltrated? Triage confirms that data access occurred — the database export started, the SharePoint download triggered, the OneDrive sync ran. Investigation determines exactly what records were touched, what volume left the environment, and what regulatory notification timeline applies. The data loss assessment feeds the legal framework from Section 0.6, but the assessment itself requires hours of log correlation across CloudAppEvents, OfficeActivity, and AuditLogs that triage can't accommodate.

What is the root cause? Triage identifies the technical entry point — AiTM via phishing, credential spray against a legacy endpoint, USB insertion on a factory floor device. Investigation determines why: why the phishing email bypassed email security, why the account lacked conditional access enforcement, why the legacy endpoint accepted RDP from the internet, why service account credentials were cached on a user workstation. Root cause analysis feeds remediation. Triage feeds investigation.

Measuring the handoff boundary

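You can measure how cleanly your team separates triage from investigation. The query below approximates triage duration as the time from incident creation to first modification, and compares it with total resolution time for closed incidents. The approximation holds only if analysts first update the incident when triage ends, for example by recording the classification or assigning an investigation owner. If triage durations regularly exceed 60 minutes, analysts are likely drifting into investigation work before handing off.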

KQL

// Triage duration vs total resolution: spot investigation creep
SecurityIncident
// Keep only the latest record for each incident
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| where Status == "Closed"
| where CreatedTime > ago(30d)
// FirstModifiedTime is the first change to the incident, used here as a proxy for triage completion
| extend TriageDurationMin = datetime_diff("minute",
    FirstModifiedTime, CreatedTime)
| extend TotalResolutionHrs = datetime_diff("minute",
    ClosedTime, CreatedTime) / 60.0
// Drop incidents first modified in the same clock minute as creation
| where TriageDurationMin > 0
| summarize
    MedianTriageMin = percentile(TriageDurationMin, 50),
    P90TriageMin = percentile(TriageDurationMin, 90),
    MedianResolutionHrs = round(percentile(TotalResolutionHrs, 50), 1),
    IncidentCount = count()
    by Severity

The output breaks down by severity. What you're looking for: median triage under 30 minutes for High and Critical incidents, under 60 for Medium. If your P90 triage duration exceeds 90 minutes, analysts are consistently crossing the boundary into investigation before handing off. That's the triage creep signal.
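Percentiles show the trend but not how incidents spread across the thresholds. As a minimal sketch, the variant below groups the same triage-duration proxy into bands at the 30, 60, and 90 minute marks mentioned above. It assumes, as the previous query does, that FirstModifiedTime is a reasonable stand-in for the end of triage; the band labels and the TriageBand column name are illustrative.

```kql
// Sketch: count closed incidents per triage-duration band and severity
SecurityIncident
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| where Status == "Closed" and CreatedTime > ago(30d)
// Assumption: first modification approximates triage completion
| extend TriageDurationMin = datetime_diff("minute", FirstModifiedTime, CreatedTime)
| where TriageDurationMin > 0
// Numeric prefixes keep the bands in order when sorted as text
| extend TriageBand = case(
    TriageDurationMin <= 30, "1: 0-30 min",
    TriageDurationMin <= 60, "2: 31-60 min",
    TriageDurationMin <= 90, "3: 61-90 min",
    "4: over 90 min")
| summarize Incidents = count() by Severity, TriageBand
| order by Severity asc, TriageBand asc
```

A large count in the last band for a given severity is where to look first for triage creep.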

The danger of triage creep

Triage creep occurs when the responder classifies an alert as a true positive and then continues into investigation instead of producing the handoff. They open the memory dump. They start reconstructing the full attack timeline. They begin writing the incident report rather than the triage report. The transition feels natural — you've already been working the case, the evidence is in front of you, and stopping feels like abandoning the problem.

Three things go wrong simultaneously. First, the handoff stalls — the investigation team can't begin until they receive the triage report, and every minute spent investigating is a minute the team waits without the evidence inventory and containment record they need. The investigation team's clock doesn't start when the alert fires. It starts when they receive a triage report they can act on.

Second, the triage responder often lacks the forensic tools or time allocation for deep analysis. A 90-minute investigation attempt that produces shallow findings is worse than a 5-minute triage report that hands off to an investigator with proper tooling and a dedicated time block. The responder who reconstructs half the attack timeline leaves the investigation team repeating work rather than extending it.

Third, the alert queue accumulates. Three hours investigating one incident means three hours of high-severity alerts sitting without classification. If your SOC processes 40 alerts per eight-hour shift, a new alert arrives roughly every 12 minutes; with two analysts, each needs to clear one about every 24 minutes to keep pace. A single triage creep event that consumes 90 minutes means 7–8 alerts arrive while one analyst is out of the queue, and the other analyst has to absorb the full arrival rate alone. Any one of those alerts could be a Critical TP waiting for containment.
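The throughput figures can be worked through directly. The sketch below assumes an eight-hour shift, which the text does not state but which is the shift length that gives a 12-minute interval for 40 alerts. The column names are illustrative.

```kql
// Sketch: how many alerts arrive team-wide during one triage creep event
// Assumption: an 8-hour shift (480 minutes)
print ShiftMinutes = 8 * 60, AlertsPerShift = 40, CreepMinutes = 90
// Team-wide interval between arriving alerts: 480 / 40 = 12 minutes
| extend ArrivalIntervalMin = todouble(ShiftMinutes) / AlertsPerShift
// Alerts arriving while one analyst is away from the queue: 90 / 12 = 7.5
| extend AlertsArrivingDuringCreep = CreepMinutes / ArrivalIntervalMin
```

Substitute your own shift length and alert volume to see what a creep event costs your queue.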

The operational discipline is explicit: complete the Triage Trinity (classify, preserve, contain), produce the triage report, hand off, and return to the alert queue. If you are both the triage responder and the investigator — common in teams of one or two — explicitly switch roles. Close the triage phase, produce the report, then begin investigation as a separate activity with a separate timeline.

"There's no real difference — triage is just the first step of investigation"

Even when one person performs both phases, they are distinct operations with distinct constraints. Triage operates under time pressure: 15–60 minutes targeting classification, preservation, and containment. Investigation operates under quality pressure: hours to days targeting root cause, full scope, and eradication. Conflating them means the analyst spends four hours investigating while other alerts accumulate untriaged — or rushes the investigation to return to the queue, producing findings that miss persistence mechanisms and lateral movement. Separating the phases mentally, even as a solo operator, ensures each gets the attention it requires.

What a structured handoff looks like

The investigation team receives the triage report and immediately understands what was found, what was preserved, what was contained, and what remains unknown. They don't re-run triage queries. They don't re-collect volatile evidence. They begin deep analysis from exactly where the triage responder stopped.

Structured Triage Handoff

Bad handoff (Slack message):

"Hey, j.morrison got phished, I revoked the session, can someone investigate?"

Missing: severity, affected entities beyond j.morrison, evidence preserved, containment verification, scope questions for IR team.

Good handoff (structured triage report):

Classification: TP — AiTM token replay, j.morrison, confidence >95%

Severity: HIGH — cloud identity + endpoint compromise confirmed

Evidence preserved: SigninLogs snapshot (SHA256: 4a8b..c1), memory dump DESKTOP-NGE042 (SHA256: 9f2d..e7), auth.log SRV-NGE-BRS-DB01

Containment: Session revoked 08:19 ✓, endpoint isolated 08:35 ✓, svc-dbadmin disabled 08:42 ✓

Outstanding for IR: (1) Other accounts from attacker IP? (2) OneDrive folder seeding scope? (3) Database export volume? (4) External staging server ID?

Triage analyst: T. Ashworth | Completed: 08:47 UTC | Duration: 21 min

The bad handoff forces the investigation team to spend their first 30 minutes reconstructing what the triage responder already knew. The volatile evidence that could have been preserved during those 30 minutes degrades further. The good handoff lets the IR team begin immediately: forensic examination of the memory dump, full scope assessment across the compromised account's access, database audit log analysis, and lateral movement hunting, with the four outstanding questions setting the initial scope.
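Each outstanding question in a good handoff maps to a query the IR team can run straight away. As a sketch, question (1) above, other accounts seen from the attacker IP, might start as follows. The IP address is a documentation-range placeholder and not a value from this incident, and the 7-day lookback is an assumption.

```kql
// Sketch: which accounts signed in from the attacker IP?
// Hypothetical placeholder; substitute the IP recorded in the triage report
let attackerIp = "203.0.113.10";
SigninLogs
| where TimeGenerated > ago(7d)   // Assumed lookback window
| where IPAddress == attackerIp
| summarize
    SignIns = count(),
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated),
    Apps = make_set(AppDisplayName, 20)
    by UserPrincipalName
| order by SignIns desc
```

This is investigation work. The triage responder's job is to record the question and the IP, not to run the query.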

Handoff timing by classification

The handoff occurs at different points depending on the triage outcome.

FP or BTP: The triage report is the incident's closing comment. No IR team involvement. The analyst closes the incident in Sentinel with the classification, documented reasoning, and any tuning recommendation for recurring patterns. For BTPs involving authorized activity (penetration testing, IT admin lateral movement), record the pattern for detection tuning so the rule stops generating noise.

Indeterminate: Partial handoff. The triage responder has preserved evidence and documented findings but can't reach a definitive classification. The report explicitly states what remains unresolved — "classification pending user confirmation of login activity, evidence preserved, recommend MFA verification and 24-hour SigninLogs watchlist." The IR team reviews the evidence and makes the final classification.

Confirmed TP: Full handoff. The triage report, preserved evidence, and containment record transfer to the IR team. The investigation begins: deep forensic analysis, scope assessment, root cause determination, regulatory impact evaluation. The triage responder returns to the alert queue.
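For the Indeterminate case, the recommended 24-hour SigninLogs watch can be as simple as a saved query on the account in question. The sketch below is one possible reading of that recommendation. The user principal name is hypothetical. SigninLogs holds interactive sign-ins, so token-based activity may also need the non-interactive sign-in table.

```kql
// Sketch: review the last 24 hours of sign-ins for an account pending classification
// Hypothetical account; replace with the user under review
let watchedUser = "j.morrison@example.com";
SigninLogs
| where TimeGenerated > ago(24h)
| where UserPrincipalName =~ watchedUser   // Case-insensitive match
| project TimeGenerated, UserPrincipalName, IPAddress, Location,
    AppDisplayName, ResultType, ConditionalAccessStatus
| order by TimeGenerated desc
```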

You can track how your handoffs distribute across these categories over time. The pattern tells you something about both your detection tuning and your triage methodology:

KQL

// Handoff classification distribution: 30 days
SecurityIncident
// Keep only the latest record for each incident
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| where Status == "Closed"
| where ClosedTime > ago(30d)
| summarize
    TotalClosed = count(),
    TruePositives = countif(Classification == "TruePositive"),
    FalsePositives = countif(Classification == "FalsePositive"),
    BenignPositives = countif(Classification == "BenignPositive"),
    Undetermined = countif(Classification == "Undetermined")
| extend FP_Rate = round(100.0 * FalsePositives / TotalClosed, 1)
| extend TP_EscalationRate = round(
    100.0 * TruePositives / TotalClosed, 1)
// Undetermined share, for comparison against the 15% threshold discussed below
| extend Undetermined_Rate = round(100.0 * Undetermined / TotalClosed, 1)

If your FP rate exceeds 70%, your detection rules need tuning — analysts are spending most of their triage time on non-incidents. If your Undetermined rate exceeds 15%, your triage methodology needs tightening — the scorecard in Section 0.8 addresses exactly this problem. A healthy distribution for a mature team looks like 50–60% FP/BTP (tuned out over time), 25–35% TP (real work), and under 10% Undetermined. Track this distribution monthly.
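To track the distribution monthly, the same counts can be grouped by closing month. This is a minimal sketch that assumes the workspace retains at least 180 days of SecurityIncident data. The NeedsAttention flag applies the 70% FP and 15% Undetermined thresholds from the paragraph above; the output column names are illustrative.

```kql
// Sketch: monthly classification distribution with threshold flags
SecurityIncident
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| where Status == "Closed" and ClosedTime > ago(180d)   // Assumed retention
| summarize
    TotalClosed = count(),
    TruePositives = countif(Classification == "TruePositive"),
    FalsePositives = countif(Classification == "FalsePositive"),
    BenignPositives = countif(Classification == "BenignPositive"),
    Undetermined = countif(Classification == "Undetermined")
    by Month = startofmonth(ClosedTime)
| extend FP_Pct = round(100.0 * FalsePositives / TotalClosed, 1)
| extend FP_BTP_Pct = round(100.0 * (FalsePositives + BenignPositives) / TotalClosed, 1)
| extend TP_Pct = round(100.0 * TruePositives / TotalClosed, 1)
| extend Undetermined_Pct = round(100.0 * Undetermined / TotalClosed, 1)
// Flag months that breach either threshold
| extend NeedsAttention = FP_Pct > 70 or Undetermined_Pct > 15
| order by Month asc
```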

When detection rules improve, your FP rate drops and your TP rate increases — each triage session produces more actionable handoffs and fewer wasted cycles. That's the feedback loop between triage quality and detection engineering that makes both functions stronger over time.

The solo operator's role switch

In small teams — one or two analysts handling both triage and investigation — the same person performs both phases. The temptation is to treat them as one continuous workflow. Resist it.

The mental discipline is a deliberate role switch. When you finish the Triage Trinity, you produce the report, close the triage activity, and make a conscious decision: return to the queue or begin investigation. If you choose investigation, you're operating on a different timeline with different objectives. You're no longer triaging — you're investigating. The alert queue is paused until you return.

The decision itself is a prioritization call. If the confirmed TP is Critical severity and the queue holds only Medium and Low alerts, investigation is the right choice — the high-impact incident demands depth over throughput. If the TP is Medium and the queue holds two High alerts, returning to the queue is the right choice — those High alerts need classification before you spend three hours on root cause analysis.

The triage report gives you the information to make this call rationally rather than defaulting to whichever activity feels most interesting.
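The other input to that call is the current state of the queue. A minimal sketch of a queue snapshot follows, assuming open incidents carry Sentinel's New or Active status and that the latest SecurityIncident record per incident reflects its current state. SeverityRank and the other output column names are illustrative.

```kql
// Sketch: what is waiting in the queue, by severity?
SecurityIncident
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| where Status in ("New", "Active")
| summarize Pending = count(), OldestCreated = min(CreatedTime) by Severity
// Minutes the oldest open incident in each severity has been waiting
| extend OldestWaitingMin = datetime_diff("minute", now(), OldestCreated)
// Explicit rank, because sorting severity names as text does not follow urgency
| extend SeverityRank = case(
    Severity == "High", 1,
    Severity == "Medium", 2,
    Severity == "Low", 3,
    4)
| order by SeverityRank asc
| project Severity, Pending, OldestWaitingMin
```

If the top row shows pending High incidents and your confirmed TP is Medium, the snapshot supports returning to the queue.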

Document that switch. A one-line Sentinel incident comment works: "Triage complete — switching to investigation role. Queue paused at 09:15, 2 pending." This protects you if another high-severity alert arrives while you're investigating. You know when you left the queue. Your team lead knows. The audit trail shows a deliberate operational decision rather than an analyst who disappeared into a rabbit hole.

The same discipline applies when you return from investigation to triage. You check the queue state, re-establish triage mode — fast classification, preservation-focused, containment-focused — and work through accumulated alerts by severity. The context switch is real. Investigation mode rewards depth and patience. Triage mode rewards speed and decisiveness. Moving between them without an explicit transition is how important alerts get triaged at investigation depth (too slow) or investigations get conducted at triage depth (too shallow).

Investigation Principle

Complete the Triage Trinity, produce the triage report, hand off, return to the queue. Triage stops the bleeding. Investigation cures the disease. Both are essential. Both require dedicated attention. The responder who tries to do both simultaneously does neither well.

Section 0.8 teaches the triage scorecard — the structured scoring framework that produces consistent, defensible classifications. Eight weighted questions, action thresholds, and calibration across analyst teams. You'll apply the scorecard to NE alerts and see how it resolves the judgment calls that unstructured triage leaves to intuition.
