
TR1.6 The Preservation Decision Tree

6-7 hours · Module 1 · Free

What you already know

From TR1.1–1.4 you know what volatile evidence exists across cloud, Windows, and Linux environments, and the order in which it disappears. But knowing what to collect is only half the problem. The other half is when to collect it: specifically, whether to preserve evidence before containment or contain the threat before preservation. This section provides the decision tree that resolves that tension. How to maintain the integrity of everything you collect is covered in the next section.

Scenario

It's 06:52 at Northgate Engineering. The on-call analyst has confirmed the credential-stuffing compromise across three environments: the svc-backup cloud account with OAuth persistence and inbox forwarding (TR1.2), the WS-FIN-042 Windows workstation running a DLL sideloading payload with active C2 (TR1.3), and the web-prod-01 Linux server with Nginx proxying attacker traffic (TR1.4). Rachel Okafor joins the bridge call. The attacker is not currently executing new actions — the C2 beacon is checking in every 30 seconds, but no new commands have been issued since 03:22. Rachel asks the question that determines everything that happens next: "Is the attacker actively causing damage right now, or are they dormant?"

The one question that determines the sequence

The preservation decision tree resolves to a single question: is the attacker actively causing damage right now?

"Actively causing damage" means the attacker is performing destructive, exfiltrative, or propagation actions at this moment. Not "the attacker was active 3 hours ago." Not "the attacker might become active later." Right now. Ransomware encrypting files is active damage. Data streaming to an external server is active damage. A C2 beacon checking in on a 30-second interval without issuing commands is not active damage — it's a dormant implant waiting for instructions.

Figure TR1.6 — The preservation decision tree. One question determines the sequence: is the attacker actively causing damage right now? Both paths eventually complete both actions — only the order differs.

The distinction matters because both containment and preservation are time-sensitive, and doing them in the wrong order has irreversible consequences. If the attacker is actively encrypting files and you spend 5 minutes capturing a memory dump first, those 5 minutes of additional encryption may be the difference between recovering from backup and losing irreplaceable data.

If the attacker is dormant and you rush to contain by killing the C2 process, you've destroyed the network connection evidence, the process memory, and the loaded DLL that the investigation team needed to understand the attacker's tooling and objectives.

In the NE scenario, the answer to Rachel's question is clear: the attacker is dormant. The last C2 command was issued at 03:22, three and a half hours before the 06:52 bridge call. The beacon is checking in but receiving no instructions. This means the response team preserves first across all three environments before executing containment.
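As a minimal sketch, the single question above can be written as a check over what the analyst currently observes. The `Observation` fields and function names are hypothetical, chosen for illustration, and the calendar date is an arbitrary placeholder; only the 03:22 and 06:52 times come from the scenario.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Observation:
    """What the analyst can see right now (hypothetical field names)."""
    encrypting_files: bool = False
    exfiltrating_data: bool = False
    propagating: bool = False
    beacon_checking_in: bool = False


def is_actively_causing_damage(obs: Observation) -> bool:
    # A beacon check-in on its own is deliberately ignored here:
    # it indicates a dormant implant, not active damage.
    return obs.encrypting_files or obs.exfiltrating_data or obs.propagating


def first_action(obs: Observation) -> str:
    return "contain-first" if is_actively_causing_damage(obs) else "preserve-first"


def dormancy(last_command: datetime, now: datetime) -> timedelta:
    """Time elapsed since the last observed attacker command."""
    return now - last_command


# NE scenario: beacon checking in, last command 03:22, current time 06:52.
# The date is a placeholder; only the times are from the scenario.
ne = Observation(beacon_checking_in=True)
idle = dormancy(datetime(2025, 1, 1, 3, 22), datetime(2025, 1, 1, 6, 52))
print(first_action(ne), idle)
```

The dormancy duration is recorded as supporting context only. The decision itself depends on what is happening right now, not on how long the attacker has been quiet.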

Contain-first: when the attacker is actively causing damage

Three scenarios demand immediate containment regardless of evidence impact. Active ransomware encryption is the most urgent — every second of delay means more files lost. The response is to network-isolate the affected endpoint through Defender for Endpoint or physically disconnect the network cable. Isolation stops lateral spread and C2 communication while keeping the system powered on. Memory remains intact. The ransomware process continues running on local files but cannot reach other systems. After isolation, the responder captures volatile evidence from the still-running system.

Active data exfiltration follows the same logic but offers a more surgical containment option. Blocking the destination IP at the network perimeter (firewall rule) stops the data transfer without touching the source endpoint. The exfiltration process keeps running on the endpoint — it just fails on the next connection attempt — preserving process memory, loaded modules, and the connection's source port for forensic capture. This is the ideal contain-first action because it stops the damage with zero evidence destruction on the endpoint.

A live business email compromise in progress, with the attacker composing or sending emails from a compromised mailbox, demands account suspension in Entra ID. Revoking the session and resetting the password stops the BEC activity. The evidence impact is minimal: mailbox contents, sent items, and Entra sign-in logs are all persistent and survive the session revocation. The only evidence lost is the active session's token, and the sign-in that issued it is already recorded in the SigninLogs table in Sentinel.

The critical insight is that some containment actions preserve evidence while others destroy it. Network isolation through Defender keeps the system running with full volatile evidence intact — the responder can still access the system through Live Response. A firewall block preserves everything on both sides. Account suspension preserves cloud logs and mailbox state. But a hard shutdown eliminates all volatile evidence: memory, processes, connections, DNS cache, ARP table. Reimaging a system destroys both volatile and disk evidence.

The triage responder must choose the containment action that stops the damage with the least evidence impact.

After contain-first execution, the responder still captures whatever volatile evidence survived the containment action. If the system was network-isolated but not powered off, memory and process state are still available — acquire them immediately after isolation, before anyone restarts the system. If the system was shut down (because isolation wasn't possible and active damage was ongoing), disk evidence — event logs, prefetch files, registry hives, $MFT — is still available from the powered-off disk.

The investigation scope narrows (no memory evidence), but disk forensics can still reconstruct the attacker's persistence mechanisms, lateral movement paths, and timeline.
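The relationship between the containment action taken and the evidence still worth collecting can be sketched as a lookup. The evidence lists are taken from the text above; the dictionary keys and the function name are hypothetical.

```python
# Evidence categories named in this section.
VOLATILE = ["memory", "processes", "connections", "DNS cache", "ARP table"]
DISK = ["event logs", "prefetch files", "registry hives", "$MFT"]

# What remains collectable after each containment action.
SURVIVES = {
    "network_isolation": VOLATILE + DISK,  # system still running
    "hard_shutdown": DISK,                 # volatile evidence gone
    "reimage": [],                         # nothing left on the host
}


def collection_plan(containment_action: str) -> list[str]:
    """Return the evidence that can still be acquired after containment."""
    try:
        return list(SURVIVES[containment_action])
    except KeyError:
        raise ValueError(f"unknown containment action: {containment_action!r}") from None


print(collection_plan("hard_shutdown"))
```

After isolation the volatile items come first in the plan because they are the ones a later restart would destroy.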

Preserve-first: when the attacker is dormant

When the attacker is dormant (persistence planted, C2 beacon checking in, but no active commands), the response team has a window to capture volatile evidence before containment. The window's duration depends on the environment. A cloud OAuth token might persist for days. A dormant Windows implant survives until someone reboots the system or the scheduled Windows Update triggers an automatic restart. A Docker container with restart: always keeps its volatile state only until a crash or daemon restart triggers an automatic restart, which discards the process memory of the running container.

The NE scenario is a preserve-first situation across all three environments. The response team assigns parallel tasks: one analyst captures the Windows endpoint evidence (TR1.3 five-minute sequence), another captures the Linux container and host evidence (TR1.4 sequence), and the cloud team begins UAL export and sign-in log preservation. Parallel execution is critical — three analysts working simultaneously compress a 45-minute sequential evidence collection into a 15-minute parallel window.
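The arithmetic behind that compression is simple: sequential collection costs the sum of the per-environment times, while parallel collection costs only the longest one. The 15-minute figure per environment below is an assumption made so the totals match the 45 and 15 minutes quoted above; real capture times will differ by environment.

```python
# Assumed capture time per environment, in minutes (illustrative only).
capture_minutes = {"windows": 15, "linux": 15, "cloud": 15}

sequential = sum(capture_minutes.values())  # one analyst, one environment at a time
parallel = max(capture_minutes.values())    # three analysts, bounded by the slowest

print(f"sequential: {sequential} min, parallel: {parallel} min")
```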

The containment actions — endpoint isolation, firewall block of the C2 IP, OAuth app revocation, account suspension — execute only after volatile evidence capture is complete across all environments. Simultaneous containment across all three environments prevents the attacker from detecting containment in one environment and pivoting or destroying evidence in the others.

The coordination challenge in parallel preserve-first operations is communication. Each analyst works independently on their assigned environment but must signal when evidence capture is complete. The NE team uses the Sentinel incident chat for real-time coordination: "Windows volatile capture complete, proceeding to KAPE." "Linux /proc and container state captured, starting log collection." "Cloud sign-in export complete, UAL export queued." Rachel monitors the chat and gives the containment go-ahead only when all three environments report evidence capture complete.
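The go-ahead rule described here behaves like a simple gate: containment is released only once every environment has reported completion. The sketch below models that rule in memory; the class and method names are hypothetical and it does not integrate with Sentinel or any chat tool.

```python
ENVIRONMENTS = ("windows", "linux", "cloud")


class ContainmentGate:
    """Tracks which environments have finished volatile evidence capture."""

    def __init__(self, environments=ENVIRONMENTS):
        self._all = frozenset(environments)
        self._pending = set(environments)

    def report_complete(self, environment: str) -> None:
        if environment not in self._all:
            raise ValueError(f"unknown environment: {environment!r}")
        self._pending.discard(environment)

    def waiting_on(self) -> list[str]:
        return sorted(self._pending)

    def go_ahead(self) -> bool:
        # True only when no environment is still collecting.
        return not self._pending


gate = ContainmentGate()
gate.report_complete("windows")
gate.report_complete("linux")
print(gate.go_ahead(), gate.waiting_on())
```

In this example the gate stays closed because the cloud export has not yet been reported complete.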

If any environment reports an issue — AVML blocked by kernel_lockdown, KAPE failing on a locked file — the team adjusts. Rachel may authorise containment on the completed environments while the problematic environment's analyst works the issue, or she may wait for all three to complete. The decision depends on the risk: is the additional delay in one environment worth maintaining simultaneous containment across all three?

The preserve-first decision document is filed in the Sentinel incident as a comment before collection begins. It records the reasoning at the time the decision was made — not after the outcome is known.

If the attacker sends a destructive command during the preservation window and the team switches to contain-first, the decision log shows the team made a defensible choice based on the available evidence, not that they gambled and lost.

Analyst Decision — Preserve-or-Contain for NE Incident

Decision: Preserve first, then contain.

Rationale: Last observed attacker command at 03:22. Current time 06:52. C2 beacon active but no new instructions issued in 3.5 hours. No active encryption, exfiltration, or lateral movement detected. The attacker is dormant — likely operating in a different timezone and will return during their business hours.

Risk accepted: The attacker could issue a new command at any moment. If the attacker sends a destructive command during evidence collection, the team immediately switches to contain-first. The 15-minute preservation window is an acceptable risk given 3.5 hours of dormancy.

Escalation trigger: If any Sentinel alert fires indicating new attacker activity during preservation, all teams stop evidence collection and execute immediate containment across all three environments simultaneously.
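One way to keep such a record consistent across incidents is a small structure that stamps the time of the decision and renders the four fields as comment text. This is a sketch under assumptions: the class, its field names, and the `next_step` helper are hypothetical, and posting the text to the incident is left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: assigning to a field after creation raises an error
class DecisionRecord:
    decision: str
    rationale: str
    risk_accepted: str
    escalation_trigger: str
    # Timestamp taken when the record is created, before the outcome is known.
    recorded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def as_comment(self) -> str:
        """Render the record as plain text for an incident comment."""
        return "\n".join([
            f"Recorded (UTC): {self.recorded_at.isoformat(timespec='seconds')}",
            f"Decision: {self.decision}",
            f"Rationale: {self.rationale}",
            f"Risk accepted: {self.risk_accepted}",
            f"Escalation trigger: {self.escalation_trigger}",
        ])


def next_step(new_attacker_activity: bool) -> str:
    """Apply the escalation trigger during the preservation window."""
    if new_attacker_activity:
        return "stop collection, contain all environments simultaneously"
    return "continue evidence collection"


record = DecisionRecord(
    decision="Preserve first, then contain.",
    rationale="Last observed attacker command at 03:22; beacon active, no new instructions.",
    risk_accepted="Attacker could issue a new command during the 15-minute window.",
    escalation_trigger="Any alert indicating new attacker activity.",
)
print(record.as_comment())
```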

The uncertain middle ground

The most common real-world situation is neither clearly active nor clearly dormant. The alert fired 20 minutes ago based on a suspicious process that is still running, but the analyst cannot determine from available telemetry whether the attacker is actively operating. The default depends on severity. For high-severity indicators — ransomware signatures, credential dumping patterns, known exfiltration tools — default to contain-first.

The cost of unnecessary containment (a brief disruption to one system) is less than the cost of allowing an active ransomware campaign to continue. For medium and low severity indicators, spend 2–3 minutes running Tier 1 capture commands (processes, connections) to assess attacker state. The results tell you whether active connections exist and whether destructive processes are running.

This 2–3 minute assessment window is a deliberate, time-limited investigation, not an open-ended analysis. The analyst runs Get-NetTCPConnection (Windows) or ss -tnp (Linux) to check for active outbound connections to known-bad IPs or unusual destination ports. They run Get-Process or ps auxwf to check for processes consuming unusual CPU or performing high volumes of file I/O (a common ransomware indicator). They check the Sentinel incident timeline for any attacker actions in the last 30 minutes.
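The connection check can be sketched as a filter over remote address and port pairs that the analyst has already pulled from the command output. Everything here is illustrative: the function name, the set of "usual" ports, and the sample addresses (taken from reserved documentation ranges) are assumptions, and parsing of the actual command output is not shown.

```python
import ipaddress

# Assumed baseline of unremarkable destination ports; tune per environment.
USUAL_PORTS = {53, 80, 443}


def suspicious_connections(connections, known_bad_ips, usual_ports=USUAL_PORTS):
    """Flag outbound connections to known-bad IPs or unusual destination ports.

    connections: iterable of (remote_ip, remote_port) pairs.
    """
    bad = {ipaddress.ip_address(ip) for ip in known_bad_ips}
    flagged = []
    for remote_ip, remote_port in connections:
        if ipaddress.ip_address(remote_ip) in bad:
            flagged.append((remote_ip, remote_port, "known-bad IP"))
        elif remote_port not in usual_ports:
            flagged.append((remote_ip, remote_port, "unusual destination port"))
    return flagged


# Sample data using documentation-only address ranges.
sample = [("203.0.113.10", 443), ("198.51.100.7", 4444), ("192.0.2.20", 443)]
for hit in suspicious_connections(sample, known_bad_ips={"203.0.113.10"}):
    print(hit)
```

A known-bad IP is flagged even on a common port, since an attacker's C2 traffic often uses 443 to blend in.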

If the 2–3 minute assessment reveals active damage indicators, the analyst immediately switches to contain-first. If it reveals a dormant state (no active connections, no suspicious process activity, no recent attacker commands), they proceed with preserve-first.

The key discipline is committing to the time limit. An analyst who spends 15 minutes trying to determine attacker state has consumed the entire preservation window in uncertainty. If you cannot determine active vs dormant within 3 minutes, assume the higher risk and default to contain-first with an evidence-preserving containment action (network isolation, not shutdown). The decision log should document the uncertainty: "Unable to determine attacker state within 3-minute assessment window. Defaulting to contain-first with network isolation. Volatile evidence capture will proceed immediately after isolation."
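The time limit can be enforced mechanically rather than by willpower. The sketch below runs a list of checks against a deadline and falls back to the evidence-preserving default when the state is still undetermined. The function name and the convention that a check returns True (active damage), False (clean), or None (inconclusive) are assumptions. The deadline is tested only between checks, so a single slow check is not interrupted.

```python
import time
from typing import Callable, Iterable, Optional

# True = active damage seen, False = nothing found, None = inconclusive.
Check = Callable[[], Optional[bool]]


def timeboxed_assessment(
    checks: Iterable[Check],
    limit_seconds: float = 180.0,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    deadline = clock() + limit_seconds
    undetermined = False
    for check in checks:
        if clock() >= deadline:
            # Out of time with checks still pending: state is unknown.
            undetermined = True
            break
        result = check()
        if result is True:
            return "contain-first"
        if result is None:
            undetermined = True
    if undetermined:
        return "contain-first with network isolation"
    return "preserve-first"


print(timeboxed_assessment([lambda: False, lambda: False]))
```

Passing the clock as a parameter keeps the sketch testable without waiting three minutes.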

Containment blast radius

Not all containment actions are equal in their impact on evidence: network isolation, firewall blocks, and account suspension leave it intact, while a hard shutdown or reimage destroys it.

The escalation path matters too. When the first containment action is insufficient — network isolation doesn't stop a process that's encrypting local files — the responder escalates to the next level of containment while still minimising evidence destruction.

The escalation sequence for a Windows endpoint: network isolation (preserves all evidence) → process termination of the specific malicious process (preserves memory state of all other processes, loses the targeted process's memory) → hard shutdown (destroys all volatile evidence, preserves disk) → reimage (destroys everything). Each step destroys more evidence than the previous one. The triage responder uses the minimum containment action that stops the active damage.
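The ladder above lends itself to a first-match search: walk the actions from least to most destructive and stop at the first one that halts the damage. In this sketch the ladder entries come from the text, while the function name and the `stops_damage` callback are hypothetical stand-ins for the responder's judgement.

```python
from typing import Callable

# Ordered from least to most evidence destroyed.
ESCALATION_LADDER = [
    ("network isolation", "preserves all evidence"),
    ("process termination", "loses the targeted process's memory"),
    ("hard shutdown", "destroys all volatile evidence, preserves disk"),
    ("reimage", "destroys everything"),
]


def minimum_containment(stops_damage: Callable[[str], bool]) -> tuple[str, str]:
    """Return the first (action, evidence impact) that stops the active damage."""
    for action, impact in ESCALATION_LADDER:
        if stops_damage(action):
            return action, impact
    raise RuntimeError("no action on the ladder stops the damage")


# Example: ransomware encrypting local files, which isolation alone does not stop.
action, impact = minimum_containment(lambda a: a != "network isolation")
print(action, "->", impact)
```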

The decision matrix: active threat with evidence-preserving containment available → contain first using isolation or firewall block. Active threat with only evidence-destroying containment available → preserve first (capture memory and Tier 1 volatile evidence), then contain. Dormant threat → preserve first, then use evidence-preserving containment.
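Written as a function, the matrix has two inputs and three outcomes. This is a direct transcription of the three rules above into a sketch; the function and parameter names are hypothetical.

```python
def decide(active_threat: bool, preserving_containment_available: bool) -> tuple[str, str]:
    """Return the (first step, second step) for the given situation."""
    if not active_threat:
        # Dormant threat: the containment option does not change the order.
        return ("preserve", "contain with evidence-preserving action")
    if preserving_containment_available:
        return ("contain with isolation or firewall block", "preserve")
    return ("preserve memory and Tier 1 volatile evidence", "contain")


for active in (True, False):
    for preserving in (True, False):
        print(active, preserving, decide(active, preserving))
```

Both steps appear in every outcome; only their order and the choice of containment action change.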

The manager override

A non-security manager demands immediate system restoration before evidence is preserved: "Bring the server back online NOW." The triage responder pushes back with a factual explanation: restoring the server destroys evidence that can never be recovered, and a 15-minute delay for evidence capture is less disruptive than discovering during investigation that the evidence no longer exists. If the manager insists, document the override — who requested it, when, what evidence was at risk — and escalate to the CISO. At NE, Rachel's SOP gives evidence preservation priority over non-emergency restoration. Only the CISO can override that policy. This pre-approved authority eliminates the on-the-spot debate that costs minutes during an incident.

Triage Principle

The preservation decision is not "preserve vs contain" — it's "which one first." Both actions always happen. Active damage demands immediate containment, then preservation of what remains. Dormant threats permit preservation first, then containment. In both cases, choose containment actions that preserve evidence (network isolation, firewall blocks) over actions that destroy it (hard shutdown, reimage).

TR1.7 covers chain of custody and evidence integrity: the SHA256 hashing discipline, the chain-of-custody log template, evidence transfer procedures, and the documentation that makes evidence defensible for investigation, regulatory compliance, and legal proceedings. The 30 seconds spent hashing at collection time prevents the integrity questions that derail investigations days later.
