LX1.9 The Triage Decision Framework

3-4 hours · Module 1 · Free

Triage Decision Framework: What to Collect First and Why

Learning objective: Build the decision-making framework for prioritizing evidence collection under time pressure — which evidence to collect first when you cannot collect everything, how to assess the severity and urgency of an incident from initial indicators, and how to adapt the collection sequence based on the investigation environment, available access, and the attacker’s current activity.

The Triage Problem

In theory, you collect everything. In practice, you have constraints: the attacker may be active and you need to contain before they exfiltrate more data, the server is business-critical and must return to service within hours, the incident affects 15 servers and you are one investigator, or the container will restart in 3 minutes and you need to decide what to grab first.

Triage is the discipline of prioritizing evidence collection based on what is most volatile, most relevant to the investigation questions, and most at risk of destruction. An investigator who follows a rigid collection checklist regardless of circumstances collects evidence methodically but slowly. An investigator who triages effectively focuses on the evidence that answers the most critical questions first, then expands collection as time permits.

Decision Factor 1: Is the Attacker Currently Active?

This is the single most important triage factor. If the attacker is currently logged in (visible via who, w, or active SSH sessions in ss -tnp), the priority shifts dramatically toward capturing their current activity.
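The active-session check described above can be sketched as a small parser. This is a minimal illustration, not a definitive detector: it assumes the column layout that ss -tnp prints (State, Recv-Q, Send-Q, local address, peer address, process), and the function name, the sample output, and the addresses in it are hypothetical.

```python
from typing import Dict, List


def established_ssh_sessions(ss_output: str, ssh_port: int = 22) -> List[Dict[str, object]]:
    """Return established TCP sessions whose local port is the SSH port."""
    sessions = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Expected columns: State Recv-Q Send-Q Local:Port Peer:Port [Process]
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue  # skips the header row and non-established sockets
        local, peer = fields[3], fields[4]
        # rpartition splits on the last colon, so IPv6 forms like [::1]:22 survive.
        _, _, local_port = local.rpartition(":")
        if local_port != str(ssh_port):
            continue
        peer_addr, _, peer_port = peer.rpartition(":")
        sessions.append({
            "peer_addr": peer_addr,
            "peer_port": int(peer_port),
            "process": " ".join(fields[5:]),
        })
    return sessions


# Hypothetical sample in the layout printed by `ss -tnp` (not real case data).
SAMPLE = """\
State  Recv-Q Send-Q Local Address:Port  Peer Address:Port  Process
ESTAB  0      0      10.0.0.5:22         203.0.113.7:51514  users:(("sshd",pid=2211,fd=4))
ESTAB  0      0      10.0.0.5:44321      198.51.100.9:4444  users:(("bash",pid=2290,fd=3))
LISTEN 0      128    0.0.0.0:22          0.0.0.0:*          users:(("sshd",pid=801,fd=3))
"""

if __name__ == "__main__":
    for session in established_ssh_sessions(SAMPLE):
        print(session["peer_addr"], session["peer_port"], session["process"])
```

An established inbound SSH session only shows that someone is logged in. Whether that someone is the attacker still has to be judged against who and w output and the known administrator addresses.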

Attacker active (immediate priority): Capture the running process list with direct /proc reads (their tools, their reverse shells, their processes). Capture the network connection state (their C2 channels, their lateral movement connections). Capture the contents of /dev/shm and /tmp (their staged tools and data staged for exfiltration). Acquire memory if LiME is available (a single image preserves process, network, and in-memory state together). Only then move to log files and persistent evidence.
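The "/proc direct reads" step can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the returned field names are hypothetical, and the proc_root parameter exists only so the logic can be exercised against a test directory. It reads each numeric directory's cmdline file and exe symlink directly, without calling ps or any other binary on the suspect system.

```python
import os
from typing import Dict, List


def snapshot_processes(proc_root: str = "/proc") -> List[Dict[str, object]]:
    """Read PID, command line, and executable path straight from procfs."""
    snapshot = []
    for entry in sorted(os.listdir(proc_root)):
        if not entry.isdigit():  # only numeric entries are processes
            continue
        pid_dir = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(pid_dir, "cmdline"), "rb") as fh:
                raw = fh.read()
        except OSError:
            continue  # process exited mid-scan or access was denied
        # Arguments are NUL-separated in /proc/<pid>/cmdline.
        cmdline = raw.rstrip(b"\0").replace(b"\0", b" ").decode(errors="replace")
        try:
            exe = os.readlink(os.path.join(pid_dir, "exe"))
        except OSError:
            exe = ""  # kernel thread, or insufficient privilege to read the link
        snapshot.append({
            "pid": int(entry),
            "cmdline": cmdline,
            "exe": exe,
            # The kernel appends " (deleted)" when the binary was unlinked
            # after the process started.
            "exe_deleted": exe.endswith(" (deleted)"),
        })
    return snapshot
```

On a live system this would be run as root with the output written to the forensic workstation rather than the suspect disk. A process whose exe link ends in " (deleted)" is worth an early look, since removing a binary after launch is a common way to hide a tool.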

Attacker not active or status unknown (standard priority): Follow the collection sequence from LX1.5 in order. Volatile evidence still comes first, but without the extreme urgency: you have minutes to hours rather than seconds to minutes.

Decision Factor 2: What Is the Investigation Question?

Different investigation questions require different evidence prioritization. If you know the incident type from the initial alert, you can focus collection on the evidence sources most relevant to that type.

Credential compromise (SSH brute force, stolen credentials): Priority evidence: auth.log / secure (authentication events), wtmp and btmp (successful and failed login records, respectively), .ssh/authorized_keys (persistence), lastlog (most recent login per account). The filesystem and network state are secondary; the authentication logs tell the story.

Web application compromise (web shell, SQLi, RCE): Priority evidence: web server access and error logs (the exploitation request), the web root directory (web shell files), the process tree (reverse shell parent/child relationships), /tmp and /dev/shm (staged payloads). Authentication logs are secondary — the attacker may not have authenticated through SSH at all.

Cryptomining: Priority evidence: running process list with /proc direct reads (the miner process and its command line), network connections (mining pool connections), CPU utilization data. Log files and filesystem artifacts are secondary — the miner is running now and the evidence is in the process and network state.

Ransomware: Priority evidence: memory dump (encryption keys may still be in process memory), filesystem state (encrypted files, ransom notes, encryption binary), the process tree (the encryption process may still be running). Immediate containment (network isolation) takes priority over collection: every minute the system stays connected, the ransomware can reach and encrypt more files on network shares and other hosts.

Data exfiltration: Priority evidence: network connections (active exfiltration channels), bash history (exfiltration commands), auditd file access records (what files were read), /tmp and /dev/shm (staged data archives). Time-sensitive: the data may be leaving the network right now.
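The incident-type priorities above can be captured as a lookup table so the collection order is decided before the alert arrives. This is a sketch only: the key names, the function name, and the fallback for unknown incident types are assumptions made for illustration, and the evidence labels simply paraphrase the lists above in the order the text gives them.

```python
from typing import Dict, List

# Evidence sources per incident type, in the order listed in the text above.
PRIORITY_EVIDENCE: Dict[str, List[str]] = {
    "credential_compromise": [
        "auth.log / secure",
        "wtmp and btmp",
        ".ssh/authorized_keys",
        "lastlog",
    ],
    "web_application_compromise": [
        "web server access and error logs",
        "web root directory",
        "process tree",
        "/tmp and /dev/shm",
    ],
    "cryptomining": [
        "running process list (/proc direct reads)",
        "network connections",
        "CPU utilization data",
    ],
    "ransomware": [
        "memory dump",
        "filesystem state",
        "process tree",
    ],
    "data_exfiltration": [
        "network connections",
        "bash history",
        "auditd file access records",
        "/tmp and /dev/shm",
    ],
}

# Assumption: when the incident type is unknown, fall back to a
# volatile-first order (processes, network, temp directories, then logs).
VOLATILE_FIRST: List[str] = [
    "running process list (/proc direct reads)",
    "network connections",
    "/tmp and /dev/shm",
    "log files and persistent evidence",
]


def collection_plan(incident_type: str) -> List[str]:
    """Return the ordered evidence list for an incident type."""
    return PRIORITY_EVIDENCE.get(incident_type, VOLATILE_FIRST)


if __name__ == "__main__":
    for step, source in enumerate(collection_plan("cryptomining"), start=1):
        print(step, source)
```

Keeping the table as data rather than branching logic makes it easy to review with the team and to extend when a new incident type appears.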

Decision Factor 3: How Many Systems Are Affected?

A single compromised server gets the full collection treatment — all phases, all evidence types. But what about 15 compromised servers? Or 50 containers in a Kubernetes cluster?

Multi-system triage: Run UAC ir_triage (fast profile) on every affected system first. This captures the most critical evidence from all systems in a fraction of the time a full collection takes. After triage data is secured from all systems, return to the highest-priority systems for comprehensive collection (memory, full UAC, disk imaging).

The worst outcome in a multi-system incident is spending 3 hours on a full collection of the first server while evidence is being destroyed on the other 14. The better outcome: 15 minutes of triage on each system (15 × 15 minutes = 3.75 hours in total when done one after another), then full collection on the systems where the triage data revealed the most significant compromise.
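The arithmetic behind that comparison can be checked with a short calculation. This sketch assumes one investigator working through the systems strictly one after another, using the 15-minute triage and 3-hour full-collection figures from the text; the function name is hypothetical.

```python
from typing import Dict


def sequential_schedule(systems: int, minutes_per_system: float) -> Dict[str, float]:
    """Total effort and the delay before the last system is even touched."""
    total_minutes = systems * minutes_per_system
    # The last system waits while every system before it is processed.
    last_wait_minutes = (systems - 1) * minutes_per_system
    return {
        "total_hours": total_minutes / 60,
        "last_system_wait_hours": last_wait_minutes / 60,
    }


if __name__ == "__main__":
    triage_first = sequential_schedule(systems=15, minutes_per_system=15)
    full_first = sequential_schedule(systems=15, minutes_per_system=180)
    print(triage_first)  # {'total_hours': 3.75, 'last_system_wait_hours': 3.5}
    print(full_first)    # {'total_hours': 45.0, 'last_system_wait_hours': 42.0}
```

With triage first, the last of the 15 systems waits 3.5 hours before any evidence is taken from it. With full collection first, it waits 42 hours, which is the case against working through the systems one full collection at a time.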

Decision Factor 4: What Access Do You Have?

Your available access determines which collection methods are feasible:

SSH access only (most common): Full live response, UAC, remote disk imaging. Memory acquisition requires transferring a pre-compiled LiME module. All evidence streams to your forensic workstation over SSH.

Cloud console only (no SSH): Disk snapshot via API, cloud audit trail collection, security group and IAM configuration export. No live volatile collection — you cannot run commands on the system. If the investigation requires volatile evidence, you must arrange SSH access first.

kubectl only (container clusters): Pod-level collection only. No host access. If container escape is suspected, you need node-level SSH access from the cluster administrator.

Physical access only (air-gapped environments): USB-based collection. LiME from USB. UAC from USB with output to USB. Disk imaging with write blocker. No remote streaming.
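The access-to-method mapping above can be sketched as a small feasibility check that reports which required collection methods are out of reach with the access currently held. The enum members, method labels, and function names are hypothetical, and the method lists only restate the text above.

```python
from enum import Enum
from typing import Dict, Iterable, List, Set


class Access(Enum):
    SSH = "ssh"
    CLOUD_CONSOLE = "cloud_console"
    KUBECTL = "kubectl"
    PHYSICAL = "physical"


# Collection methods each access type makes feasible, per the text above.
FEASIBLE: Dict[Access, Set[str]] = {
    Access.SSH: {
        "live response", "UAC", "remote disk imaging",
        "memory acquisition (pre-compiled LiME)",
    },
    Access.CLOUD_CONSOLE: {
        "disk snapshot via API", "cloud audit trail",
        "security group and IAM export",
    },
    Access.KUBECTL: {"pod-level collection"},
    Access.PHYSICAL: {
        "USB-based collection", "memory acquisition (LiME from USB)",
        "UAC", "disk imaging with write blocker",
    },
}


def feasible_methods(access: Iterable[Access]) -> Set[str]:
    """Union of the methods available across all access types held."""
    methods: Set[str] = set()
    for mode in access:
        methods |= FEASIBLE[mode]
    return methods


def missing_methods(required: Iterable[str], access: Iterable[Access]) -> List[str]:
    """Required methods that the current access cannot support."""
    available = feasible_methods(access)
    return [method for method in required if method not in available]


if __name__ == "__main__":
    # Cloud console only: volatile collection is the gap.
    print(missing_methods(["live response", "disk snapshot via API"],
                          [Access.CLOUD_CONSOLE]))
```

A non-empty result is the signal to request additional access, such as SSH or node-level access from the cluster administrator, before the evidence in question is lost.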

Try it: Run a triage scenario. Set a timer for 10 minutes. Scenario: WEBSRV-NGE01 is showing suspicious outbound connections to an unknown IP on port 4444. The attacker may be active. You have SSH access. What do you collect in those 10 minutes? Write down your first 5 actions in order, with the exact commands. After the timer, review: did you capture the most volatile evidence first? Did you identify the attacker’s process? Did you capture the C2 connection details? This exercise builds the muscle memory for real-time triage decisions.

Beyond This Investigation

The triage framework applies to every scenario module. LX4–LX13 each begin with an initial alert and a time-constrained collection phase. The triage decisions you make in that initial phase determine the quality of evidence available for the analysis that follows. Investigators who triage effectively end up with more evidence, better evidence, and faster investigations.

Check your understanding:

The attacker is currently active on a compromised server. You have SSH access and a pre-compiled LiME module. What are your first three collection actions?
You are the sole investigator for 8 compromised Linux servers. What collection strategy do you use, and why?
A web server compromise has been detected through an alert on outbound connections. You do not know the specific incident type yet. What is your initial evidence collection priority?
You have cloud console access but no SSH access to a compromised AWS EC2 instance. What evidence can you still collect, and what is the most critical gap?
