LX0.3 The Volatility Problem: Why Collection Order Determines Investigation Success
Evidence that disappears
On Windows, most forensic evidence is persistent. The registry survives reboots. Prefetch files survive reboots. The Event Log survives reboots. The MFT survives reboots. An investigator who images a Windows system three days after a compromise will recover most of the evidence — the same artifacts that existed during the incident still exist on the disk.
On Linux, critical evidence categories are ephemeral by design. They exist only while the system is running, only while a process is active, or only until the next log rotation cycle. An investigator who waits three days to begin collection may find that the most important evidence no longer exists — not because the attacker destroyed it, but because the system’s normal operations did.
/proc is entirely volatile. Every file in /proc is generated by the kernel in real time. When a process terminates, its /proc/[pid]/ directory vanishes — along with the command line arguments, the executable path, the memory maps, the file descriptors, the network connections, and the environment variables. If the attacker’s cryptominer is killed before you read /proc/[pid]/exe, you cannot recover the binary through this method.
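Because a process's /proc entries vanish the instant it exits, they should be snapshotted the moment a suspect PID is identified. The following is a minimal sketch, not a full triage tool: the PID argument and output directory are placeholders, and it defaults to the current shell purely for demonstration.

```shell
#!/bin/sh
# Sketch: snapshot the volatile /proc entries for one process before it exits.
# PID and OUT are placeholders -- point them at the suspect process and your
# evidence directory. Defaults to this shell's own PID for demonstration.
PID="${1:-$$}"
OUT="${2:-/tmp/proc-snapshot-$PID}"
mkdir -p "$OUT"

tr '\0' ' ' < "/proc/$PID/cmdline"  > "$OUT/cmdline"   # full command line
readlink "/proc/$PID/exe"           > "$OUT/exe_path"  # binary path ("(deleted)" if unlinked)
cp "/proc/$PID/exe" "$OUT/exe.bin"  2>/dev/null        # recover the binary itself, even if deleted on disk
cat "/proc/$PID/maps"               > "$OUT/maps"      # memory mappings
ls -l "/proc/$PID/fd"               > "$OUT/fds"       # open file descriptors
tr '\0' '\n' < "/proc/$PID/environ" > "$OUT/environ" 2>/dev/null  # environment (root needed for other users' processes)
```

The cp of /proc/[pid]/exe is the recovery path the paragraph above describes: it works even after the attacker unlinks the binary from disk, because the kernel keeps the file open as long as the process runs.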
Memory is volatile. Everything in RAM — running process state, decrypted data, session tokens, network connection state, kernel module lists — vanishes on reboot. On Linux, memory acquisition is harder than on Windows because there is no built-in memory dump capability equivalent to Windows crash dumps. You must load a kernel module (LiME) onto the running system, and that module must be pre-compiled for the exact kernel version.
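The practical consequence is that the kernel version must be recorded before anything else, because the LiME module is useless unless it matches it exactly. A sketch of the acquisition steps follows; the build and load commands require root and a matching kernel, so they are shown as comments, and the /mnt/usb output path is a placeholder (the dump must go to external media, never the evidence disk).

```shell
#!/bin/sh
# Sketch: memory acquisition with LiME. The module must be compiled for the
# exact running kernel, so record that version first.
KVER="$(uname -r)"
echo "kernel: $KVER -- lime.ko must be built against exactly this version"

# Build on a matching reference system rather than the evidence host when possible:
#   git clone https://github.com/504ensicsLabs/LiME && make -C LiME/src
# Load the module to dump RAM (path= is the output file, format=lime is the standard format):
#   insmod ./lime.ko "path=/mnt/usb/mem.lime format=lime"
#   rmmod lime
```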
/dev/shm is always volatile. This is a RAM-backed filesystem (tmpfs). Nothing written to /dev/shm ever touches the disk. On reboot, everything is gone. Attackers specifically use /dev/shm because they know it will not appear in disk forensics — if you image the disk without first collecting /dev/shm, you lose whatever the attacker stored there.
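Collecting /dev/shm is therefore a pre-imaging step, not an afterthought. A minimal sketch, assuming a writable output location (the archive path here is a placeholder): record the listing first so names, sizes, and timestamps are preserved even if archiving a file fails, then archive and hash.

```shell
#!/bin/sh
# Sketch: preserve /dev/shm (RAM-backed, lost on reboot) before any disk imaging.
# SRC and OUT are placeholders for this example.
SRC="${SRC:-/dev/shm}"
OUT="${OUT:-/tmp/devshm-$(date +%Y%m%d%H%M%S).tar.gz}"

ls -laR "$SRC"                           # record names, sizes, timestamps first
tar -czf "$OUT" -C "$SRC" . 2>/dev/null  # then archive the contents
sha256sum "$OUT"                         # hash the archive for the evidence log
```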
/tmp is volatile on modern distributions. Systems using systemd-tmpfiles or tmpfs for /tmp clear the directory on reboot. Files the attacker staged in /tmp — downloaded payloads, compiled exploits, exfiltrated data archives — are gone after a reboot.
Network connections are volatile. The current network connection state exists only in kernel memory. ss -tnp shows active connections right now. Once a connection is closed or the system rebooted, the state is gone.
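Capturing that in-kernel state takes seconds and should happen before any containment action resets connections. A sketch (the output file is a placeholder): it prefers ss, and falls back to the raw kernel tables in /proc/net that ss itself reads, which is useful on minimal systems where the tool is absent.

```shell
#!/bin/sh
# Sketch: capture current network connection state before it changes.
# OUT is a placeholder evidence file.
OUT="${OUT:-/tmp/netstate.txt}"

if command -v ss >/dev/null 2>&1; then
    ss -tnp  >  "$OUT"      # established TCP connections with owning processes (root sees all PIDs)
    ss -unp  >> "$OUT"      # UDP sockets
    ss -tln  >> "$OUT"      # listening TCP ports
else
    # Fallback: the raw kernel socket tables (hex addresses, but complete)
    cat /proc/net/tcp /proc/net/udp > "$OUT"
fi
wc -l "$OUT"
```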
The order of volatility
The order of volatility defines the sequence in which evidence should be collected — most volatile first, least volatile last. On Linux that sequence runs: memory and kernel state, then network connection state, then /proc process data, then RAM-backed filesystems (/dev/shm and tmpfs /tmp), then the disk image, and finally logs, remote copies, and backups. Collecting in the wrong order means evidence is destroyed by the time you get to it.
What routine operations destroy
The volatility problem is not just about reboots. Routine system operations destroy evidence continuously on a running Linux system.
Log rotation is the most significant evidence destroyer. The logrotate job runs daily (from cron.daily, typically at 06:25, or from the logrotate.timer systemd unit) and rotates log files based on size and age policies. A typical configuration rotates auth.log weekly and keeps 4 rotated copies, which gives you roughly four to five weeks of authentication history.
systemd journal cleanup runs automatically based on the policies in /etc/systemd/journald.conf. The SystemMaxUse setting limits total journal size. When the limit is reached, the oldest journal entries are deleted. On busy servers, the journal may retain only a few days of history.
tmpfiles cleanup removes files from /tmp based on age. The systemd-tmpfiles-clean.timer runs daily and removes files older than the threshold specified in /usr/lib/tmpfiles.d/tmp.conf (default: 10 days for /tmp).
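All three destruction policies above can be read directly from configuration. A tolerant sketch, assuming common Debian/Ubuntu file locations (paths differ by distribution, and the report file is a placeholder):

```shell
#!/bin/sh
# Sketch: enumerate the cleanup policies that silently destroy evidence.
# Paths are common Debian/Ubuntu defaults and may differ on other systems.
OUT="${OUT:-/tmp/cleanup-policies.txt}"
{
    echo "== logrotate policy covering auth.log (interval, copies kept) =="
    grep -h -A5 'auth' /etc/logrotate.d/rsyslog /etc/logrotate.conf 2>/dev/null || echo "(not found)"

    echo "== journald size/retention limits =="
    grep -hE 'SystemMaxUse|MaxRetentionSec' /etc/systemd/journald.conf 2>/dev/null || echo "(not found / defaults in effect)"

    echo "== tmpfiles age threshold for /tmp =="
    grep -h '/tmp' /usr/lib/tmpfiles.d/tmp.conf 2>/dev/null || echo "(not found)"
} > "$OUT"
cat "$OUT"
```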
Worked artifact — Retention assessment template:
Complete this assessment at the start of every investigation. It tells you how far back your evidence reaches and identifies gaps before you waste time looking for evidence that has already been rotated out.
Case: INC-2026-XXXX System: [hostname]
Log retention:
- auth.log rotation: weekly / daily, copies retained: ___, history window: ~___ days
- Journal disk usage: ___MB, max configured: ___MB, estimated retention: ~___ days
- audit.log rotation: ___ copies, size limit: ___MB per file
Volatile state:
- /tmp mount type: tmpfs / disk-backed, cleanup policy: ___ days
- /dev/shm contents: ___ files (___ total size) — COLLECTED: ☐ YES ☐ NO
- /proc enumeration: ___ processes captured — COLLECTED: ☐ YES ☐ NO
- Network state (ss -tnp): COLLECTED: ☐ YES ☐ NO
- Memory (LiME): COLLECTED: ☐ YES ☐ NO ☐ NOT AVAILABLE (reason: ___)
Compromise timeline vs retention: Estimated compromise start: ___ | Log history reaches: ___
- If compromise predates log retention: ☐ Check SIEM/remote logs ☐ Check wtmp ☐ Check journal
Assessment: Evidence is / is not sufficient to cover the compromise window. Gaps: ___
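The numbers the template asks for can be gathered in one pass. A minimal sketch, assuming Debian/Ubuntu-style log paths (RHEL-family systems use /var/log/secure, and the report path is a placeholder); journalctl may be absent on some systems, hence the fallbacks.

```shell
#!/bin/sh
# Sketch: gather the values the retention assessment template asks for.
# Paths and tools are common defaults; adjust per distribution.
OUT="${OUT:-/tmp/retention-assessment.txt}"
{
    echo "== auth.log rotated copies on disk =="
    ls -l /var/log/auth.log* 2>/dev/null || echo "(no auth.log -- check /var/log/secure on RHEL-family)"

    echo "== journal disk usage =="
    journalctl --disk-usage 2>/dev/null || echo "(journalctl not available)"

    echo "== /tmp mount type (tmpfs means lost on reboot) =="
    findmnt -n -o FSTYPE /tmp 2>/dev/null || df -T /tmp 2>/dev/null || echo "(unknown)"

    echo "== /dev/shm contents =="
    ls -la /dev/shm 2>/dev/null

    echo "== oldest surviving auth log (bounds the history window) =="
    ls -lt /var/log/auth.log* 2>/dev/null | tail -1
} > "$OUT"
cat "$OUT"
```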
The collection decision: live response vs power off
Every Linux investigation presents a fundamental decision: do you collect evidence from the live system (preserving volatile data but risking evidence modification), or do you power off and image the disk (preserving disk state but losing all volatile data)?
Choose live response when: the investigation requires volatile evidence (running processes, active network connections, memory contents, /dev/shm or /tmp contents), the server is business-critical and cannot be taken offline, or you suspect a rootkit (memory forensics may be the only way to detect it). Live response is the default for cloud and container investigations.
Choose power off when: evidence integrity is paramount (legal proceedings, law enforcement involvement), you have no immediate need for volatile data, or the system is compromised to the point where live commands cannot be trusted (a kernel rootkit makes all userspace tools unreliable).
The hybrid approach (the recommended default): collect volatile evidence first (memory dump, /proc snapshot, network state, /tmp and /dev/shm contents, running process list), then either image the disk live or power off and image from external media. This preserves both volatile and persistent evidence.
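The hybrid approach's collection order can be sketched as a script skeleton. This is an illustration of the sequence, not a hardened collection tool: the evidence directory is a placeholder, the memory step assumes a pre-built LiME module and is shown as a comment, and each step falls back gracefully where a tool may be missing.

```shell
#!/bin/sh
# Sketch of the hybrid collection order -- volatile first, persistent second.
# EVID is a placeholder evidence directory (in practice, external media).
EVID="${EVID:-/tmp/evidence-$(date +%s)}"
mkdir -p "$EVID"

# 1. Memory (most volatile) -- requires a LiME module built for this kernel:
#    insmod ./lime.ko "path=$EVID/mem.lime format=lime"

# 2. Network connection state
cat /proc/net/tcp /proc/net/udp > "$EVID/netstate.txt" 2>/dev/null

# 3. Running processes (/proc enumeration; fall back to a bare PID listing)
ps axwwo pid,ppid,user,args > "$EVID/pslist.txt" 2>/dev/null || ls /proc > "$EVID/pslist.txt"

# 4. RAM-backed and temp filesystems (exclude our own evidence directory)
tar -czf "$EVID/devshm.tar.gz" -C /dev/shm . 2>/dev/null
tar -czf "$EVID/tmp.tar.gz" --exclude "evidence-*" -C /tmp . 2>/dev/null

# 5. Current login state (in-memory utmp)
who > "$EVID/who.txt" 2>/dev/null

ls -l "$EVID"        # verify each step produced output before proceeding to disk imaging
```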
Myth: “We should immediately shut down a compromised server to preserve evidence.”
Reality: Shutting down a Linux server destroys all volatile evidence — running processes, memory contents, network connections, /dev/shm contents, and current login state. The correct first step is to collect volatile evidence, then decide whether to image live or power off. Immediate shutdown is the worst possible first response for a Linux investigation because the evidence destroyed by shutdown (memory, processes, connections) is often the most valuable for understanding what the attacker was doing at the moment of discovery. The only exception: active ransomware encryption where every second of uptime means more files encrypted.
Decision points: what determines collection urgency
The urgency of volatile collection depends on three factors:
Is the attacker currently active? If the attacker is logged in (visible in who or w), their processes and connections are live evidence that could disappear at any moment — if they detect the investigation and disconnect, their session state is lost. Collection urgency: immediate.
Is a reboot imminent? If the IT team is about to reboot, if a kernel panic is occurring, or if a cloud auto-healing mechanism is about to terminate the instance, you have minutes to collect volatile data. Collection urgency: critical.
Has the attacker deployed anti-forensic triggers? Some malware monitors for investigator activity (new SSH connections, specific commands) and triggers evidence destruction when detected. If you suspect this, memory acquisition should be the absolute first action — before any commands that the malware might intercept.
Troubleshooting: evidence already lost
The server was rebooted before you were called. All volatile evidence (memory, /proc, /dev/shm, network state, utmp) is permanently gone. Focus on persistent evidence: log files, filesystem timestamps, configuration files, disk image. The journal may retain events from before the reboot. wtmp survives reboots and shows the login history including the session that was active when the reboot occurred.
Log rotation ran overnight and deleted the oldest log file. The oldest rotated copy is gone. Check if the events exist in the systemd journal (separate retention). Check if logs were forwarded to a SIEM or remote syslog server. Check wtmp for authentication events that overlap the lost log period.
/dev/shm was collected but is empty. The attacker may have cleaned up before disconnecting, or the staging files may have been in /tmp instead. Check /tmp (may or may not be tmpfs). Check for evidence of cleanup in bash history or auditd.
Memory acquisition failed because LiME was not pre-compiled. Document the gap. Proceed to /proc enumeration as the best available volatile evidence. The /proc data is not as complete as a full memory dump (you cannot detect kernel rootkits, recover encryption keys, or analyze process memory) but it captures running processes, network connections, and open files.
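For the rebooted-server scenario at the top of this list, the sources that survive a reboot can be checked in seconds. A tolerant sketch (the output path is a placeholder; last and journalctl may be absent on minimal systems, hence the fallbacks):

```shell
#!/bin/sh
# Sketch: what still survives a reboot -- login history from wtmp, and
# pre-reboot events if the journal is persistent.
OUT="${OUT:-/tmp/post-reboot-triage.txt}"
{
    echo "== login history (wtmp survives reboots; includes the pre-reboot session) =="
    last -F 2>/dev/null || echo "(last/wtmp not available)"

    echo "== previous boot's journal (only present if Storage=persistent) =="
    journalctl -b -1 -n 20 2>/dev/null || echo "(no persistent journal for prior boot)"
} > "$OUT"
cat "$OUT"
```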
Try it: Check the log retention on a Linux system you manage. Run the retention assessment commands from the block above. Calculate: based on the rotation policy and journal settings, how many days of log history do you have? If a compromise began 30 days ago, would you have the authentication evidence from day one? If not, where else would you look? Complete the retention assessment template for one of your systems — you are building the muscle memory for the first 5 minutes of every future investigation.
Beyond this investigation
The volatility hierarchy in this subsection defines the collection sequence you will follow in LX1 (Evidence Collection and Triage). Every scenario module (LX4–LX13) assumes evidence was collected in this order — volatile evidence first, persistent evidence second. When you reach LX12 (Memory Forensics), you will work with memory dumps that were acquired as the first step of collection, before any other investigation activity modified the system state.
Check your understanding:
- A server was rebooted by the IT team before the security team was notified. Which evidence sources are now permanently lost, and which survived the reboot?
- Log rotation deleted auth.log.4.gz overnight. What is the maximum age of authentication events you can recover from the remaining log files (assuming weekly rotation with 4 copies)?
- Why is /dev/shm a preferred staging location for attackers, and what must an investigator do to collect evidence from it before it is lost?
- You need to investigate a compromised production web server that cannot be taken offline. What collection approach do you use and in what order?