LX0.4 Three Investigation Environments
The Three Environments Where Linux IR Happens
Why the Environment Changes Everything
The Linux operating system is the same regardless of where it runs. An ext4 filesystem on a bare-metal server has the same inode structure as an ext4 filesystem on an AWS EC2 instance. The auth.log format is identical whether the server is in your data centre or in Azure. The commands you run during an investigation are the same.
What changes is how you access the system, how you collect evidence, what additional evidence sources are available, and what evidence is absent. A bare-metal server has a physical disk you can image with a write blocker. A cloud VM has a virtual disk you can snapshot through the provider’s API without ever logging into the system. A container has an ephemeral filesystem that is destroyed and recreated every time the pod restarts — there may be no disk to image at all.
Understanding these differences before the investigation begins prevents wasted effort. An investigator who spends 30 minutes trying to figure out how to “image the disk” of a Kubernetes pod is 30 minutes behind the investigator who knows that container evidence collection requires kubectl cp and docker export before the pod is terminated.
Environment 1: Bare-Metal Servers
Bare-metal servers — physical machines running Linux directly on hardware — are the traditional investigation environment and the one most closely resembling Windows forensics. The server has a physical disk (or RAID array) containing the operating system, application data, and log files. Evidence collection follows the classical forensic workflow: acquire memory with LiME, acquire the disk with dd or dc3dd, verify hashes, and analyze offline.
What you have that other environments lack: physical access to the hardware. You can power off the server and attach the disk to a forensic workstation with a write blocker — the gold standard for evidence integrity. You can acquire memory by loading LiME from a USB drive rather than over the network. You can examine the BIOS/UEFI boot configuration for firmware-level persistence. You can check for physical devices (USB keyloggers, rogue network adapters) connected to the server.
What bare-metal lacks: cloud audit trails. There is no CloudTrail, no Activity Log, no Cloud Audit Log recording API calls against the server. The only evidence of what happened is on the server itself and in whatever network monitoring captured traffic to and from it. If the attacker deleted the logs on the server, you have no external copy unless logs were forwarded to a SIEM.
Collection approach: For high-severity incidents requiring evidence integrity (legal proceedings, law enforcement), collect memory with LiME, then power off and image the disk with a write blocker. For lower-severity incidents or business-critical servers that cannot go offline, perform a live collection: memory dump, /proc snapshot, volatile data collection, then live disk image using dc3dd if=/dev/sda | ssh forensics@workstation "dd of=/cases/case001/disk.raw".
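The live-collection pipeline above can be rehearsed safely before you ever touch a suspect disk. The sketch below is illustrative, not a standard tool: `live_image` is a name invented for this example, and in a real case the source would be the block device (or LVM volume) and the destination an SSH pipe to the forensic workstation as shown in the text. It copies and hashes the stream in one pass, then re-hashes the written image to verify integrity.

```shell
# Hedged sketch of live imaging with inline hashing. live_image is an
# illustrative helper, not a standard forensic tool. It works on any
# readable file, so it can be rehearsed without touching a live disk.
live_image() {
  src="$1"; dest="$2"
  [ -r "$src" ] || { echo "cannot read $src" >&2; return 1; }
  # One pass: copy the stream and hash it simultaneously.
  dd if="$src" bs=1M 2>/dev/null | tee "$dest" | sha256sum | awk '{print $1}' > "$dest.sha256"
  # Re-hash the written copy; a mismatch means the image is unusable as evidence.
  written=$(sha256sum "$dest" | awk '{print $1}')
  streamed=$(cat "$dest.sha256")
  if [ "$written" = "$streamed" ]; then
    echo "hash verified: $written"
  else
    echo "HASH MISMATCH"
  fi
}

# Real-case usage (do NOT run against a live disk without authorization):
# live_image /dev/mapper/vg0-root /mnt/evidence/disk.raw
```

Hashing the stream as it is acquired, rather than afterwards, means any corruption introduced in transit is caught immediately while the source is still available.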
Northgate Engineering example: DBSRV-NGE01 is a bare-metal Dell PowerEdge running Ubuntu 22.04 with a RAID-10 array. It hosts the customer database (PostgreSQL). A compromise of this server requires physical access to the data centre for write-blocked disk acquisition. Memory acquisition uses LiME loaded from a USB drive. The RAID configuration means the disk image must be acquired at the logical volume level (/dev/mapper/vg0-root), not the physical disk level.
Environment 2: Cloud Virtual Machines
Cloud VMs (AWS EC2, Azure VMs, GCP Compute Engine) run Linux on virtualized hardware managed by the cloud provider. The investigation approach is fundamentally different from bare-metal because you have two evidence planes: the VM itself (containing the same Linux artifacts as a bare-metal server) and the cloud control plane (containing API audit trails that record every action taken against the VM through the provider’s management layer).
What you have that bare-metal lacks: cloud audit trails. AWS CloudTrail records every API call made against your AWS account — who launched the instance, who modified the security group, who created a new IAM role. Azure Activity Log records the same for Azure resources, and GCP Cloud Audit Logs do the same for GCP. These logs exist outside the VM and cannot be tampered with by an attacker who only has access to the VM. They survive VM termination. They often provide the first evidence of how the attacker gained access — particularly for attacks that exploited cloud-native vulnerabilities like SSRF-based metadata service abuse (the attacker calls the instance metadata service at 169.254.169.254 to steal IAM credentials, then uses those credentials to access other cloud resources).
What cloud VMs lack: physical access. You cannot attach a write blocker to a virtual disk. You cannot plug a USB drive into a VM to load LiME. Evidence collection uses the cloud provider’s API: create a disk snapshot (which creates a point-in-time copy of the virtual disk), then attach the snapshot to a forensic analysis VM as a secondary volume. Memory acquisition is harder — most cloud providers do not expose hypervisor-level memory dump capability to customers. You must install LiME on the running VM (which requires SSH access to a potentially compromised system) or use Volatility 3’s live memory access through /dev/mem or /proc/kcore (which is restricted on most hardened kernels).
The metadata service attack pattern: The cloud-specific attack you will encounter most frequently is SSRF (Server-Side Request Forgery) targeting the instance metadata service at http://169.254.169.254/latest/meta-data/. If a web application running on the VM has an SSRF vulnerability, the attacker can extract the IAM role credentials assigned to the instance, then use those credentials from their own infrastructure to access S3 buckets, databases, secrets managers, and other cloud resources. The evidence of this attack is split: the SSRF request appears in the web application logs on the VM, while the credential use appears in CloudTrail/Activity Log in the cloud control plane. Correlating across both evidence planes is essential — and is the focus of LX10 (Cloud VM Compromise).
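The VM-side half of this evidence can be hunted with a simple log search. This is a minimal sketch: `hunt_metadata_ssrf` is an illustrative name, and the patterns assume the SSRF target URL appears unencoded in the access log — extend them for URL-encoded variants and your application's log format.

```shell
# Hedged sketch: search a web access log for SSRF attempts against the
# instance metadata service. Patterns and function name are illustrative;
# URL-encoded payloads will need additional patterns.
hunt_metadata_ssrf() {
  grep -nE '169\.254\.169\.254|/latest/meta-data|security-credentials' "$1"
}

# Example (log path is an assumption; adjust to your web server):
# hunt_metadata_ssrf /var/log/nginx/access.log
```

A hit here tells you the SSRF happened; what the attacker did with the stolen credentials must then be pulled from CloudTrail or the equivalent control-plane log.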
Collection approach: Snapshot the disk through the cloud provider’s API (no need to log into the VM — this preserves the disk state without modifying it). Collect CloudTrail/Activity Log/Cloud Audit Logs for the time window. If memory evidence is needed, SSH into the VM and acquire memory with LiME (accepting that this modifies the system). Collect the instance metadata, security group configuration, IAM role permissions, and network flow logs through the API.
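The cloud-side sequence can be captured as a reviewable script. This sketch uses the AWS CLI (equivalent calls exist for Azure and GCP); the volume ID, instance ID, and time window are placeholders, and `DRYRUN=1` (the default here) prints each command instead of executing it, so the order can be audited before touching a live account.

```shell
# Hedged sketch of control-plane acquisition order for AWS. All IDs and
# timestamps are placeholders. DRYRUN=1 (default) prints commands only.
DRYRUN="${DRYRUN:-1}"
run() { if [ "$DRYRUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

VOLUME_ID="vol-0123456789abcdef0"     # placeholder
INSTANCE_ID="i-0123456789abcdef0"     # placeholder

# 1. Point-in-time disk copy -- no login to the VM required.
run aws ec2 create-snapshot --volume-id "$VOLUME_ID" \
    --description "INC-2026-XXXX evidence snapshot"
# 2. Control-plane audit trail for the incident window (placeholder dates).
run aws cloudtrail lookup-events \
    --start-time 2026-01-01T00:00:00Z --end-time 2026-01-02T00:00:00Z
# 3. Instance configuration as the control plane sees it.
run aws ec2 describe-instances --instance-ids "$INSTANCE_ID"
```

Snapshotting first matters: it freezes the disk state before any live response activity (SSH login, LiME load) modifies it.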
Environment 3: Containers
Containers (Docker, Kubernetes, Podman) are the most challenging investigation environment because the fundamental assumption of forensics — that evidence persists until collected — does not hold. A container’s filesystem is ephemeral. When a Kubernetes pod is terminated and rescheduled, the new pod starts from the base image with no knowledge of what happened in the previous instance. The attacker’s files, the modified configurations, the bash history, the logs — all gone.
What makes container forensics different: the filesystem is layered. A Docker container starts from a base image (read-only layers) and adds a writable layer on top. Any files the attacker creates or modifies exist only in the writable layer. If you can capture the writable layer before the container is destroyed (docker export or docker diff), you have the attacker’s modifications isolated from the base image — a cleaner evidence set than a bare-metal investigation where attacker files are mixed with hundreds of thousands of system files.
What containers lack: persistence across restarts and traditional log file retention — container logs are typically streamed to stdout/stderr and collected by the container runtime, not written to files inside the container. The /proc filesystem is also namespace-isolated: each container sees only its own processes, not the host's. Memory forensics is especially difficult — you cannot run LiME inside a container (it requires kernel module loading, which is a host-level operation), and acquiring host memory captures all containers mixed together.
The Kubernetes evidence plane: Kubernetes adds an orchestration layer above the container runtime. The Kubernetes audit log records API calls: who created the pod, who executed into it (kubectl exec), who modified the deployment, who changed RBAC roles. The audit log is the Kubernetes equivalent of CloudTrail — it exists outside the container and survives container destruction. The etcd database stores cluster state, including secrets, configmaps, and service account tokens. If the attacker abused a service account token to escalate privileges within the cluster, the evidence is in the Kubernetes audit log and etcd, not in the container filesystem.
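The orchestrator-plane artifacts described above can be pulled with a few kubectl calls. This is a hedged sketch: `collect_k8s_plane` is an illustrative name, the pod and namespace are placeholders, and `KUBECTL` can be overridden with `echo` to rehearse the sequence without a cluster.

```shell
# Hedged sketch of Kubernetes evidence-plane collection. Function name,
# pod, and namespace are placeholders; override KUBECTL=echo to rehearse.
KUBECTL="${KUBECTL:-kubectl}"

collect_k8s_plane() {
  pod="$1"; ns="$2"
  # Events tied to this pod: restarts, probe failures, scheduling decisions.
  $KUBECTL get events -n "$ns" --field-selector "involvedObject.name=$pod"
  # Full pod spec, including service account and mounted volumes.
  $KUBECTL get pod "$pod" -n "$ns" -o yaml
  # Logs from the PREVIOUS container instance, if the pod already restarted.
  $KUBECTL logs "$pod" -n "$ns" --all-containers --previous
}

# Example: collect_k8s_plane api-gateway-7d4f prod
```

Note that the Kubernetes audit log itself lives on the API server (its path is set by the cluster's audit policy configuration), so retrieving it is usually a request to the cluster administrators rather than a kubectl call.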
Collection approach: Speed is critical. If you suspect a container is compromised, collect evidence before it restarts. docker export <container_id> > evidence.tar captures the full filesystem. docker logs <container_id> captures the container’s stdout/stderr output. docker inspect <container_id> captures the container’s configuration, environment variables, network settings, and mounted volumes. kubectl logs <pod_name> --all-containers captures logs from all containers in a Kubernetes pod. kubectl cp <pod_name>:/path /local/path copies specific files out of a running pod. If the container has already been destroyed, your evidence sources narrow to: the container runtime logs, the Kubernetes audit log, any persistent volumes that were mounted into the container, and the host-level Docker data directory (/var/lib/docker/).
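The urgent-collection sequence above can be wrapped so the ordering is enforced and auditable. This is a sketch, not a standard tool: `collect_container` is an illustrative name, and `DOCKER` can be overridden with `echo` to rehearse the workflow without a container runtime.

```shell
# Hedged sketch of container triage in the order given in the text:
# export first (the writable layer is the most volatile evidence), then
# logs, inspect, diff. Function name is illustrative; DOCKER=echo rehearses.
DOCKER="${DOCKER:-docker}"

collect_container() {
  c="$1"; out="$2"
  mkdir -p "$out"
  $DOCKER export "$c"  > "$out/fs.tar"         # full filesystem, incl. writable layer
  $DOCKER logs "$c"    > "$out/logs.txt" 2>&1  # stdout/stderr stream
  $DOCKER inspect "$c" > "$out/inspect.json"   # config, env vars, mounts, network
  $DOCKER diff "$c"    > "$out/diff.txt"       # files changed vs base image
  echo "evidence collected in $out"
}

# Example: collect_container api-gateway /cases/case001/container
# Rehearsal without Docker: DOCKER=echo collect_container web-1 /tmp/ev
```

Running the export before logs and inspect is deliberate: a restart destroys the writable layer, while the runtime retains logs and configuration slightly longer.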
Northgate Engineering example: K8S-NGE is a Kubernetes cluster running the company’s customer-facing API. A compromised pod running the API gateway container has been detected sending outbound connections to an unknown IP. The pod is running, but Kubernetes’ liveness probe might restart it at any moment. The investigator’s first action: kubectl exec into the pod and collect /proc data and container filesystem artifacts before the restart. Second action: collect Kubernetes audit logs and the pod’s event history. Third action: check whether the attacker escaped the container to the host node.
Worked artifact — Environment identification and collection plan:
Complete this plan at the start of every investigation before running collection commands.
Case: INC-2026-XXXX
System: [hostname/pod name]
Environment identification:
- ☐ Bare-metal (physical server, data center: ___)
- ☐ Cloud VM (provider: AWS/Azure/GCP, instance ID: ___, region: ___)
- ☐ Container (runtime: Docker/containerd, orchestrator: K8s/ECS/none, image: ___)
Evidence planes available:
- ☐ OS-level: filesystem, logs, /proc, memory
- ☐ Cloud control plane: CloudTrail/Activity Log/Cloud Audit Logs
- ☐ Orchestrator: Kubernetes audit log, etcd, pod events
- ☐ Network: VPC flow logs, security group logs, WAF logs
Collection approach:
- Bare-metal: ☐ LiME (USB) → ☐ /proc + volatile → ☐ logs → ☐ dc3dd disk image
- Cloud VM: ☐ Disk snapshot (API) → ☐ Cloud audit logs → ☐ SSH live response → ☐ LiME
- Container: ☐ docker export (URGENT) → ☐ logs → ☐ inspect → ☐ K8s audit → ☐ host check
Time constraint: container restart expected: ☐ Yes (___ min) ☐ No ☐ Unknown
Escape indicators: ☐ Docker socket mounted ☐ Privileged container ☐ CAP_SYS_ADMIN ☐ None detected
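The escape indicators in the plan can be triaged with a few read-only checks from inside the suspect container. This is a hedged sketch: `check_escape_surface` is an illustrative name, the paths are the conventional ones, and a hardened or non-default runtime may place things elsewhere. Each check prints a finding rather than exiting, so partial results still surface.

```shell
# Hedged sketch of escape-indicator checks, run from inside the suspect
# container. Function name is illustrative; paths are conventional defaults.
check_escape_surface() {
  root="${1:-/}"   # allow pointing at a test tree when rehearsing
  # Mounted Docker socket = full control of the host's container runtime.
  [ -S "$root/var/run/docker.sock" ] && echo "FINDING: docker socket mounted"
  # CapEff is the effective capability mask; a near-full mask suggests a
  # privileged container (exact value varies by kernel version).
  grep -q '^CapEff:.*ffffffffff' "$root/proc/self/status" 2>/dev/null \
    && echo "FINDING: near-full capability set (possibly privileged)"
  # Writable /sys is another privileged-container tell.
  [ -w "$root/sys/kernel" ] && echo "FINDING: /sys is writable"
  echo "escape-surface check complete"
}
```

Any finding here upgrades the incident scope: the container boundary can no longer be assumed intact, and the host node must be investigated too.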
Myth: “Cloud VM investigations are easier because the cloud provider handles security.”
Reality: Cloud providers secure the infrastructure (hypervisor, physical hardware, network fabric) under the shared responsibility model. They do not secure the operating system, the applications, or the data inside your VM. A compromised cloud VM contains the same OS-level artifacts as a compromised bare-metal server — and the cloud provider will not collect evidence for you. The investigator must collect OS-level evidence from the VM AND cloud control plane evidence from the provider’s audit logs. Ignoring either evidence plane means missing half the story. The cloud provider’s audit trail is often the only evidence of how the attacker gained initial access (stolen API keys, compromised IAM role, exposed metadata service).
Decision points: choosing the collection approach by environment
Known bare-metal server: default to hybrid approach — volatile collection first, then disk imaging. If legal proceedings are likely, power off for write-blocked acquisition after volatile capture.
Known cloud VM: snapshot the disk via API first (cleanest acquisition, no system modification), then collect cloud audit trails, then SSH in for volatile collection. If memory evidence is critical, LiME must be deployed via SSH.
Known container: collect immediately — docker export the filesystem before any restart. If the container has already restarted, pivot to runtime logs, K8s audit log, persistent volumes, and host-level Docker data directory.
Unknown environment: run the environment identification commands above. The first 60 seconds of the investigation determine your entire collection approach.
Troubleshooting: environment-specific problems
Cloud VM: disk snapshot fails with permission error. The IAM role or user account you are using does not have ec2:CreateSnapshot (AWS) or equivalent permission. Request elevated permissions from the cloud team — document the time spent waiting as it affects evidence volatility.
Container: docker export fails because the container already terminated. Check if the container was removed: docker ps -a | grep <name>. If it shows as “Exited” (not removed), you can still export. If it was removed (docker rm), check /var/lib/docker/overlay2/ on the host for the writable layer.
Bare-metal: RAID array prevents direct disk imaging. Image at the logical volume level (/dev/mapper/vg0-root) rather than the physical disk level. If the RAID controller presents a logical device, image that device. Document the RAID configuration from the controller (level, stripe size, member disks) for the evidence chain.
Cannot determine the environment from inside the system. The metadata service checks above may fail on hardened systems that block 169.254.169.254. Check /sys/class/dmi/id/product_name — it shows “HVM domU” (AWS), “Virtual Machine” (Azure/Hyper-V), or “Google Compute Engine” (GCP) for cloud VMs. Check /proc/1/cgroup for container indicators.
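The identification checks from the troubleshooting note can be combined into one read-only helper. This is a best-effort sketch: `identify_environment` is an illustrative name, the DMI strings are the common ones listed above, and the classification is a starting point, not proof.

```shell
# Hedged sketch combining the DMI and cgroup checks from the text into a
# single best-guess classifier. Read-only; function name is illustrative.
identify_environment() {
  dmi=$(cat /sys/class/dmi/id/product_name 2>/dev/null)
  cgroup=$(cat /proc/1/cgroup 2>/dev/null)
  if [ -f /.dockerenv ] || echo "$cgroup" | grep -qE 'docker|kubepods|containerd'; then
    echo "container (cgroup or /.dockerenv indicators)"
  elif echo "$dmi" | grep -qiE 'HVM domU|amazon'; then
    echo "cloud VM: AWS (DMI: $dmi)"
  elif echo "$dmi" | grep -qi 'Virtual Machine'; then
    echo "cloud VM: Azure/Hyper-V (DMI: $dmi)"
  elif echo "$dmi" | grep -qi 'Google Compute Engine'; then
    echo "cloud VM: GCP (DMI: $dmi)"
  else
    echo "bare-metal or unrecognized virtualization (DMI: $dmi)"
  fi
}
```

A negative container check with an empty DMI string is itself informative: some minimal VMs and all containers lack /sys/class/dmi, so treat "unrecognized" as a prompt for further checks, not a bare-metal verdict.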
Try it: If you have Docker installed, run a test container and examine the evidence landscape: docker run -d --name forensic-test ubuntu sleep 3600. Then run: docker inspect forensic-test | head -50 (see the configuration data available to an investigator). Run docker diff forensic-test (see what files changed from the base image — currently none because you have not modified anything). Run docker exec forensic-test ls -la /proc/1/ (see the process information available inside the container). Run docker export forensic-test > /tmp/container-export.tar (capture the filesystem). Then docker rm -f forensic-test to clean up. You have just performed the core container evidence collection workflow.
Beyond This Investigation
The three-environment model applies to every investigation in this course. LX4–LX8 focus primarily on bare-metal and VM evidence (filesystem, logs, processes). LX9 focuses specifically on container forensics. LX10 focuses on cloud VM compromise with the cloud control plane evidence. LX11 (Lateral Movement) spans all three environments — the attacker may pivot from a container to the host, from the host to the cloud API, and from the cloud API to other VMs. The investigator who understands all three evidence planes can follow the attacker across all of them.
Check your understanding:
- A Kubernetes pod was restarted by the liveness probe 15 minutes before you were notified. What evidence from the previous pod instance is still available, and what is lost?
- An attacker compromises an AWS EC2 instance via SSRF against the metadata service. The SSRF request is in the web server access log on the instance. Where is the evidence of what the attacker did with the stolen IAM credentials?
- Why is a disk snapshot via the cloud provider’s API preferred over logging into the VM and running dd?
- You need to investigate a compromised Docker container that is still running. List the three commands you run first and what each collects.