
TR1.9 Live Response Scripting and Automation

6-7 hours · Module 1 · Free

What you already know

From TR1.2–TR1.8 you've captured evidence manually across cloud, Windows, and Linux environments, verified integrity with SHA256, maintained chain of custody, and correlated evidence across environments. Every command was run interactively by an analyst. This section converts that manual process into repeatable automation — scripts that produce identical evidence packages regardless of which analyst runs them, remote collection tools that work without direct network access to the endpoint, and fleet-wide collection for incidents that span dozens or hundreds of systems.

Scenario

After the NE incident is contained and the investigation is underway, Rachel reviews the triage performance. The evidence from WS-FIN-042 was complete because Priya followed the capture sequence from TR1.3 precisely. The evidence from web-prod-01 had gaps because Tom, under time pressure, skipped the container diff and forgot to capture kernel module listings. Rachel asks: "How do we make sure the next incident produces complete evidence from every system, regardless of which analyst responds?" The answer is triage scripts — and for the 47 other endpoints in NE's fleet that need scoping, Velociraptor hunts.

Figure TR1.9 — Evidence collection scales from manual commands (1 endpoint) through scripted collection with Live Response (1-3 endpoints) to Velociraptor fleet hunts (hundreds of endpoints). The artifacts collected are identical at every scale.

The consistency problem

Manual triage — typing commands interactively during an incident — produces inconsistent results. Different analysts collect different artifacts, in different orders, with different levels of completeness. Priya's evidence package from WS-FIN-042 included process listings, network connections, active sessions, memory dump, KAPE triage collection, and SHA256 hashes. Tom's evidence from web-prod-01 included process listings and network connections but missed the container layer diffs, the kernel module listing, and the /proc/PID/environ capture for the suspicious process.

Both analysts are competent. The difference is that Priya had worked through the full capture sequence in a previous incident and followed it step by step, while Tom was reconstructing it from memory at 03:00 after being woken by an alert.

The problem compounds with team size. If three analysts respond to an incident across six endpoints, the investigation team receives six evidence packages of varying completeness. The investigator spends time figuring out what's present and what's missing from each package before analysis can begin. A missing artifact isn't just a gap — it's an unknown.

The investigator doesn't know whether the artifact was absent from the system (useful information — it means the attacker didn't use that technique on this endpoint) or whether the analyst simply forgot to collect it (wasted investigation time — someone needs to go back and collect it, if the system is still available).

In the worst case, the system has already been reimaged or restored by the time the evidence gap is discovered. The missing artifact is now unrecoverable, and the investigation has a permanent blind spot. Post-incident, the lessons-learned review identifies the gap, but the damage is done — the investigation conclusion carries an asterisk: "based on available evidence, which was incomplete for three of six endpoints."
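Part of the cure is on the receiving side. Before analysis starts, the investigator can check each package against the artifacts it should contain and separate three cases: collected, attempted but failed, and no record at all. The following is a minimal Bash sketch under stated assumptions. The expected file names are taken from the Linux triage script later in this section. Failures are assumed to be logged in a collection_errors.log that names the intended output file. check_package is a hypothetical helper, not part of any tool.

```bash
# check_package DIR: classify each expected artifact in one evidence package.
#   present - the file exists
#   failed  - no file, but collection_errors.log mentions it (attempted, failed)
#   MISSING - no file and no logged attempt: "forgot, or wasn't there?"
# Hypothetical helper; EXPECTED mirrors file names written by ne-triage-linux.sh.
EXPECTED="processes.txt connections.txt sessions.txt arp_cache.txt
kernel_modules.txt memory.lime.snappy journal_24h.json evidence_hashes.txt"

check_package() {
    local dir="$1" name missing=0
    for name in $EXPECTED; do
        if [ -e "$dir/$name" ]; then
            echo "present  $name"
        elif grep -qF -- "$name" "$dir/collection_errors.log" 2>/dev/null; then
            echo "failed   $name"
        else
            echo "MISSING  $name"
            missing=$((missing + 1))
        fi
    done
    # Non-zero exit status when any artifact is unaccounted for
    [ "$missing" -eq 0 ]
}
```

Running check_package against each package returns a non-zero status whenever something is unaccounted for. That is exactly the list of artifacts someone must go back for while the system still exists.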

Windows triage script

The PowerShell triage script captures every volatile artifact from TR1.3 in the standard volatility order, computes SHA256 hashes, and generates the evidence manifest automatically.

PowerShell

# NE-Triage-Windows.ps1 — Volatile Evidence Collection
# Run as Administrator from IR USB (E:\)
# UTC timestamp so package names sort consistently with the Linux script
$ts = (Get-Date).ToUniversalTime().ToString('yyyyMMdd-HHmmss')
$outDir = "E:\IR\$env:COMPUTERNAME-$ts"
New-Item -Path $outDir -ItemType Directory -Force | Out-Null
# Tier 1: Volatile state (processes, connections, sessions)
Get-Process | Select-Object Id, ProcessName, Path, StartTime, CPU |
    Export-Csv "$outDir\processes.csv" -NoTypeInformation
# All TCP states, not only Established: listeners and half-open C2 callbacks matter too
Get-NetTCPConnection |
    Select-Object LocalAddress, LocalPort, RemoteAddress, RemotePort, State, OwningProcess, CreationTime |
    Export-Csv "$outDir\connections.csv" -NoTypeInformation
quser 2>$null | Out-File "$outDir\sessions.txt"
ipconfig /displaydns | Out-File "$outDir\dns_cache.txt"
arp -a | Out-File "$outDir\arp_cache.txt"
# Tier 2: Memory acquisition (WinPMem)
& E:\Tools\winpmem_mini_x64.exe "$outDir\memory.raw"
# Tier 3: KAPE targeted collection
& E:\Tools\KAPE\kape.exe --tsource C: --tdest "$outDir\KAPE" `
    --target !SANS_Triage --vhdx KAPE_Collection
# Hash all evidence. Enumerate the files first so the manifest being written
# is not itself picked up and hashed half-written in the same pipeline.
$evidence = Get-ChildItem $outDir -Recurse -File
$evidence | Get-FileHash -Algorithm SHA256 |
    Export-Csv "$outDir\evidence_hashes.csv" -NoTypeInformation
Write-Host "Collection complete: $outDir"

The script runs the same collection every time — same artifacts, same order, same hash generation. The analyst runs one command (. E:\NE-Triage-Windows.ps1) and receives a timestamped evidence directory with the complete Tier 1–3 collection and an evidence manifest. The investigation team knows exactly what to expect in every Windows evidence package because every package was produced by the same script.

The production version of this script adds error handling for each collection step. If Get-NetTCPConnection fails (for example, on systems older than Windows 8 / Windows Server 2012, where the NetTCPIP module that provides it does not exist), the script logs the failure and continues to the next artifact. The final evidence manifest includes both the successfully collected files and a failure log documenting what was attempted but not collected, and why.

This failure log eliminates the "forgot or wasn't there?" ambiguity — the investigation team sees that the collection was attempted and the specific error that prevented it.

Linux triage script

The Bash equivalent follows the same structure — the TR1.4 capture sequence, wrapped in a single script with hash generation.

Bash

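#!/bin/bash
# ne-triage-linux.sh — Volatile Evidence Collection
# Run as root from IR USB; /evidence should be the mounted USB, not the suspect disk
HOST=$(hostname)
TS=$(date -u +%Y%m%d-%H%M%S)
OUT="/evidence/${HOST}-${TS}"
ERRLOG="$OUT/collection_errors.log"
TOOLS=$(cd "$(dirname "$0")" && pwd)   # IR USB directory holding avml
mkdir -p "$OUT" || exit 1
# collect DEST CMD [ARGS...]: stdout and stderr go to DEST; a non-zero exit
# code is recorded in collection_errors.log with the intended output file
collect() {
    local dest="$1"; shift
    "$@" > "$dest" 2>&1
    local rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "rc=$rc dest=${dest##*/} cmd=$*" >> "$ERRLOG"
    fi
}
# Tier 1: Volatile state
collect "$OUT/processes.txt"      ps auxwf
collect "$OUT/connections.txt"    ss -tanp
collect "$OUT/sessions.txt"       w
collect "$OUT/arp_cache.txt"      cat /proc/net/arp
collect "$OUT/kernel_modules.txt" lsmod
# Per-process environment (NUL-separated); a PID may exit mid-loop
mkdir -p "$OUT/proc_environ"
for p in /proc/[0-9]*; do
    collect "$OUT/proc_environ/${p#/proc/}" cat "$p/environ"
done
# Container evidence (if Docker present)
if command -v docker &>/dev/null; then
    collect "$OUT/containers.txt" docker ps -a
    for cid in $(docker ps -q); do
        collect "$OUT/container_${cid}_inspect.json" docker inspect "$cid"
        collect "$OUT/container_${cid}_diff.txt" docker diff "$cid"
    done
fi
# Tier 2: Memory acquisition (AVML binary stored next to this script)
collect "$OUT/avml.log" "$TOOLS/avml" --compress "$OUT/memory.lime.snappy"
# Tier 3: Logs (auth.log on Debian/Ubuntu, secure on RHEL-family systems)
cp -a /var/log/auth.log* /var/log/secure* "$OUT/" 2>>"$ERRLOG"
collect "$OUT/journal_24h.json" journalctl --since "24 hours ago" -o json
# Hash all evidence, recursively, with paths relative to $OUT
cd "$OUT" && find . -type f ! -name evidence_hashes.txt -exec sha256sum {} + \
    > evidence_hashes.txt
echo "Collection complete: $OUT"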

The Linux script includes the container evidence collection that Tom missed during the manual triage: docker inspect and docker diff for every running container. The script doesn't require the analyst to remember that containers exist on this server or to know the Docker commands. It checks whether Docker is installed, iterates through running containers, and captures their state automatically. The investigation team receives container evidence from every Docker host in the fleet because the script collects it whenever Docker is present, not only when an analyst happens to remember.

Both scripts produce a standardised output structure: one HOSTNAME-TIMESTAMP directory per endpoint, with each artifact written to a predictably named file (process listings, network connections, sessions, the memory image, log exports) and a hash manifest alongside them. When the investigation team opens an evidence package from any system, they know exactly where to find each artifact, because every package was produced by the same script with the same layout.

This standardisation eliminates the investigation overhead of navigating unfamiliar evidence directory structures across 15 different evidence packages collected by 5 different analysts.
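Because every package carries its hash manifest in the same place, the investigation team can also confirm that a package arrived intact before opening it. The following is a minimal sketch. It assumes a Linux package whose evidence_hashes.txt uses sha256sum format, with paths relative to the package directory. verify_package is a hypothetical name.

```bash
# verify_package DIR: re-hash every file listed in DIR/evidence_hashes.txt
# and report any mismatch or missing file. Hypothetical helper; assumes the
# manifest uses sha256sum format with paths relative to DIR.
verify_package() {
    (
        cd "$1" || exit 1
        # --quiet prints only failures; exit status is non-zero on any mismatch
        sha256sum --check --quiet evidence_hashes.txt
    )
}
```

The subshell keeps the cd local, so verify_package can be called in a loop over many packages without changing the caller's working directory.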

Error handling in the Linux script follows the same principle as the production Windows version. Each collection command sends its stderr into its own output file (2>&1). If a command fails, the error is preserved in the evidence rather than displayed and lost in the terminal. Typical causes are a process that exited between the ps listing and the /proc capture, or a directory the analyst lacks permission to read.

The evidence manifest includes a collection_errors.log that lists every command that returned a non-zero exit code, enabling the investigation team to distinguish between "this artifact doesn't exist on this system" and "the collection command failed."

Defender Live Response: remote single-endpoint collection

When the triage responder cannot physically access the endpoint — because the server is in a remote data center, the endpoint is an employee's laptop in another city, or the system has already been network-isolated through Defender — Defender Live Response provides a remote command session through Defender for Endpoint's cloud management channel.

The analyst connects to the endpoint through the Microsoft Defender portal, runs the triage script from the Live Response library, and downloads the resulting evidence package. The collection happens over the Defender management channel, not the corporate network: no SSH tunnel, no RDP session, no VPN connection. For network-isolated endpoints, this is often the only collection path available, because Defender's management channel continues to function even when the isolation policy blocks all other network connectivity.

The Live Response workflow has four steps. First, navigate to the device page in the Defender portal and select "Initiate Live Response Session." Second, upload the triage script to the Live Response library, the tenant's store of scripts and files available to sessions. Supporting tools the script calls are copied onto the endpoint with the putfile command, and the script's tool paths must point at those copies rather than at the IR USB. Third, execute the script with the run command (for example, run NE-Triage-Windows.ps1) and monitor the output in the console.

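Fourth, retrieve the evidence with the getfile command, which transfers a file from the endpoint back through the Defender channel to the analyst's workstation. getfile works on individual files rather than directories and enforces a per-file size limit. The script should therefore compress the evidence directory into an archive before retrieval, and large memory images may need a separate transfer path. The workflow has several prerequisites. The endpoint must be onboarded and communicating with the Defender service. Live response must be enabled in the portal's advanced features, with unsigned script execution also enabled if the triage script is not signed. The analyst needs a role with advanced live response permissions, which cover uploading files to the library and running scripts.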

One operational consideration: Live Response has a 30-minute session timeout. For systems with large RAM (32+ GB), the memory acquisition step alone may approach this limit. The triage script should be designed to capture the most volatile evidence first (Tier 1 process and connection state), then memory, then KAPE — so that if the session times out during the KAPE collection, the highest-priority evidence has already been captured and downloaded.
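On a Linux endpoint reached through Live Response, one way to apply this ordering is to seal each tier into its own archive as soon as it finishes. getfile retrieves single files rather than directories, so a finished tier1.tar.gz can be pulled in a fresh session even if the original session dropped during a later tier. The following is a minimal sketch. package_tier, the per-tier archive names, and archive_hashes.txt are hypothetical.

```bash
# package_tier OUT TIER FILE...: seal finished evidence into one archive per
# tier (getfile retrieves single files, not directories) and record its hash.
# Hypothetical helper; FILE names are whatever the triage script produced.
package_tier() {
    local out="$1" tier="$2"
    shift 2
    # -C makes the archived paths relative to the evidence directory
    tar -czf "$out/${tier}.tar.gz" -C "$out" "$@" || return 1
    # Append the archive's hash so the retrieved copy can be verified later
    (cd "$out" && sha256sum "${tier}.tar.gz" >> archive_hashes.txt)
}
```

For example, after the Tier 1 commands: package_tier "$OUT" tier1 processes.txt connections.txt sessions.txt arp_cache.txt kernel_modules.txt. Memory and log tiers can then be sealed the same way as each one completes.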

The limitation is that Live Response operates on one endpoint at a time. Each session takes 5–10 minutes for the triage collection depending on system size and network bandwidth. When the incident scope expands to 15 endpoints, sequential Live Response sessions take 75–150 minutes — well outside the triage window. Live Response is the right tool for the initial 1–3 confirmed compromised endpoints. For fleet-wide scoping, you need a different approach.

Incident Comment — Collection Method Selection

Confirmed compromised: 3 endpoints (WS-FIN-042, web-prod-01, DC-PROD-01). Collecting WS-FIN-042 (Windows) from the IR USB and web-prod-01 (Linux) over SSH; DC-PROD-01 via Live Response (restricted physical access).

Scoping required: 47 remaining endpoints in NE fleet. Need to determine whether svc-backup authenticated to additional systems and whether the attacker's C2 callback (185.220.101.34) appears in other endpoints' network connections.

Collection method: Direct script for 3 confirmed. Velociraptor hunt for 47 scoping targets — query for svc-backup logon events + network connections to 185.220.101.34 + process tree anomalies.

Velociraptor: fleet-wide collection

Velociraptor solves the scale problem. With a Velociraptor server deployed, the analyst creates a hunt — a query that runs simultaneously across every endpoint in the fleet. A single hunt can collect volatile evidence from hundreds of endpoints in minutes, returning results to a centralised console where the triage responder reviews them as they arrive.

The artifact definition behind the hunt is version-controlled and tested in advance, so it produces identical evidence packages regardless of which endpoint it runs on. The analyst doesn't log into 47 endpoints one at a time. They define what to collect, specify the target scope (for example, all Windows clients, clients carrying a label such as finance or production, or simply every enrolled client), and launch the hunt. The Velociraptor client on each endpoint executes the collection locally and streams results back to the server.

For scoping hunts during the NE incident, Rachel's team would create three targeted hunts: one searching for svc-backup authentication events across all Windows endpoints, one checking for network connections to 185.220.101.34 on all endpoints, and one collecting process tree snapshots from the finance department's endpoints where lateral movement is most likely. The results arrive within minutes and tell the investigation team which additional endpoints are compromised — narrowing the investigation scope from "possibly all 50 endpoints" to "confirmed 3, suspected 2, clean 45."

The difference between a scoping hunt and a full triage hunt is the amount of data collected. A scoping hunt asks a narrow question — "did this account authenticate here?" or "is there a connection to this IP?" — and returns a small result set that the analyst can review in the Velociraptor console in real time.
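A scoping question is narrow enough to express in a few lines. As an illustration outside Velociraptor, this Bash sketch answers "is there a connection to this IP?" from ss output. The input can be live (ss -tanp) or the connections.txt file a triage package already contains. scope_ip is a hypothetical helper. It only matches plain IPv4 host:port fields, so IPv6 or IPv4-mapped addresses would need extra handling.

```bash
# scope_ip IP: read ss output on stdin, print every line with an address
# field of the form IP:port, and exit 0 only if at least one was found.
# Hypothetical helper; matches plain IPv4 host:port fields only.
scope_ip() {
    awk -v ip="$1" '
        {
            for (i = 1; i <= NF; i++)
                # Require "IP:" at the start of the field so that
                # 185.220.101.3 does not match 185.220.101.34
                if (index($i, ip ":") == 1) { print; hit = 1; next }
        }
        END { exit (hit ? 0 : 1) }
    '
}
```

Usage: ss -tanp | scope_ip 185.220.101.34 on a live host, or scope_ip 185.220.101.34 < connections.txt against a collected package.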

A full triage hunt collects the complete volatile evidence package (the same artifacts as the triage script) and produces large evidence files that need to be downloaded, hashed, and processed. Scoping hunts are fast and lightweight — run them first to identify which endpoints need full triage collection. Then run full triage hunts only on the confirmed or suspected endpoints, keeping the evidence volume manageable and the investigation focused.
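The hand-off from scoping to full triage is a simple filter over the scoping verdicts. The following is a minimal sketch. It assumes the verdicts have been exported as host,verdict lines, a hypothetical format rather than a Velociraptor export schema, where verdict is confirmed, suspected, or clean.

```bash
# triage_targets: read "host,verdict" lines on stdin and print the hosts that
# need a full triage collection (confirmed or suspected), one per line.
# Clean hosts are skipped; unrecognised verdicts are reported on stderr
# rather than silently dropped. Hypothetical CSV format.
triage_targets() {
    awk -F, '
        NR == 1 && $1 == "host" { next }              # optional header row
        { v = tolower($2); gsub(/[ \t\r]/, "", v) }   # normalise the verdict
        v == "confirmed" || v == "suspected" { print $1; next }
        v != "clean" { print "unrecognised verdict: " $0 | "cat 1>&2" }
    '
}
```

Usage: triage_targets < scoping_verdicts.csv yields the short target list for the full triage hunt, keeping the evidence volume proportional to what scoping actually found.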

Velociraptor's notebook feature allows the analyst to document findings directly in the hunt results — marking which endpoints are confirmed compromised, which are clean, and which need further investigation. This documentation becomes part of the investigation record and provides the investigation team with a scoping summary generated during triage rather than reconstructed afterward.

The prerequisite is deployment: the Velociraptor agent must be installed on endpoints before the incident. Deploying Velociraptor during an incident is possible but adds 30–60 minutes of setup time and requires network connectivity that may not be available after containment actions. Organisations that deploy Velociraptor as part of their incident readiness posture can launch a fleet-wide hunt within minutes of confirming a compromise.

Those that don't have it deployed face the sequential Live Response approach or, worse, no fleet-wide visibility at all — discovering additional compromised endpoints only when the investigation team manually reviews logs from systems that were never triaged.

Script version control is the final discipline. The triage scripts (PowerShell, Bash) and Velociraptor hunt queries should live in a git repository with tagged releases. When a new artifact is added to the collection — because a new attack technique was encountered, or because the investigation team requested an artifact that wasn't in the original script — the change is committed, tested against a lab system, and released as a new version.

During an incident, the analyst runs the latest tagged release, not a modified version edited on the fly. Post-incident, the lessons learned review identifies any evidence gaps and the scripts are updated for the next incident.
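One way to enforce "latest tagged release" mechanically is a guard in the IR kit's launcher that refuses to run from a modified or untagged checkout. The following is a minimal sketch. It assumes the triage scripts are run from a git working copy; a kit built from a release archive would need a different check, such as a hash list. require_release is a hypothetical helper.

```bash
# require_release REPO: succeed only if REPO's HEAD is exactly the newest tag
# and the working tree has no uncommitted or untracked changes; print the tag.
# Hypothetical helper for an IR kit kept as a git working copy.
require_release() {
    local repo="$1" tag latest
    # Exact tag on HEAD; fails on any commit past a release
    tag=$(git -C "$repo" describe --tags --exact-match HEAD 2>/dev/null) || {
        echo "HEAD is not a tagged release" >&2
        return 1
    }
    # Newest tag by version order (v1.10.0 sorts above v1.9.0)
    latest=$(git -C "$repo" tag --sort=-v:refname | head -n 1)
    if [ "$tag" != "$latest" ]; then
        echo "$tag is not the latest release ($latest)" >&2
        return 1
    fi
    # Any modified or untracked file means the kit was edited in place
    if [ -n "$(git -C "$repo" status --porcelain)" ]; then
        echo "working tree modified since $tag" >&2
        return 1
    fi
    echo "$tag"
}
```

Printing the tag on success also gives the launcher a version string to record in the evidence manifest, so each package states which release produced it.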

"I'll just modify the script during the incident"

An analyst encounters an unfamiliar service on a compromised endpoint and adds an ad-hoc collection command to the triage script during the incident. The modification captures the artifact but introduces an untested command that fails on three other endpoints — producing partial collections with no error handling. The investigation team receives incomplete evidence from three endpoints and spends hours determining that the collection failure was caused by an untested script modification, not by the absence of the artifact. Modify scripts in the lab, not during incidents. If you need an artifact that isn't in the current script, run the command manually, document it in the evidence manifest, and add it to the script in the post-incident review.
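When a manual command is unavoidable, recording it the same way the script records its own output keeps it inside the chain of custody. The following is a minimal sketch. manual_collect, the manual_collection.log name, and the record format are hypothetical; they are assumptions rather than a standard.

```bash
# manual_collect OUT NAME CMD [ARGS...]: run one ad-hoc command, save its
# output (stdout and stderr) as OUT/NAME, and append a UTC timestamp, the
# analyst, exit code, file name, SHA256 of the output, and the command line
# to OUT/manual_collection.log. Hypothetical helper and log format.
manual_collect() {
    local out="$1" name="$2"
    shift 2
    "$@" > "$out/$name" 2>&1
    local rc=$?
    local hash
    hash=$(sha256sum "$out/$name" | cut -d' ' -f1)
    printf '%s analyst=%s rc=%s file=%s sha256=%s cmd=%s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(id -un 2>/dev/null || id -u)" \
        "$rc" "$name" "$hash" "$*" >> "$out/manual_collection.log"
    # Pass the command's own exit code back to the analyst
    return "$rc"
}
```

The log line is what the post-incident review reads when deciding which manual commands should become permanent, tested steps in the next script release.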

Triage Principle

Scripts make evidence collection a solved problem. The script captures every artifact in the correct order, hashes the output, generates the manifest, and logs any failures — the analyst's job shifts from "remember the commands" to "run the script and monitor the output." The quality of the evidence package depends on how well the script was written and tested before the incident, not on how well the analyst performs under stress at 03:00.

TR1.10 puts everything from this module into practice. You'll work through an interactive lab with three scenarios presenting different attacker states across cloud, Windows, and Linux environments. For each scenario, you determine preservation vs containment order, select which evidence to capture first, choose the containment action with the least evidence impact, and decide whether to use manual collection, Live Response, or a fleet-wide Velociraptor hunt.
