NF1.10 Sensor Performance Tuning and Troubleshooting
Figure 1.10.1 — Four sensor failure categories and the signature-diagnosis-fix pattern for each.
Category 1 — Packet loss under load
Packet loss is the failure mode that matters most for evidentiary integrity. Every packet the sensor missed is a gap in your capture. If the loss happens during an attack window, you've missed part of the attack. The signature is visible, the diagnostic is mechanical, and the fix is one of four well-trodden paths depending on where the bottleneck sits.
Signature. Zeek's capture_loss.log begins emitting entries. Or ethtool -S shows rx_dropped or rx_no_buffer counters climbing. Or Suricata's stats.log shows the capture.kernel_drops counter incrementing. Any of these three indicates the sensor is receiving traffic it cannot process fast enough. Sub-0.1% loss during quiet periods is usually fine; sustained loss above 0.1% during peak is the threshold for investigation.
Diagnostic path. Four questions, in order.
First, is the CPU saturated? Run top or htop and check whether Zeek or Suricata is pinning a core at 100%. If one worker is pinning and others are idle, the workload isn't being distributed across cores — AF_PACKET clustering or PF_RING is misconfigured or absent. If all cores are at 100%, you're CPU-bound at the current traffic level and need either more cores, faster cores, or fewer features (rule count, script load).
Second, is the kernel dropping packets before they reach the sensor application? Check ethtool -S <iface> for rx_dropped, rx_no_buffer, or rx_missed_errors — these are kernel-level drops. Increase the interface ring buffer with ethtool -G (typically to 4096 or 8192 descriptors). Verify RX queues are bound to dedicated CPUs with set_irq_affinity. If the NIC has hardware offloads enabled (LRO, GRO, checksum offload), they coalesce segments and alter the packets the sensor sees — disable them on the capture interface with ethtool -K <iface>.
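A minimal tuning pass for the capture interface might look like the following sketch — eth1 is a placeholder name, and the exact offload keywords and maximum ring size depend on your NIC driver:

```
# Placeholder interface eth1 — substitute your capture interface.
ethtool -g eth1                                    # current vs. maximum ring size
ethtool -G eth1 rx 4096                            # raise the RX ring toward the hardware maximum
ethtool -K eth1 gro off lro off rx off tx off      # disable offloads on the capture port
ethtool -S eth1 | grep -iE 'drop|miss|no_buffer'   # re-check kernel drop counters afterwards
```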
Third, is the sensor application's internal buffer full? Zeek's capture_loss.log measures gaps in the traffic Zeek analyzed; if it reports loss while the kernel counters are clean, Zeek can't process packets as fast as the kernel delivers them. Add more workers if you're on AF_PACKET clustering. Check for expensive scripts running synchronously in Zeek's main event loop — one way to see where the time goes is to replay a representative capture offline with Zeek's stock profiling script (zeek -r capture.pcap policy/misc/profiling) and review the resulting prof.log.
Fourth, is storage I/O the bottleneck? Less common but possible if you're writing PCAP to spinning disk. Check iostat -x 1 during peak — if %util on the storage device pegs at 100% with high await, the disk can't keep up. Move PCAP to NVMe, or write metadata only during peak and queue PCAP writing for off-peak flush.
Remediation paths. The four fixes in rough order of effort: enable AF_PACKET clustering with 2-4 workers (cheapest, usually sufficient for traffic under 2-3 Gbps); enable PF_RING if the NIC supports it (moderate effort, significant performance gain); tune ring buffer and disable NIC offloads (quick, always worth doing); and finally, upgrade hardware (expensive, last resort). Most capture loss I see in practice comes from step 1 not being done — the sensor was deployed with a single Zeek worker and nobody enabled clustering.
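For reference, a minimal zeekctl node.cfg sketch for AF_PACKET clustering with four workers — the interface name, worker count, and CPU pinning are placeholders, and depending on your Zeek version the load-balancing method is either the bundled af_packet or the external plugin's custom:

```
# /opt/zeek/etc/node.cfg — sketch only; adjust interface, lb_procs, and pin_cpus to your host.
[manager]
type=manager
host=localhost

[proxy-1]
type=proxy
host=localhost

[worker-1]
type=worker
host=localhost
interface=af_packet::eth1
lb_method=af_packet      # older installs using the external zeek-af_packet plugin use lb_method=custom
lb_procs=4
pin_cpus=2,3,4,5
```

Apply with zeekctl deploy and confirm all four workers show as running in zeekctl status.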
Category 2 — Zeek log anomalies
The second-most-common failure mode is Zeek running but not producing the logs you expected. Conn.log is emitting but smtp.log is empty. Or dns.log is full of entries for a single internal host. Or weird.log has grown to 2GB of parse errors. These aren't capture problems — they're analysis problems inside Zeek.
Signature. Missing connections in conn.log relative to what you can see in tcpdump -n -i <iface>. Empty protocol logs (dns.log, http.log, ssl.log) when you know the protocol is on the wire. Large weird.log or reporter.log files. Connections showing history=S (SYN only) for traffic you can verify is completing normally.
Diagnostic path. Start with Zeek's own health.
Run zeekctl status (or broctl status on older versions). All workers should be running. If any are crashed or stopped, check zeekctl diag for the crash reason and the corresponding worker's crash/ directory for the core dump or reason log.
Check the log rotation state. Zeek rotates logs on the interval set by Log::default_rotation_interval (commonly redefined in local.zeek), and zeekctl cron handles the archiving housekeeping. If rotation stalls, logs stop being written or archived. ls -la /opt/zeek/logs/current/ should show all expected logs with recent mtimes. If a log is present but not updating, the process writing it has stalled.
Check loaded scripts. zeekctl check validates the configuration. A script that references an undefined identifier or has a syntax error will prevent Zeek from loading — but a script that references an identifier that's only defined in a specific traffic context (e.g., ssl events when no TLS is seen) will load but produce no output for that protocol. Review local.zeek and site/local.zeek for recent changes.
Check disk space. df -h /opt/zeek/logs — if the logs partition is 100% full, Zeek may be writing to a partial log or silently dropping writes. Clear old rotated logs, expand the partition, or adjust retention policy.
Check the weird.log. Every line is a protocol anomaly Zeek flagged. A handful is normal; thousands in an hour means Zeek is seeing traffic it can't parse. Common causes: asymmetric routing (Zeek sees one direction of a flow but not the other), tunnel encapsulation Zeek doesn't decode by default, or malformed traffic from the attacker itself.
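To see which anomalies dominate, a quick summary with zeek-cut is usually enough (the path below is the default zeekctl layout):

```
# Top weird names in the current weird.log
zeek-cut name < /opt/zeek/logs/current/weird.log | sort | uniq -c | sort -rn | head -20
```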
Remediation. Restart Zeek with zeekctl deploy (which re-checks the configuration, reinstalls scripts, and restarts workers). Fix any script errors surfaced by zeekctl check. Clear stale state in /opt/zeek/spool/ if a previous crash left it inconsistent. For asymmetric routing, fix the mirror or tap so both directions of each flow reach the sensor — Zeek's protocol analysis degrades badly on one-sided traffic, and no script-level setting fully compensates. For tunnel traffic, confirm the relevant tunnel analyzers are active: recent Zeek versions decapsulate GRE, VXLAN, and Geneve out of the box, while older versions may need the analyzer enabled explicitly or a package installed via zkg.
Category 3 — Suricata firing wrong
Suricata either fires on traffic it shouldn't (false positives, alert storms) or doesn't fire on traffic it should (false negatives, missed detection). Both are rule-tuning problems, but they're detected differently and fixed differently. The discipline is the same: reproduce against a known PCAP, then fix the rule, then retest against the same PCAP.
Signature for false positives. Alert count jumps 100x from baseline in a short window. Specific rule SIDs dominating the alert feed. Analysts ignoring alerts from the sensor because signal-to-noise is too low.
Signature for false negatives. You know a technique was used in traffic (from PCAP you captured during a red-team exercise or a known-bad PCAP you replayed) but Suricata produced no alert. Or your organization's detection maturity baseline (MITRE ATT&CK coverage from DE0) shows gaps that ought to be covered by the loaded ruleset.
Diagnostic path for false positives. Identify the firing SID — eve.json includes alert.signature_id for every alert. Group by SID and count. Pull the rule definition from the ruleset (/etc/suricata/rules/*.rules or wherever your ruleset is deployed). Read the rule's match conditions.
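For example, grouping the day's alerts by SID with jq — the eve.json path is the package default; adjust to your deployment:

```
jq -r 'select(.event_type=="alert") | .alert.signature_id' /var/log/suricata/eve.json \
  | sort | uniq -c | sort -rn | head -10
```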
Then ask: is the rule actually matching malicious behavior but your environment produces legitimate traffic that looks the same (the classic "legitimate use case that the rule can't distinguish")? Or is the rule poorly written — matching on a common string that has no actual forensic value? Or is the rule firing on traffic that was never meant to be in scope (e.g., a rule for Microsoft SQL Server matching on a test-lab host that you forgot to exclude)?
The fix for each: for legitimate-use-case matches, add threshold or suppress logic (suricata.yaml's threshold-file setting points at a threshold.config where you can suppress specific source/destination combinations or rate-limit alerts per SID). For badly written rules, disable the rule by adding its SID to suricata-update's disable.conf so the next rule update drops it, and either replace it with a better one or accept the detection gap. For out-of-scope traffic, add a BPF filter on the capture interface (the bpf-filter option under the interface's capture section in suricata.yaml) or suppress at the rule level.
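A couple of illustrative threshold.config entries — the SID, subnet, and time window below are placeholders, not values from this module:

```
# Suppress SID 2100498 for a subnet that legitimately triggers it
suppress gen_id 1, sig_id 2100498, track by_src, ip 10.1.5.0/24

# Rate-limit SID 2100498 to one alert per source per five minutes
threshold gen_id 1, sig_id 2100498, type limit, track by_src, count 1, seconds 300
```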
Diagnostic path for false negatives. Reproduce against a known-bad PCAP. If you don't have one, generate one — for most common techniques (C2 beacons, DNS tunnelling, common web exploits), public PCAP corpora like malware-traffic-analysis.net or Kaggle have reference captures. Replay through the sensor with suricata -r and check whether the expected alert fires.
If the alert doesn't fire against the known-bad PCAP, the rule isn't loaded or isn't matching. Check suricata -T for rule-loading errors. Check the rule's matching conditions against the actual PCAP content — the HTTP header format, the TLS fingerprint, the payload bytes — with Wireshark or tshark. Common causes: rule requires a protocol-specific signature that Suricata's parser didn't identify for the session (the alert http rule doesn't fire because Suricata classified the session as generic TCP rather than HTTP), or the rule's content: matches are offset-dependent and your traffic has a different framing.
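The replay workflow end to end, assuming default config paths and a scratch output directory (known-bad.pcap stands in for your reference capture):

```
suricata -T -c /etc/suricata/suricata.yaml            # test that config and rules load cleanly
mkdir -p /tmp/replay
suricata -r known-bad.pcap -c /etc/suricata/suricata.yaml -l /tmp/replay
jq -r 'select(.event_type=="alert") | .alert.signature' /tmp/replay/eve.json | sort | uniq -c
```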
Category 4 — Slow decay
The fourth category is the hardest to catch because it's invisible on any single day. The sensor is running, logs are flowing, alerts are firing at a normal rate. And yet six months later you realize the logs you have from month four aren't what you think they are. Sensor integrity erodes slowly, and the discipline that catches it is trend-based rather than snapshot-based.
Signatures you catch with monthly review. Log volume shifted 30% in one direction without a corresponding traffic change. The sensor's NTP status drifted (verify with ntpq -p or chronyc sources). A TLS certificate used by the sensor's management interface or its export to the SIEM expired without anyone being alerted. The ruleset hasn't been updated in six weeks because a suricata-update cron job silently started failing. Storage is 85% full and the retention policy hasn't been adjusted since deployment.
Signatures you catch with trend-based alerting. Set up alerts that compare this week's metrics to last month's baseline. Weekly log volume deviating by more than 20%. Percentile values (p50, p99) of connection duration shifting significantly. Unique destinations per day trending up or down outside normal business-cycle variation. The alerts don't name specific failures — they name drift, and drift is almost always worth investigating even when the cause turns out to be benign.
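One way to implement the volume check — a rough sketch; the log path, file-name pattern, and the 20% threshold are assumptions to adapt to your archive layout and alerting pipeline:

```
#!/usr/bin/env bash
# Compare this week's conn.log line count to the prior four-week average.
logdir=/opt/zeek/logs
this_week=$(find "$logdir" -name 'conn.*.log.gz' -mtime -7 -exec zcat {} + | grep -vc '^#')
prior=$(find "$logdir" -name 'conn.*.log.gz' -mtime +7 -mtime -35 -exec zcat {} + | grep -vc '^#')
baseline=$(( prior / 4 )); [ "$baseline" -eq 0 ] && baseline=1
dev=$(( 100 * (this_week - baseline) / baseline ))
[ "${dev#-}" -gt 20 ] && echo "conn.log volume is ${dev}% off the four-week baseline"
```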
Remediation — monthly sensor review cadence. Put it on the calendar. Thirty minutes per sensor per month. Check: NTP status and clock drift. Certificate dates for anything the sensor presents to SIEM, dashboards, or export targets. Storage trend (is the fill rate matching the retention policy or are you heading for a surprise?). Ruleset update status — when did suricata-update last succeed, when did zkg last pull updates for the Zeek scripts? Log-volume trend against last month's baseline. Alert-volume distribution — have any SIDs started dominating that weren't before?
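A starting point for scripting parts of the review — a sketch only; the paths are common defaults (suricata-update's managed rule file, the zeekctl log tree) and should be adjusted to your install:

```
chronyc sources -v 2>/dev/null || ntpq -p                  # clock sync state and drift
df -h /opt/zeek/logs /var/log/suricata                     # storage headroom vs. retention plan
stat -c '%y %n' /var/lib/suricata/rules/suricata.rules     # when the loaded ruleset file last changed
zkg list                                                   # installed Zeek packages, check for stale versions
du -sh /opt/zeek/logs/20*/ | tail -30                      # per-day log volume trend at a glance
```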
"Monitor the monitor" principle. Your SIEM or monitoring platform should treat the sensor itself as a monitored asset. Heartbeat metrics (Zeek is up, Suricata is up, NTP is synchronised, disk is below 85%). Log-volume freshness (a 15-minute window with zero new log lines is an alert). Ruleset freshness (hash of the loaded ruleset, alerting if the hash hasn't changed in 14 days despite upstream updates being available). If the monitoring of the sensor breaks and no-one notices for a month, the sensor's silence doesn't mean nothing happened — it means you can't distinguish nothing from something.
A quick triage pass from the terminal: run tail -f /opt/zeek/logs/current/capture_loss.log in one terminal. In another, check Suricata's capture stats with jq 'select(.event_type=="stats") | .stats.capture' /var/log/suricata/eve.json | tail -5. Compare against kernel drops with ethtool -S <iface> | grep -i drop. Check per-thread CPU with top -H -p $(pgrep -d, zeek) — if one thread is at 100% and the others are idle, clustering isn't working. Check the ring buffer with ethtool -g <iface> — if the current size is much smaller than the maximum, increase it. Check NIC offloads with ethtool -k <iface> | grep ': on' — offloads should be off on the capture interface.

Now a scenario. Your sensor is dropping 0.5% of packets during morning peak. You've tuned the ring buffer and disabled offloads. Clustering is configured with four AF_PACKET workers. Drops persist. The CISO asks whether you should add more hardware to the sensor or accept the 0.5% loss as within operational tolerance.
The knee-jerk response is "add hardware" — more CPU cores, faster NIC, faster disk. It's always technically possible to throw more hardware at packet loss. But the right answer depends on what the 0.5% represents.
If 0.5% is uniform across all traffic types — that is, it's random drops that affect every protocol and every destination roughly equally — then the operational impact on investigations is small. You might miss a handful of packets from a connection, but conn.log will still record the flow, dns.log will still record the queries, ssl.log will still capture the TLS fingerprint. Investigations work on metadata; random 0.5% drops rarely corrupt investigative conclusions.
If 0.5% is concentrated in specific flows — that is, one particular protocol or one particular host's traffic is disproportionately affected — then the operational impact is larger and depends on which protocol. Dropping 0.5% of DNS queries randomly is probably fine. Dropping 0.5% of HTTP/HTTPS connection establishments means you miss 0.5% of new sessions; still usually fine. Dropping 0.5% of the TLS ClientHello is the problematic case — you lose JA3 fingerprinting for those sessions. If the drops cluster on sessions you care about for investigation, the hardware upgrade is justified.
The operational lesson: packet-loss tolerance is a function of which packets are being lost and what questions the investigation needs to answer. "Is 0.5% loss acceptable" isn't a numerical question — it's a question about which evidence goes missing. Quantify the loss distribution before recommending the hardware spend.
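One way to quantify it is conn.log's missed_bytes field, which records the content gaps Zeek detected per connection. A sketch — the archive path and date glob are placeholders for your log layout:

```
# Which services carry the bytes Zeek knows it missed?
zcat /opt/zeek/logs/2026-08-*/conn.*.log.gz \
  | zeek-cut service missed_bytes \
  | awk '$2 > 0 {miss[$1] += $2; n[$1]++}
         END {for (s in miss) printf "%-12s %12d missed bytes across %d conns\n", s, miss[s], n[s]}' \
  | sort -k2 -rn
```

If the missed bytes concentrate in ssl or http rather than spreading evenly, that is the evidence-loss argument for the hardware spend.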
The myth: "the dashboard is green, so the sensor is capturing what we think it is." The dashboard shows you the sensor's self-reported health — the process is up, the CPU is under 80%, alerts are flowing. None of those metrics tells you whether the sensor is actually capturing the traffic you think it is.
A sensor can be fully green on its self-reported metrics while the SPAN port upstream is misconfigured and mirroring only one direction of each flow, while VLAN tags are being stripped at the mirror point so packets no longer match the sensor's expected topology, while an MTU mismatch between the mirror port and the sensor truncates or fragments frames in ways Zeek's reassembly silently drops, or while a switch upgrade six months ago quietly removed the sensor's SPAN configuration and no one noticed.
The discipline that catches this: periodic end-to-end validation. Pick a known endpoint, generate traffic from it (ping, a curl to a specific destination, a DNS lookup for a specific hostname), and verify the sensor actually logged it. NF1.7 taught this as a deploy-time validation; the hygiene is to repeat it quarterly. If the sensor's dashboard says it's capturing but your end-to-end test shows it isn't, the dashboard is lying. Trust end-to-end verification over self-reported health.
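A minimal end-to-end check, assuming you control an endpoint behind the monitored segment and the sensor uses the default zeekctl log path; the marker hostname and URL are placeholders:

```
# On a known endpoint behind the monitored segment:
marker="sensor-check-$(date +%s)"; echo "$marker"
dig "${marker}.example.com" > /dev/null          # NXDOMAIN is fine — the query still crosses the wire
curl -s -o /dev/null "http://example.com/${marker}"

# On the sensor a minute or two later (paste the marker value printed above):
grep -h "sensor-check-" /opt/zeek/logs/current/dns.log /opt/zeek/logs/current/http.log
```

If the marker shows up in dns.log and http.log, the capture path is intact end to end; if it doesn't, start at the mirror configuration, not the sensor.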
The myth has a second form: "if Zeek's capture_loss.log is empty, we're not losing packets." capture_loss.log only measures what Zeek knows to measure — specifically, the gap between expected TCP sequence numbers and received ones for connections Zeek is already tracking. It doesn't measure connections that never reached the sensor because the SPAN port didn't mirror them. A silent mirror failure leaves capture_loss.log empty and shows zero drops on ethtool, while 100% of the affected traffic is invisible. End-to-end validation is the only reliable check.
Try it: identify the failure category from a symptom description
Setup. Five scenarios below. For each, write one sentence naming the failure category (1, 2, 3, or 4) and one sentence on what you'd check first.
Task. (1) "Zeek's conn.log shows half as many connections per day this week compared to two weeks ago; traffic volume hasn't changed." (2) "Suricata alert count jumped from 200 per day to 8,000 per day last Tuesday; single rule SID dominates the new alerts." (3) "Sensor deployed six months ago; last week found that two days of Zeek logs from month three are missing entirely." (4) "During the 09:00-09:30 window every weekday, Zeek's capture_loss.log shows 1-2% loss; rest of the day is clean." (5) "Sensor reports healthy, but the analyst testing it noticed that HTTPS traffic from a specific internal subnet shows up in conn.log but never in ssl.log."
Expected result. (1) Category 2 (Zeek log anomaly) — check zeekctl status, log rotation, recent script changes, disk space. (2) Category 3 (Suricata firing wrong, false positive) — identify the SID, read the rule, check whether legitimate environmental traffic triggers it. (3) Category 4 (slow decay) — check storage, NTP, ruleset update status, and add monthly review if not already in place. (4) Category 1 (packet loss under load) — concentrated in a time window; likely morning backup job or meeting-start traffic spike; check ring buffer, clustering, NIC offloads. (5) Category 2 — traffic is being captured (it's in conn.log) but Zeek's SSL analyzer isn't parsing it; likely an asymmetric routing issue or a custom script interfering with protocol identification.
Debugging branch. If you categorized (5) as Category 1, you missed the distinction: packets reached Zeek (they're in conn.log) so the capture layer is fine; the analysis layer isn't producing ssl.log. If you categorized (2) as Category 4, you're reading single-event spikes as decay patterns — decay is slow drift, not step changes. Recognizing the signature-to-category match is the whole point of the diagnostic workflow.
You should be able to do the following without referring back to this sub-module. If you can't, the sections to re-read are noted.
You've built the sensor and mapped the evidence landscape.
NF0 established why network evidence matters when every other source is compromised. NF1 built your Zeek + Suricata sensor with the 10 investigation query patterns. From here, every module teaches protocol-specific investigation against real attack scenarios.
- DNS deep dive (NF3) — tunnelling detection, DGA analysis, passive DNS infrastructure mapping, and the INC-NE-2026-0227 AiTM phishing DNS trail
- Protocol analysis (NF4–NF7) — HTTP/HTTPS, SMB lateral movement, SSH tunnelling, and email protocol investigation with Zeek metadata and PCAP
- Detection and hunting (NF8–NF11) — Suricata rule writing, C2 beacon detection with JA3, NetFlow analytics, and proactive network threat hunting
- NSM architecture (NF13) — production sensor deployment at 1–10 Gbps with Arkime, Security Onion, and enterprise storage planning
- INC-NE-2026-0830 capstone (NF14) — multi-stage investigation using only network evidence: phishing → domain-fronted C2 → lateral movement → DNS tunnel exfiltration