In this module
NF0.3 The NSM Philosophy
You understand that monitoring is ongoing, not one-off — you run EDR continuously, you keep event log collection running, you don't spin up SIEM forwarding only after an incident starts. This sub extends that principle to the network layer and introduces the philosophy that determines whether network evidence exists when you need it.
Most organizations approach network monitoring as a product: deploy a firewall, enable IDS, check the box. The result is tools that generate alerts nobody investigates, logs that nobody queries, and captures that expire before anyone looks at them. When the incident arrives, the team discovers their "network visibility" is a dashboard of untuned Suricata alerts and a firewall log that records permit/deny but nothing about what happened inside the permitted connections.
Network Security Monitoring (NSM) is an operational philosophy, not a product. The term was formalized by Richard Bejtlich in the early 2000s, and its core principle hasn't changed: you collect network evidence continuously so that when an incident occurs, the data is already there. You don't start capturing when the alert fires — by then, the evidence you need is in the past. NSM is the discipline that ensures past network activity is available for investigation.
This sub introduces the NSM philosophy, the three data categories it organizes around, and the operational practices that make it work — because having the tools deployed is the easy part. Keeping the data useful is the hard part.
Deliverable: The three NSM data categories (full content, extracted content, session data), the operational practices that make NSM useful in production, and the common failure modes that reduce NSM from a monitoring philosophy to an expensive log generator.
Figure NF0.3 — The three NSM data categories and the core principle. Full content data is the raw packets. Extracted content is files and objects pulled from network streams. Session and transaction data is the structured metadata that Zeek, NetFlow, and IDS alerts provide. NSM collects all three continuously — the data must exist before the incident is detected.
The Core Principle: Collect Before the Incident
NSM is not a response capability. It's a preparation capability. The data collection happens before anyone knows an incident is occurring, and the investigation happens after.
The fundamental problem with reactive network forensics — "we'll capture traffic when we see something suspicious" — is that it can't travel backward in time. When the SOC analyst sees a Defender XDR alert at 14:00 and decides to start a packet capture, the attacker's initial access at 09:00 is already five hours in the past. The C2 beacon that's been running since last week is only captured from now forward. The DNS query that resolved the attacker's phishing domain happened three days ago. Reactive capture gives you evidence of the investigation, not evidence of the attack.
NSM inverts this. The Zeek sensor was running at 09:00 when the attacker sent the phishing email. It was running last week when the C2 beacon started. It was running three days ago when the phishing domain was resolved. The conn.log, dns.log, and ssl.log entries for all of these events were captured as they happened and stored on the sensor. When the SOC analyst gets the alert at 14:00, the evidence is already there — waiting to be queried.
This isn't a novel concept. You already do this with endpoint forensics: Windows Event Logs collect continuously, SIEM forwarders run continuously, EDR telemetry streams continuously. Nobody suggests starting event log collection only after an incident is detected. NSM applies the same principle to the network layer.
The Three Data Categories
Richard Bejtlich's original NSM framework organizes network evidence into three categories. These map directly to the five evidence types from NF0.2, but the categories clarify the relationship between them.
Full content data is the raw packet capture — the PCAP files that record every byte on the wire. This is the most detailed evidence and the most expensive to store. You use full content data when you need to reconstruct exactly what happened in a specific session: extract a downloaded file, read an unencrypted HTTP request, or prove in legal proceedings that specific bytes were transmitted. In the NF0.2 taxonomy, this is Type 1 (Full PCAP).
Extracted content is objects pulled from the network streams — files transferred over HTTP, email attachments sent over SMTP, certificates presented during TLS handshakes. Zeek's file extraction capability and tools like NetworkMiner produce extracted content from full PCAP. The value is that extracted content is pre-processed: instead of searching through a 2 TB PCAP file for a malware binary, you have the binary already extracted and hashed. The cost is that extraction requires parsing, and parsers can miss content that uses uncommon encodings or non-standard protocol implementations.
Session and transaction data is the structured metadata about network activity — Zeek logs, NetFlow records, IDS alerts. This is the evidence you query first and query most. Session data tells you that a connection existed, how long it lasted, how much data was transferred, what protocol was used, and what the IDS thought about it. It doesn't tell you what the bytes contained. In the NF0.2 taxonomy, this covers Types 2 through 5 (Zeek metadata, IDS alerts, NetFlow, DNS logs).
The practical relationship: session data answers most investigation questions. When it doesn't, extracted content narrows the search. When you need byte-level proof, full content provides it. You work from the cheapest evidence to the most detailed, and you stop as soon as your question is answered.
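As a concrete sketch of what "query session data first" looks like, the snippet below parses conn.log-style TSV records and answers a cheap question (which connections lasted over ten minutes?) without touching PCAP. The field list is a trimmed subset of Zeek's default conn.log schema, and the sample record is invented for illustration.

```python
import csv
import io

# Trimmed subset of Zeek's default conn.log fields (the real log has more).
CONN_FIELDS = ["ts", "uid", "id.orig_h", "id.orig_p", "id.resp_h",
               "id.resp_p", "proto", "duration", "orig_bytes", "resp_bytes"]

# One hypothetical TSV record: a long-lived HTTPS connection to an external IP.
SAMPLE = ("1726000000.123\tCAbc123\t10.0.0.5\t51544\t203.0.113.9\t443"
          "\ttcp\t3600.0\t1200\t980000")

def parse_conn_log(text):
    """Parse header-stripped conn.log TSV lines into field dicts."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [dict(zip(CONN_FIELDS, row)) for row in reader]

records = parse_conn_log(SAMPLE)
# Session data answers: did a connection exist, to whom, for how long?
long_lived = [r for r in records if float(r["duration"]) > 600]
print(long_lived[0]["id.resp_h"])  # 203.0.113.9
```

Only if this metadata query can't answer the question would you escalate to extracted content or full PCAP for the matching `uid`.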
What Makes NSM Work in Production
Deploying tools is the easy part. The operational practices below determine whether your NSM deployment produces useful evidence or expensive noise.
The first operational practice is sensor health monitoring. A Zeek sensor that crashes at 03:00 and isn't restarted until the next business day creates a gap in your evidence. If the attacker's lateral movement happened during that window, you have nothing. NSM sensors need monitoring — process watchdogs, disk space alerts, packet loss counters. The sensor is an evidence source. Treat it with the same availability requirements as your SIEM.
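A minimal health-check sketch, assuming a hypothetical metrics dict fed by whatever collects your sensor telemetry (the metric names and thresholds here are illustrative, not from any specific tool):

```python
def check_sensor_health(metrics, min_disk_gb=50, max_drop_pct=1.0):
    """Evaluate one NSM sensor's health metrics; return alert strings.
    Metric keys (zeek_running, disk_free_gb, pkt_drop_pct) are assumed names."""
    alerts = []
    if not metrics.get("zeek_running", False):
        alerts.append("CRITICAL: Zeek process down -- evidence gap forming now")
    if metrics.get("disk_free_gb", 0) < min_disk_gb:
        alerts.append(f"WARNING: under {min_disk_gb} GB free -- rotation at risk")
    if metrics.get("pkt_drop_pct", 0.0) > max_drop_pct:
        alerts.append("WARNING: packet drop above threshold -- capture incomplete")
    return alerts

# A sensor that crashed overnight with a nearly full disk:
print(check_sensor_health(
    {"zeek_running": False, "disk_free_gb": 12, "pkt_drop_pct": 0.2}))
```

The point is that these checks run continuously and page someone, the same way SIEM availability alerts do.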
The second practice is retention policy alignment. Your retention periods need to match your detection capability. If your average time to detect an intrusion is 21 days (the industry median for organizations without active threat hunting), your Zeek log retention needs to be at least 21 days — ideally 60–90 days to cover the full investigation. Retaining 7 days of metadata when your detection takes 21 days means you've lost the first 14 days of every investigation. PCAP retention is shorter (days to weeks), but Zeek metadata should cover the full expected dwell time.
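The arithmetic behind the alignment is simple enough to sketch; the numbers below reuse the figures from this section:

```python
def retention_gap_days(mttd_days, retention_days):
    """Days of attacker activity already expired when detection happens."""
    return max(0, mttd_days - retention_days)

# 21-day median time-to-detect vs. only 7 days of Zeek metadata:
print(retention_gap_days(21, 7))   # 14 -- the first two weeks are gone
# Target state: retention covers expected dwell time with margin:
print(retention_gap_days(21, 90))  # 0
```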
The third practice is regular query exercises. If nobody queries the Zeek logs between incidents, two things happen: the team loses proficiency with the query tools, and configuration drift goes undetected. A Zeek parser that stopped handling HTTP/2 traffic after an update goes unnoticed for months — until the investigation needs HTTP/2 data that was never parsed. Regular hunting exercises (weekly or biweekly) keep both the team and the sensor sharp.
The fourth practice is baseline profiling. You can't identify anomalous network behavior without knowing what normal looks like. How many DNS queries does a typical workstation make per hour? What's the normal connection duration for HTTPS traffic to your SaaS providers? What internal servers normally communicate with external IPs? Baselines are built from Zeek logs over time, and they're what make network threat hunting possible. Module NF11 covers baseline building and hunting methodology.
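To make the DNS-rate question concrete, here is one possible baseline sketch: flag hosts whose hourly query rate sits far above the fleet median. It uses median plus MAD (median absolute deviation) rather than mean plus standard deviation, so a single noisy host doesn't inflate the baseline it is being measured against. The fleet data and the threshold of 10 MADs are invented for illustration; real baselines come from weeks of Zeek dns.log aggregation.

```python
import statistics

def dns_rate_outliers(queries_per_hour, mad_threshold=10.0):
    """Flag hosts whose hourly DNS query rate is far above the fleet
    baseline, using median + MAD for robustness to the outlier itself."""
    rates = sorted(queries_per_hour.values())
    med = statistics.median(rates)
    mad = statistics.median(abs(r - med) for r in rates)
    cutoff = med + mad_threshold * mad
    return sorted(h for h, r in queries_per_hour.items() if r > cutoff)

# Hypothetical fleet: workstations at ~40-60 queries/hour, one anomaly.
fleet = {"ws-01": 42, "ws-02": 55, "ws-03": 48,
         "ws-04": 61, "ws-05": 39, "ws-06": 4800}
print(dns_rate_outliers(fleet))  # ['ws-06'] -- possible DNS tunnelling
```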
Common Failure Modes
These are the patterns that reduce NSM from a monitoring philosophy to an expensive log generator. Recognizing them is the first step to avoiding them.
Deploy-and-forget. The Suricata sensor was installed two years ago. The rules haven't been updated since. Zeek was deployed but the JSON output format was never configured, so the logs aren't ingesting into the SIEM. The PCAP disk filled up six months ago and nobody noticed. Deploy-and-forget is the most common NSM failure mode because it's invisible — the tools appear to be running until someone actually needs the evidence.
Alert-only monitoring. The organization monitors Suricata alerts but doesn't collect or query Zeek metadata. This means they detect known-bad signatures but can't investigate anything that doesn't match a rule. When the alert fires for "ET MALWARE Cobalt Strike Beacon," they know it's bad, but they can't use conn.log to find every other host that communicated with the same C2 server, because they don't have conn.log.
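The pivot that alert-only monitoring can't perform is a one-liner once conn.log exists. A sketch, using invented records in the same dict shape as the earlier conn.log parse (keys shortened to `orig_h`/`resp_h` for brevity):

```python
def hosts_contacting(conn_records, bad_ip):
    """Given conn.log-style records, return every internal host that
    communicated with the known-bad IP -- scoping beyond the alerted host."""
    return sorted({r["orig_h"] for r in conn_records if r["resp_h"] == bad_ip})

# Hypothetical records: the alert fired for 10.0.0.5 only.
conns = [
    {"orig_h": "10.0.0.5",  "resp_h": "203.0.113.9"},   # alerted host
    {"orig_h": "10.0.0.17", "resp_h": "203.0.113.9"},   # silent second victim
    {"orig_h": "10.0.0.17", "resp_h": "198.51.100.4"},  # unrelated traffic
]
print(hosts_contacting(conns, "203.0.113.9"))  # ['10.0.0.17', '10.0.0.5']
```

Without conn.log, the second victim stays invisible until it does something that independently trips a signature.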
No baseline, no hunting. The Zeek data exists but nobody queries it between incidents. Without baseline profiling, the team can't tell whether 500 DNS queries per hour from a workstation is normal or anomalous. Without regular hunting, attackers with low-and-slow techniques go undetected for months. The data was there the whole time — nobody looked.
Retention mismatch. The organization detects incidents an average of 30 days after initial compromise, but their PCAP retention is 5 days and their Zeek retention is 14 days. Every investigation starts with a 16-day evidence gap. The first two weeks of attacker activity — the initial access, the credential theft, the early lateral movement — are gone before the investigation begins.
NE's security team had run a Suricata sensor for 18 months. It generated approximately 400 alerts per day. The SOC reviewed the alerts and investigated the high-severity ones. They didn't have Zeek deployed. They didn't have PCAP retention beyond Suricata's per-alert packet capture (a few packets around each alert trigger).
During the ransomware investigation (INC-NE-2026-0418), the team had Suricata alerts showing the PsExec lateral movement pattern and the Cobalt Strike beacon signature. But they couldn't determine: how many hosts communicated with the C2 server (no conn.log), what DNS queries preceded the compromise (no dns.log), how much data was exfiltrated (no byte counts), or when the attacker first connected (no connection history beyond the alert timestamps).
The Suricata alerts confirmed the attack was real. They couldn't support the investigation that followed. The team could identify the incident but couldn't scope it, quantify the damage, or determine the full attack timeline.
This is the difference between alert-based monitoring and NSM. Alerts tell you something happened. NSM gives you the evidence to understand what happened, when, how, and how much.
Compliance frameworks like PCI DSS (Requirement 11.4) and NIST 800-53 (SI-4) require intrusion detection capabilities. Deploying Suricata technically satisfies the control. But compliance and security capability are different things. A Suricata deployment with stale rules, no Zeek metadata, and no investigation workflow satisfies the auditor while leaving the team unable to investigate the incidents Suricata detects.
The compliance control asks: "Do you have intrusion detection?" The operational question is: "When intrusion detection fires, can you investigate what happened?" If Suricata alerts on a C2 beacon but you can't determine which hosts are affected, how long the attacker has been present, or what data they accessed — you have compliance without capability.
NSM gives you both. The same sensor that runs Suricata for compliance also runs Zeek for investigation. The additional storage cost for Zeek metadata is modest (10–15 GB/day at 1 Gbps). The investigation capability it provides is the difference between closing a ticket and resolving an incident.
Try it: Assess an NSM deployment against the failure modes
Setup. Use your own organization's network monitoring setup, or use NE's pre-incident state described in the decision point above.
Task. Score the deployment against the four failure modes: deploy-and-forget (are tools maintained?), alert-only monitoring (is metadata collected beyond alerts?), no baseline/hunting (is the data queried between incidents?), retention mismatch (does retention exceed MTTD?). Rate each as GREEN (not present), AMBER (partially present), or RED (fully present).
Expected result. Most organizations score RED on at least two of four. The most common combination is alert-only monitoring (RED) plus no baseline/hunting (RED) — the tools are deployed and maintained, but the investigation data isn't collected and nobody hunts proactively.
Debugging branch. If you scored GREEN on all four: you're running a mature NSM program. This course will still provide value in the protocol-specific investigation methodology (NF3–NF7) and the detection techniques (NF8–NF11), but your foundations are solid.
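The scoring in the task above can be sketched as a small helper; the function name and rating keys are illustrative, not part of any tool:

```python
def score_nsm(ratings):
    """ratings maps each failure mode to 'GREEN', 'AMBER', or 'RED';
    returns a one-line verdict listing the fully-present failure modes."""
    reds = sorted(m for m, r in ratings.items() if r == "RED")
    return "mature" if not reds else "gaps in: " + ", ".join(reds)

# The common combination noted in the expected result:
print(score_nsm({
    "deploy-and-forget": "GREEN",
    "alert-only monitoring": "RED",
    "no baseline/hunting": "RED",
    "retention mismatch": "AMBER",
}))  # gaps in: alert-only monitoring, no baseline/hunting
```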
You've built the sensor and mapped the evidence landscape.
NF0 established why network evidence matters when every other source is compromised. NF1 built your Zeek + Suricata sensor with the 10 investigation query patterns. From here, every module teaches protocol-specific investigation against real attack scenarios.
- DNS deep dive (NF3) — tunnelling detection, DGA analysis, passive DNS infrastructure mapping, and the INC-NE-2026-0227 AiTM phishing DNS trail
- Protocol analysis (NF4–NF7) — HTTP/HTTPS, SMB lateral movement, SSH tunnelling, and email protocol investigation with Zeek metadata and PCAP
- Detection and hunting (NF8–NF11) — Suricata rule writing, C2 beacon detection with JA3, NetFlow analytics, and proactive network threat hunting
- NSM architecture (NF13) — production sensor deployment at 1–10 Gbps with Arkime, Security Onion, and enterprise storage planning
- INC-NE-2026-0830 capstone (NF14) — multi-stage investigation using only network evidence: phishing → domain-fronted C2 → lateral movement → DNS tunnel exfiltration