In this module
NF0.6 Normal vs Malicious Traffic
You already understand baseline thinking from endpoint investigation — you know that svchost.exe running from System32 is normal while svchost.exe running from Temp is suspicious. The same principle applies to network traffic: distinguishing attack traffic from normal traffic requires knowing what normal looks like.
A Zeek conn.log for a 500-person organization generates hundreds of thousands of entries per day. The vast majority are legitimate — Microsoft 365 traffic, SaaS application connections, cloud storage sync, browser updates. The attacker's C2 beacon is 1 connection out of 200,000. Finding it without baseline knowledge is like looking for a needle in a haystack you've never seen before.
The fundamental network investigation skill isn't packet analysis. It's pattern recognition. Knowing that a workstation making 50 DNS queries per minute to unique subdomains of a .xyz domain is anomalous. Knowing that a server sending 500 MB outbound at 03:00 to an IP in a hosting block it's never communicated with before is suspicious. Knowing that an HTTPS connection with a 60-second beacon interval and consistent 300-byte payload is a C2 channel.
This sub-module introduces the baseline-thinking model that underpins every detection and hunting technique in this course.
Deliverable: The six dimensions of network baseline (volume, timing, destination, protocol, duration, directionality) and the anomaly patterns for each that indicate malicious activity. An NE baseline example for each dimension.
Figure NF0.6 — The six dimensions of network baseline. Each dimension has a normal range specific to the host and its role. Anomalies are deviations from that host-specific baseline, not from a generic threshold. A 12 GB transfer is anomalous for a workstation but normal for a backup server — context determines the investigation priority.
The Six Baseline Dimensions
Every network investigation starts with the same implicit question: is this normal? You can only answer that question if you know what normal looks like for the specific host, protocol, and time period.
Volume measures how much data a host transfers in a session or over a time period. Baseline: a typical workstation at NE sends 30-80 MB/day outbound to the internet — mostly web browsing, SaaS applications, and email. The anomaly in INC-NE-2026-0418 was 12.1 GB outbound from FS01-NGE in a single 32-minute session. That's 150× the daily baseline. Even without knowing the destination, the volume alone is investigation-worthy.
Timing tracks when network activity occurs. Business hours traffic from workstations is normal. Bulk outbound transfers at 03:00 from a file server are not — file servers don't decide to send data to external IPs at 3 AM. The attacker in INC-NE-2026-0418 timed the exfiltration for off-hours specifically because they expected less monitoring. Timing baselines catch this pattern.
Destination identifies where traffic goes. A workstation connecting to Microsoft 365, Slack, and GitHub is expected. The same workstation connecting to a VPS in a hosting provider block it's never communicated with before is an investigation trigger. First-seen destination analysis — "this host has never connected to this IP before" — is one of the most effective anomaly detection methods in Zeek conn.log.
Protocol examines what protocols are used and how. DNS queries are normal. DNS queries with 200-character subdomain labels containing base64-encoded data are DNS tunneling. HTTPS to port 443 is normal. HTTPS on port 8443 to an IP with no reverse DNS may warrant investigation. Protocol baselines are protocol-specific — Modules NF3-NF7 cover each protocol's normal and anomalous patterns in detail.
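The tunneling heuristic above reduces to a label-length check. A minimal sketch, using synthetic data standing in for the output of cat dns.log | zeek-cut id.orig_h query, with an illustrative 50-character threshold (not an official cutoff):

```shell
# Synthetic stand-in for: cat dns.log | zeek-cut id.orig_h query
printf '%s\t%s\n' \
  10.0.0.5 www.example.com \
  10.0.0.5 aGVsbG8td29ybGQtdGhpcy1pcy1hLXZlcnktbG9uZy1iYXNlNjQtY2h1bms.evil.xyz \
  10.0.0.7 mail.example.com > queries.tsv

# Flag queries whose first label exceeds 50 characters -- a common
# (illustrative) heuristic for DNS tunneling payloads.
awk -F'\t' '{ split($2, l, "."); if (length(l[1]) > 50) print $1, $2 }' queries.tsv
```

Only the second query trips the filter: its first label is a 59-character base64-looking blob, far beyond anything a human-named hostname produces.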
Duration measures how long connections last. A web browsing session to a CDN lasts seconds. A C2 beacon connection lasts hours or persists indefinitely with periodic check-ins. Long-lived connections to external IPs — especially when combined with consistent timing intervals — are a primary C2 indicator. Zeek's conn.log records connection duration in seconds, making this trivially queryable.
Directionality tracks the ratio of inbound to outbound data. Normal web browsing downloads more than it uploads (you request a page, the server sends it). Exfiltration reverses this — the compromised host sends more data than it receives. A connection where the originator bytes are 100× the responder bytes, on a host that normally downloads more than it uploads, is a data exfiltration indicator.
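Duration and directionality combine naturally into a single conn.log filter. A sketch with synthetic data standing in for zeek-cut output; the 1-hour and 50x thresholds are illustrative assumptions, not fixed rules:

```shell
# Synthetic stand-in for:
# cat conn.log | zeek-cut id.orig_h id.resp_h duration orig_bytes resp_bytes
printf '%s\t%s\t%s\t%s\t%s\n' \
  10.0.0.5 203.0.113.9 14400 5200000 48000 \
  10.0.0.6 198.51.100.7 2 1400 90000 > conns.tsv

# Long-lived connections (> 1 hour) that upload far more than they
# download (originator bytes > 50x responder bytes).
awk -F'\t' '$3 > 3600 && $4 > 50 * $5 { print $1, "->", $2, $3 "s", $4 "B out" }' conns.tsv
```

The first row matches — a 4-hour connection uploading 100x what it received — while the short download-heavy browsing session does not.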
Building Baselines from Zeek Data
You don't need a commercial tool to build baselines. Zeek conn.log and standard command-line tools produce operational baselines for any host or network segment.
The baseline building process is straightforward: collect Zeek conn.log for a representative period (7-14 days of normal operation), then compute statistics per host and per dimension. For volume, aggregate originator and responder bytes per host per day. For timing, count connections per hour per host. For destinations, count unique external IPs per host per day. For duration, compute the median and 95th percentile connection duration per host.
The output is a per-host profile: "DESKTOP-NGE042 typically makes 800-1200 external connections per day, to 40-60 unique external IPs, with a total outbound volume of 30-80 MB, during 08:00-18:00 UTC." Any activity that falls significantly outside this profile warrants investigation.
Module NF11 covers baseline building and hunting methodology in depth. For now, the principle is: baselines are per-host, per-role, and per-time-period. A generic threshold ("alert on any transfer over 1 GB") produces false positives from backup servers, cloud sync, and software updates. A host-specific baseline ("alert when this workstation transfers 10× its daily average") produces actionable signals.
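The per-host aggregation described above is a few lines of awk. A sketch over a synthetic one-day log (the hosts and byte counts are invented for illustration):

```shell
# Synthetic stand-in for: cat conn.log | zeek-cut id.orig_h id.resp_h orig_bytes
printf '%s\t%s\t%s\n' \
  10.0.0.5 203.0.113.9 40000000 \
  10.0.0.5 198.51.100.7 10000000 \
  10.0.0.6 203.0.113.9 5000000 > day1.tsv

# Per-host profile: total outbound bytes and unique external destinations.
awk -F'\t' '{ bytes[$1] += $3; seen[$1 "\t" $2] = 1 }
  END {
    for (k in seen) { split(k, p, "\t"); dests[p[1]]++ }
    for (h in bytes) print h, bytes[h] " B", dests[h] " unique dests"
  }' day1.tsv | sort
```

Run daily over the 7-14 day collection window and you have the per-host profile the text describes, ready to compare against today's numbers.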
You're reviewing Zeek conn.log for baseline anomalies and you find a workstation (DESKTOP-NGE015) that makes 2,400 HTTPS connections per day to 180 unique external IPs. The daily baseline for workstations at NE is 800-1200 connections to 40-60 IPs. This host is 2× the connection count and 3× the unique destination count.
Before flagging this as suspicious, you check the host's role: it's a developer workstation running container builds that pull images from multiple registries. The high connection count and diverse destinations are explained by legitimate development activity.
This is the baseline principle in action. A generic threshold would have flagged this host every day. A role-specific baseline ("developer workstations make 2000-3000 connections to 150-200 unique IPs") correctly identifies this as normal. The same numbers on a finance team workstation would be strongly anomalous.
The decision: do you build role-specific baselines (more accurate, more effort) or generic baselines (less accurate, less effort)? For a first deployment, start with generic baselines and maintain an exception list for hosts with known elevated activity. Refine to role-specific baselines as you learn your network's patterns.
The most effective network anomaly detection is a Zeek conn.log query that compares today's traffic to the baseline. "Show me connections from this host to external IPs it's never connected to before" is a query, not a machine learning model. "Show me hosts transferring 10× their daily average outbound volume" is arithmetic.
Machine learning adds value for complex patterns at scale — detecting subtle beaconing variations, identifying DGA domains, or profiling encrypted traffic. But the foundational anomaly detection that catches the majority of attacks (new destinations, unusual volumes, off-hours activity, periodic connections) is arithmetic applied to structured data. You have that data in Zeek conn.log. You have the arithmetic in awk.
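That arithmetic looks like this in practice. A sketch assuming you have already written per-host daily averages to a baseline file; every host and number here is hypothetical:

```shell
# Hypothetical per-host baseline (host, average daily outbound bytes)
printf '%s\t%s\n' 10.0.0.5 60000000 10.0.0.6 50000000 > baseline.tsv
# Today's totals (host, outbound bytes)
printf '%s\t%s\n' 10.0.0.5 55000000 10.0.0.6 900000000 > today.tsv

# Flag hosts whose outbound volume exceeds 10x their daily average.
awk -F'\t' 'NR == FNR { avg[$1] = $2; next }
  $1 in avg && $2 > 10 * avg[$1] { print $1, $2 / avg[$1] "x baseline" }' \
  baseline.tsv today.tsv
```

One host at 18x its average is flagged; the host within its normal range is not. No model, no training — a two-file awk join.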
Commercial products that layer ML on top of network metadata are doing the same thing with better dashboards. The underlying detection — "this is different from the baseline" — is the same. Start with the queries. Graduate to ML when the queries aren't enough.
Try it: Build a 5-minute baseline from your own traffic
Setup. If you have Zeek running (from the NF0.5 Try It) or access to any Zeek conn.log data, use it. Otherwise, use the sample PCAP from malware-traffic-analysis.net.
Task. From a Zeek conn.log, compute: total connections, unique external IPs, total outbound bytes, and the top 5 destinations by connection count. For the top 5 destinations, use: cat conn.log | zeek-cut id.resp_h | sort | uniq -c | sort -rn | head -5
Expected result. The top destinations are usually CDNs, DNS resolvers, and major cloud services (Microsoft, Google, Amazon). Any destination that doesn't match a known service is worth investigating. This 5-minute exercise is the foundation of network baselining.
Debugging branch. If the output contains internal IPs mixed with external: filter for external-only using grep -v "^10\.\|^192\.168\.\|^172\.1[6-9]\.\|^172\.2[0-9]\.\|^172\.3[0-1]\." before the sort.
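The command in the Task covers the top-5 destinations; the other three metrics are equally short. A sketch using a synthetic stand-in for the output of cat conn.log | zeek-cut id.resp_h orig_bytes, since the real numbers depend on your capture:

```shell
# Synthetic stand-in for: cat conn.log | zeek-cut id.resp_h orig_bytes
printf '%s\t%s\n' \
  203.0.113.9 40000 \
  203.0.113.9 10000 \
  198.51.100.7 5000 > sess.tsv

wc -l < sess.tsv                                   # total connections
cut -f1 sess.tsv | sort -u | wc -l                 # unique external IPs
awk -F'\t' '{ s += $2 } END { print s }' sess.tsv  # total outbound bytes
```

On the synthetic data: 3 connections, 2 unique IPs, 55,000 bytes outbound. Swap sess.tsv for your zeek-cut pipeline to get the real figures.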
Four baseline queries, one per dimension, ready to run against any conn.log:

Volume — connections sending more than 100 MB outbound:
cat conn.log | zeek-cut id.orig_h id.resp_h orig_bytes | awk '$3 > 100000000' | sort -t$'\t' -k3 -rn | head -20

Timing — activity outside 06:00-20:00 (the -d flag converts the epoch timestamp to a readable form so the hour can be extracted):
cat conn.log | zeek-cut -d ts id.orig_h id.resp_h | awk -F'\t' '{h = substr($1, 12, 2) + 0; if (h < 6 || h > 20) print}' | head -20

Destination — external IPs never seen during the baseline period:
cat baseline-conn.log | zeek-cut id.resp_h | sort -u > known_ips.txt
cat today-conn.log | zeek-cut id.resp_h | sort -u > today_ips.txt
comm -23 today_ips.txt known_ips.txt

Protocol — the most frequent host, destination, and port combinations:
cat conn.log | zeek-cut id.orig_h id.resp_h id.resp_p | sort | uniq -c | sort -rn | head -20

You've built the sensor and mapped the evidence landscape.
NF0 established why network evidence matters when every other source is compromised. NF1 built your Zeek + Suricata sensor with the 10 investigation query patterns. From here, every module teaches protocol-specific investigation against real attack scenarios.
- DNS deep dive (NF3) — tunnelling detection, DGA analysis, passive DNS infrastructure mapping, and the INC-NE-2026-0227 AiTM phishing DNS trail
- Protocol analysis (NF4–NF7) — HTTP/HTTPS, SMB lateral movement, SSH tunnelling, and email protocol investigation with Zeek metadata and PCAP
- Detection and hunting (NF8–NF11) — Suricata rule writing, C2 beacon detection with JA3, NetFlow analytics, and proactive network threat hunting
- NSM architecture (NF13) — production sensor deployment at 1–10 Gbps with Arkime, Security Onion, and enterprise storage planning
- INC-NE-2026-0830 capstone (NF14) — multi-stage investigation using only network evidence: phishing → domain-fronted C2 → lateral movement → DNS tunnel exfiltration