In this module

NF0.6 Normal vs Malicious Traffic

8 hours · Module 0 · Free
What you already know

You already understand baseline thinking from endpoint investigation — you know that svchost.exe running from System32 is normal while svchost.exe running from Temp is suspicious. The same principle applies to network traffic: distinguishing attack traffic from normal traffic requires knowing what normal looks like.

Operational Objective

A Zeek conn.log for a 500-person organization generates hundreds of thousands of entries per day. The vast majority are legitimate — Microsoft 365 traffic, SaaS application connections, cloud storage sync, browser updates. The attacker's C2 beacon is 1 connection out of 200,000. Finding it without baseline knowledge is looking for a needle in a haystack you've never seen before.

The fundamental network investigation skill isn't packet analysis. It's pattern recognition. Knowing that a workstation making 50 DNS queries per minute to unique subdomains of a .xyz domain is anomalous. Knowing that a server sending 500 MB outbound at 03:00 to an IP in a hosting block it's never communicated with before is suspicious. Knowing that an HTTPS connection with a 60-second beacon interval and consistent 300-byte payload is a C2 channel.

This sub-module introduces the baseline-thinking model that underpins every detection and hunting technique in this course.

Deliverable: The six dimensions of network baseline (volume, timing, destination, protocol, duration, directionality) and the anomaly patterns for each that indicate malicious activity. An NE baseline example for each dimension.

Estimated completion: 30 minutes
SIX DIMENSIONS OF NETWORK BASELINE

Dimension      Normal                        Anomaly
Volume         50 MB/day                     12 GB in one session
Timing         09:00–17:00                   03:00 bulk transfer outbound
Destination    O365, SaaS                    hosting IP never seen before
Protocol       HTTPS, DNS                    DNS TXT queries with encoding
Duration       seconds                       6-hour persistent session
Direction      more inbound than outbound    reversed

Baseline thinking: the question is always "compared to what?" 12 GB outbound is anomalous for a workstation but normal for a backup server. A connection at 03:00 is anomalous for a user but normal for a scheduled task. The baseline is specific to the host, the role, the time, and the history. Generic thresholds produce false positives.

Figure NF0.6 — The six dimensions of network baseline. Each dimension has a normal range specific to the host and its role. Anomalies are deviations from that host-specific baseline, not from a generic threshold. A 12 GB transfer is anomalous for a workstation but normal for a backup server — context determines the investigation priority.

The Six Baseline Dimensions

Every network investigation starts with the same implicit question: is this normal? You can only answer that question if you know what normal looks like for the specific host, protocol, and time period.

Volume measures how much data a host transfers in a session or over a time period. Baseline: a typical workstation at NE sends 30-80 MB/day outbound to the internet — mostly web browsing, SaaS applications, and email. The anomaly in INC-NE-2026-0418 was 12.1 GB outbound from FS01-NGE in a single 32-minute session. That's 150× the daily baseline. Even without knowing the destination, the volume alone is investigation-worthy.
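The volume check is a one-pass aggregation. A minimal sketch, using awk on a small sample TSV that stands in for `zeek-cut id.orig_h orig_bytes` output — the hostnames, byte counts, and the 1 GB threshold are illustrative, not from the incident data:

```shell
# Sample TSV standing in for `zeek-cut id.orig_h orig_bytes` output.
# IPs and byte counts are made up for illustration.
printf '10.1.1.10\t40000000\n10.1.1.10\t35000000\n10.1.1.20\t12100000000\n' > /tmp/vol.tsv

# Sum originator bytes per source host and flag anything over 1 GB.
awk -F'\t' '{ sum[$1] += $2 }
     END { for (h in sum)
             printf "%s\t%.1f MB%s\n", h, sum[h]/1e6,
                    (sum[h] > 1e9 ? "\tFLAG" : "") }' /tmp/vol.tsv
```

The same pipeline runs against real conn.log output by replacing the sample file with `zeek-cut id.orig_h orig_bytes`; the per-host threshold would come from that host's own baseline, not a fixed 1 GB.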

Timing tracks when network activity occurs. Business hours traffic from workstations is normal. Bulk outbound transfers at 03:00 from a file server are not — file servers don't decide to send data to external IPs at 3 AM. The attacker in INC-NE-2026-0418 timed the exfiltration for off-hours specifically because they expected less monitoring. Timing baselines catch this pattern.
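A timing baseline can be as simple as a per-hour histogram. A sketch, assuming the hour of day has already been extracted from the conn.log `ts` field (the sample values are illustrative):

```shell
# Sample data standing in for one hour-of-day value per connection
# (e.g. extracted from the conn.log ts field). Values are illustrative.
printf '09\n10\n10\n14\n03\n' > /tmp/hours.txt

# Histogram of activity by hour; a spike outside business hours stands out.
sort /tmp/hours.txt | uniq -c | sort -k2 -n
```

For a file server that is quiet overnight, any non-zero bucket in the 00–06 range is worth a look.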

Destination identifies where traffic goes. A workstation connecting to Microsoft 365, Slack, and GitHub is expected. The same workstation connecting to a VPS in a hosting provider block it's never communicated with before is an investigation trigger. First-seen destination analysis — "this host has never connected to this IP before" — is one of the most effective anomaly detection methods in Zeek conn.log.

Protocol examines what protocols are used and how. DNS queries are normal. DNS queries with 200-character subdomain labels containing base64-encoded data are DNS tunneling. HTTPS to port 443 is normal. HTTPS on port 8443 to an IP with no reverse DNS may warrant investigation. Protocol baselines are protocol-specific — Modules NF3-NF7 cover each protocol's normal and anomalous patterns in detail.

Duration measures how long connections last. A web browsing session to a CDN lasts seconds. A C2 beacon connection lasts hours or persists indefinitely with periodic check-ins. Long-lived connections to external IPs — especially when combined with consistent timing intervals — are a primary C2 indicator. Zeek's conn.log records connection duration in seconds, making this trivially queryable.
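Because conn.log records duration in seconds, the query really is trivial. A sketch over a sample TSV standing in for `zeek-cut id.orig_h id.resp_h duration` output (IPs, durations, and the one-hour threshold are illustrative):

```shell
# Sample TSV standing in for `zeek-cut id.orig_h id.resp_h duration` output.
printf '10.1.1.10\t151.101.1.69\t2.4\n10.1.1.15\t185.193.125.44\t21600.0\n' > /tmp/dur.tsv

# Flag any connection lasting longer than one hour (3600 s).
awk -F'\t' '$3 > 3600 { printf "%s -> %s\t%.0f s\n", $1, $2, $3 }' /tmp/dur.tsv
```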

Directionality tracks the ratio of inbound to outbound data. Normal web browsing downloads more than it uploads (you request a page, the server sends it). Exfiltration reverses this — the compromised host sends more data than it receives. A connection where the originator bytes are 100× the responder bytes, on a host that normally downloads more than it uploads, is a data exfiltration indicator.
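The 100× ratio test maps directly onto the conn.log `orig_bytes` and `resp_bytes` fields. A sketch over a sample TSV standing in for `zeek-cut id.orig_h id.resp_h orig_bytes resp_bytes` output (values illustrative):

```shell
# Sample TSV standing in for
# `zeek-cut id.orig_h id.resp_h orig_bytes resp_bytes` output.
printf '10.1.1.10\t151.101.1.69\t5000\t900000\n10.1.1.20\t91.219.0.8\t12100000000\t40000\n' > /tmp/dir.tsv

# Normal browsing downloads more than it uploads; flag the reverse
# when a host sends 100x more than it receives on a connection.
awk -F'\t' '$4 > 0 && $3/$4 > 100 { printf "%s -> %s\tratio %.0f\n", $1, $2, $3/$4 }' /tmp/dir.tsv
```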

Building Baselines from Zeek Data

You don't need a commercial tool to build baselines. Zeek conn.log and standard command-line tools produce operational baselines for any host or network segment.

The baseline building process is straightforward: collect Zeek conn.log for a representative period (7-14 days of normal operation), then compute statistics per host and per dimension. For volume, aggregate originator and responder bytes per host per day. For timing, count connections per hour per host. For destinations, count unique external IPs per host per day. For duration, compute the median and 95th percentile connection duration per host.
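Several of those per-host statistics can come out of a single awk pass. A sketch over a sample TSV standing in for `zeek-cut id.orig_h id.resp_h orig_bytes` output from a baseline period (hosts and values are illustrative):

```shell
# Sample TSV standing in for `zeek-cut id.orig_h id.resp_h orig_bytes`
# output over a baseline collection period.
printf '10.1.1.10\t8.8.8.8\t500\n10.1.1.10\t8.8.4.4\t600\n10.1.1.10\t8.8.8.8\t700\n' > /tmp/base.tsv

# Connection count, unique destinations, and outbound bytes per host.
awk -F'\t' '{ conns[$1]++; dest[$1"\t"$2]=1; bytes[$1]+=$3 }
     END { for (d in dest) { split(d, p, "\t"); uniq[p[1]]++ }
           for (h in conns)
             printf "%s\tconns=%d\tuniq_dests=%d\tout_bytes=%d\n",
                    h, conns[h], uniq[h], bytes[h] }' /tmp/base.tsv
```

Run daily over 7-14 days, these per-host lines become the profile the next paragraph describes.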

The output is a per-host profile: "DESKTOP-NGE042 typically makes 800-1200 external connections per day, to 40-60 unique external IPs, with a total outbound volume of 30-80 MB, during 08:00-18:00 UTC." Any activity that falls significantly outside this profile warrants investigation.

Module NF11 covers baseline building and hunting methodology in depth. For now, the principle is: baselines are per-host, per-role, and per-time-period. A generic threshold ("alert on any transfer over 1 GB") produces false positives from backup servers, cloud sync, and software updates. A host-specific baseline ("alert when this workstation transfers 10× its daily average") produces actionable signals.
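The "10× its daily average" rule is a two-file join. A sketch, assuming a per-host baseline-average file and a per-host today file have already been summarized from conn.log (filenames and values are illustrative):

```shell
# Hypothetical summaries derived from conn.log: avg outbound bytes/day
# per host (baseline) and today's outbound bytes per host.
printf '10.1.1.10\t55000000\n10.1.1.20\t60000000\n' > /tmp/baseline_avg.tsv
printf '10.1.1.10\t48000000\n10.1.1.20\t12100000000\n' > /tmp/today.tsv

# Alert when a host exceeds 10x its own daily average.
awk -F'\t' 'NR==FNR { avg[$1]=$2; next }
     avg[$1] > 0 && $2 > 10*avg[$1] { printf "%s\t%dx baseline\n", $1, $2/avg[$1] }' \
    /tmp/baseline_avg.tsv /tmp/today.tsv
```

Because the threshold is relative to each host's own history, the backup server's routine 12 GB never fires while the workstation's does.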

Guided Procedure — Spot the Anomaly in NE Traffic
Step 1. The following Zeek conn.log entries are from FS01-NGE during a 24-hour period. Identify which entries are normal file server activity and which are anomalous.
Expected output: Internal-to-internal SMB connections (port 445) with moderate byte counts are normal for a file server. An outbound connection to an external IP on port 443 with 12.1 GB originator bytes at 03:00 is anomalous — file servers don't initiate large outbound HTTPS transfers to external IPs.
If you flagged all external connections: Some file servers do make legitimate external connections — for Windows Update, antivirus updates, or management agents. The anomaly isn't "external connection" — it's "12.1 GB outbound to an unfamiliar IP at 03:00." Context matters.
Step 2. IT03-NGE (a helpdesk workstation) shows 847 connections to 185.193.125.44 over 72 hours, each lasting 0.3-0.5 seconds, with 300-500 bytes in each direction, at approximately 60-second intervals.
Expected output: This is a textbook C2 beacon pattern: consistent interval (~60 seconds), consistent byte count (300-500 bytes), consistent duration (0.3-0.5 seconds), to a single external IP over 72 hours. Normal traffic doesn't produce this regularity. The anomaly dimensions are: timing (periodic), volume (consistent), duration (consistent), and destination (single external IP over 3 days).
If you thought this could be a heartbeat monitor or health check: It could be — some applications do periodic check-ins. The investigation would verify by checking the TLS certificate, JA3 fingerprint, and whether the destination IP hosts a known service. The regularity alone isn't proof of C2, but it's a strong enough signal to investigate.
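The interval regularity in Step 2 can be tested numerically. A sketch over sample epoch timestamps standing in for `zeek-cut ts` output filtered to one host/destination pair — the timestamps, the 55-65 s jitter band, and the 90% threshold are all illustrative choices, not fixed rules:

```shell
# Sample epoch timestamps standing in for `zeek-cut ts` output
# filtered to a single orig/resp pair. A real beacon drifts slightly.
printf '1000\n1060\n1121\n1180\n1241\n' > /tmp/ts.txt

# Compute inter-connection deltas; if nearly all fall in a tight band
# around ~60 s, the pattern is beacon-like.
awk 'NR>1 { d=$1-prev; n++; if (d>=55 && d<=65) hits++ } { prev=$1 }
     END { printf "deltas=%d in_band=%d beacon_like=%s\n",
                  n, hits, (n>0 && hits/n>0.9 ? "yes" : "no") }' /tmp/ts.txt
```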
Step 3. DESKTOP-NGE042 (a finance user's workstation) makes 50 DNS TXT queries per minute to unique subdomains of data-cdn-service.xyz for 15 minutes. Each subdomain is 40+ characters of base64-encoded text.
Expected output: This is DNS tunneling. The anomaly dimensions are: protocol (TXT queries with encoded data — normal workstations don't make TXT queries at this rate), volume (50/minute — far above normal DNS rate), and destination (a single domain receiving encoded data). The base64 subdomains are the data being exfiltrated — encoded and sent as DNS queries to the attacker's authoritative DNS server.
If you weren't sure about the DNS TXT pattern: Module NF3 covers DNS investigation in depth, including tunneling detection. For now, the key signal is: high-rate DNS queries with unusually long, encoded subdomain labels to a single domain are almost always DNS tunneling.
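The long-label signal from Step 3 is queryable with one awk filter. A sketch over sample query names standing in for `zeek-cut query` output from Zeek's dns.log — the domains are made up, and the 40-character cutoff is an illustrative threshold:

```shell
# Sample query names standing in for `zeek-cut query` output from dns.log.
# Domains are fabricated for illustration.
printf 'www.example.com\naGVsbG8gd29ybGQgdGhpcyBpcyBleGZpbHRyYXRlZCBkYXRh.data-cdn-service.xyz\n' > /tmp/q.txt

# Split on dots; a first label over 40 characters is tunneling-suspicious.
awk -F'.' 'length($1) > 40 { printf "LONG_LABEL(%d)\t%s\n", length($1), $0 }' /tmp/q.txt
```

Combined with a per-minute rate count on the same field, this catches the DESKTOP-NGE042 pattern.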
Decision point

You're reviewing Zeek conn.log for baseline anomalies and you find a workstation (DESKTOP-NGE015) that makes 2,400 HTTPS connections per day to 180 unique external IPs. The daily baseline for workstations at NE is 800-1200 connections to 40-60 IPs. This host is 2× the connection count and 3× the unique destination count.

Before flagging this as suspicious, you check the host's role: it's a developer workstation running container builds that pull images from multiple registries. The high connection count and diverse destinations are explained by legitimate development activity.

This is the baseline principle in action. A generic threshold would have flagged this host every day. A role-specific baseline ("developer workstations make 2000-3000 connections to 150-200 unique IPs") correctly identifies this as normal. The same numbers on a finance team workstation would be strongly anomalous.

The decision: do you build role-specific baselines (more accurate, more effort) or generic baselines (less accurate, less effort)? For a first deployment, start with generic baselines and maintain an exception list for hosts with known elevated activity. Refine to role-specific baselines as you learn your network's patterns.

Compliance Myth: "Anomaly detection means AI and machine learning — we need a commercial product"

The most effective network anomaly detection is a Zeek conn.log query that compares today's traffic to the baseline. "Show me connections from this host to external IPs it's never connected to before" is a query, not a machine learning model. "Show me hosts transferring 10× their daily average outbound volume" is arithmetic.

Machine learning adds value for complex patterns at scale — detecting subtle beaconing variations, identifying DGA domains, or profiling encrypted traffic. But the foundational anomaly detection that catches the majority of attacks (new destinations, unusual volumes, off-hours activity, periodic connections) is arithmetic applied to structured data. You have that data in Zeek conn.log. You have the arithmetic in awk.

Commercial products that layer ML on top of network metadata are doing the same thing with better dashboards. The underlying detection — "this is different from the baseline" — is the same. Start with the queries. Graduate to ML when the queries aren't enough.

Next
NF0.7 — Network Architecture for Investigators. You know what to look for. NF0.7 covers where to look — how network architecture (switches, routers, VLANs, DMZs, cloud VPCs) determines what your sensor can see and where blind spots exist.
Try it: Build a 5-minute baseline from your own traffic

Setup. If you have Zeek running (from the NF0.5 Try It) or access to any Zeek conn.log data, use it. Otherwise, use the sample PCAP from malware-traffic-analysis.net.

Task. From a Zeek conn.log, compute: total connections, unique external IPs, total outbound bytes, and the top 5 destinations by connection count. Use: cat conn.log | zeek-cut id.resp_h | sort | uniq -c | sort -rn | head -5

Expected result. The top destinations are usually CDNs, DNS resolvers, and major cloud services (Microsoft, Google, Amazon). Any destination that doesn't match a known service is worth investigating. This 5-minute exercise is the foundation of network baselining.

Debugging branch. If the output contains internal IPs mixed with external: filter for external-only using grep -v "^10\.\|^192\.168\.\|^172\.1[6-9]\.\|^172\.2[0-9]\.\|^172\.3[0-1]\." before the sort.
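The task asks for four metrics but the one-liner above covers only the top destinations. A sketch of the other three in one awk pass, over a sample TSV standing in for `zeek-cut id.resp_h orig_bytes` output (IPs and sizes illustrative):

```shell
# Sample TSV standing in for `zeek-cut id.resp_h orig_bytes` output.
printf '151.101.1.69\t5000\n151.101.1.69\t7000\n8.8.8.8\t300\n' > /tmp/try.tsv

# Total connections, unique destinations, total outbound bytes.
awk -F'\t' '{ total++; bytes+=$2; seen[$1]=1 }
     END { u=0; for (i in seen) u++
           printf "connections=%d unique_dests=%d out_bytes=%d\n",
                  total, u, bytes }' /tmp/try.tsv

# Top destinations by connection count (same shape as the task's query).
cut -f1 /tmp/try.tsv | sort | uniq -c | sort -rn | head -5
```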

Checkpoint — before moving on
1. Name the six dimensions of network baseline and give one anomaly indicator for each. (§ The Six Baseline Dimensions)
2. Explain why baselines should be per-host and per-role rather than generic thresholds, using the developer workstation example. (§ Decision Point)
3. Identify the C2 beacon pattern from the guided procedure and state which baseline dimensions it violates. (§ Guided Procedure, Step 2)
Baseline query reference (Zeek conn.log):

# Large outbound transfers: originator bytes over 100 MB
cat conn.log | zeek-cut id.orig_h id.resp_h orig_bytes | awk '$3 > 100000000' | sort -t$'\t' -k3 -rn | head -20

# Off-hours activity: connections before 06:00 or after 20:00 (UTC hour extracted from ts)
cat conn.log | zeek-cut -U '%H' ts id.orig_h id.resp_h | awk -F'\t' '$1+0 < 6 || $1+0 > 20' | head -20

# First-seen destinations: IPs in today's traffic absent from the baseline
cat baseline-conn.log | zeek-cut id.resp_h | sort -u > known_ips.txt
cat today-conn.log | zeek-cut id.resp_h | sort -u > today_ips.txt
comm -23 today_ips.txt known_ips.txt

# Most frequent source/destination/port triples
cat conn.log | zeek-cut id.orig_h id.resp_h id.resp_p | sort | uniq -c | sort -rn | head -20

You've built the sensor and mapped the evidence landscape.

NF0 established why network evidence matters when every other source is compromised. NF1 built your Zeek + Suricata sensor with the 10 investigation query patterns. From here, every module teaches protocol-specific investigation against real attack scenarios.

  • DNS deep dive (NF3) — tunnelling detection, DGA analysis, passive DNS infrastructure mapping, and the INC-NE-2026-0227 AiTM phishing DNS trail
  • Protocol analysis (NF4–NF7) — HTTP/HTTPS, SMB lateral movement, SSH tunnelling, and email protocol investigation with Zeek metadata and PCAP
  • Detection and hunting (NF8–NF11) — Suricata rule writing, C2 beacon detection with JA3, NetFlow analytics, and proactive network threat hunting
  • NSM architecture (NF13) — production sensor deployment at 1–10 Gbps with Arkime, Security Onion, and enterprise storage planning
  • INC-NE-2026-0830 capstone (NF14) — multi-stage investigation using only network evidence: phishing → domain-fronted C2 → lateral movement → DNS tunnel exfiltration