NF1.8 First Investigation Queries
You've seen individual zeek-cut commands in NF1.3 and NF1.4. This sub brings them together into complete investigation workflows — the query sequences you'll repeat in every module. By the end of this sub, you'll have a toolkit of 10 reusable query patterns that cover the most common investigation questions.
The investigation methodology from NF0.4 defines the steps — Scope, Identify, Correlate. But steps need tools. The tool for network investigation is the command line — zeek-cut piped through grep, awk, sort, and uniq. These aren't complicated commands, but combining them into investigation workflows requires practice.
This sub establishes the 10 query patterns you'll use throughout the course. Each pattern answers a specific investigation question. By the time you finish this sub, you should be able to run any of them from memory.
Deliverable: Ten reusable investigation query patterns, tested against your sensor data, saved as shell aliases or a script file for repeated use.
Figure NF1.8 — The 10 investigation query patterns organized by methodology step. Patterns 1-3 scope the investigation. Patterns 4-6 identify suspicious activity. Patterns 7-10 correlate across log files and tools.
Scope Patterns
Pattern 1 — All connections from/to a specific host:
cat conn.log | zeek-cut ts id.orig_h id.resp_h id.resp_p proto service duration orig_bytes resp_bytes | grep "10.0.1.15"
This is the starting query for almost every investigation. Given an IOC (an IP from an alert, an endpoint investigation, or threat intelligence), find every connection involving it. The output shows what the host communicated with, on which ports, for how long, and how much data was transferred.
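One caveat with the grep in Pattern 1: it matches substrings, so "10.0.1.15" also matches 10.0.1.150, and each dot matches any character. A minimal sketch of an exact-match alternative follows. It assumes the address columns are the second and third fields, as in the zeek-cut call above. The name nf_match_ip is hypothetical, not part of Zeek.

```shell
# Hypothetical helper (not part of Zeek): reads zeek-cut TSV on stdin and keeps
# rows whose originator (column 2) or responder (column 3) equals the IP exactly.
nf_match_ip() {
    awk -F'\t' -v ip="$1" '$2 == ip || $3 == ip'
}
```

To use it, swap the final grep for the function: cat conn.log | zeek-cut ts id.orig_h id.resp_h id.resp_p proto service duration orig_bytes resp_bytes | nf_match_ip 10.0.1.15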
Pattern 2 — DNS resolutions for a domain or IP:
# By domain name
cat dns.log | zeek-cut ts id.orig_h query answers rcode_name | grep "suspicious-domain"
# By resolved IP
cat dns.log | zeek-cut ts id.orig_h query answers | grep "203.0.113.88"
DNS is the first step in nearly every attack chain. This pattern finds which hosts queried a domain and what IP it resolved to (or which domain resolved to a known-bad IP).
Pattern 3 — Top external destinations by connection count:
cat conn.log | zeek-cut id.orig_h id.resp_h | \
awk -F'\t' '$2 !~ /^(10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)/ {print $2}' | \
sort | uniq -c | sort -rn | head -20
This shows which external IPs received the most connections. The RFC 1918 filter is applied to the destination column ($2), so connections that originate on internal hosts are kept and only internal destinations are dropped. Unusual entries, such as IPs in hosting provider ranges you don't recognize, warrant investigation.
Identify Patterns
Pattern 4 — Largest outbound transfers (exfiltration candidates):
cat conn.log | zeek-cut ts id.orig_h id.resp_h id.resp_p orig_bytes | \
awk -F'\t' '$5 > 100000000' | sort -t$'\t' -k5 -rn | head -20
This returns connections where the originator sent more than 100 MB, sorted by originator bytes in descending order so the largest transfers come first. Cross-reference each destination IP to determine whether it is a known service or suspicious.
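Pattern 4 looks at single connections, so a transfer split across many smaller connections stays under the 100 MB threshold. As a hedged extension (not one of the ten patterns), the sketch below sums originator bytes per source-destination pair. It assumes three-column input from zeek-cut id.orig_h id.resp_h orig_bytes. The name nf_sum_bytes is hypothetical.

```shell
# Hypothetical helper: sums orig_bytes (column 3) per originator/responder pair.
# Reads TSV on stdin; rows with an unset byte count ("-") are skipped.
# Output: originator, responder, total bytes, largest total first.
nf_sum_bytes() {
    awk -F'\t' '$3 != "-" { total[$1 "\t" $2] += $3 }
        END { for (pair in total) printf "%s\t%.0f\n", pair, total[pair] }' |
    LC_ALL=C sort -t "$(printf '\t')" -k3,3nr
}
```

Example: cat conn.log | zeek-cut id.orig_h id.resp_h orig_bytes | nf_sum_bytes | head -20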
Pattern 5 — Long-duration connections (C2 candidates):
cat conn.log | zeek-cut ts id.orig_h id.resp_h id.resp_p duration | \
awk -F'\t' '$5 > 3600' | sort -t$'\t' -k5 -rn | head -20
This returns connections lasting more than one hour. Legitimate long connections include VPN tunnels, streaming media, and persistent WebSocket connections. Illegitimate long connections include C2 channels and interactive shells.
Pattern 6 — Beaconing candidates (consistent connection intervals):
cat conn.log | zeek-cut id.orig_h id.resp_h id.resp_p | \
sort | uniq -c | sort -rn | head -20
This counts how many times each unique source-destination-port combination appears. A C2 beacon that checks in every 60 seconds for 72 hours produces approximately 4,320 connections to the same destination. Normal browsing produces connections to many different destinations. A high count to a single destination is a beacon indicator, although the count alone says nothing about timing; regular intervals between those connections strengthen the case.
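To check whether a high-count destination is contacted at regular intervals, you can measure the gaps between consecutive connections. This is a rough sketch, assuming four-column input from zeek-cut ts id.orig_h id.resp_h id.resp_p with epoch timestamps. The name nf_beacon_intervals is hypothetical, and the mean and standard deviation of the gaps are one simple choice of statistic among many.

```shell
# Hypothetical helper: reads TSV (ts, orig_h, resp_h, resp_p) on stdin.
# Output per source/destination/port: number of gaps, mean gap (s), std dev (s).
nf_beacon_intervals() {
    # Group rows by source/destination/port, then order each group by time
    LC_ALL=C sort -t "$(printf '\t')" -k2,4 -k1,1n |
    awk -F'\t' '
        { key = $2 "\t" $3 "\t" $4 }
        # Same group as the previous row: record the gap since that row
        key == prev { d = $1 - last; n[key]++; sum[key] += d; sq[key] += d * d }
        { prev = key; last = $1 }
        END {
            for (k in n) {
                mean = sum[k] / n[k]
                var = sq[k] / n[k] - mean * mean
                if (var < 0) var = 0   # guard against floating-point rounding
                printf "%s\t%d\t%.1f\t%.1f\n", k, n[k], mean, sqrt(var)
            }
        }'
}
```

A standard deviation that is small relative to the mean suggests evenly spaced check-ins. Beacons that add jitter will show more spread, so treat this as one signal rather than a verdict.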
Correlate Patterns
Pattern 7 — UID pivot from conn.log to protocol logs:
# Find a suspicious connection
cat conn.log | zeek-cut uid id.orig_h id.resp_h id.resp_p | grep "203.0.113.88" | head -1
# Take the UID and search all logs
# (UID is a read-only variable in bash, so use a different name)
ZEEK_UID="THE_UID_FROM_ABOVE"
grep "$ZEEK_UID" *.log
The UID links the same network session across all Zeek logs. A connection to port 443 will appear in conn.log (connection metadata) and ssl.log (TLS handshake). The UID ties them together.
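Before reading the full records, it can help to see which logs the session touched at all. This is a small sketch, assuming uncompressed *.log files in a single directory (rotated .gz archives would need zgrep instead). The name nf_uid_logs is hypothetical.

```shell
# Hypothetical helper: lists the Zeek log files in a directory that contain a UID.
# $1 = connection UID, $2 = log directory (defaults to the current directory).
nf_uid_logs() {
    # -l prints only file names; -F treats the UID as a fixed string, not a regex
    grep -lF -- "$1" "${2:-.}"/*.log
}
```

If the output lists conn.log and ssl.log but not http.log, the session was TLS with no cleartext HTTP, which tells you where to look next.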
Pattern 8 — Community ID pivot from Suricata to Zeek:
# Get CID from Suricata alert
CID=$(cat eve.json | jq -r 'select(.event_type=="alert") | .community_id' | head -1)
# Find in Zeek
grep "$CID" conn.log
Community ID links the same flow across tools. Suricata says "this is bad." The Community ID lets you find everything Zeek recorded about the same session.
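The commands above pivot on the first alert only. If you want every alerted flow at once, one possible approach is to write the Community IDs to a file and match them all in a single pass. This sketch assumes one ID per line in that file. The name nf_cid_match is hypothetical.

```shell
# Hypothetical helper: prints conn.log rows that contain any Community ID
# listed (one per line) in a file. $1 = ID list, $2 = path to conn.log.
nf_cid_match() {
    # -F: fixed strings (IDs contain +, / and =); -f: read patterns from the file
    grep -F -f "$1" -- "$2"
}
```

The list can be built with the same jq filter used above, for example: jq -r 'select(.event_type=="alert") | .community_id // empty' eve.json | sort -u > alert-cids.txt, followed by nf_cid_match alert-cids.txt conn.log. The file name alert-cids.txt is only an illustration.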
Pattern 9 — All internal hosts communicating with a C2 IP:
cat conn.log | zeek-cut id.orig_h id.resp_h | grep "203.0.113.88" | \
awk -F'\t' '{print $1}' | sort -u
After you identify a C2 IP, this pattern finds every internal host that communicated with it. Each unique originator IP is potentially compromised.
Pattern 10 — TLS fingerprint search (JA3):
# Find the JA3 hash for a known-bad connection
cat ssl.log | zeek-cut ja3 server_name | grep "suspicious-domain" | head -1
# Search for that JA3 across all TLS connections
JA3="THE_HASH_FROM_ABOVE"
cat ssl.log | zeek-cut ts id.orig_h id.resp_h server_name ja3 | grep "$JA3"
JA3 fingerprinting identifies the client software by its TLS handshake. If a C2 beacon has a distinctive JA3 hash, searching for that hash across all TLS connections finds every session using the same client, even to different destinations.
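When a hash matches many sessions, a summary per client and server is easier to read than the raw rows. This is a sketch under stated assumptions: the input is four-column TSV from zeek-cut id.orig_h id.resp_h server_name ja3, and the name nf_ja3_spread is hypothetical.

```shell
# Hypothetical helper: for one JA3 hash, counts sessions per client/server pair.
# Reads TSV on stdin with columns: id.orig_h, id.resp_h, server_name, ja3.
nf_ja3_spread() {
    # Appending "" forces a string comparison even if the hash is all digits
    awk -F'\t' -v h="$1" '($4 "") == (h "") { c[$1 "\t" $2 "\t" $3]++ }
        END { for (k in c) printf "%d\t%s\n", c[k], k }' |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1nr -k2
}
```

Example: cat ssl.log | zeek-cut id.orig_h id.resp_h server_name ja3 | nf_ja3_spread "$JA3". Several internal clients sharing one unusual hash is worth a closer look. A hash shared by every browser on the network is not.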
You're investigating a host and Pattern 3 shows it connected to 120 unique external IPs in 24 hours. Most are Microsoft 365, Google, and CDN addresses. Three IPs are in hosting provider ranges you don't recognize. You don't have time to investigate all 120.
The prioritization: investigate the three unknown IPs first. Run Pattern 2 (DNS lookup) for each. If any resolved from a recently-registered domain, that's your highest priority. If the DNS lookup returns nothing (direct-to-IP connection), that's suspicious — most legitimate traffic uses DNS.
Then check the known service IPs for anomalous behavior: Pattern 4 (large transfers to Microsoft IPs could be legitimate OneDrive sync or attacker exfiltrating to a compromised Azure tenant) and Pattern 5 (long durations to CDN IPs are normal for streaming, suspicious for static content).
The patterns work in combination. No single pattern is definitive. The investigation builds confidence through convergence — when multiple patterns point to the same connection as suspicious.
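Convergence can be checked mechanically by intersecting the destinations that two patterns flag. This is a minimal sketch, assuming each pattern's destination IPs have been saved one per line in a file. The name nf_converge and the file names in the example are hypothetical.

```shell
# Hypothetical helper: prints each line that appears in both input files once,
# e.g. destination IPs flagged by two different patterns.
nf_converge() {
    # First file: remember every line. Second file: print lines already seen.
    awk 'NR == FNR { seen[$0] = 1; next } ($0 in seen) && !printed[$0]++' "$1" "$2"
}
```

For example, save the destination column of Pattern 4 to large-tx.txt and of Pattern 5 to long-conn.txt, then run nf_converge large-tx.txt long-conn.txt. Any IP in the output sent you both a large transfer and a long-lived connection.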
A SIEM (Splunk, Elastic, Sentinel) provides a GUI, dashboards, and automated alerting on Zeek data. These are valuable for operational SOC workflows. But the investigation queries in this sub — zeek-cut, grep, awk, sort, uniq — run on the sensor itself with zero infrastructure. A SIEM adds latency (ingestion pipeline), cost (license fees or infrastructure), and complexity (schema mapping, field extraction).
For ad hoc investigation, command-line queries against local Zeek logs are often faster and more flexible than SIEM queries. You can pipe, filter, transform, and combine outputs freely, which is harder to do in most SIEM query languages. The queries also run on the sensor itself, so there is no network transfer and no ingestion delay.
Use a SIEM for automated detection and dashboarding. Use the command line for investigation. Both have their place. Don't assume you can't investigate without a SIEM.
Try it: Save the 10 patterns as shell aliases
Setup. Your sensor VM with a Zeek conn.log from a test PCAP analysis.
Task. Create a file /opt/sensor/query-patterns.sh with all 10 patterns as bash functions. Source it in your shell: source /opt/sensor/query-patterns.sh. Test at least three patterns against your data.
Expected result. You can run nf_host 10.0.1.15 instead of the full zeek-cut pipeline. The function names match the investigation action: nf_host, nf_dns, nf_top_dst, nf_large_tx, nf_long_conn, nf_beacon, nf_uid, nf_cid, nf_c2_hosts, nf_ja3.
Debugging branch. If bash functions aren't working: ensure the file uses function nf_host() { ... } syntax and is sourced (not executed). Add source /opt/sensor/query-patterns.sh to your ~/.bashrc for persistence.
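As a starting point, here is what two of the ten functions might look like. This is a sketch, not the only valid layout. NF_LOG_DIR is a hypothetical variable introduced here for the log directory, and the name() { ... } form shown is equivalent in bash to the function name() { ... } form mentioned above.

```shell
# query-patterns.sh: source this file, do not execute it.
# NF_LOG_DIR is a hypothetical variable naming the directory that holds the Zeek logs.
NF_LOG_DIR="${NF_LOG_DIR:-.}"

# Pattern 1: all connections from/to a host. Usage: nf_host 10.0.1.15
nf_host() {
    zeek-cut ts id.orig_h id.resp_h id.resp_p proto service duration orig_bytes resp_bytes \
        < "$NF_LOG_DIR/conn.log" | grep -F -- "$1"
}

# Pattern 5: connections longer than N seconds (default 3600). Usage: nf_long_conn 7200
nf_long_conn() {
    zeek-cut ts id.orig_h id.resp_h id.resp_p duration < "$NF_LOG_DIR/conn.log" |
        awk -F'\t' -v min="${1:-3600}" '$5 != "-" && $5 + 0 > min + 0' |
        LC_ALL=C sort -t "$(printf '\t')" -k5,5nr | head -20
}
```

Taking the IP or threshold as an argument is what makes the patterns reusable: the same function serves every investigation without editing the pipeline.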
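# Extract the 24-hour window around the incident (UTC assumed).
# Zeek TSV logs store ts as epoch seconds, so convert the bounds first (GNU date).
START=$(date -u -d "2026-04-18 00:00:00" +%s)
END=$(date -u -d "2026-04-19 00:00:00" +%s)
# Keep the "#" header lines so zeek-cut can still parse the extract
awk -F'\t' -v s="$START" -v e="$END" '/^#/ || ($1 >= s && $1 < e)' conn.log > conn-incident.log
# Now run queries against conn-incident.log (smaller, faster)

# Count lines containing a Community ID, splitting a large conn.log into 100 MB chunks (GNU parallel)
parallel --pipe --block 100M grep -c "COMMUNITY_ID" < conn.log | awk '{sum+=$1} END {print sum}'

# Get flow_ids of all alerts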
ALERT_FLOWS=$(cat eve.json | jq -r 'select(.event_type=="alert") | .flow_id' | sort -u)
# Get the flow records for those flows
for fid in $ALERT_FLOWS; do
cat eve.json | jq --argjson f "$fid" 'select(.event_type=="flow" and .flow_id==$f)'
done

# Count alerts per hour and category
cat eve.json | jq -r 'select(.event_type=="alert") | [.timestamp[0:13], .alert.category] | @tsv' | \
sort | uniq -c | sort -rn

# Top 20 HTTP requests by source IP, hostname, and URL
cat eve.json | jq -r 'select(.event_type=="http") | [.src_ip, .http.hostname, .http.url] | @tsv' | \
sort | uniq -c | sort -rn | head -20

You've built the sensor and mapped the evidence landscape.
NF0 established why network evidence matters when every other source is compromised. NF1 built your Zeek + Suricata sensor with the 10 investigation query patterns. From here, every module teaches protocol-specific investigation against real attack scenarios.
- DNS deep dive (NF3) — tunnelling detection, DGA analysis, passive DNS infrastructure mapping, and the INC-NE-2026-0227 AiTM phishing DNS trail
- Protocol analysis (NF4–NF7) — HTTP/HTTPS, SMB lateral movement, SSH tunnelling, and email protocol investigation with Zeek metadata and PCAP
- Detection and hunting (NF8–NF11) — Suricata rule writing, C2 beacon detection with JA3, NetFlow analytics, and proactive network threat hunting
- NSM architecture (NF13) — production sensor deployment at 1–10 Gbps with Arkime, Security Onion, and enterprise storage planning
- INC-NE-2026-0830 capstone (NF14) — multi-stage investigation using only network evidence: phishing → domain-fronted C2 → lateral movement → DNS tunnel exfiltration