
NF1.8 First Investigation Queries

10 hours · Module 1 · Free

What you already know

You've seen individual zeek-cut commands in NF1.3 and NF1.4. This sub brings them together into complete investigation workflows — the query sequences you'll repeat in every module. By the end of this sub, you'll have a toolkit of 10 reusable query patterns that cover the most common investigation questions.

Operational Objective

The investigation methodology from NF0.4 defines the steps — Scope, Identify, Correlate. But steps need tools. The tool for network investigation is the command line — zeek-cut piped through grep, awk, sort, and uniq. These aren't complicated commands, but combining them into investigation workflows requires practice.

This sub establishes the 10 query patterns you'll use throughout the course. Each pattern answers a specific investigation question. By the time you finish this sub, you should be able to run any of them from memory.

Deliverable: Ten reusable investigation query patterns, tested against your sensor data, saved as shell aliases or a script file for repeated use.

Estimated completion: 35 minutes

Figure NF1.8 — The 10 investigation query patterns organized by methodology step. Patterns 1-3 scope the investigation. Patterns 4-6 identify suspicious activity. Patterns 7-10 correlate across log files and tools.

Scope Patterns

Pattern 1 — All connections from/to a specific host:

cat conn.log | zeek-cut ts id.orig_h id.resp_h id.resp_p proto service duration orig_bytes resp_bytes | grep -wF "10.0.1.15"

This is the starting query for almost every investigation. Given an IOC (an IP from an alert, an endpoint investigation, or threat intelligence), find every connection involving it. The output shows what the host communicated with, on which ports, for how long, and how much data was transferred. The -F and -w flags make grep match the address as a literal whole word, so 10.0.1.15 does not also match 10.0.1.150.

Pattern 2 — DNS resolutions for a domain or IP:

# By domain name
cat dns.log | zeek-cut ts id.orig_h query answers rcode_name | grep "suspicious-domain"

# By resolved IP
cat dns.log | zeek-cut ts id.orig_h query answers | grep "203.0.113.88"

DNS is the first step in nearly every attack chain. This pattern finds which hosts queried a domain and what IP it resolved to (or which domain resolved to a known-bad IP).
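One caveat with the by-IP grep: it also matches rows where the address is the querying client (id.orig_h), not the answer. The sketch below matches only inside the answers column. It assumes the same four columns as the second query above (ts, id.orig_h, query, answers) arrive on stdin and that answers is a comma-separated list; the function name dns_for_ips is hypothetical, not part of Zeek.

```shell
# dns_for_ips: hypothetical helper. Reads tab-separated
#   ts  id.orig_h  query  answers
# on stdin and takes one or more IPs as arguments. Prints "IP<TAB>query"
# for every query whose answers contain the IP exactly, and flags IPs
# that never appear in any answer.
dns_for_ips() {
  awk -F'\t' -v ips="$*" '
    BEGIN { n = split(ips, want, " ") }
    {
      m = split($4, ans, ",")          # answers is a comma-separated list
      for (i = 1; i <= m; i++)
        for (j = 1; j <= n; j++)
          if (ans[i] == want[j]) {
            seen[want[j]] = 1
            pair[want[j] "\t" $3] = 1  # de-duplicate IP/query pairs
          }
    }
    END {
      for (p in pair) print p
      for (j = 1; j <= n; j++)
        if (!(want[j] in seen)) print want[j] "\t(no DNS answer seen)"
    }' | sort
}

# Usage (on the sensor):
#   cat dns.log | zeek-cut ts id.orig_h query answers | dns_for_ips 203.0.113.88
```

An IP reported as "(no DNS answer seen)" is a candidate direct-to-IP connection, subject to the usual caveat that the lookup may simply fall outside the log window.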

Pattern 3 — Top external destinations by connection count:

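cat conn.log | zeek-cut id.orig_h id.resp_h | \
  awk -F'\t' '$2 !~ /^(10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)/ {print $2}' | \
  sort | uniq -c | sort -rn | head -20

This shows which external IPs received the most connections. The awk filter tests the destination column ($2) and drops private (RFC 1918) addresses, so only external responders are counted; filtering on the start of the line would test the originator instead and discard the internal hosts you care about. Unusual entries, such as IPs you don't recognize in hosting provider ranges, warrant investigation.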

Identify Patterns

Pattern 4 — Largest outbound transfers (exfiltration candidates):

cat conn.log | zeek-cut ts id.orig_h id.resp_h id.resp_p orig_bytes | \
  awk -F'\t' '$5 > 100000000' | sort -t$'\t' -k5 -rn | head -20

Connections where the originator sent more than 100 MB. Sort by originator bytes descending to put the largest transfers first. Cross-reference with the destination IP to determine if it's a known service or suspicious.

Pattern 5 — Long-duration connections (C2 candidates):

cat conn.log | zeek-cut ts id.orig_h id.resp_h id.resp_p duration | \
  awk -F'\t' '$5 > 3600' | sort -t$'\t' -k5 -rn | head -20

Connections lasting more than 1 hour. Legitimate long connections include VPN tunnels, streaming media, and persistent WebSocket connections. Illegitimate long connections include C2 channels and interactive shells.

Pattern 6 — Beaconing candidates (consistent connection intervals):

cat conn.log | zeek-cut id.orig_h id.resp_h id.resp_p | \
  sort | uniq -c | sort -rn | head -20

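This counts how many times each unique source-destination-port combination appears. A C2 beacon that checks in every 60 seconds for 72 hours produces approximately 4,320 connections to the same destination, while normal browsing spreads its connections across many different destinations. A high count to a single destination is therefore a beacon indicator. The count says nothing about timing, though: busy legitimate services (update checks, telemetry, monitoring agents) also rank high, so confirm that the intervals are regular before calling it a beacon.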
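To check the "consistent intervals" part, you can measure the spacing between successive connections for each tuple. The following is a minimal sketch, not a full beacon detector: it assumes the input is zeek-cut output with ts as epoch seconds (the Zeek TSV default), and the function name beacon_intervals is hypothetical. A low standard deviation relative to the mean suggests machine-driven timing.

```shell
# beacon_intervals: hypothetical helper. Reads tab-separated
#   ts  id.orig_h  id.resp_h  id.resp_p
# on stdin (ts as epoch seconds) and prints, per tuple:
#   orig  resp  port  connection_count  mean_interval  interval_stddev
# sorted so the most regular tuples (lowest stddev) come first.
beacon_intervals() {
  sort -n |                          # order by start time; conn.log is not guaranteed to be
  awk -F'\t' '
    {
      key = $2 FS $3 FS $4
      if (key in last) {             # interval since the previous connection of this tuple
        d = $1 - last[key]
        n[key]++; sum[key] += d; sumsq[key] += d * d
      }
      last[key] = $1
    }
    END {
      for (k in n) {
        if (n[k] < 2) continue       # need at least two intervals to talk about regularity
        mean = sum[k] / n[k]
        var = sumsq[k] / n[k] - mean * mean
        if (var < 0) var = 0         # guard against floating-point rounding
        printf "%s\t%d\t%.1f\t%.1f\n", k, n[k] + 1, mean, sqrt(var)
      }
    }' |
  sort -t "$(printf '\t')" -k6,6n
}

# Usage (on the sensor):
#   cat conn.log | zeek-cut ts id.orig_h id.resp_h id.resp_p | beacon_intervals | head -20
```

Real beacons often add jitter, so treat the standard deviation as a ranking signal rather than a threshold.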

Correlate Patterns

Pattern 7 — UID pivot from conn.log to protocol logs:

# Find a suspicious connection
cat conn.log | zeek-cut uid id.orig_h id.resp_h id.resp_p | grep -F "203.0.113.88" | head -1

# Take the UID and search all logs
# (bash reserves UID as a read-only variable, so use a different name)
ZEEK_UID="THE_UID_FROM_ABOVE"
grep -F "$ZEEK_UID" *.log

The UID links the same network session across all Zeek logs. A connection to port 443 will appear in conn.log (connection metadata) and ssl.log (TLS handshake). The UID ties them together.
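When a directory holds many logs, it helps to see at a glance which files mention the UID before reading the matching rows. A small sketch, assuming the logs are uncompressed *.log files in one directory; the function name uid_logs is hypothetical.

```shell
# uid_logs: hypothetical helper. Lists the log files in a directory that
# contain the given connection UID, as "path:match_count".
#   $1 = UID, $2 = log directory (defaults to the current directory)
uid_logs() {
  # /dev/null is added so grep always prefixes counts with the file name,
  # even when the glob expands to a single file.
  grep -cF -- "$1" "${2:-.}"/*.log /dev/null | awk -F: '$NF > 0'
}

# Usage (on the sensor):
#   uid_logs "THE_UID_FROM_ABOVE" /path/to/zeek/logs
```

A UID that shows up only in conn.log points to a session with no protocol-level log entry.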

Pattern 8 — Community ID pivot from Suricata to Zeek:

# Get CID from Suricata alert
CID=$(cat eve.json | jq -r 'select(.event_type=="alert") | .community_id' | head -1)

# Find in Zeek
grep "$CID" conn.log

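Community ID links the same flow across tools. Suricata says "this is bad." The Community ID lets you find everything Zeek recorded about the same session. The pivot only works when both tools are configured to write the Community ID into their logs; if the grep returns nothing, check that first.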

Pattern 9 — All internal hosts communicating with a C2 IP:

cat conn.log | zeek-cut id.orig_h id.resp_h | \
  awk -F'\t' '$2 == "203.0.113.88" {print $1}' | sort -u

After you identify a C2 IP, this pattern finds every internal host that communicated with it. Each unique originator IP is potentially compromised.

Pattern 10 — TLS fingerprint search (JA3):

# Find the JA3 hash for a known-bad connection
cat ssl.log | zeek-cut ja3 server_name | grep "suspicious-domain" | head -1

# Search for that JA3 across all TLS connections
JA3="THE_HASH_FROM_ABOVE"
cat ssl.log | zeek-cut ts id.orig_h id.resp_h server_name ja3 | grep "$JA3"

JA3 fingerprinting identifies the client software by its TLS handshake. If a C2 beacon has a distinctive JA3 hash, searching for that hash across all TLS connections finds every session using the same client — even to different destinations.
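To see how widely a fingerprint is spread, you can summarize each JA3 hash by the number of distinct clients and servers it appears with. This is a sketch under two assumptions: your ssl.log actually contains a ja3 column (it is added by a JA3 package, not by a stock Zeek install), and the three columns below arrive on stdin. The function name ja3_spread is hypothetical.

```shell
# ja3_spread: hypothetical helper. Reads tab-separated
#   id.orig_h  id.resp_h  ja3
# on stdin and prints, per JA3 hash:
#   ja3  distinct_clients  distinct_servers
# sorted by distinct client count, highest first.
ja3_spread() {
  awk -F'\t' '
    $3 == "-" || $3 == "" { next }     # skip sessions without a JA3 value
    {
      ck = $3 FS $1; sk = $3 FS $2
      if (!(ck in c)) { c[ck] = 1; clients[$3]++ }   # first time this client uses this hash
      if (!(sk in s)) { s[sk] = 1; servers[$3]++ }   # first time this server sees this hash
    }
    END { for (h in clients) printf "%s\t%d\t%d\n", h, clients[h], servers[h] }' |
  sort -t "$(printf '\t')" -k2,2nr
}

# Usage (on the sensor):
#   cat ssl.log | zeek-cut id.orig_h id.resp_h ja3 | ja3_spread | head -20
```

A hash shared by most clients is probably a common browser or OS library; a hash seen on one or two hosts talking to unfamiliar servers is the more interesting lead.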

Guided Procedure — Run All 10 Patterns Against Your Data

Step 1. Run Zeek against a test PCAP and pick an IP address from the conn.log as your investigation target.

If all IPs are internal: Your PCAP may be from an internal-only capture. Use a PCAP from malware-traffic-analysis.net that includes external traffic.

Step 2. Run Patterns 1, 2, and 3 (Scope) against your target IP.

Expected output: Pattern 1 shows all connections involving the IP. Pattern 2 shows the DNS domain that resolved to the IP. Pattern 3 shows where the IP ranks among all external destinations.

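If Pattern 2 returns nothing: The connection may have been direct-to-IP (no DNS resolution), which is itself an investigation indicator because legitimate traffic usually uses DNS. Before drawing that conclusion, rule out the benign explanations: the lookup happened before the capture started, the client used a cached answer, or the name was resolved over a channel the sensor does not log as DNS (for example DNS over HTTPS).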

Step 3. Run Pattern 7 (UID pivot) to find the same connection in ssl.log or http.log.

Expected output: The UID from the conn.log entry appears in ssl.log (showing the TLS certificate and JA3 hash) or http.log (showing the HTTP request). Not all connections have protocol-specific log entries — but most port 443 connections appear in ssl.log.

If the UID isn't found in any other log: The connection used a protocol Zeek doesn't parse (raw TCP, unknown application protocol). The conn.log entry is still valid evidence — it just doesn't have protocol-level detail.

Decision point

You're investigating a host and Pattern 3 shows it connected to 120 unique external IPs in 24 hours. Most are Microsoft 365, Google, and CDN addresses. Three IPs are in hosting provider ranges you don't recognize. You don't have time to investigate all 120.

The prioritization: investigate the three unknown IPs first. Run Pattern 2 (DNS lookup) for each. If any resolved from a recently-registered domain, that's your highest priority. If the DNS lookup returns nothing (direct-to-IP connection), that's suspicious — most legitimate traffic uses DNS.

Then check the known service IPs for anomalous behavior: Pattern 4 (large transfers to Microsoft IPs could be legitimate OneDrive sync or attacker exfiltrating to a compromised Azure tenant) and Pattern 5 (long durations to CDN IPs are normal for streaming, suspicious for static content).

The patterns work in combination. No single pattern is definitive. The investigation builds confidence through convergence — when multiple patterns point to the same connection as suspicious.
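One way to make convergence concrete is to save the destination IPs flagged by each pattern to a file and count how many patterns flagged each address. The sketch below assumes one IP per line in each file; the function name nf_converge and the file names in the usage line are hypothetical.

```shell
# nf_converge: hypothetical helper. Each argument is a file with one
# destination IP per line (saved output of one pattern). Prints
#   number_of_patterns<TAB>ip
# with the IPs flagged by the most patterns first.
nf_converge() {
  for f in "$@"; do
    sort -u "$f"                 # count each IP at most once per pattern
  done |
    sort | uniq -c |
    awk '{print $1 "\t" $2}' |   # normalize the padding uniq -c adds
    sort -k1,1nr
}

# Usage:
#   nf_converge large-tx.txt long-conn.txt beacon.txt
```

An address that appears in three lists deserves attention before one that appears in a single list, although a single strong indicator can still be worth following up.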

Compliance Myth: "You need a SIEM to query Zeek logs effectively"

A SIEM (Splunk, Elastic, Sentinel) provides a GUI, dashboards, and automated alerting on Zeek data. These are valuable for operational SOC workflows. But the investigation queries in this sub — zeek-cut, grep, awk, sort, uniq — run on the sensor itself with zero infrastructure. A SIEM adds latency (ingestion pipeline), cost (license fees or infrastructure), and complexity (schema mapping, field extraction).

For investigation, command-line queries against local Zeek logs are faster and more flexible than SIEM queries. You can pipe, filter, transform, and combine outputs in ways that SIEM query languages can't match. And the queries run on the sensor — no network transfer, no ingestion delay.

Use a SIEM for automated detection and dashboarding. Use the command line for investigation. Both have their place. Don't assume you can't investigate without a SIEM.

NF1.9 — Sensor Maintenance and Monitoring. Your sensor works and you can query it. NF1.9 covers the ongoing maintenance that keeps it producing reliable evidence — log rotation, rule updates, health monitoring, and troubleshooting.

Try it: Save the 10 patterns as shell aliases

Setup. Your sensor VM with a Zeek conn.log from a test PCAP analysis.

Task. Create a file /opt/sensor/query-patterns.sh with all 10 patterns as bash functions. Source it in your shell: source /opt/sensor/query-patterns.sh. Test at least three patterns against your data.

Expected result. You can run nf_host 10.0.1.15 instead of the full zeek-cut pipeline. The function names match the investigation action: nf_host, nf_dns, nf_top_dst, nf_large_tx, nf_long_conn, nf_beacon, nf_uid, nf_cid, nf_c2_hosts, nf_ja3.

Debugging branch. If bash functions aren't working: ensure the file uses function nf_host() { ... } syntax and is sourced (not executed). Add source /opt/sensor/query-patterns.sh to your ~/.bashrc for persistence.
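A possible starting point for the file, showing four of the ten functions. This is a sketch, not a reference solution: NF_LOG_DIR is an assumed variable naming the directory that holds your Zeek logs, and the argument checks are additions for convenience. The portable name() { ... } form is used here; bash also accepts function name() { ... }.

```shell
# query-patterns.sh (sketch): source this file, do not execute it.
# NF_LOG_DIR is an assumed variable: the directory containing the Zeek logs.
NF_LOG_DIR="${NF_LOG_DIR:-.}"

# Pattern 1: all connections from/to a host.
nf_host() {
  [ -n "$1" ] || { echo "usage: nf_host IP" >&2; return 2; }
  zeek-cut ts id.orig_h id.resp_h id.resp_p proto service duration orig_bytes resp_bytes \
    < "$NF_LOG_DIR/conn.log" | grep -wF -- "$1"
}

# Pattern 3: top external destinations (optional argument: how many rows).
nf_top_dst() {
  zeek-cut id.resp_h < "$NF_LOG_DIR/conn.log" |
    grep -Ev '^(10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)' |
    sort | uniq -c | sort -rn | head -n "${1:-20}"
}

# Pattern 5: connections longer than N seconds (default 3600).
nf_long_conn() {
  zeek-cut ts id.orig_h id.resp_h id.resp_p duration < "$NF_LOG_DIR/conn.log" |
    awk -F'\t' -v min="${1:-3600}" '$5 + 0 > min + 0' |
    sort -t "$(printf '\t')" -k5,5nr | head -n 20
}

# Pattern 7: UID pivot across every log in the directory.
nf_uid() {
  [ -n "$1" ] || { echo "usage: nf_uid UID" >&2; return 2; }
  grep -F -- "$1" "$NF_LOG_DIR"/*.log
}
```

The remaining six functions follow the same shape: take the variable part of the pattern as an argument and keep the pipeline itself unchanged.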

Checkpoint — before moving on

1. Run Patterns 1-3 (Scope) against a target IP and interpret the results. (§ Scope Patterns)

2. Demonstrate Pattern 7 (UID pivot) — find a conn.log entry and locate the same session in ssl.log or dns.log. (§ Correlate Patterns)

3. Explain what Pattern 6 (beaconing) looks for and why a high connection count to a single destination is suspicious. (§ Identify Patterns)

Operational Artifact — 10 Investigation Query Patterns

SCOPE: (1) All connections for a host: zeek-cut ts id.orig_h id.resp_h id.resp_p duration orig_bytes | grep "TARGET". (2) DNS for a domain: zeek-cut ts id.orig_h query answers | grep "DOMAIN". (3) Top external destinations: pipe to sort | uniq -c | sort -rn.

IDENTIFY: (4) Large outbound: zeek-cut ... orig_bytes | awk '$N > 100000000'. (5) Long duration: awk '$N > 3600'. (6) Beaconing: zeek-cut id.orig_h id.resp_h id.resp_p | sort | uniq -c | sort -rn.

CORRELATE: (7) UID pivot: grep "UID" *.log. (8) CID pivot: jq ... eve.json → grep "CID" conn.log. (9) All C2 hosts: zeek-cut id.orig_h id.resp_h | grep "C2_IP" | awk '{print $1}' | sort -u. (10) JA3 search: zeek-cut ja3 | grep "HASH".

These patterns compose. Pattern 1 finds a suspicious connection → Pattern 7 pivots to ssl.log → Pattern 10 searches for the same JA3 across all TLS sessions → Pattern 9 lists the internal hosts behind those sessions. The chain goes: connection → fingerprint → scope.

Extended reference — query performance, jq patterns, and SIEM translation

Query performance at scale. The patterns above work on PCAP-replay-scale log files (hundreds of MB to a few GB). At production scale (tens of GB per day, multi-terabyte archives), they become slow enough to matter.

Three performance considerations.

Use grep before awk. grep is faster than awk for simple string matching. grep -F "10.0.1.50" conn.log | awk -F'\t' '$10 > 100000000' is faster than awk -F'\t' '$3 == "10.0.1.50" && $10 > 100000000' conn.log for large files. In the default conn.log column order, $3 is id.orig_h and $10 is orig_bytes; check the #fields header if your logs have been customized.

Pre-filter by time window. Most investigations have a known time range. Extract the window into a smaller working file first; subsequent queries run against the subset.

# Extract the 24-hour window around the incident.
# Zeek TSV timestamps are epoch seconds by default, so convert the bounds first (GNU date).
START=$(date -u -d "2026-04-18T00:00:00" +%s)
END=$(date -u -d "2026-04-19T00:00:00" +%s)
# Keep the header lines (^#) so zeek-cut still works on the subset
awk -F'\t' -v s="$START" -v e="$END" '/^#/ || ($1 >= s && $1 < e)' conn.log > conn-incident.log
# Now run queries against conn-incident.log (smaller, faster)

Parallel processing for large files. GNU parallel splits a large file into blocks and processes them on multiple CPU cores. Example, counting Community ID occurrences across 10 GB of conn.log:

parallel --pipe --block 100M grep -c "COMMUNITY_ID" < conn.log | awk '{sum+=$1} END {print sum}'

For routine investigation at lab scale, the simple pipelines work. For incident response against weeks of production logs, these patterns matter.

jq patterns for eve.json. Suricata's eve.json is newline-delimited JSON. Common patterns beyond the event_type == "alert" filter.

Extract alert-plus-flow pair. Match the alert's flow_id back to the corresponding flow record for full flow context.

# Get flow_ids of all alerts
ALERT_FLOWS=$(cat eve.json | jq -r 'select(.event_type=="alert") | .flow_id' | sort -u)
# Get the flow records for those flows
for fid in $ALERT_FLOWS; do
  cat eve.json | jq --argjson f "$fid" 'select(.event_type=="flow" and .flow_id==$f)'
done

Summarise alert categories by hour. Shows which alert categories spike when.

cat eve.json | jq -r 'select(.event_type=="alert") | [.timestamp[0:13], .alert.category] | @tsv' | \
  sort | uniq -c | sort -rn

Extract HTTP URIs from events. For identifying C2 callback patterns or suspicious URIs.

cat eve.json | jq -r 'select(.event_type=="http") | [.src_ip, .http.hostname, .http.url] | @tsv' | \
  sort | uniq -c | sort -rn | head -20

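Translating to SIEM query languages. The bash patterns above map to SIEM equivalents. The translation matters because in many production environments the logs are already centralized in a SIEM and much of the day-to-day investigation happens there, even though the command line remains the quicker tool when you are on the sensor itself.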

Splunk SPL example for Pattern 1: index=zeek source=conn.log TARGET | table _time orig_h resp_h resp_p duration orig_bytes.

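Elasticsearch KQL example for Pattern 4: event.dataset: "zeek.conn" and source.bytes > 100000000. Pattern 4 filters on originator bytes, which ECS stores in source.bytes (destination.bytes holds the responder's bytes). The exact dataset and field names depend on your ingest pipeline.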

Microsoft Sentinel KQL example for Pattern 6: Zeek_conn_CL | summarize count() by orig_h_s, resp_h_s, resp_p_d | where count_ > 100.

The underlying investigation logic is identical; the syntax differs. Learning the bash patterns first teaches the pattern; the SIEM translation is straightforward once the pattern is understood.

When to use zeek-cut vs jq vs raw grep. Three tools for three situations.

zeek-cut — for Zeek TSV logs (conn.log, dns.log, http.log, etc.). Reads the header metadata so you can pick columns by name, and prints only the data rows. Missing values pass through as Zeek writes them: - for an unset field and (empty) for an empty set. Use for almost all Zeek log queries.

jq — for Suricata's eve.json (or Zeek in JSON mode). Parses JSON structure, supports complex filtering, pretty-prints. Use for all JSON-format logs.

grep — for initial string matching across any format, and for quick string searches when you're not sure of the field structure. Use as the first tool when exploring unknown data.

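Combining works well, as long as the header survives: grep -e '^#' -e "SUSPICIOUS_DOMAIN" dns.log | zeek-cut query answers | sort -u. grep narrows the file while keeping the header lines that zeek-cut needs to map column names to positions; zeek-cut then extracts the specific fields. A plain grep "SUSPICIOUS_DOMAIN" in front of zeek-cut strips the header and leaves zeek-cut unable to find the columns.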

The investigation chain revisited. The "scope → identify → correlate" ordering isn't arbitrary. Scope questions set the boundary of your investigation (which host, which time window, which external entity). Identify questions find the suspicious activity within that boundary. Correlate questions link the suspicious activity to other evidence (other logs, other hosts, other times).

Attempting correlation before identification produces random connections; attempting identification before scoping produces either too many or too few findings. The ten patterns sit naturally in this ordering. Most investigations start with Pattern 1 (all connections for a host) because the alert typically names a host; they end with Pattern 9 or 10 (scope expansion across the environment) because the final question is always "what else is compromised?"

Building your own pattern library. These ten are the starting set. Build your own as investigations reveal patterns you reach for repeatedly. The discipline: every time you write a non-trivial query that resolves an investigation question, save it with a label ("find hosts that beacon to a single external IP every 60±5 seconds"). Over a year of investigations, you accumulate 50-100 patterns that match your environment's specific concerns. This library becomes the most valuable operational artefact you produce.

You've built the sensor and mapped the evidence landscape.

NF0 established why network evidence matters when every other source is compromised. NF1 built your Zeek + Suricata sensor with the 10 investigation query patterns. From here, every module teaches protocol-specific investigation against real attack scenarios.

DNS deep dive (NF3) — tunnelling detection, DGA analysis, passive DNS infrastructure mapping, and the INC-NE-2026-0227 AiTM phishing DNS trail
Protocol analysis (NF4–NF7) — HTTP/HTTPS, SMB lateral movement, SSH tunnelling, and email protocol investigation with Zeek metadata and PCAP
Detection and hunting (NF8–NF11) — Suricata rule writing, C2 beacon detection with JA3, NetFlow analytics, and proactive network threat hunting
NSM architecture (NF13) — production sensor deployment at 1–10 Gbps with Arkime, Security Onion, and enterprise storage planning
INC-NE-2026-0830 capstone (NF14) — multi-stage investigation using only network evidence: phishing → domain-fronted C2 → lateral movement → DNS tunnel exfiltration
