11.3 Writing Effective Hunting Queries
Introduction
Required role: Microsoft Sentinel Reader (minimum for hunting queries). Sentinel Contributor for bookmark and hunt management.
Hunting queries differ from analytics rule queries in several important ways. Analytics rules are optimised for scheduled execution — they must be fast, precise, and low in false positives. Hunting queries are optimised for discovery — they can be broader, more complex, and computationally expensive because they run on-demand. A hunting query that returns 200 results requiring manual review is perfectly acceptable. An analytics rule that generates 200 alerts per day is operationally destructive.
This subsection teaches the KQL patterns that power effective hunting. Each pattern targets a different category of threat.
Pattern 1: Rare event discovery
Find events that occur infrequently — things that happen once or twice in 30 days are more interesting to a hunter than things that happen thousands of times.
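A minimal sketch of a rare-event hunt, assuming Microsoft Defender for Endpoint data in the DeviceProcessEvents table; the 30-day window and the threshold of two executions are illustrative:

```kusto
// Rare process discovery: file names executed at most twice in 30 days.
DeviceProcessEvents
| where Timestamp > ago(30d)
| summarize ExecutionCount = count(),
            DeviceCount = dcount(DeviceId),
            FirstSeen = min(Timestamp),
            LastSeen = max(Timestamp)
    by FileName
| where ExecutionCount <= 2
| order by ExecutionCount asc
```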
Rare processes are high-value hunting targets. A legitimate tool used by one administrator appears here — but so does an attacker’s custom tool, a renamed system binary, and a malware dropper. The hunter reviews the list and investigates any unfamiliar or suspicious entries.
Pattern 2: Statistical outlier detection
Find entities whose behaviour deviates significantly from the population.
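One way to sketch this in KQL, assuming Entra ID sign-in data in SigninLogs; the windows and the 3-standard-deviation cut-off mirror the text below:

```kusto
// Statistical outliers: users whose last-day sign-in volume exceeds
// their own 30-day daily baseline by more than 3 standard deviations.
let Baseline = SigninLogs
    | where TimeGenerated between (ago(31d) .. ago(1d))
    | summarize DailyCount = count() by UserPrincipalName, bin(TimeGenerated, 1d)
    | summarize AvgCount = avg(DailyCount), StdevCount = stdev(DailyCount)
        by UserPrincipalName;
SigninLogs
| where TimeGenerated > ago(1d)
| summarize TodayCount = count() by UserPrincipalName
| join kind=inner Baseline on UserPrincipalName
| where TodayCount > AvgCount + 3 * StdevCount
| project UserPrincipalName, TodayCount, AvgCount, StdevCount
```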
Users more than 3 standard deviations above their average are statistical outliers. This catches: token replay attacks (automated sign-ins at high frequency), compromised accounts being used for bulk operations, and legitimate but unusual activity (a user running a bulk migration).
Pattern 3: First-time occurrence
Find things happening for the first time — a user’s first sign-in from a new country, a device’s first connection to a new external IP, an application’s first consent grant.
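A first-time-country sketch against SigninLogs (the 30-day baseline window is illustrative; Location in SigninLogs holds the sign-in country):

```kusto
// First-time country: successful sign-ins from a country not seen
// for that user in the preceding 30 days.
let KnownCountries = SigninLogs
    | where TimeGenerated between (ago(31d) .. ago(1d))
    | where ResultType == "0"
    | summarize Countries = make_set(Location) by UserPrincipalName;
SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType == "0"
| join kind=inner KnownCountries on UserPrincipalName
| where not(set_has_element(Countries, Location))
| project TimeGenerated, UserPrincipalName, Location, IPAddress, AppDisplayName
```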
First-time country sign-ins are classic indicators of credential compromise — an attacker signing in from their own location, which differs from the victim’s historical locations.
First-time service principal sign-in from a new IP. Service principals should have predictable sign-in patterns — new IPs are anomalous.
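The same first-time pattern applied to service principals, assuming AADServicePrincipalSignInLogs is enabled via Entra ID diagnostic settings:

```kusto
// Service principal sign-ins from an IP not seen in the prior 30 days.
let KnownIPs = AADServicePrincipalSignInLogs
    | where TimeGenerated between (ago(31d) .. ago(1d))
    | summarize IPs = make_set(IPAddress) by ServicePrincipalName;
AADServicePrincipalSignInLogs
| where TimeGenerated > ago(1d)
| join kind=inner KnownIPs on ServicePrincipalName
| where not(set_has_element(IPs, IPAddress))
| project TimeGenerated, ServicePrincipalName, IPAddress, ResourceDisplayName
```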
Service principal sign-ins from new IPs are higher-fidelity signals than user sign-ins from new IPs — because service principals should only authenticate from known infrastructure. Any new IP is suspicious by default.
Interpreting hunting query results
Raw query results require interpretation. These guidelines help distinguish threats from noise.
High-confidence threat indicators: Multiple corroborating signals (sign-in from unusual IP + inbox rule creation + email forwarding). Activity from IPs in threat intelligence databases. Service accounts performing interactive operations. Processes with encoded command lines. Activity at unusual times from non-shift-working users.
Medium-confidence suspicious indicators: Single anomalous event without corroboration. First-time country access (could be travel). Rare process execution on a single device (could be an IT tool). Elevated UEBA score without specific anomaly type.
Low-confidence/likely benign indicators: Activity from known VPN providers. Sign-ins during business hours from the user’s home country. Processes executed by IT administrators on IT infrastructure. Automated operations from known service accounts.
The analyst’s decision tree: For each hunting result: (1) Is it in TI databases? → Yes: high confidence, promote to incident. (2) Does it have corroborating signals? → Yes: medium-high confidence, investigate further. (3) Is it a first-time occurrence for this entity? → Yes: medium confidence, verify with the user. (4) Is there a plausible benign explanation? → Yes: low confidence, bookmark as benign.
Pattern 4: Temporal anomaly detection
Find activity occurring at unusual times — outside business hours, on weekends, or during holidays.
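A temporal-anomaly sketch; the 'NightShiftStaff' watchlist name and the 00:00–04:59 window are assumptions to adjust for your environment:

```kusto
// Night-time sign-ins, excluding known night-shift and on-call staff.
let NightShift = _GetWatchlist('NightShiftStaff') | project SearchKey;
SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == "0"
| where hourofday(TimeGenerated) between (0 .. 4)  // adjust for local time zone
| where UserPrincipalName !in (NightShift)
| summarize SigninCount = count(), IPs = make_set(IPAddress)
    by UserPrincipalName, Day = bin(TimeGenerated, 1d)
```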
Exclude known night-shift workers and on-call staff via a watchlist. Remaining night-time sign-ins are suspicious — either an attacker operating in a different time zone, or a legitimate user working late (which still warrants verification).
Pattern 5: Cross-table correlation hunting
Join data across multiple tables to find patterns invisible in any single table.
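A cross-table sketch of the sign-in-then-inbox-rule chain, joining SigninLogs with OfficeActivity; the risk-level filter and the one-hour correlation window are illustrative choices:

```kusto
// Risky sign-in followed within 1 hour by inbox-rule creation.
let RiskySignins = SigninLogs
    | where TimeGenerated > ago(7d)
    | where RiskLevelDuringSignIn in ("medium", "high")
    | project SigninTime = TimeGenerated,
              UserPrincipalName = tolower(UserPrincipalName), IPAddress;
OfficeActivity
| where TimeGenerated > ago(7d)
| where Operation in ("New-InboxRule", "Set-InboxRule")
| extend UserPrincipalName = tolower(UserId)
| join kind=inner RiskySignins on UserPrincipalName
| where TimeGenerated between (SigninTime .. SigninTime + 1h)
| project SigninTime, RuleTime = TimeGenerated, UserPrincipalName,
          IPAddress, Operation, Parameters
```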
This is the AiTM phishing hunt — finding the attack chain that Module 14 investigates in full. The cross-table join reveals a sequence that no single table can show: sign-in from unusual IP followed by inbox rule creation. This is more powerful than any single-table analytics rule.
Pattern 6: Stacking (frequency analysis)
Count occurrences and examine the distribution. The most common values are normal. The least common values are interesting.
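A stacking sketch over sign-in user agents; the thresholds match the interpretation below (fewer than 3 users, more than 5 occurrences):

```kusto
// User-agent stacking: rare agents with repeated use.
SigninLogs
| where TimeGenerated > ago(7d)
| summarize SigninCount = count(),
            UserCount = dcount(UserPrincipalName),
            SampleUsers = make_set(UserPrincipalName, 10)
    by UserAgent
| where UserCount < 3 and SigninCount > 5
| order by SigninCount desc
```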
User agents used by fewer than 3 users but appearing more than 5 times suggest scripted or automated sign-ins — potentially an attacker using a custom tool or a token replay script that does not mimic a standard browser user agent.
Pattern 7: Process tree analysis
Reconstruct parent-child process relationships to find suspicious execution chains.
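A process-tree sketch against DeviceProcessEvents; the lists of interpreters and parent applications are illustrative, not exhaustive:

```kusto
// Office apps and browsers spawning command interpreters.
DeviceProcessEvents
| where Timestamp > ago(7d)
| where FileName in~ ("powershell.exe", "cmd.exe", "wscript.exe", "mshta.exe")
| where InitiatingProcessFileName in~ ("winword.exe", "excel.exe", "outlook.exe",
                                       "powerpnt.exe", "msedge.exe", "chrome.exe")
| project Timestamp, DeviceName,
          Grandparent = InitiatingProcessParentFileName,
          Parent = InitiatingProcessFileName,
          Child = FileName,
          ProcessCommandLine
```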
Office applications and browsers spawning command interpreters is the classic initial access pattern: a user opens a phishing document → the macro executes PowerShell → the attacker has code execution. The grandparent process (explorer.exe → outlook.exe → powershell.exe) reveals the full chain.
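A LOLBin sketch; the binary list and command-line indicators below are illustrative samples of the abuse patterns the interpretation describes:

```kusto
// LOLBin abuse: legitimate binaries with command lines suggesting
// download, decoding, or remote execution.
DeviceProcessEvents
| where Timestamp > ago(7d)
| where FileName in~ ("certutil.exe", "mshta.exe", "regsvr32.exe",
                      "rundll32.exe", "bitsadmin.exe")
| where ProcessCommandLine has_any ("http://", "https://", "-decode",
                                    "-urlcache", "javascript:", "\\\\")
| project Timestamp, DeviceName, AccountName, FileName,
          ProcessCommandLine, InitiatingProcessFileName
```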
LOLBins are legitimate Windows binaries abused by attackers for download, execution, and defence evasion. The query filters for command-line arguments that indicate abuse (URLs, network paths, decoding, script execution).
Pattern 8: DNS hunting
DNS queries reveal communication patterns that other logs miss — particularly C2 communication using domain generation algorithms (DGA) or DNS tunnelling.
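A DGA sketch, assuming DNS query data in the DnsEvents table; the 20-character label threshold follows the interpretation below:

```kusto
// DGA hunt: DNS names whose leftmost label exceeds 20 characters.
DnsEvents
| where TimeGenerated > ago(7d)
| extend Subdomain = tostring(split(Name, ".")[0])
| where strlen(Subdomain) > 20
| summarize QueryCount = count(), Clients = make_set(Computer, 10) by Name
| order by QueryCount desc
```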
DGA domains have long, random-looking subdomains (e.g., "kq3x9vb2a7f2b9e4c1d3m8w4.evil.com"). Legitimate domains rarely have subdomains exceeding 20 characters. This query surfaces DGA patterns that may indicate malware calling home.
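DNS tunnelling, the other technique this pattern mentions, shows up as a flood of unique subdomains under one parent domain; a companion sketch (the threshold of 100 unique names is illustrative):

```kusto
// DNS tunnelling hunt: many distinct subdomains of one parent domain
// queried by a single host (data is encoded in the labels).
DnsEvents
| where TimeGenerated > ago(1d)
| extend Parts = split(Name, ".")
| where array_length(Parts) >= 3
| extend ParentDomain = strcat(Parts[-2], ".", Parts[-1])
| summarize UniqueSubdomains = dcount(Name), QueryCount = count()
    by Computer, ParentDomain
| where UniqueSubdomains > 100
| order by UniqueSubdomains desc
```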
Hunting query optimisation for large datasets
Hunting queries run on-demand, but slow queries waste analyst time. Optimise for large datasets.
Filter before summarise. Apply where clauses as early as possible in the query — this reduces the data volume that summarize processes.
Use let for materialised lookups. If your query joins with a watchlist or small table, materialise it first: let KnownIPs = materialize(_GetWatchlist('CorporateIPs') | project SearchKey);. This evaluates the watchlist once and caches it.
Limit time ranges to what you need. A 90-day hunt sounds thorough but may be unnecessarily slow. Start with 7 days. If the hypothesis is not confirmed in 7 days, expand to 30 days. Only search 90 days if the hypothesis specifically requires it (e.g., “the campaign was active 3 months ago”).
Use take for exploration. When developing a new hunting query, add | take 100 to preview results quickly before running the full query. Remove take for the final execution.
Avoid unnecessary unions. If your hypothesis only applies to one data source, query that table directly — do not use ASIM unifying parsers that query all sources unnecessarily.
Pattern 9: Beacon detection (network hunting)
Identify regular-interval network connections that may indicate command-and-control beaconing. C2 implants typically call home at predictable intervals.
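A beacon-detection sketch against DeviceNetworkEvents; the regularity metric (1 minus the coefficient of variation of the inter-connection interval) matches the 0.9 threshold in the interpretation below, and the connection-count floor is illustrative:

```kusto
// Beacon hunt: near-constant intervals between outbound connections
// from one device to one public IP.
DeviceNetworkEvents
| where Timestamp > ago(1d)
| where RemoteIPType == "Public"
| order by DeviceId asc, RemoteIP asc, Timestamp asc
| extend Interval = iff(prev(DeviceId) == DeviceId and prev(RemoteIP) == RemoteIP,
                        datetime_diff("second", Timestamp, prev(Timestamp)),
                        int(null))
| where isnotnull(Interval)
| summarize Connections = count(),
            AvgInterval = avg(Interval),
            StdevInterval = stdev(Interval)
    by DeviceName, RemoteIP
| where Connections > 20 and AvgInterval > 0
| extend Regularity = 1.0 - (StdevInterval / AvgInterval)
| where Regularity > 0.9
```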
A remote IP receiving connections at near-exact intervals (regularity > 0.9) from a single device is the beaconing signature. Legitimate services also beacon (Windows Update, telemetry) — filter these known services by IP or domain before investigating.
Every query pattern in this subsection tests a specific hypothesis about attacker behaviour. Rare events = "has an unusual tool been used?" Statistical outliers = "is anyone behaving abnormally?" First-time occurrences = "has something new appeared?" Temporal anomalies = "is anyone active when they should not be?" Cross-table correlation = "do multiple events form an attack chain?" Stacking = "what is the distribution of this attribute, and are there outliers?" Process trees = "are legitimate tools spawning suspicious children?" DNS hunting = "is malware calling home?" Beacon detection = "is anything connecting out on a fixed interval?" Master these nine patterns and you can hunt for any threat.
Try it yourself
Run Pattern 3 (first-time country) and Pattern 4 (temporal anomaly) against your workspace. Review the results: are there users signing in from new countries? Are there sign-ins during unusual hours? For each result, determine: is this suspicious, benign, or inconclusive? If suspicious, create a bookmark. This is one complete hunting iteration — hypothesis (new country = potential compromise), query, analysis, and evidence collection.
What you should observe
In a lab, results may be limited. In a production environment, first-time country queries typically return 5-20 results per week — most are legitimate travel, but 1-2 may warrant investigation. Night-time sign-in queries surface both legitimate late workers and potential compromises. The skill is distinguishing between them — which requires correlating with additional context (was the user on a business trip? Is the IP a known VPN?).
Compliance mapping
NIST CSF: DE.AE-1 (Baseline of operations established). ISO 27001: A.8.15 (Logging), A.8.16 (Monitoring activities). SOC 2: CC7.2 (Monitor system components). Every hunting practice in this subsection contributes to the logging and monitoring controls that auditors verify.
Check your understanding
1. Which hunting pattern would you use to find an attacker using a custom tool that your analytics rules do not detect?