TH1.3 Collection: Iterative Querying
Hunting is iterative
A detection rule is a single query that fires when conditions match. A hunt is five to fifteen queries, each informed by the results of the previous one, converging on a finding or confirming absence.
The pattern is consistent across every campaign in this course:
// Step 1: Orientation — how much sign-in data are we working with?
SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == "0" // Successful sign-ins only (ResultType is a string column)
| summarize
    TotalSignIns = count(),
    UniqueUsers = dcount(UserPrincipalName),
    UniqueIPs = dcount(IPAddress),
    UniqueCountries = dcount(tostring(LocationDetails.countryOrRegion))
// Understand the data volume before narrowing
// If TotalSignIns is 350,000, you know a result set of 50 is a small fraction

// Step 2: Indicator — users with new IPs not in their 30-day baseline
let baseline = SigninLogs
| where TimeGenerated between (ago(37d) .. ago(7d))
| where ResultType == "0"
| summarize KnownIPs = make_set(IPAddress, 50)
    by UserPrincipalName;
SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == "0"
| join kind=inner baseline on UserPrincipalName
| where not(set_has_element(KnownIPs, IPAddress)) // keep only IPs absent from the user's baseline set
| summarize
    NewIPCount = dcount(IPAddress),
    NewIPs = make_set(IPAddress, 5),
    Countries = make_set(tostring(LocationDetails.countryOrRegion), 5)
    by UserPrincipalName
// Result: maybe 30 users with sign-ins from new IPs
// Most will be legitimate (VPN changes, travel, new devices)

// Step 3: Enrichment — add context to narrow further
// From the 30 users above, which also had a new MFA method
// registered in the same window?
let suspectUsers = SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == "0"
// ... (same filter as step 2 producing the 30 users)
| distinct UserPrincipalName;
AuditLogs
| where TimeGenerated > ago(7d)
| where OperationName has "User registered security info"
| where tostring(InitiatedBy.user.userPrincipalName) in (suspectUsers)
// New IP + new MFA method in the same week = elevated concern
// Result: maybe 3 users

// Step 4: Pivot — what did these 3 users do?
let confirmedSuspect = datatable(UPN:string)
    ["j.morrison@northgateeng.com"]; // Example
union
    (CloudAppEvents
    | where TimeGenerated > ago(7d)
    | where AccountId in (confirmedSuspect) // filter by the suspect list
    | project TimeGenerated, ActionType, Application, RawEventData),
    (EmailEvents
    | where TimeGenerated > ago(7d)
    | where SenderFromAddress in (confirmedSuspect)
    | project TimeGenerated, Subject, RecipientEmailAddress)
| sort by TimeGenerated asc
// Full activity timeline for the suspect account
// Looking for: inbox rules, email forwarding, file access,
// internal phishing from the compromised account

Try it yourself
Exercise: Run the four-step funnel
Using SigninLogs in your environment, execute the four-step pattern above:
Step 1: Run the orientation query. Record the total sign-in volume, unique users, and unique countries for the last 7 days.
Step 2: Run the baseline comparison. How many users have sign-ins from IPs not in their 30-day baseline? This number is your initial result set.
Step 3: For the users identified in step 2, check AuditLogs for new MFA method registrations in the same window. How many overlap?
Step 4: If any users overlap, examine their full activity timeline. What did they do from the new IP? One starting point, a check for new inbox rules, is sketched after this list.
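A common first pivot in step 4 is inbox-rule creation, the first behavior the step 4 comments call out. Below is a minimal sketch, assuming the Office 365 connector populates the OfficeActivity table and substituting the example suspect UPN from step 3; the operation names are standard Exchange cmdlet names but should be verified against your tenant's logs:

// Hedged sketch: inbox-rule and forwarding changes by the suspect user
// Assumes the Office 365 connector and the OfficeActivity table
OfficeActivity
| where TimeGenerated > ago(7d)
| where UserId =~ "j.morrison@northgateeng.com" // suspect UPN from step 3
| where Operation in ("New-InboxRule", "Set-InboxRule", "Set-Mailbox")
| project TimeGenerated, Operation, Parameters, ClientIP
| sort by TimeGenerated asc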
Document each query, the result count, and your assessment. This is your first hunt record. If you found something suspicious — congratulations, you have a finding. If you found nothing — you have a documented negative finding and the beginning of a detection rule (step 2's query, deployed as a scheduled analytics rule).
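As a sketch of that deployment step: the main change when converting step 2's query into a scheduled analytics rule is shrinking the current window to match the rule's run frequency, so each execution evaluates only new sign-ins against the rolling baseline. The hourly frequency and 30-day baseline below are assumptions to tune:

// Hedged sketch: step 2's baseline comparison reshaped for an hourly scheduled rule
let baseline = SigninLogs
| where TimeGenerated between (ago(31d) .. ago(1h))
| where ResultType == "0"
| summarize KnownIPs = make_set(IPAddress, 50) by UserPrincipalName;
SigninLogs
| where TimeGenerated > ago(1h) // match the rule's run frequency
| where ResultType == "0"
| join kind=inner baseline on UserPrincipalName
| where not(set_has_element(KnownIPs, IPAddress))
| summarize NewIPs = make_set(IPAddress, 5), FirstSeen = min(TimeGenerated)
    by UserPrincipalName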
The myth: Write a query, run it, check the results. If nothing comes back, the hunt is done.
The reality: A single query tests a single aspect of the hypothesis under a single set of parameters. The attacker who used a slightly different method (Graph API instead of PowerShell), operated in a slightly different time window, or targeted a slightly different data path is missed by a single query. Hunting is iterative specifically because attack techniques have variants. The multi-step funnel — orientation, indicator, enrichment, pivot — is designed to catch variants that a single query would miss and to build the contextual understanding that a single result set cannot provide.
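To make the variant point concrete: a hunt built only on SigninLogs sees interactive sign-ins, while an attacker replaying a token through the Graph API typically surfaces in the non-interactive sign-in table instead. A minimal sketch, assuming AADNonInteractiveUserSignInLogs is enabled in the workspace; the app-name filter is an illustrative assumption:

// Hedged sketch: re-run the new-IP indicator against non-interactive sign-ins,
// which is where Graph API token usage lands
AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(7d)
| where ResultType == "0"
| where AppDisplayName has "Graph" // illustrative filter for Graph-backed access
| summarize SignInCount = count(), IPs = make_set(IPAddress, 5)
    by UserPrincipalName, AppDisplayName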
Extend this approach
The iterative pattern described here uses manual execution — you run each query, review results, and decide the next query. For hunts you run repeatedly (monthly cadence hunts from TH14), consider using Sentinel notebooks (Jupyter + MSTICPy) to chain the queries programmatically. The notebook executes the full funnel in sequence, presenting results at each stage for analyst review. TH16 covers notebook-based hunting. For initial learning and for hunts you run for the first time, manual iteration is preferred — you learn more about the data by examining each intermediate result than by running the full chain at once.
References Used in This Subsection
- Microsoft. "Advanced Hunting — Query Best Practices." Microsoft Learn. https://learn.microsoft.com/en-us/defender-xdr/advanced-hunting-best-practices
- Microsoft. "Advanced Hunting — Quotas and Usage Parameters." Microsoft Learn. https://learn.microsoft.com/en-us/defender-xdr/advanced-hunting-limits
You write a complex KQL hunt query that runs for 3 minutes and returns 50,000 rows. The query is technically correct but operationally unusable. Do you refine the query or adjust the hypothesis?
Both. A 3-minute query returning 50,000 rows indicates at least one of three problems: the hypothesis is too broad (the query is not specific enough to separate suspicious activity from normal activity), the time window is too large, or the filter criteria are too permissive. Refine by adding filters that exclude known-good patterns, narrowing the time window, or aggregating the results to surface statistical anomalies within the 50,000 rows rather than examining each row individually. A hunt query that returns fewer than 100 rows of genuinely anomalous activity is more valuable than one that returns 50,000 rows requiring manual review.
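The aggregation route can be sketched directly. Staying with the sign-in example, the query below collapses a large result set into per-user counts and keeps only users whose IP diversity is a statistical outlier; the three-standard-deviation threshold is an assumption to tune:

// Hedged sketch: aggregate instead of reading 50,000 rows one by one
let perUser = SigninLogs
| where TimeGenerated > ago(7d)
| summarize UniqueIPs = dcount(IPAddress) by UserPrincipalName;
let avgIPs = toscalar(perUser | summarize avg(UniqueIPs));
let stdIPs = toscalar(perUser | summarize stdev(UniqueIPs));
perUser
| where UniqueIPs > avgIPs + 3 * stdIPs
| sort by UniqueIPs desc
// The oversized result collapses to the few users whose behavior is anomalous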
You understand the detection gap and the hunt cycle.
TH0 showed you what detection rules fundamentally cannot catch. TH1 gave you the hypothesis-driven methodology that closes that gap. Now you run the hunts.
- 10 complete hunt campaigns — from hypothesis through KQL execution through finding disposition, each campaign based on a real TTP
- 70 production hunt queries — every one mapped to MITRE ATT&CK and tested against realistic telemetry
- Advanced KQL for hunting — UEBA composite risk scoring, retroactive IOC sweeps, and hunt management metrics
- Hypothesis-Driven Hunt Toolkit lab pack — 30 days of realistic M365 and endpoint telemetry with multiple attack patterns seeded in
- TH16 — Scaling hunts across a team — the operating model for a production hunt program