In this section
TH1.1 Formulating Hunt Hypotheses
What makes a hypothesis testable
A good hunt hypothesis has four properties. Miss any one of them and the hunt degrades — either producing results you cannot interpret or consuming hours on a question you cannot answer with the available data.
Specific. The hypothesis names a technique, a behavior, or an indicator. "There might be threats in our environment" is not a hypothesis. "Compromised accounts are using OAuth applications with Mail.ReadWrite permissions to maintain persistent mailbox access after password resets" is. The specificity determines what you query, what you look for in the results, and how you know when you are done.
// Hypothesis testability check — does the data exist?
// Replace requiredTable with whatever table your hypothesis requires
let requiredTable = "AADNonInteractiveUserSignInLogs";
let requiredWindow = 30d;
union isfuzzy=true
    (table(requiredTable)
    | where TimeGenerated > ago(requiredWindow)
    | summarize EventCount = count(),
                FirstEvent = min(TimeGenerated),
                LastEvent = max(TimeGenerated))
// If EventCount is 0, the table is empty or not ingested
// If FirstEvent is recent rather than ~30 days back, retention is shorter
// than your hypothesis window — adjust the scope accordingly
// Run this check for every table your hypothesis requires
// before writing the hunt queries
Try it yourself
Exercise: Write three hypotheses from three different sources
Write one hypothesis from each of these three sources. Each must follow the formula: "If [attacker behavior], then [data source] will contain [observable indicator] that differs from [baseline]."
From prior incidents: Think about the last incident your team investigated. What questions did the investigation raise about wider scope? Write the hypothesis.
From ATT&CK coverage gaps: Pick one ATT&CK technique relevant to M365 that you believe has no detection rule in your environment (T1098.003 — Additional Cloud Roles, T1550.001 — Application Access Token, and T1114.002 — Remote Email Collection are common gaps). Write the hypothesis.
From environmental change: What changed in your M365 environment in the last 90 days? A new application deployed, a new conditional access policy, a new user population? Write a hypothesis about what that change might have exposed.
Evaluate each hypothesis against the four properties: specific? testable with your data? grounded? actionable? If any property is missing, refine until all four are met. These three hypotheses are your first hunting backlog entries.
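As one hedged illustration of turning a coverage-gap entry into a first hunt query, the T1098.003 (Additional Cloud Roles) gap named above might start from a sketch like this — the operation names follow the Azure AD AuditLogs schema in Microsoft Sentinel, and the window and baseline logic are assumptions to tune for your tenant:

// T1098.003, Additional Cloud Roles: who granted directory roles, to whom?
AuditLogs
| where TimeGenerated > ago(30d)
| where OperationName in ("Add member to role", "Add eligible member to role")
| extend Actor = tostring(InitiatedBy.user.userPrincipalName)
| extend Target = tostring(TargetResources[0].userPrincipalName)
| project TimeGenerated, OperationName, Actor, Target
// Baseline: role grants made by your known admin-management accounts.
// Grants initiated by any other actor are the deviation worth pursuing.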
The myth: The best threat hunters operate on intuition. They sense something is wrong and investigate until they find it. Hypotheses are an unnecessary constraint that limits creative investigation.
The reality: Intuition is valuable — experienced analysts develop pattern recognition that surfaces suspicions worth investigating. But intuition without structure produces undocumented, unrepeatable, unmeasurable work. The hypothesis does not replace intuition. It translates intuition into a testable prediction that can be confirmed, refuted, documented, and converted to a detection rule. "I have a feeling about OAuth apps" becomes "If an attacker consented to a high-privilege application, AuditLogs will contain Consent to application operations with Mail.ReadWrite or Files.ReadWrite.All permissions from non-admin users in the last 90 days." The intuition is the same. The hypothesis makes it operational.
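That operational hypothesis maps almost directly onto a query. A minimal sketch, assuming the AuditLogs schema exposed to Microsoft Sentinel (the permission strings matched are the ones the hypothesis names; the admin-comparison step is left as a comment because it depends on your environment):

AuditLogs
| where TimeGenerated > ago(90d)
| where OperationName == "Consent to application"
| extend Actor = tostring(InitiatedBy.user.userPrincipalName)
| extend ConsentDetails = tostring(TargetResources[0].modifiedProperties)
| where ConsentDetails has_any ("Mail.ReadWrite", "Files.ReadWrite.All")
| project TimeGenerated, Actor, ConsentDetails
// Compare the actors against your admin roster: consent granted by
// non-admin users is the observable the hypothesis predicts.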
Extend this approach
If your organization has a formal threat intelligence function (or subscribes to a TI platform like Recorded Future, Mandiant Advantage, or Microsoft Defender Threat Intelligence), integrate TI reports into the hypothesis backlog systematically. Each TI report that describes a technique relevant to M365 environments should produce a backlog entry within 48 hours. The SOC Operations course (Module S12.5) covers the TI-to-detection pipeline in detail. For hunting, the process is identical — except the output is a hunt hypothesis rather than a detection rule. The hypothesis tests whether the technique has already occurred in your environment during the period before the TI report was published.
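A hedged sketch of that retroactive test follows. The indicator values are hypothetical (RFC 5737 documentation addresses, not real IOCs) and the publication date is a placeholder; substitute the indicators and dates from the actual TI report:

// Did the TI-reported infrastructure touch us before the report existed?
let reportPublished = datetime(2025-01-15);              // placeholder date
let lookback = 90d;
let iocIPs = dynamic(["203.0.113.10", "198.51.100.7"]);  // hypothetical IOCs
SigninLogs
| where TimeGenerated between ((reportPublished - lookback) .. reportPublished)
| where IPAddress in (iocIPs)
| summarize Attempts = count(), Accounts = dcount(UserPrincipalName) by IPAddress
// Any hit means the technique predates your awareness of it: scope
// the affected accounts before converting the hypothesis to a detection.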
References Used in This Subsection
- MITRE Corporation. "MITRE ATT&CK — Enterprise Matrix." https://attack.mitre.org
- MITRE ATT&CK Techniques referenced: T1098.003 (Additional Cloud Roles), T1550.001 (Application Access Token), T1114.002 (Remote Email Collection)
- Sqrrl (now Amazon). "A Framework for Cyber Threat Hunting." — hypothesis-driven hunting methodology reference
- Course cross-references: TH0.8 (readiness prerequisites), TH3 (ATT&CK coverage analysis), SOC Operations S12.5 (TI-to-detection pipeline)
You have time for one hunt this quarter. Do you hunt for the threat in the latest advisory or for the gap in your ATT&CK coverage matrix?
Hunt the coverage gap. Advisories describe threats that are active right now but may never target your environment. Coverage gaps describe techniques that could target your environment and would succeed undetected. The coverage gap hunt produces a detection rule (closing the gap permanently). The advisory-driven hunt produces a point-in-time assessment (confirming the specific threat is not present today). Both are valuable — but the coverage gap hunt has the longer-lasting impact because it produces a permanent detection improvement.
You understand the detection gap and the hunt cycle.
TH0 showed you what detection rules fundamentally cannot catch. TH1 gave you the hypothesis-driven methodology that closes that gap. Now you run the hunts.
- 10 complete hunt campaigns — from hypothesis through KQL execution through finding disposition, each campaign based on a real TTP
- 70 production hunt queries — every one mapped to MITRE ATT&CK and tested against realistic telemetry
- Advanced KQL for hunting — UEBA composite risk scoring, retroactive IOC sweeps, and hunt management metrics
- Hypothesis-Driven Hunt Toolkit lab pack — 30 days of realistic M365 and endpoint telemetry with multiple attack patterns seeded in
- TH16 — Scaling hunts across a team — the operating model for a production hunt program