TH1.11 Working with False Positives: The Tuning Cycle
False positives are the most useful part of your hunt
When your step 2 query returns 28 users with new IPs and only 3 warrant investigation, the 25 legitimate results are not waste. They are the most detailed false positive analysis you will ever get for this technique — because you, the analyst, manually examined each one and determined why it was legitimate.
Those 25 determinations are the exclusion list for the detection rule you build in the Convert step. The rule inherits the precision of your human analysis rather than requiring weeks of post-deployment tuning.
The FP analysis loop
// Example: identify VPN IPs from your false positive analysis
// After examining 25 FPs and noting 18 were VPN-related
// Extract the IPs that appeared most frequently as FPs
let huntFPIPs = SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == "0"
| summarize UserCount = dcount(UserPrincipalName) by IPAddress
| where UserCount > 10
// IPs used by more than 10 distinct users are likely corporate infrastructure
// These are your VPN/proxy exclusion candidates
| project IPAddress, UserCount
| sort by UserCount desc;
huntFPIPs
// Validate: confirm these are known corporate egress IPs
// Add confirmed IPs to the detection rule exclusion list

Try it yourself
Exercise: Categorize your false positives
From the TH1.3 exercise results, identify the legitimate results you excluded during analysis. For each, document: what it was, why it was legitimate, which of the four FP categories it falls into (infrastructure, role-based, temporal, onboarding), and what exclusion it implies for a detection rule.
Count the FPs per category. The category with the most FPs is the category your detection rule must address first. If 80% of FPs are VPN-related, a VPN IP allowlist resolves most of the noise before the rule deploys.
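If you want the tally in the workspace rather than in a document, a throwaway datatable is enough; the rows below are invented examples that show the shape, not real exercise results.

// Illustrative only: hand-entered FP categorisations tallied per category
datatable(Finding: string, Category: string) [
    "user signed in via corporate VPN egress", "infrastructure",
    "helpdesk admin acting on behalf of other users", "role-based",
    "quarter-end finance batch sign-ins", "temporal",
    "new starter's first sign-in", "onboarding"
]
| summarize FPCount = count() by Category
| sort by FPCount desc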
The tuning feedback loop
Every false positive from a hunt informs the next hunt iteration. If a hunt for anomalous RDP activity produces 15 results, and 13 are IT administrators performing legitimate remote maintenance, the next iteration should exclude the IT admin group from the hunt scope OR adjust the anomaly threshold to account for their elevated baseline. This is not "reducing sensitivity" — it is refining the signal-to-noise ratio so that the 2 genuine anomalies in the next run are not buried in 13 known-benign results. The tuning log (which exclusions were added, why, and what data supported the decision) prevents scope drift — exclusions accumulate over months, and without documentation, no one remembers why a particular group was excluded.
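A minimal sketch of what that next iteration might look like, assuming Defender for Endpoint logon telemetry is available; the admin account names and the device-count threshold are hypothetical stand-ins for values your tuning log and baseline would supply.

// Sketch: next-iteration RDP hunt with the IT admin group scoped out.
// The account names are hypothetical; in production, source them from a
// watchlist or group lookup rather than hardcoding them in the query.
let ITAdmins = datatable(AccountName: string) [
    "adm.jsmith",
    "adm.kwong"
];
DeviceLogonEvents
| where TimeGenerated > ago(7d)
| where LogonType == "RemoteInteractive"                // RDP-style remote logons
| where AccountName !in (ITAdmins)                      // exclusion recorded in the tuning log
| summarize LogonCount = count(), TargetDevices = dcount(DeviceName) by AccountName
| where TargetDevices > 3                               // anomaly threshold; tune to your own baseline
| sort by TargetDevices desc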
The queries developed during this exercise become reusable templates in your personal hunting library. Parameterise the hardcoded values (user names, IP addresses, time windows) and add a header comment explaining the hypothesis each query tests. A mature hunting program maintains 50-100 parameterised query templates that any team member can execute — reducing the per-hunt preparation time from hours to minutes and ensuring consistent methodology across analysts.
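As one possible convention, the shared-egress query above could be templated like this; the header layout and parameter names are suggestions, not a required format.

// ---------------------------------------------------------------
// Template (illustrative): shared corporate egress IP discovery
// Assumption tested: IPs used by many distinct users are shared
//                    infrastructure (VPN/proxy), not personal endpoints
// Parameters: lookback, minUserCount
// ---------------------------------------------------------------
let lookback = 7d;          // time window, previously hardcoded as ago(7d)
let minUserCount = 10;      // distinct-user threshold, previously hardcoded as 10
SigninLogs
| where TimeGenerated > ago(lookback)
| where ResultType == "0"
| summarize UserCount = dcount(UserPrincipalName) by IPAddress
| where UserCount > minUserCount
| project IPAddress, UserCount
| sort by UserCount desc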
The tuning log should be version-controlled alongside the hunt queries themselves. When a future analyst asks 'why is the IT admin group excluded from this hunt?' the log provides the answer: date of exclusion, the data that justified it, and the analyst who approved it. Without this history, exclusions accumulate without accountability, and the hunt's scope narrows until it no longer detects the threats it was designed to find.
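If the log also needs to be queryable from the workspace, one option is to mirror it into a Sentinel watchlist; the alias and column names here are hypothetical and depend on how you structure the upload.

// Hypothetical 'hunt-tuning-log' watchlist: one row per exclusion decision.
// Column names depend entirely on how the watchlist was uploaded.
_GetWatchlist('hunt-tuning-log')
| project DateAdded, HuntId, Exclusion, Justification, ApprovedBy
| sort by DateAdded desc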
The myth: If the hunt returns 100 results and 95 are false positives, the hypothesis was poorly formed and the hunt was unproductive.
The reality: A hunt that returns 100 results with 5 true positives and 95 false positives has found 5 genuine security findings AND produced the data to build a precise detection rule, because the exclusions derived from analysing the 95 FPs remove the known-benign patterns before the rule ever deploys. The 95 FPs tell you exactly what legitimate activity looks like for this technique in your environment. A detection rule built from this data, with informed exclusions, will be far more precise than one deployed without it. The hunt was highly productive. It found threats AND calibrated detection.
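As a hedged sketch of what that calibration can look like in KQL, the converted rule below carries the exclusions identified during FP analysis; the IP values, the 30-day baseline window, and the one-day detection window are placeholders for whatever your own analysis produced.

// Sketch only: a converted detection query that inherits exclusions from the hunt's FP analysis.
// CorporateEgressIPs holds hypothetical VPN egress IPs confirmed legitimate during FP analysis.
let CorporateEgressIPs = datatable(IPAddress: string) [
    "198.51.100.10",
    "198.51.100.11"
];
let KnownIPs = SigninLogs
| where TimeGenerated between (ago(30d) .. ago(1d))
| where ResultType == "0"
| distinct UserPrincipalName, IPAddress;
SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType == "0"
| where IPAddress !in (CorporateEgressIPs)                    // exclusion inherited from the hunt
| join kind=leftanti KnownIPs on UserPrincipalName, IPAddress // IP never seen for this user in the baseline
| project TimeGenerated, UserPrincipalName, IPAddress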
Extend this approach
Maintain a running false positive knowledge base — a document or Sentinel watchlist that captures the exclusion patterns identified across all hunts. Over time, this knowledge base becomes a reusable asset: the VPN IP list, the travel-heavy user list, the new-user threshold, the quarterly reporting spike window. New hunts reference the knowledge base when building exclusions rather than rediscovering the same patterns. TH15 covers hunt knowledge management in detail.
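A hedged example of the reuse pattern: a hunt pulls the VPN egress list from a watchlist instead of hardcoding it. The watchlist alias is hypothetical; SearchKey is the column Sentinel always exposes for the designated search key.

// Hypothetical 'fp-vpn-egress-ips' watchlist built up across hunts
let VPNEgressIPs = _GetWatchlist('fp-vpn-egress-ips')
| project IPAddress = tostring(SearchKey);
SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType == "0"
| where IPAddress !in (VPNEgressIPs)
| summarize SignInCount = count() by UserPrincipalName, IPAddress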
References Used in This Subsection
- Microsoft. "Sentinel Watchlists." Microsoft Learn. https://learn.microsoft.com/en-us/azure/sentinel/watchlists
- Course cross-references: TH1.4 (analysis framework), TH1.6 (detection rule conversion with exclusions)
NE environmental considerations
NE's detection environment includes specific factors that influence how the detection rules produced by these hunts operate:
Device diversity: 768 P2 corporate workstations with full Defender for Endpoint telemetry, 58 P1 manufacturing workstations with basic cloud-delivered protection, and 3 RHEL rendering servers with Syslog-only coverage. Rules targeting DeviceProcessEvents operate with full fidelity on P2 devices but may have reduced visibility on P1 devices. Manufacturing workstations in Sheffield and Sunderland represent a detection gap for endpoint-level detections.
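One way to make that gap visible in query results rather than in an asset spreadsheet is to compare the device inventory against recent process telemetry; this is a sketch that assumes the standard DeviceInfo and DeviceProcessEvents tables, and the seven-day window is arbitrary.

// Devices in the inventory with no process telemetry in the window:
// candidates for the P1 / Syslog-only visibility gap described above
DeviceInfo
| where TimeGenerated > ago(7d)
| summarize arg_max(TimeGenerated, OnboardingStatus, OSPlatform) by DeviceName
| join kind=leftanti (
    DeviceProcessEvents
    | where TimeGenerated > ago(7d)
    | distinct DeviceName
) on DeviceName
| project DeviceName, OSPlatform, OnboardingStatus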
You have time for one hunt this quarter. Do you hunt for the threat in the latest advisory or for the gap in your ATT&CK coverage matrix?
Hunt the coverage gap. Advisories describe threats that are CURRENT but may not target NE. Coverage gaps describe techniques that COULD target NE and would succeed undetected. The coverage gap hunt produces a detection rule (closing the gap permanently). The advisory-driven hunt produces a point-in-time assessment (confirming the specific threat is not present today). Both are valuable — but the coverage gap hunt has a longer-lasting impact because it produces a permanent detection improvement.
You understand the detection gap and the hunt cycle.
TH0 showed you what detection rules fundamentally cannot catch. TH1 gave you the hypothesis-driven methodology that closes that gap. Now you run the hunts.
- 10 complete hunt campaigns — from hypothesis through KQL execution through finding disposition, each campaign based on a real TTP
- 70 production hunt queries — every one mapped to MITRE ATT&CK and tested against realistic telemetry
- Advanced KQL for hunting — UEBA composite risk scoring, retroactive IOC sweeps, and hunt management metrics
- Hypothesis-Driven Hunt Toolkit lab pack — 30 days of realistic M365 and endpoint telemetry with multiple attack patterns seeded in
- TH16 — Scaling hunts across a team — the operating model for a production hunt program