TH1.11 Working with False Positives: The Tuning Cycle
False positives are the most useful part of your hunt
When your step 2 query returns 28 users with new IPs and only 3 warrant investigation, the 25 legitimate results are not waste. They are the most detailed false positive analysis you will ever get for this technique — because you, the analyst, manually examined each one and determined why it was legitimate.
Those 25 determinations are the exclusion list for the detection rule you build in the Convert step. The rule inherits the precision of your human analysis rather than requiring weeks of post-deployment tuning.
The FP analysis loop
For every result you determine is legitimate during analysis (TH1.4), document three things:
What was it? The specific activity that matched the hypothesis. “User signed in from new IP 198.51.100.12 in Germany.”
Why is it legitimate? The specific reason. “User is based in Germany — this is their office IP, which was not in the 30-day baseline because their account was created 15 days ago.” Or: “IP belongs to corporate VPN egress range — legitimate VPN rotation.”
What exclusion does it imply? The rule modification that would prevent this legitimate activity from triggering a future detection. “Exclude IPs in the corporate VPN range (198.51.100.0/24).” Or: “Exclude users with account age < baseline window.”
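The third item maps directly onto query predicates. As a hedged sketch against Sentinel's SigninLogs table, the two example exclusions above might be encoded like this (the CIDR, the watchlist name, and the CreatedDate column are illustrative assumptions, not part of this course's lab environment):

```kql
// Sketch: turning two documented FP determinations into exclusions.
// The VPN CIDR, watchlist name, and CreatedDate column are assumptions.
SigninLogs
| where TimeGenerated > ago(1d)
// Exclusion 1: corporate VPN egress range identified during FP analysis
| where not(ipv4_is_in_range(IPAddress, "198.51.100.0/24"))
// Exclusion 2: accounts younger than the 30-day baseline window are
// "first seen" by construction; look up creation dates from a watchlist
| join kind=leftouter (
    _GetWatchlist("AccountCreationDates") | project SearchKey, CreatedDate
  ) on $left.UserPrincipalName == $right.SearchKey
| extend AccountAgeDays = datetime_diff("day", now(), todatetime(CreatedDate))
| where isnull(AccountAgeDays) or AccountAgeDays >= 30
```

The point is not the specific predicates but the mapping: each documented determination becomes one line of the eventual rule.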
Categories of hunting false positives
Four categories appear consistently. Each has a different exclusion pattern.
Infrastructure FPs. Corporate VPN, proxy, and cloud gateway IPs. These are legitimate IP addresses used by many users. Exclusion: IP range allowlist maintained as a Sentinel watchlist. Review quarterly as infrastructure changes.
Role-based FPs. Users whose job requires the activity the hunt flags. IT administrators legitimately modify directory settings. Finance users legitimately download large numbers of files during quarter-end. Travel-heavy roles legitimately sign in from new countries. Exclusion: role-based allowlist or threshold adjustment per user group.
Temporal FPs. Activity that is legitimate at certain times — maintenance windows, business cycles, seasonal patterns. Quarterly reporting spikes in data downloads. Monthly patching cycles that look like suspicious process execution. Exclusion: time-based suppression or dynamic thresholds that account for business cycles.
Onboarding FPs. New users, new devices, and new applications all produce “first seen” signals during their initial period. Everything about a new user is anomalous against a baseline that does not include them. Exclusion: suppress results, or flag them as low confidence, for entities with less history than the baseline window.
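Taken together, the four categories correspond to four distinct exclusion patterns in KQL. A composite sketch, assuming Sentinel watchlists named VpnEgressRanges and AdminAllowlist exist (every name, window, and threshold here is illustrative):

```kql
// Illustrative composite: one exclusion pattern per FP category.
// Watchlist names, column names, and thresholds are assumptions.
let vpnRanges  = _GetWatchlist("VpnEgressRanges") | project CidrRange;  // infrastructure
let adminUsers = _GetWatchlist("AdminAllowlist")  | project SearchKey;  // role-based
SigninLogs
| where TimeGenerated > ago(1d)
// 1. Infrastructure: drop corporate VPN / proxy egress IPs
| evaluate ipv4_lookup(vpnRanges, IPAddress, CidrRange, return_unmatched = true)
| where isempty(CidrRange)
// 2. Role-based: drop users whose job legitimately produces the signal
| where UserPrincipalName !in (adminUsers)
// 3. Temporal: suppress during the monthly patching window (illustrative)
| where not(dayofmonth(TimeGenerated) between (1 .. 3))
// 4. Onboarding: drop accounts with less history than the baseline window.
//    AccountCreatedDate is a hypothetical enrichment column; in practice,
//    join a watchlist of account creation dates to derive it.
| where datetime_diff("day", now(), todatetime(AccountCreatedDate)) >= 30
```

In a real rule you would rarely stack all four; the exercise below helps you decide which category dominates your environment.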
From FP analysis to rule precision
The hunt-to-detection pipeline (TH1.6) converts hunt queries into analytics rules. The FP analysis determines the rule’s precision — the percentage of alerts that are true positives.
A rule deployed without FP analysis might have a 10% true positive rate: 9 out of 10 alerts are VPN rotations, travel, or new employees. The SOC tunes it over weeks, adding exclusions one at a time as each FP is triaged.
A rule deployed with hunt-based FP analysis starts with the exclusions already in place. The VPN range is excluded. The new-user window is handled. The role-based allowlist is configured. The rule may start with a 70–90% true positive rate — because the hunt already identified and documented the false positive patterns.
This is why detection rules built from hunts are better than rules built from theory. The hunt analyst saw the legitimate activity. The exclusions reflect reality, not assumptions.
Figure TH1.11 — False positive analysis pipeline. Hunt FPs produce documented exclusions that make the resulting detection rule precise from day one. Without this analysis, the rule requires weeks of post-deployment tuning.
Try it yourself
Exercise: Categorize your false positives
From the TH1.3 exercise results, identify the legitimate results you excluded during analysis. For each, document: what it was, why it was legitimate, which of the four FP categories it falls into (infrastructure, role-based, temporal, onboarding), and what exclusion it implies for a detection rule.
Count the FPs per category. The category with the most FPs is the category your detection rule must address first. If 80% of FPs are VPN-related, a VPN IP allowlist resolves most of the noise before the rule deploys.
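If you record each determination as a row in a watchlist (or, while experimenting, an inline datatable), the per-category count is a one-line summarize. A sketch, with a datatable of made-up rows standing in for your documented FP log:

```kql
// Stand-in for a documented-FP log: each row is one determination
// from the analysis step. The rows below are illustrative.
datatable(Result:string, Category:string, Exclusion:string) [
    "Sign-in from 198.51.100.12", "infrastructure", "VPN range allowlist",
    "DE office IP, 15-day-old account", "onboarding", "account-age threshold",
    "Quarter-end bulk downloads", "temporal", "quarterly window suppression"
]
| summarize FPs = count() by Category
| order by FPs desc
```

The top row of the output is the exclusion to build first.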
The myth: If the hunt returns 100 results and 95 are false positives, the hypothesis was poorly formed and the hunt was unproductive.
The reality: A hunt that returns 100 results with 5 true positives and 95 false positives has found 5 genuine security findings AND produced the data to build a precise detection rule. The 95 FPs tell you exactly what legitimate activity looks like for this technique in your environment; encoding their exclusions suppresses those patterns before the rule ever fires, inverting the ratio so the deployed rule's alert stream is mostly true positives instead of mostly noise. A detection rule built from this data, with informed exclusions, will be far more precise than one deployed without it. The hunt was highly productive: it found threats AND calibrated detection.
Extend this approach
Maintain a running false positive knowledge base — a document or Sentinel watchlist that captures the exclusion patterns identified across all hunts. Over time, this knowledge base becomes a reusable asset: the VPN IP list, the travel-heavy user list, the new-user threshold, the quarterly reporting spike window. New hunts reference the knowledge base when building exclusions rather than rediscovering the same patterns. TH15 covers hunt knowledge management in detail.
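In query terms, referencing the knowledge base is just a set of _GetWatchlist calls at the top of each new hunt. A sketch, with hypothetical watchlist names:

```kql
// Sketch: a new hunt reuses knowledge-base watchlists instead of
// rediscovering the same patterns. All names here are hypothetical.
let kbVpnRanges   = _GetWatchlist("KB_VpnEgressRanges")  | project CidrRange;
let kbTravelUsers = _GetWatchlist("KB_TravelHeavyUsers") | project SearchKey;
OfficeActivity
| where TimeGenerated > ago(1d) and Operation == "FileDownloaded"
// travel-heavy users are expected to appear from new locations
| where UserId !in (kbTravelUsers)
// corporate egress IPs are not evidence of anything
| evaluate ipv4_lookup(kbVpnRanges, ClientIP, CidrRange, return_unmatched = true)
| where isempty(CidrRange)
```

Because the watchlists are shared, updating the VPN range once propagates to every hunt and rule that references it.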
References Used in This Subsection
- Microsoft. “Sentinel Watchlists.” Microsoft Learn. https://learn.microsoft.com/en-us/azure/sentinel/watchlists
- Course cross-references: TH1.4 (analysis framework), TH1.6 (detection rule conversion with exclusions)