TH1.11 Working with False Positives: The Tuning Cycle

3-4 hours · Module 1 · Free
Operational Objective
Every hunt produces false positives — legitimate activity that matches the hypothesis pattern. Most analysts treat false positives as noise to be discarded. They are not. They are data about what legitimate activity looks like in your environment, and that data is what makes the detection rule you build from the hunt query better than one built without hunting. This subsection teaches you to treat false positives as an input, not waste.
Deliverable: A systematic approach to false positive analysis that produces documented exclusions, calibrated thresholds, and higher-precision detection rules from every hunt.
⏱ Estimated completion: 20 minutes

False positives are the most useful part of your hunt

When your step 2 query returns 28 users with new IPs and only 3 warrant investigation, the 25 legitimate results are not waste. They are the most detailed false positive analysis you will ever get for this technique — because you, the analyst, manually examined each one and determined why it was legitimate.

Those 25 determinations are the exclusion list for the detection rule you build in the Convert step. The rule inherits the precision of your human analysis rather than requiring weeks of post-deployment tuning.

The FP analysis loop

// Example: identify VPN IPs from your false positive analysis
// After examining 25 FPs and noting 18 were VPN-related
// Extract the IPs that appeared most frequently as FPs
let huntFPIPs = SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == "0"            // successful sign-ins only (ResultType is a string column)
| summarize UserCount = dcount(UserPrincipalName) by IPAddress
| where UserCount > 10
// IPs used by 10+ users are likely corporate infrastructure
// These are your VPN/proxy exclusion candidates
| project IPAddress, UserCount
| sort by UserCount desc;
huntFPIPs
// Validate: confirm these are known corporate egress IPs
// Add confirmed IPs to the detection rule exclusion list

For every result you determine is legitimate during analysis (TH1.4), document three things:

What was it? The specific activity that matched the hypothesis. "User signed in from new IP 198.51.100.12 in Germany."

Why is it legitimate? The specific reason. "User is based in Germany — this is their office IP, which was not in the 30-day baseline because their account was created 15 days ago." Or: "IP belongs to corporate VPN egress range — legitimate VPN rotation."

What exclusion does it imply? The rule modification that would prevent this legitimate activity from triggering a future detection. "Exclude IPs in the corporate VPN range (198.51.100.0/24)." Or: "Exclude users with account age < baseline window."
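Capturing these three fields in a structured form, rather than free text, makes the later steps (category tallies, watchlist uploads) trivial queries. The sketch below shows one possible shape for that record; the table name, column names, and rows are invented for illustration, not prescribed by the TH1.4 workflow.

// Sketch (hypothetical schema): FP determinations captured as structured rows
// Columns mirror the three questions: what matched, why legitimate, implied exclusion
let HuntFPLog = datatable(WhatMatched: string, WhyLegitimate: string, ImpliedExclusion: string)
[
    "Sign-in from new IP 198.51.100.12 (Germany)", "User is based in Germany; account created 15 days ago, so the IP is absent from the 30-day baseline", "Suppress new-IP results for accounts younger than the baseline window",
    "Sign-in from new IP 203.0.113.40", "IP belongs to the corporate VPN egress range; legitimate VPN rotation", "Exclude corporate VPN egress ranges via watchlist"
];
HuntFPLog

A table like this can be exported to CSV and uploaded as a Sentinel watchlist, so the exclusions it implies are available to every future hunt and rule.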

Categories of hunting false positives

Four categories appear consistently. Each has a different exclusion pattern.

Infrastructure FPs. Corporate VPN, proxy, and cloud gateway IPs. These are legitimate IP addresses used by many users. Exclusion: IP range allowlist maintained as a Sentinel watchlist. Review quarterly as infrastructure changes.

Role-based FPs. Users whose job requires the activity the hunt flags. IT administrators legitimately modify directory settings. Finance users legitimately download large numbers of files during quarter-end. Travel-heavy roles legitimately sign in from new countries. Exclusion: role-based allowlist or threshold adjustment per user group.

Temporal FPs. Activity that is legitimate at certain times — maintenance windows, business cycles, seasonal patterns. Quarterly reporting spikes in data downloads. Monthly patching cycles that look like suspicious process execution. Exclusion: time-based suppression or dynamic thresholds that account for business cycles.

Onboarding FPs. New users, new devices, new applications all produce "first seen" signals during their initial period. Everything about a new user is anomalous against a baseline that does not include them. Exclusion: suppress or flag-as-low-confidence for entities with less than the baseline window of history.
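For infrastructure FPs, the watchlist-backed exclusion translates directly into KQL. The sketch below assumes a watchlist with the alias 'CorpEgressIPs' containing an 'IPAddress' column; both names are placeholders you would replace with your own.

// Sketch: exclude known corporate VPN/proxy egress IPs via a Sentinel watchlist
// 'CorpEgressIPs' and its 'IPAddress' column are assumed names for illustration
let CorpEgress = _GetWatchlist('CorpEgressIPs') | project IPAddress;
SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType == "0"
| where IPAddress !in (CorpEgress)
// Remaining results are sign-ins from IPs outside known corporate infrastructure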

From FP analysis to rule precision

The hunt-to-detection pipeline (TH1.6) converts hunt queries into analytics rules. The FP analysis determines the rule's precision — the percentage of alerts that are true positives.

A rule deployed without FP analysis might have a 10% true positive rate: 9 out of 10 alerts are VPN rotations, travel, or new employees. The SOC tunes it over weeks, adding exclusions one at a time as each FP is triaged.

A rule deployed with hunt-based FP analysis starts with the exclusions already in place. The VPN range is excluded. The new-user window is handled. The role-based allowlist is configured. The rule may start with a 70–90% true positive rate — because the hunt already identified and documented the false positive patterns.

This is why detection rules built from hunts are better than rules built from theory. The hunt analyst saw the legitimate activity. The exclusions reflect reality, not assumptions.
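As a rough illustration of what "exclusions already in place" looks like in query form, the sketch below folds an infrastructure exclusion, a per-user new-IP check, and an onboarding exclusion into one rule query. The watchlist alias, the 30-day baseline, and the history-based account-age check are assumptions for illustration, not the exact rule produced in the Convert step.

// Sketch: new-IP sign-in rule with hunt-derived exclusions baked in
let CorpEgress = _GetWatchlist('CorpEgressIPs') | project IPAddress;   // assumed watchlist
let KnownIPs = SigninLogs
    | where TimeGenerated between (ago(31d) .. ago(1d))
    | where ResultType == "0"
    | summarize by UserPrincipalName, IPAddress;
let FirstSeen = SigninLogs
    | where TimeGenerated > ago(31d)
    | summarize FirstSignIn = min(TimeGenerated) by UserPrincipalName;
SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType == "0"
| where IPAddress !in (CorpEgress)                               // infrastructure exclusion (VPN/proxy egress)
| join kind=leftanti KnownIPs on UserPrincipalName, IPAddress    // keep only IPs new for this user
| join kind=inner FirstSeen on UserPrincipalName
| where FirstSignIn < ago(30d)                                   // onboarding exclusion: require a full baseline of history
| project TimeGenerated, UserPrincipalName, IPAddress, Location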

[Diagram: False positive analysis → detection rule precision]
Hunt results (28 results: 3 suspicious, 25 legitimate FP data) → FP analysis (document what, why, what exclusion) → Informed exclusions (VPN ranges, role lists, new-user handling) → Precise rule (70–90% TP).
Without FP analysis: deploy rule → 10% TP → weeks of tuning → SOC loses trust.
With FP analysis: deploy rule → 70–90% TP → minimal tuning → SOC trusts the rule.

Figure TH1.11 — False positive analysis pipeline. Hunt FPs produce documented exclusions that make the resulting detection rule precise from day one. Without this analysis, the rule requires weeks of post-deployment tuning.

Try it yourself

Exercise: Categorize your false positives

From the TH1.3 exercise results, identify the legitimate results you excluded during analysis. For each, document: what it was, why it was legitimate, which of the four FP categories it falls into (infrastructure, role-based, temporal, onboarding), and what exclusion it implies for a detection rule.

Count the FPs per category. The category with the most FPs is the category your detection rule must address first. If 80% of FPs are VPN-related, a VPN IP allowlist resolves most of the noise before the rule deploys.
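If your determinations are recorded in a structured form (as in the earlier sketch), the per-category tally is a one-line aggregation. The rows below are invented example data, not results from the TH1.3 exercise.

// Sketch: tally documented FPs per category to see which exclusion to build first
let HuntFPLog = datatable(ResultId: int, Category: string)
[
    1, "Infrastructure",
    2, "Infrastructure",
    3, "Role-based",
    4, "Temporal",
    5, "Onboarding",
    6, "Infrastructure"
];
HuntFPLog
| summarize FPCount = count() by Category
| sort by FPCount desc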

The tuning feedback loop

Every false positive from a hunt informs the next hunt iteration. If a hunt for anomalous RDP activity produces 15 results, and 13 are IT administrators performing legitimate remote maintenance, the next iteration should exclude the IT admin group from the hunt scope OR adjust the anomaly threshold to account for their elevated baseline. This is not "reducing sensitivity" — it is refining the signal-to-noise ratio so that the 2 genuine anomalies in the next run are not buried in 13 known-benign results. The tuning log (which exclusions were added, why, and what data supported the decision) prevents scope drift — exclusions accumulate over months, and without documentation, no one remembers why a particular group was excluded.
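A sketch of that refinement is below. It assumes the IT administrators are tracked in a watchlist with the alias 'ITAdminAccounts' containing an 'AccountName' column, and that the previous iteration's anomaly threshold was a per-account distinct-device count; all of these are invented values for illustration.

// Sketch: next hunt iteration with the known-benign admin group excluded
// 'ITAdminAccounts' watchlist and its 'AccountName' column are assumed for illustration
let ITAdmins = _GetWatchlist('ITAdminAccounts') | project AccountName;
DeviceLogonEvents
| where TimeGenerated > ago(7d)
| where LogonType == "RemoteInteractive"          // RDP logons
| where AccountName !in (ITAdmins)                // exclusion documented in the tuning log
| summarize LogonCount = count(), TargetDevices = dcount(DeviceName) by AccountName
| where TargetDevices > 3                         // anomaly threshold carried over from the previous iteration
| sort by TargetDevices desc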

The queries developed during this exercise become reusable templates in your personal hunting library. Parameterise the hardcoded values (user names, IP addresses, time windows) and add a header comment explaining the hypothesis each query tests. A mature hunting program maintains 50-100 parameterised query templates that any team member can execute — reducing the per-hunt preparation time from hours to minutes and ensuring consistent methodology across analysts.
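One possible shape for such a template is sketched below; the header format, parameter names, and default values are suggestions rather than a prescribed standard.

// ============================================================
// Hunt template: sign-ins from IPs not seen for the user in the baseline window
// Hypothesis: a compromised account authenticates from infrastructure the
//             legitimate user has never used (T1078 - Valid Accounts)
// Parameters below are the only values an analyst should need to edit
// ============================================================
let targetUser   = "user@example.com";   // placeholder; remove the user filter below to hunt across all users
let baselineDays = 30d;
let huntWindow   = 1d;
let Baseline = SigninLogs
    | where TimeGenerated between (ago(baselineDays + huntWindow) .. ago(huntWindow))
    | where ResultType == "0"
    | summarize by UserPrincipalName, IPAddress;
SigninLogs
| where TimeGenerated > ago(huntWindow)
| where ResultType == "0"
| where UserPrincipalName == targetUser
| join kind=leftanti Baseline on UserPrincipalName, IPAddress
| project TimeGenerated, UserPrincipalName, IPAddress, Location, AppDisplayName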

The tuning log should be version-controlled alongside the hunt queries themselves. When a future analyst asks 'why is the IT admin group excluded from this hunt?' the log provides the answer: date of exclusion, the data that justified it, and the analyst who approved it. Without this history, exclusions accumulate without accountability, and the hunt's scope narrows until it no longer detects the threats it was designed to find.

⚠ Myth: "A high false positive rate means the hunt hypothesis was wrong"

The myth: If the hunt returns 100 results and 95 are false positives, the hypothesis was poorly formed and the hunt was unproductive.

The reality: A hunt that returns 100 results with 5 true positives and 95 false positives has found 5 genuine security findings AND produced the data to build a detection rule with a 5% false positive rate (after exclusions from the 95 FP analysis). The 95 FPs tell you exactly what legitimate activity looks like for this technique in your environment. A detection rule built from this data — with informed exclusions — will be far more precise than one deployed without it. The hunt was highly productive. It found threats AND calibrated detection.

Extend this approach

Maintain a running false positive knowledge base — a document or Sentinel watchlist that captures the exclusion patterns identified across all hunts. Over time, this knowledge base becomes a reusable asset: the VPN IP list, the travel-heavy user list, the new-user threshold, the quarterly reporting spike window. New hunts reference the knowledge base when building exclusions rather than rediscovering the same patterns. TH15 covers hunt knowledge management in detail.


NE environmental considerations

NE's detection environment includes specific factors that influence how the detection rules you build from hunts will operate:

Device diversity: 768 P2 corporate workstations with full Defender for Endpoint telemetry, 58 P1 manufacturing workstations with basic cloud-delivered protection, and 3 RHEL rendering servers with Syslog-only coverage. Rules targeting DeviceProcessEvents operate with full fidelity on P2 devices but may have reduced visibility on P1 devices. Manufacturing workstations in Sheffield and Sunderland represent a detection gap for endpoint-level detections.


Network topology: 11 offices connected via Palo Alto SD-WAN with full-mesh connectivity. The SD-WAN firewall logs feed CommonSecurityLog in Sentinel. Cross-site lateral movement generates firewall allow events that correlate with DeviceLogonEvents — enabling multi-source detection that single-table rules cannot achieve.

User population: 810 users with distinct behavioral profiles — office workers (predictable hours, consistent applications), field engineers (variable hours, travel patterns), IT administrators (elevated privilege, broad access patterns), and manufacturing operators (fixed shifts, limited application access). Each user population has different detection baselines.
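To illustrate the multi-source detection mentioned under network topology, the sketch below correlates SD-WAN firewall allow events with endpoint logons from the same source IP. The Palo Alto vendor filter, the one-hour window, and the ten-minute correlation tolerance are illustrative assumptions rather than tuned values.

// Sketch: correlate firewall allows (CommonSecurityLog) with endpoint logons (DeviceLogonEvents)
let window = 1h;
let FirewallAllows = CommonSecurityLog
    | where TimeGenerated > ago(window)
    | where DeviceVendor == "Palo Alto Networks"
    | where DeviceAction !in ("deny", "drop")
    | project FwTime = TimeGenerated, SourceIP, DestinationIP, DestinationPort;
DeviceLogonEvents
| where TimeGenerated > ago(window)
| where LogonType in ("Network", "RemoteInteractive")
| join kind=inner FirewallAllows on $left.RemoteIP == $right.SourceIP
| where abs(datetime_diff("minute", TimeGenerated, FwTime)) <= 10   // events within ten minutes of each other
| project TimeGenerated, AccountName, DeviceName, RemoteIP, DestinationIP, DestinationPort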

Decision point

You have time for one hunt this quarter. Do you hunt for the threat in the latest advisory or for the gap in your ATT&CK coverage matrix?

Hunt the coverage gap. Advisories describe threats that are CURRENT but may not target NE. Coverage gaps describe techniques that COULD target NE and would succeed undetected. The coverage gap hunt produces a detection rule (closing the gap permanently). The advisory-driven hunt produces a point-in-time assessment (confirming the specific threat is not present today). Both are valuable — but the coverage gap hunt has a longer-lasting impact because it produces a permanent detection improvement.

A hunt query returns 200 results. You have 4 hours remaining in the hunt window. You can investigate 20 results thoroughly or review all 200 superficially. Which approach produces better hunt outcomes?
Review all 200 — you might miss a critical finding in the 180 you skip.
Investigate 20 thoroughly. A superficial review of 200 results produces 200 'looked at it, seemed okay' assessments that provide no investigative value and no documentation for future reference. A thorough investigation of 20 results produces: confirmed findings (true positives requiring remediation), confirmed benign patterns (documented baselines for future comparison), and inconclusive results (flagged for monitoring). Prioritise the 20 by: highest anomaly score, highest-value assets involved, and highest-risk users involved. Document why the remaining 180 were not investigated and recommend a follow-up hunt with refined query criteria to reduce the result set.
Investigate 20 — but only if they are from the most recent 24 hours.
Neither — refine the query first to reduce the result set below 50.

You understand the detection gap and the hunt cycle.

TH0 showed you what detection rules fundamentally cannot catch. TH1 gave you the hypothesis-driven methodology that closes that gap. Now you run the hunts.

  • 10 complete hunt campaigns — from hypothesis through KQL execution through finding disposition, each campaign based on a real TTP
  • 70 production hunt queries — every one mapped to MITRE ATT&CK and tested against realistic telemetry
  • Advanced KQL for hunting — UEBA composite risk scoring, retroactive IOC sweeps, and hunt management metrics
  • Hypothesis-Driven Hunt Toolkit lab pack — 30 days of realistic M365 and endpoint telemetry with multiple attack patterns seeded in
  • TH16 — Scaling hunts across a team — the operating model for a production hunt program