TH1.10 Behavioral Baselining Methodology
What "normal" means in your environment
A baseline is a quantitative description of what normal looks like for a specific entity over a specific time window. "Normal" for the CEO's authentication pattern is different from "normal" for a service account. "Normal" for SharePoint access in December (end-of-year reporting) is different from "normal" in March.
Baselines are per-entity, not global. A global baseline ("the average user signs in from 2.3 unique IPs per week") obscures the individual patterns that make anomaly detection work. The SOC analyst who uses VPN from three countries while traveling has a different normal than the accountant who signs in from the same office every day. A global threshold catches the accountant's first new IP but misses the traveler's fifth — or flags the traveler constantly while ignoring the accountant's one anomaly.
Baseline construction in KQL
// Per-user authentication baseline: IP, location, device, app
let baselineStart = ago(37d); // 37 days back: a 30-day baseline plus a 7-day gap
let baselineEnd = ago(7d); // Ends 7 days ago (the gap keeps the detection window out of the baseline)
let baseline = SigninLogs
| where TimeGenerated between (baselineStart .. baselineEnd)
| where ResultType == 0
| summarize
    BaselineIPs = make_set(IPAddress, 30),
    BaselineCountries = make_set(tostring(LocationDetails.countryOrRegion), 10),
    BaselineDevices = make_set(tostring(DeviceDetail.displayName), 15),
    BaselineApps = make_set(AppDisplayName, 20),
    AvgDailySignIns = count() / 30.0 // Average sign-ins per day over the 30-day window
    by UserPrincipalName;
// This baseline captures each user's normal:
// which IPs they sign in from
// which countries those IPs resolve to
// which devices they use
// which applications they access
// how many sign-ins per day is typical
// Any deviation from this baseline in the detection window is a candidate anomaly
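To use this baseline, compare it against a detection window. The sketch below continues from the baseline query above; the 7-day detection window and the 2x daily sign-in threshold are illustrative assumptions, not values prescribed by this methodology.
// Detection window: the last 7 days, compared against the baseline built above
// (assumes the 'baseline' let statement from the previous query is in scope)
let detection = SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == 0
| summarize
    CurrentIPs = make_set(IPAddress, 30),
    DailySignIns = count() / 7.0 // Average sign-ins per day in the detection window
    by UserPrincipalName;
detection
| join kind=inner baseline on UserPrincipalName
| extend NewIPs = set_difference(CurrentIPs, BaselineIPs) // IPs absent from the baseline
| where array_length(NewIPs) > 0 or DailySignIns > 2 * AvgDailySignIns
| project UserPrincipalName, NewIPs, DailySignIns, AvgDailySignIns
// Identify new users who will not have full baselines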
let baselineWindow = 30d;
SigninLogs
| where TimeGenerated > ago(baselineWindow)
| where ResultType == 0
| summarize FirstSeen = min(TimeGenerated) by UserPrincipalName
| where FirstSeen > ago(baselineWindow)
// These users have less than 30 days of sign-in history
// Baseline comparison will produce false positives for them
// Either exclude them or flag their results as low-confidence
Try it yourself
Exercise: Build and test a per-user IP baseline
Run the baseline construction query from this subsection against your environment. Then run the new-user identification query to find users who will not have full baselines.
Examine 3 users from the baseline results. For each, check: does the baseline IP set match your expectation of their normal behavior? Does the average daily sign-in count seem reasonable for their role? If the baseline does not match your environmental knowledge of the user, the baseline window or the aggregation logic needs adjustment.
This validation step is critical — building a baseline you have not validated is building on assumptions. Validate before hunting against it.
Building the baseline before the hunt
The baseline query runs BEFORE the hunt query. If you are hunting for anomalous SharePoint access, first establish what normal SharePoint access looks like: which users access which libraries, at what volume, during which hours, from which IPs. This baseline becomes the denominator against which the hunt query's results are evaluated. Without the baseline, every finding is ambiguous — is 47 file downloads in one hour anomalous? Without knowing that the user's P95 is 12 downloads per hour, you cannot answer that question. With the baseline, the answer is definitive: 47 downloads is 3.9x the user's P95, which exceeds the 2x anomaly threshold defined in the hunt methodology.
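As a concrete sketch of that denominator, the query below computes each user's P95 hourly download count over the baseline window. The CloudAppEvents table and the FileDownloaded action type are assumptions based on a Defender for Cloud Apps connector; substitute whatever table and action names your environment actually logs.
// Per-user P95 hourly download volume over the baseline window (a sketch;
// table and ActionType names are assumptions, adapt to your telemetry)
CloudAppEvents
| where TimeGenerated between (ago(37d) .. ago(7d))
| where ActionType == "FileDownloaded"
| summarize HourlyDownloads = count()
    by AccountDisplayName, Hour = bin(TimeGenerated, 1h)
| summarize P95HourlyDownloads = percentile(HourlyDownloads, 95)
    by AccountDisplayName
// A detection-window hour exceeding 2x P95HourlyDownloads is a candidate anomaly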
The queries developed during this exercise become reusable templates in your personal hunting library. Parameterise the hardcoded values (user names, IP addresses, time windows) and add a header comment explaining the hypothesis each query tests. A mature hunting program maintains 50-100 parameterised query templates that any team member can execute — reducing the per-hunt preparation time from hours to minutes and ensuring consistent methodology across analysts.
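A minimal sketch of what such a template can look like in KQL, using a user-defined function with the time window as a parameter; the function name and the hypothesis comment are illustrative, not a prescribed format.
// Hypothesis: a successful sign-in from an IP outside a user's baseline set
// may indicate account compromise (ATT&CK T1078, Valid Accounts)
let UserIPBaseline = (baselineStart:datetime, baselineEnd:datetime) {
    SigninLogs
    | where TimeGenerated between (baselineStart .. baselineEnd)
    | where ResultType == 0
    | summarize BaselineIPs = make_set(IPAddress, 30) by UserPrincipalName
};
UserIPBaseline(ago(37d), ago(7d)) // The same 30-day window with a 7-day gap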
The baseline itself is an artifact worth preserving. Store the baseline query, its results, and the date it was computed alongside each hunt. When the hunt is repeated in 30 days, the baseline may have shifted — seasonal patterns, new employees, infrastructure changes all affect what 'normal' looks like. Comparing the current baseline against the previous baseline reveals environmental drift before it causes false positives in production detection rules.
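A drift check can be as simple as diffing the two baselines. This sketch recomputes both windows inline for illustration; in practice you would compare the stored results from successive hunts.
// Compare the current 30-day IP baseline against the previous 30-day window
let previousBaseline = SigninLogs
| where TimeGenerated between (ago(67d) .. ago(37d))
| where ResultType == 0
| summarize PrevIPs = make_set(IPAddress, 30) by UserPrincipalName;
let currentBaseline = SigninLogs
| where TimeGenerated between (ago(37d) .. ago(7d))
| where ResultType == 0
| summarize CurrIPs = make_set(IPAddress, 30) by UserPrincipalName;
currentBaseline
| join kind=inner previousBaseline on UserPrincipalName
| extend DriftIPs = set_difference(CurrIPs, PrevIPs) // IPs new since the last baseline
| where array_length(DriftIPs) > 0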
The myth: Short baselines are sufficient because they capture recent behavior most accurately.
The reality: A 7-day baseline captures one work week. It does not capture monthly activities (first-of-month reporting), biweekly patterns (payroll processing), seasonal variation, or infrequent but legitimate activities (quarterly board meeting access, annual audit preparation). A 30-day baseline captures a full business cycle. A 90-day baseline captures seasonal patterns. Shorter baselines produce more false positives because they treat infrequent-but-legitimate activity as anomalous. The appropriate baseline length depends on the technique: authentication anomalies work well with 30 days. Data exfiltration may need 90 days to capture business cycle variation.
Extend this methodology
TH2 (Advanced KQL for Hunting) introduces `make-series` and `series_decompose_anomalies()` — KQL functions that build statistical baselines automatically and flag deviations. The manual baseline methodology in this subsection is the conceptual foundation. The `make-series` approach automates it for scheduled or repeated hunts. Learn the manual approach first (it builds the intuition for what "normal" means in your data), then apply the automated approach for scale.
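As a preview, here is a minimal sketch of the automated approach; the 1.5 anomaly-score threshold and the daily step are illustrative defaults, not tuned values.
// Automated baseline: daily sign-in counts per user, with statistical anomaly flags
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| make-series SignIns = count() default=0
    on TimeGenerated from ago(30d) to now() step 1d
    by UserPrincipalName
| extend (Anomalies, Scores, Baseline) = series_decompose_anomalies(SignIns, 1.5)
| mv-expand TimeGenerated to typeof(datetime), SignIns to typeof(long),
    Anomalies to typeof(double)
| where Anomalies != 0 // Days where the user deviated from their own pattern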
References Used in This Subsection
- Microsoft. "KQL make-series Operator." Microsoft Learn. https://learn.microsoft.com/en-us/kusto/query/make-series-operator
- Course cross-references: TH2 (advanced KQL for baselining), TH4 (authentication baseline application), TH8 (download volume baseline), TH13 (insider behavior baseline)
NE environmental considerations
NE's detection environment includes specific factors that influence how these baselines, and the hunts built on them, operate:
Device diversity: 768 P2 corporate workstations with full Defender for Endpoint telemetry, 58 P1 manufacturing workstations with basic cloud-delivered protection, and 3 RHEL rendering servers with Syslog-only coverage. Rules targeting DeviceProcessEvents operate with full fidelity on P2 devices but may have reduced visibility on P1 devices. Manufacturing workstations in Sheffield and Sunderland represent a detection gap for endpoint-level detections.
You have time for one hunt this quarter. Do you hunt for the threat in the latest advisory or for the gap in your ATT&CK coverage matrix?
Hunt the coverage gap. Advisories describe threats that are CURRENT but may not target NE. Coverage gaps describe techniques that COULD target NE and would succeed undetected. The coverage gap hunt produces a detection rule (closing the gap permanently). The advisory-driven hunt produces a point-in-time assessment (confirming the specific threat is not present today). Both are valuable — but the coverage gap hunt has a longer-lasting impact because it produces a permanent detection improvement.
You understand the detection gap and the hunt cycle.
TH0 showed you what detection rules fundamentally cannot catch. TH1 gave you the hypothesis-driven methodology that closes that gap. Now you run the hunts.
- 10 complete hunt campaigns — from hypothesis through KQL execution through finding disposition, each campaign based on a real TTP
- 70 production hunt queries — every one mapped to MITRE ATT&CK and tested against realistic telemetry
- Advanced KQL for hunting — UEBA composite risk scoring, retroactive IOC sweeps, and hunt management metrics
- Hypothesis-Driven Hunt Toolkit lab pack — 30 days of realistic M365 and endpoint telemetry with multiple attack patterns seeded in
- TH16 — Scaling hunts across a team — the operating model for a production hunt program