TH1.10 Behavioral Baselining Methodology
What “normal” means in your environment
A baseline is a quantitative description of what normal looks like for a specific entity over a specific time window. “Normal” for the CEO’s authentication pattern is different from “normal” for a service account. “Normal” for SharePoint access in December (end-of-year reporting) is different from “normal” in March.
Baselines are per-entity, not global. A global baseline (“the average user signs in from 2.3 unique IPs per week”) obscures the individual patterns that make anomaly detection work. The SOC analyst who uses VPN from three countries while traveling has a different normal than the accountant who signs in from the same office every day. A global threshold catches the accountant’s first new IP but misses the traveler’s fifth — or flags the traveler constantly while ignoring the accountant’s one anomaly.
Baseline construction in KQL
The standard pattern: aggregate historical data per entity over a defined window to create a reference, then compare recent data against that reference.
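A minimal sketch of this pattern, assuming the Microsoft Sentinel `SigninLogs` schema. The window sizes (30-day baseline, 7-day gap, 7-day detection window) and the success filter are illustrative; adapt them to your environment:

```kql
// Baseline: each user's known sign-in IPs over 30 days, ending 7 days ago (the gap).
let baseline = SigninLogs
    | where TimeGenerated between (ago(37d) .. ago(7d))
    | where ResultType == "0"                     // successful sign-ins only
    | summarize KnownIPs = make_set(IPAddress),
                BaselineSignins = count() by UserPrincipalName;
// Detection: compare the last 7 days against that reference.
SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == "0"
| summarize RecentIPs = make_set(IPAddress) by UserPrincipalName
| join kind=inner (baseline) on UserPrincipalName
| extend NewIPs = set_difference(RecentIPs, KnownIPs)   // IPs never seen in baseline
| where array_length(NewIPs) > 0
| project UserPrincipalName, NewIPs, KnownIPs, BaselineSignins
```

The `set_difference()` step is what turns the per-entity reference into an anomaly test: only IPs absent from that user's own baseline survive the filter.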
The gap window: preventing contamination
Notice the 7-day gap between the baseline window end and the present. This is not arbitrary. If the attacker has been present for 5 days, and your baseline extends to the present, the attacker’s activity is in the baseline. The baseline now considers the attacker’s IP as “normal” — because it has been seen during the baseline period. The hunt misses the compromise.
The gap window must be at least as long as the detection window. If you are hunting in the last 7 days, the baseline should end 7 days ago. If the attacker entered during the baseline period (before the gap), their activity is baked into the baseline and will not surface as a new-entity anomaly, but it persists at a consistent rate into the detection window, where it remains detectable through volume and pattern analysis.
For campaigns where longer dwell time is expected (APT, insider threat), extend the gap. A 90-day baseline ending 30 days ago, with a 30-day detection window, provides protection against attackers with up to 30 days of dwell time.
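The window arithmetic is easier to audit when the three windows are declared explicitly. A hypothetical parameterization for the APT/insider case described above (variable names are my own, not a standard):

```kql
// Gap window must be >= expected dwell time; widen it for long-dwell campaigns.
let baselineWindow  = 90d;
let gapWindow       = 30d;   // protects against up to 30 days of dwell time
let detectionWindow = 30d;
// Baseline spans [now - 120d, now - 30d]; detection spans the last 30 days.
SigninLogs
| where TimeGenerated between (ago(baselineWindow + gapWindow) .. ago(gapWindow))
| summarize KnownIPs = make_set(IPAddress) by UserPrincipalName
```

Declaring the windows as `let` values makes the invariant (`gapWindow >= detectionWindow`) visible at the top of the query instead of buried in literal timestamps.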
Edge cases that break baselines
New users. A user who joined the organization 2 weeks ago has no 30-day baseline; every sign-in is "new" by definition. Either exclude users with less than a full baseline window of history, or use a shorter baseline for new users and flag their results as lower confidence.
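One way to find those users, sketched against the `SigninLogs` schema (the 90-day lookback assumes your retention covers it):

```kql
// Users whose first observed sign-in falls inside the 30-day baseline window:
// they lack full history, so treat baseline results for them as low confidence.
SigninLogs
| where TimeGenerated > ago(90d)
| summarize FirstSeen = min(TimeGenerated) by UserPrincipalName
| where FirstSeen > ago(30d)
| extend DaysOfHistory = datetime_diff('day', now(), FirstSeen)
| project UserPrincipalName, FirstSeen, DaysOfHistory
```

Note the caveat: "first observed sign-in" is only a proxy for hire date. A long-tenured user who was simply inactive for 90 days will also appear here, so cross-check against directory data before excluding.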
Role changes. A user who transferred from the London office to the New York office last week will sign in from a new country. A user promoted to a new role will access new applications and resources. The baseline reflects the old role. The detection window reflects the new one. Every access in the new role is “anomalous” against the old baseline.
Mitigation: enrich baseline anomalies with HR/directory data. Check AuditLogs for recent role or group membership changes. If the user’s role changed during the gap window, the baseline comparison is less reliable — flag but do not escalate without additional indicators.
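A hedged sketch of that enrichment check. The `OperationName` values below are common Entra ID audit operations, but verify them against the events your tenant actually emits:

```kql
// Role/group membership changes during the gap window (last 7 days).
// A hit for a flagged user means their baseline reflects the old role.
AuditLogs
| where TimeGenerated > ago(7d)
| where OperationName in ("Add member to role", "Add member to group", "Update user")
| extend TargetUser = tostring(TargetResources[0].userPrincipalName),
         Initiator  = tostring(InitiatedBy.user.userPrincipalName)
| project TimeGenerated, OperationName, TargetUser, Initiator
```

Join this on the user column of your anomaly results; any anomalous user who also appears here should be downgraded rather than escalated.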
Seasonal variation. Download volumes in a finance department spike during quarter-end reporting. Travel-related sign-in anomalies increase during conference season. If your baseline window captures a low-activity period and the detection window captures a high-activity period (or vice versa), the comparison produces systematic bias.
Mitigation: for campaigns sensitive to seasonal variation (TH8 data exfiltration, TH13 insider threat), use a same-period-last-year baseline if data retention allows, or use a 90-day baseline that spans at least one business cycle.
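A same-period-last-year comparison might look like the following. Table and column names (`CloudAppEvents`, `ActionType`, `AccountUpn`) follow the Microsoft 365 Defender schema but are assumptions, and the query requires roughly 13 months of retention:

```kql
// Compare recent download volume against the same 30-day period last year
// to control for seasonal variation (quarter-end, audit season, etc.).
let lastYear = CloudAppEvents
    | where TimeGenerated between (ago(395d) .. ago(365d))
    | where ActionType == "FileDownloaded"
    | summarize LastYearDownloads = count() by AccountUpn;
CloudAppEvents
| where TimeGenerated > ago(30d)
| where ActionType == "FileDownloaded"
| summarize RecentDownloads = count() by AccountUpn
| join kind=leftouter (lastYear) on AccountUpn
| extend Ratio = todouble(RecentDownloads) / todouble(LastYearDownloads)
| where Ratio > 3.0     // threshold is illustrative, not a recommendation
```

Users with no last-year history drop out of the ratio filter; route them through the new-user handling above instead.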
Figure TH1.10 — Baseline construction with gap window. The gap prevents attacker activity from contaminating the baseline, ensuring anomalies in the detection window are measured against genuine pre-attack behavior.
Try it yourself
Exercise: Build and test a per-user IP baseline
Run the baseline construction query from this subsection against your environment. Then run the new-user identification query to find users who will not have full baselines.
Examine 3 users from the baseline results. For each, check: does the baseline IP set match your expectation of their normal behavior? Does the average daily sign-in count seem reasonable for their role? If the baseline does not match your environmental knowledge of the user, the baseline window or the aggregation logic needs adjustment.
This validation step is critical — building a baseline you have not validated is building on assumptions. Validate before hunting against it.
The myth: Short baselines are sufficient because they capture recent behavior most accurately.
The reality: A 7-day baseline captures one work week. It does not capture monthly activities (first-of-month reporting), biweekly patterns (payroll processing), seasonal variation, or infrequent but legitimate activities (quarterly board meeting access, annual audit preparation). A 30-day baseline captures a full business cycle. A 90-day baseline captures seasonal patterns. Shorter baselines produce more false positives because they treat infrequent-but-legitimate activity as anomalous. The appropriate baseline length depends on the technique: authentication anomalies work well with 30 days. Data exfiltration may need 90 days to capture business cycle variation.
Extend this methodology
TH2 (Advanced KQL for Hunting) introduces `make-series` and `series_decompose_anomalies()` — KQL functions that build statistical baselines automatically and flag deviations. The manual baseline methodology in this subsection is the conceptual foundation. The `make-series` approach automates it for scheduled or repeated hunts. Learn the manual approach first (it builds the intuition for what "normal" means in your data), then apply the automated approach for scale.
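For orientation before TH2, a minimal sketch of the automated approach using the documented `make-series` and `series_decompose_anomalies()` functions (the 1.5 anomaly threshold is the function's default sensitivity region; tune per hunt):

```kql
// Per-user daily sign-in counts as a time series, with statistical
// anomaly flagging instead of a manually constructed baseline.
SigninLogs
| where TimeGenerated > ago(30d)
| make-series DailySignins = count() default = 0
    on TimeGenerated from ago(30d) to now() step 1d
    by UserPrincipalName
| extend (Anomalies, Scores, Baseline) =
    series_decompose_anomalies(DailySignins, 1.5)
| mv-expand TimeGenerated to typeof(datetime),
            DailySignins to typeof(long),
            Anomalies to typeof(double)
| where Anomalies != 0      // -1 = dip, +1 = spike
| project UserPrincipalName, TimeGenerated, DailySignins, Anomalies
```

Note what this automates and what it does not: it baselines volume per entity, but it has no notion of a gap window, so the contamination problem from earlier in this subsection still applies to the lookback period you feed it.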
References Used in This Subsection
- Microsoft. “KQL make-series Operator.” Microsoft Learn. https://learn.microsoft.com/en-us/kusto/query/make-series-operator
- Course cross-references: TH2 (advanced KQL for baselining), TH4 (authentication baseline application), TH8 (download volume baseline), TH13 (insider behavior baseline)