TH1.10 Behavioral Baselining Methodology

3-4 hours · Module 1 · Free
Operational Objective
Many hunt campaigns depend on comparing current behavior against a baseline of "normal." If the baseline is wrong — too short, too noisy, or contaminated by the attack you are looking for — the hunt produces false negatives (missing real anomalies) or false positives (flagging legitimate changes as threats). This subsection teaches how to construct baselines that are reliable enough to hunt against.
Deliverable: The ability to construct per-user and per-entity behavioral baselines in KQL, select appropriate baseline windows, handle edge cases (new users, role changes, seasonal variation), and avoid the contamination trap.
⏱ Estimated completion: 25 minutes

What “normal” means in your environment

A baseline is a quantitative description of what normal looks like for a specific entity over a specific time window. “Normal” for the CEO’s authentication pattern is different from “normal” for a service account. “Normal” for SharePoint access in December (end-of-year reporting) is different from “normal” in March.

Baselines are per-entity, not global. A global baseline (“the average user signs in from 2.3 unique IPs per week”) obscures the individual patterns that make anomaly detection work. The SOC analyst who uses VPN from three countries while traveling has a different normal than the accountant who signs in from the same office every day. A global threshold catches the accountant’s first new IP but misses the traveler’s fifth — or flags the traveler constantly while ignoring the accountant’s one anomaly.

Baseline construction in KQL

The standard pattern: aggregate historical data per entity over a defined window to create a reference, then compare recent data against that reference.

```kusto
// Per-user authentication baseline: IP, location, device, app
let baselineStart = ago(37d);  // 30-day baseline
let baselineEnd = ago(7d);     // Ends 7 days ago (gap prevents contamination)
let baseline = SigninLogs
| where TimeGenerated between (baselineStart .. baselineEnd)
| where ResultType == 0
| summarize
    BaselineIPs = make_set(IPAddress, 30),
    BaselineCountries = make_set(
        tostring(LocationDetails.countryOrRegion), 10),
    BaselineDevices = make_set(
        tostring(DeviceDetail.displayName), 15),
    BaselineApps = make_set(AppDisplayName, 20),
    AvgDailySignIns = count() / 30.0  // Average sign-ins per day
    by UserPrincipalName;
// This baseline captures each user's normal:
//   which IPs they sign in from
//   which countries those IPs resolve to
//   which devices they use
//   which applications they access
//   how many sign-ins per day is typical
// Any deviation from this baseline in the detection window is a candidate anomaly
```

The gap window: preventing contamination

Notice the 7-day gap between the baseline window end and the present. This is not arbitrary. If the attacker has been present for 5 days, and your baseline extends to the present, the attacker’s activity is in the baseline. The baseline now considers the attacker’s IP as “normal” — because it has been seen during the baseline period. The hunt misses the compromise.

The gap window must be at least as long as the detection window. If you are hunting in the last 7 days, the baseline should end 7 days ago. If the attacker entered during the baseline period (before the gap), their activity is in the baseline — but it will appear as a consistent anomaly in the detection window, which is still detectable through volume and pattern analysis.
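The comparison step the baseline feeds into can be sketched as follows. This is a minimal illustration, not a production detection: it reuses a trimmed version of the baseline query from above (`BaselineIPs`, `BaselineCountries`) and flags sign-ins in the 7-day detection window whose IP or country is absent from the user's baseline sets.

```kusto
// Sketch: hunt the detection window against the baseline.
// Assumes the column names from the baseline example above.
let baselineStart = ago(37d);
let baselineEnd = ago(7d);
let baseline = SigninLogs
| where TimeGenerated between (baselineStart .. baselineEnd)
| where ResultType == 0
| summarize
    BaselineIPs = make_set(IPAddress, 30),
    BaselineCountries = make_set(
        tostring(LocationDetails.countryOrRegion), 10)
    by UserPrincipalName;
SigninLogs
| where TimeGenerated > ago(7d)   // detection window = gap length
| where ResultType == 0
| extend Country = tostring(LocationDetails.countryOrRegion)
| join kind=inner baseline on UserPrincipalName
// Flag anything not seen during the baseline period
| where not(set_has_element(BaselineIPs, IPAddress))
    or not(set_has_element(BaselineCountries, Country))
| summarize
    NewIPs = make_set(IPAddress, 10),
    NewCountries = make_set(Country, 5),
    AnomalousSignIns = count()
    by UserPrincipalName
```

Each row is a candidate anomaly, not a verdict — triage against the edge cases described below (new users, role changes, seasonal variation) before escalating.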

For campaigns where longer dwell time is expected (APT, insider threat), extend the gap. A 90-day baseline ending 30 days ago, with a 30-day detection window, provides protection against attackers with up to 30 days of dwell time.

Edge cases that break baselines

New users. A user who joined the organization 2 weeks ago has no 30-day baseline. Every sign-in is "new" by definition. Either exclude users with less than a full baseline window of history, or build a shorter baseline for them and flag their results as lower-confidence.

```kusto
// Identify new users who will not have full baselines
let baselineWindow = 30d;
SigninLogs
| where TimeGenerated > ago(baselineWindow)
| where ResultType == 0
| summarize FirstSeen = min(TimeGenerated) by UserPrincipalName
| where FirstSeen > ago(baselineWindow)
// These users have less than 30 days of sign-in history
// Baseline comparison will produce false positives for them
// Either exclude or flag results as low-confidence
```
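To act on that list, the simplest option is an anti-join that drops new users from hunt output. The sketch below assumes a hypothetical `HuntResults` intermediate (the output of a baseline-comparison query keyed by `UserPrincipalName`); substitute your own result set.

```kusto
// Sketch: exclude users without a full baseline from hunt results.
// `HuntResults` is a hypothetical table of baseline-comparison hits.
let baselineWindow = 30d;
let newUsers = SigninLogs
| where TimeGenerated > ago(baselineWindow)
| where ResultType == 0
| summarize FirstSeen = min(TimeGenerated) by UserPrincipalName
| where FirstSeen > ago(baselineWindow)
| project UserPrincipalName;
HuntResults
| join kind=leftanti newUsers on UserPrincipalName
```

If you flag rather than exclude, use `kind=leftouter` instead and carry a `IsNewUser` column through to triage.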

Role changes. A user who transferred from the London office to the New York office last week will sign in from a new country. A user promoted to a new role will access new applications and resources. The baseline reflects the old role. The detection window reflects the new one. Every access in the new role is “anomalous” against the old baseline.

Mitigation: enrich baseline anomalies with HR/directory data. Check AuditLogs for recent role or group membership changes. If the user’s role changed during the gap window, the baseline comparison is less reliable — flag but do not escalate without additional indicators.
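A hedged sketch of that AuditLogs check follows. The operation names shown are common Entra ID audit operations, but coverage varies by tenant and license — verify the exact `OperationName` values present in your own AuditLogs before depending on this filter.

```kusto
// Sketch: users whose roles or group memberships changed recently.
// Baseline comparisons for these users are less reliable.
AuditLogs
| where TimeGenerated > ago(14d)   // cover the gap + detection windows
| where OperationName in (
    "Add member to group",
    "Add member to role",
    "Update user")
| extend TargetUser = tostring(TargetResources[0].userPrincipalName)
| where isnotempty(TargetUser)
| summarize
    Changes = make_set(OperationName, 10),
    LastChange = max(TimeGenerated)
    by TargetUser
```

Join this against your anomaly output on the user identifier; a match downgrades confidence rather than dismissing the finding outright.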

Seasonal variation. Download volumes in a finance department spike during quarter-end reporting. Travel-related sign-in anomalies increase during conference season. If your baseline window captures a low-activity period and the detection window captures a high-activity period (or vice versa), the comparison produces systematic bias.

Mitigation: for campaigns sensitive to seasonal variation (TH8 data exfiltration, TH13 insider threat), use a same-period-last-year baseline if data retention allows, or use a 90-day baseline that spans at least one business cycle.
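A same-period-last-year baseline can be sketched like this, assuming at least 13 months of retention. The table and column names (`CloudAppEvents`, `ActionType`, `AccountDisplayName`) follow a Defender-style schema and are illustrative — adjust to whatever download telemetry your environment actually retains, and treat the ratio threshold as a starting point, not a tuned value.

```kusto
// Sketch: compare this month's per-user download volume against the
// same calendar month one year ago (seasonal-aware baseline).
let thisPeriodStart = startofmonth(now());
let lastYearStart = datetime_add("year", -1, thisPeriodStart);
let lastYearEnd = datetime_add("month", 1, lastYearStart);
let seasonalBaseline = CloudAppEvents
| where TimeGenerated between (lastYearStart .. lastYearEnd)
| where ActionType == "FileDownloaded"
| summarize BaselineDownloads = count() by AccountDisplayName;
CloudAppEvents
| where TimeGenerated > thisPeriodStart
| where ActionType == "FileDownloaded"
| summarize CurrentDownloads = count() by AccountDisplayName
| join kind=inner seasonalBaseline on AccountDisplayName
| extend Ratio = CurrentDownloads * 1.0 / BaselineDownloads
| where Ratio > 3.0   // illustrative threshold: 3x last year's volume
```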

[Figure: timeline of a 30-day baseline window ("What does normal look like?"), a 7-day gap (prevents contamination), and a 7-day detection window ("What is different now?") ending at now().]

Figure TH1.10 — Baseline construction with gap window. The gap prevents attacker activity from contaminating the baseline, ensuring anomalies in the detection window are measured against genuine pre-attack behavior.

Try it yourself

Exercise: Build and test a per-user IP baseline

Run the baseline construction query from this subsection against your environment. Then run the new-user identification query to find users who will not have full baselines.

Examine 3 users from the baseline results. For each, check: does the baseline IP set match your expectation of their normal behavior? Does the average daily sign-in count seem reasonable for their role? If the baseline does not match your environmental knowledge of the user, the baseline window or the aggregation logic needs adjustment.

This validation step is critical — building a baseline you have not validated is building on assumptions. Validate before hunting against it.

⚠ Compliance Myth: "A 7-day baseline is sufficient for behavioral detection"

The myth: Short baselines are sufficient because they capture recent behavior most accurately.

The reality: A 7-day baseline captures one work week. It does not capture monthly activities (first-of-month reporting), biweekly patterns (payroll processing), seasonal variation, or infrequent but legitimate activities (quarterly board meeting access, annual audit preparation). A 30-day baseline captures a full business cycle. A 90-day baseline captures seasonal patterns. Shorter baselines produce more false positives because they treat infrequent-but-legitimate activity as anomalous. The appropriate baseline length depends on the technique: authentication anomalies work well with 30 days. Data exfiltration may need 90 days to capture business cycle variation.

Extend this methodology

TH2 (Advanced KQL for Hunting) introduces `make-series` and `series_decompose_anomalies()` — KQL functions that build statistical baselines automatically and flag deviations. The manual baseline methodology in this subsection is the conceptual foundation. The `make-series` approach automates it for scheduled or repeated hunts. Learn the manual approach first (it builds the intuition for what "normal" means in your data), then apply the automated approach for scale.
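As a preview of where TH2 goes, a minimal sketch of the automated pattern looks like this — it builds a daily sign-in count series per user and lets KQL's decomposition function flag statistical outliers. The threshold value here is illustrative.

```kusto
// Preview sketch: automated baseline via make-series (covered in TH2)
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| make-series DailySignIns = count() default = 0
    on TimeGenerated from ago(30d) to now() step 1d
    by UserPrincipalName
| extend Anomalies = series_decompose_anomalies(DailySignIns, 1.5)
| mv-expand TimeGenerated to typeof(datetime),
    DailySignIns to typeof(long),
    Anomalies to typeof(double)
| where Anomalies != 0   // keep only flagged days
```

Note what this does not give you: the per-entity reference sets (IPs, countries, devices) from the manual approach. The two techniques complement each other rather than replacing one another.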

