TH2.1 Percentile and Statistical Deviation for Outlier Detection

4-5 hours · Module 2 · Free
Operational Objective
Most hunting questions reduce to: "which entities are behaving differently from the rest?" Answering that question statistically — rather than by eyeballing a sorted list — requires percentile functions, standard deviation, and z-score calculations in KQL. This subsection teaches these functions through hunting applications: identifying users with abnormal sign-in volumes, abnormal data access, or abnormal application usage.
Deliverable: The ability to use percentile(), stdev(), avg(), and calculated z-scores in KQL to identify statistical outliers in M365 behavioral data.
⏱ Estimated completion: 25 minutes

Why statistics, not sorting

The instinct is to sort results by count and look at the top. “Which user downloaded the most files from SharePoint this week?” Sort by download count descending, examine the top 10. The problem: the top 10 might include the engineering team doing a legitimate data migration. Or the CEO’s assistant preparing for a board meeting. Or the IT admin running a backup script. High volume is not suspicious — it is common.

What is suspicious is volume that deviates from the entity’s own baseline or from the population’s distribution. A user who downloads 500 files in a week when their normal is 5 is more suspicious than a user who downloads 500 files when their normal is 400. The absolute number is the same. The deviation is different. Statistics measure the deviation.

percentile() — where does this entity sit?

percentile() returns the value at a given percentile rank in a distribution. In hunting, it answers: “is this entity in the tail of the distribution?”

// Which users are above the 95th percentile for sign-in volume?
// Users in the top 5% may warrant investigation, or may be
// admins and service accounts. The statistic identifies them;
// the analyst determines the context.
SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == 0
| summarize SignInCount = count() by UserPrincipalName
| extend P95 = toscalar(
    SigninLogs
    | where TimeGenerated > ago(7d)
    | where ResultType == 0
    | summarize SignInCount = count() by UserPrincipalName
    | summarize percentile(SignInCount, 95))
| where SignInCount > P95
| project UserPrincipalName, SignInCount, P95
| sort by SignInCount desc
// Results: users whose sign-in volume exceeds the 95th percentile
// Not all are suspicious, but all are statistically unusual
// Enrich with user context (TH1.4) before drawing conclusions

A more efficient pattern uses summarize with multiple percentiles to understand the full distribution:

// Sign-in volume distribution across all users
SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == 0
| summarize SignInCount = count() by UserPrincipalName
| summarize
    P50 = percentile(SignInCount, 50),
    P75 = percentile(SignInCount, 75),
    P90 = percentile(SignInCount, 90),
    P95 = percentile(SignInCount, 95),
    P99 = percentile(SignInCount, 99),
    Max = max(SignInCount),
    UserCount = count()
// The shape of the distribution tells you about your environment:
// P50 = the typical user's sign-in count
// P95 = the threshold above which only 5% of users fall
// P99 = the extreme tail — these users are doing something unusual
// Max = the single highest volume user — investigate this one

stdev() and z-scores — how far from normal?

Standard deviation measures the spread of a distribution. A z-score measures how many standard deviations a specific value is from the mean. In hunting, z-scores answer: “how unusual is this entity’s behavior compared to the population?”

// Z-score for per-user file download volume
// Identifies users whose download behavior is statistically anomalous
CloudAppEvents
| where TimeGenerated > ago(7d)
| where ActionType == "FileDownloaded"
| summarize DownloadCount = count() by AccountId
| extend PopulationMean = toscalar(
    CloudAppEvents
    | where TimeGenerated > ago(7d)
    | where ActionType == "FileDownloaded"
    | summarize DownloadCount = count() by AccountId
    | summarize avg(DownloadCount))
| extend PopulationStdev = toscalar(
    CloudAppEvents
    | where TimeGenerated > ago(7d)
    | where ActionType == "FileDownloaded"
    | summarize DownloadCount = count() by AccountId
    | summarize stdev(DownloadCount))
| extend ZScore = (DownloadCount - PopulationMean) / PopulationStdev
| where ZScore > 3.0
// Z-score > 3 = more than 3 standard deviations above the mean
// In a normal distribution, this captures the top 0.13% of values
// These users are extreme outliers in download volume
| project AccountId, DownloadCount, ZScore
| sort by ZScore desc

The z-score threshold determines sensitivity. Z > 3 is conservative (fewer results, higher confidence each is unusual). Z > 2 is broader (more results, more false positives). For initial hunting campaigns, start with Z > 3 and reduce the threshold if the result set is too small.
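The tail fractions behind these thresholds follow directly from the normal distribution: the probability of exceeding z standard deviations is erfc(z / √2) / 2. A quick Python sketch confirms the figures quoted above:

```python
# Sanity check (Python, not KQL): what fraction of a normal distribution
# lies more than z standard deviations above the mean?
import math

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal: erfc(z / sqrt(2)) / 2."""
    return math.erfc(z / math.sqrt(2)) / 2

# z > 2 captures roughly the top 2.3% of values; z > 3 roughly the
# top 0.13% -- which is why z > 3 is the conservative hunting default.
print(f"P(Z > 2) = {upper_tail(2):.4f}")   # ~0.0228
print(f"P(Z > 3) = {upper_tail(3):.5f}")   # ~0.00135
```

Remember these fractions assume a roughly normal distribution; on heavily skewed M365 data the actual tail above z = 3 can be larger (see the MAD discussion at the end of this subsection).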

Per-entity deviation — comparing to self, not the population

Population-level statistics identify outliers compared to other users. Per-entity deviation identifies outliers compared to the user’s own history — which is more powerful for hunting because it catches changes in individual behavior regardless of what other users are doing.

// Per-user sign-in volume: this week vs their 30-day average
let baseline = SigninLogs
| where TimeGenerated between (ago(37d) .. ago(7d))
| where ResultType == 0
| summarize BaselineDaily = count() / 30.0 by UserPrincipalName;
let current = SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == 0
| summarize CurrentDaily = count() / 7.0 by UserPrincipalName;
current
| join kind=inner baseline on UserPrincipalName
| extend DeviationRatio = CurrentDaily / BaselineDaily
| where DeviationRatio > 3.0
// Users whose current daily sign-in rate is 3x their 30-day average
// A user who normally signs in 10 times/day now signing in 30 times/day
//   is suspicious regardless of whether 30/day is normal for other users
| project UserPrincipalName, CurrentDaily, BaselineDaily, DeviationRatio
| sort by DeviationRatio desc
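The join-and-ratio logic above reduces to a few lines of arithmetic. This Python sketch (hypothetical users and rates, not real data) shows the mechanics, including why an inner join silently drops users with no baseline:

```python
# Illustrative sketch (Python, not KQL) of the self-baseline ratio.
# Daily sign-in rates below are hypothetical.
baseline_daily = {"alice": 10.0, "bob": 4.0, "carol": 2.0}
current_daily = {"alice": 31.0, "bob": 4.5, "dave": 50.0}

# Intersection of keys mirrors `join kind=inner`: only users present in
# both windows are compared. "dave" has no baseline and drops out --
# the new-user edge case noted in TH1.10.
deviations = {
    user: current_daily[user] / baseline_daily[user]
    for user in current_daily.keys() & baseline_daily.keys()
}

# Flag users running at more than 3x their own historical rate.
flagged = {user: ratio for user, ratio in deviations.items() if ratio > 3.0}
```

Alice (31/day against a baseline of 10/day) is flagged even though her absolute volume is unremarkable, while dave's 50/day produces no signal at all because there is nothing to compare it to. That silent drop is worth remembering: inner-join baselining is blind to brand-new accounts.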

This per-entity deviation pattern is the foundation for TH4 (authentication anomalies), TH8 (data exfiltration), and TH13 (insider threat). The campaigns apply it to specific technique domains. This subsection teaches the KQL mechanics.

TWO OUTLIER DETECTION APPROACHES

Population comparison: percentile(), stdev(), z-score. "Is this user unusual compared to all users?" Best for: discovering unknown anomalous entities. Weakness: high-volume roles produce false positives (admins, service accounts).

Self-baseline comparison: baseline average → current/baseline ratio. "Is this user unusual compared to themselves?" Best for: detecting behavior changes in known entities. Weakness: new users have no baseline (TH1.10 edge case).

Use population comparison for discovery. Use self-baseline for investigation. Campaign modules combine both: population outliers enriched with self-baseline deviation.

Figure TH2.1 — Two approaches to statistical outlier detection. Population comparison finds entities that are unusual relative to peers. Self-baseline comparison finds entities that are unusual relative to their own history. Hunting uses both.

Try it yourself

Exercise: Profile your environment's sign-in distribution

Run the percentile distribution query (second query in this subsection) against your SigninLogs. Record P50, P75, P90, P95, P99, and Max.

Then run the z-score query adapted for sign-in count rather than file downloads. How many users have z-scores above 3? Examine the top 3 — are they service accounts, administrators, or potentially compromised accounts? This is the population comparison approach in practice.

Then run the per-entity deviation query. How many users have a current/baseline ratio above 3? Are they the same users as the population outliers, or different? The overlap (or lack of it) tells you which approach is more informative for your environment.

⚠ Compliance Myth: "Statistical analysis is overkill for threat hunting — just sort by count and look at the top"

The myth: Sorting by volume and examining the top entries is sufficient. Statistical functions add complexity without value.

The reality: Sorting by volume finds the highest-volume entities. It does not find the entities whose behavior changed the most. A user who normally downloads 5 files and this week downloaded 50 (10x increase) is more suspicious than a user who normally downloads 400 and this week downloaded 500 (1.25x increase). Sorting by count ranks the 500 user higher. Statistical deviation ranks the 50 user higher. In hunting, the behavior change is the signal — not the absolute volume. Statistics measure the change. Sorting does not.

Extend this approach

The statistical patterns in this subsection use simple mean and standard deviation — which assume a roughly normal distribution. M365 behavioral data is often heavily skewed (many users with low activity, few with very high activity). For skewed distributions, median absolute deviation (MAD) is a more robust outlier metric than standard deviation. KQL supports this through `percentile(x, 50)` for the median and manual MAD calculation. TH2.3 (series_decompose_anomalies) uses more sophisticated statistical methods that handle non-normal distributions automatically.
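The MAD-based alternative mentioned above is commonly expressed as the Iglewicz-Hoaglin "modified z-score", 0.6745 × (x − median) / MAD, with 3.5 as a typical cutoff. A Python sketch on hypothetical skewed download counts shows why it resists the distortion that an extreme value inflicts on the mean and standard deviation:

```python
# Sketch (Python, not KQL) of a MAD-based modified z-score for skewed
# data. Download counts below are hypothetical.
import statistics

def modified_z(value: float, data: list) -> float:
    """Iglewicz-Hoaglin modified z-score: 0.6745 * (x - median) / MAD."""
    med = statistics.median(data)
    # MAD = median of absolute deviations from the median.
    mad = statistics.median(abs(x - med) for x in data)
    return 0.6745 * (value - med) / mad

# Heavily skewed: nine low-activity users and one extreme outlier.
downloads = [3, 4, 4, 5, 5, 6, 6, 7, 8, 500]

print(modified_z(500, downloads))  # far above the usual 3.5 cutoff
print(modified_z(7, downloads))    # unremarkable, well below 3.5
```

Because the median and MAD ignore the magnitude of the 500-download user, the ordinary users stay near zero while the outlier scores in the hundreds; a mean/stdev z-score on the same data would be dragged upward by the outlier itself, muting its own signal.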

