TH2.1 Percentile and Statistical Deviation for Outlier Detection
Why statistics, not sorting
The instinct is to sort results by count and look at the top. “Which user downloaded the most files from SharePoint this week?” Sort by download count descending, examine the top 10. The problem: the top 10 might include the engineering team doing a legitimate data migration. Or the CEO’s assistant preparing for a board meeting. Or the IT admin running a backup script. High volume is not suspicious — it is common.
What is suspicious is volume that deviates from the entity’s own baseline or from the population’s distribution. A user who downloads 500 files in a week when their normal is 5 is more suspicious than a user who downloads 500 files when their normal is 400. The absolute number is the same. The deviation is different. Statistics measure the deviation.
percentile() — where does this entity sit?
percentile() returns the value at a given percentile rank in a distribution. In hunting, it answers: “is this entity in the tail of the distribution?”
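Here is a minimal sketch of the single-percentile check. It assumes SharePoint audit events are stored in the Microsoft Sentinel `OfficeActivity` table with `Operation == "FileDownloaded"`. The table, the 7-day window, and the 95th-percentile cut-off are assumptions to adjust for your environment; this is not the course's original query. The query counts downloads per user, computes P95 across all users, and keeps only the users above it.

```kql
// Assumption: SharePoint download events in OfficeActivity (Microsoft Sentinel); 7-day window.
let lookback = 7d;
// Downloads per user over the lookback window
let perUser = OfficeActivity
    | where TimeGenerated > ago(lookback)
    | where Operation == "FileDownloaded"
    | summarize Downloads = count() by UserId;
// toscalar() turns the one-row percentile result into a value we can filter on
let p95 = toscalar(perUser | summarize percentile(Downloads, 95));
perUser
| where Downloads > p95
| extend P95 = p95
| sort by Downloads desc
```

The query references `perUser` twice. On large workspaces, wrapping its definition in `materialize()` avoids scanning the table twice.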
Rather than querying one percentile at a time, a single summarize can return several percentiles at once. This shows the shape of the whole distribution instead of a single cut-off point.
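A sketch of the multi-percentile version follows. It assumes the same `OfficeActivity` download events and a 7-day window. `percentiles()` computes several percentiles in one aggregation, and its output columns are named after the column and the percentile, for example `percentile_Downloads_95`.

```kql
// Assumption: SharePoint download events in OfficeActivity (Microsoft Sentinel); 7-day window.
let lookback = 7d;
OfficeActivity
| where TimeGenerated > ago(lookback)
| where Operation == "FileDownloaded"
| summarize Downloads = count() by UserId
// One pass over the per-user counts returns the shape of the whole distribution
| summarize Users = count(),
            percentiles(Downloads, 50, 75, 90, 95, 99),
            Max = max(Downloads)
```

A large gap between P99 and Max suggests that a handful of users dominate the tail. Those users are the first candidates to inspect.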
stdev() and z-scores — how far from normal?
Standard deviation measures the spread of a distribution. A z-score measures how many standard deviations a specific value lies from the mean: z = (value − mean) / standard deviation. In hunting, z-scores answer: “how unusual is this entity’s behavior compared to the population?”
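Here is a minimal population z-score sketch, again assuming download events in `OfficeActivity` and a 7-day window. It computes the mean and standard deviation of per-user download counts once, as scalars, and then scores every user against them. The `iff()` guard avoids dividing by zero when every user has the same count.

```kql
// Assumption: SharePoint download events in OfficeActivity (Microsoft Sentinel); 7-day window.
let lookback = 7d;
let perUser = OfficeActivity
    | where TimeGenerated > ago(lookback)
    | where Operation == "FileDownloaded"
    | summarize Downloads = count() by UserId;
// Population mean and standard deviation across all users
let popMean = toscalar(perUser | summarize avg(Downloads));
let popStdev = toscalar(perUser | summarize stdev(Downloads));
perUser
// z = (value - mean) / standard deviation
| extend ZScore = iff(popStdev > 0, round((Downloads - popMean) / popStdev, 2), real(null))
| where ZScore > 3
| sort by ZScore desc
```

`stdev()` is the sample standard deviation and `stdevp()` is the population version. With hundreds of users, the difference between them is negligible for hunting.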
The z-score threshold determines sensitivity. Z > 3 is conservative (fewer results, higher confidence each is unusual). Z > 2 is broader (more results, more false positives). For initial hunting campaigns, start with Z > 3 and reduce the threshold if the result set is too small.
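To choose between Z > 2 and Z > 3 before committing to one, you can count how many users each threshold would return. The sketch below makes the same `OfficeActivity` assumptions and uses `countif()` to evaluate both thresholds in a single pass.

```kql
// Assumption: SharePoint download events in OfficeActivity (Microsoft Sentinel); 7-day window.
let lookback = 7d;
let perUser = OfficeActivity
    | where TimeGenerated > ago(lookback)
    | where Operation == "FileDownloaded"
    | summarize Downloads = count() by UserId;
let popMean = toscalar(perUser | summarize avg(Downloads));
let popStdev = toscalar(perUser | summarize stdev(Downloads));
perUser
| extend ZScore = iff(popStdev > 0, (Downloads - popMean) / popStdev, real(null))
// Size of the result set at each candidate threshold
| summarize Users = count(),
            AboveZ2 = countif(ZScore > 2),
            AboveZ3 = countif(ZScore > 3)
```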
Per-entity deviation — comparing to self, not the population
Population-level statistics identify outliers compared to other users. Per-entity deviation identifies outliers compared to the user’s own history — which is more powerful for hunting because it catches changes in individual behavior regardless of what other users are doing.
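Here is a minimal sketch of the self-baseline comparison, assuming the same `OfficeActivity` download events. The 30-day baseline immediately before a 7-day current window and the ratio threshold of 3 are illustrative choices, not values from the course. Both windows are normalized to a daily rate so that windows of different lengths compare fairly.

```kql
// Assumptions: SharePoint download events in OfficeActivity;
// 30-day baseline ending where the 7-day current window starts.
let currentWindow = 7d;
let baselineWindow = 30d;
let downloads = OfficeActivity
    | where Operation == "FileDownloaded";
// Each user's own historical daily rate
let baselineRates = downloads
    | where TimeGenerated >= ago(currentWindow + baselineWindow)
        and TimeGenerated < ago(currentWindow)
    | summarize BaselineCount = count() by UserId
    | extend BaselineDaily = BaselineCount / (baselineWindow / 1d);
// Each user's daily rate in the current window
let currentRates = downloads
    | where TimeGenerated >= ago(currentWindow)
    | summarize CurrentCount = count() by UserId
    | extend CurrentDaily = CurrentCount / (currentWindow / 1d);
currentRates
| join kind=inner baselineRates on UserId
| extend Ratio = round(CurrentDaily / BaselineDaily, 1)
| where Ratio > 3
| project UserId, BaselineDaily = round(BaselineDaily, 2), CurrentDaily = round(CurrentDaily, 2), Ratio
| sort by Ratio desc
```

The inner join drops users with no downloads in the baseline window. First-time downloaders have no self-baseline, so review them separately, for example against the population percentiles.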
This per-entity deviation pattern is the foundation for TH4 (authentication anomalies), TH8 (data exfiltration), and TH13 (insider threat). The campaigns apply it to specific technique domains. This subsection teaches the KQL mechanics.
Figure TH2.1 — Two approaches to statistical outlier detection. Population comparison finds entities that are unusual relative to peers. Self-baseline comparison finds entities that are unusual relative to their own history. Hunting uses both.
Try it yourself
Exercise: Profile your environment's sign-in distribution
Adapt the multi-percentile distribution query from this subsection to count sign-ins per user in SigninLogs, then run it and record P50, P75, P90, P95, P99, and Max.
Then run the z-score query adapted for sign-in count rather than file downloads. How many users have z-scores above 3? Examine the top 3 — are they service accounts, administrators, or potentially compromised accounts? This is the population comparison approach in practice.
Then run the per-entity deviation query. How many users have a current/baseline ratio above 3? Are they the same users as the population outliers, or different? The overlap (or lack of it) tells you which approach is more informative for your environment.
The myth: Sorting by volume and examining the top entries is sufficient. Statistical functions add complexity without value.
The reality: Sorting by volume finds the highest-volume entities. It does not find the entities whose behavior changed the most. A user who normally downloads 5 files and this week downloaded 50 (10x increase) is more suspicious than a user who normally downloads 400 and this week downloaded 500 (1.25x increase). Sorting by count ranks the 500 user higher. Statistical deviation ranks the 50 user higher. In hunting, the behavior change is the signal — not the absolute volume. Statistics measure the change. Sorting does not.
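You can reproduce this comparison with a self-contained `datatable` that needs no log data. The two users and their weekly counts are the hypothetical figures from the example above.

```kql
// Hypothetical users from the example: normal week vs this week.
datatable(UserId: string, BaselineWeekly: long, CurrentWeekly: long)
[
    "user_a", 5, 50,
    "user_b", 400, 500
]
// Change relative to each user's own normal
| extend Ratio = round(todouble(CurrentWeekly) / BaselineWeekly, 2)
| sort by Ratio desc
```

Sorted by `Ratio`, `user_a` (10x) ranks first and `user_b` (1.25x) second. Sorted by `CurrentWeekly`, the order reverses.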
Extend this approach
The statistical patterns in this subsection use the mean and standard deviation. Z-score thresholds such as Z > 3 are only well calibrated when the distribution is roughly normal. M365 behavioral data is often heavily skewed (many users with low activity, a few with very high activity), and those few high-volume users inflate both the mean and the standard deviation, which can hide genuine outliers. For skewed distributions, median absolute deviation (MAD) is a more robust outlier metric than standard deviation. In KQL you can calculate it manually: take the median with `percentile(x, 50)`, then take the median of each value's absolute deviation from that median. TH2.3 (series_decompose_anomalies) uses more sophisticated statistical methods that cope better with non-normal distributions.
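Here is a sketch of the manual MAD calculation, under the same `OfficeActivity` assumptions. It scores users with the modified z-score, 0.6745 × (value − median) / MAD. The 0.6745 constant makes the score comparable to a standard z-score for normal data. The 3.5 cut-off is a common convention (Iglewicz and Hoaglin), not a value from this course.

```kql
// Assumption: SharePoint download events in OfficeActivity (Microsoft Sentinel); 7-day window.
let lookback = 7d;
let perUser = OfficeActivity
    | where TimeGenerated > ago(lookback)
    | where Operation == "FileDownloaded"
    | summarize Downloads = count() by UserId;
// Median of per-user counts
let medianDownloads = toscalar(perUser | summarize percentile(Downloads, 50));
// MAD: median of the absolute deviations from that median
let medianAbsDev = toscalar(perUser
    | extend AbsDev = abs(Downloads - medianDownloads)
    | summarize percentile(AbsDev, 50));
perUser
// Modified z-score; the iff() guard handles MAD = 0
| extend RobustZ = iff(medianAbsDev > 0, round(0.6745 * (Downloads - medianDownloads) / medianAbsDev, 2), real(null))
| where RobustZ > 3.5
| sort by RobustZ desc
```

MAD is 0 when more than half of the users share the same count, and the guard then returns no scores. In that case, widen the lookback window or fall back to the percentile approach.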
References Used in This Subsection
- Microsoft. “KQL percentile() Aggregation Function.” Microsoft Learn. https://learn.microsoft.com/en-us/kusto/query/percentile-aggfunction
- Microsoft. “KQL stdev() Aggregation Function.” Microsoft Learn. https://learn.microsoft.com/en-us/kusto/query/stdev-aggfunction
- Course cross-references: TH1.10 (behavioral baselining), TH4 (authentication anomalies), TH8 (data exfiltration), TH13 (insider threat)