TH2.9 materialize() for Query Performance

4-5 hours · Module 2 · Free
Operational Objective
Hunt queries that reference the same subquery multiple times — once for the baseline, once for the current window, once for the count — execute that subquery each time, multiplying the computational cost. Advanced Hunting timeouts are the result. materialize() forces a subquery to execute once and caches the result, eliminating redundant computation and keeping complex hunts within execution limits.
Deliverable: The ability to use materialize() to optimize multi-reference hunt queries and avoid Advanced Hunting timeout failures.
⏱ Estimated completion: 15 minutes

The timeout problem

You write a hunt query that computes a per-user baseline, then joins the baseline against current activity, then counts how many users deviate. The query references the baseline subquery three times. Each reference re-executes the entire baseline computation — three full scans of 30 days of SigninLogs. The query times out.

materialize() solves this by executing the subquery once and caching the result in memory. Subsequent references use the cached result.

1
2
3
4
5
6
7
8
9
// Without materialize  baseline computed 3 times (SLOW)
let baseline = SigninLogs
| where TimeGenerated between (ago(37d) .. ago(7d))
| where ResultType == 0
| summarize AvgDaily = count() / 30.0 by UserPrincipalName;
// Reference 1: join
// Reference 2: count
// Reference 3: percentile
// Each reference re-executes the full 30-day scan
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
// With materialize  baseline computed once (FAST)
let baseline = materialize(
    SigninLogs
    | where TimeGenerated between (ago(37d) .. ago(7d))
    | where ResultType == 0
    | summarize AvgDaily = count() / 30.0 by UserPrincipalName);
// All subsequent references to 'baseline' use the cached result
let currentWeek = SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == 0
| summarize CurrentDaily = count() / 7.0 by UserPrincipalName;
// Join uses cached baseline (fast)
currentWeek
| join kind=inner baseline on UserPrincipalName
| extend Ratio = CurrentDaily / AvgDaily
| where Ratio > 3.0
| project UserPrincipalName, CurrentDaily, AvgDaily, Ratio

When to use materialize()

Use it when a let statement is referenced more than once in the query. If the let is referenced only once, materialize() adds no benefit (and a small overhead for caching).

Common hunting patterns that benefit:

  • Baseline comparisons (baseline referenced in join AND in aggregate)
  • Multi-step enrichment (suspect list referenced across multiple table joins)
  • Score calculations (population statistics referenced for each user’s z-score)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
// Materialized suspect list used across multiple enrichment joins
let suspects = materialize(
    SigninLogs
    | where TimeGenerated > ago(7d)
    | where ResultType == 0
    | where RiskLevelDuringSignIn in ("medium", "high")
    | distinct UserPrincipalName);
// Join 1: what AuditLog activity did suspects perform?
let auditActivity = AuditLogs
| where TimeGenerated > ago(7d)
| where InitiatedBy.user.userPrincipalName in (suspects)
| summarize AuditOps = count() by
    tostring(InitiatedBy.user.userPrincipalName);
// Join 2: what CloudAppEvents activity did suspects perform?
let cloudActivity = CloudAppEvents
| where TimeGenerated > ago(7d)
| where AccountId in (suspects)
| summarize CloudOps = count() by AccountId;
// Combine  suspects list evaluated once, used in both joins
suspects
| join kind=leftouter auditActivity
    on $left.UserPrincipalName ==
    $right.InitiatedBy_user_userPrincipalName
| join kind=leftouter cloudActivity
    on $left.UserPrincipalName == $right.AccountId
MATERIALIZE — EXECUTE ONCE, REFERENCE MANYWITHOUT MATERIALIZE3 references = 3 × full 30-day scan = TIMEOUTWITH MATERIALIZE1 scan → cached → 3 references from cache = FASTmaterialize() is the difference between a query that completes and one that times out.

Figure TH2.9 — materialize() eliminates redundant computation. One scan, multiple uses.

Try it yourself

Exercise: Add materialize to your baseline query

Take the per-entity deviation query from TH2.1. Wrap the baseline `let` statement in materialize(). Run both versions and compare execution times (visible in the query statistics panel in Advanced Hunting). For small datasets the difference may be negligible. For 30-day baselines across thousands of users, the difference is significant.

⚠ Compliance Myth: "Query optimization is a nice-to-have — focus on getting the right answer first"

The myth: Performance tuning is secondary. Write the correct query first, optimize later if needed.

The reality: In Advanced Hunting, a correct query that times out produces no results — which is worse than an approximate query that completes. A 10-minute timeout is a hard constraint. materialize() is not premature optimization — it is the pattern that makes complex hunting queries viable. Learn it before you need it, because the query that times out during a hunt cannot be fixed retrospectively — it wastes the hunt session.

Extend this pattern

materialize() caches data in memory for the duration of the query execution. For very large datasets (millions of rows), the cache may consume significant memory. If the materialized result itself is large, consider pre-filtering more aggressively before materializing. The goal is to materialize the smallest result set that is referenced multiple times — not to materialize entire tables.


References Used in This Subsection

You're reading the free modules of this course

The full course continues with advanced topics, production detection rules, worked investigation scenarios, and deployable artifacts. Premium subscribers get access to all courses.

View Pricing See Full Syllabus