TH2.7 arg_max and arg_min for Entity Investigation

4-5 hours · Module 2 · Free
Operational Objective
When hunting for a specific entity — a suspect user, a suspicious IP, a potentially compromised device — you often need the most recent event, the earliest event, or the event with the highest risk score for that entity. arg_max and arg_min retrieve the full row corresponding to the maximum or minimum value of a specified column, grouped by entity. This is faster and cleaner than the sort-then-take pattern most analysts use.
Deliverable: The ability to use arg_max and arg_min to efficiently extract key events per entity during hunt investigation and enrichment steps.
⏱ Estimated completion: 15 minutes

The right event, not all events

During the enrichment step (TH1.4), you often need one specific event per entity: the most recent sign-in for a suspect user, the first sign-in from a new IP, the highest-risk event for a flagged account. The naive approach — | sort by TimeGenerated desc | take 1 — works for a single entity but fails when you need it per-entity across a population.

arg_max(Column, *) inside a summarize returns the full row where Column has its maximum value, for each group. arg_min returns the minimum.

1
2
3
4
5
6
7
8
// Most recent sign-in per user  one row per user, full details
SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == 0
| summarize arg_max(TimeGenerated, *) by UserPrincipalName
// Returns the most recent sign-in for every user
// with ALL columns from that row preserved
// Use case: "when did each suspect user last authenticate?"
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
// First-ever sign-in from each new IP per user
// Hunting application: when did the anomalous IP first appear?
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| summarize arg_min(TimeGenerated, *)
    by UserPrincipalName, IPAddress
// First sign-in for each user-IP combination
// Cross-reference with the anomalous IP list from your hunt:
//   if the first appearance was recent, the IP is new
//   if the first appearance was 25 days ago, it is established
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
// Highest-risk sign-in per user in the last 7 days
SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == 0
| where RiskLevelDuringSignIn != "none"
| extend RiskScore = case(
    RiskLevelDuringSignIn == "high", 3,
    RiskLevelDuringSignIn == "medium", 2,
    RiskLevelDuringSignIn == "low", 1, 0)
| summarize arg_max(RiskScore, *) by UserPrincipalName
// One row per user: their highest-risk sign-in event
// with full details (IP, location, app, device, MFA status)
// Enrichment query: "what is the worst thing this user did?"
ARG_MAX / ARG_MIN — ONE KEY EVENT PER ENTITYarg_max(Time, *)Most recent event per entity"When was the last sign-in?"arg_min(Time, *)Earliest event per entity"When did this IP first appear?"arg_max(Risk, *)Highest-value event per entity"What was the riskiest event?"

Figure TH2.7 — Three arg_max/arg_min patterns. Each returns one row per entity with the full event details preserved.

Try it yourself

Exercise: Find first-seen IPs for your suspect users

Take the suspect users from the TH2.1 or TH2.3 exercises. For each, use arg_min to find their first sign-in from each IP in the last 30 days. Are the anomalous IPs recently first-seen, or have they been used before?

⚠ Compliance Myth: "sort | take 1 is equivalent to arg_max — use whichever is simpler"

The myth: sort by TimeGenerated desc | take 1 achieves the same result as arg_max and is easier to read.

The reality: sort + take works for a single entity but does not scale to per-entity extraction. summarize arg_max(TimeGenerated, *) by UserPrincipalName returns the most recent event for every user in a single pass. Achieving the same with sort + take requires a loop or serialization that KQL does not natively support. arg_max is not just a syntax preference — it enables a query pattern that sort + take cannot replicate efficiently.

Extend this pattern

arg_max and arg_min accept multiple output columns instead of `*`: `arg_max(TimeGenerated, IPAddress, AppDisplayName)`. This returns only the specified columns from the maximum row, reducing result size when you do not need all columns. Use this in performance-sensitive queries where the full row is unnecessary.


References Used in This Subsection

You're reading the free modules of this course

The full course continues with advanced topics, production detection rules, worked investigation scenarios, and deployable artifacts. Premium subscribers get access to all courses.

View Pricing See Full Syllabus