6.2 Analyze Query Results Using KQL
Domain 4 — Manage Security Threats: "Create custom hunting queries by using KQL." Summarization and visualization are tested directly — expect questions where you must choose the correct aggregation function for a scenario.
Introduction
In subsection 6.1, you learned to filter and shape data — narrowing millions of rows to the ones relevant to your investigation. But individual rows are only part of the picture. To understand an attack, you need to ask aggregate questions: “How many failed sign-ins came from each country?” “Which IP addresses generated the most traffic?” “What does the sign-in pattern look like over time?”
This subsection teaches you the summarize operator and its aggregation functions — the tools that transform raw log rows into answers. You will learn to count events, count unique values, group results by dimensions, create time-series data, and render visual charts. By the end, you will write a single query that answers “how many failed sign-ins per country per hour for the last 7 days” and displays the result as a time chart.
The operators covered in this subsection are:
- summarize — the aggregation engine that groups rows and computes values
- Aggregation functions — count(), dcount(), sum(), avg(), min(), max(), make_set(), make_list(), arg_max(), percentile()
- bin() — grouping timestamps into time windows for trend analysis
- render — generating visual charts from query results
- top — a shortcut for getting the N highest or lowest results
Each function is introduced with what it does and why you need it, then demonstrated with an investigation scenario and practiced in your lab.
The summarize operator — aggregating data for analysis
What it is
summarize groups rows by one or more columns and computes aggregate values for each group. It transforms a table of individual events into a table of summaries — replacing thousands of rows with a concise answer.
Why it matters
During an investigation, individual events tell you what happened. Summaries tell you the scope and pattern. “There are 47 failed sign-ins” is more actionable than scrolling through 47 individual rows. “The failures come from 3 countries, with 41 from Russia and 6 from the Netherlands” is even more actionable. summarize produces these answers.
The summarize operator groups individual rows by a key column (IPAddress) and computes aggregations (count, dcount) for each group: 47 individual failure events become 3 rows of actionable intelligence.
How it works
The basic syntax is: | summarize AggregationFunction by GroupingColumn
Start with a simple count:
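A query matching this description might look like the following sketch, assuming the Azure AD SigninLogs table, where a ResultType of "0" indicates a successful sign-in:

```kql
SigninLogs
| where ResultType != "0"                          // keep only failed sign-ins
| summarize FailedSignins = count() by IPAddress   // count failures per source IP
| order by FailedSignins desc                      // highest count first
```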
This says: “Group all failed sign-ins by IP address and count how many came from each IP. Sort highest count first.”
| IPAddress | FailedSignins |
|---|---|
| 198.51.100.44 | 47 |
| 203.0.113.88 | 12 |
| 86.12.45.89 | 3 |
Multiple aggregation functions in one query
You can compute multiple aggregations simultaneously:
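A sketch of a query that produces the columns below, again assuming the SigninLogs table:

```kql
SigninLogs
| where ResultType != "0"
| summarize
    FailCount     = count(),                    // total failures from this IP
    TargetedUsers = dcount(UserPrincipalName),  // unique accounts attacked
    FirstSeen     = min(TimeGenerated),         // start of the campaign
    LastSeen      = max(TimeGenerated),         // most recent attempt
    TargetApps    = make_set(AppDisplayName)    // deduplicated list of apps
    by IPAddress
```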
| IPAddress | FailCount | TargetedUsers | FirstSeen | LastSeen | TargetApps |
|---|---|---|---|---|---|
| 198.51.100.44 | 47 | 5 | Mar 15 08:12 | Mar 21 13:41 | ["Microsoft Office","Azure Portal","Exchange Online"] |
| 203.0.113.88 | 12 | 1 | Mar 18 22:01 | Mar 18 22:14 | ["Microsoft Office"] |
dcount() (distinct count) tells you how many unique users were targeted. make_set() creates a deduplicated list of targeted applications. min()/max() give you the campaign timespan. Note the use of count() and dcount() together.
The aggregation functions reference
These are the functions you will use inside summarize. Each has a specific investigation purpose.
| Function | What it computes | Investigation use | Example |
|---|---|---|---|
count() | Total number of rows in the group | “How many events?” | count() |
dcount(Column) | Number of unique values | “How many distinct users/IPs/apps?” | dcount(UserPrincipalName) |
sum(Column) | Total of a numeric column | “Total data volume transferred” | sum(SentBytes) |
avg(Column) | Average of a numeric column | “Average session duration” | avg(DurationMs) |
min(Column) | Smallest value | “When did this first occur?” | min(TimeGenerated) |
max(Column) | Largest value | “When was the most recent event?” | max(TimeGenerated) |
make_set(Column) | Deduplicated JSON array of values | “Which unique apps were accessed?” | make_set(AppDisplayName) |
make_list(Column) | JSON array including duplicates | “List all error codes in order” | make_list(ResultType) |
arg_max(TimeCol, *) | The row with the maximum value | “Most recent record per device” | arg_max(TimeGenerated, *) |
percentile(Column, N) | Nth percentile value | “What is the P95 response time?” | percentile(DurationMs, 95) |
count() vs dcount() — know the difference, it changes the investigation
count() tells you how many events occurred. dcount() tells you how many unique entities were involved. 47 failed sign-ins (count) from 1 user (dcount) = brute force against one person. 47 failed sign-ins from 47 users = password spray against the organization. Same count, completely different attack type. Always use both together.
The bin() function — grouping by time windows
What it is
bin() rounds timestamp values to a specified interval, grouping events into time windows. When combined with summarize, it creates time-series data — the foundation of trend analysis and visualization.
Why it matters
“47 failed sign-ins” is useful. “47 failed sign-ins, with 40 occurring in a single 5-minute window” is much more useful — it tells you the attack was concentrated, not spread over days. bin() reveals the temporal pattern.
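A sketch of the query described here, assuming the same SigninLogs table:

```kql
SigninLogs
| where ResultType != "0"
| summarize FailCount = count() by bin(TimeGenerated, 1h)  // 1-hour windows
| order by TimeGenerated asc
```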
This groups all failures into 1-hour windows and counts how many occurred in each window.
| TimeGenerated | FailCount |
|---|---|
| Mar 15 08:00 | 2 |
| Mar 15 09:00 | 0 |
| Mar 18 22:00 | 34 |
| Mar 18 23:00 | 8 |
| Mar 21 13:00 | 5 |
Use bin(TimeGenerated, 5m) for minute-by-minute analysis during an active incident, and bin(TimeGenerated, 1d) for weekly trends.
The render operator — visualizing results
What it is
render generates a visual chart from query results. It transforms tabular data into bar charts, time series, pie charts, and other visualizations directly in the Sentinel Logs query editor.
Why it matters
Patterns that are invisible in a table of numbers become obvious in a chart. A time series showing a spike is instantly readable. A bar chart showing one country dominating failure counts tells a story that 20 rows of numbers do not.
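A query matching this description, assuming SigninLogs, would combine the earlier bin() aggregation with render:

```kql
SigninLogs
| where ResultType != "0"
| summarize FailCount = count() by bin(TimeGenerated, 1h)
| render timechart     // line chart: time on x-axis, failure count on y-axis
```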
This produces a line chart with time on the x-axis and failure count on the y-axis. The attack spike becomes an unmistakable visual peak.
Visualization types:
| Render type | Best for | Example |
|---|---|---|
timechart | Trends over time | Sign-in failures per hour |
barchart | Comparing categories | Failures per country |
columnchart | Comparing categories (vertical) | Events per application |
piechart | Proportion of a whole | Success vs failure ratio |
scatterchart | Correlation between two values | Event count vs response time |
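As one illustration of a row from the table, a pie chart of the success-versus-failure ratio could be built like this (the Outcome label is an invented helper column, not part of the SigninLogs schema):

```kql
SigninLogs
| summarize Signins = count()
    by Outcome = iff(ResultType == "0", "Success", "Failure")  // bucket each row
| render piechart     // proportion of successes vs failures
```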
Try it yourself
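A query fitting this exercise, counting sign-ins per application (a sketch, assuming SigninLogs):

```kql
SigninLogs
| summarize SigninCount = count() by AppDisplayName  // events per application
| order by SigninCount desc
| render columnchart                                 // one column per application
```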
In your lab, you may see only 2-3 applications (Microsoft Office, Azure Portal, perhaps Exchange Online). In production, this query reveals the application usage profile of your organization — critical baseline knowledge for identifying when an attacker accesses an unusual application.
The top operator — a shortcut for ranked results
top N by Column returns the N rows with the highest (or lowest) value in the specified column. It is a shortcut for | order by Column desc | take N.
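A sketch matching this description, assuming SigninLogs:

```kql
SigninLogs
| where ResultType != "0"
| summarize FailCount = count() by IPAddress
| top 5 by FailCount    // the 5 noisiest source IPs
```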
Returns the 5 IPs with the most failures. Quick, clean, and used frequently in investigation for rapid prioritization.
The arg_max pattern — most recent record per entity
What it is
arg_max(TimeGenerated, *) within a summarize ... by EntityId returns the entire most recent row for each entity. This is the deduplication pattern — essential when a table contains multiple records per device, user, or IP and you want only the latest.
Why it matters
DeviceInfo contains a row for every device check-in (multiple per day). If you want the current status of each device, you need the most recent record, not all of them.
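The pattern as a query:

```kql
DeviceInfo
| summarize arg_max(TimeGenerated, *) by DeviceId   // latest row per device, all columns
```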
This returns exactly one row per device — the most recent — with all columns from that row. You used this pattern in Module 7 (if following the non-standard build order) and will use it in every device investigation.
The arg_max pattern appears in almost every device and user query
Any table that records periodic status updates (DeviceInfo, AADNonInteractiveUserSignInLogs) needs deduplication with arg_max to get the current state. Without it, you see every historical record, making counts and status reports inaccurate.
Investigation scenario: building a failed sign-in dashboard query
Combine everything from this subsection into a single comprehensive query:
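A reconstruction consistent with the output columns below, assuming SigninLogs (using Location as the country field is an assumption; check your workspace's schema):

```kql
SigninLogs
| where TimeGenerated > ago(7d)       // last 7 days
| where ResultType != "0"             // failures only
| summarize
    FailCount    = count(),
    UniqueUsers  = dcount(UserPrincipalName),
    UniqueApps   = dcount(AppDisplayName),
    FirstAttempt = min(TimeGenerated),
    LastAttempt  = max(TimeGenerated),
    Countries    = make_set(Location)
    by IPAddress
| order by FailCount desc
```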
| IPAddress | FailCount | UniqueUsers | UniqueApps | FirstAttempt | LastAttempt | Countries |
|---|---|---|---|---|---|---|
| 198.51.100.44 | 47 | 5 | 3 | Mar 15 08:12 | Mar 21 13:41 | ["RU"] |
| 203.0.113.88 | 12 | 1 | 1 | Mar 18 22:01 | Mar 18 22:14 | ["NL"] |
Check your understanding
1. You need to determine how many unique users were targeted by a specific IP address. Which aggregation function answers this?
dcount() (distinct count) is the investigation function for answering "how many unique entities." Use it alongside count() — the combination reveals attack type. Same count + low dcount = focused attack. Same count + high dcount = spray attack.
2. Your time-series chart shows a flat line all week with a single massive spike on Wednesday at 22:00. What does this pattern indicate?
A concentrated burst attack — the activity occurred in a single narrow window rather than being spread over time, which points to an automated attack. Narrow the interval (for example, bin(TimeGenerated, 5m)) to see the attack's exact start and end time, then pivot to the source IP analysis.
3. DeviceInfo contains multiple rows per device per day. You want a table showing the current status of each device (one row per device). Which pattern do you use?
The arg_max deduplication pattern is essential for any table with periodic updates. It returns one row per group key (DeviceId) with the most recent timestamp, and * means "include all columns from that row." You will use this constantly.
4. You want to understand the temporal pattern of an attack. Which bin() interval would you choose for initial analysis, and when would you change it?