Module 2: KQL Fundamentals for Security Analysts
The KQL operators, functions, and patterns you'll use in every investigation. Written for security analysts, not data engineers.
What You'll Learn
- Write queries using where, project, summarize, extend, and join
- Use time-based filtering and datetime functions
- Parse and extract data from semi-structured fields
- Build reusable query patterns for sign-in analysis, email tracing, and alert triage
SC-200 Exam Objectives Covered
- Build KQL statements
- Analyze data by using KQL
Kusto Query Language is the query language behind Microsoft Sentinel, Defender XDR Advanced Hunting, and Log Analytics. Every detection rule, every hunting query, and every workbook you build in the Microsoft security stack runs on KQL. If you can write KQL, you can investigate. If you cannot, you are dependent on whoever wrote the queries you are running.
This module covers the operators and patterns that account for the vast majority of security work. It is not a complete language reference — Microsoft publishes that at learn.microsoft.com/en-us/kusto/query. What follows is the working subset that a SOC analyst reaches for daily, explained in the context of security tables and real investigation questions.
Where to practise: You can run every query in this module in the Log Analytics demo environment or in the Advanced Hunting blade within the Microsoft Defender portal if you have access to a tenant.
How KQL Queries Work
A KQL query is a pipeline. Data starts at a table, flows through operators separated by the pipe character |, and emerges as a result set. Each operator receives a tabular input, transforms it, and passes it to the next operator.
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType != 0
| summarize FailedAttempts = count() by UserPrincipalName
| where FailedAttempts > 10
| sort by FailedAttempts desc
Read that top to bottom: start with sign-in logs from the last 24 hours, keep only failures, count failures per user, keep users with more than 10 failures, sort by count descending. The pipe model means you build queries incrementally — run each line, inspect the output, add the next operator.
Three rules apply to everything that follows:
KQL is case-sensitive for table names, column names, operators, and functions.
SigninLogsworks.signinlogsdoes not. String comparisons default to case-insensitive for most operators, but the language itself is strict. (Source: KQL overview — Microsoft Learn)Filter early, project late. Every row and column you eliminate early reduces the data the engine processes downstream. Put your
whereclauses — especially time filters — before aggregations or joins. (Source: Query best practices — Microsoft Learn)Time-filter first. Kusto maintains a highly efficient index on datetime columns. Filtering on
TimeGeneratedfirst allows the engine to skip entire data partitions without reading them. Always start with a time constraint.
The Tables You Will Query
Before learning operators, you need to know where the data lives. Microsoft Sentinel and Defender XDR expose dozens of tables. These are the ones you will query most as a security analyst:
| Table | What It Contains | Source |
|---|---|---|
SigninLogs | Interactive Azure AD / Entra ID sign-in attempts | Entra ID |
AADNonInteractiveUserSignInLogs | Token refreshes, background app sign-ins | Entra ID |
SecurityAlert | Alerts from Sentinel analytics rules, Defender XDR, Defender for Cloud | Multiple |
SecurityIncident | Incidents created from correlated alerts | Sentinel |
SecurityEvent | Windows Security Event Log entries (logon events, process creation, etc.) | Windows agents |
EmailEvents | Email delivery and blocking events | Defender for Office 365 |
EmailUrlInfo | URLs within emails | Defender for Office 365 |
DeviceProcessEvents | Process execution on endpoints | Defender for Endpoint |
DeviceNetworkEvents | Network connections from endpoints | Defender for Endpoint |
AuditLogs | Entra ID configuration changes (user creation, group membership, app consent) | Entra ID |
OfficeActivity | SharePoint, OneDrive, Exchange, and Teams operations | Office 365 |
(Source: Microsoft Sentinel data source schema reference — Microsoft Learn)
Every table has a TimeGenerated column that records when the event was ingested. This is the column you use for time-based filtering.
Core Operators
where — Filter Rows
where keeps rows that match a condition. It is the operator you will use more than any other.
SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == 0
| where AppDisplayName == "Microsoft Teams"
This returns successful sign-ins to Microsoft Teams in the last seven days. Multiple where clauses act as logical AND — every condition must be true. You can combine conditions in a single where with and and or:
SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType != 0 and (AppDisplayName == "Exchange Online" or AppDisplayName == "SharePoint Online")
Comparison operators:
| Operator | Meaning |
|---|---|
== | Equals (case-sensitive for strings) |
!= | Not equals |
=~ | Equals (case-insensitive) |
!~ | Not equals (case-insensitive) |
>, <, >=, <= | Numeric and datetime comparisons |
in | Matches any value in a list (case-sensitive) |
in~ | Matches any value in a list (case-insensitive) |
!in | Does not match any value in a list |
between | Within a range (inclusive) |
The in operator is particularly useful for filtering on multiple known values:
SecurityEvent
| where TimeGenerated > ago(1d)
| where EventID in (4624, 4625, 4648)
That query returns successful logons (4624), failed logons (4625), and explicit credential logons (4648) in one pass.
String Operators — has vs contains
This is the single most important performance distinction in KQL for security analysts. Both has and contains search within strings, but they work differently and the performance gap is significant.
has performs a term-level match. Kusto breaks every string into terms — sequences of four or more alphanumeric characters bounded by non-alphanumeric characters or string boundaries. The engine maintains a term index, so has searches are fast index lookups.
contains performs a substring scan. It searches for the exact character sequence anywhere in the string, character by character. No index is used. On large tables, contains can be orders of magnitude slower.
(Source: String operators — Microsoft Learn)
// Fast — uses the term index
EmailEvents
| where TimeGenerated > ago(1d)
| where SenderFromAddress has "paypal"
// Slow — full substring scan
EmailEvents
| where TimeGenerated > ago(1d)
| where SenderFromAddress contains "paypal"
Both return similar results in most cases, but has runs faster because it checks the term index for “paypal” rather than scanning every character of every email address.
When you must use contains: When your search term is part of a larger word and not a standalone term. For example, searching for “admin” inside “sysadmin” requires contains because “admin” is a substring of the term “sysadmin”, not a separate term.
Practical rule: Default to has. Switch to contains only when you need partial-word matching and understand you are paying a performance cost.
The full family of string operators:
| Operator | Behaviour | Performance |
|---|---|---|
has | Term match (case-insensitive) | Fast (indexed) |
has_cs | Term match (case-sensitive) | Faster |
contains | Substring match (case-insensitive) | Slow (scan) |
contains_cs | Substring match (case-sensitive) | Slightly faster scan |
startswith | Prefix match (case-insensitive) | Moderate |
endswith | Suffix match (case-insensitive) | Moderate |
matches regex | Regular expression | Slowest — use sparingly |
has_any | Term match against a list | Fast |
has_all | All terms must match | Fast |
Case-sensitive variants (the _cs suffix) are faster than their case-insensitive counterparts because the engine can use stricter index matching. Use them when you know the exact casing — for example, when matching known IOC hashes.
project — Select and Rename Columns
project controls which columns appear in your output. It reduces the width of your result set, making it easier to read and cheaper to process.
SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType != 0
| project TimeGenerated, UserPrincipalName, IPAddress, Location, ResultType, ResultDescription
You can rename columns inline:
SigninLogs
| where TimeGenerated > ago(1d)
| project Timestamp = TimeGenerated, User = UserPrincipalName, IP = IPAddress, Result = ResultType
project-away removes specific columns and keeps everything else — useful when a table has 40 columns and you want to drop three:
SigninLogs
| where TimeGenerated > ago(1d)
| project-away OperationName, OperationVersion, Identity
extend — Add Calculated Columns
extend adds new columns without removing existing ones. Use it to compute derived values, parse nested fields, or add labels.
SigninLogs
| where TimeGenerated > ago(1d)
| extend HourOfDay = hourofday(TimeGenerated)
| extend Country = tostring(LocationDetails.countryOrRegion)
| extend IsFailure = iff(ResultType != 0, true, false)
The iff() function is KQL’s conditional expression: iff(condition, value_if_true, value_if_false).
extend is also where you unpack JSON. Many Sentinel columns store structured data as dynamic (JSON) objects. Use tostring(), toint(), or parse_json() to extract values:
SigninLogs
| where TimeGenerated > ago(1d)
| extend OS = tostring(DeviceDetail.operatingSystem)
| extend Browser = tostring(DeviceDetail.browser)
| extend City = tostring(LocationDetails.city)
summarize — Aggregate Data
summarize groups rows and computes aggregate values. It is the KQL equivalent of SQL’s GROUP BY combined with aggregate functions.
SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType != 0
| summarize
FailedAttempts = count(),
DistinctIPs = dcount(IPAddress),
FirstSeen = min(TimeGenerated),
LastSeen = max(TimeGenerated)
by UserPrincipalName
| where FailedAttempts > 20
| sort by FailedAttempts desc
This answers: “Which users had the most failed sign-ins over the past week, how many unique IPs did those failures come from, and when did they start and stop?”
Common aggregation functions:
| Function | Returns |
|---|---|
count() | Number of rows in the group |
dcount(column) | Approximate distinct count |
sum(column) | Sum of numeric values |
avg(column) | Average |
min(column) | Minimum value |
max(column) | Maximum value |
make_list(column) | JSON array of all values |
make_set(column) | JSON array of distinct values |
arg_max(column, *) | Row with the maximum value of that column |
arg_min(column, *) | Row with the minimum value of that column |
percentile(column, N) | Nth percentile |
countif(condition) | Count of rows matching a condition |
bin() for time bucketing: When you need to see trends over time, use bin() to group timestamps into intervals:
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType != 0
| summarize FailedCount = count() by bin(TimeGenerated, 1h), UserPrincipalName
| sort by TimeGenerated asc
This produces a row per user per hour, showing how failures are distributed across time — essential for spotting brute force patterns or spray attacks.
sort and take — Order and Limit Results
sort by (also written order by) orders your results. take (also written limit) caps the number of rows returned.
SecurityAlert
| where TimeGenerated > ago(7d)
| summarize AlertCount = count() by AlertName
| sort by AlertCount desc
| take 10
Top 10 most frequent alert types in the last week. Use top as a shorthand for sort + take:
SecurityAlert
| where TimeGenerated > ago(7d)
| summarize AlertCount = count() by AlertName
| top 10 by AlertCount desc
Time Functions
Security investigations are inherently time-based. KQL provides a rich set of datetime functions.
Relative Time with ago()
ago() returns a datetime relative to now. It accepts time literals:
| Literal | Meaning |
|---|---|
1m | 1 minute |
1h | 1 hour |
1d | 1 day |
7d | 7 days |
30d | 30 days |
365d | 365 days |
// Last 4 hours
SigninLogs | where TimeGenerated > ago(4h)
// Between 48 and 24 hours ago
SigninLogs | where TimeGenerated between(ago(48h) .. ago(24h))
Absolute Time with datetime()
When investigating a specific incident, pin your query to exact timestamps:
SigninLogs
| where TimeGenerated between(datetime(2026-03-15 08:00) .. datetime(2026-03-15 12:00))
| where UserPrincipalName == "j.smith@northgateeng.com"
Extracting Time Components
SigninLogs
| where TimeGenerated > ago(7d)
| extend Hour = hourofday(TimeGenerated)
| extend DayOfWeek = dayofweek(TimeGenerated)
| extend DayName = case(
DayOfWeek == 0d, "Sunday",
DayOfWeek == 1d, "Monday",
DayOfWeek == 2d, "Tuesday",
DayOfWeek == 3d, "Wednesday",
DayOfWeek == 4d, "Thursday",
DayOfWeek == 5d, "Friday",
DayOfWeek == 6d, "Saturday",
"Unknown")
dayofweek() returns a timespan where 0d is Sunday. The case() function is KQL’s multi-branch conditional — it evaluates pairs of condition/value from top to bottom and returns the first match.
Time Differences
Calculate the gap between two events — critical for detecting token replay, impossible travel, and session hijacking:
SigninLogs
| where TimeGenerated > ago(1d)
| where UserPrincipalName == "j.smith@northgateeng.com"
| where ResultType == 0
| sort by TimeGenerated asc
| extend PreviousSignIn = prev(TimeGenerated)
| extend TimeDelta = TimeGenerated - PreviousSignIn
| where TimeDelta < 5m and IPAddress != prev(IPAddress)
| project TimeGenerated, IPAddress, Location, TimeDelta
The prev() function references the value in the previous row (based on the current sort order). This query finds cases where the same user signed in from different IPs within five minutes — a potential indicator of session token theft.
Joining Tables
Investigations rarely live in a single table. An email phishing alert leads to sign-in logs, which lead to audit logs, which lead to endpoint events. KQL’s join operator lets you correlate across tables.
(Source: join operator — Microsoft Learn)
Basic Join Syntax
Table1
| join kind=inner (Table2) on CommonColumn
The left table is Table1. The right table (in parentheses) is Table2. The on clause specifies the matching column. If the column names differ between tables, use $left.ColumnA == $right.ColumnB.
Join Kinds for Security Analysts
| Kind | Returns | Use Case |
|---|---|---|
inner | Only rows with matches in both tables | Correlate alerts with sign-ins for the same user |
leftouter | All left rows; matching right rows or nulls | Enrich events with optional lookup data |
leftanti | Left rows with NO match in right table | Find users who triggered alerts but have no successful sign-in |
leftsemi | Left rows that HAVE a match, but only left columns | Filter a table to only users that appear in another table |
inner is the default if you omit kind=. For most security correlation, inner and leftouter cover 90% of requirements.
Practical Example: Correlating Alerts with Sign-Ins
You have a list of users who triggered a phishing alert. You want to check whether any of them successfully signed in during the same period — which would suggest credential compromise:
let PhishedUsers = SecurityAlert
| where TimeGenerated > ago(24h)
| where AlertName has "phish"
| extend Entities = parse_json(Entities)
| mv-expand Entity = Entities
| where Entity.Type == "account"
| extend AlertedUser = tostring(Entity.AadUserId)
| distinct AlertedUser;
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType == 0
| where UserId in (PhishedUsers)
| project TimeGenerated, UserPrincipalName, IPAddress, AppDisplayName, Location
This uses a let statement to store the intermediate result, then filters SigninLogs to only the users who were in the alert set. The let statement assigns a name to an expression or a tabular result. It must end with a semicolon.
Performance Warning on Joins
Joins are expensive. Both tables are loaded into memory before matching. To keep joins fast:
- Time-filter both sides before joining — never join unfiltered tables
projecteach side to only the columns you need before the join- Put the smaller table on the right — the engine uses the right table as the lookup
Parsing Semi-Structured Data
Security logs frequently embed structured data inside string or dynamic fields. JSON payloads, delimited strings, and key-value pairs all require parsing.
parse_json() and Dynamic Column Access
Many Sentinel columns are typed as dynamic, meaning they contain JSON objects. Access nested properties with dot notation:
SigninLogs
| where TimeGenerated > ago(1d)
| extend ConditionalAccessStatus = tostring(ConditionalAccessPolicies[0].result)
| extend OS = tostring(DeviceDetail.operatingSystem)
| extend TrustType = tostring(DeviceDetail.trustType)
For columns stored as strings that contain JSON, convert them first with parse_json():
SecurityAlert
| where TimeGenerated > ago(7d)
| extend ParsedEntities = parse_json(Entities)
| mv-expand ParsedEntities
| where ParsedEntities.Type == "ip"
| extend AttackerIP = tostring(ParsedEntities.Address)
| summarize AlertCount = count() by AttackerIP
| sort by AlertCount desc
mv-expand takes a dynamic array and creates one row per element — essential when an alert contains multiple entities (IPs, accounts, hosts) packed into a single Entities field.
parse Operator for Delimited Strings
When data follows a consistent pattern but is not JSON, use parse:
Syslog
| where TimeGenerated > ago(1d)
| where SyslogMessage has "Failed password"
| parse SyslogMessage with * "Failed password for " Username " from " SourceIP " port " Port:int *
| summarize Attempts = count() by Username, SourceIP
| sort by Attempts desc
The parse operator matches a text pattern and extracts named segments into new columns. The * wildcard absorbs characters you do not need. The :int type annotation on Port converts the extracted value to an integer.
extract() for Regex Extraction
When parse cannot handle the pattern, use extract() with a regular expression:
EmailEvents
| where TimeGenerated > ago(7d)
| where SenderFromAddress has "invoice"
| extend Domain = extract(@"@(.+)$", 1, SenderFromAddress)
| summarize EmailCount = count() by Domain
| sort by EmailCount desc
extract() takes a regex pattern, a capture group number, and a source string. Group 1 here captures everything after the @ symbol. Use regex as a last resort — it is the slowest string operation in KQL.
Query Patterns for Common Investigation Tasks
Pattern 1: Failed Sign-In Spike Detection
let Threshold = 15;
let TimeWindow = 1h;
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType != 0
| summarize
FailedCount = count(),
DistinctIPs = dcount(IPAddress),
Applications = make_set(AppDisplayName),
ResultCodes = make_set(ResultType)
by UserPrincipalName, bin(TimeGenerated, TimeWindow)
| where FailedCount > Threshold
| sort by FailedCount desc
This surfaces users who exceeded the failure threshold in any one-hour window. make_set() collects the distinct applications and error codes involved, giving you immediate context about whether this is a password spray (many users, same IP) or a targeted brute force (one user, many IPs).
Pattern 2: Inbox Rule Creation After Sign-In
Attackers who compromise mailboxes frequently create inbox forwarding rules to maintain access. This query finds inbox rules created within one hour of a sign-in from an unusual location:
let SuspiciousSignIns = SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType == 0
| where RiskLevelDuringSignIn in ("medium", "high")
| project SignInTime = TimeGenerated, UserPrincipalName, IPAddress, RiskLevelDuringSignIn;
OfficeActivity
| where TimeGenerated > ago(24h)
| where Operation in ("New-InboxRule", "Set-InboxRule", "Enable-InboxRule")
| project RuleTime = TimeGenerated, UserId, Operation, Parameters
| join kind=inner (SuspiciousSignIns) on $left.UserId == $right.UserPrincipalName
| where RuleTime between (SignInTime .. (SignInTime + 1h))
| project RuleTime, UserId, Operation, Parameters, SignInTime, IPAddress, RiskLevelDuringSignIn
Pattern 3: Distinct Countries Per User (Impossible Travel Indicator)
SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == 0
| extend Country = tostring(LocationDetails.countryOrRegion)
| summarize Countries = make_set(Country), CountryCount = dcount(Country) by UserPrincipalName
| where CountryCount > 2
| sort by CountryCount desc
Users signing in from more than two countries in a week warrants investigation. This is a rough filter — not a replacement for Entra ID’s impossible travel detection, but useful as a hunting starting point.
Pattern 4: Alert Triage — Top Alerts by Severity and Frequency
SecurityAlert
| where TimeGenerated > ago(7d)
| where Status != "Dismissed"
| summarize
Count = count(),
FirstOccurrence = min(TimeGenerated),
LastOccurrence = max(TimeGenerated),
AffectedEntities = dcount(CompromisedEntity)
by AlertName, AlertSeverity
| extend DurationHours = datetime_diff('hour', LastOccurrence, FirstOccurrence)
| sort by AlertSeverity asc, Count desc
This gives you an operational overview of your alert queue — what is firing, how often, how many distinct entities are involved, and how long each alert type has been active.
let Statements and Query Organisation
As queries get longer, let statements keep them maintainable. A let statement assigns a name to an expression or a tabular result. It must end with a semicolon.
let LookbackPeriod = 24h;
let FailureThreshold = 10;
let TargetApps = dynamic(["Exchange Online", "SharePoint Online", "Microsoft Teams"]);
SigninLogs
| where TimeGenerated > ago(LookbackPeriod)
| where ResultType != 0
| where AppDisplayName in (TargetApps)
| summarize Failures = count() by UserPrincipalName
| where Failures > FailureThreshold
Parameterising your queries with let means you change one value at the top instead of hunting through the query body. When this query becomes an analytics rule, those let variables become the knobs you tune.
Rendering Visualisations
KQL can render charts directly in the query output — useful in Log Analytics and workbooks.
SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType != 0
| summarize FailedCount = count() by bin(TimeGenerated, 1h)
| render timechart
Available render types: timechart, barchart, piechart, columnchart, scatterchart, areachart. In Sentinel workbooks, these become the visualisations your SOC manager sees on the dashboard.
SC-200 Exam Relevance
The SC-200 study guide (updated January 2026) lists two objectives directly addressed by this module:
Build KQL statements: The exam tests whether you can read a query and identify what it returns, select the correct operator for a given task, and complete partially-written queries. You do not need to write complex queries from memory, but you must recognise what
summarize,join,extend, andwhereproduce.Analyze data by using KQL: Expect questions that show a query and ask you to predict its output — for example, “How many rows does this query return?” or “What does the
dcountfunction calculate?”
(Source: SC-200 study guide — Microsoft Learn)
Hotspot and dropdown questions in the KQL section are common. You will see partially completed queries where you must select the correct operator or function from a dropdown. Knowing the difference between has and contains, between inner and leftouter join, and between count() and dcount() is directly tested.
Key Takeaways
- Filter early: Time constraints first, then high-selectivity predicates.
- Default to
hasovercontains: Use the term index. Reservecontainsfor substring searches where term matching does not apply. - Use
letfor readability: Name your thresholds, time windows, and intermediate results. - Join with care: Time-filter and project both sides before joining. Smaller table on the right.
extend+parse_json()for nested data: Most Sentinel columns with useful context store it as dynamic JSON.summarize+bin()for trend analysis: The combination of aggregation with time bucketing powers every spike detection and baseline deviation query you will write.
References
All technical claims in this module are sourced from current Microsoft documentation:
- KQL overview and case sensitivity — learn.microsoft.com/en-us/kusto/query
- String operators and term indexing — learn.microsoft.com/en-us/kusto/query/datatypes-string-operators
- Query best practices (filter ordering, performance) — learn.microsoft.com/en-us/kusto/query/best-practices
- join operator and join kinds — learn.microsoft.com/en-us/kusto/query/join-operator
- Common KQL tasks for Microsoft Sentinel — learn.microsoft.com/en-us/kusto/query/tutorials/common-tasks-microsoft-sentinel
- Sentinel data source schema reference — learn.microsoft.com/en-us/azure/sentinel/data-source-schema-reference
- SC-200 study guide (January 2026 update) — learn.microsoft.com/en-us/credentials/certifications/resources/study-guides/sc-200
- Sample KQL queries for Microsoft Sentinel — learn.microsoft.com/en-us/azure/sentinel/datalake/kql-sample-queries
Next module: Module 3: Defender XDR Portal Navigation →