FREE SC-200: Manage Soc Env 60 minutes

Module 2: KQL Fundamentals for Security Analysts

The KQL operators, functions, and patterns you'll use in every investigation. Written for security analysts, not data engineers.

What You'll Learn

  • Write queries using where, project, summarize, extend, and join
  • Use time-based filtering and datetime functions
  • Parse and extract data from semi-structured fields
  • Build reusable query patterns for sign-in analysis, email tracing, and alert triage

SC-200 Exam Objectives Covered

  • Build KQL statements
  • Analyze data by using KQL

Kusto Query Language is the query language behind Microsoft Sentinel, Defender XDR Advanced Hunting, and Log Analytics. Every detection rule, every hunting query, and every workbook you build in the Microsoft security stack runs on KQL. If you can write KQL, you can investigate. If you cannot, you are dependent on whoever wrote the queries you are running.

This module covers the operators and patterns that account for the vast majority of security work. It is not a complete language reference — Microsoft publishes that at learn.microsoft.com/en-us/kusto/query. What follows is the working subset that a SOC analyst reaches for daily, explained in the context of security tables and real investigation questions.

Where to practise: You can run every query in this module in the Log Analytics demo environment or in the Advanced Hunting blade within the Microsoft Defender portal if you have access to a tenant.

How KQL Queries Work

A KQL query is a pipeline. Data starts at a table, flows through operators separated by the pipe character |, and emerges as a result set. Each operator receives a tabular input, transforms it, and passes it to the next operator.

SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType != 0
| summarize FailedAttempts = count() by UserPrincipalName
| where FailedAttempts > 10
| sort by FailedAttempts desc

Read that top to bottom: start with sign-in logs from the last 24 hours, keep only failures, count failures per user, keep users with more than 10 failures, sort by count descending. The pipe model means you build queries incrementally — run each line, inspect the output, add the next operator.

Three rules apply to everything that follows:

  1. KQL is case-sensitive for table names, column names, operators, and functions. SigninLogs works. signinlogs does not. String comparisons default to case-insensitive for most operators, but the language itself is strict. (Source: KQL overview — Microsoft Learn)

  2. Filter early, project late. Every row and column you eliminate early reduces the data the engine processes downstream. Put your where clauses — especially time filters — before aggregations or joins. (Source: Query best practices — Microsoft Learn)

  3. Time-filter first. Kusto maintains a highly efficient index on datetime columns. Filtering on TimeGenerated first allows the engine to skip entire data partitions without reading them. Always start with a time constraint.

The Tables You Will Query

Before learning operators, you need to know where the data lives. Microsoft Sentinel and Defender XDR expose dozens of tables. These are the ones you will query most as a security analyst:

TableWhat It ContainsSource
SigninLogsInteractive Azure AD / Entra ID sign-in attemptsEntra ID
AADNonInteractiveUserSignInLogsToken refreshes, background app sign-insEntra ID
SecurityAlertAlerts from Sentinel analytics rules, Defender XDR, Defender for CloudMultiple
SecurityIncidentIncidents created from correlated alertsSentinel
SecurityEventWindows Security Event Log entries (logon events, process creation, etc.)Windows agents
EmailEventsEmail delivery and blocking eventsDefender for Office 365
EmailUrlInfoURLs within emailsDefender for Office 365
DeviceProcessEventsProcess execution on endpointsDefender for Endpoint
DeviceNetworkEventsNetwork connections from endpointsDefender for Endpoint
AuditLogsEntra ID configuration changes (user creation, group membership, app consent)Entra ID
OfficeActivitySharePoint, OneDrive, Exchange, and Teams operationsOffice 365

(Source: Microsoft Sentinel data source schema reference — Microsoft Learn)

Every table has a TimeGenerated column that records when the event was ingested. This is the column you use for time-based filtering.

Core Operators

where — Filter Rows

where keeps rows that match a condition. It is the operator you will use more than any other.

SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == 0
| where AppDisplayName == "Microsoft Teams"

This returns successful sign-ins to Microsoft Teams in the last seven days. Multiple where clauses act as logical AND — every condition must be true. You can combine conditions in a single where with and and or:

SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType != 0 and (AppDisplayName == "Exchange Online" or AppDisplayName == "SharePoint Online")

Comparison operators:

OperatorMeaning
==Equals (case-sensitive for strings)
!=Not equals
=~Equals (case-insensitive)
!~Not equals (case-insensitive)
>, <, >=, <=Numeric and datetime comparisons
inMatches any value in a list (case-sensitive)
in~Matches any value in a list (case-insensitive)
!inDoes not match any value in a list
betweenWithin a range (inclusive)

The in operator is particularly useful for filtering on multiple known values:

SecurityEvent
| where TimeGenerated > ago(1d)
| where EventID in (4624, 4625, 4648)

That query returns successful logons (4624), failed logons (4625), and explicit credential logons (4648) in one pass.

String Operators — has vs contains

This is the single most important performance distinction in KQL for security analysts. Both has and contains search within strings, but they work differently and the performance gap is significant.

has performs a term-level match. Kusto breaks every string into terms — sequences of four or more alphanumeric characters bounded by non-alphanumeric characters or string boundaries. The engine maintains a term index, so has searches are fast index lookups.

contains performs a substring scan. It searches for the exact character sequence anywhere in the string, character by character. No index is used. On large tables, contains can be orders of magnitude slower.

(Source: String operators — Microsoft Learn)

// Fast — uses the term index
EmailEvents
| where TimeGenerated > ago(1d)
| where SenderFromAddress has "paypal"

// Slow — full substring scan
EmailEvents
| where TimeGenerated > ago(1d)
| where SenderFromAddress contains "paypal"

Both return similar results in most cases, but has runs faster because it checks the term index for “paypal” rather than scanning every character of every email address.

When you must use contains: When your search term is part of a larger word and not a standalone term. For example, searching for “admin” inside “sysadmin” requires contains because “admin” is a substring of the term “sysadmin”, not a separate term.

Practical rule: Default to has. Switch to contains only when you need partial-word matching and understand you are paying a performance cost.

The full family of string operators:

OperatorBehaviourPerformance
hasTerm match (case-insensitive)Fast (indexed)
has_csTerm match (case-sensitive)Faster
containsSubstring match (case-insensitive)Slow (scan)
contains_csSubstring match (case-sensitive)Slightly faster scan
startswithPrefix match (case-insensitive)Moderate
endswithSuffix match (case-insensitive)Moderate
matches regexRegular expressionSlowest — use sparingly
has_anyTerm match against a listFast
has_allAll terms must matchFast

Case-sensitive variants (the _cs suffix) are faster than their case-insensitive counterparts because the engine can use stricter index matching. Use them when you know the exact casing — for example, when matching known IOC hashes.

project — Select and Rename Columns

project controls which columns appear in your output. It reduces the width of your result set, making it easier to read and cheaper to process.

SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType != 0
| project TimeGenerated, UserPrincipalName, IPAddress, Location, ResultType, ResultDescription

You can rename columns inline:

SigninLogs
| where TimeGenerated > ago(1d)
| project Timestamp = TimeGenerated, User = UserPrincipalName, IP = IPAddress, Result = ResultType

project-away removes specific columns and keeps everything else — useful when a table has 40 columns and you want to drop three:

SigninLogs
| where TimeGenerated > ago(1d)
| project-away OperationName, OperationVersion, Identity

extend — Add Calculated Columns

extend adds new columns without removing existing ones. Use it to compute derived values, parse nested fields, or add labels.

SigninLogs
| where TimeGenerated > ago(1d)
| extend HourOfDay = hourofday(TimeGenerated)
| extend Country = tostring(LocationDetails.countryOrRegion)
| extend IsFailure = iff(ResultType != 0, true, false)

The iff() function is KQL’s conditional expression: iff(condition, value_if_true, value_if_false).

extend is also where you unpack JSON. Many Sentinel columns store structured data as dynamic (JSON) objects. Use tostring(), toint(), or parse_json() to extract values:

SigninLogs
| where TimeGenerated > ago(1d)
| extend OS = tostring(DeviceDetail.operatingSystem)
| extend Browser = tostring(DeviceDetail.browser)
| extend City = tostring(LocationDetails.city)

summarize — Aggregate Data

summarize groups rows and computes aggregate values. It is the KQL equivalent of SQL’s GROUP BY combined with aggregate functions.

SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType != 0
| summarize
    FailedAttempts = count(),
    DistinctIPs = dcount(IPAddress),
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated)
    by UserPrincipalName
| where FailedAttempts > 20
| sort by FailedAttempts desc

This answers: “Which users had the most failed sign-ins over the past week, how many unique IPs did those failures come from, and when did they start and stop?”

Common aggregation functions:

FunctionReturns
count()Number of rows in the group
dcount(column)Approximate distinct count
sum(column)Sum of numeric values
avg(column)Average
min(column)Minimum value
max(column)Maximum value
make_list(column)JSON array of all values
make_set(column)JSON array of distinct values
arg_max(column, *)Row with the maximum value of that column
arg_min(column, *)Row with the minimum value of that column
percentile(column, N)Nth percentile
countif(condition)Count of rows matching a condition

bin() for time bucketing: When you need to see trends over time, use bin() to group timestamps into intervals:

SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType != 0
| summarize FailedCount = count() by bin(TimeGenerated, 1h), UserPrincipalName
| sort by TimeGenerated asc

This produces a row per user per hour, showing how failures are distributed across time — essential for spotting brute force patterns or spray attacks.

sort and take — Order and Limit Results

sort by (also written order by) orders your results. take (also written limit) caps the number of rows returned.

SecurityAlert
| where TimeGenerated > ago(7d)
| summarize AlertCount = count() by AlertName
| sort by AlertCount desc
| take 10

Top 10 most frequent alert types in the last week. Use top as a shorthand for sort + take:

SecurityAlert
| where TimeGenerated > ago(7d)
| summarize AlertCount = count() by AlertName
| top 10 by AlertCount desc

Time Functions

Security investigations are inherently time-based. KQL provides a rich set of datetime functions.

Relative Time with ago()

ago() returns a datetime relative to now. It accepts time literals:

LiteralMeaning
1m1 minute
1h1 hour
1d1 day
7d7 days
30d30 days
365d365 days
// Last 4 hours
SigninLogs | where TimeGenerated > ago(4h)

// Between 48 and 24 hours ago
SigninLogs | where TimeGenerated between(ago(48h) .. ago(24h))

Absolute Time with datetime()

When investigating a specific incident, pin your query to exact timestamps:

SigninLogs
| where TimeGenerated between(datetime(2026-03-15 08:00) .. datetime(2026-03-15 12:00))
| where UserPrincipalName == "j.smith@northgateeng.com"

Extracting Time Components

SigninLogs
| where TimeGenerated > ago(7d)
| extend Hour = hourofday(TimeGenerated)
| extend DayOfWeek = dayofweek(TimeGenerated)
| extend DayName = case(
    DayOfWeek == 0d, "Sunday",
    DayOfWeek == 1d, "Monday",
    DayOfWeek == 2d, "Tuesday",
    DayOfWeek == 3d, "Wednesday",
    DayOfWeek == 4d, "Thursday",
    DayOfWeek == 5d, "Friday",
    DayOfWeek == 6d, "Saturday",
    "Unknown")

dayofweek() returns a timespan where 0d is Sunday. The case() function is KQL’s multi-branch conditional — it evaluates pairs of condition/value from top to bottom and returns the first match.

Time Differences

Calculate the gap between two events — critical for detecting token replay, impossible travel, and session hijacking:

SigninLogs
| where TimeGenerated > ago(1d)
| where UserPrincipalName == "j.smith@northgateeng.com"
| where ResultType == 0
| sort by TimeGenerated asc
| extend PreviousSignIn = prev(TimeGenerated)
| extend TimeDelta = TimeGenerated - PreviousSignIn
| where TimeDelta < 5m and IPAddress != prev(IPAddress)
| project TimeGenerated, IPAddress, Location, TimeDelta

The prev() function references the value in the previous row (based on the current sort order). This query finds cases where the same user signed in from different IPs within five minutes — a potential indicator of session token theft.

Joining Tables

Investigations rarely live in a single table. An email phishing alert leads to sign-in logs, which lead to audit logs, which lead to endpoint events. KQL’s join operator lets you correlate across tables.

(Source: join operator — Microsoft Learn)

Basic Join Syntax

Table1
| join kind=inner (Table2) on CommonColumn

The left table is Table1. The right table (in parentheses) is Table2. The on clause specifies the matching column. If the column names differ between tables, use $left.ColumnA == $right.ColumnB.

Join Kinds for Security Analysts

KindReturnsUse Case
innerOnly rows with matches in both tablesCorrelate alerts with sign-ins for the same user
leftouterAll left rows; matching right rows or nullsEnrich events with optional lookup data
leftantiLeft rows with NO match in right tableFind users who triggered alerts but have no successful sign-in
leftsemiLeft rows that HAVE a match, but only left columnsFilter a table to only users that appear in another table

inner is the default if you omit kind=. For most security correlation, inner and leftouter cover 90% of requirements.

Practical Example: Correlating Alerts with Sign-Ins

You have a list of users who triggered a phishing alert. You want to check whether any of them successfully signed in during the same period — which would suggest credential compromise:

let PhishedUsers = SecurityAlert
    | where TimeGenerated > ago(24h)
    | where AlertName has "phish"
    | extend Entities = parse_json(Entities)
    | mv-expand Entity = Entities
    | where Entity.Type == "account"
    | extend AlertedUser = tostring(Entity.AadUserId)
    | distinct AlertedUser;
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType == 0
| where UserId in (PhishedUsers)
| project TimeGenerated, UserPrincipalName, IPAddress, AppDisplayName, Location

This uses a let statement to store the intermediate result, then filters SigninLogs to only the users who were in the alert set. The let statement assigns a name to an expression or a tabular result. It must end with a semicolon.

Performance Warning on Joins

Joins are expensive. Both tables are loaded into memory before matching. To keep joins fast:

  • Time-filter both sides before joining — never join unfiltered tables
  • project each side to only the columns you need before the join
  • Put the smaller table on the right — the engine uses the right table as the lookup

Parsing Semi-Structured Data

Security logs frequently embed structured data inside string or dynamic fields. JSON payloads, delimited strings, and key-value pairs all require parsing.

parse_json() and Dynamic Column Access

Many Sentinel columns are typed as dynamic, meaning they contain JSON objects. Access nested properties with dot notation:

SigninLogs
| where TimeGenerated > ago(1d)
| extend ConditionalAccessStatus = tostring(ConditionalAccessPolicies[0].result)
| extend OS = tostring(DeviceDetail.operatingSystem)
| extend TrustType = tostring(DeviceDetail.trustType)

For columns stored as strings that contain JSON, convert them first with parse_json():

SecurityAlert
| where TimeGenerated > ago(7d)
| extend ParsedEntities = parse_json(Entities)
| mv-expand ParsedEntities
| where ParsedEntities.Type == "ip"
| extend AttackerIP = tostring(ParsedEntities.Address)
| summarize AlertCount = count() by AttackerIP
| sort by AlertCount desc

mv-expand takes a dynamic array and creates one row per element — essential when an alert contains multiple entities (IPs, accounts, hosts) packed into a single Entities field.

parse Operator for Delimited Strings

When data follows a consistent pattern but is not JSON, use parse:

Syslog
| where TimeGenerated > ago(1d)
| where SyslogMessage has "Failed password"
| parse SyslogMessage with * "Failed password for " Username " from " SourceIP " port " Port:int *
| summarize Attempts = count() by Username, SourceIP
| sort by Attempts desc

The parse operator matches a text pattern and extracts named segments into new columns. The * wildcard absorbs characters you do not need. The :int type annotation on Port converts the extracted value to an integer.

extract() for Regex Extraction

When parse cannot handle the pattern, use extract() with a regular expression:

EmailEvents
| where TimeGenerated > ago(7d)
| where SenderFromAddress has "invoice"
| extend Domain = extract(@"@(.+)$", 1, SenderFromAddress)
| summarize EmailCount = count() by Domain
| sort by EmailCount desc

extract() takes a regex pattern, a capture group number, and a source string. Group 1 here captures everything after the @ symbol. Use regex as a last resort — it is the slowest string operation in KQL.

Query Patterns for Common Investigation Tasks

Pattern 1: Failed Sign-In Spike Detection

let Threshold = 15;
let TimeWindow = 1h;
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType != 0
| summarize
    FailedCount = count(),
    DistinctIPs = dcount(IPAddress),
    Applications = make_set(AppDisplayName),
    ResultCodes = make_set(ResultType)
    by UserPrincipalName, bin(TimeGenerated, TimeWindow)
| where FailedCount > Threshold
| sort by FailedCount desc

This surfaces users who exceeded the failure threshold in any one-hour window. make_set() collects the distinct applications and error codes involved, giving you immediate context about whether this is a password spray (many users, same IP) or a targeted brute force (one user, many IPs).

Pattern 2: Inbox Rule Creation After Sign-In

Attackers who compromise mailboxes frequently create inbox forwarding rules to maintain access. This query finds inbox rules created within one hour of a sign-in from an unusual location:

let SuspiciousSignIns = SigninLogs
    | where TimeGenerated > ago(24h)
    | where ResultType == 0
    | where RiskLevelDuringSignIn in ("medium", "high")
    | project SignInTime = TimeGenerated, UserPrincipalName, IPAddress, RiskLevelDuringSignIn;
OfficeActivity
| where TimeGenerated > ago(24h)
| where Operation in ("New-InboxRule", "Set-InboxRule", "Enable-InboxRule")
| project RuleTime = TimeGenerated, UserId, Operation, Parameters
| join kind=inner (SuspiciousSignIns) on $left.UserId == $right.UserPrincipalName
| where RuleTime between (SignInTime .. (SignInTime + 1h))
| project RuleTime, UserId, Operation, Parameters, SignInTime, IPAddress, RiskLevelDuringSignIn

Pattern 3: Distinct Countries Per User (Impossible Travel Indicator)

SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == 0
| extend Country = tostring(LocationDetails.countryOrRegion)
| summarize Countries = make_set(Country), CountryCount = dcount(Country) by UserPrincipalName
| where CountryCount > 2
| sort by CountryCount desc

Users signing in from more than two countries in a week warrants investigation. This is a rough filter — not a replacement for Entra ID’s impossible travel detection, but useful as a hunting starting point.

Pattern 4: Alert Triage — Top Alerts by Severity and Frequency

SecurityAlert
| where TimeGenerated > ago(7d)
| where Status != "Dismissed"
| summarize
    Count = count(),
    FirstOccurrence = min(TimeGenerated),
    LastOccurrence = max(TimeGenerated),
    AffectedEntities = dcount(CompromisedEntity)
    by AlertName, AlertSeverity
| extend DurationHours = datetime_diff('hour', LastOccurrence, FirstOccurrence)
| sort by AlertSeverity asc, Count desc

This gives you an operational overview of your alert queue — what is firing, how often, how many distinct entities are involved, and how long each alert type has been active.

let Statements and Query Organisation

As queries get longer, let statements keep them maintainable. A let statement assigns a name to an expression or a tabular result. It must end with a semicolon.

let LookbackPeriod = 24h;
let FailureThreshold = 10;
let TargetApps = dynamic(["Exchange Online", "SharePoint Online", "Microsoft Teams"]);
SigninLogs
| where TimeGenerated > ago(LookbackPeriod)
| where ResultType != 0
| where AppDisplayName in (TargetApps)
| summarize Failures = count() by UserPrincipalName
| where Failures > FailureThreshold

Parameterising your queries with let means you change one value at the top instead of hunting through the query body. When this query becomes an analytics rule, those let variables become the knobs you tune.

Rendering Visualisations

KQL can render charts directly in the query output — useful in Log Analytics and workbooks.

SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType != 0
| summarize FailedCount = count() by bin(TimeGenerated, 1h)
| render timechart

Available render types: timechart, barchart, piechart, columnchart, scatterchart, areachart. In Sentinel workbooks, these become the visualisations your SOC manager sees on the dashboard.

SC-200 Exam Relevance

The SC-200 study guide (updated January 2026) lists two objectives directly addressed by this module:

  • Build KQL statements: The exam tests whether you can read a query and identify what it returns, select the correct operator for a given task, and complete partially-written queries. You do not need to write complex queries from memory, but you must recognise what summarize, join, extend, and where produce.

  • Analyze data by using KQL: Expect questions that show a query and ask you to predict its output — for example, “How many rows does this query return?” or “What does the dcount function calculate?”

(Source: SC-200 study guide — Microsoft Learn)

Hotspot and dropdown questions in the KQL section are common. You will see partially completed queries where you must select the correct operator or function from a dropdown. Knowing the difference between has and contains, between inner and leftouter join, and between count() and dcount() is directly tested.

Key Takeaways

  • Filter early: Time constraints first, then high-selectivity predicates.
  • Default to has over contains: Use the term index. Reserve contains for substring searches where term matching does not apply.
  • Use let for readability: Name your thresholds, time windows, and intermediate results.
  • Join with care: Time-filter and project both sides before joining. Smaller table on the right.
  • extend + parse_json() for nested data: Most Sentinel columns with useful context store it as dynamic JSON.
  • summarize + bin() for trend analysis: The combination of aggregation with time bucketing powers every spike detection and baseline deviation query you will write.

References

All technical claims in this module are sourced from current Microsoft documentation:

  1. KQL overview and case sensitivity — learn.microsoft.com/en-us/kusto/query
  2. String operators and term indexing — learn.microsoft.com/en-us/kusto/query/datatypes-string-operators
  3. Query best practices (filter ordering, performance) — learn.microsoft.com/en-us/kusto/query/best-practices
  4. join operator and join kinds — learn.microsoft.com/en-us/kusto/query/join-operator
  5. Common KQL tasks for Microsoft Sentinel — learn.microsoft.com/en-us/kusto/query/tutorials/common-tasks-microsoft-sentinel
  6. Sentinel data source schema reference — learn.microsoft.com/en-us/azure/sentinel/data-source-schema-reference
  7. SC-200 study guide (January 2026 update) — learn.microsoft.com/en-us/credentials/certifications/resources/study-guides/sc-200
  8. Sample KQL queries for Microsoft Sentinel — learn.microsoft.com/en-us/azure/sentinel/datalake/kql-sample-queries

Next module: Module 3: Defender XDR Portal Navigation →

Downloadable Resources

KQL Quick Reference Card (PDF)