2.2 Core Operators

90 minutes · Module 2 · Free

Core Operators

These are the operators you will use in every investigation query. Master these six and you can answer the majority of security questions: where filters rows, project selects columns, extend adds calculated columns, summarize aggregates data, sort orders results, and take limits output.

THE SIX CORE OPERATORSwhereFilter rowsprojectSelect columnsextendAdd columnssummarizeAggregatesortOrder resultstakeLimit outputTypical order in an investigation query: where (time) → where (filter) → extend → summarize → sort → takeNot every query uses all six. Most use where + project or where + summarize.

where — Filter Rows

where keeps rows that match a condition. It is the operator you will use more than any other.

1
2
3
4
SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == 0
| where AppDisplayName == "Microsoft Teams"

This returns successful sign-ins to Microsoft Teams in the last seven days. Multiple where clauses act as logical AND — every condition must be true. You can combine conditions in a single where with and and or:

1
2
3
SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType != 0 and (AppDisplayName == "Exchange Online" or AppDisplayName == "SharePoint Online")

Comparison operators:

OperatorMeaning
==Equals (case-sensitive for strings)
!=Not equals
=~Equals (case-insensitive)
>, <, >=, <=Numeric and datetime comparisons
inMatches any value in a list (case-sensitive)
in~Matches any value in a list (case-insensitive)
!inDoes not match any value in a list
betweenWithin a range (inclusive)

The in operator is particularly useful for filtering on known values:

1
2
3
SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType in (50126, 50053, 50074)

This returns sign-ins that failed with specific error codes: invalid credentials (50126), account locked (50053), or MFA required but not completed (50074).

has vs contains — The Performance Difference

This is the most important distinction in KQL string matching. Getting it wrong costs you minutes per query on large tables.

Default to has. Use contains only when you need partial-word matching.

has checks the term index — it looks up complete words and runs fast. contains scans every character of every string in the column — it runs slow. On a table with millions of rows, has returns in seconds while contains can take minutes.

1
2
EmailEvents
| where SenderFromAddress has "paypal"
1
2
EmailEvents
| where SenderFromAddress contains "paypal"

Both return similar results in most cases, but has runs faster because it checks the term index for “paypal” rather than scanning every character of every email address.

When you must use contains: when your search term is part of a larger word and not a standalone term. For example, searching for “admin” inside “sysadmin” requires contains because “admin” is a substring of the term “sysadmin”, not a separate term.

The full family of string operators:

OperatorBehaviourPerformance
hasTerm match (case-insensitive)Fast (indexed)
has_csTerm match (case-sensitive)Faster
containsSubstring match (case-insensitive)Slow (scan)
contains_csSubstring match (case-sensitive)Slightly faster scan
startswithPrefix match (case-insensitive)Moderate
endswithSuffix match (case-insensitive)Moderate
matches regexRegular expressionSlowest — use sparingly
has_anyTerm match against a listFast
has_allAll terms must matchFast

Case-sensitive variants (the _cs suffix) are faster than their case-insensitive counterparts because the engine can use stricter index matching. Use them when you know the exact casing — for example, when matching known IOC hashes.

project — Select and Rename Columns

project controls which columns appear in your output. It reduces the width of your result set, making it easier to read and reducing the data the engine processes.

1
2
3
4
SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType != 0
| project TimeGenerated, UserPrincipalName, IPAddress, ResultType, AppDisplayName

You can rename columns in the same step:

1
2
3
SigninLogs
| where TimeGenerated > ago(1d)
| project Timestamp = TimeGenerated, User = UserPrincipalName, IP = IPAddress, ErrorCode = ResultType
project-away: remove specific columns

If you want most columns but need to remove a few, project-away is more efficient than listing every column you want to keep. | project-away TenantId, OperationName, CorrelationId drops those three and keeps everything else.

extend — Add Calculated Columns

extend adds a new column while keeping all existing columns. Use it for calculations, extractions, and transformations.

1
2
3
4
5
SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType == 0
| extend HourOfDay = hourofday(TimeGenerated)
| extend Domain = tostring(split(UserPrincipalName, "@")[1])

This adds two columns: HourOfDay (the hour the sign-in occurred) and Domain (extracted from the user’s email). The original columns remain — extend never removes data.

Common extend patterns for security:

1
2
3
4
5
| extend TimeSinceEvent = datetime_diff('minute', now(), TimeGenerated)
| extend Country = tostring(LocationDetails.countryOrRegion)
| extend City = tostring(LocationDetails.city)
| extend Browser = tostring(DeviceDetail.browser)
| extend OS = tostring(DeviceDetail.operatingSystem)

summarize — Aggregate Data

summarize groups rows and computes aggregations. This is how you count things, find unique values, and identify patterns.

1
2
3
4
5
SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType != 0
| summarize FailedAttempts = count() by UserPrincipalName
| sort by FailedAttempts desc

Key aggregation functions:

FunctionWhat it doesExample use
count()Count rows per groupFailed sign-ins per user
dcount(column)Count distinct valuesUnique IPs per user
make_set(column)Collect distinct values into an arrayAll countries a user signed in from
make_list(column)Collect all values (including duplicates)All applications accessed
arg_max(col, *)Return the row with the maximum valueMost recent sign-in per user
arg_min(col, *)Return the row with the minimum valueFirst sign-in per user
sum(column)Sum a numeric columnTotal data volume per device
avg(column)Average of a numeric columnAverage response time
countif(condition)Count rows matching a conditionFailed vs successful in one query

Multiple aggregations in one query:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
SigninLogs
| where TimeGenerated > ago(1d)
| summarize
    TotalSignins = count(),
    FailedSignins = countif(ResultType != 0),
    SuccessfulSignins = countif(ResultType == 0),
    UniqueIPs = dcount(IPAddress),
    Countries = make_set(tostring(LocationDetails.countryOrRegion))
    by UserPrincipalName
| where FailedSignins > 10
| sort by FailedSignins desc

This gives you a complete sign-in profile per user in a single query: total sign-ins, failures, successes, how many distinct IPs they used, and which countries they signed in from.

sort and take — Order and Limit

sort by orders results. take limits the number of rows returned.

1
2
3
4
5
SecurityAlert
| where TimeGenerated > ago(7d)
| summarize AlertCount = count() by AlertName
| sort by AlertCount desc
| take 10

This returns the top 10 most frequent alerts in the last week — a quick view of what is generating the most noise in your environment.

top is a shortcut for sort + take

| top 10 by FailedAttempts desc is equivalent to | sort by FailedAttempts desc | take 10. Use whichever reads more naturally.

Operator decision: which string operator should you use?

You need to search the SenderFromAddress column for emails from "paypal" domains. Which operator gives the best balance of accuracy and performance?
Is "paypal" a standalone word or part of a larger word in the email address?

Try it yourself

Write a query that finds the top 5 IP addresses with the most failed sign-in attempts in the last 24 hours, showing the IP, the number of failures, and the number of distinct users targeted from each IP.
1
2
3
4
5
6
7
8
SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType != 0
| summarize
    FailedAttempts = count(),
    TargetedUsers = dcount(UserPrincipalName)
    by IPAddress
| top 5 by FailedAttempts desc

An IP targeting many distinct users with failures is a strong indicator of password spray. An IP targeting one user with many failures is brute force. The dcount(UserPrincipalName) distinguishes between the two — this is a pattern you will use repeatedly in investigation scenarios.

Check your understanding

1. What is the practical difference between has and contains?

They produce the same results
has matches whole terms using the index (fast); contains matches substrings by scanning every character (slow)
contains is faster because it is simpler

2. What does extend do differently from project?

extend removes columns; project adds them
They are interchangeable
extend adds new columns while keeping all existing ones; project selects only the columns you specify and drops the rest

3. You want to count failed sign-ins per user and also collect the list of IP addresses used. Which operators achieve this?

summarize count() and make_set(IPAddress) by UserPrincipalName
project UserPrincipalName, IPAddress, count()
extend FailCount = count() by UserPrincipalName

4. What is the most efficient way to find the single most recent sign-in for each user?

summarize arg_max(TimeGenerated, *) by UserPrincipalName
sort by TimeGenerated desc | take 1
where TimeGenerated == max(TimeGenerated)