2.2 Core Operators

90 minutes · Module 2 · Free

Core Operators

These are the operators you will use in every investigation query. Master these six and you can answer the majority of security questions: where filters rows, project selects columns, extend adds calculated columns, summarize aggregates data, sort orders results, and take limits output.

where — Filter Rows

where keeps rows that match a condition. It is the operator you will use more than any other.

1
2
3
4
SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == 0
| where AppDisplayName == "Microsoft Teams"

This returns successful sign-ins to Microsoft Teams in the last seven days. Multiple where clauses act as logical AND — every condition must be true. You can combine conditions in a single where with and and or:

1
2
3
SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType != 0 and (AppDisplayName == "Exchange Online" or AppDisplayName == "SharePoint Online")

Comparison operators:

Operator	Meaning
`==`	Equals (case-sensitive for strings)
`!=`	Not equals
`=~`	Equals (case-insensitive)
`>`, `<`, `>=`, `<=`	Numeric and datetime comparisons
`in`	Matches any value in a list (case-sensitive)
`in~`	Matches any value in a list (case-insensitive)
`!in`	Does not match any value in a list
`between`	Within a range (inclusive)

The in operator is particularly useful for filtering on known values:

1
2
3
SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType in (50126, 50053, 50074)

This returns sign-ins that failed with specific error codes: invalid credentials (50126), account locked (50053), or MFA required but not completed (50074).

has vs contains — The Performance Difference

This is the most important distinction in KQL string matching. Getting it wrong costs you minutes per query on large tables.

Default to has. Use contains only when you need partial-word matching.

has checks the term index — it looks up complete words and runs fast. contains scans every character of every string in the column — it runs slow. On a table with millions of rows, has returns in seconds while contains can take minutes.

1
2
EmailEvents
| where SenderFromAddress has "paypal"

1
2
EmailEvents
| where SenderFromAddress contains "paypal"

Both return similar results in most cases, but has runs faster because it checks the term index for “paypal” rather than scanning every character of every email address.

When you must use contains: when your search term is part of a larger word and not a standalone term. For example, searching for “admin” inside “sysadmin” requires contains because “admin” is a substring of the term “sysadmin”, not a separate term.

The full family of string operators:

Operator	Behaviour	Performance
`has`	Term match (case-insensitive)	Fast (indexed)
`has_cs`	Term match (case-sensitive)	Faster
`contains`	Substring match (case-insensitive)	Slow (scan)
`contains_cs`	Substring match (case-sensitive)	Slightly faster scan
`startswith`	Prefix match (case-insensitive)	Moderate
`endswith`	Suffix match (case-insensitive)	Moderate
`matches regex`	Regular expression	Slowest — use sparingly
`has_any`	Term match against a list	Fast
`has_all`	All terms must match	Fast

Case-sensitive variants (the _cs suffix) are faster than their case-insensitive counterparts because the engine can use stricter index matching. Use them when you know the exact casing — for example, when matching known IOC hashes.

project — Select and Rename Columns

project controls which columns appear in your output. It reduces the width of your result set, making it easier to read and reducing the data the engine processes.

1
2
3
4
SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType != 0
| project TimeGenerated, UserPrincipalName, IPAddress, ResultType, AppDisplayName

You can rename columns in the same step:

1
2
3
SigninLogs
| where TimeGenerated > ago(1d)
| project Timestamp = TimeGenerated, User = UserPrincipalName, IP = IPAddress, ErrorCode = ResultType

project-away: remove specific columns

If you want most columns but need to remove a few, project-away is more efficient than listing every column you want to keep. | project-away TenantId, OperationName, CorrelationId drops those three and keeps everything else.

extend — Add Calculated Columns

extend adds a new column while keeping all existing columns. Use it for calculations, extractions, and transformations.

1
2
3
4
5
SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType == 0
| extend HourOfDay = hourofday(TimeGenerated)
| extend Domain = tostring(split(UserPrincipalName, "@")[1])

This adds two columns: HourOfDay (the hour the sign-in occurred) and Domain (extracted from the user’s email). The original columns remain — extend never removes data.

Common extend patterns for security:

1
2
3
4
5
| extend TimeSinceEvent = datetime_diff('minute', now(), TimeGenerated)
| extend Country = tostring(LocationDetails.countryOrRegion)
| extend City = tostring(LocationDetails.city)
| extend Browser = tostring(DeviceDetail.browser)
| extend OS = tostring(DeviceDetail.operatingSystem)

summarize — Aggregate Data

summarize groups rows and computes aggregations. This is how you count things, find unique values, and identify patterns.

1
2
3
4
5
SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType != 0
| summarize FailedAttempts = count() by UserPrincipalName
| sort by FailedAttempts desc

Key aggregation functions:

Function	What it does	Example use
`count()`	Count rows per group	Failed sign-ins per user
`dcount(column)`	Count distinct values	Unique IPs per user
`make_set(column)`	Collect distinct values into an array	All countries a user signed in from
`make_list(column)`	Collect all values (including duplicates)	All applications accessed
`arg_max(col, *)`	Return the row with the maximum value	Most recent sign-in per user
`arg_min(col, *)`	Return the row with the minimum value	First sign-in per user
`sum(column)`	Sum a numeric column	Total data volume per device
`avg(column)`	Average of a numeric column	Average response time
`countif(condition)`	Count rows matching a condition	Failed vs successful in one query

Multiple aggregations in one query:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
SigninLogs
| where TimeGenerated > ago(1d)
| summarize
    TotalSignins = count(),
    FailedSignins = countif(ResultType != 0),
    SuccessfulSignins = countif(ResultType == 0),
    UniqueIPs = dcount(IPAddress),
    Countries = make_set(tostring(LocationDetails.countryOrRegion))
    by UserPrincipalName
| where FailedSignins > 10
| sort by FailedSignins desc

This gives you a complete sign-in profile per user in a single query: total sign-ins, failures, successes, how many distinct IPs they used, and which countries they signed in from.

sort and take — Order and Limit

sort by orders results. take limits the number of rows returned.

1
2
3
4
5
SecurityAlert
| where TimeGenerated > ago(7d)
| summarize AlertCount = count() by AlertName
| sort by AlertCount desc
| take 10

This returns the top 10 most frequent alerts in the last week — a quick view of what is generating the most noise in your environment.

top is a shortcut for sort + take

| top 10 by FailedAttempts desc is equivalent to | sort by FailedAttempts desc | take 10. Use whichever reads more naturally.

Operator decision: which string operator should you use?

You need to search the SenderFromAddress column for emails from "paypal" domains. Which operator gives the best balance of accuracy and performance?

Is "paypal" a standalone word or part of a larger word in the email address?

Try it yourself

Write a query that finds the top 5 IP addresses with the most failed sign-in attempts in the last 24 hours, showing the IP, the number of failures, and the number of distinct users targeted from each IP.

1
2
3
4
5
6
7
8
SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType != 0
| summarize
    FailedAttempts = count(),
    TargetedUsers = dcount(UserPrincipalName)
    by IPAddress
| top 5 by FailedAttempts desc

An IP targeting many distinct users with failures is a strong indicator of password spray. An IP targeting one user with many failures is brute force. The dcount(UserPrincipalName) distinguishes between the two — this is a pattern you will use repeatedly in investigation scenarios.

Check your understanding

1. What is the practical difference between has and contains?

They produce the same results

has matches whole terms using the index (fast); contains matches substrings by scanning every character (slow)

contains is faster because it is simpler

has uses the term index — it looks up complete words. contains scans the full string character by character. For security log volumes (millions of rows), this performance difference is significant. Default to has; use contains only when you need partial-word matching.

2. What does extend do differently from project?

extend removes columns; project adds them

They are interchangeable

extend adds new columns while keeping all existing ones; project selects only the columns you specify and drops the rest

extend is additive — it adds new calculated columns without touching existing data. project is selective — it keeps only the columns you list. Use extend when you need to add computed fields. Use project when you want to narrow the output to specific columns for readability.

3. You want to count failed sign-ins per user and also collect the list of IP addresses used. Which operators achieve this?

summarize count() and make_set(IPAddress) by UserPrincipalName

project UserPrincipalName, IPAddress, count()

extend FailCount = count() by UserPrincipalName

summarize with count() counts rows per group, and make_set() collects the distinct IPs into an array — both in one query. This gives you the volume (how many failures) and the spread (from how many IPs) per user, which distinguishes brute force (one IP, many attempts) from password spray (many IPs, few attempts per user).

4. What is the most efficient way to find the single most recent sign-in for each user?

summarize arg_max(TimeGenerated, *) by UserPrincipalName

sort by TimeGenerated desc | take 1

where TimeGenerated == max(TimeGenerated)

arg_max(TimeGenerated, *) returns the complete row with the latest timestamp per group. Combined with by UserPrincipalName, it gives you one row per user showing their most recent sign-in with all columns. This is the standard pattern for getting the latest state of each entity.

← 2.3 Time Functions and Joins 2.1 How KQL Works and Key Tables →