1.2 The Query Pipeline — How Operators Chain Left to Right

3-4 hours · Module 1 · Free
Operational Objective
This subsection teaches the query pipeline — how operators chain left to right — a core KQL skill for security investigation and detection engineering in Microsoft Sentinel and Defender XDR. Every concept is demonstrated against security log data from the Northgate Engineering environment.
Deliverable: Working proficiency with the KQL operators and patterns covered in this subsection, demonstrated through security investigation exercises.
Estimated completion: 25 minutes
Figure 1.2 — Operational workflow: Input → Process → Analyse → Decide → Output.

Introduction

Figure — The Query Pipeline — How Operators Chain. Applied to security investigation workflows at NE.

KQL is a pipeline language. You start with a data source (a table), then pipe the data through a sequence of operators using the | (pipe) character. Each operator takes the table from the previous step, transforms it, and passes the result to the next operator. The final operator's output is the query result.

This is fundamentally different from SQL, where you write a declarative statement and the engine decides the execution order. In KQL, you control the execution order explicitly — the operators execute top to bottom, left to right, exactly as you write them. The order you choose affects both the result and the performance.

The anatomy of a KQL query

SigninLogs                              // Step 1: start with a table
| where TimeGenerated > ago(24h)       // Step 2: filter rows by time
| where ResultType != "0"              // Step 3: filter to failures only
| extend Country = tostring(           // Step 4: extract country from
    LocationDetails.countryOrRegion)   //         dynamic column
| summarize                            // Step 5: aggregate
    FailCount = count(),
    DistinctUsers = dcount(UserPrincipalName)
    by IPAddress, Country
| where FailCount > 20                 // Step 6: filter aggregated results
| sort by FailCount desc               // Step 7: order output
// SLOW: summarize all data, then filter
SigninLogs
| where TimeGenerated > ago(30d)
| summarize count() by UserPrincipalName, IPAddress
| where UserPrincipalName == "j.morrison@northgateeng.com"

// FAST: filter first, then summarize the small dataset
SigninLogs
| where TimeGenerated > ago(30d)
| where UserPrincipalName == "j.morrison@northgateeng.com"
| summarize count() by IPAddress
// Extend BEFORE summarize: Country is available for grouping
SigninLogs
| where TimeGenerated > ago(24h)
| extend Country = tostring(LocationDetails.countryOrRegion)
| summarize count() by Country

// Extend AFTER summarize: Country does not exist yet — ERROR
SigninLogs
| where TimeGenerated > ago(24h)
| summarize count() by UserPrincipalName
| extend Country = tostring(LocationDetails.countryOrRegion) // ERROR: LocationDetails doesn't exist after summarize
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType != "0"
| extend Country = tostring(LocationDetails.countryOrRegion)
// | summarize FailCount = count() by IPAddress, Country   // <-- commenting this out
// | where FailCount > 20                                    // <-- and this
// | sort by FailCount desc                                  // <-- and this
// Run each of these separately:
SigninLogs | where TimeGenerated > ago(24h) | count
// → 1,200,000

SigninLogs | where TimeGenerated > ago(24h) | where ResultType != "0" | count
// → 45,000

SigninLogs | where TimeGenerated > ago(24h) | where ResultType != "0" 
| extend Country = tostring(LocationDetails.countryOrRegion) | count
// → 45,000 (extend does not change row count)

SigninLogs | where TimeGenerated > ago(24h) | where ResultType != "0"
| extend Country = tostring(LocationDetails.countryOrRegion)
| summarize FailCount = count() by IPAddress, Country | count
// → 3,200 (aggregation reduces to unique groups)
// SLOW: join two full tables (millions × millions)
SigninLogs
| where TimeGenerated > ago(24h)
| join kind=inner OfficeActivity on $left.UserPrincipalName == $right.UserId

// FAST: aggregate each, then join the summaries (thousands × thousands)
let signInSummary = SigninLogs
    | where TimeGenerated > ago(24h)
    | summarize SignIns = count() by UserPrincipalName;
let activitySummary = OfficeActivity
    | where TimeGenerated > ago(24h)
    | summarize Actions = count() by UserId;
signInSummary
| join kind=leftouter activitySummary 
    on $left.UserPrincipalName == $right.UserId
// Query A: extend THEN where
SigninLogs
| where TimeGenerated > ago(24h)
| extend Country = tostring(LocationDetails.countryOrRegion)
| where Country == "US"
| summarize count() by UserPrincipalName

// Query B: where THEN extend
SigninLogs
| where TimeGenerated > ago(24h)
| where tostring(LocationDetails.countryOrRegion) == "US"
| summarize count() by UserPrincipalName
// Query C: summarize THEN extend
SigninLogs
| where TimeGenerated > ago(24h)
| summarize TotalEvents = count(), FailCount = countif(ResultType != "0") by UserPrincipalName
| extend FailRate = round(100.0 * FailCount / TotalEvents, 1)
| where FailRate > 50

// Query D: extend THEN summarize (WRONG)
SigninLogs
| where TimeGenerated > ago(24h)
| extend FailRate = round(100.0 * countif(ResultType != "0") / count(), 1)  // ERROR
| where FailRate > 50
// Step 1: start with the table and time filter — verify data exists
SigninLogs
| where TimeGenerated > ago(1h)
| count
// → 12,500 rows. Good — data exists.

// Step 2: add the entity filter — verify the user has activity
SigninLogs
| where TimeGenerated > ago(1h)
| where UserPrincipalName =~ "suspect@northgateeng.com"
| count
// → 47 rows. Good — the user has recent sign-ins.

// Step 3: add projection — see the relevant columns
SigninLogs
| where TimeGenerated > ago(1h)
| where UserPrincipalName =~ "suspect@northgateeng.com"
| project TimeGenerated, IPAddress, AppDisplayName, ResultType, 
    tostring(LocationDetails.countryOrRegion)
// → 47 rows with 5 columns. Review the data — what patterns emerge?

// Step 4: add aggregation based on what you observed
SigninLogs
| where TimeGenerated > ago(1h)
| where UserPrincipalName =~ "suspect@northgateeng.com"
| summarize count() by IPAddress, tostring(LocationDetails.countryOrRegion)
// → 3 rows: 2 IPs from GB, 1 IP from NG. The Nigerian IP is suspicious.

// Step 5: focus on the suspicious IP
SigninLogs
| where TimeGenerated > ago(1h)
| where UserPrincipalName =~ "suspect@northgateeng.com"
| where IPAddress == "198.51.100.44"
| project TimeGenerated, AppDisplayName, ResultType
| sort by TimeGenerated asc
// → Timeline of the adversary session.

Each step produces a table that becomes the input for the next step:

Step  Input rows       Operation        Output rows               Output columns
1     All SigninLogs   Table reference  All rows                  All columns
2     All rows         Time filter      Last 24h only             All columns
3     Last 24h         Status filter    Failures only             All columns
4     Failures         Add column       Failures                  All columns + Country
5     Failures         Aggregate        One row per IP+Country    IPAddress, Country, FailCount, DistinctUsers
6     Aggregated       Filter           Only IPs with >20 fails   Same 4 columns
7     Filtered         Sort             Same rows, ordered        Same 4 columns

Why order matters

The order of operators changes both the result and the performance.

Performance impact — filter early:

The first query summarizes every user across 30 days, then filters to one user. The second query filters to one user first, then summarizes only their data. The result is the same, but the second query processes orders of magnitude less data.

Rule: place where filters as early as possible in the pipeline. The earlier you reduce the row count, the less work every subsequent operator does.

Result impact — extend before or after summarize:

After summarize, the only columns that exist are the by columns and the aggregation columns. All other columns from the original table are gone. If you need a column in the output, you must either include it in the by clause, aggregate it (e.g., make_set()), or compute it before the summarize.
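
In practice the two workarounds look like this. A sketch against the same SigninLogs columns used above; make_set() collects every distinct value it sees into a dynamic array:

// Option 1: compute the column before summarize and group by it
SigninLogs
| where TimeGenerated > ago(24h)
| extend Country = tostring(LocationDetails.countryOrRegion)
| summarize FailCount = count() by IPAddress, Country

// Option 2: aggregate the computed column instead of grouping by it
SigninLogs
| where TimeGenerated > ago(24h)
| extend Country = tostring(LocationDetails.countryOrRegion)
| summarize FailCount = count(), Countries = make_set(Country) by IPAddress
// Countries is a dynamic array of every country seen for that IP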

Debugging the pipeline — isolating operator behavior

When a query produces unexpected results, the debugging technique is always the same: comment out operators from the bottom up until the output makes sense, then re-add them one at a time to find the operator that introduces the unexpected behavior.

Run the partial query. If the filtered rows look correct, uncomment the summarize and run again. If the aggregated output looks wrong, the issue is in the summarize — perhaps the by columns are not what you expected, or the aggregation function is not producing the right values.

This "peel from the bottom" technique works because the pipeline is sequential — every operator depends only on what came before it. Removing the last operator cannot change the behavior of earlier operators.

The | count debugging trick:

Add | count after each operator to see how many rows survive each stage:

The row count at each stage tells you exactly where the data volume changes. If a where clause drops more rows than expected, that filter is either too aggressive or the data does not contain what you assumed.

Pipeline optimization principles

Beyond "filter early," two additional optimization principles govern pipeline construction:

Principle 2: Reduce column width early.

Wide rows (many columns) consume more memory at every stage. If you only need 5 columns from a 50-column table, project after the first where to narrow the data carried through the pipeline. This matters most for tables like DeviceProcessEvents that have 30+ columns including large string fields (ProcessCommandLine).
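
A sketch of the pattern (the DeviceProcessEvents column names follow the standard advanced hunting schema; the filter values are illustrative):

DeviceProcessEvents
| where TimeGenerated > ago(24h)
| where FileName =~ "powershell.exe"
| project TimeGenerated, DeviceName, AccountName, ProcessCommandLine   // narrow here: only 4 columns travel on
| where ProcessCommandLine has "-EncodedCommand"
| sort by TimeGenerated desc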

Principle 3: Aggregate before joining.

If you need to join two large tables, aggregate each table first to reduce the row count, then join the smaller aggregated tables:

The aggregated version joins ~500 rows to ~500 rows instead of 1,000,000 to 500,000. Module 4 covers join performance in depth.

The operator execution order matters for correctness, not just performance

Consider Query A and Query B above, which apply the extend and the where in a different order. Both produce identical results: filtering on the value computed inline (Query B) keeps exactly the same rows as computing it with extend first (Query A). Now compare Query C and Query D.

Query D fails because countif and count are aggregation functions — they cannot be used inside extend. Aggregation functions only work inside summarize. The pipeline enforces this: extend operates on individual rows, summarize operates on groups of rows. You must group first (summarize), then compute derived metrics on the grouped output (extend).

Building a pipeline incrementally during investigation

Experienced analysts do not write complete queries from scratch. They build incrementally, adding one operator at a time and verifying the output at each step:

This five-step sequence takes 2 minutes. Each step validates the previous step's output before adding complexity. If Step 2 returned zero rows, you would adjust the time range or check the UPN spelling — not add 4 more operators to a query that has no data.

This is the pipeline methodology: start broad, verify, narrow, verify, analyze.

The operator categories

Every KQL operator falls into one of these categories:

Tabular source operators — produce the initial table:

Operator                       What it does
Table name (e.g., SigninLogs)  Returns all rows from the named table
union                          Combines rows from multiple tables
externaldata                   Reads data from an external file
datatable                      Creates a literal table inline
find                           Searches across multiple tables for a pattern

Filter operators — reduce the number of rows:

Operator      What it does
where         Keeps rows that match a condition
take / limit  Keeps only N rows
top           Keeps N rows with highest/lowest values
distinct      Removes duplicate rows
sample        Returns a random sample of rows

Transform operators — change the columns:

Operator         What it does
extend           Adds new columns (original columns preserved)
project          Selects and renames columns (unlisted columns dropped)
project-away     Removes specific columns (others preserved)
project-rename   Renames columns
project-reorder  Changes column order
parse            Extracts fields from a string column
mv-expand        Expands a dynamic array into one row per element
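
Of these, mv-expand is the least intuitive, so here is a small self-contained illustration (the inline rows are invented for demonstration):

// mv-expand turns one row holding an array into one row per element
datatable(User:string, IPs:dynamic) [
    "j.morrison@northgateeng.com", dynamic(["10.0.0.5", "198.51.100.44"])
]
| mv-expand IP = IPs
| project User, IP = tostring(IP)
// → two rows, one per IP address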

Aggregation operators — reduce rows by grouping:

Operator     What it does
summarize    Groups rows and computes aggregations
make-series  Creates time-series data from events
count        Returns the total row count

Sort operators — order the rows:

Operator            What it does
sort by / order by  Orders rows by column values
top                 Combines sort + take in one operation
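
The equivalence is worth seeing once. Building on the failed-sign-in aggregation used throughout this subsection, these two pipelines return the same ten rows (tie-breaking at the cut-off aside):

// Two operators:
SigninLogs
| where TimeGenerated > ago(24h)
| summarize FailCount = countif(ResultType != "0") by IPAddress
| sort by FailCount desc
| take 10

// One operator, same result:
SigninLogs
| where TimeGenerated > ago(24h)
| summarize FailCount = countif(ResultType != "0") by IPAddress
| top 10 by FailCount desc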

Join operators — combine tables:

Operator  What it does
join      Combines rows from two tables on matching keys
lookup    Enriches rows from a reference table
union     Concatenates rows from multiple tables

You do not need to memorise these tables — they are for reference. What you need to understand is that every operator has a defined input shape (a table) and an output shape (a table), and the pipeline chains these transformations sequentially.
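
You can verify the table-in, table-out contract without touching workspace data by piping an inline datatable through a short pipeline (the rows are invented; 50126 is the Entra ID result code for a wrong password):

// Every stage receives a table and emits a table
datatable(User:string, ResultType:string) [
    "alice", "0",
    "alice", "50126",
    "bob",   "0"
]
| where ResultType != "0"              // 3 rows in, 1 row out
| extend Failed = true                 // 1 row in, 1 row out, one extra column
| summarize Fails = count() by User    // 1 row in, 1 group out: alice, 1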

SigninLogs                                         // | count → ?
| where TimeGenerated > ago(24h)                  // | count → ?
| where ResultType != "0"                         // | count → ?
| summarize count() by IPAddress                  // | count → ?
| where count_ > 10                               // | count → ?

Try it yourself

Take a query you wrote recently and add | count after every operator to see how the row count changes at each stage. For example:

Run each version (uncomment one | count at a time). This shows you exactly how many rows each operator processes — and where the biggest reductions happen. The operator with the largest row reduction is your most important filter.

Check your understanding

1. You write a query: SigninLogs | summarize count() by IPAddress | where IPAddress has "10.0.". A colleague suggests rewriting it as: SigninLogs | where IPAddress has "10.0." | summarize count() by IPAddress. Both return the same result. Which is better and why?

  a) The colleague's version, because it filters before it summarizes
  b) Your version, because summarize creates an index
  c) Neither: they are identical in performance because the engine optimizes both

Answer: (a). The colleague's version filters to "10.0." IPs first, then summarizes only those rows. Your version summarizes ALL rows (computing counts for every IP address in the table), then discards all the non-"10.0." results. The total work is much higher: the summarize processes every row in the table when it only needs to process the subset matching "10.0.". This is the "filter early" principle: place where clauses as early in the pipeline as possible to reduce the data volume before expensive operations like summarize.
Compliance Myth: "You can learn the query pipeline from documentation alone"

The myth: Reading the Microsoft KQL documentation is sufficient to learn this concept. The documentation explains the syntax — what more do you need?

The reality: Documentation teaches SYNTAX. This course teaches APPLICATION. The KQL docs show how summarize works with generic examples. This course shows how summarize reveals a password spray pattern hiding in 45,000 sign-in events. The difference between knowing the syntax and knowing WHEN and WHY to use it in a security context is the difference between an analyst who copies queries and one who writes them. Every operator in this course is taught through a security investigation scenario — not abstract data manipulation.

Troubleshooting

"The query returns an error I do not understand." KQL error messages reference the specific line and operator that failed. Read the error message from left to right: it names the operator, the expected input type, and the actual input type. Most errors are type mismatches (passing a string where a datetime is expected) or field name typos. The getschema operator shows every field name and type for any table: TableName | getschema.

"The query runs but returns unexpected results." Add | take 10 after each operator in the pipeline and examine the intermediate output. This reveals WHERE the data transforms in a way you did not expect. Debug the pipeline stage by stage, not the entire query at once.


Decision point

You are writing a KQL query and cannot remember the exact operator syntax. Do you check the Microsoft documentation or ask Claude?

Check the documentation first for SYNTAX questions (exact parameter names, return types, operator behavior). The KQL documentation at learn.microsoft.com/kusto is the authoritative source. Claude may generate plausible but incorrect syntax — especially for newer operators or recently changed behavior. Use Claude for DESIGN questions: 'I need to detect accounts with sign-ins from more than 3 countries in 24 hours — what is the best KQL approach?' Claude excels at designing the query strategy; the documentation excels at providing the exact syntax.

You've learned how KQL processes data.

K0 gave you the query language's place in the Microsoft security stack. K1 took you through the semantics — tables, operators, the pipe model, and why KQL isn't SQL. Now you write the queries that find what attackers hope you miss.

  • 12 modules of query craft — filtering and shaping, joins and unions, time-series analysis, summarisation, string manipulation, and geospatial analysis
  • 68 KQL exercises — every one with a realistic dataset, a reference solution, and a discussion of alternative approaches
  • K11 — Threat Hunting with KQL — the course's flagship module. Hypothesis-driven methodology, MITRE ATT&CK-aligned hunting across 7 techniques, UEBA composite risk scoring, and retroactive IOC sweeps
  • K13 Capstone — The Hunting Lab — three complete investigation scenarios requiring every query skill from the course
  • Hunt management and ROI metrics — the operating model that justifies KQL hunt programs to leadership