6.3 Build Multi-Table Statements Using KQL

8-12 hours · Module 6 · Free

Build Multi-Table Statements Using KQL

SC-200 Exam Objective

Domain 4 — Manage Security Threats: "Create custom hunting queries by using KQL." Multi-table queries are tested directly — you will see exam scenarios requiring you to correlate data from different sources using join and union.

Introduction

Until now, every query you have written targets a single table. But real investigations do not live in one table. A phishing attack spans email data (EmailEvents), sign-in data (SigninLogs), endpoint data (DeviceProcessEvents), and cloud app data (CloudAppEvents). To trace the full attack chain — from phishing email to credential theft to mailbox access to lateral movement — you must query across tables and connect the results.

This subsection teaches the two operators that make cross-table analysis possible: union and join. These are the operators that transform KQL from a log viewer into an investigation engine. The queries in Modules 11-15 (the real-world investigation modules) depend heavily on what you learn here.

Here is what this subsection covers:

union — combining rows from multiple tables into a single result set, useful when the same type of data exists in multiple tables
join — matching rows between two tables on a shared key, used to correlate different types of events that share a common entity (user, IP, device)
Join types — how inner, leftouter, leftanti, and other join kinds control which rows appear in the result, and which join type to use for each investigation scenario
Investigation pattern: email-to-signin correlation — the cross-table query that connects a phishing email to the resulting credential compromise, used in Module 11

Each operator is taught with an investigation scenario that demonstrates why you need it.

Figure 6.7: Union stacks rows vertically (same data type from multiple tables). Join connects rows horizontally on a shared key (different data types about the same entity). Union for "all sign-in data." Join for "email that led to sign-in compromise."

The `union` operator — combining tables vertically

What it is

union takes two or more tables and stacks their rows together into a single result set. Think of it as appending one spreadsheet below another. The columns are merged — columns that exist in both tables appear once, columns unique to one table appear with null values for the other.

Why it matters

Microsoft stores sign-in data in two separate tables: SigninLogs (interactive — user signs in with a browser) and AADNonInteractiveUserSignInLogs (non-interactive — an app refreshes a token). To see ALL sign-in activity for a user, you must query both tables. Without union, you only see half the picture.

How it works

1
2
3
4
union SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(24h)
| where UserPrincipalName has "morrison"
| summarize EventCount = count() by Type

Expected Output

Type	EventCount
SigninLogs	12
AADNonInteractiveUserSignInLogs	847

What to look for: 12 interactive sign-ins and 847 non-interactive. The non-interactive events are token refreshes from applications the user is already signed into. If you only query SigninLogs, you miss 98.6% of the authentication activity. During the Module 11 AiTM investigation, the token replay that confirmed the attacker's access appeared in AADNonInteractiveUserSignInLogs, not SigninLogs. Without union, you would miss the compromise.

Wildcard union

When you need to search across many tables with a common naming pattern:

1
2
3
union Device*
| where TimeGenerated > ago(1h)
| summarize EventCount = count() by Type

This unions all tables starting with “Device” — DeviceProcessEvents, DeviceNetworkEvents, DeviceFileEvents, DeviceLogonEvents, DeviceInfo, DeviceEvents. Useful for broad device investigation when you do not yet know which table contains the evidence.

Wildcard unions can be slow on large workspaces

union * (all tables) should be avoided in production — it scans everything. union Device* is acceptable because it limits to a known table family. Use wildcards for discovery, then narrow to specific tables once you know where the data lives.

The `join` operator — correlating across tables

What it is

join matches rows between two tables based on a shared column value. Unlike union (which stacks rows vertically), join connects rows horizontally — combining columns from both tables into a single row where the key matches.

Why it matters

This is the most powerful investigation tool in KQL. It answers questions that no single table can: “Which users who received the phishing email also had suspicious sign-ins afterward?” “Which devices that triggered a malware alert also made network connections to the C2 IP?” These cross-table correlations are what separate an analyst who can use one tool at a time from one who can trace an entire attack chain.

How it works — the basic join

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
EmailEvents
| where TimeGenerated > ago(7d)
| where SenderFromDomain == "northgate-voicemail.com"
| project EmailTime = TimeGenerated, RecipientEmailAddress, Subject
| join kind=inner (
    SigninLogs
    | where TimeGenerated > ago(7d)
    | where ResultType == "0"
    | project SigninTime = TimeGenerated, UserPrincipalName, IPAddress, AppDisplayName
) on $left.RecipientEmailAddress == $right.UserPrincipalName
| where SigninTime between (EmailTime .. (EmailTime + 1h))
| project EmailTime, SigninTime, RecipientEmailAddress, Subject, IPAddress, AppDisplayName

Read this query carefully — it is the foundation of the Module 11 AiTM investigation.

Line by line:

Start with EmailEvents, filter to emails from the phishing domain
Keep only the relevant email columns, rename TimeGenerated to avoid ambiguity
Join with SigninLogs, filtered to successful sign-ins
Match where the email recipient equals the sign-in user
Filter to sign-ins that occurred within 1 hour AFTER the email was received
Display the correlated timeline

Expected Output — Phishing Email → Credential Compromise Correlation

EmailTime	SigninTime	Recipient	Subject	IPAddress	App
08:02	08:14	j.morrison@...	New voicemail...	198.51.100.44	Exchange Online
08:05	08:22	s.patel@...	New voicemail...	198.51.100.44	Exchange Online

This is the smoking gun: j.morrison received the phishing email at 08:02 and had a successful sign-in from the attacker IP at 08:14 (12 minutes later). s.patel received it at 08:05 and was compromised at 08:22 (17 minutes later). Both users clicked the phishing link, entered their credentials on the AiTM proxy, and the attacker replayed the stolen tokens from 198.51.100.44. One query connected the phishing email to the credential compromise — this is why KQL joins are the most important skill for incident investigation.

Figure 6.8: The three join kinds used in 95% of investigations. inner for confirmed correlations. leftouter for enrichment with optional match data. leftanti for finding what is absent — the most powerful detection pattern.

Join types — choosing the right one

The kind= parameter controls which rows appear in the result. This is critical for investigation — different join types answer different questions.

Join kind	What it returns	Investigation use	Example question it answers
`inner`	Only rows that match in BOTH tables	“Which users who received the email were also compromised?”	Email recipients who also have suspicious sign-ins
`leftouter`	All rows from the left table, with matches from the right (null if no match)	“Show all email recipients and whether they were compromised”	All phishing targets with compromise status
`leftanti`	Rows from the left table that have NO match in the right	“Which email recipients were NOT compromised?”	Users who received phishing but show no suspicious sign-in
`rightouter`	All rows from the right table, with matches from the left	Less common — same as leftouter with tables swapped	—
`fullouter`	All rows from both tables, matching where possible	Rare in investigation	Complete picture of both datasets

The three join kinds you will use 95% of the time

inner — "show me the matches" (both sides must exist). leftouter — "show me everything from the left with matches from the right where available" (preserves all left rows). leftanti — "show me what's in the left that's NOT in the right" (the exclusion pattern). These three handle almost every investigation correlation.

The `leftanti` join — the exclusion pattern

leftanti is particularly powerful for security. It returns rows from the left table that have NO match in the right table. This answers “what is missing?” — which is often more important than “what is present.”

Investigation pattern: token replay detection.

In the Module 11 AiTM investigation, the attacker replayed a stolen token. The stolen token had MFA satisfied (it was captured during the legitimate user’s MFA completion on the proxy). The attacker’s sign-in has AuthenticationRequirement = singleFactorAuthentication (token replay, no MFA challenge) while the legitimate user’s sign-in has AuthenticationRequirement = multiFactorAuthentication.

But the cleaner detection is: find sign-ins that succeeded WITHOUT a corresponding MFA event.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
let successfulSignins =
    SigninLogs
    | where TimeGenerated > ago(24h)
    | where ResultType == "0"
    | where UserPrincipalName has "morrison"
    | project SigninTime = TimeGenerated, CorrelationId, UserPrincipalName, IPAddress;
let mfaEvents =
    AADNonInteractiveUserSignInLogs
    | where TimeGenerated > ago(24h)
    | where UserPrincipalName has "morrison"
    | where AuthenticationRequirement == "multiFactorAuthentication"
    | project CorrelationId;
successfulSignins
| join kind=leftanti mfaEvents on CorrelationId
| project SigninTime, UserPrincipalName, IPAddress

Expected Output — Sign-ins Without MFA (Potential Token Replay)

SigninTime	UserPrincipalName	IPAddress
08:14	j.morrison@...	198.51.100.44

What to look for: A successful sign-in with no corresponding MFA event. The leftanti join returns sign-ins that exist in the success table but do NOT have a matching MFA event. In a properly configured environment, every interactive sign-in should trigger MFA. A success without MFA means either: (1) MFA is not required for this user/app (check your conditional access policies), or (2) a stolen token was replayed, bypassing the MFA prompt. This is the token replay detection pattern from Module 13.

Try it yourself

Write a union query that combines SigninLogs and AADNonInteractiveUserSignInLogs for a specific user over the last 7 days. Summarize the total event count and unique IP count by table name (Type). Then write a separate join query that correlates EmailEvents with SigninLogs to find users who received an email from any external sender and then had a sign-in from a new IP within 30 minutes.

Union query:

1
2
3
4
5
6
7
union SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(7d)
| where UserPrincipalName has "morrison"
| summarize
    EventCount = count(),
    UniqueIPs = dcount(IPAddress)
    by Type

Join query:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
EmailEvents
| where TimeGenerated > ago(7d)
| where SenderFromDomain != "yourdomain.onmicrosoft.com"
| project EmailTime = TimeGenerated, RecipientEmailAddress, SenderFromDomain, Subject
| join kind=inner (
    SigninLogs
    | where TimeGenerated > ago(7d)
    | where ResultType == "0"
    | project SigninTime = TimeGenerated, UserPrincipalName, IPAddress
) on $left.RecipientEmailAddress == $right.UserPrincipalName
| where SigninTime between (EmailTime .. (EmailTime + 30m))
| project EmailTime, SigninTime, RecipientEmailAddress, SenderFromDomain, IPAddress
| order by EmailTime asc

The join query is the investigation backbone for any phishing analysis — it connects the email delivery to the subsequent compromise.

Common join mistakes

Mistake	What happens	How to fix
Column name conflict — both tables have `TimeGenerated`	Query fails or produces ambiguous results	Rename columns with `project` before the join: `project EmailTime = TimeGenerated`
Joining on a column with many duplicates	Cartesian explosion — 100 left rows × 100 right rows = 10,000 result rows	Add additional join conditions or filter more aggressively before the join
Using `inner` when you need `leftouter`	Missing rows that do not have a match — you see only compromised users, not all targeted users	Choose the join kind based on your question: “show matches” = inner, “show all with match status” = leftouter
Forgetting the `$left.` / `$right.` syntax for different column names	Query fails with column not found	When join columns have different names, use `on $left.ColumnA == $right.ColumnB`

Check your understanding

1. You need to see ALL sign-in activity for a user — both interactive (browser) and non-interactive (token refresh). Which operator and which tables?

union SigninLogs, AADNonInteractiveUserSignInLogs — combines both tables vertically. Without this, you miss ~98% of authentication events (the non-interactive token refreshes). In the Module 11 investigation, the token replay evidence was in the non-interactive table, not SigninLogs.

join SigninLogs with AADNonInteractiveUserSignInLogs

Query only SigninLogs — it contains all sign-in data

search "morrison" across all tables

union stacks tables vertically (same type of data from multiple sources). join connects tables horizontally (different types of data about the same entity). Sign-in data split across two tables = union. Correlating email data with sign-in data = join.

2. You want to find users who received a phishing email but were NOT compromised (no suspicious sign-in). Which join kind?

inner — shows only matches (only compromised users)

leftouter — shows all recipients with compromise info where available

leftanti — returns rows from the left (email recipients) that have NO match in the right (sign-in data). These are the users who received the phishing email but did not have a subsequent suspicious sign-in — they either did not click, or clicked but did not enter credentials.

fullouter — shows everything from both tables

leftanti is the "exclusion" join — it finds what is NOT there. This is invaluable in security: "who was targeted but not compromised" is as important as "who was compromised" because it tells you the scope of the attack and which users are safe.

3. Your email-to-signin join returns 50,000 rows when you expected 20. What went wrong?

Cartesian join explosion. The join key has too many matching rows on both sides — every email row is matching with every sign-in row for the same user, creating a product. Fix: add a time window constraint (where SigninTime between (EmailTime .. (EmailTime + 1h))) or additional join conditions to narrow the matches.

The query is looking at too many days

You should use union instead of join

The tables are too large

Cartesian joins are the most common performance problem with join. A user who received 10 emails and had 5,000 sign-ins in a week produces 50,000 rows on an unconstrained join. The time window filter is essential — it limits matches to sign-ins that occurred within a relevant window after the email, cutting the result set dramatically.

← 6.2 Analyze Query Results Using KQL 6.4 Work with String Data in KQL →

6.3 Build Multi-Table Statements Using KQL

Build Multi-Table Statements Using KQL

Introduction

The union operator — combining tables vertically

What it is

Why it matters

How it works

Wildcard union

The join operator — correlating across tables

What it is

Why it matters

How it works — the basic join

Join types — choosing the right one

The leftanti join — the exclusion pattern

Try it yourself

Common join mistakes

Check your understanding

The `union` operator — combining tables vertically

The `join` operator — correlating across tables

The `leftanti` join — the exclusion pattern