6.3 Build Multi-Table Statements Using KQL

8-12 hours · Module 6 · Free

Build Multi-Table Statements Using KQL

SC-200 Exam Objective

Domain 4 — Manage Security Threats: "Create custom hunting queries by using KQL." Multi-table queries are tested directly — you will see exam scenarios requiring you to correlate data from different sources using join and union.

Introduction

Until now, every query you have written targets a single table. But real investigations do not live in one table. A phishing attack spans email data (EmailEvents), sign-in data (SigninLogs), endpoint data (DeviceProcessEvents), and cloud app data (CloudAppEvents). To trace the full attack chain — from phishing email to credential theft to mailbox access to lateral movement — you must query across tables and connect the results.

This subsection teaches the two operators that make cross-table analysis possible: union and join. These are the operators that transform KQL from a log viewer into an investigation engine. The queries in Modules 11-15 (the real-world investigation modules) depend heavily on what you learn here.

Here is what this subsection covers:

  • union — combining rows from multiple tables into a single result set, useful when the same type of data exists in multiple tables
  • join — matching rows between two tables on a shared key, used to correlate different types of events that share a common entity (user, IP, device)
  • Join types — how inner, leftouter, leftanti, and other join kinds control which rows appear in the result, and which join type to use for each investigation scenario
  • Investigation pattern: email-to-signin correlation — the cross-table query that connects a phishing email to the resulting credential compromise, used in Module 11

Each operator is taught with an investigation scenario that demonstrates why you need it.


UNION vs JOIN — TWO WAYS TO COMBINE TABLESUNION (stack vertically)SigninLogsAADNonInteractiveCombined rows12 interactive+ 847 non-interactive= 859 totalSame data type, different tablesJOIN (connect horizontally)EmailEventsSigninLogson UserCorrelated rowsEmail + Sign-in for same user= Attack chainDifferent data types, shared entity
Figure 6.7: Union stacks rows vertically (same data type from multiple tables). Join connects rows horizontally on a shared key (different data types about the same entity). Union for "all sign-in data." Join for "email that led to sign-in compromise."

The union operator — combining tables vertically

What it is

union takes two or more tables and stacks their rows together into a single result set. Think of it as appending one spreadsheet below another. The columns are merged — columns that exist in both tables appear once, columns unique to one table appear with null values for the other.

Why it matters

Microsoft stores sign-in data in two separate tables: SigninLogs (interactive — user signs in with a browser) and AADNonInteractiveUserSignInLogs (non-interactive — an app refreshes a token). To see ALL sign-in activity for a user, you must query both tables. Without union, you only see half the picture.

How it works

1
2
3
4
union SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(24h)
| where UserPrincipalName has "morrison"
| summarize EventCount = count() by Type
Expected Output
TypeEventCount
SigninLogs12
AADNonInteractiveUserSignInLogs847
What to look for: 12 interactive sign-ins and 847 non-interactive. The non-interactive events are token refreshes from applications the user is already signed into. If you only query SigninLogs, you miss 98.6% of the authentication activity. During the Module 11 AiTM investigation, the token replay that confirmed the attacker's access appeared in AADNonInteractiveUserSignInLogs, not SigninLogs. Without union, you would miss the compromise.

Wildcard union

When you need to search across many tables with a common naming pattern:

1
2
3
union Device*
| where TimeGenerated > ago(1h)
| summarize EventCount = count() by Type

This unions all tables starting with “Device” — DeviceProcessEvents, DeviceNetworkEvents, DeviceFileEvents, DeviceLogonEvents, DeviceInfo, DeviceEvents. Useful for broad device investigation when you do not yet know which table contains the evidence.

Wildcard unions can be slow on large workspaces

union * (all tables) should be avoided in production — it scans everything. union Device* is acceptable because it limits to a known table family. Use wildcards for discovery, then narrow to specific tables once you know where the data lives.


The join operator — correlating across tables

What it is

join matches rows between two tables based on a shared column value. Unlike union (which stacks rows vertically), join connects rows horizontally — combining columns from both tables into a single row where the key matches.

Why it matters

This is the most powerful investigation tool in KQL. It answers questions that no single table can: “Which users who received the phishing email also had suspicious sign-ins afterward?” “Which devices that triggered a malware alert also made network connections to the C2 IP?” These cross-table correlations are what separate an analyst who can use one tool at a time from one who can trace an entire attack chain.

How it works — the basic join

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
EmailEvents
| where TimeGenerated > ago(7d)
| where SenderFromDomain == "northgate-voicemail.com"
| project EmailTime = TimeGenerated, RecipientEmailAddress, Subject
| join kind=inner (
    SigninLogs
    | where TimeGenerated > ago(7d)
    | where ResultType == "0"
    | project SigninTime = TimeGenerated, UserPrincipalName, IPAddress, AppDisplayName
) on $left.RecipientEmailAddress == $right.UserPrincipalName
| where SigninTime between (EmailTime .. (EmailTime + 1h))
| project EmailTime, SigninTime, RecipientEmailAddress, Subject, IPAddress, AppDisplayName

Read this query carefully — it is the foundation of the Module 11 AiTM investigation.

Line by line:

  1. Start with EmailEvents, filter to emails from the phishing domain
  2. Keep only the relevant email columns, rename TimeGenerated to avoid ambiguity
  3. Join with SigninLogs, filtered to successful sign-ins
  4. Match where the email recipient equals the sign-in user
  5. Filter to sign-ins that occurred within 1 hour AFTER the email was received
  6. Display the correlated timeline
Expected Output — Phishing Email → Credential Compromise Correlation
EmailTimeSigninTimeRecipientSubjectIPAddressApp
08:0208:14j.morrison@...New voicemail...198.51.100.44Exchange Online
08:0508:22s.patel@...New voicemail...198.51.100.44Exchange Online
This is the smoking gun: j.morrison received the phishing email at 08:02 and had a successful sign-in from the attacker IP at 08:14 (12 minutes later). s.patel received it at 08:05 and was compromised at 08:22 (17 minutes later). Both users clicked the phishing link, entered their credentials on the AiTM proxy, and the attacker replayed the stolen tokens from 198.51.100.44. One query connected the phishing email to the credential compromise — this is why KQL joins are the most important skill for incident investigation.
JOIN KINDS — WHAT EACH RETURNSinner"Show me the matches"Only rows in BOTH tablesleftouter"Show all left + matches"All left rows, match info if existsleftanti"Show what's NOT matched"Left rows with NO match in right
Figure 6.8: The three join kinds used in 95% of investigations. inner for confirmed correlations. leftouter for enrichment with optional match data. leftanti for finding what is absent — the most powerful detection pattern.

Join types — choosing the right one

The kind= parameter controls which rows appear in the result. This is critical for investigation — different join types answer different questions.

Join kindWhat it returnsInvestigation useExample question it answers
innerOnly rows that match in BOTH tables“Which users who received the email were also compromised?”Email recipients who also have suspicious sign-ins
leftouterAll rows from the left table, with matches from the right (null if no match)“Show all email recipients and whether they were compromised”All phishing targets with compromise status
leftantiRows from the left table that have NO match in the right“Which email recipients were NOT compromised?”Users who received phishing but show no suspicious sign-in
rightouterAll rows from the right table, with matches from the leftLess common — same as leftouter with tables swapped
fullouterAll rows from both tables, matching where possibleRare in investigationComplete picture of both datasets
The three join kinds you will use 95% of the time

inner — "show me the matches" (both sides must exist). leftouter — "show me everything from the left with matches from the right where available" (preserves all left rows). leftanti — "show me what's in the left that's NOT in the right" (the exclusion pattern). These three handle almost every investigation correlation.

The leftanti join — the exclusion pattern

leftanti is particularly powerful for security. It returns rows from the left table that have NO match in the right table. This answers “what is missing?” — which is often more important than “what is present.”

Investigation pattern: token replay detection.

In the Module 11 AiTM investigation, the attacker replayed a stolen token. The stolen token had MFA satisfied (it was captured during the legitimate user’s MFA completion on the proxy). The attacker’s sign-in has AuthenticationRequirement = singleFactorAuthentication (token replay, no MFA challenge) while the legitimate user’s sign-in has AuthenticationRequirement = multiFactorAuthentication.

But the cleaner detection is: find sign-ins that succeeded WITHOUT a corresponding MFA event.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
let successfulSignins =
    SigninLogs
    | where TimeGenerated > ago(24h)
    | where ResultType == "0"
    | where UserPrincipalName has "morrison"
    | project SigninTime = TimeGenerated, CorrelationId, UserPrincipalName, IPAddress;
let mfaEvents =
    AADNonInteractiveUserSignInLogs
    | where TimeGenerated > ago(24h)
    | where UserPrincipalName has "morrison"
    | where AuthenticationRequirement == "multiFactorAuthentication"
    | project CorrelationId;
successfulSignins
| join kind=leftanti mfaEvents on CorrelationId
| project SigninTime, UserPrincipalName, IPAddress
Expected Output — Sign-ins Without MFA (Potential Token Replay)
SigninTimeUserPrincipalNameIPAddress
08:14j.morrison@...198.51.100.44
What to look for: A successful sign-in with no corresponding MFA event. The leftanti join returns sign-ins that exist in the success table but do NOT have a matching MFA event. In a properly configured environment, every interactive sign-in should trigger MFA. A success without MFA means either: (1) MFA is not required for this user/app (check your conditional access policies), or (2) a stolen token was replayed, bypassing the MFA prompt. This is the token replay detection pattern from Module 13.

Try it yourself

Write a union query that combines SigninLogs and AADNonInteractiveUserSignInLogs for a specific user over the last 7 days. Summarize the total event count and unique IP count by table name (Type). Then write a separate join query that correlates EmailEvents with SigninLogs to find users who received an email from any external sender and then had a sign-in from a new IP within 30 minutes.

Union query:

1
2
3
4
5
6
7
union SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(7d)
| where UserPrincipalName has "morrison"
| summarize
    EventCount = count(),
    UniqueIPs = dcount(IPAddress)
    by Type

Join query:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
EmailEvents
| where TimeGenerated > ago(7d)
| where SenderFromDomain != "yourdomain.onmicrosoft.com"
| project EmailTime = TimeGenerated, RecipientEmailAddress, SenderFromDomain, Subject
| join kind=inner (
    SigninLogs
    | where TimeGenerated > ago(7d)
    | where ResultType == "0"
    | project SigninTime = TimeGenerated, UserPrincipalName, IPAddress
) on $left.RecipientEmailAddress == $right.UserPrincipalName
| where SigninTime between (EmailTime .. (EmailTime + 30m))
| project EmailTime, SigninTime, RecipientEmailAddress, SenderFromDomain, IPAddress
| order by EmailTime asc

The join query is the investigation backbone for any phishing analysis — it connects the email delivery to the subsequent compromise.

Common join mistakes

MistakeWhat happensHow to fix
Column name conflict — both tables have TimeGeneratedQuery fails or produces ambiguous resultsRename columns with project before the join: project EmailTime = TimeGenerated
Joining on a column with many duplicatesCartesian explosion — 100 left rows × 100 right rows = 10,000 result rowsAdd additional join conditions or filter more aggressively before the join
Using inner when you need leftouterMissing rows that do not have a match — you see only compromised users, not all targeted usersChoose the join kind based on your question: “show matches” = inner, “show all with match status” = leftouter
Forgetting the $left. / $right. syntax for different column namesQuery fails with column not foundWhen join columns have different names, use on $left.ColumnA == $right.ColumnB

Check your understanding

1. You need to see ALL sign-in activity for a user — both interactive (browser) and non-interactive (token refresh). Which operator and which tables?

union SigninLogs, AADNonInteractiveUserSignInLogs — combines both tables vertically. Without this, you miss ~98% of authentication events (the non-interactive token refreshes). In the Module 11 investigation, the token replay evidence was in the non-interactive table, not SigninLogs.
join SigninLogs with AADNonInteractiveUserSignInLogs
Query only SigninLogs — it contains all sign-in data
search "morrison" across all tables

2. You want to find users who received a phishing email but were NOT compromised (no suspicious sign-in). Which join kind?

inner — shows only matches (only compromised users)
leftouter — shows all recipients with compromise info where available
leftanti — returns rows from the left (email recipients) that have NO match in the right (sign-in data). These are the users who received the phishing email but did not have a subsequent suspicious sign-in — they either did not click, or clicked but did not enter credentials.
fullouter — shows everything from both tables

3. Your email-to-signin join returns 50,000 rows when you expected 20. What went wrong?

Cartesian join explosion. The join key has too many matching rows on both sides — every email row is matching with every sign-in row for the same user, creating a product. Fix: add a time window constraint (where SigninTime between (EmailTime .. (EmailTime + 1h))) or additional join conditions to narrow the matches.
The query is looking at too many days
You should use union instead of join
The tables are too large