6.3 Build Multi-Table Statements Using KQL
Build Multi-Table Statements Using KQL
Domain 4 — Manage Security Threats: "Create custom hunting queries by using KQL." Multi-table queries are tested directly — you will see exam scenarios requiring you to correlate data from different sources using join and union.
Introduction
Until now, every query you have written targets a single table. But real investigations do not live in one table. A phishing attack spans email data (EmailEvents), sign-in data (SigninLogs), endpoint data (DeviceProcessEvents), and cloud app data (CloudAppEvents). To trace the full attack chain — from phishing email to credential theft to mailbox access to lateral movement — you must query across tables and connect the results.
This subsection teaches the two operators that make cross-table analysis possible: union and join. These are the operators that transform KQL from a log viewer into an investigation engine. The queries in Modules 11-15 (the real-world investigation modules) depend heavily on what you learn here.
Here is what this subsection covers:
union— combining rows from multiple tables into a single result set, useful when the same type of data exists in multiple tablesjoin— matching rows between two tables on a shared key, used to correlate different types of events that share a common entity (user, IP, device)- Join types — how
inner,leftouter,leftanti, and other join kinds control which rows appear in the result, and which join type to use for each investigation scenario - Investigation pattern: email-to-signin correlation — the cross-table query that connects a phishing email to the resulting credential compromise, used in Module 11
Each operator is taught with an investigation scenario that demonstrates why you need it.
The union operator — combining tables vertically
What it is
union takes two or more tables and stacks their rows together into a single result set. Think of it as appending one spreadsheet below another. The columns are merged — columns that exist in both tables appear once, columns unique to one table appear with null values for the other.
Why it matters
Microsoft stores sign-in data in two separate tables: SigninLogs (interactive — user signs in with a browser) and AADNonInteractiveUserSignInLogs (non-interactive — an app refreshes a token). To see ALL sign-in activity for a user, you must query both tables. Without union, you only see half the picture.
How it works
| |
| Type | EventCount |
|---|---|
| SigninLogs | 12 |
| AADNonInteractiveUserSignInLogs | 847 |
union, you would miss the compromise.Wildcard union
When you need to search across many tables with a common naming pattern:
| |
This unions all tables starting with “Device” — DeviceProcessEvents, DeviceNetworkEvents, DeviceFileEvents, DeviceLogonEvents, DeviceInfo, DeviceEvents. Useful for broad device investigation when you do not yet know which table contains the evidence.
union * (all tables) should be avoided in production — it scans everything. union Device* is acceptable because it limits to a known table family. Use wildcards for discovery, then narrow to specific tables once you know where the data lives.
The join operator — correlating across tables
What it is
join matches rows between two tables based on a shared column value. Unlike union (which stacks rows vertically), join connects rows horizontally — combining columns from both tables into a single row where the key matches.
Why it matters
This is the most powerful investigation tool in KQL. It answers questions that no single table can: “Which users who received the phishing email also had suspicious sign-ins afterward?” “Which devices that triggered a malware alert also made network connections to the C2 IP?” These cross-table correlations are what separate an analyst who can use one tool at a time from one who can trace an entire attack chain.
How it works — the basic join
| |
Read this query carefully — it is the foundation of the Module 11 AiTM investigation.
Line by line:
- Start with EmailEvents, filter to emails from the phishing domain
- Keep only the relevant email columns, rename TimeGenerated to avoid ambiguity
- Join with SigninLogs, filtered to successful sign-ins
- Match where the email recipient equals the sign-in user
- Filter to sign-ins that occurred within 1 hour AFTER the email was received
- Display the correlated timeline
| EmailTime | SigninTime | Recipient | Subject | IPAddress | App |
|---|---|---|---|---|---|
| 08:02 | 08:14 | j.morrison@... | New voicemail... | 198.51.100.44 | Exchange Online |
| 08:05 | 08:22 | s.patel@... | New voicemail... | 198.51.100.44 | Exchange Online |
inner for confirmed correlations. leftouter for enrichment with optional match data. leftanti for finding what is absent — the most powerful detection pattern.Join types — choosing the right one
The kind= parameter controls which rows appear in the result. This is critical for investigation — different join types answer different questions.
| Join kind | What it returns | Investigation use | Example question it answers |
|---|---|---|---|
inner | Only rows that match in BOTH tables | “Which users who received the email were also compromised?” | Email recipients who also have suspicious sign-ins |
leftouter | All rows from the left table, with matches from the right (null if no match) | “Show all email recipients and whether they were compromised” | All phishing targets with compromise status |
leftanti | Rows from the left table that have NO match in the right | “Which email recipients were NOT compromised?” | Users who received phishing but show no suspicious sign-in |
rightouter | All rows from the right table, with matches from the left | Less common — same as leftouter with tables swapped | — |
fullouter | All rows from both tables, matching where possible | Rare in investigation | Complete picture of both datasets |
inner — "show me the matches" (both sides must exist). leftouter — "show me everything from the left with matches from the right where available" (preserves all left rows). leftanti — "show me what's in the left that's NOT in the right" (the exclusion pattern). These three handle almost every investigation correlation.
The leftanti join — the exclusion pattern
leftanti is particularly powerful for security. It returns rows from the left table that have NO match in the right table. This answers “what is missing?” — which is often more important than “what is present.”
Investigation pattern: token replay detection.
In the Module 11 AiTM investigation, the attacker replayed a stolen token. The stolen token had MFA satisfied (it was captured during the legitimate user’s MFA completion on the proxy). The attacker’s sign-in has AuthenticationRequirement = singleFactorAuthentication (token replay, no MFA challenge) while the legitimate user’s sign-in has AuthenticationRequirement = multiFactorAuthentication.
But the cleaner detection is: find sign-ins that succeeded WITHOUT a corresponding MFA event.
| |
| SigninTime | UserPrincipalName | IPAddress |
|---|---|---|
| 08:14 | j.morrison@... | 198.51.100.44 |
leftanti join returns sign-ins that exist in the success table but do NOT have a matching MFA event. In a properly configured environment, every interactive sign-in should trigger MFA. A success without MFA means either: (1) MFA is not required for this user/app (check your conditional access policies), or (2) a stolen token was replayed, bypassing the MFA prompt. This is the token replay detection pattern from Module 13.Try it yourself
union query that combines SigninLogs and AADNonInteractiveUserSignInLogs for a specific user over the last 7 days. Summarize the total event count and unique IP count by table name (Type). Then write a separate join query that correlates EmailEvents with SigninLogs to find users who received an email from any external sender and then had a sign-in from a new IP within 30 minutes.Union query:
| |
Join query:
| |
The join query is the investigation backbone for any phishing analysis — it connects the email delivery to the subsequent compromise.
Common join mistakes
| Mistake | What happens | How to fix |
|---|---|---|
Column name conflict — both tables have TimeGenerated | Query fails or produces ambiguous results | Rename columns with project before the join: project EmailTime = TimeGenerated |
| Joining on a column with many duplicates | Cartesian explosion — 100 left rows × 100 right rows = 10,000 result rows | Add additional join conditions or filter more aggressively before the join |
Using inner when you need leftouter | Missing rows that do not have a match — you see only compromised users, not all targeted users | Choose the join kind based on your question: “show matches” = inner, “show all with match status” = leftouter |
Forgetting the $left. / $right. syntax for different column names | Query fails with column not found | When join columns have different names, use on $left.ColumnA == $right.ColumnB |
Check your understanding
1. You need to see ALL sign-in activity for a user — both interactive (browser) and non-interactive (token refresh). Which operator and which tables?
union stacks tables vertically (same type of data from multiple sources). join connects tables horizontally (different types of data about the same entity). Sign-in data split across two tables = union. Correlating email data with sign-in data = join.2. You want to find users who received a phishing email but were NOT compromised (no suspicious sign-in). Which join kind?
leftanti is the "exclusion" join — it finds what is NOT there. This is invaluable in security: "who was targeted but not compromised" is as important as "who was compromised" because it tells you the scope of the attack and which users are safe.3. Your email-to-signin join returns 50,000 rows when you expected 20. What went wrong?
join. A user who received 10 emails and had 5,000 sign-ins in a week produces 50,000 rows on an unconstrained join. The time window filter is essential — it limits matches to sign-ins that occurred within a relevant window after the email, cutting the result set dramatically.