Module 6 — Check My Knowledge (20 questions)
1. You need to find all sign-in events for a specific user across both interactive and non-interactive tables. Which approach is correct?
union SigninLogs, AADNonInteractiveUserSignInLogs | where UserPrincipalName has "targetuser" — union combines both tables, then the filter applies to the combined result. This captures ~98% of authentication events that querying SigninLogs alone would miss.
Query only SigninLogs
join the two tables on UserPrincipalName
search "targetuser"
union stacks rows from multiple tables vertically. join connects rows horizontally on a shared key. For the same type of data in different tables, use union.
2. A query returns 50,000 rows when you expected 50. You are joining EmailEvents with SigninLogs on UserPrincipalName without a time constraint. What happened?
Cartesian join explosion — every email row for a user matches every sign-in row for that user, creating a product. Add a time window: where SigninTime between (EmailTime .. (EmailTime + 1h)) to limit matches to sign-ins within 1 hour of the email.
The tables are too large
You should use union instead
The join key has a typo
Unconstrained joins on high-cardinality columns create combinatorial explosions. Always add time window constraints when correlating event data across tables.
3. You want to detect users whose daily sign-in count is 5x their 30-day average. Why use a ratio instead of a fixed threshold?
Users have different baselines. An executive with 5 daily sign-ins at 25 (5x) is as suspicious as a developer with 100 daily sign-ins at 500 (5x). A fixed threshold of 100 would miss the executive and false-positive on the developer's normal day. Ratio-based detection adapts to each entity's behavior.
Ratios are easier to compute in KQL
Fixed thresholds are not possible in KQL
Microsoft recommends ratio-based detection
Behavioral detection adapts to each entity. Threshold detection treats all entities the same. This principle applies to every anomaly detection pattern — sign-in volume, email volume, file access count, API call frequency.
4. Your CISO asks: "How many phishing emails reached inboxes this month, broken down by week?" Which operators do you need?
where to filter EmailEvents to phishing (ThreatTypes has "Phish") and delivered (DeliveryAction == "Delivered"), summarize count() by bin(TimeGenerated, 7d) to group by week, optionally render barchart for visual presentation.
search "phishing"
join EmailEvents with SigninLogs
union all email tables
This is a filter + aggregate + visualize pattern: where narrows to phishing deliveries, summarize + bin groups by week, render creates the chart. The CISO sees a bar chart. You see the query that produced it.
5. You need to find sign-ins that succeeded WITHOUT a corresponding MFA event. Which join kind?
inner
leftouter
leftanti — returns successful sign-ins that have NO matching MFA event. This is the token replay detection pattern: a sign-in that bypassed MFA because the attacker replayed a stolen token that already contained the MFA claim.
fullouter
leftanti is the "what is missing" operator. Finding sign-ins without MFA events, users without recent password changes, devices without EDR — the exclusion pattern is one of the most powerful detection tools in KQL.
6. A Syslog message reads: "Connection from 198.51.100.44 denied by rule FW-001." How do you extract the IP address?
extract(@"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})", 1, SyslogMessage) — the regex matches any IPv4 pattern. Alternatively, parse SyslogMessage with * "Connection from " IP " denied" * if the message format is consistent. Choose parse for consistent formats, extract for variable text.
split(SyslogMessage, " ")[2]
where SyslogMessage has "198.51.100.44"
tostring(SyslogMessage.IP)
Unstructured text requires extraction. extract with regex works for any message format. parse works when the format is consistent. split would work here if the IP is always the third word, but that assumption breaks with different message formats.
7. Your query takes 90 seconds. The first line is SigninLogs | where UserPrincipalName has "morrison". There is no time filter. How do you fix it?
Add | where TimeGenerated > ago(7d) as the FIRST where clause, before the username filter. Sentinel partitions data by time — without a time boundary, the query scans the entire table history. This single change typically reduces execution time from minutes to seconds.
Replace has with contains
Add | take 1000 at the end
Use a different table
Time filter first — the single most impactful KQL optimization. Always the first where clause. Every query in your library should have it.
8. What does dcount(UserPrincipalName) tell you that count() does not?
dcount returns the number of unique users. count returns the total number of events. 100 events with dcount of 1 = 100 attacks on one user (brute force). 100 events with dcount of 100 = 1 attack on each of 100 users (password spray). The distinction identifies the attack type.
dcount is faster
dcount filters duplicates from the output
They return the same value
count vs dcount distinguishes attack patterns. Always use both together when analyzing threat activity.
9. You write | project TimeGenerated, UserPrincipalName on line 3, then | extend Country = tostring(LocationDetails.countryOrRegion) on line 4. What happens?
Error — LocationDetails was removed by project on line 3. extend cannot create a column from a field that no longer exists in the pipeline. Fix: move project to the last line, after all extend operations.
The query runs but Country is null
The query runs normally
extend creates LocationDetails automatically
Filter first, compute second, project last. This rule prevents the most common KQL structural error.
10. Which three join kinds do you use 95% of the time in security investigation, and what does each answer?
inner ("show me the matches" — both sides must exist), leftouter ("show me everything from the left with match info where available"), leftanti ("show me what's in the left that's NOT in the right"). Inner for confirmed correlations, leftouter for status enrichment, leftanti for exclusion detection.
inner, fullouter, rightouter
leftouter, rightouter, fullouter
inner, leftanti, rightanti
These three cover virtually every investigation correlation. inner is the default for "show me where X and Y both occurred." leftouter preserves all left-side rows for complete status views. leftanti is the detection powerhouse — finding what is absent.
11. Your investigation query has 15 lines and fails with a syntax error. What is the fastest debugging approach?
Remove lines from the bottom one at a time until it runs. The last removed line contains the error. This isolates the problem in under a minute regardless of query length.
Read all 15 lines carefully for typos
Rewrite the query from scratch
Search the error message online
Line elimination is the universal debugging technique. Each removal + run takes 5 seconds. For 15 lines, you find the error in under 2 minutes.
12. You are writing a detection rule (Module 9) that fires when a user signs in from a country they have never signed in from before. Which KQL pattern from this module do you use?
The first-time-seen pattern (subsection 6.5): build a 30-day baseline of (user, country) pairs, then use leftanti join to find today's sign-ins from countries NOT in the baseline. This becomes a scheduled analytics rule that runs every hour.
The failed-then-succeeded pattern
The anomalous volume pattern
The IOC sweep pattern
First-time-seen with leftanti is the foundation for novelty-based detection rules. The pattern you learned as a query in this module becomes a scheduled analytics rule in Module 9. This is how KQL skills translate directly into detection engineering.