K0.2 The Microsoft Security Data Model

1.5-2 hours · Module 0 · Free

Operational Objective

The Data Landscape: Microsoft's security stack generates telemetry across 5 product families: Entra ID (identity), Defender for Endpoint (devices), Defender for Office 365 (email), Defender for Cloud Apps (SaaS applications), and Sentinel (the SIEM that aggregates everything). Each product writes to specific Log Analytics tables with distinct schemas. Before writing any KQL, the analyst must understand WHERE the data lives — which table contains sign-in events, which contains process creation, which contains email activity. Querying the wrong table returns zero results and wastes investigation time.

Deliverable: A mental map of the Microsoft security data model — which product generates which table, and which investigation questions each table answers.

Estimated completion: 25 minutes

Two query surfaces, one data model

Microsoft provides two places to write KQL for security investigations:

Microsoft Sentinel — Log Analytics workspace: The SIEM. Contains ALL security data in one workspace: sign-in logs, device events, email activity, firewall logs, custom log sources. Tables use Sentinel naming conventions: SigninLogs, DeviceProcessEvents, OfficeActivity. Queries run against the Log Analytics backend. Retention: 30 days to 2 years depending on configuration.

Defender XDR — Advanced Hunting: The unified security portal. Contains data from Defender products only: endpoint, email, identity, cloud apps. Tables use Defender naming conventions: DeviceProcessEvents, EmailEvents, IdentityLogonEvents. Queries run against the Defender data store. Retention: 30 days standard.

The overlap: Many tables exist in BOTH surfaces with identical schemas (DeviceProcessEvents, DeviceFileEvents, DeviceNetworkEvents). Some tables exist only in Sentinel (OfficeActivity, CommonSecurityLog, custom logs). Some exist only in Defender XDR (EmailEvents, EmailUrlInfo, CloudAppEvents with full fidelity).

For this course: We teach KQL against Sentinel tables unless noted otherwise. Sentinel is the superset — it contains Defender XDR data plus non-Microsoft data sources. Every query in this course works in the Sentinel Logs blade. Most also work in Defender Advanced Hunting with minor schema adjustments.

Figure K0.2 — The Microsoft security data model. Five product families feed into one Sentinel workspace, all queryable via KQL.

Which product answers which question?

The data model determines where you look. Asking the wrong table wastes time. Here is the mental map:

“Who signed in?” → Entra ID → SigninLogs (interactive sign-ins), AADNonInteractiveUserSignInLogs (token refreshes, application sign-ins). These tables contain: UserPrincipalName, IPAddress, DeviceDetail, ResultType (success/failure code), ConditionalAccessStatus, RiskLevelDuringSignIn.

“What ran on the device?” → Defender for Endpoint → DeviceProcessEvents (every process creation including command line, parent process, user context), DeviceFileEvents (file creation, modification, deletion), DeviceRegistryEvents (registry changes), DeviceNetworkEvents (network connections from the device).

“What happened in email?” → Defender for Office 365 → EmailEvents (delivery, filtering decisions), OfficeActivity (mailbox operations — inbox rules, email sends, message reads). The distinction matters: EmailEvents shows what the EMAIL SYSTEM did. OfficeActivity shows what the USER did.

“What happened in Active Directory?” → Defender for Identity → IdentityLogonEvents (Kerberos and NTLM authentication on domain controllers), IdentityDirectoryEvents (AD object changes — group membership, password resets, attribute modifications).

“What did the SIEM detect?” → Sentinel → SecurityAlert (alerts from all connected products), SecurityIncident (grouped incidents), CommonSecurityLog (firewall/proxy logs in CEF format).

NE’s data landscape: 20 tables, 18 GB/day

At Northgate Engineering, Rachel’s Sentinel workspace ingests data from all 5 product families. The workspace receives 18 GB per day across 20 connected tables. But Rachel does not query 20 tables equally. Eight tables handle 95% of her investigation queries. These are the 8 tables this course teaches KQL against:

The daily 8: SigninLogs (authentication), AuditLogs (configuration changes), DeviceProcessEvents (process execution), DeviceFileEvents (file operations), OfficeActivity (email and SharePoint), IdentityLogonEvents (AD authentication), CommonSecurityLog (firewall), SecurityAlert (all product alerts).

The remaining 12 tables (DeviceRegistryEvents, DeviceNetworkEvents, DeviceImageLoadEvents, EmailEvents, EmailUrlInfo, CloudAppEvents, IdentityDirectoryEvents, IdentityQueryEvents, AADNonInteractiveUserSignInLogs, SecurityIncident, Syslog, Heartbeat) are queried for specific investigations but not daily. This course covers them in context when the investigation requires them — not as standalone lessons.

Compliance Myth: "KQL is just for Sentinel — Defender has its own interface"

The myth: KQL is the query language for Sentinel. Defender XDR has Advanced Hunting, which is a different tool.

The reality: Advanced Hunting IS KQL. The same query language, the same operators, the same syntax. The only differences are: (1) which tables are available (Defender Advanced Hunting has some tables that Sentinel does not, and vice versa), and (2) slight schema variations in shared tables (some field names differ between the Sentinel connector version and the native Defender version). A KQL query written in Sentinel runs in Defender Advanced Hunting with minor table/field adjustments. Learning KQL once gives you investigation capability across BOTH platforms.

Try it yourself

Exercise: Explore your workspace tables

Open your Sentinel Logs blade (or Defender Advanced Hunting). Type: search * and press Shift+Enter. This returns ALL data in the workspace — scroll the results and note the different table names in the $table column. Then try: union withsource=TableName * | summarize Count = count() by TableName | sort by Count desc — this shows every table in your workspace ranked by event volume. Which table has the most data?

Check your understanding

An analyst needs to determine: (1) whether a user authenticated successfully, (2) whether that user then ran PowerShell on a server, and (3) whether that user created an inbox rule. Which three tables does the analyst need?

Answer: (1) SigninLogs — for authentication success/failure. (2) DeviceProcessEvents — for process execution (PowerShell) on the server. (3) OfficeActivity — for inbox rule creation (New-InboxRule operation). This is a cross-product, cross-table investigation: Entra ID → Defender for Endpoint → Defender for Office 365. KQL joins these three tables by the shared entity: UserPrincipalName. This is exactly the CHAIN-HARVEST investigation pattern from Question 2 in K0.1.

Troubleshooting

“My workspace does not have DeviceProcessEvents.” This table requires Defender for Endpoint P2 (or Microsoft 365 E5). If your organization has a lower licence tier, endpoint telemetry is not ingested. The alternative: Sysmon logs collected via AMA land in the Event table — see Detection Engineering DE1.12b for the Sysmon-to-MDE translation.

Operational Artifact — Investigation Question to Table Quick Reference

Authentication: “Who signed in?” → SigninLogs, AADNonInteractiveUserSignInLogs Process execution: “What ran?” → DeviceProcessEvents File operations: “What was created/modified/deleted?” → DeviceFileEvents Email activity: “What was sent/received/ruled?” → OfficeActivity, EmailEvents AD authentication: “Who authenticated to AD?” → IdentityLogonEvents Configuration changes: “What was changed?” → AuditLogs, IdentityDirectoryEvents Firewall/proxy: “What network traffic?” → CommonSecurityLog Alerts: “What did products detect?” → SecurityAlert

The data flow: from event to queryable row

Understanding the data model requires understanding the PIPELINE that transforms a security event into a queryable row in Sentinel:

Step 1 — Event generation: j.morrison opens Outlook and authenticates to Exchange Online. Entra ID generates a sign-in event containing: UserPrincipalName, IPAddress, DeviceDetail (browser, OS, compliance status), ResultType (success/failure), ConditionalAccessStatus, RiskLevelDuringSignIn, and 30+ additional fields.

Step 2 — Connector ingestion: The Entra ID data connector forwards the event to the Log Analytics workspace. Latency: typically 2-5 minutes from event generation to workspace availability. During this window, the event exists in Entra ID’s internal log but is NOT yet queryable in Sentinel.

Step 3 — Table storage: The event lands in the SigninLogs table as a new row. Every field from Step 1 becomes a column. The TimeGenerated column records when the event was INGESTED (not when it occurred — the actual event time is in a separate field). This distinction matters for time-based queries: where TimeGenerated > ago(5m) may miss events that occurred 4 minutes ago but have not yet been ingested.

Step 4 — Query availability: The row is now queryable via KQL. Any analyst with Reader permissions can query it. The row persists for the table’s retention period (90 days default, configurable up to 2 years for Analytics-tier tables).

At NE, this pipeline processes approximately 45,000 SigninLogs rows per day, 32,000 DeviceProcessEvents rows per day, and 15,000 OfficeActivity rows per day. Total: ~92,000 security-relevant rows per day across the 8 daily tables. Every one of these rows is queryable via KQL within minutes of the event occurring.

The two-surface architecture in practice

Rachel uses BOTH query surfaces daily. The choice depends on the investigation:

Sentinel Logs blade — for investigations that require: Sentinel-only tables (CommonSecurityLog for firewall data, custom log tables), retention beyond 30 days (Sentinel retains up to 2 years), or analytics rule development (rules are defined in Sentinel, not Defender).

Defender Advanced Hunting — for investigations that require: Defender-specific tables (EmailEvents with full email metadata, CloudAppEvents with SaaS application detail), or when the analyst is already in the Defender portal triaging an incident and needs to pivot to a quick query without switching portals.

The KQL is identical. The same query runs in both surfaces with the same syntax. The only adjustments: some table names differ (Sentinel’s OfficeActivity vs Defender’s CloudAppEvents for overlapping data), and some fields have slight name variations. This course notes these differences when they occur — but the KQL operators, logic patterns, and investigation methodology are the same everywhere.

Understanding ingestion latency

A common investigation mistake: assuming the data is available the instant the event occurs. At NE, the typical latency from event to queryable row is 2-5 minutes for SigninLogs, 3-8 minutes for DeviceProcessEvents, and 5-15 minutes for OfficeActivity. During an active incident, Rachel runs queries with where TimeGenerated > ago(15m) — not ago(5m) — to account for this lag. The extra 10 minutes ensures events that occurred 5 minutes ago but have not yet been ingested are captured in the next query cycle. For detection rules that run on a schedule (every 5 minutes), the lookback window must exceed the schedule by at least the ingestion latency: a 5-minute schedule with a 10-minute lookback captures events that arrive with up to 5 minutes of delay.

You're reading the free modules of Mastering KQL

The full course continues with advanced topics, production detection rules, worked investigation scenarios, and deployable artifacts. Premium subscribers get access to all courses.

View Pricing See Full Syllabus

← K0.1 Three Questions You Cannot Answer Without KQL K0.3 The Eight Tables You Will Query Every Day →