DE0.4 The Microsoft Detection Surface

8-10 hours · Module 0 · Free

What you already know

You understand what detection coverage is and why 10.3% matters. Before you can build detection rules, you need to know what data you have to work with. This section maps the five Microsoft data source families — identity, endpoint, email, cloud applications, and infrastructure — and explains what detection questions each one answers.

Five data source families

Microsoft's security telemetry spans five families. Each family contains specific tables in your Sentinel workspace. Each table records different events. Each event answers different detection questions. Most organizations have data from at least three families but write detection rules against one or two. The unused families represent collected telemetry that costs ingestion budget and produces zero detection value.

Understanding the families at the table level matters because detection rules query specific tables. A hypothesis about credential theft leads you to SigninLogs. A hypothesis about inbox rule persistence leads you to OfficeActivity. A hypothesis about lateral movement leads you to DeviceLogonEvents joined with DeviceNetworkEvents. The link between your hypothesis and the KQL you write is knowing which table holds the signal you need.

This is where many detection programs stall. The detection engineer knows what technique they want to detect, but they don't know which table contains the telemetry, which fields to query, or whether the data connector is even configured. A hypothesis about email collection fails immediately if the Advanced Hunting connector isn't enabled: the MailItemsAccessed events exist in Defender's Advanced Hunting console but aren't available in Sentinel for analytics rules. A hypothesis about lateral movement via SMB fails if the Windows Security Events connector isn't collecting event ID 5140 (network share access).

The detection surface (which families are connected, which tables are populated, which fields are available) determines what hypotheses you can test. Building a detection rule against a table that's empty is writing a query that will never return results, deployed as a rule that will never fire, counted in your active rule total while detecting nothing.
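The event ID 5140 precondition can be tested directly against the workspace before any rule is written. The query below is a minimal sketch that assumes the standard SecurityEvent columns (EventID, Computer); the seven-day lookback is an arbitrary choice for illustration. An empty result suggests the share-access hypothesis cannot be tested yet.

```kql
// Is event ID 5140 (network share access) actually arriving, and from which servers?
SecurityEvent
| where TimeGenerated > ago(7d)
| where EventID == 5140
| summarize Events = count(), LastSeen = max(TimeGenerated) by Computer
| order by LastSeen desc
```

Servers you expect to see but that are missing from the output are the ones whose audit policy or collection rule probably needs attention.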

Estimated time: 35 minutes.

Figure DE0.4 — The five Microsoft data source families and their primary tables. Detection rules query specific tables. Missing families are detection blind spots. Cross-family joins enable multi-phase attack chain detection that single-family rules cannot achieve.

Family 1 — Identity

Tables:

SigninLogs, AADNonInteractiveUserSignInLogs, AADServicePrincipalSignInLogs, AuditLogs.

Every authentication event in your M365 tenant appears here. Interactive sign-ins (a user opening Outlook) go to SigninLogs. Token refreshes, background SSO, and app-driven authentication go to AADNonInteractiveUserSignInLogs. Service principal authentications (app registrations, managed identities calling Graph API) go to AADServicePrincipalSignInLogs. Every directory change (user creation, role assignment, group modification, Conditional Access policy change, application registration) goes to AuditLogs.

The fields that matter for detection engineering: UserPrincipalName (who authenticated), IPAddress (from where), DeviceDetail (browser, OS, device name), Location (country and city from IP geolocation), ConditionalAccessStatus (which CA policies evaluated and what they decided), RiskLevelDuring and RiskLevelAggregated (Entra ID Protection's risk assessment), AuthenticationDetails (which auth methods were used and whether they succeeded), and SessionId (the session identifier that links interactive and non-interactive sign-ins for the same session).

The SessionId field is what makes AiTM token replay detectable. When a user authenticates interactively, the SessionId is created. When an attacker replays a captured token, the non-interactive refresh uses the same SessionId but a different DeviceDetail. Joining SigninLogs and AADNonInteractiveUserSignInLogs on SessionId and comparing DeviceDetail is the core of the AiTM detection rule you'll build in DE4.
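As a preview of that join, here is a rough sketch rather than the DE4 rule itself. It assumes SessionId is populated in both tables in your tenant, that a result code of "0" marks a successful sign-in, and that comparing only the operating system and browser from DeviceDetail is a reasonable first approximation. The column names InteractiveOS, ReplayOS, and similar are introduced here for illustration.

```kql
// Sketch: same SessionId, different device characteristics
let interactive = SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType == "0" and isnotempty(SessionId)
| extend Device = parse_json(DeviceDetail)
| extend InteractiveOS = tostring(Device.operatingSystem),
         InteractiveBrowser = tostring(Device.browser)
// Keep the earliest interactive sign-in per session as the reference device
| summarize arg_min(TimeGenerated, InteractiveOS, InteractiveBrowser, IPAddress)
    by SessionId, UserPrincipalName
| project SessionId, InteractiveTime = TimeGenerated,
          InteractiveOS, InteractiveBrowser, InteractiveIP = IPAddress;
AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(1d)
| where ResultType == "0" and isnotempty(SessionId)
// DeviceDetail may be stored as a string in this table, so parse it first
| extend Device = parse_json(DeviceDetail)
| extend ReplayOS = tostring(Device.operatingSystem),
         ReplayBrowser = tostring(Device.browser)
| join kind=inner interactive on SessionId
| where TimeGenerated > InteractiveTime
| where ReplayOS != InteractiveOS or ReplayBrowser != InteractiveBrowser
| project ReplayTime = TimeGenerated, UserPrincipalName, SessionId,
          InteractiveOS, ReplayOS, InteractiveBrowser, ReplayBrowser,
          InteractiveIP, ReplayIP = IPAddress
```

Browser version strings change after legitimate updates, so expect noise from the browser comparison until it is tuned.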

Detection domains covered: credential attacks (T1078, T1110), token theft (T1528), privilege escalation through role assignment (T1098), persistence through account manipulation (T1098), identity-based defense evasion (T1562.001).

Family 2 — Endpoint

Tables:

DeviceProcessEvents, DeviceNetworkEvents, DeviceFileEvents, DeviceRegistryEvents, DeviceLogonEvents, DeviceImageLoadEvents.

Every process execution, network connection, file write, registry modification, logon event, and DLL load on MDE-onboarded devices. This is the richest data source family — a single endpoint generates thousands of events per day. It is also the most expensive to ingest and the most productive for detection.

DeviceProcessEvents is the anchor table. ProcessCommandLine contains the full command line of every executed process. InitiatingProcessFileName and InitiatingProcessCommandLine tell you what launched the process — the parent-child relationship that distinguishes a user opening PowerShell (explorer.exe → powershell.exe) from a Word document spawning PowerShell (winword.exe → powershell.exe). The latter is a classic malware delivery pattern. AccountName identifies who ran the process. DeviceId ties it to a specific endpoint.
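The parent-child pattern above translates into a short filter. This is a minimal sketch: the lists of Office parents and command interpreters are illustrative assumptions that extend the winword.exe and powershell.exe example, and the one-day lookback is arbitrary.

```kql
// Office applications launching a command interpreter
DeviceProcessEvents
| where TimeGenerated > ago(1d)
| where InitiatingProcessFileName in~ ("winword.exe", "excel.exe", "powerpnt.exe", "outlook.exe")
| where FileName in~ ("powershell.exe", "pwsh.exe", "cmd.exe")
| project TimeGenerated, DeviceId, DeviceName, AccountName,
          InitiatingProcessFileName, InitiatingProcessCommandLine,
          FileName, ProcessCommandLine
```

The in~ operator matches case-insensitively, which matters because file name casing varies across endpoints.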

DeviceNetworkEvents captures every outbound connection. An outbound connection to an unusual external IP on port 443 from a process that shouldn't make network connections — calc.exe, notepad.exe, regsvr32.exe — is a C2 beacon indicator.

The attacker uses a legitimate process as a network proxy to blend with normal traffic. DeviceNetworkEvents records the process that initiated the connection, the destination IP, the port, and the protocol.
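A first-pass hunt for that indicator might look like the following sketch. It assumes the ActionType value "ConnectionSuccess" and the RemoteIPType value "Public" as they appear in the Defender schema, and it uses the three process names from the text as a starting list, not an exhaustive one.

```kql
// Processes that rarely need the network, connecting out to public IPs on 443
DeviceNetworkEvents
| where TimeGenerated > ago(1d)
| where ActionType == "ConnectionSuccess"
| where InitiatingProcessFileName in~ ("calc.exe", "notepad.exe", "regsvr32.exe")
| where RemoteIPType == "Public" and RemotePort == 443
| summarize Connections = count(),
            FirstSeen = min(TimeGenerated),
            LastSeen = max(TimeGenerated)
    by DeviceName, InitiatingProcessFileName, RemoteIP, RemotePort, Protocol
| order by Connections desc
```

Summarizing by destination makes repeated connections to one address stand out, which is closer to beacon behavior than a single connection.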

Detection domains covered: execution (T1059 command interpreters, T1204 user execution), persistence (T1547 boot autostart, T1053 scheduled tasks), privilege escalation (T1068, T1134), defense evasion (T1036 masquerading, T1055 process injection, T1070 indicator removal), credential access (T1003 OS credential dumping), lateral movement (T1021 remote services), collection (T1005 local data), C2 (T1071 application layer protocol).

Family 3 — Email

Tables:

EmailEvents, EmailUrlInfo, EmailAttachmentInfo, EmailPostDeliveryEvents.

Every email sent and received through Exchange Online. EmailEvents contains envelope data — sender, recipient, subject, delivery action, direction (inbound, outbound, intra-org), and Microsoft's threat classification (phishing, spam, malware, or clean).

EmailUrlInfo contains every URL in every email — the original URL, the URL after Safe Links rewriting, click data, and the URL's threat classification. EmailAttachmentInfo contains attachment metadata — filenames, file types, SHA-256 hash values, and detection verdicts. EmailPostDeliveryEvents captures actions taken after delivery — emails quarantined, deleted, or moved by Zero-hour Auto Purge (ZAP), which is useful for detecting phishing that initially passed inspection but was later remediated.

The critical architectural detail: EmailUrlInfo is a separate table from EmailEvents. They share the NetworkMessageId field as the join key, but a detection rule that queries EmailEvents alone cannot examine URL characteristics: you can't filter on URL domain, URL path pattern, or redirect chain depth without joining to EmailUrlInfo. Conversely, a rule that queries EmailUrlInfo alone can't see who received the email, when it was delivered, or what Microsoft's delivery verdict was.

This two-table design means phishing detection that examines both the email context (recipient role, sender reputation, delivery timing) and the URL structure (domain age, path characteristics, redirect chain length) requires a join. That join is what makes custom phishing detection more powerful than Defender's built-in verdict — you can write rules that catch phishing emails Defender classified as clean because your rule examines structural URL patterns that Defender's model didn't flag.
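The shape of that join is sketched below. The structural pattern used here, a URL whose host is a raw IPv4 address, is only an illustrative stand-in for whatever URL characteristic your hypothesis targets. The sketch also assumes that an empty ThreatTypes value corresponds to mail Defender classified as clean.

```kql
// Delivered inbound mail with no threat classification, joined to its URLs
EmailEvents
| where TimeGenerated > ago(1d)
| where EmailDirection == "Inbound" and DeliveryAction == "Delivered"
| where isempty(ThreatTypes)
| join kind=inner (
    EmailUrlInfo
    | where TimeGenerated > ago(1d)
    // Illustrative structural pattern: the URL host is a raw IPv4 address
    | where UrlDomain matches regex @"^\d{1,3}(\.\d{1,3}){3}$"
  ) on NetworkMessageId
| project TimeGenerated, SenderFromAddress, RecipientEmailAddress,
          Subject, Url, UrlDomain
```

The left side supplies the email context (who, when, verdict) and the right side supplies the URL structure; neither half of the result exists without NetworkMessageId.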

Detection domains covered: phishing (T1566.001 Attachment, T1566.002 Link), BEC (T1534), email-based exfiltration (T1048), mail flow manipulation.

Family 4 — Cloud applications

Tables:

CloudAppEvents, OfficeActivity.

SharePoint access, Teams actions, OneDrive downloads, inbox rule creation, admin consent grants, Power Automate flow execution — every action a user or admin takes in M365 cloud applications.

CloudAppEvents is the unified table from Defender for Cloud Apps, providing normalized event data across all cloud applications. OfficeActivity is the classic Office 365 audit log — it overlaps significantly with CloudAppEvents but contains some operations that CloudAppEvents does not, most importantly the MailItemsAccessed operation.

MailItemsAccessed (recorded in OfficeActivity with specific Operation values) records every email read event. The OperationType field distinguishes "Bind" (a specific email was opened and read) from "Sync" (a mail client synchronized the folder contents).

This distinction matters enormously for detection: Bind events represent individual email access — a human or attacker opening specific emails. Sync events represent mail client activity — Outlook syncing the inbox on startup.

For CHAIN-HARVEST Phase 4, the attacker reads three months of email through Outlook Web. This generates 400+ MailItemsAccessed events with OperationType "Bind" from a single SessionId in three hours. The finance manager's baseline is 28 Bind events per day during working hours. The anomaly is a roughly 14x deviation from baseline, detectable through KQL's time-series functions (make-series, series_decompose_anomalies) applied to daily Bind counts per user. Zero of 23 rules query MailItemsAccessed.
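To show the mechanics of make-series and series_decompose_anomalies without depending on your workspace, the sketch below runs on synthetic data. The user name, the dates, and the daily counts (which hover around the 28-per-day baseline, with one 400-event day) are all invented for illustration. In a real rule the daily counts would come from MailItemsAccessed Bind events, and the threshold of 3.0 is an assumption to tune.

```kql
// Synthetic daily Bind counts: about 28 per day, then one 400-event day
let dailyBinds = range Day from datetime(2025-01-01) to datetime(2025-01-30) step 1d
| extend UserId = "finance.manager@example.com"   // hypothetical user
| extend BindCount = iff(Day == datetime(2025-01-30),
                         tolong(400),
                         tolong(26 + dayofmonth(Day) % 5));
dailyBinds
// One series per user, one point per day
| make-series Binds = sum(BindCount) default = 0
    on Day from datetime(2025-01-01) to datetime(2025-01-31) step 1d
    by UserId
// Returns three parallel series: anomaly flag, anomaly score, expected baseline
| extend (Flag, Score, Expected) = series_decompose_anomalies(Binds, 3.0, -1, "linefit")
| mv-expand Day to typeof(datetime), Binds to typeof(double),
            Flag to typeof(double), Score to typeof(double),
            Expected to typeof(double)
| where Flag > 0   // keep only positive (above-baseline) anomalies
| project UserId, Day, Binds, Expected, Score
```

Grouping with by UserId is the design choice that matters: each user is compared against their own history, so a heavy email reader does not mask a quiet account that suddenly spikes.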

The consent grant operations in CloudAppEvents are another high-value detection source. An OAuth consent grant (Operation "Consent to application") gives an app persistent access to user data — mailbox, files, calendar — without ongoing authentication.

A malicious consent grant is one of the most persistent cloud-native attack techniques because it survives password resets, MFA changes, and session revocation. The app retains access until the consent is explicitly revoked.

Detection domains covered: persistence through inbox rules (T1137), email collection (T1114), consent grant persistence (T1098.003), cloud data exfiltration (T1530), SharePoint/OneDrive bulk access (T1213).

Family 5 — Infrastructure

Tables:

SecurityEvent, Syslog, CommonSecurityLog.

On-premises and network infrastructure telemetry. SecurityEvent contains Windows Security Event Log entries forwarded via the Azure Monitor Agent. Syslog contains Linux system and application logs. CommonSecurityLog contains CEF-formatted logs from firewalls (Palo Alto, Fortinet), web proxies (Zscaler), and network appliances.

For hybrid infrastructure organizations, this family is essential for detecting attacks that cross the cloud-to-on-premises boundary. A VPN authentication (CommonSecurityLog from the firewall) followed by an SMB connection to a file server (SecurityEvent from the server) followed by a suspicious process execution (DeviceProcessEvents from the endpoint) traces CHAIN-MESH's lateral movement path across three data families.

Detection domains covered: network lateral movement (T1021), firewall traversal, VPN abuse (T1133), infrastructure persistence, Linux server attacks.

Scenario

You write a detection rule for suspicious inbox rule creation. It queries OfficeActivity for New-InboxRule operations. The rule fires 12 times per day — 11 are users creating legitimate filters, 1 is an attacker hiding evidence after account compromise. How do you tell the difference? The answer is not in OfficeActivity alone — it is in the join with SigninLogs that connects the inbox rule to the compromised authentication that preceded it.

The cross-family detection advantage

The power of the Microsoft unified data model is that all five families live in the same Sentinel workspace, queryable by the same KQL engine. A single detection rule can join identity telemetry with email telemetry with endpoint telemetry — tracing an attack chain across multiple phases in one query.

Most organizations don't do this. They write single-family rules: SigninLogs rules for identity anomalies, DeviceProcessEvents rules for endpoint behaviors, EmailEvents rules for phishing. Each rule sees one phase of an attack. Nobody connects the phases.

Cross-family detection changes the confidence calculation fundamentally. An inbox rule creation (OfficeActivity) is a benign event — users create inbox rules all the time. A suspicious sign-in (SigninLogs) is a common event — risk classifications produce hundreds of medium-risk findings per week.

But an inbox rule creation within 30 minutes of a suspicious sign-in from the same user, in the same session — that combination is a high-confidence indicator of account compromise with immediate persistence. Neither event alone is actionable. The join makes both actionable.

This cross-family correlation is taught throughout DE3-DE8. Each module builds detection rules that join across the families relevant to that detection domain. By DE8, you'll join identity, endpoint, and infrastructure tables in a single query to trace lateral movement from cloud compromise through endpoint access to network traversal.

Here is what a cross-family join looks like in practice. This KQL connects a suspicious sign-in (identity family) to an inbox rule creation (cloud apps family) within 30 minutes:

KQL

// Cross-family: suspicious sign-in → inbox rule creation within 30 minutes
let suspiciousSignins = SigninLogs
| where TimeGenerated > ago(1d)
| where RiskLevelDuring in ("medium", "high")
// Lowercase the UPN: string joins are case-sensitive and casing can differ between tables
| project SigninTime = TimeGenerated, UserPrincipalName = tolower(UserPrincipalName), SessionId, IPAddress;
OfficeActivity
| where TimeGenerated > ago(1d)
| where Operation == "New-InboxRule"
| extend ruleUser = tolower(UserId)
| join kind=inner suspiciousSignins
 on $left.ruleUser == $right.UserPrincipalName
// Keep only rules created in the 30 minutes after the risky sign-in
| where TimeGenerated between (SigninTime .. (SigninTime + 30m))
| project
 SigninTime,
 RuleCreatedTime = TimeGenerated,
 UserPrincipalName,
 IPAddress,
 Operation,
 Parameters

This is one operator — join — connecting two data families. Neither table alone tells you the story. SigninLogs shows suspicious authentication events, hundreds per week in a busy tenant. OfficeActivity shows inbox rule creation, a routine action. The temporal join within 30 minutes, scoped to the same user, connects compromise to persistence. That connection is what makes the alert actionable.

Before mapping detection scope, inventory your connected data sources. This PowerShell lists every data connector with its kind, name, and connection state:

PowerShell

# Inventory connected data sources and their connection state
# Assumption: each connector object exposes a top-level State property;
# verify this against your Az.SecurityInsights module version.
Get-AzSentinelDataConnector -ResourceGroupName "rg-sentinel" `
    -WorkspaceName "law-sentinel" |
    Select-Object Kind, Name,
        @{N='Connected';E={$_.State -eq 'Enabled'}} |
    Sort-Object Kind |
    Format-Table -AutoSize

Expected Output

Kind                              Name                              Connected
────────────────────────────────  ────────────────────────────────  ─────────
AzureActiveDirectory              AAD-SignIns                       True
AzureActiveDirectory              AAD-AuditLogs                     True
MicrosoftDefenderAdvancedThreat   MDE-Connector                     True
MicrosoftCloudAppSecurity         MCAS-Connector                    True
Office365                         O365-Connector                    True
ThreatIntelligence                MDTI-Connector                    True
Syslog                            Syslog-RHEL                       True
SecurityEvents                    WindowsSecurityEvents             True

NE has 8 connectors feeding 5 data families. All show Connected = True. But "Connected" in the portal doesn't mean "currently ingesting" — a connector can show True while the underlying agent has stopped forwarding events. The KQL validation in Section 2.5 checks whether data is actually arriving, not just whether the connector is configured.

What your data source inventory means for detection scope

Your detection surface is determined by which families have data flowing into your workspace. Every missing family is a detection blind spot — not a data gap you might close later, but a set of attack techniques that are invisible to your detection program right now.

If your identity tables are empty (SigninLogs, AuditLogs), you cannot detect credential attacks, token theft, privilege escalation through role assignment, or identity-based persistence. These techniques account for the initial access and privilege escalation phases of most cloud-focused attack chains. Without identity telemetry, the first two phases of CHAIN-HARVEST are invisible.

If your endpoint tables are empty (DeviceProcessEvents, DeviceNetworkEvents), you cannot detect process-based attacks — credential dumping, LOLBin execution, lateral movement through remote services, ransomware pre-encryption indicators.

CHAIN-MESH and CHAIN-ENDPOINT depend entirely on endpoint telemetry for their mid-chain phases. An organization with identity and email data but no endpoint data can detect phishing and credential compromise but cannot see what the attacker does after landing on an endpoint.

If your email tables are empty (EmailEvents, EmailUrlInfo), you cannot detect phishing independent of Defender's verdict, BEC patterns, or email-based exfiltration.

The Advanced Hunting data connector — which brings EmailEvents and EmailUrlInfo into Sentinel — is the most commonly missing connector. Many organizations have Defender for Office 365 deployed and functional, but the Sentinel connector was never enabled, so the email telemetry exists in Defender's Advanced Hunting console but is not available in Sentinel for analytics rules.

If your cloud app tables are empty (CloudAppEvents, OfficeActivity), you cannot detect inbox rule manipulation, OAuth consent grant abuse, SharePoint bulk data access, or cloud-native persistence mechanisms. These are the techniques that distinguish cloud-native attacks from traditional endpoint attacks — and they're the techniques most template rules don't cover.
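A quick way to see which of these "if your tables are empty" cases apply to you is to count recent rows in a few tables per family. This is a rough sketch: the table list is a sample drawn from the families above, the seven-day window is arbitrary, and isfuzzy = true is used so the query still runs when a table does not exist in the workspace.

```kql
// Representative tables per family; extend the list to match your workspace
union withsource = TableName isfuzzy = true
    SigninLogs, AuditLogs,
    DeviceProcessEvents, DeviceNetworkEvents,
    EmailEvents, EmailUrlInfo,
    CloudAppEvents, OfficeActivity,
    SecurityEvent, Syslog, CommonSecurityLog
| where TimeGenerated > ago(7d)
| summarize Events = count(), LastEvent = max(TimeGenerated) by TableName
| order by TableName asc
```

Tables with no rows in the window do not appear in the output at all, so read the result by looking for what is missing. A LastEvent value that is days old points to a connector that was once healthy and has since stopped ingesting.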

At a typical mid-size organization

All five families have data. The M365 E5 license with MDE, Defender for Office 365, and Cloud App Security populates identity, endpoint, email, and cloud app tables. Their Palo Alto firewalls forward via CEF to CommonSecurityLog. Their RHEL servers send syslog. Their Windows servers forward SecurityEvent via AMA. The organization has excellent data coverage — the detection gap is not a data problem.

The telemetry for all six attack chains exists in the workspace. Nobody has written rules to examine it. That gap — between data collected and data examined — is the gap detection engineering closes.

When the content modules begin, you'll run a KQL query that inventories your connected data sources. The output tells you which detection domains you can cover immediately and which require connector configuration before you can build rules.

Detection Engineering Principle

A missing data family is not a gap you might close later — it is a set of attack techniques that are invisible to your detection program right now. Every family you connect expands the techniques you can detect. Every cross-family join you write detects attack chain phases that no single-family rule can see.

Section 5 defines the four metrics that measure detection program health. These are the numbers that track whether your program is improving — and the numbers that go in the board report.
