TH0.10 M365 Data Sources for Hunting

3-4 hours · Module 0 · Free
Operational Objective
Every hunt query targets a specific data table. If you do not know what each table contains, what it records, and what it misses, you cannot scope hunts effectively, you cannot interpret results accurately, and you will miss evidence that exists in a table you did not think to query. This subsection is the reference guide to the M365 telemetry tables used throughout this course — what each one records, what hunting questions it answers, and where the blind spots are.
Deliverable: A working knowledge of every M365 data source relevant to threat hunting, including what each table captures, what it does not capture, and which hunt campaigns depend on it.
⏱ Estimated completion: 30 minutes

The tables that matter

You do not need to memorize every column in every table. You need to know which table to query for which question, what the table records, and — critically — what it does not record. The gaps in telemetry are as important as the content, because a hunt that queries a table missing the relevant data produces a false negative: "we looked and found nothing" when the truth is "we looked in the wrong place."

Identity and authentication tables

SigninLogs — interactive user sign-ins. Every time a user opens a browser and authenticates to Entra ID, the event appears here. Contains: user principal name, IP address, location (country, city), device details (OS, browser), conditional access evaluation results, risk level, MFA requirement and method, authentication protocol, application accessed, result code.

// Quick check: are both authentication tables ingested?
// Both are required for comprehensive identity hunting
union
    (SigninLogs | where TimeGenerated > ago(1d)
    | summarize Count = count() | extend Table = "SigninLogs"),
    (AADNonInteractiveUserSignInLogs | where TimeGenerated > ago(1d)
    | summarize Count = count()
    | extend Table = "AADNonInteractiveUserSignInLogs")
| project Table, Count
// If AADNonInteractive returns 0, AiTM token replay is invisible
// This is the single most impactful data source gap in M365 hunting

What it answers: Where are users signing in from? Which users are authenticating from new locations? Which sign-ins bypassed MFA? Which conditional access policies applied?

What it misses: Application-based sign-ins and token refreshes — those go to AADNonInteractiveUserSignInLogs. Service principal authentication — that goes to AADServicePrincipalSignInLogs. A hunt that only queries SigninLogs misses the entire token replay attack surface.

Hunt campaigns that use it: TH4 (identity compromise), TH7 (email-based threats), TH10 (lateral movement).

AADNonInteractiveUserSignInLogs — token refreshes and application-based sign-ins. When an application uses a refresh token to obtain a new access token, the event appears here. This is where AiTM token replay is visible — the attacker's stolen refresh token generating new access tokens from the attacker's IP.

What it answers: Which IPs are refreshing tokens for each user? Are refresh events coming from IPs that differ from the user's interactive sign-in IPs? Which applications are using tokens most frequently?

What it misses: The initial interactive authentication — that is in SigninLogs. The actual data access performed with the token — that is in CloudAppEvents or application-specific audit logs.

Hunt campaigns: TH4 (identity compromise — token replay), TH6 (privilege escalation — app sign-in patterns).
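The refresh-vs-interactive IP comparison described above can be sketched as a single anti-join. This is a minimal sketch, assuming the standard Sentinel column names (UserPrincipalName, IPAddress, AppDisplayName) and an illustrative 14-day lookback:

// Sketch: token refreshes from IPs never seen in that user's interactive sign-ins
// The 14-day window is illustrative — tune it to your environment
AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(14d)
| join kind=leftanti (
    SigninLogs
    | where TimeGenerated > ago(14d)
    | distinct UserPrincipalName, IPAddress
) on UserPrincipalName, IPAddress
| summarize RefreshCount = count(), Apps = make_set(AppDisplayName)
    by UserPrincipalName, IPAddress
| sort by RefreshCount desc

The leftanti join keeps only refresh events whose user/IP pair never appears in interactive sign-ins — the shape of AiTM token replay.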

AADServicePrincipalSignInLogs — service principal authentication. Applications that authenticate with their own credentials (client secret or certificate) rather than on behalf of a user. This is where compromised application credentials appear.

What it answers: Which service principals are authenticating? From which IPs? How frequently? Has the authentication pattern changed?

What it misses: What the application does after authentication — that requires correlating with CloudAppEvents or MicrosoftGraphActivityLogs.

Hunt campaigns: TH6 (privilege escalation — post-consent behavior).
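The "has the authentication pattern changed?" question above can be approximated by baselining source IPs over a prior window. A sketch, assuming standard column names and illustrative 30-day/7-day windows:

// Sketch: service principal sign-ins from IPs not seen in the prior baseline window
let baseline = AADServicePrincipalSignInLogs
    | where TimeGenerated between (ago(30d) .. ago(7d))
    | distinct ServicePrincipalName, IPAddress;
AADServicePrincipalSignInLogs
| where TimeGenerated > ago(7d)
| join kind=leftanti baseline on ServicePrincipalName, IPAddress
| summarize NewIPSignIns = count(), FirstSeen = min(TimeGenerated)
    by ServicePrincipalName, IPAddress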

AuditLogs (Entra ID) — directory changes. User creation, deletion, and modification. Group membership changes. Role assignments. Application consent. Conditional access policy changes. MFA method registration.

What it answers: Who changed what in the directory? Were any roles assigned outside PIM? Were any conditional access policies weakened? Were any new MFA methods registered?

What it misses: Authentication events (those are in SigninLogs). Data access events (those are in CloudAppEvents). On-premises AD changes (those are in IdentityDirectoryEvents if Defender for Identity is deployed).

Hunt campaigns: TH6 (privilege escalation — consent events), TH7 (email-based threats), TH4 (identity compromise — MFA registration).
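A sketch of the "who changed what" questions above. The OperationName strings shown are common Entra ID audit values but can vary — confirm the exact strings in your own tenant before relying on them:

// Sketch: role assignments and new MFA method registrations in the last week
// OperationName values are assumptions — verify them in your tenant
AuditLogs
| where TimeGenerated > ago(7d)
| where OperationName in ("Add member to role", "User registered security info")
| extend Actor = tostring(InitiatedBy.user.userPrincipalName),
         Target = tostring(TargetResources[0].userPrincipalName)
| project TimeGenerated, OperationName, Actor, Target, Result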

Cloud application and email tables

CloudAppEvents — Defender for Cloud Apps telemetry. The richest M365 hunting table for cloud-plane activity. Records Exchange Online operations (inbox rules, mail forwarding, message access), SharePoint/OneDrive file operations, Teams activity, Power Platform operations, and third-party SaaS activity visible to Defender for Cloud Apps.

What it answers: What did the user do after signing in? Were inbox rules created? Were files downloaded from SharePoint? Were sharing links created? What applications accessed the data?

What it misses: Authentication events (SigninLogs). Email content and delivery details (EmailEvents). Endpoint activity (Device* tables). If Defender for Cloud Apps is not connected, this table is empty — and a significant portion of your cloud hunting surface is dark.

Hunt campaigns: TH5 (cloud persistence), TH8 (data exfiltration), TH11 (application & API abuse), TH13 (insider threats).
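A sketch of the post-sign-in questions above — inbox rule manipulation and file downloads in one pass. The ActionType strings are the common values for Exchange and SharePoint operations; verify them against your ingested data:

// Sketch: inbox rule manipulation and file downloads by account and IP
CloudAppEvents
| where TimeGenerated > ago(7d)
| where ActionType in ("New-InboxRule", "Set-InboxRule", "FileDownloaded")
| summarize Events = count(), Actions = make_set(ActionType)
    by AccountDisplayName, IPAddress
| sort by Events desc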

EmailEvents — email delivery telemetry from Defender for Office 365. Records email delivery actions, threat detections, sender/recipient, subject, delivery location.

What it answers: Was a phishing email delivered to a user before an anomalous sign-in? What emails did a compromised account send? Were phishing emails sent from a compromised internal account?

What it misses: Email content (requires EmailUrlInfo and EmailAttachmentInfo for URLs and attachments). Post-delivery user actions on the email. Inbox rule processing after delivery.

Hunt campaigns: TH4 (identity compromise — phishing correlation), TH5 (cloud persistence — email activity).
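The delivery question above can be sketched as a filter on detected-but-delivered mail. Column names assume the standard Defender for Office 365 schema:

// Sketch: detected phish that was nevertheless delivered to the mailbox
EmailEvents
| where TimeGenerated > ago(7d)
| where ThreatTypes has "Phish"
| where DeliveryAction == "Delivered"
| summarize DeliveredPhish = count(), Senders = make_set(SenderFromAddress, 10)
    by RecipientEmailAddress
| sort by DeliveredPhish desc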

Endpoint tables

DeviceProcessEvents — process execution on Defender for Endpoint-managed devices. Every process creation, with parent process, command line, file hash, user context, and timestamp.

What it answers: What processes executed? What were the command lines? What parent processes spawned them? Are there unusual process trees?
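The "unusual process trees" question above can be sketched with a classic suspicious parent/child pair — Office applications spawning script interpreters. The process lists are illustrative starting points, not a complete detection:

// Sketch: Office applications spawning script interpreters
// Extend both lists to fit your estate
DeviceProcessEvents
| where TimeGenerated > ago(7d)
| where InitiatingProcessFileName in~ ("winword.exe", "excel.exe", "outlook.exe")
| where FileName in~ ("powershell.exe", "cmd.exe", "wscript.exe", "mshta.exe")
| project TimeGenerated, DeviceName, AccountName,
    InitiatingProcessFileName, FileName, ProcessCommandLine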

DeviceFileEvents — file creation, modification, deletion, and rename events on endpoints.

DeviceRegistryEvents — registry key creation, modification, and deletion. Critical for persistence detection — autostart entries, service creation, scheduled task registration.
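A sketch of the autostart-persistence hunt this table supports, scoped to the common Run key locations (one of many persistence paths — services and scheduled tasks need their own queries):

// Sketch: writes to a common autostart location (Run keys)
DeviceRegistryEvents
| where TimeGenerated > ago(7d)
| where ActionType == "RegistryValueSet"
| where RegistryKey has @"\Microsoft\Windows\CurrentVersion\Run"
| project TimeGenerated, DeviceName, RegistryKey, RegistryValueName,
    RegistryValueData, InitiatingProcessFileName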

DeviceNetworkEvents — network connections from endpoints. Destination IP, port, protocol, process that initiated the connection.

DeviceLogonEvents — logon events on endpoints. Local and remote (RDP, network) logon types.
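A sketch of a lateral-movement shape this table supports — one account logging on remotely to many devices in a short window. The threshold of 5 is purely illustrative; baseline it for your environment:

// Sketch: one account logging on remotely to many devices in a day
DeviceLogonEvents
| where TimeGenerated > ago(1d)
| where ActionType == "LogonSuccess"
| where LogonType in ("Network", "RemoteInteractive")
| summarize Devices = dcount(DeviceName), Sample = make_set(DeviceName, 10)
    by AccountName
| where Devices > 5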

These five tables collectively provide the endpoint hunting surface. They are required for TH9 (endpoint threats), TH10 (lateral movement), and TH12 (pre-ransomware activity).

What they miss: Cloud-plane activity (identity, email, SaaS). An attacker operating entirely through the browser or Graph API without dropping files or executing processes on the endpoint generates zero events in these tables.

Supplementary tables

IdentityLogonEvents / IdentityDirectoryEvents — Defender for Identity telemetry. On-premises Active Directory logon and directory change events. Required for hybrid hunting in TH10. If not ingested, cloud-to-on-prem pivot detection has no on-prem visibility.

MicrosoftGraphActivityLogs — Graph API call activity. What applications and users are doing through the API. Relatively new (2024). If ingested, dramatically enriches TH5 (cloud persistence — inbox rule creation via Graph) and TH6 (privilege escalation — post-consent data access via Graph). If not ingested, these API-based attack paths are invisible.
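The inbox-rule-via-Graph path mentioned above can be sketched as a filter on the mailbox rules endpoint. The URI fragment assumes the standard Graph mail API path (/mailFolders/{id}/messageRules):

// Sketch: inbox rules created through the Graph API
MicrosoftGraphActivityLogs
| where TimeGenerated > ago(7d)
| where RequestMethod == "POST"
| where RequestUri has "/messageRules"
| project TimeGenerated, UserId, AppId, IPAddress, RequestUri, ResponseStatusCode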

OfficeActivity — legacy Office 365 audit log connector. Overlaps with CloudAppEvents but with different schema and less enrichment. If your environment uses OfficeActivity instead of CloudAppEvents, the hunt queries need adaptation — the campaigns in this course are written for CloudAppEvents.

M365 hunting data sources — table map:

  • Identity + auth: SigninLogs, AADNonInteractiveUserSignInLogs, AADServicePrincipalSignInLogs, AuditLogs. Campaigns: TH4, TH6, TH7, TH10. Gaps: AADNonInteractive and ServicePrincipal sign-in logs are often missing.
  • Cloud apps + email: CloudAppEvents, EmailEvents, EmailUrlInfo / EmailAttachmentInfo, MicrosoftGraphActivityLogs. Campaigns: TH5, TH8, TH11, TH13. Gaps: CloudAppEvents requires Defender for Cloud Apps to be connected; GraphActivityLogs is often not enabled.
  • Endpoint: DeviceProcessEvents, DeviceFileEvents, DeviceRegistryEvents, DeviceNetworkEvents, DeviceLogonEvents. Campaigns: TH9, TH10, TH12. Gap: blind to cloud-only attacks.
  • Supplementary (enriches but not required): IdentityLogonEvents (Defender for Identity), IdentityDirectoryEvents, OfficeActivity (legacy), SecurityAlert.

Three clusters cover the three attack planes. Gaps in any cluster create blind spots for the corresponding campaigns. Run the data source audit query from TH0.8 before every new campaign to confirm table availability.

Figure TH0.10 — M365 hunting data source map. Three clusters (identity, cloud apps, endpoint) cover the three attack planes. Each cluster has common ingestion gaps noted.

The retention question

Advanced Hunting queries the last 30 days of data. If your hunt hypothesis covers a longer window — and long-dwell hypotheses (APT, supply chain) often do — you need to query through Sentinel's Log Analytics interface (which respects your configured retention period) or use search jobs for archived data.

Check your retention:

// What is your actual data retention for hunting-critical tables?
Usage
| where TimeGenerated > ago(90d)
| where DataType in (
    "SigninLogs", "AADNonInteractiveUserSignInLogs",
    "AuditLogs", "CloudAppEvents", "SecurityAlert")
| summarize
    EarliestData = min(TimeGenerated),
    LatestData = max(TimeGenerated),
    RetentionDays = datetime_diff('day', max(TimeGenerated), min(TimeGenerated))
    by DataType
| sort by RetentionDays desc
// If RetentionDays < 90 for any table, long-window hunts are limited
// Consider configuring archive tiers for hunting-critical tables

Try it yourself

Exercise: Audit your hunting data estate

Run the data source check query from above. For each of the three clusters (identity, cloud apps, endpoint), answer:

Identity: Are all four tables ingested? If AADNonInteractiveUserSignInLogs is missing, TH4 (identity compromise) will have a critical blind spot.

Cloud apps: Is CloudAppEvents populated? If not, is Defender for Cloud Apps connected? Is MicrosoftGraphActivityLogs enabled?

Endpoint: Are all five Device* tables populated? If not, is Defender for Endpoint deployed to all relevant device groups?

Document the gaps. Each gap is either a prerequisite to fix before hunting that domain or a known limitation to record in hunt records that depend on the missing table.

⚠ Compliance Myth: "We ingest all our logs into Sentinel — we have full visibility"

The myth: If data is flowing into Sentinel, it is available for hunting. Log ingestion equals visibility.

The reality: Ingestion is necessary but not sufficient. Many organizations ingest SigninLogs but not AADNonInteractiveUserSignInLogs — leaving the entire token replay attack surface invisible. Many ingest CloudAppEvents but have not connected all relevant data sources in Defender for Cloud Apps — leaving specific application activities unreported. Some ingest endpoint tables but only from a subset of devices (servers but not workstations, or managed devices but not BYOD). The audit is not "is the table ingested?" but "is the table ingested completely, for all relevant entities, with sufficient retention?" Each gap is a hunting blind spot.
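One way to probe completeness rather than mere presence is to count how many distinct devices actually report into each endpoint table, then compare against your asset inventory. A sketch, assuming DeviceName is present across the Device* tables:

// Sketch: completeness, not just presence — distinct devices per endpoint table
// Compare ReportingDevices against your known device count
union withsource = TableName Device*
| where TimeGenerated > ago(7d)
| summarize ReportingDevices = dcount(DeviceName) by TableName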

Extend this reference

Microsoft's data source landscape evolves. New tables appear (MicrosoftGraphActivityLogs was introduced in 2024). Existing tables gain new columns. Defender for Cloud Apps adds new application connectors. Before starting any hunt campaign, check the Microsoft Learn documentation for the specific table to confirm the columns you need are available and populated in your environment. The table schemas in this subsection are accurate as of the course publication date but may have expanded since then.

Decision point

You have time for one hunt this quarter. Do you hunt for the threat in the latest advisory or for the gap in your ATT&CK coverage matrix?

Hunt the coverage gap. Advisories describe threats that are CURRENT but may not target NE. Coverage gaps describe techniques that COULD target NE and would succeed undetected. The coverage gap hunt produces a detection rule (closing the gap permanently). The advisory-driven hunt produces a point-in-time assessment (confirming the specific threat is not present today). Both are valuable — but the coverage gap hunt has a longer-lasting impact because it produces a permanent detection improvement.

A hunt query returns 200 results. You have 4 hours remaining in the hunt window. You can investigate 20 results thoroughly or review all 200 superficially. Which approach produces better hunt outcomes?
  • Review all 200 — you might miss a critical finding in the 180 you skip.
  • Investigate 20 thoroughly. A superficial review of 200 results produces 200 'looked at it, seemed okay' assessments that provide no investigative value and no documentation for future reference. A thorough investigation of 20 results produces: confirmed findings (true positives requiring remediation), confirmed benign patterns (documented baselines for future comparison), and inconclusive results (flagged for monitoring). Prioritise the 20 by: highest anomaly score, highest-value assets involved, and highest-risk users involved. Document why the remaining 180 were not investigated and recommend a follow-up hunt with refined query criteria to reduce the result set.
  • Investigate 20 — but only if they are from the most recent 24 hours.
  • Neither — refine the query first to reduce the result set below 50.

You understand the detection gap and the hunt cycle.

TH0 showed you what detection rules fundamentally cannot catch. TH1 gave you the hypothesis-driven methodology that closes that gap. Now you run the hunts.

  • 10 complete hunt campaigns — from hypothesis through KQL execution through finding disposition, each campaign based on a real TTP
  • 70 production hunt queries — every one mapped to MITRE ATT&CK and tested against realistic telemetry
  • Advanced KQL for hunting — UEBA composite risk scoring, retroactive IOC sweeps, and hunt management metrics
  • Hypothesis-Driven Hunt Toolkit lab pack — 30 days of realistic M365 and endpoint telemetry with multiple attack patterns seeded in
  • TH16 — Scaling hunts across a team — the operating model for a production hunt program