5.9 Workspace Health and Monitoring
By the end of this subsection, you will have KQL queries for monitoring ingestion health, detecting data gaps, and tracking workspace operational metrics.
A workspace that silently stops ingesting data is worse than no workspace at all — you believe you have coverage when you do not. These monitoring queries catch problems before they become blind spots.
Connector health check
Expected Output
| TableName | LastEvent | EventCount | HoursSinceLastEvent |
|---|---|---|---|
| IdentityLogonEvents | 2026-03-21 08:14 | 12 | 6.3 |
| AuditLogs | 2026-03-21 13:45 | 234 | 1.1 |
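The staleness check applied to this output is a single threshold comparison. Here is a minimal Python sketch of it. It assumes you have exported the query results (table name and HoursSinceLastEvent) into a dictionary. The 4-hour cutoff comes from the guidance in this section. The function name `stale_tables` is hypothetical, and the sketch leaves the "during business hours" condition to you.

```python
# Hypothetical triage helper: flags tables whose newest event is older
# than the staleness threshold (4 hours, per the guidance in this section).
STALE_THRESHOLD_HOURS = 4.0


def stale_tables(hours_since_last_event: dict[str, float],
                 threshold: float = STALE_THRESHOLD_HOURS) -> list[str]:
    """Return the names of tables silent for longer than `threshold` hours, sorted."""
    return sorted(
        table for table, hours in hours_since_last_event.items()
        if hours > threshold
    )


if __name__ == "__main__":
    # Values copied from the Expected Output table above.
    results = {"IdentityLogonEvents": 6.3, "AuditLogs": 1.1}
    print(stale_tables(results))  # ['IdentityLogonEvents']
```

The comparison is strict (`>`), so a table at exactly 4.0 hours is not flagged.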
What to look for: Any table with HoursSinceLastEvent > 4 during business hours indicates a potential connector failure. IdentityLogonEvents showing 6.3 hours of silence means Defender for Identity may have lost connectivity to a domain controller. AuditLogs at 1.1 hours is under the threshold but worth watching; check again in 30 minutes.
Ingestion anomaly detection
Expected Output
| DataType | TodayMB | AvgDailyMB | PercentChange |
|---|---|---|---|
| CommonSecurityLog | 8,450 | 2,100 | +302% |
| SigninLogs | 180 | 1,200 | -85% |
What to look for: Two types of anomaly matter. Spikes: CommonSecurityLog at +302% means firewall log volume roughly quadrupled (8,450 MB against a 2,100 MB daily average). Possible causes include a DDoS, a configuration change, or a new verbose rule. Investigate the cause and consider a DCR filter. Drops: SigninLogs at -85% is more dangerous, because it means the Entra ID connector may have failed. You are missing sign-in data, which means your token replay detections are blind. Fix immediately.
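The PercentChange column is plain arithmetic: (TodayMB - AvgDailyMB) / AvgDailyMB * 100. Below is a minimal Python sketch of that calculation, with a spike/drop label added. The ±50% cutoff is a hypothetical value chosen for illustration, not a Sentinel default. Tune it to the normal day-to-day variation of your own workspace.

```python
def percent_change(today_mb: float, avg_daily_mb: float) -> int:
    """Percent change of today's volume versus the daily average, rounded."""
    if avg_daily_mb <= 0:
        raise ValueError("average daily volume must be positive")
    return round(100 * (today_mb - avg_daily_mb) / avg_daily_mb)


# Hypothetical cutoff for illustration only; tune it to your workspace.
ANOMALY_THRESHOLD_PCT = 50


def classify(pct: int, threshold: int = ANOMALY_THRESHOLD_PCT) -> str:
    """Label a percent change as a spike, a drop, or normal variation."""
    if pct >= threshold:
        return "spike"   # investigate the cause, consider a DCR filter
    if pct <= -threshold:
        return "drop"    # possible connector failure: fix immediately
    return "normal"


if __name__ == "__main__":
    # Rows copied from the Expected Output table above.
    rows = [("CommonSecurityLog", 8450, 2100), ("SigninLogs", 180, 1200)]
    for data_type, today, avg in rows:
        pct = percent_change(today, avg)
        print(f"{data_type}: {pct:+d}% ({classify(pct)})")
```

Running it reproduces the table's +302% and -85% figures and labels them as a spike and a drop.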
Analytics rule health
Expected Output
| SentinelResourceName | Succeeded | Failed | LastRun |
|---|---|---|---|
| Token replay from novel IP | 92 | 4 | 2026-03-21 14:15 |
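Triage for this output comes down to the run counts. Any failure warrants investigation, and a rule that fails on every run is broken rather than unlucky. A minimal Python sketch of that logic follows, assuming you pass in the Succeeded and Failed counts for each rule. The function names and status labels are hypothetical.

```python
def rule_status(succeeded: int, failed: int) -> str:
    """Classify an analytics rule from its run counts (hypothetical labels)."""
    total = succeeded + failed
    if total == 0:
        return "no runs"      # nothing recorded in the query window
    if failed == 0:
        return "healthy"
    if succeeded == 0:
        return "broken"       # every run failed: fix the query or table
    return "investigate"      # partial failures: possibly transient


def failure_rate_pct(succeeded: int, failed: int) -> float:
    """Share of runs that failed, as a percentage rounded to one decimal."""
    total = succeeded + failed
    return round(100 * failed / total, 1) if total else 0.0


if __name__ == "__main__":
    # Counts from the "Token replay from novel IP" row above.
    print(rule_status(92, 4), failure_rate_pct(92, 4))  # investigate 4.2
    print(rule_status(0, 96), failure_rate_pct(0, 96))  # broken 100.0
```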
What to look for: Any rule with Failed > 0 needs investigation. Common failure causes: the KQL query references a table that was moved to the Basic tier (joins are not supported), the query exceeded the execution time limit (optimize it with tighter time filters), or the table was renamed in a schema update. Four failures out of 96 runs may be transient (a service hiccup); 96 failures out of 96 means the rule is broken.
Build a health monitoring workbook
Combine these three queries into a Sentinel workbook that runs automatically. Module 26 covers workbook construction in detail. For now, run these queries manually once per week during your workspace health check.
Check your understanding
1. SigninLogs ingestion dropped 85% compared to the 7-day average. What is the operational impact?
Data gaps are invisible to analysts who do not monitor ingestion health. A query that returns fewer results than expected looks like "nothing happened" rather than "data is missing." Monitoring ingestion volume is how you distinguish "quiet day" from "broken connector."