DE0.7 The Data You Already Have

2-3 hours · Module 0 · Free
Operational Objective
The Ingestion-Detection Gap: Northgate Engineering ingests 18 GB/day into Sentinel from 20 data sources at approximately $2,700/month. Their 23 analytics rules query 6 of those 20 tables. The other 14 tables contain telemetry that could detect dozens of attack techniques — but no rule looks at them. This subsection maps every data source in the NE workspace, identifies which tables are queried by existing rules and which are not, and quantifies the detection potential sitting unused in the data you already pay to ingest.
Deliverable: A data source audit for the NE environment — which tables have detection rules and which represent untapped detection potential.
⏱ Estimated completion: 20 minutes

The ingestion-detection gap

Your SIEM ingestion bill pays for data. Your analytics rules determine how much of that data produces security value. If you ingest 20 tables and query 6, you are paying full price for 14 tables of data that generates zero alerts. That data is not wasted from a retention perspective — you can hunt through it manually, you can query it during incident response — but from a detection perspective, it is idle.

Table coverage is the ratio of tables with at least one analytics rule to total ingested tables; the ingestion-detection gap is the remainder. Northgate Engineering’s coverage: 6 / 20 = 30%. The gap is therefore 70%: fourteen of their twenty ingested tables have no automated detection.
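The 6 / 20 arithmetic can be reproduced as a small self-contained query. This is a minimal sketch: the table names and their rule status are copied from the Northgate audit in Figure DE0.7, and the column names (HasRule, CoveragePct, GapPct) are illustrative labels, not part of any Sentinel schema.

```kql
// Hard-coded copy of the Northgate audit (Figure DE0.7); no workspace data is read.
let audit = datatable(TableName: string, HasRule: bool) [
    "SigninLogs", true,
    "SecurityAlert", true,
    "OfficeActivity", true,
    "DeviceProcessEvents", true,
    "IdentityLogonEvents", true,
    "AuditLogs", true,
    "AADNonInteractiveUserSignInLogs", false,
    "DeviceNetworkEvents", false,
    "DeviceFileEvents", false,
    "CommonSecurityLog", false,
    "WindowsEvent", false,
    "DeviceLogonEvents", false,
    "DeviceRegistryEvents", false,
    "CloudAppEvents", false,
    "EmailEvents", false,
    "EmailUrlInfo", false,
    "EmailAttachmentInfo", false,
    "IdentityDirectoryEvents", false,
    "Syslog", false,
    "SecurityIncident", false
];
audit
// Count tables with at least one rule, and all tables
| summarize CoveredTables = countif(HasRule), TotalTables = count()
// Coverage is the share of tables with a rule; the gap is the remainder
| extend CoveragePct = round(100.0 * CoveredTables / TotalTables, 1)
| extend GapPct = 100.0 - CoveragePct
```

For the Northgate data this yields 6 covered tables out of 20. Replacing the hard-coded list with your own audit gives the same measure for your workspace.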

This is the detection engineer’s opportunity. Every new rule built against a previously unqueried table extracts additional value from data the organization already pays to ingest. No new data connector needed. No new ingestion cost. Just KQL against existing tables.

NORTHGATE ENGINEERING: DATA SOURCE AUDIT

QUERIED BY RULES (6 tables)
  SigninLogs                        2.1 GB/d
  SecurityAlert                     0.1 GB/d
  OfficeActivity                    0.9 GB/d
  DeviceProcessEvents               3.2 GB/d
  IdentityLogonEvents               0.3 GB/d
  AuditLogs                         0.3 GB/d

NOT QUERIED BY ANY RULE (14 tables)
  AADNonInteractiveUserSignInLogs   4.8 GB/d   ← highest volume, zero rules
  DeviceNetworkEvents               2.4 GB/d   ← lateral movement invisible
  DeviceFileEvents                  1.1 GB/d   ← exfiltration invisible
  CommonSecurityLog                 1.2 GB/d   ← firewall data unanalyzed
  WindowsEvent                      0.8 GB/d   ← DC events unmonitored
  DeviceLogonEvents                 0.8 GB/d   ← RDP/logon invisible
  DeviceRegistryEvents              0.6 GB/d   ← persistence invisible
  + CloudAppEvents, EmailEvents, EmailUrlInfo, EmailAttachmentInfo, IdentityDirectoryEvents, Syslog, SecurityIncident

$2,700/month buys 20 tables of data. Rules query 6. 14 tables produce zero detection value.

Figure DE0.7 — Northgate Engineering data source audit. Green tables have at least one analytics rule. Red tables are ingested and paid for but have zero detection rules — including the highest-volume table (AADNonInteractiveUserSignInLogs at 4.8 GB/day).

The highest-value unqueried tables

AADNonInteractiveUserSignInLogs (4.8 GB/day). The highest-volume table in the workspace, yet no rule queries it. This table records every non-interactive authentication: token refreshes, app-based authentication, service principal sign-ins. AiTM session token theft (CHAIN-HARVEST Phase 2) produces a signature pattern in this table: a non-interactive token refresh from a different IP than the preceding interactive sign-in. Token replay attacks, application-level compromise, and service principal abuse are all detectable here. DE4 builds rules against this table.
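The signature pattern described above can be sketched as a join between the two sign-in tables. This is a rough illustration, not the rule DE4 builds. The one-hour window is an arbitrary assumption, and the join matches any successful interactive sign-in in that window rather than strictly the immediately preceding one, so expect noise from users who legitimately change networks.

```kql
// Sketch only: the lookback and window values are illustrative assumptions.
let lookback = 1d;
let window = 1h;
SigninLogs
| where TimeGenerated > ago(lookback)
| where ResultType == "0"                       // successful interactive sign-ins
| project InteractiveTime = TimeGenerated, UserPrincipalName, InteractiveIP = IPAddress
| join kind=inner (
    AADNonInteractiveUserSignInLogs
    | where TimeGenerated > ago(lookback)
    | where ResultType == "0"                   // successful token refreshes
    | project NonInteractiveTime = TimeGenerated, UserPrincipalName,
              NonInteractiveIP = IPAddress, AppDisplayName
) on UserPrincipalName
// Keep refreshes that follow the interactive sign-in within the window
| where NonInteractiveTime between (InteractiveTime .. (InteractiveTime + window))
// ...and that come from a different IP address
| where NonInteractiveIP != InteractiveIP
| summarize Refreshes = count(), Apps = make_set(AppDisplayName, 10)
    by UserPrincipalName, InteractiveIP, NonInteractiveIP
| sort by Refreshes desc
```

A production version would need allow-listing for known egress IPs and VPN ranges before it is usable as an analytics rule.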

DeviceNetworkEvents (2.4 GB/day). Every network connection from every managed endpoint. RDP lateral movement (CHAIN-MESH Phases 3 and 5), C2 beaconing (CHAIN-ENDPOINT Phase 7), DNS tunneling, and data exfiltration over unusual ports — all visible here. This is the primary table for lateral movement and exfiltration detection. DE8 builds rules against it.
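As one example of what this table makes visible, the following sketch looks for a single endpoint opening RDP connections to many internal hosts in a short period. The threshold of 5 targets per hour is a hypothetical starting value, not a tuned figure from the course; jump hosts and admin workstations will need excluding.

```kql
// Sketch only: threshold and time bin are illustrative, not tuned values.
let rdpFanOutThreshold = 5;                     // hypothetical: distinct RDP targets per hour
DeviceNetworkEvents
| where TimeGenerated > ago(1d)
| where ActionType == "ConnectionSuccess"
| where RemotePort == 3389                      // RDP
| where RemoteIPType == "Private"               // internal destinations only
| summarize Targets = dcount(RemoteIP), TargetList = make_set(RemoteIP, 20)
    by DeviceName, InitiatingProcessAccountName, bin(TimeGenerated, 1h)
| where Targets >= rdpFanOutThreshold
| sort by Targets desc
```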

DeviceFileEvents (1.1 GB/day). Every file creation, modification, and deletion on managed endpoints. Bulk file copies from file shares (CHAIN-ENDPOINT collection), USB file writes (CHAIN-FACTORY exfiltration), ransomware mass encryption (CHAIN-MESH Phase 7), and document staging for exfiltration are all visible here. DE7 builds rules against it.
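Mass encryption tends to show up as one process modifying or renaming a very large number of files in a few minutes. The sketch below illustrates that idea only. The threshold of 500 files per 5 minutes is an assumed placeholder, and backup agents, sync clients, and build tools will trip it until they are excluded.

```kql
// Sketch only: the threshold is a hypothetical placeholder, not a tuned value.
let filesTouchedThreshold = 500;
DeviceFileEvents
| where TimeGenerated > ago(1d)
| where ActionType in ("FileModified", "FileRenamed")
// FolderPath holds the full path of the file, so dcount approximates distinct files
| summarize FilesTouched = dcount(FolderPath)
    by DeviceName, InitiatingProcessFileName, bin(TimeGenerated, 5m)
| where FilesTouched >= filesTouchedThreshold
| sort by FilesTouched desc
```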

CommonSecurityLog (1.2 GB/day). Palo Alto firewall logs — allow and deny events across all sites. Inter-site traffic patterns, egress to suspicious destinations, large data transfers, and VPN session metadata. This is the only visibility into network-layer activity for devices without Defender for Endpoint. DE8 uses this table for cross-site lateral movement correlation.
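A first pass at the "large data transfers" use case might sum outbound bytes per source and destination pair. This sketch assumes the firewall logs arrive with DeviceVendor set to "Palo Alto Networks" and DeviceAction set to "allow"; verify both values in your own workspace before relying on the filter. The 1 GiB threshold is arbitrary.

```kql
// Sketch only: vendor/action strings and the threshold are assumptions to verify.
let egressThresholdBytes = 1073741824;          // 1 GiB, hypothetical
CommonSecurityLog
| where TimeGenerated > ago(1d)
| where DeviceVendor == "Palo Alto Networks"
| where DeviceAction == "allow"
| where not(ipv4_is_private(DestinationIP))     // keep traffic leaving the network
| summarize TotalSentBytes = sum(SentBytes), Sessions = count()
    by SourceIP, DestinationIP
| where TotalSentBytes > egressThresholdBytes
| extend SentGB = round(TotalSentBytes / 1024.0 / 1024.0 / 1024.0, 2)
| sort by TotalSentBytes desc
```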

DeviceRegistryEvents (0.6 GB/day). Registry modifications on managed endpoints. Persistence mechanisms (Run keys, services), security tool tampering (Defender configuration changes), and malware configuration writes. DE5 builds persistence detection rules against this table.
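The Run-key persistence case can be illustrated with a simple filter on registry value writes. This is a hunting-style sketch rather than the DE5 rule: it returns every write under a Run or RunOnce key, so legitimate software installs will appear and need filtering by initiating process.

```kql
// Sketch only: returns all Run/RunOnce value writes, including benign installers.
DeviceRegistryEvents
| where TimeGenerated > ago(1d)
| where ActionType == "RegistryValueSet"
// Matches ...\CurrentVersion\Run and ...\CurrentVersion\RunOnce in any hive
| where RegistryKey contains @"\CurrentVersion\Run"
| project TimeGenerated, DeviceName, RegistryKey, RegistryValueName, RegistryValueData,
          InitiatingProcessFileName, InitiatingProcessCommandLine, InitiatingProcessAccountName
| sort by TimeGenerated desc
```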

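// Identify your unqueried tables
// Run in your Sentinel workspace to find billable tables that no fired rule references.
// Caveat: table names are recovered from the query text stored on fired alerts, so a
// rule that has not fired in the last 90 days is not counted.
let queriedTables = SecurityAlert
    | where TimeGenerated > ago(90d)
    | where ProviderName == "ASI Scheduled Alerts"
    // ExtendedProperties is stored as a JSON string, so parse it before reading Query
    | extend RuleQuery = tostring(parse_json(ExtendedProperties).Query)
    // Heuristic: treat any word immediately followed by a pipe as a candidate table name
    | mv-expand TableName = extract_all(@"(\w+)\s*\|", RuleQuery) to typeof(string)
    | distinct TableName;
Usage
| where TimeGenerated > ago(30d)
| where IsBillable == true
// Quantity is reported in MB; convert to average GB per day
| summarize DailyGB = round(sum(Quantity) / 1024 / 30, 2) by DataType
| where DailyGB > 0.01
| join kind=leftanti queriedTables on $left.DataType == $right.TableName
| sort by DailyGB desc
// Output: tables you pay for but have zero detection rules against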
⚠ Compliance Myth: "We should reduce ingestion to save costs — we are ingesting more data than we use"

The myth: If analytics rules only query 6 tables, the other 14 are unnecessary and should be disconnected to reduce Sentinel costs.

The reality: The unqueried tables are not unnecessary; they are underutilized. Disconnecting them reduces cost AND eliminates the detection potential and the investigation capability they provide. The correct response to unqueried tables is not to stop ingesting them but to build detection rules against them. DeviceNetworkEvents at 2.4 GB/day costs approximately $360/month at Northgate’s effective rate ($2,700/month for 18 GB/day). A single detection rule against that table that catches one lateral movement incident can save hundreds of thousands in breach costs. The cost conversation should be: “we pay $360/month for network telemetry and currently extract zero detection value from it, so let us fix that” not “we pay $360/month for data we do not use, so let us cut it.”
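To put a rough monthly figure on each table for this conversation, the Usage query can be extended with a cost column. This is a sketch under a stated assumption: pricePerGB is a hypothetical flat rate of $5 per GB, which is what $2,700/month for 18 GB/day implies ($2,700 / (18 × 30)). Your actual rate depends on pricing tier, region, and commitment, so substitute the figure from your own bill.

```kql
// Sketch only: pricePerGB is a hypothetical flat rate, not a published price.
let pricePerGB = 5.0;
Usage
| where TimeGenerated > ago(30d)
| where IsBillable == true
// Quantity is reported in MB; convert to average GB per day
| summarize DailyGB = round(sum(Quantity) / 1024 / 30, 2) by DataType
// Estimated monthly cost = GB per day x 30 days x assumed price per GB
| extend EstMonthlyCost = round(DailyGB * 30 * pricePerGB, 0)
| sort by EstMonthlyCost desc
```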

Data sources NOT connected

The detection gap includes not only unqueried tables but also data sources that are not connected to Sentinel at all. Northgate has five significant telemetry gaps:

AWS CloudTrail — the engineering team uses AWS S3 for some file storage. No visibility into API calls, IAM activity, or data access in AWS. A cross-cloud attack that pivots from Microsoft to AWS is invisible.

OT network logs — the manufacturing floor network at Bristol and Sheffield is air-gapped (correctly). Sunderland’s OT network is connected to the corporate network (a design flaw). No OT logs are forwarded to Sentinel from any site.

RHEL auditd — the 4 RHEL CAD/CAM servers have auditd configured but the logs are reviewed manually, not forwarded to Sentinel. The crown jewels have no real-time detection.

VPN session logs — Prisma Access session metadata is available in Panorama but not forwarded to Sentinel. VPN authentication goes through Entra ID (visible in SigninLogs) but VPN session duration, bandwidth, and disconnect events are not.

DNS query logs — internal DNS resolvers do not log queries. DNS tunneling, DGA domain resolution, and C2 over DNS are invisible.

Connecting these sources increases ingestion cost. The detection engineer must justify each connection with specific detection rules it enables — the detection-first principle from the course blueprint. DE2 includes this cost-benefit analysis in the 90-day roadmap.

Try it yourself

Exercise: Run the unqueried tables query

Run the KQL query above in your Sentinel workspace. It identifies tables where you pay for ingestion but have zero analytics rules. Sort by DailyGB descending — the highest-volume unqueried table is your biggest detection gap per dollar spent.

If you find DeviceNetworkEvents, DeviceFileEvents, or AADNonInteractiveUserSignInLogs in the unqueried list, you have the same gap pattern as Northgate Engineering. These are the tables that DE4, DE7, and DE8 build rules against.

Check your understanding

Your CFO approves connecting the RHEL auditd logs to Sentinel. The estimated ingestion is 0.5 GB/day at approximately $75/month. What detection rules must you build BEFORE connecting the data source to justify the cost?

Answer: At minimum: (1) SSH authentication anomaly detection (failed/succeeded patterns from the sshd authentication records, which on RHEL land in /var/log/secure rather than auth.log), (2) privilege escalation detection (sudo to root from unexpected users, SUID binary execution), (3) file access monitoring for the CAD/CAM engineering files (unauthorized reads from the rendering farm directories), and (4) suspicious process execution (processes executed by apache or other service accounts that should not spawn interactive shells). These 4 rules provide the detection justification for the $75/month ingestion. If none of these rules produce true positive value after 90 days, the connection should be re-evaluated. This is the detection-first principle: define the rules before committing the cost.
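Rule (1) could start from something like the following sketch. It assumes the RHEL servers forward sshd authentication messages into the Sentinel Syslog table, which is one possible ingestion path and not something the Northgate scenario specifies. The message text matched, the regular expression, and the threshold of 10 failures per 10 minutes are all illustrative.

```kql
// Sketch only: assumes sshd messages arrive in the Syslog table; threshold is hypothetical.
let failureThreshold = 10;
Syslog
| where TimeGenerated > ago(1d)
| where ProcessName == "sshd"
| where SyslogMessage has "Failed password"
// Pull the source address out of "... from <ip> port <n> ..."
| extend SourceIP = extract(@"from (\S+) port", 1, SyslogMessage)
| summarize Failures = count() by Computer, SourceIP, bin(TimeGenerated, 10m)
| where Failures >= failureThreshold
| sort by Failures desc
```

Writing the query before the connector is enabled makes the required fields explicit, which in turn tells you which log facilities actually need forwarding.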

Troubleshooting: “We do not know what tables we have”

Run the Usage table query. The following shows every table with data and its volume:

Usage
| where TimeGenerated > ago(30d)
| where IsBillable == true
| summarize DailyGB = round(sum(Quantity) / 1024 / 30, 2) by DataType
| sort by DailyGB desc

This is the starting point for any data source audit.

Check the data connectors page. Sentinel → Data connectors shows all configured connectors and their status. Connected connectors with “last log received” timestamps confirm active data flow. Connectors that show “not connected” or stale timestamps indicate data sources that were configured but stopped working — a common issue after tenant changes or API key rotations.
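The stale-connector check can also be approximated in KQL from the Usage table, which avoids clicking through each connector. This is a minimal sketch: the one-day staleness cutoff is an assumption, and low-volume tables that legitimately go quiet for a day will appear in the results.

```kql
// Sketch only: the one-day cutoff is an illustrative assumption.
Usage
| where TimeGenerated > ago(30d)
// Most recent usage record per table
| summarize LastSeen = max(TimeGenerated) by DataType
// Tables that reported data in the last 30 days but nothing in the last day
| where LastSeen < ago(1d)
| sort by LastSeen asc
```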


References used in this subsection

  • NE Training Universe blueprint (data source inventory)
  • Course cross-references: DE2 (cost-benefit analysis for new data sources), DE4 (AADNonInteractiveUserSignInLogs rules), DE5 (DeviceRegistryEvents persistence rules), DE7 (DeviceFileEvents rules), DE8 (DeviceNetworkEvents and CommonSecurityLog rules)
