6.7 Connector Validation and Ongoing Monitoring

75 minutes · Module 6

By the end of this subsection, you will have a validation checklist for new connectors and a weekly monitoring routine that catches connector failures before they create blind spots.

New connector validation checklist

Every time you enable a new connector, run through this checklist before marking it as complete:

| Check | Query or action | Expected result |
|---|---|---|
| Data arriving | Query the target table for events in the last hour | EventCount > 0 |
| Correct table | Verify data lands in the expected table (e.g., CommonSecurityLog, not Syslog) | Table name matches connector documentation |
| Fields populated | Check critical columns are not null | SourceIP, DeviceAction, TimeGenerated populated |
| Ingestion latency | Run the latency query from 6.6 | Average < 5 minutes |
| Volume matches estimate | Compare actual 24-hour volume to your pre-connection estimate | Within 30% of estimate |
| DCR filtering active | If a DCR was applied, verify volume reduction | Reduction matches expected percentage |
| No duplicates | Run the duplicate detection query from 6.6 | Zero or near-zero duplicates |
| Analytics rules compatible | Run any planned analytics rules against the new data | Rules return expected results |
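The first three checks can be combined into one spot-check query. A sketch, assuming a CEF connector landing in CommonSecurityLog — swap in your connector's table and its critical columns:

```kql
// Checks 1-3 in one pass: data arriving, correct table, critical fields populated
CommonSecurityLog
| where TimeGenerated > ago(1h)
| summarize
    EventCount = count(),                                // check 1: expect > 0
    MissingSourceIP = countif(isempty(SourceIP)),        // check 3: expect ~0
    MissingDeviceAction = countif(isempty(DeviceAction)) // check 3: expect ~0
```

If EventCount is 0 here but the device is definitely sending, check 2 applies: query the other candidate table (e.g., Syslog) to see whether the data landed in the wrong place.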
Create a validation template

Copy this checklist into your SOC wiki and fill it out for every new connector. The completed checklist serves as documentation — when someone asks "when was the Palo Alto connector set up and is it working?" you have the answer with evidence.

Weekly monitoring routine

Run these three queries every Monday (or at the start of each shift in a 24/7 SOC):

1. Connector health — all tables:

```kql
union withsource=TableName *
| where TimeGenerated > ago(24h)
| summarize
    EventCount = count(),
    LastEvent = max(TimeGenerated),
    HoursAgo = round(datetime_diff('minute', now(), max(TimeGenerated)) / 60.0, 1)
    by TableName
| where HoursAgo > 2
| sort by HoursAgo desc
```
Expected output (healthy): no results — every table has events within the last 2 hours.

What to look for: Any table appearing in this result has a data gap. A 2-hour threshold during business hours is the alert level. During weekends or off-hours, some tables (like AuditLogs) may naturally go quiet — adjust the threshold if your org has low weekend activity. One caveat: a table that has been silent for the full 24-hour window drops out of the union entirely, so also compare the tables that do return data against your connector inventory to catch completely dead sources.
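If weekend quiet periods cause false alarms, one option is a per-table threshold instead of a global 2-hour cutoff. A sketch using an inline lookup table — the table names and threshold values here are illustrative, not a recommendation:

```kql
// Per-table staleness thresholds (illustrative values — tune to your sources)
let Thresholds = datatable(TableName: string, ThresholdHours: real) [
    "CommonSecurityLog", 2.0,   // firewalls log around the clock
    "SigninLogs", 2.0,
    "AuditLogs", 12.0           // quiet weekends are normal for this table
];
union withsource=TableName *
| where TimeGenerated > ago(48h)
| summarize LastEvent = max(TimeGenerated) by TableName
| extend HoursAgo = round(datetime_diff('minute', now(), LastEvent) / 60.0, 1)
| join kind=inner Thresholds on TableName
| where HoursAgo > ThresholdHours
| project TableName, LastEvent, HoursAgo, ThresholdHours
```

Tables not listed in Thresholds are ignored by this variant, so it complements rather than replaces the global query above.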

2. Volume anomaly check:

```kql
let baseline =
    Usage
    | where TimeGenerated between (ago(8d) .. ago(1d))
    | where IsBillable == true
    // Usage emits hourly records, so sum per day first, then average the days
    | summarize DailyMB = sum(Quantity) by DataType, bin(TimeGenerated, 1d)
    | summarize AvgDailyMB = round(avg(DailyMB), 0) by DataType;
Usage
| where TimeGenerated > ago(1d)
| where IsBillable == true
| summarize TodayMB = round(sum(Quantity), 0) by DataType
| join kind=inner baseline on DataType
| extend PercentChange = round((TodayMB - AvgDailyMB) * 100.0 / AvgDailyMB, 0)
| where abs(PercentChange) > 50
| project DataType, TodayMB, AvgDailyMB, PercentChange
| sort by abs(PercentChange) desc
```
Expected output:

| DataType | TodayMB | AvgDailyMB | PercentChange |
|---|---|---|---|
| CommonSecurityLog | 8,450 | 2,100 | +302% |
| SigninLogs | 180 | 1,200 | -85% |
What to look for: Two anomaly types matter. Spikes (+302%) mean unexpected cost increase — investigate the cause (DDoS, config change, new verbose log source). Drops (-85%) mean data loss — your detections are blind. A drop in SigninLogs means your token replay and brute-force rules are not firing. Fix drops before spikes: missing data is worse than extra data.
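When a spike appears, the next step is to find what changed. A sketch that breaks a spiking table down by sending device, assuming the spike is in CommonSecurityLog — the grouping columns (Computer, DeviceVendor) vary by table:

```kql
// Which devices drove the CommonSecurityLog spike in the last 24 hours?
CommonSecurityLog
| where TimeGenerated > ago(1d)
| summarize EventCount = count() by Computer, DeviceVendor
| sort by EventCount desc
| take 10
```

If one device dominates, compare its output against the prior week — a firewall flipped to verbose logging or a device under attack usually stands out immediately.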

3. Analytics rule health:

```kql
SentinelHealth
| where TimeGenerated > ago(7d)
| where SentinelResourceType == "Analytic rule"
| where Status == "Failure"
| summarize FailCount = count(), LastFailure = max(TimeGenerated)
    by SentinelResourceName
| sort by FailCount desc
```
Expected output:

| SentinelResourceName | FailCount | LastFailure |
|---|---|---|
| Token replay from novel IP | 14 | 2026-03-21 08:15 |
| Inbox rule with suspicious keywords | 2 | 2026-03-19 14:22 |
What to look for: 14 failures for the token replay rule means it has been broken for days — 14 scheduled runs failed. Common cause: the rule references a table that was moved to Basic tier (join not supported) or a table that stopped receiving data. 2 failures may be transient (service hiccup) — monitor but do not panic.
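To confirm whether a failing rule's source table simply stopped receiving data, check that table directly. A sketch, assuming the token replay rule reads from SigninLogs — substitute whichever table the failing rule queries:

```kql
// Did the rule's source table go quiet around the time the failures started?
SigninLogs
| where TimeGenerated > ago(7d)
| summarize LastEvent = max(TimeGenerated), EventsLast7d = count()
| extend HoursSinceLastEvent = datetime_diff('hour', now(), LastEvent)
```

If HoursSinceLastEvent roughly matches the rule's failure window, fix the connector first — the rule will recover on its own once data resumes.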
Three queries. Five minutes. Every week.

These three queries — connector health, volume anomalies, rule health — are the minimum monitoring for any Sentinel deployment. They catch 90% of operational problems before they impact detection capability. Save them as favorites in your workspace and run them at the start of every shift or every Monday.

Check your understanding

1. The volume anomaly query shows SigninLogs at -85% compared to the 7-day average. What is the correct response priority?

- Highest priority — fix immediately. (Correct.) SigninLogs data loss means token replay detection, brute-force detection, and impossible travel detection are all blind. Every minute without sign-in data is a minute you cannot detect account compromise.
- Medium priority — investigate next week.
- Low priority — users can still sign in.