6.7 Connector Validation and Ongoing Monitoring

75 minutes · Module 6

By the end of this subsection, you will have a validation checklist for new connectors and a weekly monitoring routine that catches connector failures before they create blind spots.

New connector validation checklist

Every time you enable a new connector, run through this checklist before marking it as complete:

| Check | Query or action | Expected result |
|---|---|---|
| Data arriving | Query the target table for events in the last hour | EventCount > 0 |
| Correct table | Verify data lands in the expected table (e.g., CommonSecurityLog, not Syslog) | Table name matches connector documentation |
| Fields populated | Check critical columns are not null | SourceIP, DeviceAction, TimeGenerated populated |
| Ingestion latency | Run the latency query from 6.6 | Average < 5 minutes |
| Volume matches estimate | Compare actual 24-hour volume to your pre-connection estimate | Within 30% of estimate |
| DCR filtering active | If a DCR was applied, verify volume reduction | Reduction matches expected percentage |
| No duplicates | Run the duplicate detection query from 6.6 | Zero or near-zero duplicates |
| Analytics rules compatible | Run any planned analytics rules against the new data | Rules return expected results |
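The first three checks can be combined into one spot-check query. A sketch, assuming a CEF connector landing in CommonSecurityLog — swap in your connector's table and its critical columns:

```kql
// Checks 1-3 in one pass: data arriving, correct table, critical fields populated
CommonSecurityLog
| where TimeGenerated > ago(1h)
| summarize
    EventCount = count(),                                // check 1: expect > 0
    MissingSourceIP = countif(isempty(SourceIP)),        // check 3: expect ~0
    MissingDeviceAction = countif(isempty(DeviceAction)) // check 3: expect ~0
```

If EventCount is 0 here but the device is definitely sending, check 2 applies: query the other candidate table (e.g., Syslog) to see whether the data landed in the wrong place.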
Create a validation template

Copy this checklist into your SOC wiki and fill it out for every new connector. The completed checklist serves as documentation — when someone asks "when was the Palo Alto connector set up and is it working?" you have the answer with evidence.

Weekly monitoring routine

Run these three queries every Monday (or at the start of each shift in a 24/7 SOC):

1. Connector health — all tables:

```kql
union withsource=TableName *
| where TimeGenerated > ago(24h)
| summarize
    EventCount = count(),
    LastEvent = max(TimeGenerated),
    HoursAgo = round(datetime_diff('minute', now(), max(TimeGenerated)) / 60.0, 1)
    by TableName
| where HoursAgo > 2
| sort by HoursAgo desc
```
Expected output (healthy): no results — every table has events within the last 2 hours.

What to look for: Any table appearing in this result has a data gap. A 2-hour threshold during business hours is the alert level. During weekends or off-hours, some tables (like AuditLogs) may naturally go quiet — adjust the threshold if your org has low weekend activity. One caveat: a table that has been silent for the full 24-hour window drops out of the union entirely, so also compare the tables that do return data against your connector inventory to catch completely dead sources.
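If weekend quiet periods cause false alarms, one option is a per-table threshold instead of a global 2-hour cutoff. A sketch using an inline lookup table — the table names and threshold values here are illustrative, not a recommendation:

```kql
// Per-table staleness thresholds (illustrative values — tune to your sources)
let Thresholds = datatable(TableName: string, ThresholdHours: real) [
    "CommonSecurityLog", 2.0,   // firewalls log around the clock
    "SigninLogs", 2.0,
    "AuditLogs", 12.0           // quiet weekends are normal for this table
];
union withsource=TableName *
| where TimeGenerated > ago(48h)
| summarize LastEvent = max(TimeGenerated) by TableName
| extend HoursAgo = round(datetime_diff('minute', now(), LastEvent) / 60.0, 1)
| join kind=inner Thresholds on TableName
| where HoursAgo > ThresholdHours
| project TableName, LastEvent, HoursAgo, ThresholdHours
```

Tables not listed in Thresholds are ignored by this variant, so it complements rather than replaces the global query above.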

2. Volume anomaly check:

```kql
let baseline =
    Usage
    | where TimeGenerated between (ago(8d) .. ago(1d))
    | where IsBillable == true
    // Usage emits hourly records, so sum per day first, then average the days
    | summarize DailyMB = sum(Quantity) by DataType, bin(TimeGenerated, 1d)
    | summarize AvgDailyMB = round(avg(DailyMB), 0) by DataType;
Usage
| where TimeGenerated > ago(1d)
| where IsBillable == true
| summarize TodayMB = round(sum(Quantity), 0) by DataType
| join kind=inner baseline on DataType
| extend PercentChange = round((TodayMB - AvgDailyMB) * 100.0 / AvgDailyMB, 0)
| where abs(PercentChange) > 50
| project DataType, TodayMB, AvgDailyMB, PercentChange
| sort by abs(PercentChange) desc
```
Expected output:

| DataType | TodayMB | AvgDailyMB | PercentChange |
|---|---|---|---|
| CommonSecurityLog | 8,450 | 2,100 | +302% |
| SigninLogs | 180 | 1,200 | -85% |
What to look for: Two anomaly types matter. Spikes (+302%) mean unexpected cost increase — investigate the cause (DDoS, config change, new verbose log source). Drops (-85%) mean data loss — your detections are blind. A drop in SigninLogs means your token replay and brute-force rules are not firing. Fix drops before spikes: missing data is worse than extra data.
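When a spike appears, the next step is to find what changed. A sketch that breaks a spiking table down by sending device, assuming the spike is in CommonSecurityLog — the grouping columns (Computer, DeviceVendor) vary by table:

```kql
// Which devices drove the CommonSecurityLog spike in the last 24 hours?
CommonSecurityLog
| where TimeGenerated > ago(1d)
| summarize EventCount = count() by Computer, DeviceVendor
| sort by EventCount desc
| take 10
```

If one device dominates, compare its output against the prior week — a firewall flipped to verbose logging or a device under attack usually stands out immediately.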

3. Analytics rule health:

```kql
SentinelHealth
| where TimeGenerated > ago(7d)
| where SentinelResourceType == "Analytic rule"
| where Status == "Failure"
| summarize FailCount = count(), LastFailure = max(TimeGenerated)
    by SentinelResourceName
| sort by FailCount desc
```
Expected output:

| SentinelResourceName | FailCount | LastFailure |
|---|---|---|
| Token replay from novel IP | 14 | 2026-03-21 08:15 |
| Inbox rule with suspicious keywords | 2 | 2026-03-19 14:22 |
What to look for: 14 failures for the token replay rule means it has been broken for days — 14 scheduled runs failed. Common cause: the rule references a table that was moved to Basic tier (join not supported) or a table that stopped receiving data. 2 failures may be transient (service hiccup) — monitor but do not panic.
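To confirm whether a failing rule's source table simply stopped receiving data, check that table directly. A sketch, assuming the token replay rule reads from SigninLogs — substitute whichever table the failing rule queries:

```kql
// Did the rule's source table go quiet around the time the failures started?
SigninLogs
| where TimeGenerated > ago(7d)
| summarize LastEvent = max(TimeGenerated), EventsLast7d = count()
| extend HoursSinceLastEvent = datetime_diff('hour', now(), LastEvent)
```

If HoursSinceLastEvent roughly matches the rule's failure window, fix the connector first — the rule will recover on its own once data resumes.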
Three queries. Five minutes. Every week.

These three queries — connector health, volume anomalies, rule health — are the minimum monitoring for any Sentinel deployment. They catch 90% of operational problems before they impact detection capability. Save them as favorites in your workspace and run them at the start of every shift or every Monday.

Check your understanding

1. The volume anomaly query shows SigninLogs at -85% compared to the 7-day average. What is the correct response priority?

- Highest priority — fix immediately. (Correct.) SigninLogs data loss means token replay detection, brute-force detection, and impossible travel detection are all blind. Every minute without sign-in data is a minute you cannot detect account compromise.
- Medium priority — investigate next week.
- Low priority — users can still sign in.