6.6 Connector Troubleshooting
By the end of this subsection, you will be able to diagnose and fix the five most common connector failures using systematic troubleshooting and KQL diagnostic queries.
Connectors fail silently. The workspace does not alert you when data stops flowing — you discover the failure when an investigation returns empty results or an analytics rule stops firing. This subsection teaches you to find and fix failures before they create blind spots.
The universal diagnostic approach
Every connector failure falls into one of three zones. Determine which zone first, then drill down.
- Zone 1: Source device — Is the device generating and sending logs?
- Zone 2: Transport layer — Is the data reaching the forwarder/ingestion endpoint?
- Zone 3: Workspace — Is the data being accepted, transformed, and stored?
Start at Zone 1, work forward. This prevents the common mistake of debugging the workspace when the firewall stopped sending logs.
Problem 1: Connector shows “Connected” but no data
Symptom: Green status on the connector page, but the target table returns zero events.
Zone 1 check — is the device sending?
For Syslog/CEF sources, SSH to the forwarder and capture inbound Syslog traffic: sudo tcpdump -i any port 514 -c 10. If packets arrive, the device is sending and Zone 1 is healthy; if nothing arrives, the problem is on the device itself (a removed Syslog destination, a reboot that dropped the config, or a firmware change).
Zone 2 check — is rsyslog processing?
On the forwarder, run systemctl status rsyslog to confirm the service is running, df -h to rule out a full disk, and tail -f /var/log/syslog | grep CEF to confirm CEF messages are actually being processed.
Zone 3 check — is the AMA connected?
Check Azure Portal → Monitor → Data Collection Rules → verify the forwarder VM is listed. Check AMA agent health: sudo /opt/microsoft/azuremonitoragent/bin/mdsd --version on the forwarder. If AMA is not running, restart: sudo systemctl restart azuremonitoragent.
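From the workspace side, the Heartbeat table is a quick way to confirm the forwarder's AMA is checking in. A minimal sketch (the computer name is a placeholder — substitute your forwarder's hostname):

```kusto
// Has the forwarder's AMA reported a heartbeat recently?
Heartbeat
| where Computer == "cef-forwarder-01"   // placeholder hostname, not from this book
| summarize LastHeartbeat = max(TimeGenerated)
| extend MinutesSince = datetime_diff('minute', now(), LastHeartbeat)
```

A LastHeartbeat more than a few minutes old means the agent is down or cannot reach Azure, regardless of what the connector page says.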
For Microsoft first-party connectors (no forwarder), check:
- Entra ID: Are diagnostic settings still configured? (Settings can be deleted during admin changes)
- M365 Defender: Is the connector still enabled in the Sentinel portal? (Portal updates can reset connector state)
- Azure Activity: Is the subscription still selected?
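For first-party connectors, a per-table freshness check narrows things down faster than clicking through each connector page. One possible shape (adjust the table list to the connectors you actually use):

```kusto
// Last event per first-party table; a stale LastEvent points at the failing connector
union withsource = TableName SigninLogs, AuditLogs, AzureActivity, OfficeActivity
| summarize LastEvent = max(TimeGenerated) by TableName
| order by LastEvent asc
```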
Problem 2: Data flowing but with high latency
Symptom: Data arrives 10-30 minutes after the event occurred.
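The sample results below come from a latency measurement. One common shape for such a query (an assumption, not necessarily this book's exact version) compares the event timestamp against ingestion_time():

```kusto
// Ingestion delay for CEF events over the last hour
CommonSecurityLog
| where TimeGenerated > ago(1h)
| extend DelayMin = datetime_diff('minute', ingestion_time(), TimeGenerated)
| summarize AvgDelay = avg(DelayMin), P95Delay = percentile(DelayMin, 95), MaxDelay = max(DelayMin)
```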
| AvgDelay (min) | P95Delay (min) | MaxDelay (min) |
|---|---|---|
| 18 | 32 | 47 |
Common causes: forwarder CPU saturation (check with top), disk I/O bottleneck (rsyslog buffering), an outdated AMA version, or network congestion between the forwarder and Azure. Fix: upgrade AMA, increase VM resources, and move the forwarder to the same Azure region as the workspace.
Problem 3: Duplicate events
Symptom: The same event appears multiple times.
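The counts in the table below come from a duplicate-detection query. One common shape (an assumption — the key fields you group on vary by source) counts events that share the same identifying fields within the same second:

```kusto
// Events sharing identical key fields within a one-second bucket
CommonSecurityLog
| where TimeGenerated > ago(1h)
| summarize EventCount = count()
    by DeviceVendor, SourceIP, DestinationIP, DestinationPort, bin(TimeGenerated, 1s)
| where EventCount > 1
| summarize DuplicateEvents = sum(EventCount - 1), AffectedBuckets = count()
```

Nonzero counts usually trace back to the device sending Syslog to two destinations that both land in the same workspace, or to the forwarder VM being associated with two DCRs collecting the same facility. Remove the extra ingestion path rather than deduplicating at query time.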
| DuplicateEvents | AffectedBuckets |
|---|---|
| 1,247 | 623 |
Problem 4: Missing fields (null columns)
Symptom: Data arrives but SourceIP, DeviceAction, or other critical fields are null.
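A simple spot-check query surfaces the null columns next to the raw payload, so you can see whether parsing failed. A minimal sketch:

```kusto
// Compare parsed CEF fields against the raw message
CommonSecurityLog
| where TimeGenerated > ago(1h)
| project DeviceVendor, DeviceProduct, SourceIP, DeviceAction, Message
| take 20
```

A row like the sample result below — raw comma-separated PAN-OS text sitting in Message while SourceIP and DeviceAction are empty — typically means the device is emitting its native log format rather than CEF, so the parser has nothing to map. Reconfigure the device's log output format to CEF.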
| DeviceVendor | DeviceProduct | SourceIP | DeviceAction | Message (truncated) |
|---|---|---|---|---|
| Palo Alto | PAN-OS | (empty) | (empty) | Mar 21 14:32 fw01 1,2026/03/21 14:32,TRAFFIC,allow... |
Problem 5: Connector was working, then stopped
Systematic check (in order):
- Source device: Configuration change? Reboot that lost Syslog config? Firmware update that changed log format?
- Network: Firewall rule change blocking port 514 (to forwarder) or 443 (forwarder to Azure)?
- Forwarder: VM running? Disk full? rsyslog running? AMA running?
- DCR: Was the DCR modified? A syntax error in the transformation silently drops all data. Check the DCR’s “last modified” timestamp.
- Workspace: Accepting data? Check the ingestion health query from Module 5.9.
A broken DCR transformation does not generate an error message. It silently drops all data that passes through it. If a connector was working and suddenly stopped after someone modified the DCR, revert the transformation to the previous version and test.
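The Module 5.9 ingestion health query is not reproduced here, but a minimal per-table freshness sketch (union * can be slow on large workspaces) looks like:

```kusto
// Last event and event count per table over the past hour
union withsource = TableName *
| where TimeGenerated > ago(1h)
| summarize LastEvent = max(TimeGenerated), Events = count() by TableName
| order by LastEvent asc
```

If CommonSecurityLog is missing from the results while other tables show recent events, the failure is isolated to the CEF pipeline (DCR or AMA), not the workspace.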
Try it yourself
Zone 1 (source): SSH to forwarder, run sudo tcpdump -i any port 514 -c 10.
→ If packets arrive: Zone 1 is healthy. Move to Zone 2.
→ If no packets: Check the Palo Alto Syslog server configuration. Did someone remove or change the Syslog destination? Did the firewall reboot and lose the config?
Zone 2 (forwarder): systemctl status rsyslog — is it running? df -h — is the disk full? tail -f /var/log/syslog | grep CEF — are CEF messages being processed?
→ If rsyslog is processing CEF: Move to Zone 3.
→ If disk is 95%+: AMA cannot send. Clear rsyslog buffer after fixing AMA.
Zone 3 (AMA/workspace): Check AMA status: systemctl status azuremonitoragent. Check DCR: was it modified in the last 4 hours? Run the workspace ingestion health query to verify other tables are still flowing.
→ If other tables flow but CommonSecurityLog does not: DCR or AMA issue specific to CEF.
→ If all tables stopped: Workspace-level problem (throttling, permissions, subscription issue).
Check your understanding
1. SigninLogs stopped flowing but all other tables are healthy. Where is the most likely failure?
2. A colleague modified a DCR transformation yesterday. Today, CommonSecurityLog has zero events. No error messages anywhere. What happened?