6.3 Third-Party Connectors: Syslog and CEF

75 minutes · Module 6


By the end of this subsection, you will understand the Syslog and CEF ingestion paths and know how to deploy and size a Linux forwarder, configure rsyslog, and design a high-availability architecture for production environments.

Most third-party security devices — firewalls, IDS/IPS, proxies, VPN concentrators, switches — send logs via Syslog or Common Event Format (CEF). Getting this data into Sentinel requires a Linux forwarder running the Azure Monitor Agent (AMA).

Syslog vs CEF — choose CEF when available

| Attribute | Syslog | CEF |
| --- | --- | --- |
| Structure | Unstructured text with facility/severity header | Structured key-value pairs with a defined schema |
| Table | Syslog | CommonSecurityLog |
| KQL queryability | Requires parse or extract to get usable fields | Fields are pre-parsed: SourceIP, DestinationIP, DeviceAction are columns |
| Query performance | Slower: regex at query time | Faster: direct column filtering |
| Best for | Linux servers, custom apps, devices that only output raw Syslog | Firewalls, IDS/IPS, WAFs, security appliances |

CEF is always preferred when the device supports it

A KQL query filtering where SourceIP == "203.0.113.45" against CommonSecurityLog runs in seconds. The same filter against raw Syslog requires where SyslogMessage has "203.0.113.45" or a parse operation — slower, less precise, and more error-prone. Configure your devices for CEF output whenever the option exists.
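To make the structural difference concrete, here is a minimal Python sketch of how a CEF record breaks down into a pipe-delimited header and key=value extension pairs. It is an illustration only, not how the AMA parses events. The sample record (device version, event class, message text) is made up, and the function name parse_cef is hypothetical. The extension keys src, dst, and act are the fields that surface in CommonSecurityLog as columns such as SourceIP, DestinationIP, and DeviceAction.

```python
import re

# CEF header layout: CEF:Version|Vendor|Product|DeviceVersion|SignatureID|Name|Severity|Extension
HEADER_FIELDS = ["version", "device_vendor", "device_product", "device_version",
                 "signature_id", "name", "severity"]

# The extension is a run of key=value pairs; a value ends where the next " key=" begins.
EXTENSION_PAIR = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")

def parse_cef(line: str) -> dict:
    """Split one CEF record into header fields and extension key-value pairs."""
    start = line.find("CEF:")
    if start < 0:
        raise ValueError("not a CEF record")
    # Split on unescaped pipes only; anything after the 7th pipe is the extension.
    parts = re.split(r"(?<!\\)\|", line[start + 4:], maxsplit=7)
    if len(parts) < 7:
        raise ValueError("incomplete CEF header")
    record = dict(zip(HEADER_FIELDS, parts[:7]))
    extension = parts[7] if len(parts) > 7 else ""
    record["extension"] = {k: v.strip() for k, v in EXTENSION_PAIR.findall(extension)}
    return record

# Made-up sample record for illustration.
sample = ("CEF:0|Palo Alto Networks|PAN-OS|10.2|TRAFFIC|end|3|"
          "src=203.0.113.45 dst=10.0.0.5 act=allow msg=session ended normally")
event = parse_cef(sample)
print(event["device_vendor"], event["extension"]["src"], event["extension"]["act"])
```

Because every field has a name and a position, a filter on the source address becomes an exact column comparison. A raw Syslog message offers no such guarantee, so the same filter has to search free text.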

Forwarder architecture

[Figure: Syslog/CEF ingestion architecture. Firewall and IDS/IPS devices send CEF over port 514 to the Linux forwarder (rsyslog + AMA + DCR; Ubuntu 22.04, 4 vCPU, 8 GB RAM), which sends over HTTPS/443 to the Sentinel workspace (CommonSecurityLog / Syslog tables).]

Forwarder sizing

| Daily CEF/Syslog volume | vCPU | RAM | Disk | Notes |
| --- | --- | --- | --- | --- |
| Under 10 GB/day | 2 | 4 GB | 30 GB | Lab or small environment |
| 10-50 GB/day | 4 | 8 GB | 64 GB | Most production environments |
| 50-100 GB/day | 8 | 16 GB | 128 GB | High-volume with multiple sources |
| Over 100 GB/day | Multiple forwarders behind load balancer | 16 GB each | 128 GB each | Distribute load across VMs |

Disk space is the silent killer

When the AMA cannot send data (network issue, workspace throttling), rsyslog buffers to disk. A 30 GB disk fills in hours under high volume, and a full disk causes data loss. Size the disk for at least 24 hours of buffer: daily volume in GB + 50% overhead. Monitor disk usage with an Azure alert at 80% threshold.
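The sizing rule above is simple arithmetic, sketched here in Python. The function names are hypothetical, and hours_until_full assumes the entire disk is free for buffering, so treat its result as an upper bound: the operating system and agent consume part of the disk in practice.

```python
def buffer_disk_gb(daily_volume_gb: float, overhead: float = 0.5) -> float:
    """Disk needed for 24 hours of buffer: daily volume plus 50% overhead."""
    return daily_volume_gb * (1 + overhead)

def hours_until_full(disk_gb: float, daily_volume_gb: float) -> float:
    """Hours of outage a disk can absorb, assuming the whole disk is free."""
    hourly_gb = daily_volume_gb / 24
    return disk_gb / hourly_gb

# A 40 GB/day forwarder needs about 60 GB of buffer space.
print(buffer_disk_gb(40))

# A 30 GB disk receiving 100 GB/day fills in roughly 7 hours.
print(round(hours_until_full(30, 100), 1))
```

The second call shows why the smallest disk tier is only suitable for low volumes: at 100 GB/day, an outage that lasts a working day is enough to fill it.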

Step-by-step deployment

Step 1: Deploy the VM. Ubuntu 22.04 LTS is recommended. Use an Azure VM for cloud-native environments, or an on-premises VM when the data sources are on-premises (this reduces egress bandwidth). Place the VM in the same network segment as the source devices if possible.

Step 2: Configure rsyslog to receive on port 514. SSH into the forwarder:

# rsyslog configuration and a shell command (not KQL).
# In /etc/rsyslog.conf, uncomment these lines to listen on UDP and TCP port 514:
module(load="imudp")
input(type="imudp" port="514")
module(load="imtcp")
input(type="imtcp" port="514")

# Then restart rsyslog from the shell:
sudo systemctl restart rsyslog

Enable both UDP and TCP. UDP is the Syslog default but drops packets under load. TCP provides reliable delivery. Configure your source devices for TCP when available.

Step 3: Install the AMA. In Sentinel, navigate to the CEF connector page and run the AMA installation command shown there. It is a single shell command that downloads and registers the agent with your workspace.

Step 4: Create a Data Collection Rule. The DCR associates the forwarder VM with your workspace and defines the log facility/severity filters and any KQL transformations. Configuration details are in subsection 6.4.

Step 5: Configure source devices. On each firewall/device, set the Syslog output destination to the forwarder VM’s IP, port 514, TCP preferred. Set format to CEF if supported.

Verification

CommonSecurityLog
| where TimeGenerated > ago(1h)
| summarize EventCount = count(), LastEvent = max(TimeGenerated)
    by DeviceVendor, DeviceProduct
| sort by EventCount desc

Expected Output

| DeviceVendor | DeviceProduct | EventCount | LastEvent |
| --- | --- | --- | --- |
| Palo Alto Networks | PAN-OS | 4,521 | 14:31 |
| Fortinet | FortiGate | 2,847 | 14:32 |

What to look for: Your device vendor and product appear with recent events. If the result is empty, check three things:
(1) Is the device sending? On the forwarder, run: sudo tcpdump -i any port 514 -c 10
(2) Is rsyslog running? Run: systemctl status rsyslog
(3) Is the AMA connected? In the Azure portal, go to Monitor > Data Collection Rules and verify that the VM is listed.
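When the query returns nothing and you are unsure whether the device or the forwarder is at fault, sending a known test event can help separate the two. The following Python sketch builds one RFC 3164-style syslog line carrying a made-up CEF event and sends it to the forwarder over TCP. Everything in it is an assumption for illustration: the vendor and product names (TestVendor, TestProduct), the local4/warning priority, the hostname, and the placeholder address 192.0.2.10. It also assumes the forwarder accepts newline-terminated messages on TCP 514, as configured in Step 2.

```python
import socket
from datetime import datetime
from typing import Optional

def build_test_message(hostname: str = "cef-test",
                       now: Optional[datetime] = None) -> str:
    """Build one RFC 3164-style syslog line that carries a made-up CEF event."""
    now = now or datetime.now()
    pri = 20 * 8 + 4  # facility local4 (20), severity warning (4) -> 164
    # RFC 3164 timestamps pad a single-digit day with a space, e.g. "Jan  5".
    stamp = f"{now:%b} {now.day:2d} {now:%H:%M:%S}"
    cef = ("CEF:0|TestVendor|TestProduct|1.0|100|Forwarder connectivity test|3|"
           "src=203.0.113.45 act=test")
    return f"<{pri}>{stamp} {hostname} {cef}"

def send_tcp(message: str, host: str, port: int = 514, timeout: float = 5.0) -> None:
    """Send one newline-terminated message to the forwarder over TCP."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall((message + "\n").encode("utf-8"))

# Example call; replace the placeholder with your forwarder's address:
# send_tcp(build_test_message(), host="192.0.2.10")
```

If the test event shows up in CommonSecurityLog with DeviceVendor equal to TestVendor after a few minutes, the forwarder path works and the problem is likely on the source device. If it does not, continue with checks (2) and (3) above.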

High availability for production

A single forwarder is a single point of failure. If it goes down, all third-party data stops.

Production HA architecture: Deploy two forwarders behind an Azure Load Balancer (or on-premises load balancer). Configure source devices to send to the load balancer VIP. Both forwarders run AMA and send to the same workspace. The load balancer distributes traffic; if one forwarder fails, the other handles all traffic.

Test failover before you need it

After deploying HA, shut down one forwarder and verify that data continues flowing through the other. Then bring it back up, shut down the second forwarder, and verify that data flows through the first. Document the failover behavior. The worst time to discover your HA does not work is during an incident when you need the firewall data.

Try it yourself

Your organization has 3 Palo Alto firewalls generating a combined 35 GB/day of CEF data, and 5 Linux servers generating 3 GB/day of Syslog. Design the forwarder architecture: how many VMs, what specs, and should you use one forwarder for both or separate them?

Recommended: Two forwarders behind a load balancer.

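Combined volume: 38 GB/day. This is in the 10-50 GB range, so each forwarder needs 4 vCPU, 8 GB RAM, and a 128 GB disk. The sizing table's 64 GB minimum already covers the 57 GB that the buffer rule gives (38 GB plus 50% overhead); 128 GB adds headroom for a full 24 hours of buffer with the entire load on one forwarder.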

Both CEF (firewalls) and Syslog (Linux servers) can go through the same forwarders — rsyslog handles both protocols and the AMA routes them to the correct tables (CommonSecurityLog for CEF, Syslog for raw). Separate forwarders for each protocol are unnecessary unless you have specific isolation requirements.

The load balancer distributes across both VMs. Under normal conditions, each handles ~19 GB/day. During a failover, the surviving forwarder handles 38 GB/day — within spec for the 4 vCPU / 8 GB configuration.
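The capacity check in this answer can be sketched in a few lines of Python. The tier list copies the vCPU and RAM columns from the sizing table; the function name tier_for is hypothetical, and placing boundary values in the smaller tier is an assumption, since the table's ranges overlap at their edges.

```python
# Sizing tiers from the table in this subsection: (upper bound in GB/day, vCPU, RAM in GB).
TIERS = [(10, 2, 4), (50, 4, 8), (100, 8, 16)]

def tier_for(daily_gb: float):
    """Return (vCPU, RAM GB) for a single forwarder, or None above 100 GB/day."""
    for upper, vcpu, ram in TIERS:
        if daily_gb <= upper:  # assumption: boundary values fall in the smaller tier
            return vcpu, ram
    return None  # over 100 GB/day: distribute across multiple forwarders

cef_gb, syslog_gb, forwarders = 35, 3, 2
total = cef_gb + syslog_gb                # combined daily volume
normal_load = total / forwarders          # per-forwarder load with both VMs healthy
failover_load = total / (forwarders - 1)  # load on the survivor when one VM is down

print(total, normal_load, failover_load)
print(tier_for(failover_load))
```

The point of sizing against failover_load rather than normal_load is that each forwarder must be able to carry the full volume alone; otherwise the HA pair fails exactly when it is needed.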

Check your understanding

1. Why configure TCP instead of UDP for Syslog transport when both are available?

TCP provides reliable, ordered delivery. UDP drops packets when the forwarder is under load or the network is congested — dropped packets are dropped log events, which creates gaps in your security data. TCP retransmits lost packets.
TCP is faster
UDP does not work with CEF

2. Your forwarder VM disk is 92% full. What is happening and what is the risk?

rsyslog is buffering data to disk because the AMA cannot forward to the workspace (network issue, AMA crash, or throttling). At 100%, rsyslog can no longer buffer and begins dropping events, which means data loss. Fix the AMA connection immediately; once it recovers, the buffer drains.
Too many log files — safe to ignore
The VM needs a larger disk as a permanent fix