8.6 Syslog Data Sources
Introduction
Syslog is the universal logging protocol for Linux servers and network infrastructure. Unlike CEF (which is a structured format carried over Syslog), plain Syslog messages are typically unstructured text — making them harder to parse but essential for visibility into Linux hosts, routers, switches, load balancers, and applications that do not support CEF.
Syslog architecture
The Syslog connector uses the same log forwarder architecture as CEF (subsection 8.5), with one difference: non-CEF Syslog messages land in the Syslog table instead of CommonSecurityLog.
Direct collection (Linux servers with AMA): If the Syslog source is a Linux server that can run AMA, deploy AMA directly on that server. No log forwarder needed — AMA collects Syslog data from the local rsyslog daemon.
Forwarded collection (network devices): If the source is a network device (router, switch, firewall sending non-CEF Syslog), use the log forwarder model from subsection 8.5 — the device sends Syslog to a Linux forwarder, AMA on the forwarder sends it to the workspace.
Configuring Syslog collection with DCRs
Create a Data Collection Rule that specifies which Syslog facilities and severity levels to collect.
Syslog facilities categorise the source of the message: auth (authentication events), authpriv (privileged authentication), daemon (system daemons), kern (kernel messages), local0-local7 (custom applications), syslog (Syslog daemon itself), user (user-level messages), cron (scheduled tasks).
Severity levels (from most to least severe): emerg, alert, crit, err, warning, notice, info, debug.
For security operations, collect:
auth and authpriv at info level and above — captures authentication events (SSH logins, sudo usage, PAM events). This is the most security-relevant Syslog data from Linux hosts.
daemon at warning level and above — captures service failures and unusual daemon behaviour.
kern at warning level and above — captures kernel-level events (firewall rules, SELinux denials).
local0-local7 at the level your applications use — if your application writes security events to local4, configure local4 at the appropriate level.
Do NOT collect info or debug for all facilities unless you have a specific investigation need. These levels generate enormous volume (routine service status messages, debug output) with minimal security value.
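The facility/severity recommendations above translate into the `syslog` data source section of a DCR. The following is a sketch of the relevant fragment of a DCR resource definition; the data source names and the `myWorkspace` destination name are placeholders you would replace with your own values:

```json
{
  "properties": {
    "dataSources": {
      "syslog": [
        {
          "name": "authCollection",
          "streams": ["Microsoft-Syslog"],
          "facilityNames": ["auth", "authpriv"],
          "logLevels": ["Info", "Notice", "Warning", "Error", "Critical", "Alert", "Emergency"]
        },
        {
          "name": "daemonKernCollection",
          "streams": ["Microsoft-Syslog"],
          "facilityNames": ["daemon", "kern"],
          "logLevels": ["Warning", "Error", "Critical", "Alert", "Emergency"]
        }
      ]
    },
    "dataFlows": [
      {
        "streams": ["Microsoft-Syslog"],
        "destinations": ["myWorkspace"]
      }
    ]
  }
}
```

Note that `logLevels` must list every level you want: specifying "Info" alone does not imply "Warning" and above, so each recommended level is enumerated explicitly.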
The Syslog table
Syslog events land in the Syslog table.
Key columns:

| Column | Description |
|---|---|
| TimeGenerated | Timestamp of the event |
| Computer | Hostname of the source |
| HostIP | IP address of the source |
| Facility | Syslog facility (auth, daemon, kern, ...) |
| SeverityLevel | Syslog severity (emerg through debug) |
| SyslogMessage | The raw message text |
| ProcessName | The program that generated the message |
Parsing unstructured Syslog with KQL
Unlike CEF data (which arrives with structured columns), plain Syslog messages require KQL parsing to extract useful fields. The parse operator and extract function are your primary tools.
Common parsing patterns:
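As an example, failed SSH logins can be extracted with the `parse` operator. This is a sketch: the exact sshd message format varies by distribution and version (and invalid-user failures use a slightly different wording), so verify against your own data first:

```kql
// Extract the target username, source IP, and port from sshd failure messages.
// Assumed format: "Failed password for <user> from <ip> port <port> ssh2"
Syslog
| where Facility in ("auth", "authpriv")
| where ProcessName == "sshd"
| where SyslogMessage has "Failed password"
| parse SyslogMessage with * "Failed password for " TargetUser " from " SourceIP " port " SourcePort " ssh2"
| project TimeGenerated, Computer, TargetUser, SourceIP, SourcePort
```

The same pattern (filter by ProcessName, then `parse` the known message layout) applies to sudo, cron, and most other Linux daemons.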
Linux security investigation patterns
Syslog from Linux hosts provides the investigation data for server-side attacks: brute-force SSH attacks, privilege escalation via sudo, unauthorised service installation, and cron-based persistence.
Pattern 1: SSH brute-force detection. Multiple failed SSH authentication attempts from the same source IP within a short time window.
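A sketch of this detection follows. The 10-minute window and the threshold of 20 failures are illustrative assumptions; tune them to your environment:

```kql
// Count SSH authentication failures per source IP in 10-minute windows.
Syslog
| where Facility in ("auth", "authpriv") and ProcessName == "sshd"
| where SyslogMessage has "Failed password"
| parse SyslogMessage with * "Failed password for " TargetUser " from " SourceIP " port " *
| summarize FailedAttempts = count(), Usernames = make_set(TargetUser, 50)
    by SourceIP, bin(TimeGenerated, 10m)
| where FailedAttempts > 20
| order by FailedAttempts desc
```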
This pattern detects brute-force attacks targeting SSH. The make_set function shows which usernames were attempted: a single username suggests a targeted password-guessing attack, while many usernames suggest password spraying or a dictionary attack.
Pattern 2: Successful login after brute-force. The most dangerous pattern: many failures followed by a success — indicating the attacker guessed the password.
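A sketch of the failure-then-success correlation. The thresholds (20 failures per hour, success within 2 hours) are illustrative assumptions:

```kql
// Source IPs with heavy SSH failure activity.
let Failures = Syslog
    | where ProcessName == "sshd" and SyslogMessage has "Failed password"
    | parse SyslogMessage with * " from " SourceIP " port " *
    | summarize FailedCount = count() by SourceIP, Computer, bin(TimeGenerated, 1h)
    | where FailedCount > 20;
// Successful logins from any IP.
let Successes = Syslog
    | where ProcessName == "sshd" and SyslogMessage has "Accepted"
    | parse SyslogMessage with * "Accepted " AuthMethod " for " TargetUser " from " SourceIP " port " *;
Failures
| join kind=inner (Successes) on SourceIP, Computer
// Keep only successes that occur during or shortly after the failure window.
| where TimeGenerated1 between (TimeGenerated .. TimeGenerated + 2h)
| project Computer, SourceIP, TargetUser, FailedCount, SuccessTime = TimeGenerated1
```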
If this query returns results, an account has been compromised via brute-force — investigate immediately.
Pattern 3: Suspicious sudo usage. Detect users running unusual commands with elevated privileges.
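A sketch of sudo monitoring. The watchlist of suspicious commands is an illustrative assumption; build yours from your own baseline:

```kql
// sudo logs lines like: "alice : TTY=pts/0 ; PWD=/home/alice ; USER=root ; COMMAND=/usr/bin/vim /etc/shadow"
Syslog
| where Facility in ("auth", "authpriv") and ProcessName == "sudo"
| where SyslogMessage has "COMMAND="
| parse SyslogMessage with InvokingUser " : " * "USER=" RunAsUser " ; COMMAND=" Command
| where Command has_any ("curl", "wget", "nc ", "base64", "useradd", "/etc/shadow")
| project TimeGenerated, Computer, InvokingUser = trim(" ", InvokingUser), RunAsUser, Command
```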
Pattern 4: Cron-based persistence. Detect new cron jobs that may indicate attacker persistence.
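A sketch based on the crontab daemon's audit messages (e.g. "(root) REPLACE (root)"); message wording varies slightly across cron implementations, so validate against your hosts:

```kql
// Detect crontab edits and replacements, which modify scheduled jobs.
Syslog
| where ProcessName == "crontab" or Facility == "cron"
| parse SyslogMessage with * "(" EditingUser ") " Action " (" TargetUser ")"
| where Action in ("REPLACE", "BEGIN EDIT")
| project TimeGenerated, Computer, EditingUser, Action, TargetUser
```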
ASIM parsers: normalising vendor-specific data
The Advanced Security Information Model (ASIM) provides parsers that normalise data from specific vendors into a standardised schema. Instead of writing vendor-specific KQL for each Syslog source, ASIM parsers handle the format differences and expose a unified set of columns.
How ASIM works: You deploy an ASIM parser (available through Content Hub solutions) as a KQL function in your workspace. Instead of querying the Syslog table directly, you query the ASIM function (e.g., imAuthentication for authentication events, imNetworkSession for network events). The function internally queries Syslog (and other tables), parses vendor-specific formats, and returns standardised columns.
Benefits: Analytics rules written against ASIM-normalised data work across all vendors without modification. A brute-force detection rule that queries imAuthentication detects brute-force against SSH (Syslog), Windows logon (SecurityEvent), and Entra ID sign-in (SigninLogs) simultaneously — because all three data sources are normalised to the same schema by their respective ASIM parsers.
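A sketch of such a cross-vendor rule, using the documented ASIM authentication fields (`EventResult`, `TargetUsername`, `SrcIpAddr`); the window and threshold are illustrative assumptions:

```kql
// One brute-force rule across every source that imAuthentication normalises.
imAuthentication
| where EventResult == "Failure"
| summarize FailedCount = count(), TargetUsers = make_set(TargetUsername, 50)
    by SrcIpAddr, bin(TimeGenerated, 10m)
| where FailedCount > 20
| order by FailedCount desc
```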
This single query detects brute-force across Linux SSH, Windows RDP, and Entra ID sign-ins — without writing three separate queries. This is the power of ASIM normalisation.
Practical guidance: if an ASIM parser exists for your device (check Content Hub), use it instead of writing custom parse statements. The parser handles the vendor-specific format and maps fields to the ASIM schema, enabling cross-vendor analytics rules.
Direct collection vs forwarded collection
Direct collection (AMA installed on the Syslog source):
Advantages: no intermediate infrastructure, lower latency, simpler architecture. Use when: the source is a Linux server that can run AMA (Azure VM, Arc-connected on-premises server).
Forwarded collection (source sends to a log forwarder):
Advantages: supports devices that cannot run AMA (network appliances, IoT devices, embedded systems). Use when: the source is a network device, appliance, or any system that outputs Syslog but cannot run an agent.
Dual-use forwarder: A single Linux VM can serve as both a CEF forwarder (subsection 8.5) and a Syslog forwarder simultaneously. CEF messages are recognised by the CEF header and routed to CommonSecurityLog. Non-CEF messages go to the Syslog table. One VM, two data paths.
Syslog volume estimation and cost planning
Linux Syslog volume varies dramatically based on the host role and the facility/severity configuration.
Web server (nginx/Apache): auth facility at info generates ~50-200 MB/day per server (SSH attempts, sudo, PAM events). Adding daemon at warning adds ~20-50 MB/day. Total: ~70-250 MB/day per server.
Database server (PostgreSQL/MySQL): auth at info generates similar volume to web servers. If the database logs queries to Syslog, volume can spike to 1-5 GB/day depending on query volume. Filter database query logs to errors only unless you have specific audit requirements.
Network device (router/switch sending non-CEF Syslog): Volume depends entirely on the device’s logging configuration. A core router logging all interface state changes generates 100-500 MB/day. A switch logging only authentication events generates 10-50 MB/day. Configure the device to send only security-relevant events to the forwarder.
Estimation query (after initial deployment):
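A sketch of a per-host volume estimate, assuming the workspace exposes the `_BilledSize` system column (bytes billed per record):

```kql
// Average daily ingestion per host and facility over the last 7 days.
Syslog
| where TimeGenerated > ago(7d)
| summarize DailyMB = sum(_BilledSize) / 7 / 1024 / 1024 by Computer, Facility
| order by DailyMB desc
```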
Use this query after the first week of collection to identify high-volume hosts that may need facility/severity tuning or DCR filtering.
Cross-platform correlation: Linux Syslog + Windows SecurityEvent + Entra ID SigninLogs
The highest-value investigation pattern combines data from all three authentication sources — detecting attackers who move between Linux, Windows, and cloud environments.
Scenario: compromised credential used across platforms. An attacker obtains a user’s password (via phishing or password spray). They use it to: SSH into a Linux server (Syslog), RDP into a Windows server (SecurityEvent), and sign into M365 (SigninLogs) — all from the same external IP within a short time window.
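A sketch of the correlation. Column names for SecurityEvent (`EventID`, `LogonType`, `IpAddress`) and SigninLogs (`ResultType`, `UserPrincipalName`) follow the standard schemas; the one-hour window is an illustrative assumption:

```kql
let LinuxLogons = Syslog
    | where ProcessName == "sshd" and SyslogMessage has "Accepted"
    | parse SyslogMessage with * " for " Account " from " IPAddress " port " *
    | project TimeGenerated, Platform = "Linux", Account, IPAddress;
let WindowsLogons = SecurityEvent
    | where EventID == 4624 and LogonType == 10   // successful remote-interactive (RDP) logon
    | project TimeGenerated, Platform = "Windows", Account, IPAddress = IpAddress;
let CloudSignins = SigninLogs
    | where ResultType == "0"                     // successful Entra ID sign-in
    | project TimeGenerated, Platform = "EntraID", Account = UserPrincipalName, IPAddress;
union LinuxLogons, WindowsLogons, CloudSignins
| summarize Platforms = make_set(Platform), Accounts = make_set(Account), Events = count()
    by IPAddress, bin(TimeGenerated, 1h)
| where array_length(Platforms) >= 2
```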
If this query returns results from multiple platforms for the same IP, the attacker has moved laterally from cloud to on-premises infrastructure — a high-severity finding that requires immediate containment across all three environments.
Linux-specific detection rules using Syslog
Rule 1: SSH key-based authentication from a new key. Detect when a new SSH public key is used for authentication — may indicate an attacker who added their key to authorized_keys for persistence.
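A sketch that baselines key fingerprints over 30 days and flags ones not seen before. It assumes sshd logs the key fingerprint ("Accepted publickey for <user> from <ip> port <port> ssh2: <type> SHA256:<fingerprint>"), which depends on the sshd LogLevel:

```kql
// Fingerprints seen during the baseline period.
let KnownKeys = Syslog
    | where TimeGenerated between (ago(30d) .. ago(1d))
    | where ProcessName == "sshd" and SyslogMessage has "Accepted publickey"
    | parse SyslogMessage with * "SHA256:" Fingerprint
    | distinct Computer, Fingerprint;
// Recent publickey logins using a fingerprint not in the baseline.
Syslog
| where TimeGenerated > ago(1d)
| where ProcessName == "sshd" and SyslogMessage has "Accepted publickey"
| parse SyslogMessage with * "Accepted publickey for " User " from " SourceIP " port " * "SHA256:" Fingerprint
| join kind=leftanti (KnownKeys) on Computer, Fingerprint
| project TimeGenerated, Computer, User, SourceIP, Fingerprint
```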
Rule 2: Service started or stopped unexpectedly. Detect when a critical service (sshd, httpd, postgresql) is stopped or started outside of a maintenance window.
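A sketch using systemd's Started/Stopped messages. The service keywords and the 02:00-04:00 maintenance window are illustrative assumptions; systemd logs unit descriptions (e.g. "Stopped OpenSSH server daemon."), so adjust the keywords to your distro:

```kql
let MaintStart = 2; let MaintEnd = 4;   // assumed maintenance window, hours UTC
Syslog
| where ProcessName == "systemd"
| where SyslogMessage has_any ("Started", "Stopped")
| where SyslogMessage has_any ("ssh", "httpd", "nginx", "postgres")
| extend Hour = hourofday(TimeGenerated)
| where Hour < MaintStart or Hour >= MaintEnd   // outside the maintenance window
| project TimeGenerated, Computer, SyslogMessage
```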
Try it yourself
If you have a Linux VM with AMA deployed, create a DCR for Syslog collection with auth and authpriv facilities at info level. SSH into the VM (generating auth events), then query: Syslog | where TimeGenerated > ago(30m) | where Facility in ("auth", "authpriv") | take 10. Parse the SSH authentication events using the parse pattern above.
What you should observe
Your SSH login generates auth events in the Syslog table within 5-10 minutes. The SyslogMessage contains the raw text ("Accepted publickey for user from IP port 22"). The parse operator extracts the structured fields. This is the workflow for any Syslog source: collect raw, parse with KQL, build analytics rules on the parsed fields.
Knowledge check
Check your understanding
1. A Linux web server generates auth, daemon, and kern Syslog messages. You want to collect security-relevant events without excessive volume. Which facilities and levels do you configure in the DCR?