8.6 Syslog Data Sources
Introduction
Syslog is the universal logging protocol for Linux servers and network infrastructure. Unlike CEF (which is a structured format carried over Syslog), plain Syslog messages are typically unstructured text — making them harder to parse but essential for visibility into Linux hosts, routers, switches, load balancers, and applications that do not support CEF.
Syslog architecture
The Syslog connector uses the same log forwarder architecture as CEF (subsection 8.5), with one difference: non-CEF Syslog messages land in the Syslog table instead of CommonSecurityLog.
Direct collection (Linux servers with AMA): If the Syslog source is a Linux server that can run AMA, deploy AMA directly on that server. No log forwarder needed — AMA collects Syslog data from the local rsyslog daemon.
Forwarded collection (network devices): If the source is a network device (router, switch, firewall sending non-CEF Syslog), use the log forwarder model from subsection 8.5 — the device sends Syslog to a Linux forwarder, AMA on the forwarder sends it to the workspace.
Configuring Syslog collection with DCRs
Create a Data Collection Rule that specifies which Syslog facilities and severity levels to collect.
Syslog facilities categorise the source of the message: auth (authentication events), authpriv (privileged authentication), daemon (system daemons), kern (kernel messages), local0-local7 (custom applications), syslog (Syslog daemon itself), user (user-level messages), cron (scheduled tasks).
Severity levels (from most to least severe): emerg, alert, crit, err, warning, notice, info, debug.
For security operations, collect:
auth and authpriv at info level and above — captures authentication events (SSH logins, sudo usage, PAM events). This is the most security-relevant Syslog data from Linux hosts.
daemon at warning level and above — captures service failures and unusual daemon behaviour.
kern at warning level and above — captures kernel-level events (firewall rules, SELinux denials).
local0-local7 at the level your applications use — if your application writes security events to local4, configure local4 at the appropriate level.
Do NOT collect info or debug for all facilities unless you have a specific investigation need. These levels generate enormous volume (routine service status messages, debug output) with minimal security value.
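The facility/severity recommendations above translate into the `syslog` data source section of a DCR. The following is a sketch of the relevant fragment of a DCR resource definition; the data source names and the `myWorkspace` destination name are placeholders you would replace with your own values:

```json
{
  "properties": {
    "dataSources": {
      "syslog": [
        {
          "name": "authCollection",
          "streams": ["Microsoft-Syslog"],
          "facilityNames": ["auth", "authpriv"],
          "logLevels": ["Info", "Notice", "Warning", "Error", "Critical", "Alert", "Emergency"]
        },
        {
          "name": "daemonKernCollection",
          "streams": ["Microsoft-Syslog"],
          "facilityNames": ["daemon", "kern"],
          "logLevels": ["Warning", "Error", "Critical", "Alert", "Emergency"]
        }
      ]
    },
    "dataFlows": [
      {
        "streams": ["Microsoft-Syslog"],
        "destinations": ["myWorkspace"]
      }
    ]
  }
}
```

Note that `logLevels` must list every level you want: specifying "Info" alone does not imply "Warning" and above, so each recommended level is enumerated explicitly.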
The Syslog table
Syslog events land in the Syslog table.
Key columns:

| Column | Description |
|---|---|
| TimeGenerated | Timestamp of the event |
| Computer | Hostname of the source |
| HostIP | IP address of the source |
| Facility | Syslog facility (auth, daemon, kern, ...) |
| SeverityLevel | Syslog severity (emerg through debug) |
| SyslogMessage | The raw message text |
| ProcessName | The program that generated the message |
Parsing unstructured Syslog with KQL
Unlike CEF data (which arrives with structured columns), plain Syslog messages require KQL parsing to extract useful fields. The parse operator and extract function are your primary tools.
Common parsing patterns:
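As an example, failed SSH logins can be extracted with the `parse` operator. This is a sketch: the exact sshd message format varies by distribution and version (and invalid-user failures use a slightly different wording), so verify against your own data first:

```kql
// Extract the target username, source IP, and port from sshd failure messages.
// Assumed format: "Failed password for <user> from <ip> port <port> ssh2"
Syslog
| where Facility in ("auth", "authpriv")
| where ProcessName == "sshd"
| where SyslogMessage has "Failed password"
| parse SyslogMessage with * "Failed password for " TargetUser " from " SourceIP " port " SourcePort " ssh2"
| project TimeGenerated, Computer, TargetUser, SourceIP, SourcePort
```

The same pattern (filter by ProcessName, then `parse` the known message layout) applies to sudo, cron, and most other Linux daemons.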
Linux security investigation patterns
Syslog from Linux hosts provides the investigation data for server-side attacks: brute-force SSH attacks, privilege escalation via sudo, unauthorised service installation, and cron-based persistence.
Pattern 1: SSH brute-force detection. Multiple failed SSH authentication attempts from the same source IP within a short time window.
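A sketch of this detection follows. The 10-minute window and the threshold of 20 failures are illustrative assumptions; tune them to your environment:

```kql
// Count SSH authentication failures per source IP in 10-minute windows.
Syslog
| where Facility in ("auth", "authpriv") and ProcessName == "sshd"
| where SyslogMessage has "Failed password"
| parse SyslogMessage with * "Failed password for " TargetUser " from " SourceIP " port " *
| summarize FailedAttempts = count(), Usernames = make_set(TargetUser, 50)
    by SourceIP, bin(TimeGenerated, 10m)
| where FailedAttempts > 20
| order by FailedAttempts desc
```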
This pattern detects brute-force attacks targeting SSH. The make_set function shows which usernames were attempted: a single username suggests a targeted password-guessing attack, while many usernames suggest password spraying or a dictionary attack.
Pattern 2: Successful login after brute-force. The most dangerous pattern: many failures followed by a success — indicating the attacker guessed the password.
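A sketch of the failure-then-success correlation. The thresholds (20 failures per hour, success within 2 hours) are illustrative assumptions:

```kql
// Source IPs with heavy SSH failure activity.
let Failures = Syslog
    | where ProcessName == "sshd" and SyslogMessage has "Failed password"
    | parse SyslogMessage with * " from " SourceIP " port " *
    | summarize FailedCount = count() by SourceIP, Computer, bin(TimeGenerated, 1h)
    | where FailedCount > 20;
// Successful logins from any IP.
let Successes = Syslog
    | where ProcessName == "sshd" and SyslogMessage has "Accepted"
    | parse SyslogMessage with * "Accepted " AuthMethod " for " TargetUser " from " SourceIP " port " *;
Failures
| join kind=inner (Successes) on SourceIP, Computer
// Keep only successes that occur during or shortly after the failure window.
| where TimeGenerated1 between (TimeGenerated .. TimeGenerated + 2h)
| project Computer, SourceIP, TargetUser, FailedCount, SuccessTime = TimeGenerated1
```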
If this query returns results, an account has been compromised via brute-force — investigate immediately.
Pattern 3: Suspicious sudo usage. Detect users running unusual commands with elevated privileges.
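A sketch of sudo monitoring. The watchlist of suspicious commands is an illustrative assumption; build yours from your own baseline:

```kql
// sudo logs lines like: "alice : TTY=pts/0 ; PWD=/home/alice ; USER=root ; COMMAND=/usr/bin/vim /etc/shadow"
Syslog
| where Facility in ("auth", "authpriv") and ProcessName == "sudo"
| where SyslogMessage has "COMMAND="
| parse SyslogMessage with InvokingUser " : " * "USER=" RunAsUser " ; COMMAND=" Command
| where Command has_any ("curl", "wget", "nc ", "base64", "useradd", "/etc/shadow")
| project TimeGenerated, Computer, InvokingUser = trim(" ", InvokingUser), RunAsUser, Command
```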
Pattern 4: Cron-based persistence. Detect new cron jobs that may indicate attacker persistence.
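A sketch based on the crontab daemon's audit messages (e.g. "(root) REPLACE (root)"); message wording varies slightly across cron implementations, so validate against your hosts:

```kql
// Detect crontab edits and replacements, which modify scheduled jobs.
Syslog
| where ProcessName == "crontab" or Facility == "cron"
| parse SyslogMessage with * "(" EditingUser ") " Action " (" TargetUser ")"
| where Action in ("REPLACE", "BEGIN EDIT")
| project TimeGenerated, Computer, EditingUser, Action, TargetUser
```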
ASIM parsers: normalising vendor-specific data
The Advanced Security Information Model (ASIM) provides parsers that normalise data from specific vendors into a standardised schema. Instead of writing vendor-specific KQL for each Syslog source, ASIM parsers handle the format differences and expose a unified set of columns.
How ASIM works: You deploy an ASIM parser (available through Content Hub solutions) as a KQL function in your workspace. Instead of querying the Syslog table directly, you query the ASIM function (e.g., imAuthentication for authentication events, imNetworkSession for network events). The function internally queries Syslog (and other tables), parses vendor-specific formats, and returns standardised columns.
Benefits: Analytics rules written against ASIM-normalised data work across all vendors without modification. A brute-force detection rule that queries imAuthentication detects brute-force against SSH (Syslog), Windows logon (SecurityEvent), and Entra ID sign-in (SigninLogs) simultaneously — because all three data sources are normalised to the same schema by their respective ASIM parsers.
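A sketch of such a cross-vendor rule, using the documented ASIM authentication fields (`EventResult`, `TargetUsername`, `SrcIpAddr`); the window and threshold are illustrative assumptions:

```kql
// One brute-force rule across every source that imAuthentication normalises.
imAuthentication
| where EventResult == "Failure"
| summarize FailedCount = count(), TargetUsers = make_set(TargetUsername, 50)
    by SrcIpAddr, bin(TimeGenerated, 10m)
| where FailedCount > 20
| order by FailedCount desc
```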
This single query detects brute-force across Linux SSH, Windows RDP, and Entra ID sign-ins — without writing three separate queries. This is the power of ASIM normalisation.
Practical guidance: if an ASIM parser exists for your device (check Content Hub), use it instead of writing custom parse statements. The parser handles the vendor-specific format and maps fields to the ASIM schema, enabling cross-vendor analytics rules.
Direct collection vs forwarded collection
Direct collection (AMA installed on the Syslog source):
Advantages: no intermediate infrastructure, lower latency, simpler architecture. Use when: the source is a Linux server that can run AMA (Azure VM, Arc-connected on-premises server).
Forwarded collection (source sends to a log forwarder):
Advantages: supports devices that cannot run AMA (network appliances, IoT devices, embedded systems). Use when: the source is a network device, appliance, or any system that outputs Syslog but cannot run an agent.
Dual-use forwarder: A single Linux VM can serve as both a CEF forwarder (subsection 8.5) and a Syslog forwarder simultaneously. CEF messages are recognised by the CEF header and routed to CommonSecurityLog. Non-CEF messages go to the Syslog table. One VM, two data paths.
Syslog volume estimation and cost planning
Linux Syslog volume varies dramatically based on the host role and the facility/severity configuration.
Web server (nginx/Apache): auth facility at info generates ~50-200 MB/day per server (SSH attempts, sudo, PAM events). Adding daemon at warning adds ~20-50 MB/day. Total: ~70-250 MB/day per server.
Database server (PostgreSQL/MySQL): auth at info generates similar volume to web servers. If the database logs queries to Syslog, volume can spike to 1-5 GB/day depending on query volume. Filter database query logs to errors only unless you have specific audit requirements.
Network device (router/switch sending non-CEF Syslog): Volume depends entirely on the device’s logging configuration. A core router logging all interface state changes generates 100-500 MB/day. A switch logging only authentication events generates 10-50 MB/day. Configure the device to send only security-relevant events to the forwarder.
Estimation query (after initial deployment):
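A sketch of a per-host volume estimate, assuming the workspace exposes the `_BilledSize` system column (bytes billed per record):

```kql
// Average daily ingestion per host and facility over the last 7 days.
Syslog
| where TimeGenerated > ago(7d)
| summarize DailyMB = sum(_BilledSize) / 7 / 1024 / 1024 by Computer, Facility
| order by DailyMB desc
```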
Use this query after the first week of collection to identify high-volume hosts that may need facility/severity tuning or DCR filtering.
Cross-platform correlation: Linux Syslog + Windows SecurityEvent + Entra ID SigninLogs
The highest-value investigation pattern combines data from all three authentication sources — detecting attackers who move between Linux, Windows, and cloud environments.
Scenario: compromised credential used across platforms. An attacker obtains a user’s password (via phishing or password spray). They use it to: SSH into a Linux server (Syslog), RDP into a Windows server (SecurityEvent), and sign into M365 (SigninLogs) — all from the same external IP within a short time window.
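A sketch of the correlation. Column names for SecurityEvent (`EventID`, `LogonType`, `IpAddress`) and SigninLogs (`ResultType`, `UserPrincipalName`) follow the standard schemas; the one-hour window is an illustrative assumption:

```kql
let LinuxLogons = Syslog
    | where ProcessName == "sshd" and SyslogMessage has "Accepted"
    | parse SyslogMessage with * " for " Account " from " IPAddress " port " *
    | project TimeGenerated, Platform = "Linux", Account, IPAddress;
let WindowsLogons = SecurityEvent
    | where EventID == 4624 and LogonType == 10   // successful remote-interactive (RDP) logon
    | project TimeGenerated, Platform = "Windows", Account, IPAddress = IpAddress;
let CloudSignins = SigninLogs
    | where ResultType == "0"                     // successful Entra ID sign-in
    | project TimeGenerated, Platform = "EntraID", Account = UserPrincipalName, IPAddress;
union LinuxLogons, WindowsLogons, CloudSignins
| summarize Platforms = make_set(Platform), Accounts = make_set(Account), Events = count()
    by IPAddress, bin(TimeGenerated, 1h)
| where array_length(Platforms) >= 2
```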
If this query returns results from multiple platforms for the same IP, the attacker has moved laterally from cloud to on-premises infrastructure — a high-severity finding that requires immediate containment across all three environments.
Linux-specific detection rules using Syslog
Rule 1: SSH key-based authentication from a new key. Detect when a new SSH public key is used for authentication — may indicate an attacker who added their key to authorized_keys for persistence.
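A sketch that baselines key fingerprints over 30 days and flags ones not seen before. It assumes sshd logs the key fingerprint ("Accepted publickey for <user> from <ip> port <port> ssh2: <type> SHA256:<fingerprint>"), which depends on the sshd LogLevel:

```kql
// Fingerprints seen during the baseline period.
let KnownKeys = Syslog
    | where TimeGenerated between (ago(30d) .. ago(1d))
    | where ProcessName == "sshd" and SyslogMessage has "Accepted publickey"
    | parse SyslogMessage with * "SHA256:" Fingerprint
    | distinct Computer, Fingerprint;
// Recent publickey logins using a fingerprint not in the baseline.
Syslog
| where TimeGenerated > ago(1d)
| where ProcessName == "sshd" and SyslogMessage has "Accepted publickey"
| parse SyslogMessage with * "Accepted publickey for " User " from " SourceIP " port " * "SHA256:" Fingerprint
| join kind=leftanti (KnownKeys) on Computer, Fingerprint
| project TimeGenerated, Computer, User, SourceIP, Fingerprint
```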
Rule 2: Service started or stopped unexpectedly. Detect when a critical service (sshd, httpd, postgresql) is stopped or started outside of a maintenance window.
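A sketch using systemd's Started/Stopped messages. The service keywords and the 02:00-04:00 maintenance window are illustrative assumptions; systemd logs unit descriptions (e.g. "Stopped OpenSSH server daemon."), so adjust the keywords to your distro:

```kql
let MaintStart = 2; let MaintEnd = 4;   // assumed maintenance window, hours UTC
Syslog
| where ProcessName == "systemd"
| where SyslogMessage has_any ("Started", "Stopped")
| where SyslogMessage has_any ("ssh", "httpd", "nginx", "postgres")
| extend Hour = hourofday(TimeGenerated)
| where Hour < MaintStart or Hour >= MaintEnd   // outside the maintenance window
| project TimeGenerated, Computer, SyslogMessage
```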
Try it yourself
If you have a Linux VM with AMA deployed, create a DCR for Syslog collection with auth and authpriv facilities at info level. SSH into the VM (generating auth events), then query: Syslog | where TimeGenerated > ago(30m) | where Facility in ("auth", "authpriv") | take 10. Parse the SSH authentication events using the parse pattern above.
What you should observe
Your SSH login generates auth events in the Syslog table within 5-10 minutes. The SyslogMessage contains the raw text ("Accepted publickey for user from IP port 22"). The parse operator extracts the structured fields. This is the workflow for any Syslog source: collect raw, parse with KQL, build analytics rules on the parsed fields.
Knowledge check
Check your understanding
1. A Linux web server generates auth, daemon, and kern Syslog messages. You want to collect security-relevant events without excessive volume. Which facilities and levels do you configure in the DCR?