8.8 Custom Logs and API Ingestion

14-18 hours · Module 8


Introduction

Standard connectors cover Microsoft services, CEF devices, and Syslog sources. But every organisation has data sources that do not fit these categories: custom web applications with their own authentication logs, SaaS platforms with webhook-based event export, legacy systems with proprietary log formats, and security tools from vendors without Sentinel connectors. The Logs Ingestion API brings all of this data into Sentinel.


Logs Ingestion API architecture

The Logs Ingestion API is an HTTPS endpoint that accepts JSON payloads and routes them through a Data Collection Rule to a table in the workspace.

The pipeline: Your application → HTTPS POST to Data Collection Endpoint (DCE) → Data Collection Rule (DCR) processes and transforms → data lands in a custom table (or standard table) in the workspace.

Components:

Data Collection Endpoint (DCE) — the HTTPS endpoint that receives the data. Created in Azure Monitor. Each DCE has a unique URL: https://<dce-name>.<region>.ingest.monitor.azure.com.

Data Collection Rule (DCR) — defines the schema (what fields to expect), transformations (how to process the data), and destination (which table to write to). The DCR ID is referenced in the API call.

Custom table — a table in the workspace with a schema you define. Custom table names must end with _CL (custom log suffix). Example: ApplicationAuth_CL, SaaSAudit_CL, LegacyFirewall_CL.

Authentication — the API call authenticates with an Entra ID service principal or managed identity. The identity must have the “Monitoring Metrics Publisher” role on the DCR.
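The components above combine into a single request URL. A minimal sketch in Python of how the pieces fit together; the DCE name, region, DCR ID, and stream name are placeholders, and the api-version matches the one used in the examples below:

```python
def build_ingest_url(dce_host: str, dcr_id: str, stream: str,
                     api_version: str = "2023-01-01") -> str:
    """Compose a Logs Ingestion API URL from the DCE host, DCR ID, and stream."""
    return (f"https://{dce_host}/dataCollectionRules/{dcr_id}"
            f"/streams/{stream}?api-version={api_version}")

# Placeholder identifiers for illustration only
url = build_ingest_url("mydce.uksouth.ingest.monitor.azure.com",
                       "dcr-0000aaaa1111bbbb", "Custom-WebAppAuth_CL")
```

The stream name follows the `Custom-<TableName>` convention, so the DCR knows which declared stream (and therefore which schema and destination) the payload targets.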


Creating a custom log ingestion pipeline

Step 1: Create the custom table. Navigate to Log Analytics workspace → Tables → Create → New custom log (DCR-based). Define the table name (e.g., WebAppAuth_CL) and the schema (columns and data types).

Step 2: Create the DCE. Azure portal → Monitor → Data Collection Endpoints → Create. Select the region matching your workspace.

Step 3: Create the DCR. Azure portal → Monitor → Data Collection Rules → Create. Select “Custom” as the data source type. Define the incoming data schema (the JSON fields your application sends), the transformation KQL (optional), and the destination (your custom table in the workspace).

Step 4: Configure authentication. Create a service principal or managed identity. Assign the “Monitoring Metrics Publisher” role on the DCR. Your application uses this identity to authenticate API calls.

Step 5: Send data. Your application makes HTTPS POST requests to the DCE:

// Conceptual API call (not KQL; shown for reference):
// POST https://<dce>.uksouth.ingest.monitor.azure.com
//      /dataCollectionRules/<dcr-id>/streams/Custom-WebAppAuth_CL
//      ?api-version=2023-01-01
// Headers:
//   Authorization: Bearer <token>
//   Content-Type: application/json
// Body:
// [
//   {
//     "TimeGenerated": "2026-03-22T10:15:00Z",
//     "Username": "j.morrison@northgateeng.com",
//     "Action": "login_failed",
//     "SourceIP": "203.0.113.47",
//     "Application": "InternalHR",
//     "FailureReason": "invalid_password"
//   }
// ]
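The same call, sketched in Python using only the standard library. Token acquisition is out of scope here (in practice you would obtain it via a service principal, for example with the azure-identity library); the DCE hostname, DCR ID, and token are placeholders:

```python
import json
import urllib.request

def build_ingest_request(dce_url: str, dcr_id: str, stream: str,
                         token: str, events: list) -> urllib.request.Request:
    """Build (but do not send) the HTTPS POST for the Logs Ingestion API."""
    url = (f"{dce_url}/dataCollectionRules/{dcr_id}"
           f"/streams/{stream}?api-version=2023-01-01")
    body = json.dumps(events).encode("utf-8")   # the API expects a JSON array
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"})

events = [{
    "TimeGenerated": "2026-03-22T10:15:00Z",
    "Username": "j.morrison@northgateeng.com",
    "Action": "login_failed",
    "SourceIP": "203.0.113.47",
    "Application": "InternalHR",
    "FailureReason": "invalid_password",
}]
req = build_ingest_request("https://mydce.uksouth.ingest.monitor.azure.com",
                           "dcr-0000aaaa1111bbbb", "Custom-WebAppAuth_CL",
                           "<token>", events)
# urllib.request.urlopen(req)  # uncomment to actually send
```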

Step 6: Verify. Query the custom table:

WebAppAuth_CL
| where TimeGenerated > ago(1h)
| project TimeGenerated, Username, Action, SourceIP, Application, FailureReason
| order by TimeGenerated desc

Common custom log use cases

Custom web application authentication. Your internal application has its own authentication system (not Entra ID). Authentication events (login success, login failure, password reset, account lockout) are written to the application’s log file or database. A scheduled script or Logic App reads these events and sends them to Sentinel via the API. Analytics rules then detect brute-force attacks, credential stuffing, and compromised accounts in the custom application.

SaaS platform audit events. A SaaS platform (e.g., Salesforce, HubSpot, custom CRM) provides webhook notifications for security events (user creation, permission changes, data exports). Configure the webhook to send events to an Azure Function, which formats them as JSON and POSTs to the DCE. Analytics rules detect suspicious activity (bulk data export, admin role assignment to a new user).
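The transformation step inside such an Azure Function can be very small. A minimal sketch, assuming a hypothetical webhook payload shape (the incoming field names `occurred_at`, `actor`, `event_type`, and `target` are illustrative, not any real platform's schema); the output matches the SaaSAudit_CL columns used later in this section:

```python
def saas_event_to_row(webhook_event: dict) -> dict:
    """Map a (hypothetical) SaaS webhook payload onto the SaaSAudit_CL schema."""
    return {
        "TimeGenerated": webhook_event["occurred_at"],
        "UserPrincipalName": webhook_event["actor"]["email"],
        "IPAddress": webhook_event.get("source_ip", ""),
        "ActionType": webhook_event["event_type"],
        "ResourceType": webhook_event["target"]["type"],
        "ResourceName": webhook_event["target"]["name"],
    }

webhook = {
    "occurred_at": "2026-03-22T10:15:00Z",
    "actor": {"email": "j.morrison@northgateeng.com"},
    "source_ip": "203.0.113.47",
    "event_type": "data_exported",
    "target": {"type": "document", "name": "payroll_2026.xlsx"},
}
row = saas_event_to_row(webhook)
```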

Legacy system integration. An older system (mainframe, legacy database, proprietary application) writes logs to a flat file. A scheduled script on the server reads new log entries, parses them into JSON, and sends to the DCE. This brings legacy system visibility into the same Sentinel workspace as your modern infrastructure.

Threat intelligence enrichment. External threat intelligence sources that provide indicators via API (not STIX/TAXII) can push indicators to a custom table. Analytics rules join this table with SigninLogs and DeviceNetworkEvents for TI-matching (similar to ThreatIntelligenceIndicator but for custom feeds).


Implementation patterns: Logic App vs Azure Function vs Script

There are three common approaches to sending data to the Logs Ingestion API. Choose based on your environment and skill set.

Logic App (low-code, event-driven). Create a Logic App triggered by a schedule (every 5 minutes), an HTTP webhook (when the SaaS platform sends an event), or a queue message (when another system deposits events). The Logic App formats the data as JSON and calls the DCE endpoint. Logic Apps handle authentication, retries, and error handling natively. Best for: SaaS webhook integrations, scheduled data pulls from REST APIs, and non-developer teams.

Azure Function (code-based, event-driven). Write a function in Python, C#, or JavaScript that receives events (via HTTP trigger, timer trigger, or queue trigger), formats them, authenticates to the DCE, and POSTs the data. Azure Functions provide more control over parsing, transformation, and error handling than Logic Apps. Best for: complex data sources with custom parsing requirements, high-volume ingestion, and development teams comfortable with code.

Scheduled script (on-premises). A PowerShell or Python script running on a scheduled task (Windows) or cron job (Linux). The script reads log files or database records, formats as JSON, authenticates to the DCE via service principal, and POSTs the data. Best for: legacy systems, on-premises data sources that cannot reach Azure services directly, and environments without Azure Function/Logic App infrastructure.

Example: PowerShell ingestion script pattern:

# PowerShell ingestion pattern (requires the Az module; -AsArray needs PowerShell 7+)
$dceEndpoint = "https://<dce-name>.<region>.ingest.monitor.azure.com"
$dcrId = "<dcr-immutable-id>"
$token = Get-AzAccessToken -ResourceUrl "https://monitor.azure.com"
$body = @(
  @{
    TimeGenerated = (Get-Date).ToUniversalTime().ToString("o")
    Username = "j.morrison"
    Action = "login_failed"
    SourceIP = "203.0.113.47"
  }
) | ConvertTo-Json -AsArray
Invoke-RestMethod -Uri "$dceEndpoint/dataCollectionRules/$dcrId/streams/Custom-AppAuth_CL?api-version=2023-01-01" `
  -Method POST -Body $body -Headers @{ Authorization = "Bearer $($token.Token)" } `
  -ContentType "application/json"

Error handling and rate limiting

The Logs Ingestion API has rate limits and returns specific error codes. Your ingestion mechanism must handle these.

HTTP 429 (Too Many Requests). The API rate limit has been exceeded. Back off and retry after the interval specified in the Retry-After header. Common when sending large batches in rapid succession. Mitigate by: batching events (send 100 events per request instead of 1), spacing requests (add a 1-second delay between batches), and increasing the batch interval (collect events for 5 minutes, then send in one batch).
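The batching mitigation is a few lines of code. A sketch of a fixed-size batching helper (the batch size of 100 follows the suggestion above; tune it for your volume):

```python
def batch(events: list, size: int = 100):
    """Yield events in fixed-size batches to avoid rapid-fire single-event posts."""
    for i in range(0, len(events), size):
        yield events[i:i + size]

batches = list(batch([{"n": i} for i in range(250)], size=100))
# 250 events become three requests: 100, 100, and 50 events
```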

HTTP 400 (Bad Request). The JSON payload does not match the DCR schema. Common causes: missing required fields (TimeGenerated), wrong data types (sending a string where the schema expects a number), extra fields not in the schema, and malformed JSON. Fix: validate the payload against the DCR schema before sending.
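Client-side validation before sending catches most 400s. A minimal sketch, assuming a schema dictionary that mirrors your DCR columns (the fields shown are illustrative; list your own required fields and types):

```python
SCHEMA = {  # client-side mirror of the DCR columns, as Python types
    "TimeGenerated": str, "Username": str, "Action": str, "SourceIP": str,
}
REQUIRED = {"TimeGenerated"}

def validate(event: dict) -> list[str]:
    """Return a list of problems; an empty list means the event matches the schema."""
    problems = [f"missing required field {f}" for f in REQUIRED if f not in event]
    for field, value in event.items():
        if field not in SCHEMA:
            problems.append(f"unexpected field {field}")
        elif not isinstance(value, SCHEMA[field]):
            problems.append(f"{field} has wrong type {type(value).__name__}")
    return problems
```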

HTTP 403 (Forbidden). Authentication failed. The service principal token has expired, the managed identity does not have the “Monitoring Metrics Publisher” role on the DCR, or the DCE is in a different tenant. Fix: refresh the token, verify role assignments.

HTTP 413 (Payload Too Large). The request body exceeds the maximum size (1 MB per request). Fix: split large batches into smaller requests.
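Splitting by serialised size (rather than by event count) handles variable-length events. A sketch; the 1 MB default reflects the limit stated above:

```python
import json

def split_by_size(events: list, max_bytes: int = 1_000_000) -> list:
    """Split events into batches whose serialised JSON stays under max_bytes."""
    batches, current = [], []
    for event in events:
        current.append(event)
        if len(json.dumps(current).encode("utf-8")) > max_bytes:
            current.pop()
            if not current:
                raise ValueError("single event exceeds the payload limit")
            batches.append(current)
            current = [event]
    if current:
        batches.append(current)
    return batches
```

Measuring the serialised batch (not the sum of individual events) accounts for the JSON array brackets and separators that the payload actually contains.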

Retry strategy: Implement exponential backoff with jitter for transient errors (429, 500, 503). Do not retry 400 errors (fix the payload). Do not retry 403 errors (fix authentication). Log all errors for troubleshooting.
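The retry policy above can be sketched as two small functions, using the "full jitter" variant of exponential backoff (delay drawn uniformly from zero up to the capped exponential):

```python
import random

RETRYABLE = {429, 500, 503}   # transient errors only

def should_retry(status: int) -> bool:
    """Retry transient errors; 400 needs a payload fix, 403 an auth fix."""
    return status in RETRYABLE

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter backoff: random delay in [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

In practice, when the response carries a Retry-After header (as 429 does), honour that value instead of the computed delay.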


Custom table schema design

Design the table schema based on how you will query the data. Include: TimeGenerated (required — every table must have this), entity columns (UserPrincipalName, IPAddress, DeviceName — for entity mapping in analytics rules), action columns (what happened), and context columns (why it happened, which resource was accessed).

Entity mapping is critical. If your custom table includes an IP address column, analytics rules can map it as an IP entity — enabling automatic correlation with other tables that contain the same IP. Without entity-compatible column names, the analytics rule engine cannot correlate custom data with standard data.

Recommended column naming: Use the same column names as standard Sentinel tables where the data type matches. If your custom table has an IP address, name the column IPAddress (matching SigninLogs). If it has a user identifier, name it UserPrincipalName (matching SigninLogs) or AccountName (matching DeviceLogonEvents). This enables cross-table joins without column renaming.


Schema design examples

Example 1: Custom web application authentication table.

Table name: WebAppAuth_CL

Column              Type      Purpose
TimeGenerated       datetime  Required — when the event occurred
UserPrincipalName   string    Entity-compatible — the user who authenticated
IPAddress           string    Entity-compatible — the source IP
Action              string    What happened: login_success, login_failed, password_reset, account_locked
Application         string    Which application was accessed
FailureReason       string    Why the authentication failed (if applicable)
UserAgent           string    Browser or client information
SessionId           string    Correlation identifier for the session

Analytics rule example: detect brute-force against the custom app:

WebAppAuth_CL
| where TimeGenerated > ago(1h)
| where Action == "login_failed"
| summarize FailCount = count(), TargetUsers = make_set(UserPrincipalName, 10)
    by IPAddress
| where FailCount > 20

Example 2: SaaS audit events table.

Table name: SaaSAudit_CL

Column              Type      Purpose
TimeGenerated       datetime  Required
UserPrincipalName   string    Entity-compatible — who performed the action
IPAddress           string    Entity-compatible — from where
ActionType          string    What was done: user_created, permission_changed, data_exported, config_modified
ResourceType        string    What was affected: user, role, document, setting
ResourceName        string    Specific resource identifier
Details             string    Additional context (JSON or structured text)

Analytics rule example: detect bulk data export:

SaaSAudit_CL
| where TimeGenerated > ago(1h)
| where ActionType == "data_exported"
| summarize ExportCount = count(), ExportedResources = make_set(ResourceName, 20)
    by UserPrincipalName, IPAddress
| where ExportCount > 5

Monitoring custom ingestion health

Custom log pipelines require their own health monitoring because the standard SentinelHealth data connector events do not cover them.

DCR metrics. Navigate to Azure Monitor → Data Collection Rules → select your custom DCR → Metrics. Key metrics: Logs Ingestion Requests (total and failed), Logs Rows Received, and Logs Rows Dropped. Failed requests indicate authentication or schema issues. Dropped rows indicate transformation errors.

Custom health query:

// Monitor custom table ingestion health
let CustomTables = datatable(TableName:string) [
    "WebAppAuth_CL",
    "SaaSAudit_CL"
];
union withsource=TableName *
| where TimeGenerated > ago(4h)
| where TableName in (CustomTables)
| summarize LastEvent = max(TimeGenerated), EventCount = count() by TableName
| extend MinutesSinceLastEvent = datetime_diff('minute', now(), LastEvent)
| extend Status = iff(MinutesSinceLastEvent > 60, "⚠️ CHECK", "✓ OK")

Alerting on ingestion failure. Create a Sentinel analytics rule that fires when a custom table has zero events for more than 2 hours (during business hours). This catches script failures, authentication expiry, and API endpoint issues before they create investigation blind spots.


Data validation for custom tables

After the initial ingestion is working, validate the data quality.

Completeness check: Compare the event count in Sentinel with the event count at the source. If your application logged 1,000 events but only 800 arrived in Sentinel, 20% were lost — investigate the ingestion mechanism for errors.
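The loss calculation is straightforward; a sketch matching the 1,000-versus-800 example above:

```python
def loss_percent(source_count: int, ingested_count: int) -> float:
    """Percentage of source events that never arrived in Sentinel."""
    if source_count == 0:
        return 0.0
    return round((source_count - ingested_count) / source_count * 100, 1)

loss = loss_percent(1000, 800)  # → 20.0
```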

Schema validation: Check for empty fields that should be populated:

WebAppAuth_CL
| where TimeGenerated > ago(24h)
| summarize
    EmptyIP = countif(isempty(IPAddress)),
    EmptyUser = countif(isempty(UserPrincipalName)),
    EmptyAction = countif(isempty(Action)),
    Total = count()
| extend IPCompleteness = round((1.0 - (EmptyIP * 1.0 / Total)) * 100, 1)
| extend UserCompleteness = round((1.0 - (EmptyUser * 1.0 / Total)) * 100, 1)

If any critical field has less than 95% completeness, investigate why events are arriving with missing data. The issue is typically in the source application’s logging or the ingestion script’s parsing logic.

Custom logs close the last visibility gap

Microsoft connectors cover identity, endpoint, email, and cloud. CEF/Syslog connectors cover network infrastructure and Linux. Custom logs cover everything else — bespoke applications, SaaS platforms, legacy systems, and vendor-specific security tools. With all three connector categories deployed, few parts of your environment remain invisible to Sentinel.

Try it yourself

Create a custom table called TestAuth_CL with columns: TimeGenerated (datetime), Username (string), Action (string), SourceIP (string), Result (string). Create a DCE and DCR for this table. Use PowerShell, curl, or Postman to send a test JSON payload to the DCE. Query the table to verify the data arrived. This exercise validates the entire custom log pipeline — you can adapt it for any bespoke data source in your environment.

What you should observe

The test event appears in TestAuth_CL within 5-15 minutes. The columns match your schema definition. The TimeGenerated column is populated. This confirms: DCE is reachable, authentication works, DCR is routing correctly, and the custom table is receiving data. Any failure at any step generates specific error messages — check the DCR's metrics for ingestion errors.


Knowledge check

Check your understanding

1. Your organisation has an internal HR application with its own authentication system. You want to detect brute-force attacks against this application in Sentinel. How do you get the data in?

Answer: Use the Logs Ingestion API. Create a custom table (HRAppAuth_CL) with columns for TimeGenerated, Username, Action (login_success/login_failed), SourceIP, and FailureReason. Create a DCE and DCR. Build a scheduled script or Logic App that reads authentication events from the HR application and POSTs them as JSON to the DCE. Create an analytics rule that detects multiple failed logins from the same IP within a time window — the standard brute-force detection pattern, applied to custom data.

Incorrect options:
Install AMA on the HR application server
Use the Entra ID connector
Custom applications cannot be monitored in Sentinel