8.8 Custom Logs and API Ingestion
Introduction
Standard connectors cover Microsoft services, CEF devices, and Syslog sources. But every organisation has data sources that do not fit these categories: custom web applications with their own authentication logs, SaaS platforms with webhook-based event export, legacy systems with proprietary log formats, and security tools from vendors without Sentinel connectors. The Logs Ingestion API brings all of this data into Sentinel.
Logs Ingestion API architecture
The Logs Ingestion API is an HTTPS endpoint that accepts JSON payloads and routes them through a Data Collection Rule to a table in the workspace.
The pipeline: Your application → HTTPS POST to Data Collection Endpoint (DCE) → Data Collection Rule (DCR) processes and transforms → data lands in a custom table (or standard table) in the workspace.
Components:
Data Collection Endpoint (DCE) — the HTTPS endpoint that receives the data. Created in Azure Monitor. Each DCE has a unique URL: https://<dce-name>.<region>.ingest.monitor.azure.com.
Data Collection Rule (DCR) — defines the schema (what fields to expect), transformations (how to process the data), and destination (which table to write to). The DCR ID is referenced in the API call.
Custom table — a table in the workspace with a schema you define. Custom table names must end with _CL (custom log suffix). Example: ApplicationAuth_CL, SaaSAudit_CL, LegacyFirewall_CL.
Authentication — the API call authenticates with an Entra ID service principal or managed identity. The identity must have the “Monitoring Metrics Publisher” role on the DCR.
Creating a custom log ingestion pipeline
Step 1: Create the custom table. Navigate to Log Analytics workspace → Tables → Create → New custom log (DCR-based). Define the table name (e.g., WebAppAuth_CL) and the schema (columns and data types).
Step 2: Create the DCE. Azure portal → Monitor → Data Collection Endpoints → Create. Select the region matching your workspace.
Step 3: Create the DCR. Azure portal → Monitor → Data Collection Rules → Create. Select “Custom” as the data source type. Define the incoming data schema (the JSON fields your application sends), the transformation KQL (optional), and the destination (your custom table in the workspace).
Step 4: Configure authentication. Create a service principal or managed identity. Assign the “Monitoring Metrics Publisher” role on the DCR. Your application uses this identity to authenticate API calls.
Step 5: Send data. Your application makes HTTPS POST requests to the DCE:
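A minimal sketch of the POST call in PowerShell. The DCE URL, DCR immutable ID, stream name, and bearer token are placeholders — substitute the values from your own deployment (token acquisition is shown in the script pattern later in this section):

```powershell
# Sketch only — all identifiers below are placeholders for your environment.
$dceUri = "https://<dce-name>.<region>.ingest.monitor.azure.com"
$dcrId  = "<dcr-immutable-id>"          # from the DCR's JSON view
$stream = "Custom-WebAppAuth_CL"        # stream name defined in the DCR
$token  = "<bearer-token-from-entra-id>"

$body = @(
    @{
        TimeGenerated     = (Get-Date).ToUniversalTime().ToString("o")
        UserPrincipalName = "alice@contoso.com"
        IPAddress         = "203.0.113.10"
        Action            = "login_failed"
    }
) | ConvertTo-Json -AsArray

Invoke-RestMethod -Method Post `
    -Uri "$dceUri/dataCollectionRules/$dcrId/streams/${stream}?api-version=2023-01-01" `
    -Headers @{ Authorization = "Bearer $token"; "Content-Type" = "application/json" } `
    -Body $body
```

The payload is always a JSON array of event objects, even when sending a single event.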
Step 6: Verify. Query the custom table:
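A simple verification query, assuming the table name from Step 1 (`WebAppAuth_CL`):

```kusto
WebAppAuth_CL
| where TimeGenerated > ago(1h)
| sort by TimeGenerated desc
| take 10
```

Allow 5-15 minutes after the first POST before expecting results — newly created custom tables take a few minutes to begin surfacing data.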
Common custom log use cases
Custom web application authentication. Your internal application has its own authentication system (not Entra ID). Authentication events (login success, login failure, password reset, account lockout) are written to the application’s log file or database. A scheduled script or Logic App reads these events and sends them to Sentinel via the API. Analytics rules then detect brute-force attacks, credential stuffing, and compromised accounts in the custom application.
SaaS platform audit events. A SaaS platform (e.g., Salesforce, HubSpot, custom CRM) provides webhook notifications for security events (user creation, permission changes, data exports). Configure the webhook to send events to an Azure Function, which formats them as JSON and POSTs to the DCE. Analytics rules detect suspicious activity (bulk data export, admin role assignment to a new user).
Legacy system integration. An older system (mainframe, legacy database, proprietary application) writes logs to a flat file. A scheduled script on the server reads new log entries, parses them into JSON, and sends to the DCE. This brings legacy system visibility into the same Sentinel workspace as your modern infrastructure.
Threat intelligence enrichment. External threat intelligence sources that provide indicators via API (not STIX/TAXII) can push indicators to a custom table. Analytics rules join this table with SigninLogs and DeviceNetworkEvents for TI-matching (similar to ThreatIntelligenceIndicator but for custom feeds).
Implementation patterns: Logic App vs Azure Function vs Script
There are three common approaches to sending data to the Logs Ingestion API; choose based on your environment and skill set.
Logic App (low-code, event-driven). Create a Logic App triggered by a schedule (every 5 minutes), an HTTP webhook (when the SaaS platform sends an event), or a queue message (when another system deposits events). The Logic App formats the data as JSON and calls the DCE endpoint. Logic Apps handle authentication, retries, and error handling natively. Best for: SaaS webhook integrations, scheduled data pulls from REST APIs, and non-developer teams.
Azure Function (code-based, event-driven). Write a function in Python, C#, or JavaScript that receives events (via HTTP trigger, timer trigger, or queue trigger), formats them, authenticates to the DCE, and POSTs the data. Azure Functions provide more control over parsing, transformation, and error handling than Logic Apps. Best for: complex data sources with custom parsing requirements, high-volume ingestion, and development teams comfortable with code.
Scheduled script (on-premises). A PowerShell or Python script running on a scheduled task (Windows) or cron job (Linux). The script reads log files or database records, formats as JSON, authenticates to the DCE via service principal, and POSTs the data. Best for: legacy systems, on-premises data sources that cannot reach Azure services directly, and environments without Azure Function/Logic App infrastructure.
Example: PowerShell ingestion script pattern:
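An illustrative version of the pattern: authenticate with a service principal via the client-credentials flow, read and parse new entries, and POST one batch. All identifiers and the log path are placeholders, and the parsing step is deliberately simplistic — real parsing is source-specific:

```powershell
# Illustrative pattern only — tenant ID, app ID, secret, DCE URL, DCR ID,
# stream name, and log path must be replaced for your environment.
$tenantId = "<tenant-id>"
$appId    = "<service-principal-app-id>"
$secret   = "<client-secret>"
$dceUri   = "https://<dce-name>.<region>.ingest.monitor.azure.com"
$dcrId    = "<dcr-immutable-id>"
$stream   = "Custom-LegacyFirewall_CL"

# 1. Authenticate: client-credentials flow against Entra ID
$tokenResponse = Invoke-RestMethod -Method Post `
    -Uri "https://login.microsoftonline.com/$tenantId/oauth2/v2.0/token" `
    -Body @{
        client_id     = $appId
        client_secret = $secret
        scope         = "https://monitor.azure.com/.default"
        grant_type    = "client_credentials"
    }
$token = $tokenResponse.access_token

# 2. Read and parse new log entries (parsing is source-specific)
$events = Get-Content "C:\Logs\legacy.log" -Tail 100 | ForEach-Object {
    @{ TimeGenerated = (Get-Date).ToUniversalTime().ToString("o"); RawMessage = $_ }
}

# 3. Send one batch; inspect the response for throttling or schema errors
$body = ConvertTo-Json @($events) -Depth 5
Invoke-RestMethod -Method Post `
    -Uri "$dceUri/dataCollectionRules/$dcrId/streams/${stream}?api-version=2023-01-01" `
    -Headers @{ Authorization = "Bearer $token"; "Content-Type" = "application/json" } `
    -Body $body
```

A production version would also track a read position (so restarts do not resend old entries) and apply the retry logic described in the next section.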
Error handling and rate limiting
The Logs Ingestion API has rate limits and returns specific error codes. Your ingestion mechanism must handle these.
HTTP 429 (Too Many Requests). The API rate limit has been exceeded. Back off and retry after the interval specified in the Retry-After header. Common when sending large batches in rapid succession. Mitigate by: batching events (send 100 events per request instead of 1), spacing requests (add a 1-second delay between batches), and increasing the batch interval (collect events for 5 minutes, then send in one batch).
HTTP 400 (Bad Request). The JSON payload does not match the DCR schema. Common causes: missing required fields (TimeGenerated), wrong data types (sending a string where the schema expects a number), extra fields not in the schema, and malformed JSON. Fix: validate the payload against the DCR schema before sending.
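Pre-send validation can be sketched in a few lines. The `SCHEMA` dict below is a hypothetical hand-maintained mirror of a DCR's declared columns, not something read from Azure:

```python
# Minimal pre-send validation sketch. SCHEMA is a hand-maintained mirror
# of the DCR's declared columns (hypothetical example schema shown).
SCHEMA = {
    "TimeGenerated": str,   # ISO 8601 timestamp string
    "UserPrincipalName": str,
    "IPAddress": str,
    "Action": str,
}

def validate_event(event: dict) -> list[str]:
    """Return a list of problems; an empty list means the event is sendable."""
    problems = []
    for field, expected_type in SCHEMA.items():
        if field not in event:
            problems.append(f"missing required field: {field}")
        elif not isinstance(event[field], expected_type):
            problems.append(f"wrong type for {field}: expected {expected_type.__name__}")
    for field in event:
        if field not in SCHEMA:
            problems.append(f"field not in DCR schema: {field}")
    return problems
```

Rejecting a malformed event locally, with a logged reason, is far cheaper to debug than a 400 response after the fact.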
HTTP 403 (Forbidden). The identity authenticated but is not authorised: the service principal or managed identity does not have the “Monitoring Metrics Publisher” role on the DCR, or the DCE is in a different tenant. (An expired token typically produces 401 Unauthorized rather than 403.) Fix: verify role assignments; for 401, refresh the token.
HTTP 413 (Payload Too Large). The request body exceeds the maximum size (1 MB per request). Fix: split large batches into smaller requests.
Retry strategy: Implement exponential backoff with jitter for transient errors (429, 500, 503). Do not retry 400 errors (fix the payload). Do not retry 403 errors (fix authentication). Log all errors for troubleshooting.
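The retry strategy can be sketched as a small wrapper. `post_fn` is a stand-in for whatever makes the actual HTTPS call and returns the status code; the attempt count and backoff base are illustrative defaults:

```python
import random
import time

RETRYABLE = {429, 500, 503}   # transient: back off and retry
FATAL = {400, 403}            # fix the payload or credentials; never retry

def send_with_retry(post_fn, payload, max_attempts=5, sleep_fn=time.sleep):
    """post_fn(payload) returns an HTTP status code (stand-in for the real call)."""
    for attempt in range(max_attempts):
        status = post_fn(payload)
        if status < 300:
            return status
        if status in FATAL:
            raise RuntimeError(f"non-retryable error {status}: fix and resend manually")
        if status in RETRYABLE:
            # exponential backoff with jitter: 1s, 2s, 4s, ... plus 0-1s random
            sleep_fn((2 ** attempt) + random.random())
        else:
            raise RuntimeError(f"unexpected status {status}")
    raise RuntimeError("exhausted retries")
```

A fuller implementation would honour the `Retry-After` header on 429 responses instead of the computed delay, and log every failure with the payload's correlation ID.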
Custom table schema design
Design the table schema based on how you will query the data. Include: TimeGenerated (required — every table must have this), entity columns (UserPrincipalName, IPAddress, DeviceName — for entity mapping in analytics rules), action columns (what happened), and context columns (why it happened, which resource was accessed).
Entity mapping is critical. If your custom table includes an IP address column, analytics rules can map it as an IP entity — enabling automatic correlation with other tables that contain the same IP. Without entity-compatible column names, the analytics rule engine cannot correlate custom data with standard data.
Recommended column naming: Use the same column names as standard Sentinel tables where the data type matches. If your custom table has an IP address, name the column IPAddress (matching SigninLogs). If it has a user identifier, name it UserPrincipalName (matching SigninLogs) or AccountName (matching DeviceLogonEvents). This enables cross-table joins without column renaming.
Schema design examples
Example 1: Custom web application authentication table.
Table name: WebAppAuth_CL
| Column | Type | Purpose |
|---|---|---|
| TimeGenerated | datetime | Required — when the event occurred |
| UserPrincipalName | string | Entity-compatible — the user who authenticated |
| IPAddress | string | Entity-compatible — the source IP |
| Action | string | What happened: login_success, login_failed, password_reset, account_locked |
| Application | string | Which application was accessed |
| FailureReason | string | Why the authentication failed (if applicable) |
| UserAgent | string | Browser or client information |
| SessionId | string | Correlation identifier for the session |
Analytics rule example: detect brute-force against the custom app:
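A sketch of the scheduled-rule query, using the schema above; the threshold of 10 failures in 10 minutes is illustrative and should be tuned for the application:

```kusto
WebAppAuth_CL
| where TimeGenerated > ago(1h)
| where Action == "login_failed"
| summarize FailedCount = count(), DistinctIPs = dcount(IPAddress)
    by UserPrincipalName, bin(TimeGenerated, 10m)
| where FailedCount >= 10
```

Because the table uses entity-compatible column names, `UserPrincipalName` and `IPAddress` can be mapped directly as Account and IP entities in the rule's entity mapping.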
Example 2: SaaS audit events table.
Table name: SaaSAudit_CL
| Column | Type | Purpose |
|---|---|---|
| TimeGenerated | datetime | Required |
| UserPrincipalName | string | Entity-compatible — who performed the action |
| IPAddress | string | Entity-compatible — from where |
| ActionType | string | What was done: user_created, permission_changed, data_exported, config_modified |
| ResourceType | string | What was affected: user, role, document, setting |
| ResourceName | string | Specific resource identifier |
| Details | string | Additional context (JSON or structured text) |
Analytics rule example: detect bulk data export:
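A sketch using the schema above; the threshold of 20 exports in 15 minutes is an illustrative starting point:

```kusto
SaaSAudit_CL
| where TimeGenerated > ago(1h)
| where ActionType == "data_exported"
| summarize ExportCount = count(), Resources = make_set(ResourceName, 50)
    by UserPrincipalName, bin(TimeGenerated, 15m)
| where ExportCount >= 20
```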
Monitoring custom ingestion health
Custom log connectors require their own health monitoring because they are not tracked by the standard SentinelHealth connector events.
DCR metrics. Navigate to Azure Monitor → Data Collection Rules → select your custom DCR → Metrics. Key metrics: Logs Ingestion Requests (total and failed), Logs Rows Received, and Logs Rows Dropped. Failed requests indicate authentication or schema issues. Dropped rows indicate transformation errors.
Custom health query:
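A sketch of a per-table freshness query, assuming the example table names used in this section — extend the `union` with each custom table you operate:

```kusto
union withsource=TableName WebAppAuth_CL, SaaSAudit_CL
| summarize LastEvent = max(TimeGenerated), EventCount = count() by TableName
| extend MinutesSinceLastEvent = datetime_diff("minute", now(), LastEvent)
```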
Alerting on ingestion failure. Create a Sentinel analytics rule that fires when a custom table has zero events for more than 2 hours (during business hours). This catches script failures, authentication expiry, and API endpoint issues before they create investigation blind spots.
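The rule logic is a silence check, sketched here for one table (add a business-hours filter to match your alerting window):

```kusto
// Fires when WebAppAuth_CL has received nothing for 2+ hours
WebAppAuth_CL
| summarize LastEvent = max(TimeGenerated)
| where LastEvent < ago(2h)
```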
Data validation for custom tables
After the initial ingestion is working, validate the data quality.
Completeness check: Compare the event count in Sentinel with the event count at the source. If your application logged 1,000 events but only 800 arrived in Sentinel, 20% were lost — investigate the ingestion mechanism for errors.
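The Sentinel side of the comparison is a simple count over a fixed window; run the equivalent count at the source for the same window and compare:

```kusto
WebAppAuth_CL
| where TimeGenerated between (ago(24h) .. now())
| count
```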
Schema validation: Check for empty fields that should be populated:
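A completeness query for the example authentication table — each percentage should sit near 100% for fields the source always populates:

```kusto
WebAppAuth_CL
| where TimeGenerated > ago(24h)
| summarize Total = count(),
    HasUser   = countif(isnotempty(UserPrincipalName)),
    HasIP     = countif(isnotempty(IPAddress)),
    HasAction = countif(isnotempty(Action))
| extend UserPct   = round(100.0 * HasUser / Total, 1),
    IPPct     = round(100.0 * HasIP / Total, 1),
    ActionPct = round(100.0 * HasAction / Total, 1)
```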
If any critical field has less than 95% completeness, investigate why events are arriving with missing data. The issue is typically in the source application’s logging or the ingestion script’s parsing logic.
Microsoft connectors cover identity, endpoint, email, and cloud. CEF/Syslog connectors cover network infrastructure and Linux. Custom logs cover everything else — bespoke applications, SaaS platforms, legacy systems, and vendor-specific security tools. With all three connector categories deployed, there should be no blind spots left in your environment that Sentinel cannot see.
Try it yourself
Create a custom table called TestAuth_CL with columns: TimeGenerated (datetime), Username (string), Action (string), SourceIP (string), Result (string). Create a DCE and DCR for this table. Use PowerShell, curl, or Postman to send a test JSON payload to the DCE. Query the table to verify the data arrived. This exercise validates the entire custom log pipeline — you can adapt it for any bespoke data source in your environment.
What you should observe
The test event appears in TestAuth_CL within 5-15 minutes. The columns match your schema definition. The TimeGenerated column is populated. This confirms: DCE is reachable, authentication works, DCR is routing correctly, and the custom table is receiving data. Any failure at any step generates specific error messages — check the DCR's metrics for ingestion errors.
Knowledge check
Check your understanding
1. Your organisation has an internal HR application with its own authentication system. You want to detect brute-force attacks against this application in Sentinel. How do you get the data in?