8.11 Building the Complete Ingestion Pipeline
Introduction
Subsections 8.1 through 8.10 taught individual connector types. This subsection puts them together into a phased deployment plan with validation at each stage. The goal is a production-ready ingestion pipeline where every relevant data source is connected, every connector is validated, and ongoing monitoring catches failures before they affect detection capability.
The phased deployment plan
Do not enable every connector simultaneously. Deploy in phases, validating each before proceeding. This ensures: each connector is confirmed working before adding the next, the cost baseline is established incrementally (no surprise 200 GB/day bill on day two), and any issues are isolated to the most recently deployed connector.
Phase 1: Core identity and XDR (Day 1)
Enable the Microsoft Entra ID connector (all log types). Enable the Microsoft Defender XDR connector with bi-directional incident sync and the recommended Advanced Hunting tables from subsection 8.3. Enable the Azure Activity connector through Azure Policy.
Validation: run the verification queries from subsection 8.2 for each connector. Confirm SigninLogs, AuditLogs, SecurityAlert, AzureActivity, and at least one Defender table (DeviceProcessEvents or EmailEvents) are populated. Establish the daily ingestion baseline.
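These checks can be rolled into one query; a minimal sketch using the table names from the validation list above (`isfuzzy=true` keeps the query running even if a table does not exist yet):

```kql
// Confirm each Phase 1 table has recent data; isfuzzy tolerates missing tables
union isfuzzy=true withsource=TableName
    SigninLogs, AuditLogs, SecurityAlert, AzureActivity, DeviceProcessEvents
| where TimeGenerated > ago(24h)
| summarize Events = count(), LatestEvent = max(TimeGenerated) by TableName
```

A table absent from the results has ingested nothing in the last 24 hours.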
Phase 2: Microsoft 365 and Defender for Cloud (Day 2-3)
Enable the Office 365 connector (Exchange, SharePoint, Teams). Enable the Defender for Cloud connector with bi-directional sync. Install the Content Hub solutions for each connected source. Review and enable the highest-priority analytics rule templates.
Validation: verify OfficeActivity and SecurityAlert (from Defender for Cloud) tables. Check for duplicate incidents (subsection 8.3 — disable Microsoft Security incident creation rules if sync is enabled). Compare OfficeActivity vs CloudAppEvents overlap (subsection 8.10) and decide whether both are needed.
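One way to quantify the OfficeActivity/CloudAppEvents overlap is to compare their daily volumes side by side; a sketch:

```kql
// Daily event counts for the two overlapping Microsoft 365 tables
union isfuzzy=true withsource=TableName OfficeActivity, CloudAppEvents
| where TimeGenerated > ago(7d)
| summarize DailyEvents = count() by TableName, Day = bin(TimeGenerated, 1d)
| order by Day asc, TableName asc
```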
Phase 3: Windows hosts (Week 1)
Deploy AMA to a pilot group of Windows servers (5-10 servers). Create the DCR for Windows Security Events at “Common” collection level. Validate data flow and event completeness.
Scale: deploy AMA to all Windows servers using Azure Policy. Monitor daily ingestion volume. If volume exceeds budget, consider switching to custom XPath collection.
Validation: SecurityEvent table populated from all servers. Heartbeat table shows all agents reporting. Compare event count per server with expectations.
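The agent and event checks can be combined; a sketch that flags servers whose agent is silent or whose event volume looks low (the 15-minute staleness threshold is an assumption — adjust it to your heartbeat interval):

```kql
// Join agent heartbeats with per-server event counts over the last day
Heartbeat
| where TimeGenerated > ago(1d) and OSType == "Windows"
| summarize LastHeartbeat = max(TimeGenerated) by Computer
| extend AgentStale = LastHeartbeat < ago(15m)
| join kind=leftouter (
    SecurityEvent
    | where TimeGenerated > ago(1d)
    | summarize Events = count() by Computer
  ) on Computer
| project Computer, LastHeartbeat, AgentStale, Events = coalesce(Events, 0)
| order by Events asc
```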
Phase 4: Network and infrastructure (Week 2)
Deploy the log forwarder VM. Configure CEF devices (firewalls, IDS) to forward to the forwarder. Configure Syslog collection for Linux servers (direct AMA where possible, forwarder for appliances).
Validation: CommonSecurityLog populated with structured CEF data (DeviceVendor and DeviceProduct columns not empty). Syslog table populated with auth events from Linux hosts. Latency measurement within acceptable range.
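The DeviceVendor check can be expressed directly; a sketch that reports, per forwarder, what fraction of CommonSecurityLog rows arrived without parsed CEF fields:

```kql
// Empty DeviceVendor/DeviceProduct usually means raw syslog hit the CEF pipeline
CommonSecurityLog
| where TimeGenerated > ago(1h)
| summarize Events = count(),
            Unparsed = countif(isempty(DeviceVendor) or isempty(DeviceProduct))
    by Computer
| extend UnparsedPct = round(100.0 * Unparsed / Events, 1)
```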
Phase 5: Custom and bespoke sources (Week 3+)
Identify custom data sources not covered by standard connectors. Create custom tables, DCEs, and DCRs for each. Build or configure the ingestion mechanism (API scripts, Logic Apps, webhooks). Deploy incrementally — one custom source at a time.
Validation: custom tables populated with correctly structured data. Entity columns (IP, UPN, hostname) compatible with analytics rule entity mapping. Latency measurement acceptable.
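Custom-table latency can be measured with ingestion_time(); a sketch using HRAppAuth_CL from the coverage matrix as a stand-in — substitute your own table name:

```kql
// Minutes between event generation (TimeGenerated) and arrival in the workspace
HRAppAuth_CL
| where TimeGenerated > ago(1h)
| extend LatencyMin = datetime_diff("minute", ingestion_time(), TimeGenerated)
| summarize P50 = percentile(LatencyMin, 50), P95 = percentile(LatencyMin, 95)
```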
The production-ready checklist
Before declaring the ingestion pipeline production-ready, verify every item.
Connector coverage: All Priority 1 and 2 data sources connected (subsection 8.1). Each connector status shows “Connected” with recent data. No expected data sources missing.
Data quality: Each table has data in the expected columns. Key investigation fields (IPAddress, UserPrincipalName, DeviceName) are populated and correctly formatted. CEF data has structured fields (not empty DeviceVendor). Syslog data parses correctly with KQL.
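Field population can be spot-checked per table; a sketch for SigninLogs (extend the pattern to the other key fields and tables):

```kql
// Percentage of sign-in records with the key investigation fields populated
SigninLogs
| where TimeGenerated > ago(1d)
| summarize Total = count(),
            MissingIP = countif(isempty(IPAddress)),
            MissingUPN = countif(isempty(UserPrincipalName))
| extend IPCompletePct = round(100.0 * (Total - MissingIP) / Total, 1),
         UPNCompletePct = round(100.0 * (Total - MissingUPN) / Total, 1)
```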
Ingestion volume: Daily volume per table is within expected range. No unexpected spikes or drops. Total daily ingestion aligns with cost projections.
Health monitoring: SentinelHealth is enabled and reporting for all connectors. The connector health dashboard from subsection 8.9 shows all connectors healthy. Alerting rules for ingestion drops and connector failures are active.
Analytics rule compatibility: All enabled analytics rules query tables that are populated. No rules reference tables that are empty or missing. Rule execution health shows no failures related to missing data.
Documentation: Each connector is documented: what it collects, which table it populates, the DCR configuration, and the validation queries. Without this, diagnosing a failed connector means reverse-engineering its configuration under pressure.
Ongoing operational monitoring
The ingestion pipeline is not “deploy and forget.” Connectors fail silently, agents crash, devices change configuration, and DCRs get accidentally modified. Ongoing monitoring catches these issues.
Daily checks (shift-start health check — Module 7.11): Run the comprehensive connector health query. Verify all tables have fresh data. Check SentinelHealth for connector failure events.
Weekly checks: Review Usage table for ingestion volume trends. Compare this week’s volume to last week’s. Investigate any table with a >25% change — sudden increases may indicate a misconfigured connector or new data source; sudden decreases may indicate a partial failure.
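The week-over-week comparison can be automated against the Usage table; a sketch (Quantity is reported in MB; a table new this week has no LastWeek value and drops out of the percentage calculation):

```kql
// Flag tables whose billable volume changed by more than 25% week-over-week
Usage
| where TimeGenerated > ago(14d) and IsBillable == true
| extend Week = iff(TimeGenerated > ago(7d), "ThisWeek", "LastWeek")
| summarize GB = sum(Quantity) / 1024.0 by DataType, Week
| evaluate pivot(Week, sum(GB))
| extend ChangePct = round(100.0 * (ThisWeek - LastWeek) / LastWeek, 1)
| where abs(ChangePct) > 25
```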
Monthly checks: Review the cost optimisation report from subsection 8.10. Verify all DCR transformations are still appropriate (analytics rules may have changed, requiring data that was previously filtered). Review the connector documentation for accuracy.
Pivoting daily ingestion per table over two weeks makes trends visible. Gradual increases indicate growing data sources (normal) or connector misconfiguration (investigate); gradual decreases indicate data source retirement (expected) or silent connector degradation (investigate).
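Such a pivot can be produced from the Usage table; a sketch (Quantity is in MB, and each day becomes a column labelled MM-dd):

```kql
// Daily billable ingestion in GB per table, one column per day
Usage
| where TimeGenerated > ago(14d) and IsBillable == true
| summarize GB = round(sum(Quantity) / 1024.0, 2)
    by DataType, Day = format_datetime(bin(TimeGenerated, 1d), "MM-dd")
| evaluate pivot(Day, sum(GB))
```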
Scaling the pipeline
As the organisation grows — more users, more devices, more applications, more cloud workloads — the ingestion pipeline must scale with it.
New data sources. When a new SaaS application is deployed, a new office is opened, or a new cloud workload is provisioned, evaluate it against the ingestion priority framework and connect if the detection value justifies the cost.
Increased volume from existing sources. Onboarding 500 new employees increases SigninLogs volume. Deploying AMA to 200 additional servers increases SecurityEvent volume. Monitor the Usage table for these step-changes and adjust commitment tiers if the new volume crosses a tier threshold.
New connector types. Microsoft regularly releases new connectors and updates existing ones. Check Content Hub monthly for new solutions that cover your data sources. A data source that required custom API ingestion six months ago may now have a native connector that is simpler and more reliable.
The connector documentation template
Every connector in your pipeline should be documented in an operations runbook. When a connector fails at 2am, the on-call analyst needs to diagnose it quickly — without hunting through Azure portal settings.
Per-connector documentation:
Connector name and type: e.g., “Microsoft Entra ID — service-to-service”
Tables populated: SigninLogs, AuditLogs, AADNonInteractiveUserSignInLogs
Configuration details: which log types enabled, any DCR transformations applied, the DCR name and ID, the workspace destination
Expected daily volume: 2.5 GB/day (interactive sign-ins: 1.5 GB, audit: 0.5 GB, non-interactive: 0.5 GB after DCR filtering)
Verification query: SigninLogs | where TimeGenerated > ago(1h) | count
Troubleshooting steps: check licence (P1/P2), check diagnostic setting conflict, check connector status page
Last validated: date of most recent validation check
Owner: who is responsible for this connector’s health
Per-forwarder documentation (for CEF/Syslog):
Forwarder VM name and IP: log-forwarder-01, 10.0.1.50
AMA version: latest (auto-updated)
DCR name: dcr-cef-firewalls-uksouth
Connected devices: PAN-FW-01 (Palo Alto PAN-OS, deny logs), FORT-IDS-01 (Fortinet FortiGate, alert logs)
HA configuration: active-passive behind Azure Load Balancer VIP 10.0.1.100
Capacity: rated for 25,000 EPS, current load ~8,000 EPS
Troubleshooting: check AMA service (systemctl status azuremonitoragent), check Heartbeat table, check rsyslog queue (journalctl -u rsyslog)
Handling connector changes and decommissions
Connectors are not permanent. Data sources change: a firewall is replaced, a SaaS application is decommissioned, a new office is opened.
Adding a connector: Follow the phased deployment plan. Validate before declaring operational. Update documentation.
Modifying a connector: Any change to a DCR (collection level, transformation, destination) should go through the change control process from Module 7.12. Test the change against historical data. Monitor volume after deployment.
Decommissioning a connector: When a data source is retired, disable the connector and document the date. Do not delete the connector configuration — keep it disabled. The historical data remains in the workspace for the retention period. If the data source is replaced (old firewall → new firewall), enable the new connector before disabling the old one to avoid a data gap during the transition.
Connector hygiene: Quarterly, review all connectors. Identify: connectors that are configured but not receiving data (the source may have been decommissioned without notifying the SOC), connectors receiving data that is no longer needed (a retired application still sending logs), and connectors with DCR transformations that no longer match the current analytics rule set.
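The first category — configured but silent — can be surfaced from the Usage table; a sketch listing tables that ingested billable data within the last 30 days but nothing in the last 7:

```kql
// Tables that have gone quiet: candidates for decommissioned or failing sources
Usage
| where TimeGenerated > ago(30d) and IsBillable == true
| summarize LastSeen = max(TimeGenerated),
            GB30d = round(sum(Quantity) / 1024.0, 2) by DataType
| where LastSeen < ago(7d)
| order by LastSeen asc
```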
The ingestion coverage matrix
Map your connected data sources against the visibility requirements. This matrix is the definitive record of what Sentinel can and cannot see.
| Data Domain | Source | Connector | Table | Status |
|---|---|---|---|---|
| Identity | Entra ID | Microsoft Entra ID | SigninLogs, AuditLogs | ✓ Connected |
| Identity | On-prem AD | Defender for Identity (XDR) | IdentityLogonEvents | ✓ Connected |
| Endpoint | Windows endpoints | Defender XDR | DeviceProcessEvents, etc | ✓ Connected |
| Endpoint | Windows servers | AMA + DCR | SecurityEvent | ✓ Connected |
| Endpoint | Linux servers | AMA (Syslog) | Syslog | ✓ Connected |
| Email | Exchange Online | Defender XDR | EmailEvents, UrlClickEvents | ✓ Connected |
| Cloud | Azure subscriptions | Azure Activity | AzureActivity | ✓ Connected |
| Cloud | Azure workloads | Defender for Cloud | SecurityAlert | ✓ Connected |
| Network | Palo Alto firewalls | CEF via forwarder | CommonSecurityLog | ✓ Connected |
| Network | Fortinet IDS | CEF via forwarder | CommonSecurityLog | ✓ Connected |
| Application | HR system auth | Logs Ingestion API | HRAppAuth_CL | ⬜ Planned Q2 |
| Application | CRM audit | Logs Ingestion API | CRMAudit_CL | ⬜ Planned Q3 |
Pipeline maturity model
Rate your ingestion pipeline maturity to identify improvement areas.
Level 1 — Basic. Microsoft first-party connectors enabled (Entra ID, Defender XDR, Azure Activity). Incidents and alerts flowing. No third-party data. No DCR transformations. No health monitoring.
Level 2 — Operational. All Microsoft connectors plus Windows Security Events via AMA. Health monitoring enabled. Basic cost tracking. Manual connector documentation.
Level 3 — Optimised. All Level 2 plus CEF/Syslog from network infrastructure. DCR transformations for cost optimisation. Commitment tier right-sized. Automated health alerting. Connector documentation in operations runbook.
Level 4 — Advanced. All Level 3 plus custom log ingestion for bespoke applications. ASIM parsers deployed for cross-vendor normalisation. Pipeline as code (DCRs in Git, deployed via CI/CD). Coverage matrix maintained and reviewed monthly. Cost-per-incident reporting to management.
Level 5 — Mature. All Level 4 plus: multi-workspace deployment with centralised governance. Automated connector deployment for new data sources. Proactive capacity planning based on growth projections. Continuous optimisation with monthly review cadence. Full documentation with per-connector troubleshooting runbooks.
Most organisations deploying Sentinel for the first time should target Level 2 within the first month, Level 3 within 3 months, and Level 4 within 6 months. Level 5 is for mature SOC teams with dedicated Sentinel engineering capacity.
Operational KPIs for the ingestion pipeline
Track these metrics monthly to measure pipeline health and effectiveness.
Connector uptime. Percentage of time each connector is delivering data. Target: 99.5%+ for critical connectors (Entra ID, Defender XDR). Calculate from SentinelHealth events: (total hours - hours with failure events) / total hours * 100.
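The uptime calculation can be approximated from SentinelHealth; a sketch that treats each hour containing a failure event as an unhealthy hour (verify the Status values and resource-type label against the SentinelHealth schema in your workspace):

```kql
// Approximate 30-day uptime per connector from SentinelHealth failure events
SentinelHealth
| where TimeGenerated > ago(30d) and SentinelResourceType == "Data connector"
| summarize FailureHours = dcountif(bin(TimeGenerated, 1h), Status == "Failure")
    by SentinelResourceName
| extend UptimePct = round(100.0 * (720 - FailureHours) / 720, 2) // 720 h in 30 d
```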
Mean time to detect connector failure (MTTD-C). Average time between a connector stopping data delivery and the SOC noticing. Target: under 2 hours for critical connectors. Measure from the last event timestamp vs the first health alert or shift-start check.
Data completeness. Percentage of expected events that arrive in Sentinel. Target: 99%+ for service-to-service connectors, 95%+ for agent-based connectors. Compare source event counts with Sentinel event counts periodically.
Ingestion latency P95. 95th percentile latency from event generation to Sentinel queryability. Target: under 10 minutes for critical tables. Measure with the latency query from subsection 8.9.
Cost efficiency. Monthly Sentinel cost divided by total GB ingested. Track month-over-month to verify optimisation efforts are working. Cost per GB should decrease over time as optimisations are applied.
Without data, Sentinel is an empty database. Every analytics rule, every hunting query, every workbook, every automation playbook depends on the connectors delivering the right data to the right tables at the right time. The time invested in deploying, validating, and monitoring the ingestion pipeline is the highest-ROI investment in your Sentinel deployment — it determines the ceiling of everything that follows in Modules 9 and 10.
Try it yourself
Build your lab ingestion pipeline following the phased plan above. Start with Phase 1 (Entra ID + Defender XDR + Azure Activity). Validate using the verification queries. If time permits, proceed to Phase 2 (Office 365 + Defender for Cloud). For each connector, document: the connector name, the tables it populates, the configuration choices you made (which log types, which collection level), and the validation query results. This documentation is the operations runbook for your Sentinel deployment.
What you should observe
After Phase 1, you should have 5+ tables with active data flow: SigninLogs, AuditLogs, SecurityAlert, AzureActivity, and at least one Defender XDR Advanced Hunting table. The comprehensive connector health query should show all tables as "Healthy." The daily ingestion volume in a lab should be 1-3 GB/day — well within the free tier. This is a functioning Sentinel deployment ready for analytics rules (Module 9) and hunting (Module 10).
Knowledge check
Check your understanding
1. You have deployed all connectors and the ingestion pipeline is running. Three weeks later, an analyst reports that a hunting query against CommonSecurityLog returns no results for the last 5 days, even though the firewall is actively logging. The connector status shows "Connected." What happened?