8.1 Ingestion Strategy and Connector Architecture
Introduction
Connecting every data source on day one is a mistake. You end up with 50 GB/day of firewall noise before you have a single analytics rule to use it. Ingestion strategy means connecting data sources in the order that produces the most detection value per pound spent — and the connector architecture determines how each source delivers its data to the workspace.
This subsection teaches you the decision framework for prioritising data sources, the four categories of connectors available in Sentinel, and the end-to-end architecture that moves data from source to queryable table.
The ingestion priority framework
Not all data sources contribute equally to detection and investigation. Prioritise by asking three questions for each potential data source.
Question 1: Does this data source enable detection of the threats most likely to target my environment? For M365-heavy environments, the answer is almost always: identity data (sign-in logs) and email data (Defender for Office 365) first, because credential phishing and BEC are the dominant threat vectors. Endpoint data (Defender for Endpoint) second, because post-compromise activity generates endpoint telemetry. Cloud infrastructure data (Azure Activity) third, because infrastructure attacks follow identity compromise.
Question 2: Does this data source fill a visibility gap that existing sources cannot cover? If you already have SigninLogs and DeviceProcessEvents, adding DeviceNetworkEvents fills the “what did the compromised device communicate with” gap. Adding Syslog from your firewall fills the “what crossed the network perimeter” gap. Adding custom application logs fills the “what happened inside the business application” gap. Each new source should close a specific visibility gap.
Question 3: What is the cost-to-value ratio? Some data sources generate high volume with low detection value. Verbose firewall accept logs (every permitted connection) may generate 20 GB/day but rarely contribute to incident investigation. The same firewall’s deny logs (blocked connections) might generate only 0.5 GB/day and are far more useful for detecting scanning and lateral movement attempts. Connect the high-value, lower-volume data first.
| Priority | Data Source | Connector | Why First |
|---|---|---|---|
| 1 | Entra ID | Microsoft Entra ID | Every investigation starts or passes through sign-in logs |
| 1 | Defender XDR | Microsoft Defender XDR | All Defender product alerts and Advanced Hunting tables |
| 2 | Azure Activity | Azure Activity | Cloud infrastructure changes — ARM operations audit trail |
| 2 | Microsoft 365 | Office 365 | Exchange, SharePoint, Teams audit events |
| 3 | Windows Security Events | AMA + DCR | On-prem and hybrid: logon events, process creation, privilege use |
| 3 | Firewall / IDS | CEF / Syslog | Network perimeter visibility — blocked connections, IDS alerts |
| 4 | Linux servers | Syslog | Auth logs, sudo events, service activity |
| 4 | Custom applications | Logs Ingestion API | Business-specific data not covered by standard connectors |
| 5 | DNS logs | AMA + DCR | C2 domain detection, DNS tunnelling — high volume, specialist use |
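The three questions can be turned into a rough ranking. The following is a minimal sketch, not a Sentinel feature: the `Source` fields, the 1 to 5 rating scale, the doubling weight for gap-closing sources, and the Entra ID volume are all illustrative assumptions. Only the two firewall volumes come from the example above.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    detection_value: int  # subjective rating, 1 (low) to 5 (critical); an assumption
    closes_gap: bool      # True if no connected source already covers this visibility
    gb_per_day: float     # estimated daily ingestion volume


def value_per_gb(src: Source) -> float:
    """Crude score: detection value (doubled when a gap is closed) per GB/day."""
    weight = 2.0 if src.closes_gap else 1.0
    return src.detection_value * weight / src.gb_per_day


candidates = [
    Source("Firewall accept logs", 1, False, 20.0),
    Source("Firewall deny logs", 2, True, 0.5),
    Source("Entra ID sign-in logs", 5, True, 1.0),  # volume is a placeholder
]

# Highest score first: connect these sources in this order.
ranked = sorted(candidates, key=value_per_gb, reverse=True)
for src in ranked:
    print(f"{src.name}: {value_per_gb(src):.2f}")
```

The absolute scores mean little; the ordering is what matters, and it should be sanity-checked against the priority table rather than followed blindly.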
Visibility gap analysis: what can you NOT see?
Before connecting data sources, map your current visibility gaps. For each MITRE ATT&CK tactic that matters to your environment (the five below are a representative subset), ask: “If this attack happened right now, at which stage would I detect it? Which stages are invisible?”
Initial Access — detectable with: SigninLogs (credential compromise), EmailEvents (phishing delivery), CommonSecurityLog (firewall IDS alerts). Without email data, phishing delivery is invisible. Without sign-in data, credential compromise is invisible.
Execution — detectable with: DeviceProcessEvents (process creation), SecurityEvent Event ID 4688 (Windows process creation), Syslog (Linux command execution). Without endpoint data, post-compromise execution is invisible.
Persistence — detectable with: AuditLogs (role changes, app registrations), SecurityEvent Event IDs 4720/4728/4697 (account creation, group changes, service installation), CloudAppEvents (inbox rules), Syslog (cron jobs, SSH keys). Without audit data, persistence mechanisms are invisible.
Lateral Movement — detectable with: DeviceLogonEvents (device-to-device logons), SecurityEvent Event ID 4624 LogonType 3/10 (network/RDP logons), SigninLogs (cross-application sign-ins). Without logon event data, lateral movement is invisible.
Exfiltration — detectable with: CommonSecurityLog (large outbound transfers), DeviceNetworkEvents (unusual outbound connections), CloudAppEvents (file downloads, mail forwarding). Without network and application data, data theft is invisible.
The priority framework maps to this gap analysis. Priority 1 connectors (identity and Defender XDR) provide Initial Access, Execution, and Persistence visibility. Priority 2 and 3 connectors (Azure Activity, Microsoft 365, Windows Security Events, and firewall/IDS) add Lateral Movement and Exfiltration visibility. Priority 4 and 5 connectors (Linux, custom applications, DNS) fill the remaining gaps.
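The gap analysis above can be expressed as a simple lookup. This is a minimal sketch: the `COVERAGE` mapping copies the table names listed for each phase in this section, and `visibility_gaps` is a hypothetical helper, not part of any Sentinel SDK.

```python
# Tables that make each attack phase detectable, as listed in this section.
COVERAGE = {
    "Initial Access": {"SigninLogs", "EmailEvents", "CommonSecurityLog"},
    "Execution": {"DeviceProcessEvents", "SecurityEvent", "Syslog"},
    "Persistence": {"AuditLogs", "SecurityEvent", "CloudAppEvents", "Syslog"},
    "Lateral Movement": {"DeviceLogonEvents", "SecurityEvent", "SigninLogs"},
    "Exfiltration": {"CommonSecurityLog", "DeviceNetworkEvents", "CloudAppEvents"},
}


def visibility_gaps(connected: set[str]) -> list[str]:
    """Return the phases for which none of the relevant tables is being ingested."""
    return [phase for phase, tables in COVERAGE.items() if not (tables & connected)]


# Example: only the Entra ID connector is enabled.
print(visibility_gaps({"SigninLogs", "AuditLogs"}))
```

A phase counts as covered here as soon as one relevant table is present. That is deliberately optimistic: a single table rarely covers every technique within a phase.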
Detection value per table
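Not all ingested data contributes equally to detection. This table shows detection density: how many analytics rules and hunting queries typically use each table. Treat the counts as indicative; the actual numbers depend on which Content Hub solutions and custom rules you deploy.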
| Table | Typical Rule Count | Investigation Frequency | Detection Value |
|---|---|---|---|
| SigninLogs | 15-25 rules | Daily | Critical |
| SecurityAlert | 10-20 rules | Daily | Critical |
| DeviceProcessEvents | 10-15 rules | Weekly | High |
| CloudAppEvents | 8-12 rules | Weekly | High |
| AuditLogs | 5-10 rules | Weekly | High |
| EmailEvents | 5-8 rules | Weekly | High |
| CommonSecurityLog | 3-8 rules | Monthly | Medium |
| AzureActivity | 3-5 rules | Monthly | Medium |
| SecurityEvent | 5-10 rules | Weekly | Medium-High |
| Syslog | 2-5 rules | Monthly | Medium |
| DeviceNetworkEvents | 3-5 rules | Weekly | Medium |
The four connector categories
Sentinel data connectors fall into four architectural categories. Understanding the category determines the deployment method, configuration complexity, and troubleshooting approach.
Category 1: Service-to-service (Microsoft first-party). Direct API integration between Microsoft services and the Sentinel workspace. Configuration: one-click enablement in the Sentinel portal (or via Content Hub solution). No agent deployment. No intermediate infrastructure. Examples: Entra ID connector, Azure Activity connector, Microsoft 365 connector, Defender for Cloud connector. These are the simplest connectors — enable and verify.
Category 2: Service-to-service (Defender XDR). The Defender XDR connector is a special case of service-to-service that deserves separate treatment. It ingests both incidents/alerts and Advanced Hunting raw data tables, supports bi-directional incident sync with Sentinel, and is tightly integrated with the unified security operations platform. Configuration is more complex than standard first-party connectors because you select which Advanced Hunting tables to ingest (affecting both data coverage and cost).
Category 3: Agent-based (Azure Monitor Agent). The Azure Monitor Agent (AMA) is deployed on the data source machine (Windows server, Linux server, or a log forwarder VM). AMA collects data according to Data Collection Rules (DCRs) and sends it to the workspace. This is the current architecture for Windows Security Events, Syslog, CEF, and custom performance counters. AMA replaces the legacy Log Analytics Agent (MMA/OMS), which is deprecated.
Category 4: API-based (Logs Ingestion API). For data sources that cannot run an agent and do not have a built-in connector, the Logs Ingestion API allows any application to send data to a custom table in the workspace. The application makes HTTPS POST requests to a Data Collection Endpoint (DCE), which routes the data through a DCR for transformation and into the workspace. This is used for custom application logs, SaaS platforms with webhook integrations, and bespoke data sources.
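To make the Category 4 flow concrete, here is a minimal sketch of how an application might assemble the HTTPS POST for the Logs Ingestion API using only the Python standard library. The DCE hostname, DCR immutable ID, stream name, and record fields are hypothetical placeholders, and `build_ingestion_request` is an illustrative helper rather than an official SDK function. The bearer token is assumed to have been obtained separately from Entra ID for the Azure Monitor scope.

```python
import json
import urllib.request


def build_ingestion_request(dce_uri, dcr_immutable_id, stream_name, records, bearer_token):
    """Build (but do not send) a Logs Ingestion API request.

    The body is a JSON array of records whose fields match the stream
    declared in the DCR.
    """
    url = (
        f"{dce_uri.rstrip('/')}/dataCollectionRules/{dcr_immutable_id}"
        f"/streams/{stream_name}?api-version=2023-01-01"
    )
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
    )


# All values below are placeholders for illustration only.
req = build_ingestion_request(
    dce_uri="https://example-dce.invalid",
    dcr_immutable_id="dcr-00000000000000000000000000000000",
    stream_name="Custom-AppEvents_CL",
    records=[{"TimeGenerated": "2024-01-01T00:00:00Z", "Message": "user login"}],
    bearer_token="<token>",
)
print(req.get_method(), req.full_url)
# To send: urllib.request.urlopen(req), after replacing the placeholders.
```

In production you would more likely use Microsoft's Azure Monitor ingestion client library, which handles authentication and batching. The raw request is shown here only to make the DCE, DCR, and stream relationship visible.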
Content Hub solutions vs standalone connectors
Many connectors are deployed through Content Hub solutions (Module 7.10) rather than configured as standalone connectors. The Content Hub solution bundles the connector with analytics rules, workbooks, and hunting queries that use the connector’s data.
When to use Content Hub: For any data source that has a Content Hub solution, deploy through Content Hub. You get the connector plus the detection content in one installation. The Entra ID solution, for example, deploys the Entra ID connector plus 15+ analytics rule templates that query SigninLogs and AuditLogs.
When to use standalone configuration: For connectors that do not have Content Hub solutions (some legacy third-party connectors), or when you need custom configuration that the Content Hub solution does not support (specific DCR transformations, custom table routing).
The AMA migration: legacy agent deprecation
If you are joining an existing Sentinel deployment, you may encounter the legacy Log Analytics Agent (also called the Microsoft Monitoring Agent or MMA/OMS agent). This agent is deprecated — Microsoft ended support in August 2024. All new deployments should use the Azure Monitor Agent (AMA).
Key differences. AMA uses Data Collection Rules (DCRs) for configuration: you define what to collect, how to transform it, and where to send it. The legacy agent used workspace-level configuration, so every agent reporting to a workspace collected the same data. AMA supports multi-homing (sending data to multiple workspaces) through multiple DCRs. AMA supports ingestion-time transformation (filtering and column removal before data reaches the workspace). AMA can be deployed and kept compliant at scale through Azure Policy.
If your environment still has legacy agents, plan a migration to AMA. The migration path: deploy AMA alongside the legacy agent (dual-agent period), configure DCRs to match the legacy collection, verify data parity, then remove the legacy agent. Microsoft provides migration tooling: the AMA Migration Helper workbook shows which machines and data sources still depend on the legacy agent, and the DCR Config Generator script converts the legacy workspace configuration into equivalent DCRs.
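The "what to collect, how to transform it, where to send it" structure of a DCR can be sketched as a Python dictionary that mirrors the rule's JSON layout. Treat this as an illustration under assumptions: the resource ID is a placeholder, the names `authSyslog` and `sentinelWorkspace` are invented, the transformation (dropping one chatty process) is hypothetical, and `validate_dcr` is a small local consistency check, not an Azure API.

```python
import json

# Placeholder: substitute your own workspace resource ID.
WORKSPACE_ID = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.OperationalInsights/workspaces/<workspace-name>"
)

dcr = {
    "location": "uksouth",
    "properties": {
        # What to collect: auth-related Syslog at Notice level and above.
        "dataSources": {
            "syslog": [{
                "name": "authSyslog",
                "streams": ["Microsoft-Syslog"],
                "facilityNames": ["auth", "authpriv"],
                "logLevels": ["Notice", "Warning", "Error", "Critical", "Alert", "Emergency"],
            }]
        },
        # Where to send it.
        "destinations": {
            "logAnalytics": [{"name": "sentinelWorkspace", "workspaceResourceId": WORKSPACE_ID}]
        },
        # How to route and transform it (hypothetical ingestion-time filter).
        "dataFlows": [{
            "streams": ["Microsoft-Syslog"],
            "destinations": ["sentinelWorkspace"],
            "transformKql": "source | where ProcessName != 'systemd-logind'",
        }],
    },
}


def validate_dcr(rule):
    """Check that every data flow references a declared stream and destination."""
    props = rule["properties"]
    produced = {s for srcs in props["dataSources"].values() for src in srcs for s in src["streams"]}
    dest_names = {d["name"] for dests in props["destinations"].values() for d in dests}
    problems = []
    for flow in props["dataFlows"]:
        problems += [f"unknown stream: {s}" for s in flow["streams"] if s not in produced]
        problems += [f"unknown destination: {d}" for d in flow["destinations"] if d not in dest_names]
    return problems


print(validate_dcr(dcr))
print(json.dumps(dcr, indent=2)[:80])
```

Multi-homing follows from the same structure: add a second entry under `destinations` and reference it from a data flow, or associate a second DCR with the same machine.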
Connector selection decision matrix
For each potential data source, walk through this decision matrix.
Step 1: Does a Microsoft first-party connector exist? Check Content Hub and the Data connectors page. If yes → use it. First-party connectors are usually the simplest and most reliable option.
Step 2: Does the Defender XDR connector cover this data? If the data comes from a Defender product (MDE, MDO, MDI, MDA) → it is ingested through the Defender XDR connector. Do not create a separate connector for the same data.
Step 3: Can the source run AMA? If the source is a Windows or Linux server (Azure VM, Arc-connected, or on-premises) → deploy AMA directly and configure a DCR. This avoids intermediate infrastructure.
Step 4: Does the source output CEF or Syslog? If yes → deploy a log forwarder and configure CEF/Syslog collection. CEF is preferred over plain Syslog because it provides structured fields.
Step 5: Does the source have an API or webhook? If yes → use the Logs Ingestion API. Build an Azure Function, Logic App, or scheduled script to pull or receive data and send to the DCE.
Step 6: None of the above apply. The data source may not be connectable to Sentinel. Evaluate: is the data available in a file export (CSV, JSON)? Can a script read the export and send to the API? Is a third-party integration platform (like Cribl or Fluentd) needed to translate the data format? In rare cases, the data source simply cannot be connected — document it as a visibility gap and mitigate with compensating controls.
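The six steps above can be sketched as a first-match decision function. The `SourceProfile` fields and the returned labels are illustrative names chosen for this example; the order of the checks follows the steps exactly.

```python
from dataclasses import dataclass


@dataclass
class SourceProfile:
    has_first_party_connector: bool = False
    is_defender_product: bool = False
    can_run_ama: bool = False
    emits_cef_or_syslog: bool = False
    has_api_or_webhook: bool = False


def choose_connector(p: SourceProfile) -> str:
    """Walk the decision matrix top to bottom and return the first match."""
    if p.has_first_party_connector:
        return "Microsoft first-party connector"
    if p.is_defender_product:
        return "Defender XDR connector"
    if p.can_run_ama:
        return "AMA + DCR"
    if p.emits_cef_or_syslog:
        return "Log forwarder (CEF/Syslog)"
    if p.has_api_or_webhook:
        return "Logs Ingestion API"
    return "Not directly connectable: document as a visibility gap"


# A firewall appliance that cannot host an agent but emits CEF.
print(choose_connector(SourceProfile(emits_cef_or_syslog=True)))
# A SaaS platform that only offers webhooks.
print(choose_connector(SourceProfile(has_api_or_webhook=True)))
```

Because the function returns on the first match, a Linux server that can run AMA and also emits Syslog resolves to "AMA + DCR", which matches Step 3's goal of avoiding intermediate infrastructure.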
Data sovereignty and connector routing
When connecting data sources in multi-region environments, consider where the data is generated and where it must be stored.
Data residency requirements. Some regulations require that data from specific regions stays in that region. If your UK sign-in data must stay in UK South and your EU sign-in data must stay in West Europe, you need: either separate Entra ID connectors routing to regional workspaces (requires multi-workspace architecture from Module 7.2), or a single connector with a workspace transformation that tags data by region for compliance reporting (simpler but does not address data residency at the storage level).
CEF/Syslog regional routing. Deploy regional log forwarders — one per major office or data centre region. Each forwarder connects to the workspace in its region. This keeps network device data geographically close to the workspace, reduces cross-region network traffic, and ensures data residency compliance for network logs.
Custom log regional considerations. When using the Logs Ingestion API, the Data Collection Endpoint (DCE) must be in the same region as the target workspace. If you have workspaces in multiple regions, create DCEs in each region and route custom data to the appropriate DCE based on the data’s origin.
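Routing custom data to a regional DCE can be as simple as a lookup keyed on the data's origin. This is a minimal sketch: the endpoint URLs are placeholders, and `dce_for` is a hypothetical helper for a sender application.

```python
# Placeholder endpoints: one DCE per workspace region.
DCE_BY_REGION = {
    "uksouth": "https://dce-uksouth.example.invalid",
    "westeurope": "https://dce-westeurope.example.invalid",
}


def dce_for(origin_region: str) -> str:
    """Return the DCE for a record's origin region, refusing unknown regions."""
    try:
        return DCE_BY_REGION[origin_region]
    except KeyError:
        # Fail closed: never fall back to another region's endpoint.
        raise ValueError(f"no DCE configured for region {origin_region!r}") from None


print(dce_for("uksouth"))
```

Raising on an unknown region, rather than defaulting to some "primary" endpoint, is a deliberate choice here: a silent fallback could send data to the wrong region and breach the residency requirement the routing exists to satisfy.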
Third-party connector evaluation criteria
Content Hub includes third-party connectors from security vendors. Not all are equal in quality. Evaluate before deploying.
Connector type. Is it a native API connector (best — direct integration with the vendor’s API), a Syslog/CEF connector (good — standard protocol), or a codeless connector (varies — may have limitations)?
Maintenance status. When was the connector last updated? Connectors not updated in 12+ months may not work with current Sentinel APIs or table schemas. Check the Content Hub solution’s version history.
Data quality. Does the connector populate all expected fields? Or does it dump raw data into a single column requiring manual parsing? Install in a test environment and verify field mapping before production deployment.
Vendor support. Does the vendor actively support the connector? If the connector breaks after a product update, who fixes it — Microsoft, the vendor, or you?
Cost impact. What volume does the connector generate? Some vendor connectors ingest very high volumes (endpoint telemetry from a non-Microsoft EDR) that significantly increase your Sentinel cost. Estimate the monthly volume and cost before enabling.
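A quick volume estimate before enabling a connector might look like the sketch below. The price per GB is a made-up placeholder, not a Microsoft list price; substitute the rate for your region and commitment tier. The daily volumes reuse the firewall example from earlier in this subsection.

```python
def monthly_ingestion(gb_per_day: float, price_per_gb: float, days: int = 30):
    """Return (GB per month, cost per month) for a steady daily volume."""
    gb_per_month = gb_per_day * days
    return gb_per_month, gb_per_month * price_per_gb


ASSUMED_PRICE_PER_GB = 2.0  # placeholder value in your billing currency

for name, daily in [("Firewall accept logs", 20.0), ("Firewall deny logs", 0.5)]:
    gb, cost = monthly_ingestion(daily, ASSUMED_PRICE_PER_GB)
    print(f"{name}: {gb:.0f} GB/month, approx. {cost:.2f} per month")
```

Real volumes are rarely steady, so run the estimate against a few days of measured data from a test deployment rather than a vendor's nominal figure.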
Try it yourself
Navigate to your Sentinel workspace → Data connectors. Review the list of available connectors. How many are currently connected? Which connector categories are represented? If you enabled connectors during Module 0 or Module 7, verify their status shows "Connected" with recent data. If no connectors are enabled, identify the two Priority 1 connectors (Entra ID and Defender XDR) — you will configure them in subsections 8.2 and 8.3.
What you should observe
The Data connectors page shows all available connectors with their status (Connected, Not connected, or Unavailable due to missing licences). In a lab with the M365 developer tenant, you should see Microsoft Entra ID and Microsoft Defender XDR as available. Azure Activity is available if you have an Azure subscription connected to the same tenant. The connector count and status tell you your current visibility level — any gap is a data source you cannot detect threats in.
Knowledge check
1. You have a new Sentinel workspace with no connectors enabled. Which two connectors should you enable first for an M365 environment?
2. Your organisation still uses the legacy Log Analytics Agent (MMA) on 500 Windows servers. What should you do?
3. A SaaS application in your environment can send webhook notifications but has no Sentinel connector. How do you get this data into Sentinel?