8.1 Ingestion Strategy and Connector Architecture

14-18 hours · Module 8

Introduction

Connecting every data source on day one is a mistake. You end up with 50 GB/day of firewall noise before you have a single analytics rule to use it. Ingestion strategy means connecting data sources in the order that produces the most detection value per pound spent — and the connector architecture determines how each source delivers its data to the workspace.

This subsection teaches you the decision framework for prioritising data sources, the four categories of connectors available in Sentinel, and the end-to-end architecture that moves data from source to queryable table.

The ingestion priority framework

Not all data sources contribute equally to detection and investigation. Prioritise by asking three questions for each potential data source.

Question 1: Does this data source enable detection of the threats most likely to target my environment? For M365-heavy environments, the answer is almost always: identity data (sign-in logs) and email data (Defender for Office 365) first, because credential phishing and BEC are the dominant threat vectors. Endpoint data (Defender for Endpoint) second, because post-compromise activity generates endpoint telemetry. Cloud infrastructure data (Azure Activity) third, because infrastructure attacks follow identity compromise.

Question 2: Does this data source fill a visibility gap that existing sources cannot cover? If you already have SigninLogs and DeviceProcessEvents, adding DeviceNetworkEvents fills the “what did the compromised device communicate with” gap. Adding Syslog from your firewall fills the “what crossed the network perimeter” gap. Adding custom application logs fills the “what happened inside the business application” gap. Each new source should close a specific visibility gap.

Question 3: What is the cost-to-value ratio? Some data sources generate high volume with low detection value. Verbose firewall accept logs (every permitted connection) may generate 20 GB/day but rarely contribute to incident investigation. The same firewall’s deny logs (blocked connections) generate 0.5 GB/day and are far more useful for detecting scanning and lateral movement attempts. Connect the high-value, lower-volume data first.
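The cost-to-value comparison in Question 3 can be made concrete with a little arithmetic. The sketch below is illustrative only. The price per GB is a placeholder assumption, not a quoted Sentinel rate, and the rule counts are hypothetical figures chosen for the example. Only the 20 GB/day and 0.5 GB/day volumes come from the text above.

```python
# Hypothetical price in your billing currency; substitute the per-GB rate
# from your own pricing tier. This is NOT a quoted Sentinel price.
ASSUMED_PRICE_PER_GB = 4.00


def monthly_cost(gb_per_day, price_per_gb=ASSUMED_PRICE_PER_GB, days=30):
    """Estimated monthly ingestion cost for one data source."""
    return gb_per_day * days * price_per_gb


def cost_per_rule(gb_per_day, rules_using_source):
    """Monthly cost divided by the number of analytics rules consuming the data.

    Returns infinity when no rule uses the source (cost with no detection value).
    """
    if rules_using_source == 0:
        return float("inf")
    return monthly_cost(gb_per_day) / rules_using_source


# Volumes are from the text; rule counts are assumptions for illustration.
sources = {
    "firewall_accept_logs": {"gb_per_day": 20.0, "rules": 1},
    "firewall_deny_logs": {"gb_per_day": 0.5, "rules": 4},
}

# Cheapest cost per rule first: these are the sources to connect first.
ranked = sorted(
    sources,
    key=lambda name: cost_per_rule(sources[name]["gb_per_day"], sources[name]["rules"]),
)

for name in ranked:
    s = sources[name]
    print(name, monthly_cost(s["gb_per_day"]), cost_per_rule(s["gb_per_day"], s["rules"]))
```

Under these assumptions the deny logs rank first. The absolute numbers matter less than the ordering, which is what drives the connection sequence.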

Recommended Ingestion Priority for M365 Environments

Priority	Data Source	Connector	Why First
1	Entra ID	Microsoft Entra ID	Every investigation starts or passes through sign-in logs
1	Defender XDR	Microsoft Defender XDR	All Defender product alerts and Advanced Hunting tables
2	Azure Activity	Azure Activity	Cloud infrastructure changes — ARM operations audit trail
2	Microsoft 365	Office 365	Exchange, SharePoint, Teams audit events
3	Windows Security Events	AMA + DCR	On-prem and hybrid: logon events, process creation, privilege use
3	Firewall / IDS	CEF / Syslog	Network perimeter visibility — blocked connections, IDS alerts
4	Linux servers	Syslog	Auth logs, sudo events, service activity
4	Custom applications	Logs Ingestion API	Business-specific data not covered by standard connectors
5	DNS logs	AMA + DCR	C2 domain detection, DNS tunnelling — high volume, specialist use

Priority 1 connectors should be enabled on day one. They provide the data needed for 80% of security investigations at minimal cost (Defender XDR data may be included in your licence through the XDR tier). Priority 2-3 connectors should be enabled within the first week. Priority 4-5 connectors are enabled as the analytics rule library grows to consume the data.

Visibility gap analysis: what can you NOT see?

Before connecting data sources, map your current visibility gaps. For each tactic in the MITRE ATT&CK framework, ask: “If this attack happened right now, at which stage would I detect it? Which stages are invisible?”

Initial Access — detectable with: SigninLogs (credential compromise), EmailEvents (phishing delivery), CommonSecurityLog (firewall IDS alerts). Without email data, phishing delivery is invisible. Without sign-in data, credential compromise is invisible.

Execution — detectable with: DeviceProcessEvents (process creation), SecurityEvent Event ID 4688 (Windows process creation), Syslog (Linux command execution). Without endpoint data, post-compromise execution is invisible.

Persistence — detectable with: AuditLogs (role changes, app registrations), SecurityEvent Event IDs 4720/4728/4697 (account creation, group changes, service installation), CloudAppEvents (inbox rules), Syslog (cron jobs, SSH keys). Without audit data, persistence mechanisms are invisible.

Lateral Movement — detectable with: DeviceLogonEvents (device-to-device logons), SecurityEvent Event ID 4624 LogonType 3/10 (network/RDP logons), SigninLogs (cross-application sign-ins). Without logon event data, lateral movement is invisible.

Exfiltration — detectable with: CommonSecurityLog (large outbound transfers), DeviceNetworkEvents (unusual outbound connections), CloudAppEvents (file downloads, mail forwarding). Without network and application data, data theft is invisible.

The priority framework maps to this gap analysis. Priority 1 connectors (identity + XDR) provide Initial Access + Execution + Persistence visibility. Priority 2-3 connectors (Azure Activity + Windows/network) add Lateral Movement + Exfiltration visibility. Priority 4-5 connectors (Linux, custom, DNS) fill the remaining gaps.
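The gap analysis above can be sketched as a simple lookup. The tactic-to-table mapping is copied from the preceding paragraphs; the function names are hypothetical, and the rule "a tactic is visible if at least one listed table is ingested" is a simplifying assumption, since one table rarely covers a tactic completely.

```python
# Tactic-to-table mapping taken from the gap analysis in this section.
TACTIC_TABLES = {
    "Initial Access": {"SigninLogs", "EmailEvents", "CommonSecurityLog"},
    "Execution": {"DeviceProcessEvents", "SecurityEvent", "Syslog"},
    "Persistence": {"AuditLogs", "SecurityEvent", "CloudAppEvents", "Syslog"},
    "Lateral Movement": {"DeviceLogonEvents", "SecurityEvent", "SigninLogs"},
    "Exfiltration": {"CommonSecurityLog", "DeviceNetworkEvents", "CloudAppEvents"},
}


def visibility_report(connected_tables):
    """Return, per tactic, which listed tables are covered and which are missing."""
    connected = set(connected_tables)
    return {
        tactic: {
            "covered": sorted(tables & connected),
            "missing": sorted(tables - connected),
        }
        for tactic, tables in TACTIC_TABLES.items()
    }


def invisible_tactics(connected_tables):
    """Tactics for which none of the listed tables is being ingested."""
    report = visibility_report(connected_tables)
    return [tactic for tactic, r in report.items() if not r["covered"]]


# Example: only the Entra ID connector is enabled.
print(invisible_tactics(["SigninLogs", "AuditLogs"]))
```

With only SigninLogs and AuditLogs connected, Execution and Exfiltration come back as invisible, which is the argument for enabling the Defender XDR connector alongside Entra ID.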

Detection value per table

Not all ingested data contributes equally to detection. This table shows the detection density — how many analytics rules and hunting queries typically use each table.

Detection Value per Sentinel Table

Table	Typical Rule Count	Investigation Frequency	Detection Value
SigninLogs	15-25 rules	Daily	Critical
SecurityAlert	10-20 rules	Daily	Critical
DeviceProcessEvents	10-15 rules	Weekly	High
CloudAppEvents	8-12 rules	Weekly	High
AuditLogs	5-10 rules	Weekly	High
EmailEvents	5-8 rules	Weekly	High
CommonSecurityLog	3-8 rules	Monthly	Medium
AzureActivity	3-5 rules	Monthly	Medium
SecurityEvent	5-10 rules	Weekly	Medium-High
Syslog	2-5 rules	Monthly	Medium
DeviceNetworkEvents	3-5 rules	Weekly	Medium

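Detection value drives ingestion priority. Tables with "Critical" and "High" detection value should be on Analytics tier with Priority 1-2 connectors. Tables with "Medium" value can tolerate lower priority deployment, or Basic tier where the table supports it and no analytics rule depends on it (Basic-tier tables have restricted query capabilities and cannot back scheduled analytics rules).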

The four connector categories

Sentinel data connectors fall into four architectural categories. Understanding the category determines the deployment method, configuration complexity, and troubleshooting approach.

Category 1: Service-to-service (Microsoft first-party). Direct API integration between Microsoft services and the Sentinel workspace. Configuration: one-click enablement in the Sentinel portal (or via Content Hub solution). No agent deployment. No intermediate infrastructure. Examples: Entra ID connector, Azure Activity connector, Microsoft 365 connector, Defender for Cloud connector. These are the simplest connectors — enable and verify.

Category 2: Service-to-service (Defender XDR). The Defender XDR connector is a special case of service-to-service that deserves separate treatment. It ingests both incidents/alerts and Advanced Hunting raw data tables, supports bi-directional incident sync with Sentinel, and is tightly integrated with the unified security operations platform. Configuration is more complex than standard first-party connectors because you select which Advanced Hunting tables to ingest (affecting both data coverage and cost).

Category 3: Agent-based (Azure Monitor Agent). The Azure Monitor Agent (AMA) is deployed on the data source machine (Windows server, Linux server, or a log forwarder VM). AMA collects data according to Data Collection Rules (DCRs) and sends it to the workspace. This is the current architecture for Windows Security Events, Syslog, CEF, and custom performance counters. AMA replaces the legacy Log Analytics Agent (MMA/OMS), which is deprecated.

Category 4: API-based (Logs Ingestion API). For data sources that cannot run an agent and do not have a built-in connector, the Logs Ingestion API allows any application to send data to a custom table in the workspace. The application makes HTTPS POST requests to a Data Collection Endpoint (DCE), which routes the data through a DCR for transformation and into the workspace. This is used for custom application logs, SaaS platforms with webhook integrations, and bespoke data sources.
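To make the Category 4 flow concrete, the sketch below builds (but does not send) the HTTPS POST that an application would make to the Logs Ingestion API. It uses only the Python standard library. The DCE URL, DCR immutable ID, stream name, and record fields are placeholder assumptions, and acquiring the Microsoft Entra ID bearer token is left out. In practice the official `azure-monitor-ingestion` client library would usually be preferable to hand-built requests.

```python
import json
import urllib.request

API_VERSION = "2023-01-01"


def build_ingestion_request(dce_url, dcr_immutable_id, stream_name, records, bearer_token):
    """Build a Logs Ingestion API POST request without sending it."""
    url = (
        f"{dce_url.rstrip('/')}/dataCollectionRules/{dcr_immutable_id}"
        f"/streams/{stream_name}?api-version={API_VERSION}"
    )
    # The request body is a JSON array of records matching the DCR stream schema.
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values for illustration only; none of these are real resources.
request = build_ingestion_request(
    dce_url="https://example-dce.uksouth-1.ingest.monitor.azure.com",
    dcr_immutable_id="dcr-00000000000000000000000000000000",
    stream_name="Custom-AppEvents_CL",
    records=[{"TimeGenerated": "2024-01-01T00:00:00Z", "Message": "user login"}],
    bearer_token="TOKEN_FROM_ENTRA_ID",
)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it; omitted in this sketch.
```

The request names the DCR and stream rather than the table: the DCR decides which custom table the records land in and which transformation is applied on the way.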

Figure 8.1: The four connector categories. All paths converge on the Log Analytics workspace where Sentinel processes the data. The choice of category is determined by the data source type — Microsoft services use service-to-service, on-premises devices use AMA, and custom applications use the Logs Ingestion API.

Content Hub solutions vs standalone connectors

Many connectors are deployed through Content Hub solutions (Module 7.10) rather than configured as standalone connectors. The Content Hub solution bundles the connector with analytics rules, workbooks, and hunting queries that use the connector’s data.

When to use Content Hub: For any data source that has a Content Hub solution, deploy through Content Hub. You get the connector plus the detection content in one installation. The Entra ID solution, for example, deploys the Entra ID connector plus 15+ analytics rule templates that query SigninLogs and AuditLogs.

When to use standalone configuration: For connectors that do not have Content Hub solutions (some legacy third-party connectors), or when you need custom configuration that the Content Hub solution does not support (specific DCR transformations, custom table routing).

The AMA migration: legacy agent deprecation

If you are joining an existing Sentinel deployment, you may encounter the legacy Log Analytics Agent (also called the Microsoft Monitoring Agent or MMA/OMS agent). This agent is deprecated — Microsoft ended support in August 2024. All new deployments should use the Azure Monitor Agent (AMA).

Key differences: AMA uses Data Collection Rules (DCRs) for configuration — you define what to collect, how to transform it, and where to send it. The legacy agent used workspace-level configuration — all agents collected the same data. AMA supports multi-homing (sending data to multiple workspaces) through multiple DCRs. AMA supports ingestion-time transformation (filtering and column removal before data reaches the workspace). AMA is managed through Azure Policy for automated deployment at scale.

If your environment still has legacy agents, plan a migration to AMA. The migration path: deploy AMA alongside the legacy agent (dual-agent period), configure DCRs to match the legacy collection, verify data parity, then remove the legacy agent. Microsoft provides a migration tool (AMA Migration Helper) that analyses your current legacy agent configuration and generates equivalent DCRs.
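The "verify data parity" step can be sketched as a count comparison. This assumes you have already gathered per-Event-ID counts for the same servers and the same time window from each agent (for example by querying SecurityEvent during the dual-agent period). The function name, the sample counts, and the 5% tolerance are all illustrative assumptions.

```python
def parity_gaps(legacy_counts, ama_counts, tolerance=0.05):
    """Return event IDs whose AMA count deviates from the legacy count
    by more than `tolerance` (expressed as a fraction of the legacy count)."""
    gaps = {}
    for event_id, legacy in legacy_counts.items():
        if legacy == 0:
            continue  # no baseline to compare against
        ama = ama_counts.get(event_id, 0)
        deviation = abs(ama - legacy) / legacy
        if deviation > tolerance:
            gaps[event_id] = round(deviation, 3)
    return gaps


# Hypothetical counts for one pilot server over the same 24-hour window.
legacy = {4624: 10_000, 4688: 50_000, 4720: 12}
ama = {4624: 9_900, 4688: 41_000, 4720: 12}

print(parity_gaps(legacy, ama))
```

In this example only Event ID 4688 is flagged, which would point to a DCR filter that is narrower than the legacy collection. Remove the legacy agent only once the result is empty for the event IDs your rules depend on.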

Connector selection decision matrix

For each potential data source, walk through this decision matrix.

Step 1: Does a Microsoft first-party connector exist? Check Content Hub and the Data connectors page. If yes → use it. First-party connectors are always the simplest and most reliable option.

Step 2: Does the Defender XDR connector cover this data? If the data comes from a Defender product (MDE, MDO, MDI, MDA) → it is ingested through the Defender XDR connector. Do not create a separate connector for the same data.

Step 3: Can the source run AMA? If the source is a Windows or Linux server → deploy AMA directly and configure a DCR. Azure VMs can run AMA natively; on-premises and other-cloud servers must first be onboarded through Azure Arc. This avoids intermediate infrastructure.

Step 4: Does the source output CEF or Syslog? If yes → deploy a log forwarder and configure CEF/Syslog collection. CEF is preferred over plain Syslog because it provides structured fields.

Step 5: Does the source have an API or webhook? If yes → use the Logs Ingestion API. Build an Azure Function, Logic App, or scheduled script to pull or receive data and send to the DCE.

Step 6: None of the above apply. The data source may not be connectable to Sentinel. Evaluate: is the data available in a file export (CSV, JSON)? Can a script read the export and send to the API? Is a third-party integration platform (like Cribl or Fluentd) needed to translate the data format? In rare cases, the data source simply cannot be connected — document it as a visibility gap and mitigate with compensating controls.
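The six steps above are an ordered fall-through, which can be sketched as a function that returns the first matching option. The `DataSource` fields and the returned labels are hypothetical names for illustration, not Sentinel identifiers.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    """Answers to the decision-matrix questions for one data source."""
    name: str
    has_first_party_connector: bool = False
    is_defender_product: bool = False
    can_run_ama: bool = False
    outputs_cef_or_syslog: bool = False
    has_api_or_webhook: bool = False


def select_connector(source):
    """Walk Steps 1-6 in order and return the first option that applies."""
    if source.has_first_party_connector:
        return "Microsoft first-party connector"
    if source.is_defender_product:
        return "Defender XDR connector"
    if source.can_run_ama:
        return "AMA + DCR"
    if source.outputs_cef_or_syslog:
        return "Log forwarder (CEF/Syslog)"
    if source.has_api_or_webhook:
        return "Logs Ingestion API"
    return "Evaluate file export or integration platform; otherwise document as visibility gap"


firewall = DataSource("perimeter firewall", outputs_cef_or_syslog=True)
saas_app = DataSource("SaaS app with webhooks", has_api_or_webhook=True)
linux_box = DataSource("Linux server", can_run_ama=True, outputs_cef_or_syslog=True)

for src in (firewall, saas_app, linux_box):
    print(src.name, "->", select_connector(src))
```

Order matters: the Linux server can emit Syslog, but because it can run AMA directly, Step 3 wins and no log forwarder is needed.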

Data sovereignty and connector routing

When connecting data sources in multi-region environments, consider where the data is generated and where it must be stored.

Data residency requirements. Some regulations require that data from specific regions stays in that region. If your UK sign-in data must stay in UK South and your EU sign-in data must stay in West Europe, you have two options. The first is separate Entra ID connectors routing to regional workspaces, which requires the multi-workspace architecture from Module 7.2. The second is a single connector with a workspace transformation that tags data by region for compliance reporting; this is simpler but does not address data residency at the storage level.

CEF/Syslog regional routing. Deploy regional log forwarders — one per major office or data centre region. Each forwarder connects to the workspace in its region. This keeps network device data geographically close to the workspace, reduces cross-region network traffic, and ensures data residency compliance for network logs.

Custom log regional considerations. When using the Logs Ingestion API, the Data Collection Endpoint (DCE) must be in the same region as the target workspace. If you have workspaces in multiple regions, create DCEs in each region and route custom data to the appropriate DCE based on the data’s origin.
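Routing custom data to a regional DCE can be sketched as a lookup keyed on the record's origin. The endpoint URLs and the `OriginRegion` field are hypothetical; your records may carry the origin in a different field or derive it from the sending host.

```python
# Hypothetical DCE endpoints, one per workspace region.
REGIONAL_DCES = {
    "uksouth": "https://example-dce-uk.uksouth-1.ingest.monitor.azure.com",
    "westeurope": "https://example-dce-eu.westeurope-1.ingest.monitor.azure.com",
}


def dce_for_record(record, origin_field="OriginRegion"):
    """Return the DCE URL for the region the record originated in.

    Raises ValueError for unknown or missing regions instead of falling
    back to a default, so data is never silently sent to the wrong region.
    """
    region = record.get(origin_field)
    if region not in REGIONAL_DCES:
        raise ValueError(f"No DCE configured for origin region: {region!r}")
    return REGIONAL_DCES[region]


print(dce_for_record({"OriginRegion": "uksouth", "Message": "order created"}))
```

Failing loudly on an unmapped region is a deliberate choice in this sketch: with residency requirements, a rejected record is easier to fix than one stored in the wrong geography.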

Third-party connector evaluation criteria

Content Hub includes third-party connectors from security vendors. Not all are equal in quality. Evaluate before deploying.

Connector type. Is it a native API connector (best — direct integration with the vendor’s API), a Syslog/CEF connector (good — standard protocol), or a codeless connector (varies — may have limitations)?

Maintenance status. When was the connector last updated? Connectors not updated in 12+ months may not work with current Sentinel APIs or table schemas. Check the Content Hub solution’s version history.

Data quality. Does the connector populate all expected fields? Or does it dump raw data into a single column requiring manual parsing? Install in a test environment and verify field mapping before production deployment.

Vendor support. Does the vendor actively support the connector? If the connector breaks after a product update, who fixes it — Microsoft, the vendor, or you?

Cost impact. What volume does the connector generate? Some vendor connectors ingest very high volumes (endpoint telemetry from a non-Microsoft EDR) that significantly increase your Sentinel cost. Estimate the monthly volume and cost before enabling.

Try it yourself

Navigate to your Sentinel workspace → Data connectors. Review the list of available connectors. How many are currently connected? Which connector categories are represented? If you enabled connectors during Module 0 or Module 7, verify their status shows "Connected" with recent data. If no connectors are enabled, identify the two Priority 1 connectors (Entra ID and Defender XDR) — you will configure them in subsections 8.2 and 8.3.

What you should observe

The Data connectors page shows all available connectors with their status (Connected, Not connected, or Unavailable due to missing licences). In a lab with the M365 developer tenant, you should see Microsoft Entra ID and Microsoft Defender XDR as available. Azure Activity is available if you have an Azure subscription connected to the same tenant. The connector count and status tell you your current visibility level — any gap is a data source you cannot detect threats in.

Knowledge check

Check your understanding

1. You have a new Sentinel workspace with no connectors enabled. Which two connectors should you enable first for an M365 environment?

Microsoft Entra ID and Microsoft Defender XDR. Entra ID provides sign-in and audit logs — the foundation of every identity investigation. Defender XDR provides alerts, incidents, and Advanced Hunting tables from all Defender products (Endpoint, Office 365, Identity, Cloud Apps). Together, these two connectors provide the data needed for 80% of security investigations in an M365 environment.

Syslog and CEF — network data is most important

Windows Security Events and DNS logs

Custom application logs and threat intelligence

Priority 1: identity + XDR. These two connectors provide the broadest detection and investigation coverage for the lowest deployment effort. Network and host data (Priority 3) are important but secondary to the identity and email data that covers the most common attack vectors.

2. Your organisation still uses the legacy Log Analytics Agent (MMA) on 500 Windows servers. What should you do?

Plan a migration to Azure Monitor Agent (AMA). Deploy AMA alongside the legacy agent on a pilot group, configure equivalent Data Collection Rules, verify data parity, then roll out AMA to all 500 servers and remove the legacy agent. The legacy agent is deprecated (support ended August 2024) and does not support DCR-based configuration, ingestion-time transformation, or Azure Policy-based deployment.

Keep the legacy agent — it still works

Remove the legacy agent immediately and deploy AMA

Wait for Microsoft to auto-migrate

Phased migration with dual-agent overlap is the correct approach. Keeping the legacy agent indefinitely means no DCR support and no future updates. Removing it immediately without AMA in place creates a data gap. Microsoft does not auto-migrate — it is the customer's responsibility.

3. A SaaS application in your environment can send webhook notifications but has no Sentinel connector. How do you get this data into Sentinel?

Use the Logs Ingestion API (Category 4). Create a custom table in the workspace to receive the data. Configure a Data Collection Endpoint (DCE) and a Data Collection Rule (DCR) that defines the table schema and any transformations. Configure the SaaS application's webhook to POST events to the DCE endpoint; if the webhook cannot attach the Microsoft Entra ID bearer token the API requires, point it at an Azure Function or Logic App that authenticates and forwards the events. The DCR processes the data and routes it to the custom table where it is queryable with KQL and available for analytics rules.

Install AMA on the SaaS platform

Use the Syslog connector

This data cannot be ingested into Sentinel

The Logs Ingestion API is designed for exactly this scenario — data sources that cannot run an agent and have no built-in connector. Any application that can make HTTPS POST requests can send data to Sentinel through the API.
