8.10 Ingestion Cost Optimisation at the Connector Level
Introduction
Module 7.5 covered cost management at the workspace level — commitment tiers, log tier assignment, and retention policies. This subsection covers cost optimisation at the connector level — reducing the volume of data before it reaches the workspace. Connector-level optimisation is more surgical: instead of moving an entire table to Basic tier, you filter specific events within a table that have low investigation value.
The two layers work together: connector-level optimisation reduces volume (less data enters the workspace), workspace-level optimisation reduces cost per GB (how much you pay for the data that does enter).
The cost reduction hierarchy
Apply cost reduction in this order for maximum impact with minimum security risk.
Layer 1: Do not connect low-value sources. The cheapest data is data you do not ingest. Before connecting a data source, apply the ingestion priority framework from subsection 8.1: does this source enable detection of likely threats? Does it fill a visibility gap? Is the cost-to-value ratio acceptable? If the answer to any question is no, do not connect it.
Layer 2: Select the right collection level. For Windows Security Events, choose “Common” instead of “All Events” (50-70% reduction). For Syslog, collect only security-relevant facilities at appropriate severity levels (subsection 8.6). For Defender XDR, skip high-volume, low-value tables like DeviceNetworkInfo (subsection 8.3).
Layer 3: Apply DCR transformations. Filter individual events and remove unnecessary columns at ingestion time (subsection 8.7). This is the most precise tool — target specific high-volume, low-value event patterns.
Layer 4: Use the XDR tier. Verify which Defender XDR tables qualify for the XDR tier (no additional Sentinel ingestion cost). Structure your ingestion to maximise XDR-tier-eligible data.
Layer 5: Workspace-level optimisation. Commitment tiers for lower per-GB rates. Basic tier for tables that do not need full KQL or analytics rules. These are the Module 7.5 techniques — applied after connector-level optimisation has reduced the volume entering the workspace.
Connector-specific optimisation techniques
Entra ID — filter non-interactive sign-ins. AADNonInteractiveUserSignInLogs can generate 5-10x the volume of interactive sign-ins. Apply a DCR transformation to filter routine token refresh events from known service principals:
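A minimal sketch of such a transformation. The app IDs are placeholders for your own internal service principals, and `ipv4_is_private` support should be confirmed against the current list of KQL features allowed in DCR transformations:

```kusto
// DCR ingestion-time transformation for AADNonInteractiveUserSignInLogs.
// Keeps all failures and all sign-ins from external IPs; drops successful
// token refreshes from a known list of internal client apps.
// The app IDs below are placeholders - substitute your own.
source
| where ResultType != "0"                      // keep every failure
    or not (ipv4_is_private(IPAddress))        // keep external source addresses
    or AppId !in ("11111111-1111-1111-1111-111111111111",
                  "22222222-2222-2222-2222-222222222222")
```

Because the `where` clause is a keep-condition, only rows that are simultaneously successful, internal, and from a listed app are dropped; everything else is ingested unchanged.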
This keeps the security-relevant events (failures, external sources) while filtering the high-volume successful token refreshes from internal service principals.
Windows Security Events — custom XPath instead of collection levels. If “Common” is still too verbose, switch to custom XPath queries that collect only the specific Event IDs your analytics rules reference. Audit your active rules to build the Event ID list:
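A sketch of that audit. There is no built-in table that maps Event IDs to analytics rules, so `RuleEventIDs` here is a hand-maintained placeholder: build it by exporting your scheduled rules and extracting the Event IDs their queries reference:

```kusto
// Per-Event-ID volume vs. rule usage. RuleEventIDs is a placeholder mapping -
// populate it from your exported analytics rules.
let RuleEventIDs = datatable(EventID:int, RuleHits:int) [
    4624, 3,    // illustrative: three rules query logon events
    4625, 2,
    4688, 1
];
SecurityEvent
| where TimeGenerated > ago(30d)
| summarize Total = count() by EventID
| extend DailyAvg = Total / 30
| join kind=leftouter RuleEventIDs on EventID
| extend RuleHits = coalesce(RuleHits, 0)
| project EventID, DailyAvg, RuleHits
| order by RuleHits asc, DailyAvg desc
```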
Event IDs with RuleHits > 0 are queried by your analytics rules — these must be collected. Event IDs with RuleHits = 0 and high DailyAvg are candidates for exclusion.
Syslog/CEF — filter at the source. Configure the Syslog device itself to send only security-relevant messages. A firewall can be configured to send deny logs but not accept logs. An IDS can send alert logs but not informational logs. Filtering at the source reduces network traffic to the forwarder and ingestion volume simultaneously.
Office 365 — evaluate overlap with CloudAppEvents. If you ingest both OfficeActivity and CloudAppEvents, you may be storing the same Exchange and SharePoint events twice. Compare:

```kusto
OfficeActivity
| where TimeGenerated > ago(1d)
| where OfficeWorkload == "Exchange"
| count
```

```kusto
CloudAppEvents
| where TimeGenerated > ago(1d)
| where Application == "Microsoft Exchange Online"
| count
```

If the counts are similar, disable OfficeActivity for Exchange and rely on CloudAppEvents (which has richer fields).
Real-world cost scenarios
These scenarios illustrate the cost impact of connector-level optimisation at different environment sizes.
Scenario 1: 500-user M365 environment, cloud-only.
Data sources: Entra ID (SigninLogs + AuditLogs), Defender XDR (incidents + 8 Advanced Hunting tables), Azure Activity, Office 365.
Before optimisation: ~15 GB/day. Estimated monthly cost: ~$2,350.
Optimisations applied: AADNonInteractiveUserSignInLogs filtered to failures + external IPs only (removes ~4 GB/day). OfficeActivity disabled for Exchange (CloudAppEvents covers it — removes ~1 GB/day). Defender XDR tables: DeviceNetworkInfo and DeviceInfo deselected (removes ~2 GB/day).
After optimisation: ~8 GB/day. Estimated monthly cost: ~$1,250. Savings: ~$1,100/month (47% reduction).
Detection impact: zero. All security-relevant data retained. The removed data was high-volume inventory/operational data and duplicate Exchange events.
Scenario 2: 2,000-user hybrid environment (M365 + 200 Windows servers + 10 firewalls).
Data sources: all Microsoft connectors + Windows Security Events + CEF from firewalls.
Before optimisation: ~80 GB/day. Estimated monthly cost: ~$12,500.
Optimisations applied: Windows Security Events changed from “All Events” to “Common” (removes ~25 GB/day). Firewall configured to send deny logs only (removes ~15 GB/day). AADNonInteractiveUserSignInLogs filtered (removes ~5 GB/day).
After optimisation: ~35 GB/day at pay-as-you-go rates (the volume is now below the 100 GB/day minimum commitment tier, so no tier applies). Estimated monthly cost: ~$5,500. Savings: ~$7,000/month (56% reduction).
Scenario 3: 10,000-user enterprise, full deployment.
Before optimisation: ~300 GB/day. Estimated monthly cost: ~$47,000.
Optimisations applied: DCR transformations on SecurityEvent (removes service account logons: ~40 GB/day). CEF source-level filtering (deny only: ~60 GB/day removed). Defender XDR selective table ingestion (~30 GB/day removed). Workspace transformation on non-interactive sign-ins (~20 GB/day removed). 100 GB/day commitment tier, set at the minimum consistent daily level.
After optimisation: ~150 GB/day on the 100 GB/day commitment tier, with overage billed at pay-as-you-go. Estimated monthly cost: ~$16,500. Savings: ~$30,500/month (65% reduction).
The monthly optimisation review
Build a monthly review into your operational cadence.
Step 1: Run the cost report query. Identify the top 5 tables by volume. For each, calculate the cost and compare to the previous month.
Step 2: Check for new high-volume patterns. Has any table’s volume increased by >25%? If yes, investigate: new data source connected, connector misconfiguration, or legitimate growth?
Step 3: Review DCR transformation effectiveness. For each active transformation, verify the expected volume reduction is still occurring. If a transformation was bypassed (e.g., the DCR was recreated without the transformation), the volume spikes back.
Step 4: Evaluate new optimisation opportunities. Review analytics rule usage: are there tables with zero rule hits that could move to Basic tier or be disconnected? Are there high-volume Event IDs that no rule queries?
Step 5: Update the cost forecast. Project next month’s cost based on current trends. Alert management if the forecast exceeds budget.
Building the cost justification for management
Security leadership needs to understand not just what Sentinel costs, but what value it delivers. The cost-per-incident metric (introduced in Module 7.5) is the most powerful justification tool.
Cost-per-incident calculation:
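A sketch of the calculation as a workspace query. `PricePerGB` is a placeholder for your effective per-GB rate, and the incident classification logic assumes you close incidents with a classification in Sentinel:

```kusto
// Cost per true-positive incident over the last 30 days.
// PricePerGB is a placeholder for your effective per-GB rate.
let PricePerGB = 5.22;
let MonthlyCost = toscalar(
    Usage
    | where TimeGenerated > ago(30d) and IsBillable == true
    | summarize Cost = sum(Quantity) / 1024 * PricePerGB);
let TruePositives = toscalar(
    SecurityIncident
    | where TimeGenerated > ago(30d)
    | summarize arg_max(TimeGenerated, Classification) by IncidentNumber  // latest state per incident
    | where Classification == "TruePositive"
    | count);
print CostPerIncident = round(MonthlyCost / todouble(TruePositives), 2)
```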
If Sentinel costs $5,000/month and the SOC investigates 40 true positive incidents, the cost per incident is $125. Compare this to:
- The average cost of a missed incident: IBM reports $4.88M per data breach in 2024.
- The analyst time saved by automated detection: if each incident takes 4 hours and an analyst costs $75/hour, that is $300 per incident in investigation time alone, before counting the time Sentinel saves the analyst in finding the incident manually.
- The cost of the alternative: a traditional SIEM licence plus hardware at $15,000-$50,000/year with similar capability.
Management reporting template:
Monthly Sentinel Cost Report:
- Total ingestion: X GB/day (Y% change from last month)
- Monthly cost: $Z
- Cost per true positive incident: $W
- Top 5 data sources by volume and cost
- Optimisation actions taken this month (with savings achieved)
- Forecast for next month
- Recommendation: maintain current configuration / implement proposed optimisation
Connector-level vs workspace-level: the complete cost reduction map
Summarise all cost reduction techniques across both levels for quick reference.
| Level | Technique | Impact | Risk |
|---|---|---|---|
| Connector | Do not connect low-value sources | 100% for that source | Visibility gap |
| Connector | Collection level (Common vs All) | 50-70% for SecurityEvent | Low (security events retained) |
| Connector | Source-level filtering (deny only) | 60-80% for CEF | Low (accept logs rarely queried) |
| Connector | Defender XDR table selection | 30-50% for XDR data | Low (inventory tables excluded) |
| Connector | DCR transformation | 10-50% per table | Medium (filtered data is lost) |
| Connector | Workspace transformation | 10-50% per table | Medium (filtered data is lost) |
| Workspace | XDR tier (free Defender data) | 20-40% of total | None |
| Workspace | Commitment tier | 15-30% per-GB rate | None (overage at PAYG) |
| Workspace | Basic tier for low-value tables | 60% for those tables | No analytics rules, no joins |
| Workspace | Retention optimisation | Varies | None (active data unaffected) |
Building the cost optimisation report
Combine connector-level and workspace-level data for a complete cost picture:
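A sketch of such a report query. `PricePerGB` and the recommendation mappings are illustrative placeholders to adapt to your own connectors and rates:

```kusto
// Monthly volume, estimated cost, and an optimisation hint per data type.
// PricePerGB and the recommendations are illustrative placeholders.
let PricePerGB = 5.22;
Usage
| where TimeGenerated > ago(30d) and IsBillable == true
| summarize MonthlyGB = round(sum(Quantity) / 1024, 1) by DataType
| extend EstMonthlyCost = round(MonthlyGB * PricePerGB, 2)
| extend Recommendation = case(
    DataType == "SecurityEvent", "Review collection level or switch to custom XPath",
    DataType == "AADNonInteractiveUserSignInLogs", "DCR transformation: keep failures and external IPs only",
    DataType == "CommonSecurityLog", "Filter at the source (e.g. deny logs only)",
    DataType in ("DeviceNetworkInfo", "DeviceInfo"), "Consider deselecting in the Defender XDR connector",
    "Review analytics rule usage; consider Basic tier or disconnecting")
| order by MonthlyGB desc
```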
This query generates a report showing each data type’s monthly volume, estimated cost, and specific optimisation recommendations. Present this to management alongside the cost-per-incident metric from Module 7.5 to justify ingestion spend and demonstrate optimisation progress.
The diminishing returns curve
Cost optimisation has diminishing returns. The first round of optimisation (correct collection levels, remove obvious low-value data) captures 60-80% of potential savings with minimal effort. The second round (DCR transformations, source-level filtering) captures another 10-15%. Beyond that, each additional percentage of savings requires increasingly complex analysis and carries increasing risk of filtering investigation-relevant data.
Know when to stop. If your Sentinel cost is within 20% of the optimised baseline and all security-relevant data sources are connected, the cost is justified. Spending analyst time on further optimisation beyond this point has negative ROI — the analyst’s time is better spent investigating incidents.
Commitment tier selection guide
Module 7.5 introduced commitment tiers. Here is the practical selection process after your connectors are deployed and volume is baselined.
Step 1: Establish your baseline daily ingestion. Run the Usage query for 30 days. Calculate the minimum, average, and maximum daily ingestion:
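A sketch of the baseline query over the billable Usage data:

```kusto
// 30-day daily ingestion baseline: minimum, average, maximum, and P10.
Usage
| where TimeGenerated > ago(30d) and IsBillable == true
| summarize DailyGB = sum(Quantity) / 1024 by bin(TimeGenerated, 1d)
| summarize MinGB = round(min(DailyGB), 1),
            AvgGB = round(avg(DailyGB), 1),
            MaxGB = round(max(DailyGB), 1),
            P10GB = round(percentile(DailyGB, 10), 1)
```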
Step 2: Select the tier at your P10 (10th percentile) daily volume. This is the level you exceed 90% of days. Example: if your P10 is 85 GB/day, average is 120 GB/day, and max is 180 GB/day → select the 100 GB/day commitment tier. You get the discounted rate for 100 GB every day. On days when you ingest more than 100 GB, the overage is billed at pay-as-you-go rates. On days when you ingest less than 100 GB (which happens less than 10% of days), you pay the 100 GB commitment minimum.
Step 3: Re-evaluate quarterly. As you add new connectors, apply DCR transformations, and the environment grows, your baseline shifts. Re-run the assessment quarterly and adjust the commitment tier if the baseline has moved significantly.
Common mistake: setting the tier at or above your average. If your average is 120 GB/day and you set a 200 GB commitment tier, you are paying for 200 GB on every day that ingestion is below 200 GB, which is most days. This wastes money. Always set the tier at your minimum consistent level, not your average.
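The arithmetic behind that mistake can be sketched in KQL. The rates are placeholders (pay-as-you-go at $5.22/GB, an assumed ~20% commitment discount), and overage pricing follows the table above (billed at pay-as-you-go):

```kusto
// Illustrative daily-cost comparison at 120 GB/day average ingestion.
// Rates are placeholders: $5.22/GB pay-as-you-go, ~20% commitment discount.
let PAYG = 5.22;
let TierRate = PAYG * 0.8;
print Tier100PlusOverage = 100 * TierRate + 20 * PAYG,  // 100 GB at tier rate, ~20 GB overage at PAYG
      Tier200Minimum = 200 * TierRate                   // the full 200 GB minimum, billed every day
```

Under these assumed rates the 100 GB tier plus overage costs substantially less per day than paying the 200 GB minimum.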
Maximising the free tier in lab environments
The 5 GB/day free ingestion allowance (first workspace per subscription) is sufficient for a meaningful lab with careful connector selection.
Free tier budget allocation:
- Entra ID (interactive sign-ins + audit): ~0.5-1.5 GB/day depending on tenant activity.
- Defender XDR (incidents + 3-4 selected tables): ~0.5-2 GB/day depending on endpoints and email volume.
- Azure Activity: ~0.1-0.3 GB/day.
- Total: ~1.1-3.8 GB/day, within the 5 GB free tier.
What to avoid in the free tier: Do not enable AADNonInteractiveUserSignInLogs (can be 3-5x interactive volume alone). Do not enable all Defender XDR Advanced Hunting tables. Do not enable “All Events” for Windows Security Events.
Monitoring free tier usage: Set a daily cap at 5 GB as a safety net (in lab only, never in production). Monitor with:

```kusto
Usage
| where TimeGenerated > ago(1d)
| where IsBillable == true
| summarize IngestedGB = sum(Quantity) / 1024
```

If daily ingestion approaches 4 GB, review which data sources are consuming the most and consider disabling the most verbose.
Automated cost anomaly detection
Create an analytics rule that detects unexpected ingestion spikes — catching misconfigured connectors and unexpected data sources before they generate large bills.
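A sketch of such a detection query, comparing each data type's last-day volume to its trailing 7-day average. The 25% and 1 GB thresholds and `PricePerGB` are placeholders to tune:

```kusto
// Flags any data type whose last-day volume exceeds its trailing 7-day
// average by more than 25%. Thresholds and PricePerGB are placeholders.
let PricePerGB = 5.22;
Usage
| where TimeGenerated > ago(8d) and IsBillable == true
| summarize DailyGB = sum(Quantity) / 1024 by DataType, bin(TimeGenerated, 1d)
| summarize BaselineGB = avgif(DailyGB, TimeGenerated < ago(1d)),
            LastDayGB = sumif(DailyGB, TimeGenerated >= ago(1d)) by DataType
| where LastDayGB > BaselineGB * 1.25 and LastDayGB - BaselineGB > 1
| extend SpikePct = round((LastDayGB / BaselineGB - 1) * 100, 1),
         EstExtraCost = round((LastDayGB - BaselineGB) * PricePerGB, 2)
| order by EstExtraCost desc
```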
Configure this as a daily scheduled rule with low severity. Assign to the Sentinel administrator. The alert includes the specific data type, the spike magnitude, and the estimated extra cost — giving the administrator the information needed to investigate and remediate quickly. Common causes of spikes: a new connector was enabled without volume estimation, a DCR transformation was accidentally removed, a device started sending verbose logs after a configuration change, or a security incident is generating a burst of events (which is legitimate and should not be suppressed).
Try it yourself
Run the cost optimisation report query against your workspace. Identify the top 3 data types by monthly volume. For each, determine whether a connector-level optimisation applies: can the collection level be reduced? Can a DCR transformation filter low-value events? Can source-level filtering reduce volume? Document the potential savings for each optimisation. This is the analysis that justifies connector-level changes to management.
What you should observe
In a lab, volumes are small and optimisation has minimal cost impact. The exercise builds the analytical skill: identifying high-volume tables, tracing them to specific connectors, and proposing connector-level reduction techniques. In production, the top 2-3 tables typically account for 60-80% of total cost — and connector-level optimisation on those tables delivers the majority of savings.
Knowledge check
Check your understanding
1. Your firewall sends both accept and deny logs via CEF, generating 30 GB/day of CommonSecurityLog data. Analytics rules only query deny events. How do you reduce cost?