6.1 Building an Ingestion Strategy

75 minutes · Module 6

By the end of this subsection, you will be able to prioritize data sources by detection value, estimate ingestion volume and monthly cost, and build a phased connector deployment plan.

Connecting every data source on day one is a mistake. You end up with 50 GB/day of firewall noise before you have a single analytics rule to use it. Ingestion strategy means connecting data sources in the order that produces the most detection value per dollar spent.

The ingestion priority matrix

Prioritize connectors by answering two questions for each data source: (1) How many detection rules will this data enable? (2) How critical is this data during an active investigation?

| Priority | Data source | Detection value | Investigation value | Typical volume (500 users) | Cost |
|---|---|---|---|---|---|
| 1 - Critical | M365 Defender (Module 5.6) | Highest: endpoint, email, identity, cloud app | Core investigation data for every scenario | 5-15 GB/day | Billable |
| 1 - Critical | Entra ID sign-in + audit (Module 5.7) | Token replay, brute force, CA analysis | Every compromise investigation starts here | 1-3 GB/day | Billable |
| 2 - High | Azure Activity | Privilege escalation, resource tampering | Cloud infrastructure investigation | 0.1-0.5 GB/day | Free |
| 2 - High | Office 365 audit logs | MailItemsAccessed, file sharing anomalies | BEC, insider threat, data exfiltration | 0.5-2 GB/day | Free |
| 3 - Medium | Firewall (Palo Alto, Fortinet) | C2 detection, lateral movement, data exfil | Network-layer evidence, perimeter visibility | 5-50 GB/day (raw) | Billable (DCR essential) |
| 3 - Medium | DNS logs | DNS tunneling, DGA detection, C2 beaconing | Malware investigation, IOC correlation | 2-10 GB/day | Billable |
| 4 - Lower | Windows Event Logs (servers) | Lateral movement, privilege escalation, persistence | Server-side forensics | 1-5 GB/day per server | Billable (DCR essential) |
| 4 - Lower | Linux Syslog (servers) | SSH brute force, cron manipulation | Server/container investigation | 0.5-3 GB/day per server | Billable |
| 5 - Specialized | Third-party SaaS (Okta, AWS, Salesforce) | Cross-platform identity, cloud misconfig | Multi-cloud investigation | Variable | Billable |

Phased deployment timeline

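| Phase | Timeline | Connectors | Cumulative daily volume | Monthly cost |
|---|---|---|---|---|
| 1 | Week 1 | M365 Defender + Entra ID (done in Module 5) | ~12 GB | ~$1,987 |
| 2 | Week 2 | Azure Activity (free) + Office 365 audit (free) | ~14 GB (+2 free) | ~$1,987 |
| 3 | Weeks 3-4 | Firewall via Syslog/CEF with DCR filtering | ~25 GB (+11 after DCR) | ~$3,809 |
| 4 | Month 2 | Windows Event Logs via AMA with DCR | ~35 GB (+10) | ~$5,465 |
| 5 | Month 3+ | DNS, Linux, SaaS as detection rules mature | ~45 GB (+10) | ~$7,121 |

Costs above use $5.52/GB combined (pay-as-you-go) over a 30-day month, applied to billable volume only. The ~2 GB/day of free data added in Phase 2 is included in the cumulative volume but excluded from the cost.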

These are planning estimates. Actual costs depend on activity levels, DCR filter aggressiveness, and commitment tier selection. At Phase 4 (~35 GB/day), you are still below the 100 GB commitment threshold — stay on pay-as-you-go.
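The monthly figures come from a single multiplication, so a short Python sketch makes them easy to rerun when volumes change. The rate and the 100 GB/day threshold are taken from this subsection; the function names and the flat 30-day month are assumptions made for illustration.

```python
# Planning sketch: monthly pay-as-you-go cost from billable daily volume.
RATE_PER_GB = 5.52             # combined $/GB rate quoted in this subsection
COMMITMENT_THRESHOLD_GB = 100  # daily volume named in the text as the commitment threshold
DAYS_PER_MONTH = 30            # assumption: flat 30-day month


def monthly_cost(billable_gb_per_day: float) -> float:
    """Return the estimated monthly cost in dollars for billable data only."""
    return round(billable_gb_per_day * RATE_PER_GB * DAYS_PER_MONTH, 2)


def stay_on_payg(total_gb_per_day: float) -> bool:
    """True while daily volume is below the commitment threshold."""
    return total_gb_per_day < COMMITMENT_THRESHOLD_GB


phase1 = monthly_cost(12)  # Phase 1: ~12 GB/day, all billable
print(f"Phase 1: ${phase1:,.2f}/month")
print("Phase 4 stays on pay-as-you-go:", stay_on_payg(35))
```

Pass only billable volume to `monthly_cost`; free sources such as Azure Activity add to daily volume but not to the bill.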

The detection-first principle

Before connecting any data source, answer: “What specific detections will I build with this data?”

If the answer is vague (“we might need it someday”), defer. If it is specific (“I will build a detection rule for DNS tunneling using the DnsEvents table”), connect it. Data without detections is cost with no security value.

Decision: should you connect this data source now?

Your security team asks you to connect the company's Cisco Meraki SD-WAN to Sentinel. It generates approximately 8 GB/day of flow data.
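Do you have specific analytics rules planned for this data? If not, defer the connection: 8 GB/day at $5.52/GB is roughly $1,325/month of cost with no detection value. If yes, name the rules first, then connect.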

Estimating volume before connecting

Never connect a data source without estimating its daily volume. Three methods:

Method 1: Vendor documentation. Microsoft and third-party vendors publish typical volumes per connector.

Method 2: Test ingestion. Connect for 24 hours, then measure:

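```kql
// Billable ingestion per table over the last 24 hours
Usage
| where TimeGenerated > ago(24h)
| where IsBillable == true
// Quantity is reported in MB; divide by 1024 to get GB
| summarize DailyGB = round(sum(Quantity) / 1024, 2) by DataType
| sort by DailyGB desc
```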
Expected Output

| DataType | DailyGB |
|---|---|
| CommonSecurityLog | 22.7 |
| DeviceProcessEvents | 3.4 |
| AADNonInteractiveUserSignInLogs | 1.9 |
What to look for: CommonSecurityLog at 22.7 GB is your newly connected firewall. Is that within budget? If not, apply a DCR filter (subsection 6.4) before the next billing cycle.
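Once the 24-hour measurement is in, one way to answer "is that within budget?" is to compare each table against the estimate you made before connecting. The sketch below is illustrative only: the measured values are the ones shown in the example output, while the estimates, the 25% tolerance, and the function name are hypothetical.

```python
RATE_PER_GB = 5.52   # pay-as-you-go rate from this subsection
DAYS_PER_MONTH = 30  # assumption: flat 30-day month
TOLERANCE = 1.25     # hypothetical: flag tables more than 25% over estimate

# Measured values come from the example query output.
measured_gb = {
    "CommonSecurityLog": 22.7,
    "DeviceProcessEvents": 3.4,
    "AADNonInteractiveUserSignInLogs": 1.9,
}
# Hypothetical pre-connection estimates, for illustration only.
estimated_gb = {
    "CommonSecurityLog": 11.0,
    "DeviceProcessEvents": 4.0,
    "AADNonInteractiveUserSignInLogs": 2.0,
}


def over_estimate(measured: dict, estimated: dict) -> dict:
    """Return the unplanned monthly cost for each table that exceeds its estimate."""
    flagged = {}
    for table, gb in measured.items():
        planned = estimated.get(table, 0.0)
        if gb > planned * TOLERANCE:
            flagged[table] = round((gb - planned) * RATE_PER_GB * DAYS_PER_MONTH, 2)
    return flagged


print(over_estimate(measured_gb, estimated_gb))
```

With these inputs only CommonSecurityLog is flagged, which is the table that needs a DCR filter first.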

Method 3: Source-side calculation. Check the device’s log generation rate (events per second). Formula: (EPS x avg event size in bytes x 86,400) / 1,073,741,824 = daily GB.
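The source-side formula translates directly into a small Python function. This is a minimal sketch: the function name is an assumption, and the 500 EPS and 600-byte figures are a hypothetical device, not vendor data.

```python
BYTES_PER_GB = 1_073_741_824  # 1024**3, the divisor used in the formula above
SECONDS_PER_DAY = 86_400


def daily_gb(events_per_second: float, avg_event_bytes: float) -> float:
    """Estimate daily log volume in GB from a device's event rate."""
    return events_per_second * avg_event_bytes * SECONDS_PER_DAY / BYTES_PER_GB


# Hypothetical device: 500 EPS at an average of 600 bytes per event.
estimate = daily_gb(500, 600)
print(f"Estimated volume: {estimate:.2f} GB/day")  # prints 24.14 GB/day
```

Measure EPS during business hours as well as overnight; a peak-hour rate applied to all 86,400 seconds will overstate the daily total.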

Try it yourself

Your organization has 500 M365 E5 users, 4 Palo Alto firewalls, 10 Windows servers, and an Okta tenant. Using the volume estimates from the priority matrix, calculate: (1) total estimated daily ingestion, (2) billable vs free split, (3) monthly cost at pay-as-you-go, and (4) which deployment phase each source belongs in.

Volume estimate:

M365 Defender: ~10 GB/day (Phase 1). Entra ID: ~2 GB/day (Phase 1). Azure Activity: ~0.3 GB/day, free (Phase 2). Office 365 audit: ~1 GB/day, free (Phase 2). 4 Palo Alto firewalls: ~12 GB/day after DCR from ~50 GB raw (Phase 3). 10 Windows servers: ~15 GB/day after DCR from ~40 GB raw (Phase 4). Okta: ~0.5 GB/day (Phase 5).

Total: ~41 GB/day. Billable: ~39.5 GB. Free: ~1.3 GB.

Monthly cost: 39.5 x $5.52 x 30 = ~$6,541/month

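DCR filters on firewalls and servers cut ingestion from ~90 GB/day raw to ~27 GB/day, a reduction of ~63 GB/day. At the pay-as-you-go rate that saves approximately $10,400/month vs raw ingestion (63 x $5.52 x 30).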
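The totals in this worked answer can be checked with a few lines of Python. The volumes, billable flags, and phases are the ones listed in the answer; the tuple layout and variable names are assumptions made for illustration.

```python
RATE_PER_GB = 5.52   # pay-as-you-go rate from this subsection
DAYS_PER_MONTH = 30  # assumption: flat 30-day month

# (source, GB/day after any DCR filtering, billable?, phase) from the answer above
sources = [
    ("M365 Defender", 10.0, True, 1),
    ("Entra ID", 2.0, True, 1),
    ("Azure Activity", 0.3, False, 2),
    ("Office 365 audit", 1.0, False, 2),
    ("Palo Alto firewalls (x4)", 12.0, True, 3),
    ("Windows servers (x10)", 15.0, True, 4),
    ("Okta", 0.5, True, 5),
]

billable_gb = sum(gb for _, gb, billable, _ in sources if billable)
free_gb = sum(gb for _, gb, billable, _ in sources if not billable)
total_gb = billable_gb + free_gb
monthly_cost = billable_gb * RATE_PER_GB * DAYS_PER_MONTH

print(f"Total: {total_gb:.1f} GB/day")
print(f"Billable: {billable_gb:.1f} GB/day, free: {free_gb:.1f} GB/day")
print(f"Monthly cost: ${monthly_cost:,.2f}")
```

Swapping in your own volumes and rerunning gives a first-pass budget figure before any connector is enabled.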

Check your understanding

1. Your team wants to connect 6 data sources simultaneously in week one. What is the risk?

Answer: You cannot build analytics rules for 6 data sources simultaneously. Sources without detections are cost with no security value. Phase the deployment to match your rule-building capacity: 2-3 sources per phase.
Incorrect options: "Sentinel cannot handle 6 connectors" and "Configuration conflicts".

2. You connected a firewall and it is ingesting 22 GB/day — double your estimate. What is the immediate action?

Answer: Deploy a Data Collection Rule to filter the ingestion to security-relevant events only (denied, IDS, authentication). This is an emergency cost control measure: 22 GB/day x $5.52/GB x 30 days = ~$3,643/month of unfiltered firewall data.
Incorrect options: "Disconnect the firewall" and "Increase the budget".