6.1 Building an Ingestion Strategy
By the end of this subsection, you will be able to prioritize data sources by detection value, estimate ingestion volume and monthly cost, and build a phased connector deployment plan.
Connecting every data source on day one is a mistake. You end up with 50 GB/day of firewall noise before you have a single analytics rule to use it. Ingestion strategy means connecting data sources in the order that produces the most detection value per dollar spent.
The ingestion priority matrix
Prioritize connectors by answering two questions for each data source: (1) How many detection rules will this data enable? (2) How critical is this data during an active investigation?
| Priority | Data source | Detection value | Investigation value | Typical volume (500 users) | Billing |
|---|---|---|---|---|---|
| 1 — Critical | M365 Defender (Module 5.6) | Highest — endpoint, email, identity, cloud app | Core investigation data for every scenario | 5-15 GB/day | Billable |
| 1 — Critical | Entra ID sign-in + audit (Module 5.7) | Token replay, brute force, CA analysis | Every compromise investigation starts here | 1-3 GB/day | Billable |
| 2 — High | Azure Activity | Privilege escalation, resource tampering | Cloud infrastructure investigation | 0.1-0.5 GB/day | Free |
| 2 — High | Office 365 audit logs | MailItemsAccessed, file sharing anomalies | BEC, insider threat, data exfiltration | 0.5-2 GB/day | Free |
| 3 — Medium | Firewall (Palo Alto, Fortinet) | C2 detection, lateral movement, data exfil | Network-layer evidence, perimeter visibility | 5-50 GB/day (raw) | Billable — DCR essential |
| 3 — Medium | DNS logs | DNS tunneling, DGA detection, C2 beaconing | Malware investigation, IOC correlation | 2-10 GB/day | Billable |
| 4 — Lower | Windows Event Logs (servers) | Lateral movement, privilege escalation, persistence | Server-side forensics | 1-5 GB/day per server | Billable — DCR essential |
| 4 — Lower | Linux Syslog (servers) | SSH brute force, cron manipulation | Server/container investigation | 0.5-3 GB/day per server | Billable |
| 5 — Specialized | Third-party SaaS (Okta, AWS, Salesforce) | Cross-platform identity, cloud misconfig | Multi-cloud investigation | Variable | Billable |
Phased deployment timeline
| Phase | Timeline | Connectors | Cumulative daily volume | Monthly cost |
|---|---|---|---|---|
| 1 | Week 1 | M365 Defender + Entra ID (done in Module 5) | ~12 GB | ~$1,987 |
| 2 | Week 2 | Azure Activity (free) + Office 365 audit (free) | ~14 GB (+2 free) | ~$1,987 |
| 3 | Weeks 3-4 | Firewall via Syslog/CEF with DCR filtering | ~25 GB (+11 after DCR) | ~$3,809 |
| 4 | Month 2 | Windows Event Logs via AMA with DCR | ~35 GB (+10) | ~$5,465 |
| 5 | Month 3+ | DNS, Linux, SaaS as detection rules mature | ~45 GB (+10) | ~$7,121 |
These are planning estimates, calculated at a pay-as-you-go rate of $5.52/GB over a 30-day month. Actual costs depend on activity levels, DCR filter aggressiveness, and commitment tier selection. At Phase 4 (~35 GB/day), you are still below the 100 GB/day commitment threshold, so stay on pay-as-you-go.
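The timeline arithmetic can be reproduced with a few lines of Python. This is a minimal sketch, assuming the $5.52/GB pay-as-you-go rate and 30-day month used in this module's worked example, and assuming only billable volume is charged, so the 2 GB/day from the free connectors in Phase 2 never enters the cost. `PHASES` and `phase_costs` are illustrative names, not part of any Microsoft tooling.

```python
PAYG_RATE_PER_GB = 5.52        # USD per GB; assumed rate from this module's examples
DAYS_PER_MONTH = 30
COMMITMENT_THRESHOLD_GB = 100  # GB/day threshold mentioned in the text

# (phase, billable GB/day added, free GB/day added), read off the timeline table
PHASES = [
    (1, 12.0, 0.0),   # M365 Defender + Entra ID
    (2, 0.0, 2.0),    # Azure Activity + Office 365 audit (free)
    (3, 11.0, 0.0),   # firewall after DCR filtering
    (4, 10.0, 0.0),   # Windows Event Logs with DCR
    (5, 10.0, 0.0),   # DNS, Linux, SaaS
]


def phase_costs(phases, rate=PAYG_RATE_PER_GB, days=DAYS_PER_MONTH):
    """Return (phase, total GB/day, billable GB/day, monthly USD) for each phase."""
    rows = []
    total = 0.0
    billable = 0.0
    for phase, add_billable, add_free in phases:
        billable += add_billable
        total += add_billable + add_free
        rows.append((phase, total, billable, round(billable * rate * days, 2)))
    return rows


for phase, total, billable, cost in phase_costs(PHASES):
    # Compare billable volume against the commitment threshold
    plan = "pay-as-you-go" if billable < COMMITMENT_THRESHOLD_GB else "evaluate a commitment tier"
    print(f"Phase {phase}: {total:.0f} GB/day total, {billable:.0f} GB billable, "
          f"${cost:,.0f}/month ({plan})")
```

Replace the rate and the per-phase increments with your own figures; pricing varies by region and changes over time.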
The detection-first principle
Before connecting any data source, answer: “What specific detections will I build with this data?”
If the answer is vague (“we might need it someday”), defer. If it is specific (“I will build a detection rule for DNS tunneling using the DnsEvents table”), connect it. Data without detections is cost with no security value.
Decision: should you connect this data source now?
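One way to make this decision repeatable is to encode it as a small check. The sketch below is illustrative only: the function name and return strings are hypothetical, and it applies just two tests drawn from this subsection, namely that at least one specific detection is named and that a daily volume estimate exists.

```python
from typing import List, Optional


def should_connect_now(planned_detections: List[str],
                       estimated_gb_per_day: Optional[float]) -> str:
    """Apply the detection-first test to a single candidate data source."""
    # Ignore blank entries so an empty placeholder does not count as a detection
    named = [d for d in planned_detections if d.strip()]
    if not named:
        return "defer: no specific detection planned"
    if estimated_gb_per_day is None:
        return "defer: estimate daily volume first"
    return "connect"


# A specific detection plus a volume estimate passes the check
print(should_connect_now(["DNS tunneling using the DnsEvents table"], 4.0))
# "We might need it someday" names no detection, so the source is deferred
print(should_connect_now([], 4.0))
```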
Estimating volume before connecting
Never connect a data source without estimating its daily volume. Three methods:
Method 1: Vendor documentation. Microsoft and third-party vendors publish typical volumes per connector.
Method 2: Test ingestion. Connect the source for 24 hours, then measure the ingested volume per table in the workspace. Example result:
| DataType | DailyGB |
|---|---|
| DeviceProcessEvents | 3.4 |
| AADNonInteractiveUserSignInLogs | 1.9 |
| CommonSecurityLog | 22.7 |
Method 3: Source-side calculation. Check the device’s log generation rate (events per second). Formula: (EPS x avg event size in bytes x 86,400) / 1,073,741,824 = daily GB.
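The Method 3 formula translates directly into code. In this minimal sketch the divisor 1,073,741,824 is 1024³, so the result is in binary gigabytes; the 500 EPS and 500-byte event size in the usage line are hypothetical inputs chosen only to show the call.

```python
BYTES_PER_GB = 1_073_741_824  # 1024 ** 3, the divisor from the formula
SECONDS_PER_DAY = 86_400


def daily_gb(events_per_second: float, avg_event_bytes: float) -> float:
    """Estimate daily volume in GB from a source's event rate and average event size."""
    return events_per_second * avg_event_bytes * SECONDS_PER_DAY / BYTES_PER_GB


# Hypothetical firewall: 500 events/second at 500 bytes each, roughly 20 GB/day
print(f"{daily_gb(500, 500):.1f} GB/day")
```

Measure the event rate at peak and off-peak hours if you can; a single sample taken during a quiet period will understate the daily total.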
Try it yourself
Volume estimate:
| Data source | Daily volume | Billing | Phase |
|---|---|---|---|
| M365 Defender | ~10 GB | Billable | 1 |
| Entra ID | ~2 GB | Billable | 1 |
| Azure Activity | ~0.3 GB | Free | 2 |
| Office 365 audit | ~1 GB | Free | 2 |
| 4 Palo Alto firewalls | ~12 GB after DCR (from ~50 GB raw) | Billable | 3 |
| 10 Windows servers | ~15 GB after DCR (from ~40 GB raw) | Billable | 4 |
| Okta | ~0.5 GB | Billable | 5 |
Total: ~41 GB/day. Billable: ~39.5 GB. Free: ~1.3 GB.
Monthly cost: 39.5 x $5.52 x 30 = ~$6,541/month
DCR filters on firewalls and servers cut ingestion from ~90 GB/day raw to ~27 GB/day, saving approximately $10,400/month vs raw ingestion (63 x $5.52 x 30).
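The totals for this exercise can be checked programmatically. The sketch below takes the per-source volumes from the estimate above and assumes the same $5.52/GB rate and 30-day month; `SOURCES` and `summarize` are illustrative names. DCR savings are computed as the raw billable volume minus the filtered billable volume, priced at that rate.

```python
RATE_PER_GB = 5.52  # USD per GB; assumed pay-as-you-go rate from the worked example
DAYS = 30

# (source, GB/day ingested after any DCR filtering, raw GB/day, billable?)
SOURCES = [
    ("M365 Defender",          10.0, 10.0, True),
    ("Entra ID",                2.0,  2.0, True),
    ("Azure Activity",          0.3,  0.3, False),
    ("Office 365 audit",        1.0,  1.0, False),
    ("Palo Alto firewalls x4", 12.0, 50.0, True),
    ("Windows servers x10",    15.0, 40.0, True),
    ("Okta",                    0.5,  0.5, True),
]


def summarize(sources, rate=RATE_PER_GB, days=DAYS):
    """Total, billable, and free GB/day, plus monthly cost and DCR savings in USD."""
    total = sum(ingested for _, ingested, _, _ in sources)
    billable = sum(ingested for _, ingested, _, is_billable in sources if is_billable)
    raw_billable = sum(raw for _, _, raw, is_billable in sources if is_billable)
    return {
        "total_gb": round(total, 1),
        "billable_gb": round(billable, 1),
        "free_gb": round(total - billable, 1),
        "monthly_cost": round(billable * rate * days, 2),
        "dcr_monthly_savings": round((raw_billable - billable) * rate * days, 2),
    }


for key, value in summarize(SOURCES).items():
    print(f"{key}: {value}")
```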
Check your understanding
1. Your team wants to connect 6 data sources simultaneously in week one. What is the risk?
2. You connected a firewall and it is ingesting 22 GB/day — double your estimate. What is the immediate action?