6.4 Data Collection Rules (DCRs)
By the end of this subsection, you will understand how DCRs filter and transform data before ingestion, be able to build DCRs that reduce firewall and server log volume by 70-90%, and know the ingestion-time KQL transformation syntax.
Data Collection Rules are the most powerful cost optimization tool in Sentinel. They sit between the data source and the workspace, filtering rows, dropping columns, and transforming data before you pay for it. Data that a DCR drops is never ingested, never stored, and never billed.
How DCRs work
KQL transformation syntax
DCR transformations use a subset of KQL. The keyword source represents the incoming data stream. Supported operators:
| Operator | Purpose | Example |
|---|---|---|
| `where` | Filter rows | `source \| where DeviceAction != "allow"` |
| `project` | Keep only specified columns | `source \| project TimeGenerated, SourceIP, DestinationIP` |
| `project-away` | Remove specified columns | `source \| project-away FlexNumber1, FlexString1` |
| `extend` | Add computed columns | `source \| extend Region = iff(SourceIP startswith "10.", "Internal", "External")` |
| `parse` | Extract fields from text | `source \| parse SyslogMessage with * "user=" User " "` |
| `project-rename` | Rename columns | `source \| project-rename SrcIP = SourceIP` |
join, summarize, union, and lookup are NOT supported in DCR transformations. (let statements are supported, so you can name a constant or an intermediate expression and reuse it.) Transformations operate on a single row at a time: they cannot aggregate or correlate across rows. If you need aggregation-based filtering, do it in analytics rules, not DCRs.
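The operators in the table can be chained into one transformation. The following is a minimal sketch that reuses only the column names from the table examples; whether those columns exist depends on the stream your DCR receives, so treat them as placeholders.

```kql
// Each stage sees one row at a time; nothing here looks across rows.
source
| where DeviceAction != "allow"                                          // row filter first, so later stages do less work
| extend Region = iff(SourceIP startswith "10.", "Internal", "External") // computed column
| project-away FlexNumber1, FlexString1                                  // column filter last
```

Filtering rows before adding or dropping columns keeps the transformation easy to read: the first line decides what survives, and the rest shapes it. If you store the query as a single-line string rather than pasting it into the portal editor, remove the `//` comments first so they do not swallow the rest of the line.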
Example 1: Palo Alto firewall — 81% volume reduction
A Palo Alto firewall generates 80 GB/day. Security-relevant events are: denied connections, IDS/IPS alerts, URL filtering events, and VPN/admin authentication. Permitted traffic from trusted sources accounts for ~80% of volume.
| Action | What it drops | Volume saved |
|---|---|---|
| Row filter: drop permitted traffic | ~80% of events (routine allowed connections) | ~64 GB/day |
| Column filter: drop vendor-specific Flex fields | 15 rarely-used columns per event | ~1 GB/day |
| Keep: severity 4-5 (even if allowed) | Retains high-severity events regardless of action | 0 (retention, not savings) |
| Net result | 80 GB → 15 GB | 81% reduction ≈ $10,750/month saved |
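The row and column filters in the table could be expressed as a single transformation along these lines. This is a sketch under stated assumptions, not the exact rule behind the figures above: it assumes the firewall writes to the CommonSecurityLog schema, that `DeviceVendor` is "Palo Alto Networks", that traffic logs carry `Activity == "TRAFFIC"`, and that `LogSeverity` arrives as a string. Only the two Flex columns named in this subsection are listed; the remaining rarely used columns would need to be added after checking your own data.

```kql
source
// Keep a row if ANY of these is true; only permitted, low-severity
// Palo Alto traffic rows fail all four tests and are dropped.
| where DeviceVendor != "Palo Alto Networks"   // other vendors pass through untouched
    or Activity != "TRAFFIC"                   // threat, URL filtering, and auth logs are kept
    or DeviceAction != "allow"                 // denied and dropped connections are kept
    or toint(LogSeverity) >= 4                 // severity 4-5 is kept even if allowed
| project-away FlexNumber1, FlexString1        // column filter (extend this list as needed)
```

The filter is written as a list of reasons to keep a row rather than reasons to drop one, which makes it harder to discard an event type by accident. A row whose severity cannot be converted to a number is treated as low severity.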
Example 2: Windows Event Logs — 90% volume reduction
Windows servers generate thousands of events per hour. Most are informational (service start/stop, scheduled tasks, system performance). The security-relevant events have specific Event IDs:
| Event ID range | What it captures |
|---|---|
| 4624-4634 | Successful/failed logon, logoff |
| 4648, 4672 | Explicit credential use, special privileges assigned |
| 4688 | Process creation (critical for malware detection) |
| 4720-4726 | Account created, enabled, disabled, deleted, password change |
| 4728-4756 | Group membership changes (user added to admin group) |
| 4768-4776 | Kerberos authentication events |
| 7045 | Service installed (persistence mechanism) |
| 1102 | Audit log cleared (anti-forensics) |
Process creation auditing is not enabled by default on Windows servers. You must enable "Audit Process Creation" in the Advanced Audit Policy and enable "Include command line in process creation events" in Group Policy. Without this, 4688 events are either absent or missing the command line — which is the most valuable field for malware detection.
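An Event ID allowlist like the one in the table can be sketched as a single `where` clause. The IDs below are a representative subset drawn from the ranges in the table, not the full list a production rule would carry, and the sketch assumes the incoming stream follows the SecurityEvent schema with an integer `EventID` column. Event ID 7045 is written to the System log rather than the Security log, so it normally arrives on a different stream and is left out here.

```kql
source
| where EventID in (
    4624, 4625, 4634,          // logon success, logon failure, logoff
    4648, 4672,                // explicit credentials, special privileges
    4688,                      // process creation
    4720, 4722, 4725, 4726,    // account created, enabled, disabled, deleted
    4728, 4732, 4756,          // member added to a security group
    4768, 4769, 4776,          // Kerberos and credential validation
    1102                       // audit log cleared
)
```

Any Event ID not in the list is dropped before ingestion, so review the list against your detection rules before deploying it.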
Example 3: Fortinet FortiGate — filtering by log type
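A log-type filter for FortiGate might look like the sketch below. Every literal in it is an assumption to verify against your own data: it assumes FortiGate events land in the CommonSecurityLog schema with `DeviceVendor == "Fortinet"`, that traffic logs have an `Activity` value beginning with "traffic", and that permitted sessions carry a `DeviceAction` of "accept" or "close".

```kql
source
// Keep a row if ANY of these is true; only permitted FortiGate
// traffic rows fail all three tests and are dropped.
| where DeviceVendor != "Fortinet"               // other vendors pass through untouched
    or Activity !startswith "traffic"            // UTM, event, and other log types are kept
    or DeviceAction !in ("accept", "close")      // denied or otherwise non-permitted traffic is kept
```

Because the values depend on firmware version and logging configuration, confirm them with a sample query before relying on the filter.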
Creating a DCR in the portal
- Navigate to Azure Monitor → Data Collection Rules → Create
- Name: `dcr-paloalto-cef-filter` (descriptive naming)
- Platform: Linux
- Data source: Select the Syslog/CEF data source
- Transformation: Paste your KQL transformation
- Destination: Your Sentinel workspace
- Click Create
Changes take effect within 5-10 minutes. Verify with the Usage query from Module 5.5.
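The Module 5.5 query is not reproduced in this subsection. As a minimal stand-in, assuming the DCR targets CommonSecurityLog, a query like this against the Usage table shows daily ingested volume so you can compare days before and after the change. Run it in the workspace's Logs blade, not in the DCR; `summarize` is allowed here because this is an ordinary query, not a transformation.

```kql
Usage
| where TimeGenerated > ago(14d)
| where DataType == "CommonSecurityLog"                                  // assumed target table
| summarize IngestedGB = sum(Quantity) / 1000.0 by bin(TimeGenerated, 1d) // Quantity is reported in MB
| order by TimeGenerated asc
```

A visible step down in `IngestedGB` on the day the DCR took effect indicates that the filter is working.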
Data dropped by a DCR is gone permanently. It was never ingested, never stored, never billed — and cannot be retrieved. If you filter too aggressively and later need the dropped data for an investigation, your only option is the source device's local log retention. Start with conservative filters (keep all denied + alerts + severity 4-5). Monitor for 2 weeks. Tighten only after confirming your investigation workflows do not need the dropped events.
Try it yourself
Use the Fortinet example above. Estimated savings: 40 GB x 75% = 30 GB dropped/day. Remaining: 10 GB/day.
Monthly savings: 30 GB x $5.52 x 30 = $4,968/month
Check your actual Fortinet field values by running a sample query against CommonSecurityLog — the DeviceAction values may differ slightly from the example depending on firmware version.
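One way to run that check is to count events by the fields you plan to filter on. This is a sketch that assumes your FortiGate events carry `DeviceVendor == "Fortinet"`; adjust the value if your data differs. It is a workspace query for the Logs blade, not a DCR transformation.

```kql
CommonSecurityLog
| where TimeGenerated > ago(1d)
| where DeviceVendor == "Fortinet"                      // assumed vendor string
| summarize Events = count() by DeviceAction, Activity  // which values actually occur, and how often
| order by Events desc
```

The top rows show which action and log-type combinations dominate your volume, which is where a filter will save the most.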
Check your understanding
1. You want to build an analytics rule that joins CommonSecurityLog with SigninLogs. Can the CommonSecurityLog data come through a DCR?
2. Your Windows server DCR filters to 22 Event IDs. A forensic investigator asks for Event ID 4663 (file access audit). Can you retrieve it?