6.4 Data Collection Rules (DCRs)

75 minutes · Module 6


By the end of this subsection, you will understand how DCRs filter and transform data before ingestion, be able to build DCRs that reduce firewall and server log volume by 70-90%, and know the ingestion-time KQL transformation syntax.

Data Collection Rules are the most powerful cost optimization tool in Sentinel. They sit between the data source and the workspace, filtering rows, dropping columns, and transforming data before you pay for it. Data that a DCR drops is never ingested, never stored, and never billed.

How DCRs work

Raw data (80 GB/day) → DCR transformation (filter rows, drop columns, parse; KQL at ingestion time) → Workspace (15 GB/day, 81% savings)

KQL transformation syntax

DCR transformations use a subset of KQL. The keyword source represents the incoming data stream. Supported operators:

Operator         Purpose                       Example
where            Filter rows                   source | where DeviceAction != "allow"
project          Keep only specified columns   source | project TimeGenerated, SourceIP, DestinationIP
project-away     Remove specified columns      source | project-away FlexNumber1, FlexString1
extend           Add computed columns          source | extend Region = iff(SourceIP startswith "10.", "Internal", "External")
parse            Extract fields from text      source | parse SyslogMessage with * "user=" User " "
project-rename   Rename columns                source | project-rename SrcIP = SourceIP

DCR KQL is NOT full KQL

join, summarize, union, and lookup are NOT supported in DCR transformations. Transformations operate on a single row at a time, so they cannot aggregate or correlate across rows. The let statement, by contrast, is supported. If you need aggregation-based filtering, do it in analytics rules, not DCRs.
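
To make the single-row constraint concrete, here is a minimal sketch of aggregation-based logic running where it belongs: against the workspace table after ingestion, for example inside an analytics rule. The one-hour window and the threshold of 100 are hypothetical values chosen for illustration only.

```kql
// Runs in the workspace (for example in an analytics rule), NOT in a DCR:
// summarize has to see many rows at once, which a DCR transformation cannot do.
CommonSecurityLog
| where TimeGenerated > ago(1h)
| where DeviceAction != "allow"
| summarize DeniedConnections = count() by SourceIP
| where DeniedConnections > 100   // hypothetical threshold
```

The DCR decides which individual rows reach CommonSecurityLog. Counting or correlating those rows happens afterwards, in queries like this one.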

Example 1: Palo Alto firewall — 81% volume reduction

A Palo Alto firewall generates 80 GB/day. Security-relevant events are: denied connections, IDS/IPS alerts, URL filtering events, and VPN/admin authentication. Permitted traffic from trusted sources accounts for ~80% of volume.

source
| where DeviceAction != "allow"
    or Activity has_any ("threat", "alert", "url-filtering", "auth", "wildfire")
    or LogSeverity in (4, 5)
| project-away
    FlexNumber1, FlexNumber2,
    FlexString1, FlexString2,
    DeviceCustomNumber1, DeviceCustomNumber2, DeviceCustomNumber3,
    DeviceCustomString1, DeviceCustomString2, DeviceCustomString3,
    DeviceCustomString4, DeviceCustomString5, DeviceCustomString6,
    AdditionalExtensions, ExternalID, ExtID

Impact Analysis

Action                                             What it drops                                        Volume saved
Row filter: drop permitted traffic                 ~80% of events (routine allowed connections)         ~64 GB/day
Column filter: drop Flex and DeviceCustom fields   16 rarely used columns per event                     ~1 GB/day
Keep: severity 4-5 (even if allowed)               Retains high-severity events regardless of action    0 (retention, not savings)
Net result                                         80 GB → 15 GB                                        81% reduction = $10,750/month saved

Why keep LogSeverity 4-5: High-severity events from a firewall may have an "allow" action (the traffic passed) but still indicate a security concern — an IDS signature match that was in alert-only mode, for example. Keeping severity 4-5 regardless of action prevents accidentally dropping important events.
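
The net-result figures can be reproduced with a little arithmetic. The sketch below assumes a 30-day month and the $5.52/GB combined rate quoted in the exercise later in this subsection. With those assumptions the monthly figure lands close to, though not exactly on, the $10,750 quoted, so treat the rate as a placeholder for your own pricing.

```kql
// Assumptions: 30-day month, $5.52/GB combined rate (taken from the exercise in this subsection).
let RatePerGB = 5.52;
let RawGBPerDay = 80.0;
let KeptGBPerDay = 15.0;
print SavedGBPerDay = RawGBPerDay - KeptGBPerDay
| extend ReductionPct = round(100.0 * SavedGBPerDay / RawGBPerDay, 1)
| extend MonthlySavingsUSD = SavedGBPerDay * RatePerGB * 30
```

Swapping in your own daily volumes and rate gives a quick estimate before you build the DCR.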

Example 2: Windows Event Logs — 90% volume reduction

Windows servers generate thousands of events per hour. Most are informational (service start/stop, scheduled tasks, system performance). The security-relevant events have specific Event IDs:

source
| where EventID in (
    4624, 4625, 4634,
    4648, 4672, 4688,
    4720, 4722, 4723, 4724, 4725, 4726,
    4728, 4732, 4733, 4756,
    4768, 4769, 4771, 4776,
    7045, 1102
)

Event IDs                What it captures
4624, 4625, 4634         Successful logon, failed logon, logoff
4648, 4672               Explicit credential use, special privileges assigned
4688                     Process creation (critical for malware detection)
4720, 4722-4726          Account created, enabled, password change or reset, disabled, deleted
4728, 4732, 4733, 4756   Security group membership changes (for example, a user added to an admin group)
4768, 4769, 4771         Kerberos ticket requests and pre-authentication failures
4776                     NTLM credential validation
7045                     Service installed (persistence mechanism); recorded in the System log rather than the Security log
1102                     Audit log cleared (anti-forensics)

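
Before settling on an Event ID list, it can help to see which events actually dominate volume. The following is a sketch, assuming the servers currently send unfiltered events to the SecurityEvent table; once a filtering DCR is in place, it only shows what the DCR kept. The 7-day window is an arbitrary choice.

```kql
// Rank Event IDs by billed volume over the last 7 days.
// _BilledSize is in bytes, so dividing by 1e9 gives GB.
SecurityEvent
| where TimeGenerated > ago(7d)
| summarize Events = count(), GB = round(sum(_BilledSize) / 1e9, 2) by EventID, Activity
| top 20 by GB desc
```

Event IDs near the top of this list that are absent from your filter are where most of the savings come from, so confirm that none of them matter to your investigations.
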
Event ID 4688 (Process Creation) requires audit policy configuration

Process creation auditing is not enabled by default on Windows servers. You must enable "Audit Process Creation" in the Advanced Audit Policy and enable "Include command line in process creation events" in Group Policy. Without this, 4688 events are either absent or missing the command line — which is the most valuable field for malware detection.
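
One way to check whether both settings have taken effect is to look at what is arriving in the workspace. This is a sketch, assuming 4688 events land in the SecurityEvent table with the command line in the CommandLine column.

```kql
// Per server: how many 4688 events arrived, and how many carry a command line.
SecurityEvent
| where TimeGenerated > ago(1d)
| where EventID == 4688
| summarize Total = count(), WithCommandLine = countif(isnotempty(CommandLine)) by Computer
| order by Total desc
```

A server missing from the results is probably not auditing process creation at all. A server with a Total but zero WithCommandLine likely has auditing enabled without the command-line setting.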

Example 3: Fortinet FortiGate — filtering by log type

source
| where DeviceAction in ("deny", "drop", "block", "reset", "close")
    or Activity has_any ("ips", "intrusion", "anomaly", "virus", "botnet", "webfilter")
    or Activity has "vpn"
| project-away
    FlexNumber1, FlexNumber2,
    FlexString1, FlexString2,
    DeviceCustomString1, DeviceCustomString2,
    AdditionalExtensions

Creating a DCR in the portal

  1. Navigate to Azure Monitor → Data Collection Rules → Create
  2. Name: dcr-paloalto-cef-filter (descriptive naming)
  3. Platform: Linux
  4. Data source: Select the Syslog/CEF data source
  5. Transformation: Paste your KQL transformation
  6. Destination: Your Sentinel workspace
  7. Click Create

Changes take effect within 5-10 minutes. Verify with the Usage query from Module 5.5.
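
If you do not have the Module 5.5 query to hand, a minimal sketch of a daily volume check against the Usage table looks like this. It is not necessarily the same query used in that module. Quantity is reported in MB, and the 14-day window is an arbitrary choice.

```kql
// Daily billable ingestion for CommonSecurityLog, in GB.
Usage
| where TimeGenerated > ago(14d)
| where DataType == "CommonSecurityLog"
| where IsBillable == true
| summarize IngestedGB = sum(Quantity) / 1000.0 by bin(TimeGenerated, 1d)
| order by TimeGenerated asc
```

A visible step down on the day the DCR was deployed confirms that the transformation is being applied.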

You cannot recover DCR-filtered data

Data dropped by a DCR is gone permanently. It was never ingested, never stored, never billed — and cannot be retrieved. If you filter too aggressively and later need the dropped data for an investigation, your only option is the source device's local log retention. Start with conservative filters (keep all denied + alerts + severity 4-5). Monitor for 2 weeks. Tighten only after confirming your investigation workflows do not need the dropped events.
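
Because dropped data cannot be recovered, it is worth dry-running a filter against data that has already been ingested unfiltered before deploying it. The sketch below applies the Example 1 row filter as a flag instead of a filter. It assumes the firewall reports DeviceVendor as "Palo Alto Networks", and it wraps LogSeverity in tostring() so the comparison works whichever type the column has in your workspace.

```kql
// Dry run: label each row as kept or dropped, then compare event counts and billed volume.
CommonSecurityLog
| where TimeGenerated > ago(7d)
| where DeviceVendor == "Palo Alto Networks"   // assumption: check the value in your own data
| extend WouldKeep = DeviceAction != "allow"
    or Activity has_any ("threat", "alert", "url-filtering", "auth", "wildfire")
    or tostring(LogSeverity) in ("4", "5")
| summarize Events = count(), GB = round(sum(_BilledSize) / 1e9, 2) by WouldKeep
```

Replacing the final summarize with `| where WouldKeep == false | take 100` lets you read a sample of the rows the DCR would discard.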

Try it yourself

Your Fortinet firewall generates 40 GB/day. Approximately 75% is permitted traffic. Write the DCR transformation KQL, estimate the volume after filtering, and calculate the monthly cost savings at $5.52/GB combined rate.

Use the Fortinet example above. Estimated savings: 40 GB x 75% = 30 GB dropped/day. Remaining: 10 GB/day.

Monthly savings: 30 GB x $5.52 x 30 = $4,968/month

Check your actual Fortinet field values by running a sample query against CommonSecurityLog — the DeviceAction values may differ slightly from the example depending on firmware version.
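
A sample query for that check might look like the following sketch. It assumes the FortiGate reports DeviceVendor as "Fortinet" and that unfiltered data is still reaching the workspace.

```kql
// List the DeviceAction and Activity values actually present, most frequent first.
CommonSecurityLog
| where TimeGenerated > ago(1d)
| where DeviceVendor == "Fortinet"   // assumption: check the value in your own data
| summarize Events = count() by DeviceAction, Activity
| order by Events desc
```

Compare the returned DeviceAction values with the list in the Fortinet transformation and adjust the filter before deploying it.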

Check your understanding

1. You want to build an analytics rule that joins CommonSecurityLog with SigninLogs. Can the CommonSecurityLog data come through a DCR?

Yes — the DCR filters and transforms data before ingestion, but the data that passes through lands in the normal CommonSecurityLog table. Analytics rules query the table, not the DCR. The DCR only affects what data reaches the table.
No — DCR data cannot be used in analytics rules
Only if the analytics rule runs on the Basic tier

2. Your Windows server DCR filters to 22 Event IDs. A forensic investigator asks for Event ID 4663 (file access audit). Can you retrieve it?

Restore from Sentinel archive
Re-run the DCR in catch-up mode
The DCR dropped Event 4663 before ingestion — it does not exist in Sentinel. Check the server's local Windows Event Log (typically retained 30-90 days depending on configuration). If needed for future investigations, add 4663 to the DCR filter.