DE1.2 Scheduled Rules — Query and Lookback Window
The lookback window
The lookback window defines the time range the query scans each time it runs. A 1-hour lookback means the query filters for TimeGenerated > ago(1h) on each execution. A 5-minute lookback means TimeGenerated > ago(5m).
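The scheduler supplies this time range automatically; writing the filter explicitly in Advanced Hunting reproduces the rule's view of the data. A minimal sketch, assuming the standard SigninLogs schema from the Entra ID connector (ResultType is a string, "0" meaning success):

```kql
// What a 1-hour lookback translates to: the rule engine only evaluates
// data newer than (now - lookback) on each execution.
SigninLogs
| where TimeGenerated > ago(1h)      // 1-hour lookback window
| where ResultType != "0"            // example detection filter: failed sign-ins
```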
The lookback window must be at least as long as the query frequency. If the rule runs every 30 minutes with a 15-minute lookback, there is a 15-minute gap between executions where events are never scanned. Events occurring in that gap are invisible — permanently missed. This is a silent detection failure that produces no error and no warning.
The recommended pattern is lookback = frequency + buffer. If the rule runs every 30 minutes, set the lookback to 35-40 minutes. The overlap ensures that events ingested with slight delay (common with some data connectors that batch events) are captured. The deduplication mechanism (alert grouping, covered in DE1.9) prevents the overlap from generating duplicate alerts.
Figure DE1.2 — Lookback window configuration. Top: correct overlap prevents gaps. Bottom: lookback shorter than frequency creates a permanent detection gap where events are never scanned.
Query design for analytics rules
An Advanced Hunting query and an analytics rule query serve different purposes. In Advanced Hunting, you run the query once and review the results interactively. In an analytics rule, the query runs automatically every N minutes and the results become alerts. This difference drives three design constraints.
Constraint 1: The query must be deterministic. Running the same query twice against the same data should produce the same results. If your query uses take 10 or sample, results vary between runs and alert behavior becomes unpredictable. Use top 10 by Timestamp desc for deterministic ordering.
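The two orderings can be contrasted side by side. A sketch using DeviceProcessEvents (note that this table is keyed by Timestamp in Advanced Hunting and TimeGenerated in a Sentinel workspace; TimeGenerated is assumed here):

```kql
// Non-deterministic: 'take' returns an arbitrary subset, so two
// executions over identical data can emit different rows.
DeviceProcessEvents
| where TimeGenerated > ago(30m)
| take 10

// Deterministic: 'top' applies a total ordering before truncating,
// so identical data always yields identical results.
DeviceProcessEvents
| where TimeGenerated > ago(30m)
| top 10 by TimeGenerated desc
```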
Constraint 2: The query should return only actionable results. In Advanced Hunting, you might return 500 rows and manually filter. In an analytics rule, every returned row generates an alert (or contributes to an alert, depending on grouping). If the query returns benign activity alongside malicious activity, the benign rows become false positive alerts. Filter aggressively in the query — the fewer rows returned that are not true positives, the higher the rule’s TP rate.
Constraint 3: The query must include all entity fields in the output. Entity mapping (DE1.4) extracts Account, Host, IP, and other entities from the query results. If your query does not project or extend the fields that entity mapping needs, the resulting alerts lack entity context and cannot be correlated into multi-alert incidents.
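A sketch of the projection, using standard SigninLogs columns; which columns must survive to the output depends on your entity-mapping configuration, so treat the list below as illustrative:

```kql
// Keep every column entity mapping will consume. Dropping
// UserPrincipalName or IPAddress here would strip the Account and IP
// entities from the resulting alerts.
SigninLogs
| where TimeGenerated > ago(1h)
| where ResultType != "0"
| project TimeGenerated, UserPrincipalName, IPAddress, AppDisplayName
```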
The myth: A longer lookback window catches more events. Setting the lookback to the maximum (14 days) ensures nothing is missed.
The reality: A 14-day lookback running every 5 minutes scans 14 days of data on every execution: 288 executions per day (2,016 per week), each scanning two weeks of logs. The query cost (in query compute units) is enormous and can exceed your workspace's query limits. Additionally, a 14-day lookback with a 5-minute frequency means every event is scanned 4,032 times before it ages out of the window. Without proper deduplication (alert grouping), this generates a flood of duplicate alerts. Set the lookback to match the detection requirement: if you need to detect password spray within 1 hour, use a 1-hour lookback. If you need to detect slow exfiltration over 24 hours, use a 24-hour lookback with a correspondingly lower frequency (every 1-4 hours).
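As a quick sanity check, dividing two timespans in KQL yields a plain number, so the execution and scan counts fall out of one print statement (runnable in Log Analytics or Advanced Hunting):

```kql
// Back-of-envelope check of the 5-minute-frequency, 14-day-lookback case.
print ExecutionsPerDay = 1d / 5m,    // 288 executions per day
      ScansPerEvent    = 14d / 5m    // 4032 scans before an event ages out
```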
Performance considerations
Every analytics rule query consumes compute resources. Sentinel charges are based on data ingestion, not query execution — but poorly designed queries can hit workspace query limits, causing rules to fail silently. The query execution timeout for analytics rules is 10 minutes. If the query does not complete in 10 minutes, the rule fails for that execution cycle.
Rules that join large tables without time filtering are the most common performance problem. A query that joins DeviceProcessEvents (3.2 GB/day for NE) with SigninLogs (2.1 GB/day) over a 24-hour lookback scans approximately 5.3 GB per execution. If the join condition is broad (UserPrincipalName only, without time correlation), the intermediate result set can be enormous.
The fix: always filter both sides of a join to the minimum required time window before joining. Use let statements to materialize filtered subsets. Apply where clauses before summarize to reduce row counts early in the query pipeline.
Production pattern: optimized cross-table join
The following pattern demonstrates how to join DeviceProcessEvents with SigninLogs for CHAIN-MESH detection — correlating a risky sign-in with subsequent suspicious process execution on a device the user accessed. The materialized let statements filter each table to the minimum required dataset before the join executes.
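A minimal sketch of the pattern, using standard Sentinel schema columns (RiskLevelDuringSignIn in SigninLogs, AccountUpn in DeviceProcessEvents); the risk levels, process names, and 1-hour execution window are illustrative stand-ins for the production CHAIN-MESH filters:

```kql
let lookback = 2h;                 // must cover sign-in plus the 1h follow-on window
let RiskySignins = materialize(
    SigninLogs
    | where TimeGenerated > ago(lookback)
    | where RiskLevelDuringSignIn in ("medium", "high")
    | project SigninTime = TimeGenerated, UPN = tolower(UserPrincipalName), IPAddress
);
let SuspiciousProcs = materialize(
    DeviceProcessEvents
    | where TimeGenerated > ago(lookback)
    | where FileName in~ ("powershell.exe", "rundll32.exe")   // illustrative filter
    | project ProcTime = TimeGenerated, UPN = tolower(AccountUpn),
              DeviceName, ProcessCommandLine
);
// Both sides are already reduced to the minimum dataset before the join runs.
RiskySignins
| join kind=inner SuspiciousProcs on UPN
| where ProcTime between (SigninTime .. SigninTime + 1h)
| project SigninTime, ProcTime, UPN, DeviceName, IPAddress, ProcessCommandLine
```

Normalizing UserPrincipalName and AccountUpn to lowercase before the join avoids silent misses from case differences between the two sources.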
The materialize() function evaluates each subquery once and caches the result. Without it, the engine re-evaluates the subquery for each row in the join — potentially scanning the full table multiple times. For NE’s DeviceProcessEvents at 3.2 GB/day, this optimization reduces query time from minutes to seconds.
Lookback selection by detection type
Different detection patterns require different lookback windows. The lookback should match the detection’s temporal scope — not a one-size-fits-all default.
Single-event detections (5-15 minute lookback): Detecting individual suspicious events — vssadmin shadow copy deletion, LSASS access, encoded PowerShell command. The event is suspicious in isolation. A short lookback matching the frequency (5 minutes for a 5-minute rule) is sufficient. NE examples: ransomware pre-encryption (5 min), C2 beacon command pattern (5 min).
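A sketch of the single-event shape, using the vssadmin example above (a 7-minute lookback gives a 5-minute rule a small ingestion-delay buffer; the command-line filter is a simplified illustration):

```kql
// Shadow-copy deletion is suspicious in isolation: no aggregation or
// correlation needed, so a short lookback matching the frequency suffices.
DeviceProcessEvents
| where TimeGenerated > ago(7m)
| where FileName =~ "vssadmin.exe"
| where ProcessCommandLine has_all ("delete", "shadows")
| project TimeGenerated, DeviceName, AccountUpn, ProcessCommandLine
```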
Volume-based detections (30-60 minute lookback): Detecting patterns that emerge from event volume — password spray (50+ failed logins from one IP), MFA fatigue (10+ MFA denials for one user), bulk file access (100+ file downloads in one session). The lookback must be long enough to capture the volume pattern. A password spray spread over 45 minutes requires at least a 45-minute lookback. NE examples: password spray (60 min), MFA push bombing (30 min), bulk SharePoint download (60 min).
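A sketch of the volume-based shape for password spray; the 50-failure and 10-account thresholds are illustrative and need tuning per environment:

```kql
// The 60-minute lookback must cover the whole window over which the
// count accumulates, or a slow spray never crosses the threshold.
SigninLogs
| where TimeGenerated > ago(60m)
| where ResultType != "0"
| summarize FailedCount = count(),
            TargetAccounts = dcount(UserPrincipalName) by IPAddress
| where FailedCount >= 50 and TargetAccounts >= 10
```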
Correlation-based detections (1-4 hour lookback): Detecting patterns that require two or more events across different data sources with a temporal relationship — risky sign-in THEN inbox rule creation WITHIN 60 minutes, PIM activation THEN app registration WITHIN 2 hours. The lookback must cover the entire correlation window. NE examples: AiTM → inbox rule (60 min lookback, 15-min frequency), PIM activation → scope creep (2-hour lookback, 15-min frequency).
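A sketch of the correlation shape for the AiTM-to-inbox-rule case, assuming the standard OfficeActivity schema (Operation == "New-InboxRule"); the 75-minute lookback covers the 60-minute correlation window plus a buffer:

```kql
let RiskySignins =
    SigninLogs
    | where TimeGenerated > ago(75m)
    | where RiskLevelDuringSignIn == "high"
    | project SigninTime = TimeGenerated, UPN = tolower(UserPrincipalName);
OfficeActivity
| where TimeGenerated > ago(75m)
| where Operation == "New-InboxRule"
| extend RuleTime = TimeGenerated, UPN = tolower(UserId)
| join kind=inner RiskySignins on UPN
// Enforce the temporal relationship: rule created AFTER the risky
// sign-in, within the 60-minute correlation window.
| where RuleTime between (SigninTime .. SigninTime + 60m)
| project SigninTime, RuleTime, UPN, Parameters
```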
Slow-burn detections (12-24 hour lookback): Detecting patterns that emerge over extended periods — data exfiltration spread across a full workday, reconnaissance commands distributed over 8 hours to avoid volume-based detection, configuration drift over multiple changes. The lookback covers a full business day. The frequency is lower to match (1-4 hours). NE examples: CHAIN-DRIFT config change + exploitation window (24-hour lookback, 4-hour frequency), slow exfiltration (12-hour lookback, 1-hour frequency).
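A sketch of the slow-burn shape for cumulative exfiltration; CommonSecurityLog and SentBytes assume a CEF firewall feed, and the 500 MB threshold is illustrative:

```kql
// Summing over 12 hours catches volume that no single 5-minute window
// would flag; pair this with a 1-hour frequency, not a 5-minute one.
CommonSecurityLog
| where TimeGenerated > ago(12h)
| where isnotempty(SentBytes)
| summarize TotalSentMB = sum(SentBytes) / 1048576 by SourceIP
| where TotalSentMB > 500
```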
Deduplication and the overlap problem
When the lookback exceeds the frequency (recommended), events in the overlap window are scanned by two consecutive executions. Without deduplication, the same event generates alerts on both executions — duplicate alerts.
Sentinel handles this through alert grouping (DE1.9). If alert grouping is configured to “group by entity within a time window,” the second alert for the same entity within the grouping window is absorbed into the existing incident rather than creating a new one. The overlap does not produce duplicate incidents.
However, if grouping is set to “one alert per event” or if the grouping window has expired, duplicates can occur. The detection engineer must be aware of this interaction: lookback overlap provides gap-free coverage (good), but requires alert grouping to prevent duplicates (configuration dependency). When designing a rule, the lookback and grouping configurations should be set together — not independently.
For NE’s detection library, the standard pattern is: lookback = frequency × 1.2 (20% overlap) with entity-based alert grouping and a 5-hour grouping window. This combination provides gap-free detection with automatic deduplication for all but the most unusual edge cases.
Try it yourself
Exercise: Check your lookback/frequency ratios
Open Sentinel → Analytics → Active rules. For each scheduled rule, compare the lookback window to the frequency. Are any rules configured with lookback < frequency? Those rules have detection gaps. Are any rules configured with a 14-day lookback and a 5-minute frequency? Those rules are consuming excessive query compute.
The optimal ratio: lookback = frequency × 1.1 to 1.5 (10-50% overlap). This provides gap-free coverage with minimal compute waste.
Check your understanding
A scheduled rule runs every 30 minutes with a 30-minute lookback. An event occurs at 10:14. The rule runs at 10:00, 10:30, and 11:00. At which execution is the event scanned?
Answer: At 10:30. The 10:30 execution scans events from 10:00 to 10:30 (30-minute lookback). The event at 10:14 falls within this window. The 10:00 execution scanned 09:30 to 10:00 — the event had not occurred yet. The 11:00 execution scans 10:30 to 11:00 — the event at 10:14 is no longer in the window. With zero overlap (lookback equals frequency), the event is scanned exactly once. If the event had occurred at 10:00:01 and was ingested with a 30-second delay (arriving at 10:00:31), the 10:00 execution would miss it (it queries data available at 10:00:00) and the 10:30 execution would catch it. This is why a small overlap (lookback > frequency) is recommended — it catches ingestion-delayed events.
Troubleshooting: Rule execution failures
“Rule failed to execute — query timeout.” The query took longer than 10 minutes. Optimize: add more specific where clauses, reduce the lookback window, or break a complex join into materialized subsets.
“Rule returned 0 results but I know the event occurred.” Check three things: (1) Is the lookback window long enough to cover the event’s TimeGenerated? (2) Is the data connector active — did the event actually ingest into the workspace? (3) Does the query’s where clause correctly match the event? Run the same query in Advanced Hunting with an explicit time filter around the known event to verify.
“Rule fires constantly — hundreds of alerts per day.” The query is too broad or the threshold is too low. Review the query results in Advanced Hunting for the past 24 hours. How many rows are returned? If the query returns 500 rows per day and your threshold is 1, you will get 500 alerts. Increase the threshold or add filtering to exclude known benign patterns.
References used in this subsection
- Course cross-references: DE1.3 (frequency and trigger logic), DE1.4 (entity mapping), DE1.9 (alert grouping and deduplication), DE9 (tuning methodology)