9.2 Creating Scheduled Analytics Rules

16-20 hours · Module 9

Introduction

Scheduled analytics rules are the primary detection mechanism in Sentinel. You define the threat pattern as a KQL query, configure how often the query runs, set the threshold for when to alert, map entities for investigation, and define how alerts group into incidents. This subsection walks through the complete creation process with production-ready examples.


The creation wizard: step by step

Navigate to Sentinel → Analytics → Create → Scheduled query rule. The wizard has five tabs: General, Set rule logic, Incident settings, Automated response, and Review + create.

Tab 1: General. Name the rule with a clear, actionable description: “Multiple failed sign-ins followed by success from same IP” is better than “Brute force.” Set severity based on the guidelines from subsection 9.1. Map to MITRE ATT&CK technique(s). Add a description that explains what the rule detects and why it matters — the analyst reading the incident needs this context.

Tab 2: Set rule logic. This is where you define the detection.

The KQL query. Write or paste your KQL query. The query must return at least one row for an alert to generate. Every column in the output is available for entity mapping and custom details. Design the query to project the fields an analyst needs for triage: timestamps, user identities, IP addresses, device names, and contextual details.

Production example — brute-force detection:

let FailureThreshold = 20;
let TimeWindow = 1h;
SigninLogs
| where TimeGenerated > ago(TimeWindow)
| where ResultType != "0"  // Failed sign-ins
| summarize
    FailureCount = count(),
    TargetAccounts = make_set(UserPrincipalName, 10),
    FailureReasons = make_set(ResultDescription, 5),
    FirstFailure = min(TimeGenerated),
    LastFailure = max(TimeGenerated)
    by IPAddress
| where FailureCount > FailureThreshold
// Check for subsequent success from the same IP
| join kind=inner (
    SigninLogs
    | where TimeGenerated > ago(TimeWindow)
    | where ResultType == "0"
    | project SuccessTime = TimeGenerated, IPAddress,
        CompromisedAccount = UserPrincipalName, AppDisplayName
) on IPAddress
| where SuccessTime > LastFailure  // Success must come after failures
| project
    IPAddress, FailureCount, TargetAccounts, FailureReasons,
    CompromisedAccount, SuccessTime, AppDisplayName,
    FirstFailure, LastFailure

This query detects the most dangerous brute-force pattern: many failures followed by a success — indicating the attacker guessed the correct password. The output projects every field the analyst needs: the attacking IP, the compromised account, the failure count, and the timestamp of the successful compromise.

Production example — inbox rule forwarding to external domain:

CloudAppEvents
| where TimeGenerated > ago(1h)
| where ActionType == "New-InboxRule"
| extend RuleParams = parse_json(RawEventData)
| mv-apply Param = RuleParams.Parameters on (
    summarize ParamBag = make_bag(bag_pack(tostring(Param.Name), tostring(Param.Value)))
)
| extend ForwardTo = tostring(ParamBag.ForwardTo),
    RedirectTo = tostring(ParamBag.RedirectTo),
    RuleName = tostring(ParamBag.Name)
| where isnotempty(ForwardTo) or isnotempty(RedirectTo)
| extend ExternalAddress = coalesce(ForwardTo, RedirectTo)
| where ExternalAddress !endswith "@northgateeng.com"  // Not internal domain
| project TimeGenerated, AccountObjectId,
    UserPrincipalName = tostring(RuleParams.UserId),
    ExternalAddress, IPAddress = tostring(RuleParams.ClientIP),
    RuleName

This rule detects BEC persistence — an attacker who has compromised a mailbox and creates a forwarding rule to exfiltrate email to an external address.

Schedule configuration. Set the frequency (how often the rule runs) and the lookback period (how far back the query examines). For the brute-force rule: run every 5 minutes, lookback 1 hour (matching the TimeWindow in the query). For the inbox rule: run every 15 minutes, lookback 1 hour.

Alert threshold. Generate an alert when the query returns more than 0 results (default). You can also set a threshold: “Generate alert when number of query results is greater than 10” — useful for noisy detections where a small number of results is normal.

Event grouping. Choose between “Group all events into a single alert” (one alert per rule execution, with all result rows in the alert) or “Trigger an alert for each event” (one alert per result row). For the brute-force rule: trigger per event (each attacking IP gets its own alert). For the inbox rule: trigger per event (each forwarding rule gets its own alert).

Tab 3: Incident settings. Enable incident creation (default: enabled). Configure alert grouping as described in subsection 9.1: group by entities for entity-correlated incidents.

Tab 4: Automated response. Attach automation rules or playbooks that run when the rule generates an incident. Covered in subsections 9.6 and 9.7.

Tab 5: Review + create. Review all settings and save. The rule begins executing on the configured schedule immediately.


Query design patterns for detection rules

Pattern 1: Threshold detection. Count events and alert when the count exceeds a threshold. Used for: brute-force, port scanning, mass file download.

// Pattern: count events, threshold, project entities
TableName
| where TimeGenerated > ago(1h)
| where <filter condition>
| summarize Count = count(), <aggregations> by <entity>
| where Count > <threshold>
| project <entity fields for mapping>

Pattern 2: Rare event detection. Alert when a specific event type occurs that should never (or almost never) happen. Used for: security log cleared, honeytoken activation, admin role assigned to unexpected user.

// Pattern: filter for the specific event, no aggregation needed
TableName
| where TimeGenerated > ago(15m)
| where <specific event condition>
| project <entity fields>

Pattern 3: Sequence detection. Alert when event A is followed by event B within a time window. Used for: brute-force-then-success, privilege escalation chains, phishing-then-inbox-rule.

// Pattern: join events with temporal correlation
let EventA = TableName
    | where <first event condition>
    | project ATime = TimeGenerated, <shared entity>;
let EventB = TableName
    | where <second event condition>
    | project BTime = TimeGenerated, <shared entity>;
EventA
| join kind=inner (EventB) on <shared entity>
| where BTime > ATime
| where datetime_diff('minute', BTime, ATime) < 60

Pattern 4: Baseline deviation. Alert when current activity deviates significantly from historical norms. Used for: unusual data access volume, first-time access to sensitive resource, abnormal process execution.

// Pattern: compare current to historical baseline
let Baseline = TableName
| where TimeGenerated between (ago(30d) .. ago(1d))
| summarize AvgDaily = count() / 29.0 by <entity>;
let Current = TableName
| where TimeGenerated > ago(1d)
| summarize TodayCount = count() by <entity>;
Current
| join kind=inner (Baseline) on <entity>
| where TodayCount > AvgDaily * 3
| project <entity>, TodayCount, AvgDaily = round(AvgDaily, 1),
    DeviationMultiple = round(TodayCount / AvgDaily, 1)

Pattern 5: Watchlist/TI match. Alert when an observed indicator matches a threat intelligence or watchlist entry. Used for: known malicious IP, domain, or file hash detection.

// Pattern: join observed data with threat intelligence
let ThreatIPs = _GetWatchlist('HighRiskIPs') | project IPAddress = SearchKey;
SigninLogs
| where TimeGenerated > ago(15m)
| where IPAddress in (ThreatIPs)
| project TimeGenerated, UserPrincipalName, IPAddress, AppDisplayName, ResultType

Testing rules before deployment

Never deploy a rule directly to production without testing. A poorly written rule generates false positives that overwhelm the SOC queue, or false negatives that miss real threats.

Test step 1: Run the query manually. Paste the KQL query into the Logs blade and run it against historical data. Does it return results? Are the results true positives (real threats or realistic test data)?

Test step 2: Estimate alert volume. Run the query against 7 days of data with the production threshold. How many alerts would it have generated per day? If the answer is more than 10 per day for a single rule, the threshold needs tuning or the query needs refinement.
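
One way to estimate volume is to re-run the detection logic bucketed by day. A rough sketch for the brute-force rule (it counts threshold breaches per IP per hourly window and skips the success-join, so it slightly overestimates):

```kql
// Approximate daily alert volume: one would-be alert per IP per hourly window
let FailureThreshold = 20;
SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType != "0"
| summarize FailureCount = count() by IPAddress, Hour = bin(TimeGenerated, 1h)
| where FailureCount > FailureThreshold
| summarize WouldBeAlerts = count() by Day = bin(Hour, 1d)
| order by Day asc
```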

Test step 3: Verify entity mapping. Check that the output columns match the entity mapping you will configure. If you want to map an Account entity, the query must output a column with the user principal name or account name.
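
A quick way to verify is to end the test query with an explicit project whose column names match the identifiers you plan to map. A sketch:

```kql
// Output columns aligned with the entity mappings to be configured in the wizard
SigninLogs
| where TimeGenerated > ago(1h)
| where ResultType != "0"
| project TimeGenerated,
    UserPrincipalName,    // Account entity (FullName identifier)
    IPAddress,            // IP entity (Address identifier)
    AppDisplayName, ResultDescription
```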

Test step 4: Deploy in simulation mode. Set the rule to “Disabled” initially. Review the query results manually for one week. If the false positive rate is acceptable, enable the rule. If not, tune the query or threshold before enabling.


Additional production detection examples

Impossible travel detection (location-based):

// Detect sign-ins from geographically impossible locations within a short window
SigninLogs
| where TimeGenerated > ago(1h)
| where ResultType == "0"
| project TimeGenerated, UserPrincipalName, IPAddress, Location,
    Latitude = toreal(LocationDetails.geoCoordinates.latitude),
    Longitude = toreal(LocationDetails.geoCoordinates.longitude)
| sort by UserPrincipalName asc, TimeGenerated asc
| extend PrevLat = prev(Latitude), PrevLon = prev(Longitude),
    PrevTime = prev(TimeGenerated), PrevUser = prev(UserPrincipalName)
| where UserPrincipalName == PrevUser
| extend Distance_km = geo_distance_2points(Longitude, Latitude, PrevLon, PrevLat) / 1000
| extend TimeDiff_hours = datetime_diff('minute', TimeGenerated, PrevTime) / 60.0
| where TimeDiff_hours > 0  // Avoid divide-by-zero for same-minute sign-ins
| extend Required_speed_kmh = Distance_km / TimeDiff_hours
| where Required_speed_kmh > 1000 and Distance_km > 500
| project TimeGenerated, UserPrincipalName, Location, IPAddress,
    Distance_km = round(Distance_km, 0),
    TimeDiff_hours = round(TimeDiff_hours, 2),
    Required_speed_kmh = round(Required_speed_kmh, 0)

This detects sign-ins that would require travelling faster than 1000 km/h — physically impossible. The generous speed threshold reduces false positives from VPN use, which can make the same user appear to sign in from two distant locations in quick succession; raise it further if VPN-related noise persists.
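
If VPN noise persists, one option is to exclude known corporate egress IPs before the travel calculation. A sketch, assuming a watchlist named 'CorpVpnEgressIPs' (the name is illustrative):

```kql
// Exclude known corporate VPN egress IPs (watchlist name is an assumption)
let VpnIPs = _GetWatchlist('CorpVpnEgressIPs') | project IPAddress = SearchKey;
SigninLogs
| where TimeGenerated > ago(1h)
| where ResultType == "0"
| where IPAddress !in (VpnIPs)
// ...then continue with the impossible-travel logic above
```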

New admin role assignment:

// Detect when a user is assigned a privileged Entra ID role
AuditLogs
| where TimeGenerated > ago(15m)
| where OperationName has "Add member to role"
| where Result == "success"
| mv-apply Prop = TargetResources[0].modifiedProperties on (
    where tostring(Prop.displayName) == "Role.DisplayName"
    | project RoleName = trim('"', tostring(Prop.newValue))
)
| where RoleName has_any ("Global Administrator", "Exchange Administrator",
    "SharePoint Administrator", "Security Administrator",
    "Privileged Role Administrator", "Application Administrator")
| extend TargetUser = tostring(TargetResources[0].userPrincipalName)
| extend InitiatedBy = tostring(InitiatedBy.user.userPrincipalName)
| project TimeGenerated, InitiatedBy, TargetUser, RoleName, IPAddress

Privileged role assignment outside of a PIM workflow may indicate an attacker escalating privileges with a compromised admin account.


Rule naming conventions

Consistent naming enables quick identification in the Analytics blade, workbooks, and automation rule conditions.

Recommended format: [Priority]-[Attack Stage]-[Description]

Examples:

  • P1-InitialAccess-BruteForceSuccessFromExternalIP
  • P1-Persistence-InboxRuleForwardingToExternal
  • P2-LateralMovement-RDPFromUnusualSource
  • P2-PrivilegeEscalation-NewGlobalAdminAssignment
  • P3-Discovery-ADReconQueryBurst

The priority prefix (P1/P2/P3) enables automation rule conditions: “If analytics rule name starts with P1 → assign to senior analyst AND run containment playbook.” The attack stage prefix enables workbook grouping by kill chain phase.


Schedule optimisation: balancing detection latency with resource consumption

Every rule execution consumes workspace query resources. A workspace with 200 rules running every 5 minutes executes 2,400 queries per hour — which can cause query throttling and increased latency for all rules.

High-priority rules (P1): 5-minute schedule. Brute-force success, security log cleared, honeytoken activation, high-confidence TI match. These justify the resource cost for fast detection.

Standard rules (P2): 15-60 minute schedule. Inbox rule creation, suspicious process execution, unusual admin activity. These threats develop over minutes to hours — a 15-minute detection delay is acceptable.

Low-priority rules (P3): 1-4 hour schedule. Policy violations, posture deviations, informational anomalies. These are not time-critical and benefit from less frequent execution.

Daily rules: 24-hour schedule. Compliance checks, baseline reports, dormant account detection. Run once per day during off-peak hours to minimise resource impact.


Monitoring rule execution health

After deployment, monitor whether your rules are executing successfully.

// Analytics rule execution health - last 7 days
SentinelHealth
| where TimeGenerated > ago(7d)
| where SentinelResourceType == "Analytic rule"
| summarize
    Executions = count(),
    Successes = countif(Status == "Success"),
    Failures = countif(Status != "Success")
    by SentinelResourceName
| extend SuccessRate = round(Successes * 100.0 / Executions, 1)
| where SuccessRate < 100
| order by SuccessRate asc

Rules with less than 100% success rate need investigation. Common failures: the KQL query times out (optimise the query or increase the timeout), the query references a table that does not exist or has no data (connector was disconnected), or a KQL syntax error was introduced during editing.

Query timeout handling. Scheduled rules have a maximum execution time. If your KQL query is complex (multiple joins, large time windows, many tables), it may time out. Optimise by: adding time filters as early as possible in the query (where TimeGenerated > ago(1h) as the first operator), reducing join complexity (materialise intermediate results with let statements), and narrowing the lookback window.
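
Materialising a shared intermediate result is often the biggest win when a query reads the same table on both sides of a join. A sketch, restructuring the brute-force pattern:

```kql
// Read SigninLogs once, reuse the cached result on both sides of the join
let RecentSignins = materialize(
    SigninLogs
    | where TimeGenerated > ago(1h)   // time filter first
    | project TimeGenerated, UserPrincipalName, IPAddress, ResultType);
RecentSignins
| where ResultType != "0"
| summarize FailureCount = count() by IPAddress
| where FailureCount > 20
| join kind=inner (
    RecentSignins
    | where ResultType == "0"
) on IPAddress
```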


KQL query performance for analytics rules

Analytics rules run automatically — you cannot watch them execute and optimise in real time. Write efficient KQL from the start.

Performance pattern 1: Filter early. Place where TimeGenerated > ago(...) and other filters at the beginning of the query. This reduces the data volume that subsequent operators process.
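
The inbox-rule detection earlier in this subsection follows this shape; the contrast looks like:

```kql
// Anti-pattern: parse every row, then filter
//   CloudAppEvents | extend P = parse_json(RawEventData) | where TimeGenerated > ago(1h) | ...
// Preferred: narrow by time and cheap predicates first, parse only the survivors
CloudAppEvents
| where TimeGenerated > ago(1h)                 // time filter first
| where ActionType == "New-InboxRule"           // cheap equality filter next
| extend RuleParams = parse_json(RawEventData)  // expensive parse last
```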

Performance pattern 2: Materialise lookups. If your query joins with a watchlist or a small lookup table, materialise it first with let:

let ThreatIPs = materialize(_GetWatchlist('ThreatIPs')
    | project IPAddress = SearchKey);
SigninLogs
| where TimeGenerated > ago(15m)
| where IPAddress in (ThreatIPs)

The materialize function evaluates the watchlist once and caches the result — avoiding re-evaluation for each row in SigninLogs.

Performance pattern 3: Avoid unnecessary columns. Use project to select only the columns needed for the detection and entity mapping. Processing fewer columns = faster execution.

Performance pattern 4: Limit join cardinality. If joining two large tables, add filters to both sides before the join to reduce the number of rows being matched.
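
A sketch of the brute-force join with both sides reduced before matching:

```kql
// Pre-filter and aggregate each side so the join matches few rows
let FailedByIP = SigninLogs
    | where TimeGenerated > ago(1h)
    | where ResultType != "0"
    | summarize FailureCount = count() by IPAddress
    | where FailureCount > 20;                  // left side already thresholded
let Successes = SigninLogs
    | where TimeGenerated > ago(1h)
    | where ResultType == "0"
    | project TimeGenerated, IPAddress, UserPrincipalName;  // right side narrowed
FailedByIP
| join kind=inner (Successes) on IPAddress
```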


The starter rule library: 10 rules every Sentinel deployment needs

Deploy these 10 rules as the minimum viable detection library. They cover the most common attack techniques with well-tested KQL.

  1. Brute-force followed by success (T1110) — SigninLogs. Schedule: 15 min.
  2. Inbox rule forwarding to external (T1114.003) — CloudAppEvents. Schedule: 15 min.
  3. Security log cleared (T1070.001) — SecurityEvent. NRT rule.
  4. New Global Admin assignment (T1098.003) — AuditLogs. Schedule: 15 min.
  5. Suspicious PowerShell execution (T1059.001) — DeviceProcessEvents. Schedule: 15 min.
  6. Impossible travel (T1078) — SigninLogs. Schedule: 1 hour.
  7. Mass file download (T1530) — CloudAppEvents. Schedule: 1 hour.
  8. TI match on sign-in IP (Custom) — SigninLogs + Watchlist. Schedule: 15 min.
  9. OAuth consent grant to risky app (T1550.001) — AuditLogs. Schedule: 15 min.
  10. MFA fatigue/push spam (T1621) — SigninLogs. Schedule: 5 min.

Each rule maps to a specific MITRE technique, uses one of the five query patterns from earlier in this subsection, and includes entity mappings for Account and IP. Together, these 10 rules provide baseline detection coverage for the most common M365 attack vectors. Expand from this foundation using the detection engineering lifecycle (subsection 9.11).


Two of the starter rules deserve detailed KQL because they detect increasingly common attack techniques.

MFA fatigue / push spam detection (T1621):

// Detect MFA push spam - many MFA challenges in a short window
SigninLogs
| where TimeGenerated > ago(10m)
| where ResultType == "50074"  // MFA required
    or ResultType == "500121"  // MFA challenge failed
| summarize
    MFAChallenges = count(),
    UniqueApps = dcount(AppDisplayName),
    FirstChallenge = min(TimeGenerated),
    LastChallenge = max(TimeGenerated)
    by UserPrincipalName, IPAddress
| where MFAChallenges > 5
| extend DurationMinutes = datetime_diff('minute', LastChallenge, FirstChallenge)
| where DurationMinutes < 10  // 5+ challenges in under 10 minutes
// Check for subsequent successful MFA (user gave in to fatigue)
| join kind=leftouter (
    SigninLogs
    | where TimeGenerated > ago(10m)
    | where ResultType == "0"
    | where AuthenticationRequirement == "multiFactorAuthentication"
    | project SuccessTime = TimeGenerated, UserPrincipalName,
        SuccessIP = IPAddress, AppDisplayName
) on UserPrincipalName
| project UserPrincipalName, IPAddress, MFAChallenges,
    DurationMinutes, SuccessTime, SuccessIP, AppDisplayName

This detects the increasingly common MFA fatigue attack: an attacker with a valid password sends repeated MFA push notifications hoping the user will eventually approve out of frustration. The join checks whether the user subsequently approved — indicating successful compromise.

Suspicious OAuth consent grant (T1550.001):

// Detect consent to applications requesting high-risk permissions
AuditLogs
| where TimeGenerated > ago(15m)
| where OperationName == "Consent to application"
| where Result == "success"
| extend AppName = tostring(TargetResources[0].displayName)
| extend ConsentedBy = tostring(InitiatedBy.user.userPrincipalName)
| mv-apply Prop = TargetResources[0].modifiedProperties on (
    where tostring(Prop.displayName) == "ConsentAction.Permissions"
    | project Permissions = tostring(Prop.newValue)
)
| where Permissions has_any ("Mail.Read", "Mail.ReadWrite",
    "Files.ReadWrite.All", "Directory.ReadWrite.All",
    "User.ReadWrite.All", "MailboxSettings.ReadWrite")
| project TimeGenerated, ConsentedBy, AppName, Permissions,
    IPAddress = tostring(InitiatedBy.user.ipAddress)

OAuth consent phishing tricks users into granting a malicious application access to their mailbox, files, or directory. The rule detects consent grants for high-risk permissions — the same permissions an attacker needs for BEC and data exfiltration.


Rule deployment pre-flight checklist

Before enabling any new analytics rule, verify every item.

KQL validation: Query runs without errors in the Logs blade. Results contain the expected columns for entity mapping. No hardcoded values that should be parameters or watchlist lookups.

Schedule and lookback: Lookback >= schedule interval (no detection gaps). Schedule appropriate for detection urgency (P1: 5 min, P2: 15-60 min, P3: 1-4 hr).

Threshold: Tested against 7-30 days of historical data. Expected daily alert volume is manageable (< 10/day for most rules). Threshold set above the baseline noise level.

Entity mapping: At least Account and IP mapped (for identity detections). Entity column names match the KQL output exactly. Entity identifiers use the correct format (FullName for UPN, Address for IP).

Custom details: At least 3 context fields added. Dynamic alert title configured. Dynamic severity configured if applicable.

MITRE ATT&CK: At least one technique mapped. Tactic is appropriate for the detection stage.

Alert grouping: Entity-based grouping configured for rules that can fire multiple times per entity. Time window set appropriately.

Naming: Follows the [Priority]-[Stage]-[Description] convention. Name is searchable and unambiguous.

Documentation: Rule description explains what it detects, why it matters, and what the analyst should do. Version history tracked (in description or Git).

Try it yourself

Create a scheduled analytics rule using the brute-force detection query above. Set the schedule to every 15 minutes with a 1-hour lookback. Set the threshold to FailureThreshold = 5 (lower for a lab with limited activity). Map entities: IP (IPAddress), Account (CompromisedAccount). Enable alert grouping by IP entity. Run the rule and verify it executes without errors. If your lab has sign-in activity, generate a few failed sign-ins followed by a success to trigger the rule.

What you should observe

The rule appears in the Active rules list with status "Enabled." After the first execution cycle (15 minutes), check the rule's last run time and result count in the Analytics blade. If the rule matched events, an incident appears in the incident queue with the mapped entities (IP and Account) visible in the incident details.


Knowledge check

Check your understanding

1. Your scheduled rule runs every 5 minutes with a 1-hour lookback. What operational issue does this create?

  • Events in the overlap window are evaluated multiple times, potentially generating duplicate alerts
  • A detection gap of 55 minutes
  • The rule costs too much to run
  • No issue — this is the recommended configuration

Answer: events in the 55-minute overlap window are evaluated multiple times, potentially generating duplicate alerts. Each execution evaluates the last hour of data, but the rule runs every 5 minutes — so an event from 30 minutes ago is evaluated 6 times before it exits the lookback window. If alert grouping is not configured (each event creates a separate alert), the same event can generate multiple incidents. Fix: set the lookback equal to or slightly larger than the schedule interval (5-10 minutes for a 5-minute schedule), or use alert grouping to deduplicate.