DE1.12 Common Architecture Mistakes and Anti-Patterns
The eight mistakes
These mistakes recur across organizations and detection engineering teams regardless of experience level. They are not obvious from the KQL — a query can be technically perfect and still produce every one of these failures. The specification template (DE1.11) is designed to prevent them by requiring explicit decisions for each architectural dimension.
Figure DE1.12 — Eight architecture anti-patterns grouped by impact. Each maps to a specific section of the rule specification template that prevents it.
Mistake 1: Lookback shorter than frequency. Creates permanent detection gaps. Covered in DE1.2. Prevention: Spec Section 7 requires explicit lookback and frequency values with the rule: lookback ≥ frequency.
Mistake 2: Missing entity mapping. Produces isolated alerts that cannot correlate. Covered in DE1.4. Prevention: Spec Section 6 requires explicit entity mapping for every identifiable entity in the query output.
Mistake 3: Arbitrary severity assignment. All rules default to Medium because the engineer did not assess confidence × impact. The SOC treats all alerts identically — no prioritization. Prevention: Spec Section 8 requires explicit confidence, impact, and rationale.
Mistake 4: No MITRE ATT&CK mapping. Rules cannot contribute to coverage measurement. Gaps remain invisible. Prevention: Spec Section 3 requires at least one technique ID at sub-technique level.
Mistake 5: All rules at 5-minute frequency. Wastes compute on low-severity rules. May exceed workspace query limits, causing rules to silently fail. Prevention: Spec Section 7 requires frequency rationale aligned to severity (DE1.3).
Mistake 6: No false positive analysis before deployment. The rule hits production and generates 50 FPs per day. Analysts learn to ignore it — creating Layer 3 detection failure (DE0.1). Prevention: Spec Section 10 requires pre-deployment FP estimation from historical data.
Mistake 7: One-per-alert grouping on high-volume rules. A spray rule creates 500 incidents per day. The SOC queue becomes unusable. Prevention: Spec Section 9 requires explicit grouping strategy selection with expected incident volume estimate.
Mistake 8: Automating response on low-confidence rules. A rule with a 60% TP rate automatically disables accounts. 40% of disabled accounts belong to legitimate users. The helpdesk is overwhelmed, and IT loses trust in the security team. Prevention: Spec Sections 10 and 11 require TP rate assessment and automation tier assignment.
Worked example: before and after specification
To make these anti-patterns concrete, consider NE’s existing “Suspicious inbox rule creation” template rule — one of the 23 rules deployed in 2023.
Before (current state — 5 anti-patterns):
The rule is a Microsoft template enabled as-is:
- Query: fires on all New-InboxRule events where the action is “ForwardTo” or “RedirectTo”
- Lookback: 5 hours (template default)
- Frequency: 5 hours (template default — lookback equals frequency, zero overlap)
- Entity mapping: none configured
- Severity: Medium (template default)
- ATT&CK mapping: T1137 (technique-level only, no sub-technique)
- Grouping: one alert per event
- Automation: none
- FP analysis: none performed
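As a rough sketch of what the before-state query amounts to (the exact template query is not reproduced here; the `Operation` and `Parameters` column filters are assumptions about how the template matches forwarding actions):

```kql
// Template-style rule: fires on every forwarding/redirect inbox rule,
// with no risk-signal correlation and no entity mapping.
// The Parameters filter is illustrative, not the exact template logic.
OfficeActivity
| where Operation == "New-InboxRule"
| where Parameters has "ForwardTo" or Parameters has "RedirectTo"
```

Note that nothing in this query distinguishes an attacker-created rule from the organization’s routine legitimate forwarding rules — which is exactly why it generates 3-5 FPs per day.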
Anti-patterns present:
- #1 variant: lookback = frequency (zero overlap — ingestion-delayed events at window boundaries may be missed)
- #2: no entity mapping (alerts do not correlate with sign-in alerts or other identity-layer detections)
- #4 variant: ATT&CK mapped to technique level only, not sub-technique — T1137 instead of T1137.005
- #6: no FP analysis (the rule fires on every legitimate forwarding rule in the organization, approximately 3-5 per day)
- #7: one-per-alert grouping (3-5 incidents per day from legitimate inbox rules, each requiring manual dismissal)
Operational cost of the anti-patterns: 3-5 false positive incidents per day × 15 minutes analyst investigation time per incident = 45-75 minutes of wasted analyst time per day. Over 30 days: 22-37 hours of analyst time spent on a rule that produces zero true positive value in its current form. Additionally, the rule’s reputation with the SOC analysts is destroyed — they close these alerts reflexively, which means the one time the rule fires on an actual attacker-created forwarding rule (CHAIN-HARVEST Phase 3), the analyst closes it without investigation. Layer 3 detection failure.
After (re-engineered with specification):
Section 2 — Hypothesis: “Inbox rule creation from a session with identity risk signals (medium or high risk sign-in within the preceding 60 minutes) indicates attacker persistence, regardless of the rule’s action type (forward, redirect, move, delete).”
Section 5 — KQL: Joins OfficeActivity (inbox rule creation) with SigninLogs (risk signals) on UserPrincipalName within a 60-minute time window. Filters for ALL inbox rule action types — not just forwarding. The risk signal correlation eliminates FPs from legitimate inbox rule creation.
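A hedged sketch of the Section 5 query follows. The table names and the `RiskLevelDuringSignIn`, `UserId`, and `ClientIP` columns are standard Sentinel schema, but the risk-level threshold and exact join shape are illustrative, not the definitive production query:

```kql
// Re-engineered detection sketch: inbox rule creation within 60 minutes
// after a medium/high-risk sign-in by the same user. All rule action
// types are included — the risk correlation, not the action type, is
// what separates attacker persistence from legitimate rules.
let lookback = 65m;
SigninLogs
| where TimeGenerated > ago(lookback)
| where RiskLevelDuringSignIn in ("medium", "high")
| project RiskTime = TimeGenerated, UserPrincipalName = tolower(UserPrincipalName)
| join kind=inner (
    OfficeActivity
    | where TimeGenerated > ago(lookback)
    | where Operation == "New-InboxRule"   // all action types, not just forwarding
    | project RuleTime = TimeGenerated,
              UserPrincipalName = tolower(UserId),
              ClientIPAddress = ClientIP
) on UserPrincipalName
| where RuleTime between (RiskTime .. (RiskTime + 60m))   // risk signal precedes rule creation
```

The `tolower()` normalization on both sides of the join is a defensive choice — UPN casing is not guaranteed to match across the two tables.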
Section 6 — Entity mapping: Account → UserPrincipalName, IP → ClientIPAddress.
Section 7 — Frequency: 15 minutes (High severity). Lookback: 65 minutes (correlation window + 5-minute buffer). Overlap: 50 minutes.
Section 8 — Severity: High (high confidence when correlated with risk signal + significant impact — persistence mechanism for account compromise).
Section 9 — Grouping: by Account entity. All inbox rule alerts for the same user group into one incident.
Section 10 — FP analysis: Historical data test over 30 days shows 2 alerts. Both correlated with legitimate MFA setup scenarios (user changed phone, triggered medium-risk sign-in, then created a legitimate inbox rule). Mitigation: add a 10-minute delay between the risk signal and the inbox rule to filter MFA setup scenarios (users do not create inbox rules within 10 minutes of changing their MFA method). Post-mitigation estimated FP rate: <10%.
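The Section 10 mitigation can be expressed as a single additional filter appended after the OfficeActivity/SigninLogs join described in Section 5 (`RiskTime` and `RuleTime` are illustrative names for the projected sign-in and rule-creation timestamps):

```kql
// FP mitigation: keep only rule creations more than 10 minutes after the
// risk signal, filtering the MFA-setup scenario while preserving the
// 60-minute correlation window. Column names are illustrative.
| where RuleTime between ((RiskTime + 10m) .. (RiskTime + 60m))
```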
Operational result: From 3-5 false positive incidents per day (45-75 min wasted daily) to approximately 2 alerts per month, both requiring investigation (likely true positives). The SOC trusts the rule. When it fires, they investigate.
The cost of not specifying
The cost of architectural mistakes is not theoretical. Each anti-pattern has a measurable operational cost:
Missed detections (#1, #4, #6): The CHAIN-HARVEST walkthrough (DE0.2) demonstrated the ultimate cost — a 4-hour undetected attack resulting in BEC wire fraud. If any of the 5 detection rules had existed with correct architecture, the attack would have been detected within minutes of Phase 1. The lookback gap (#1) means some events are permanently missed. The absent ATT&CK mapping (#4) means the gap is invisible to coverage reporting — nobody knows the rule is missing. The absent FP analysis (#6) means rules that exist are ignored (Layer 3 failure).
SOC overload (#3, #5, #7): NE’s L1 analysts (Tom and Priya) handle approximately 40-60 incidents per day across all rules. If 5 rules each generate 5 false positive incidents per day (due to arbitrary severity, excessive frequency, and one-per-alert grouping), 25 incidents are wasted — 42-62% of the daily volume is noise. The analysts develop coping strategies: close Medium alerts without investigation, batch-dismiss known noisy rules, and prioritize by rule name rather than severity. These coping strategies are rational responses to an irrational rule architecture — but they mean legitimate Medium-severity alerts are also closed without investigation.
Broken correlation (#2): Without entity mapping, CHAIN-HARVEST appears as 3 separate incidents (spray, token theft, inbox rule) rather than 1 correlated incident. The analyst investigating the spray does not see the token theft. The analyst dismissing the inbox rule alert does not know it is connected to the spray. The attacker’s complete activity is visible in the data but invisible in the incident queue because the alerts cannot link.
Trust damage (#8): When automation disables a legitimate user’s account, the impact extends beyond the helpdesk ticket. The affected user tells their team. The team tells their manager. The manager complains to IT. IT complains to the CISO. The CISO asks the detection engineer to turn off the automation. One false positive in an automated containment rule can set the automation program back months.
The NE anti-pattern audit
Northgate Engineering’s 23 existing rules exhibit several of these anti-patterns. This is typical for a Level 1 maturity organization — the rules were created without a specification discipline.
Current state of NE’s rules:
- 8 rules have lookback = frequency (zero overlap, potential ingestion-delay gaps — Mistake #1 variant)
- 14 rules have no entity mapping (61% of rules produce isolated alerts — Mistake #2)
- 19 rules are Medium severity (82% — Mistake #3)
- 9 rules have no ATT&CK mapping (39% invisible to coverage reporting — Mistake #4)
- All 23 rules run at their template default frequency (Mistake #5 variant — not optimized for severity)
- 0 rules have documented FP analysis (100% — Mistake #6)
- Template grouping configurations unchanged (various — not assessed against detection patterns)
The detection engineering program (DE3-DE11) replaces these rules with specification-driven rules that avoid all 8 anti-patterns. The existing rules are not deleted — they are reviewed, and those worth keeping are upgraded to meet the specification standard.
The myth: Testing a query in Advanced Hunting validates the rule for production deployment. If the query returns correct results there, it will produce correct operational behavior as an analytics rule.
The reality: Advanced Hunting runs the query once against all available data with full KQL capability and no time boundary. An analytics rule runs the query on a schedule with a time-bounded lookback, entity mapping, trigger thresholds, severity assignment, and alert grouping. A query that returns 200 correct results in Advanced Hunting might: return 0 results in the analytics rule (lookback too short for the pattern — Mistake #1), generate 200 separate incidents (Mistake #7), produce alerts without entities (Mistake #2), or run successfully for 3 days then start generating FPs as new user behavior enters the baseline (Mistake #6). Advanced Hunting validates the QUERY LOGIC. The specification validates the OPERATIONAL ARCHITECTURE. Both are required.
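One way to partially close this gap during development is to impose the rule’s time boundary on the Advanced Hunting test run yourself. This checks only the time-bounding behavior — entity mapping, grouping, and thresholds still require the specification. The 65-minute value is the worked example’s lookback; substitute your rule’s own:

```kql
// Simulate the scheduled rule's lookback window in Advanced Hunting.
// If the pattern you expect disappears under this constraint, the
// production rule will miss it too (Mistake #1).
OfficeActivity
| where TimeGenerated > ago(65m)
| where Operation == "New-InboxRule"
```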
Try it yourself
Exercise: Anti-pattern audit on your existing rules
Review your 5 most recently created analytics rules against the 8 anti-patterns. Score each rule: how many anti-patterns does it exhibit? The most common findings will be: no entity mapping (#2), arbitrary severity (#3), no FP analysis (#6). If a rule exhibits 3+ anti-patterns, it is a candidate for re-engineering using the specification template.
Check your understanding
A detection rule for "new inbox rule creation" runs every 30 minutes with a 30-minute lookback. It has no entity mapping, Medium severity, no ATT&CK mapping, and triggers one alert per event. In a typical week, it fires 140 times (20 per day). How many of the 8 anti-patterns does this rule exhibit?
Answer: Five anti-patterns. #1 variant: lookback equals frequency (no overlap — ingestion-delayed events may be missed). #2: no entity mapping (alerts cannot correlate with sign-in or identity alerts). #3: Medium severity without rationale (inbox rule creation may warrant High if correlated with risk signals). #4: no ATT&CK mapping (should map to T1137.005 Office Application Startup: Outlook Rules). #7: one-per-alert grouping on a rule that fires 20 times per day (the SOC sees 20 separate incidents instead of grouping by Account entity). The fix: complete the specification template, which addresses all five gaps in sections 7, 6, 8, 3, and 9 respectively.
Troubleshooting: “How do I fix existing rules?”
Prioritize by impact. Not all anti-patterns have equal operational cost. Fix entity mapping (#2) first — it is the highest-impact improvement because it enables incident correlation across all rules. Fix FP analysis (#6) second — identify the noisiest rules and tune them. Fix severity (#3) third — redistribute the SOC’s triage priorities. The remaining patterns improve efficiency but are less operationally urgent.
Do not rewrite all rules at once. Fix 2-3 rules per week during the monthly tuning cadence (DE9). Each fix takes 15-30 minutes — review the rule against the specification template, fill in the missing sections, and update the rule configuration. Over 3 months, all 23 NE rules are upgraded without disrupting SOC operations.
References used in this subsection
- Course cross-references: DE1.2-DE1.11 (each anti-pattern references its prevention subsection), DE9 (tuning cadence for fixing existing rules), DE10 (specification as governance standard)
Detecting common rule mistakes in your workspace
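As a starting point, a hedged audit sketch: the `SecurityIncident` table is the standard Sentinel incidents table, and counting distinct incidents per analytics rule title over 30 days surfaces candidates for Mistakes #6 and #7. The 30-day window and 50-incident threshold are illustrative, not prescriptive:

```kql
// Rough audit: incident volume per analytics rule over 30 days.
// High counts suggest missing FP analysis (#6) or one-per-alert
// grouping (#7). dcount(IncidentNumber) avoids overcounting the
// multiple update rows SecurityIncident stores per incident.
SecurityIncident
| where TimeGenerated > ago(30d)
| summarize Incidents = dcount(IncidentNumber) by Title
| where Incidents > 50
| order by Incidents desc
```

Rules at the top of this list are the ones to run through the specification template first.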