0.6 Lab Setup: Sample Data and Validation

45 minutes · Module 0 · Free
Operational Objective
This subsection covers lab setup: sample data and validation — a core operational skill for security teams working in Microsoft 365 environments. Every concept is demonstrated through practical scenarios from the Northgate Engineering environment.
Deliverable: Working proficiency with the techniques and operational patterns covered in this subsection.

Figure 0.6 — Operational workflow from input through documented output.


Lab Setup: Sample Data and Validation

A lab environment with 6 test users and no activity generates minimal security data. To practice the investigation techniques in this course, you need realistic data. This subsection shows you how to generate it.

Generating sign-in data

The simplest way to populate your SigninLogs and AADNonInteractiveUserSignInLogs tables is to sign in as your test users.

Step 1: Open an InPrivate/Incognito browser window.

Step 2: Navigate to portal.office.com and sign in as j.morrison@yourdomain.onmicrosoft.com.


Step 3: Browse Outlook, open a few emails (or send test emails between users), open a SharePoint site, send a Teams message. Each action generates audit log events.

Step 4: Repeat for 2-3 other test users over the course of a few days. This builds a baseline of normal activity that you can compare against during investigation exercises.
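Once a few days of activity have accumulated, a quick KQL sketch in the Sentinel Logs blade shows the baseline taking shape. This assumes the Entra ID connector is already feeding SigninLogs; adjust the time window to match how long you have been generating activity.

```kql
// Sketch: daily sign-in counts per test user over the last 7 days.
// A stable per-user pattern here is the "normal" you will compare against later.
SigninLogs
| where TimeGenerated > ago(7d)
| summarize SignIns = count() by UserPrincipalName, bin(TimeGenerated, 1d)
| sort by TimeGenerated asc
```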

Generating "suspicious" activity for investigation practice

To simulate the anomalous sign-ins you will investigate in later modules, sign in as a test user from a different browser, a VPN connection (if you have one), or your mobile phone. This creates a sign-in from a different IP address and potentially different location — the pattern you will learn to detect in Module 1 and investigate in Module 12. Do NOT do this on a production tenant.
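One way to see the resulting anomaly in your own data — a sketch for lab verification, not a production detection — is to list the distinct sign-in IP addresses per user:

```kql
// Sketch: users signing in from more than one IP address in the last 7 days.
// The test user you signed in via VPN/mobile should surface with multiple IPs.
SigninLogs
| where TimeGenerated > ago(7d)
| summarize DistinctIPs = dcount(IPAddress), IPs = make_set(IPAddress) by UserPrincipalName
| where DistinctIPs > 1
```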

Installing the Microsoft 365 sample data packs

If you used the Instant Sandbox (Option 1), sample data is pre-installed. For Options 2 and 3, you can install sample data packs from the Developer Program dashboard, but this requires Developer Program membership.

Alternatively, generate your own:

Email data: Send 20-30 test emails between your users over several days. Include emails with links (to generate EmailUrlInfo data), emails with attachments (to generate EmailAttachmentInfo data), and external emails from your personal account (to generate external sender patterns).
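After the test emails have had time to ingest, you can confirm the related tables populated — a quick sketch using the standard Defender XDR email tables:

```kql
// Sketch: confirm email-related tables received the test traffic.
// EmailUrlInfo and EmailAttachmentInfo only populate for emails
// that actually contained links or attachments.
union EmailEvents, EmailUrlInfo, EmailAttachmentInfo
| where TimeGenerated > ago(7d)
| summarize Events = count() by Type
```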

Process data: If you onboard a device to Defender for Endpoint (covered in Module 4), every application you open, every PowerShell command you run, and every network connection generates DeviceProcessEvents and DeviceNetworkEvents. For lab purposes, a single Windows VM or your own workstation onboarded to the developer tenant provides rich data.
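Once a device is onboarded, a sketch like this confirms process telemetry is flowing (DeviceProcessEvents is the standard Defender for Endpoint table):

```kql
// Sketch: most frequently launched processes on onboarded devices, last 24 hours.
// An empty result means no device is onboarded yet or ingestion has not started.
DeviceProcessEvents
| where TimeGenerated > ago(24h)
| summarize Launches = count() by DeviceName, FileName
| top 10 by Launches
```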

Content Hub solutions

Sentinel Content Hub provides pre-built detection rules, workbooks, and playbooks that populate your workspace with useful configurations.

Step 1: In Sentinel, navigate to Content management → Content Hub.

Step 2: Install these solutions (click each → Install):

  • Microsoft 365 — provides analytics rules for M365 threats
  • Microsoft Entra ID — provides sign-in analytics and workbooks
  • Microsoft Defender XDR — provides incident correlation rules
  • UEBA Essentials — provides behavioral analytics

These do not generate data, but they give you pre-built analytics rules and workbooks that you will explore and modify in Modules 7 and 9.

Final validation

Run these queries to confirm your lab environment is ready:

Check available tables:

search *
| where TimeGenerated > ago(7d)
| summarize EventCount = count(), LastEvent = max(TimeGenerated) by Type
| where EventCount > 0
| sort by EventCount desc

You should see at minimum: SigninLogs (or AADSignInEventsBeta), AuditLogs, and some Defender tables if the connector has been active for a few days.

Check sign-in data:

SigninLogs
| where TimeGenerated > ago(7d)
| take 5
| project TimeGenerated, UserPrincipalName, IPAddress, Location, ResultType

If this returns results, your Entra ID connector is working and you have sign-in data to investigate.

Check email data (if you generated test emails):

EmailEvents
| where TimeGenerated > ago(7d)
| take 5
| project TimeGenerated, SenderFromAddress, RecipientEmailAddress, Subject, DeliveryAction

Your lab data will be sparse — and that is fine

A 6-user lab will never look like a 500-user production environment. Some course exercises show Expected Output blocks with hundreds of results — your lab may return 5 or 10. The learning is in the query construction and the analysis methodology, not in the volume of data. The course tells you what to look for in each Expected Output block regardless of your result count.

Environment summary

At this point, your lab environment includes:

Component | Status
M365 E5 tenant | Active with 6+ test users
Microsoft Defender XDR portal | Accessible at security.microsoft.com
Microsoft Entra admin center | Accessible at entra.microsoft.com
Azure subscription | Active with free credit
Log Analytics workspace | Created and configured
Microsoft Sentinel | Enabled on workspace
Defender XDR data connector | Connected with event tables
Entra ID data connector | Connected for sign-in and audit logs
Content Hub solutions | Installed (M365, Entra, Defender XDR, UEBA)
Test users | 6+ users with E5 licenses and some activity

You are ready for Module 1. Every hands-on exercise in the course builds on this environment.

Detection depth: NE-specific implementation

This detection rule addresses a technique that directly threatens NE's operational environment. The implementation accounts for NE's specific infrastructure characteristics:

Telemetry source: The primary data table for this detection ingests approximately 0.5-3.2 GB/day depending on the activity volume. At NE's scale (810 users, 865 devices, 42 servers), the event volume generates a stable baseline that statistical detection methods (percentile analysis from DE9.4) can reliably characterize. Deviations from this baseline represent either environmental changes (new applications, infrastructure modifications) or attacker activity.

Threshold calibration: The threshold was selected using the percentile method: P99 of 30-day historical data establishes the upper bound of normal activity. The production threshold is set at 1.5x P99 to provide margin above normal fluctuation while maintaining detection sensitivity for attack patterns that typically generate 5-50x normal volume.
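The percentile method described above can be sketched in KQL. SigninLogs hourly volume is used here as an assumed example; substitute whichever table your detection actually reads:

```kql
// Sketch: derive a 1.5x P99 threshold from 30 days of hourly event counts.
// P99 of the historical distribution bounds normal activity; the 1.5x
// multiplier adds margin above routine fluctuation.
SigninLogs
| where TimeGenerated > ago(30d)
| summarize HourlyCount = count() by bin(TimeGenerated, 1h)
| summarize P99 = percentile(HourlyCount, 99)
| extend ProposedThreshold = 1.5 * P99
```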


False positive profile: The primary FP sources for this detection include: IT administrative activity (legitimate but anomalous-looking operations), automated tools and scripts (scheduled tasks, monitoring agents), and business events (quarterly reporting, annual audits, project deadlines). Each FP source is addressed through the watchlist architecture (DE9.6) — Corporate IPs (WL1), Service Accounts (WL2), IT Admin Accounts (WL3), and Known Applications (WL4) provide systematic exclusion without reducing the rule's detection scope below acceptable levels.

Attack chain integration: This detection maps to one or more of the 6 NE attack chains (CHAIN-HARVEST, CHAIN-MESH, CHAIN-ENDPOINT, CHAIN-FACTORY, CHAIN-PRIVILEGE, CHAIN-DRIFT). When this rule fires, the SOC analyst correlates with adjacent-phase alerts to determine whether the activity is isolated or part of a multi-phase attack. The correlation query from this module's cross-technique subsection provides the KQL pattern for this analysis.

Response procedure: On alert, the analyst: (1) checks the entity against the watchlists — is this a known benign source? (2) checks for correlated alerts from adjacent kill chain phases within 60 minutes, (3) classifies as TP/FP/BTP using the DE9.5 decision tree, and (4) escalates to Rachel if the alert correlates with other phases (potential active attack chain).
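Step (2) — checking for correlated alerts inside a 60-minute window — can be sketched against the SecurityAlert table. The datetime literal is a placeholder; in practice you substitute the triggering alert's own TimeGenerated:

```kql
// Sketch: other alerts within +/- 60 minutes of a triggering alert.
// Replace the datetime literal with the actual alert's TimeGenerated value.
let AlertTime = datetime(2025-01-15T10:00:00Z);
SecurityAlert
| where TimeGenerated between ((AlertTime - 60m) .. (AlertTime + 60m))
| project TimeGenerated, AlertName, AlertSeverity, ProviderName
```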

Validating the data pipeline

After loading sample data and enabling connectors, validate that data is flowing correctly before starting Module 1. Run these three verification queries in the Sentinel Logs blade:

The first query checks that sign-in data is present: query SigninLogs with a count for the last 24 hours. If the result is zero, the Entra ID connector is not configured correctly or has not had time to ingest (allow 15-30 minutes after connector enablement).
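Rendered as KQL, the first check is a simple count — zero means the connector is not ingesting yet:

```kql
// Verification query 1: sign-in events in the last 24 hours.
SigninLogs
| where TimeGenerated > ago(24h)
| count
```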

The second query checks device data: query DeviceProcessEvents for the last 24 hours. This table requires Defender for Endpoint onboarding — if no devices are onboarded in the developer tenant, this table will be empty. Onboard at least one device (a test VM or your workstation connected to the developer tenant) to populate this table.
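The second check follows the same shape:

```kql
// Verification query 2: device process events in the last 24 hours.
// An empty result means no device is onboarded to Defender for Endpoint yet.
DeviceProcessEvents
| where TimeGenerated > ago(24h)
| count
```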


The third query checks email data: query OfficeActivity for Exchange events in the last 24 hours. This requires the Microsoft 365 connector and at least one mailbox in the developer tenant.
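And the third, filtered to the Exchange workload:

```kql
// Verification query 3: Exchange events from the Microsoft 365 connector.
OfficeActivity
| where TimeGenerated > ago(24h)
| where OfficeWorkload == "Exchange"
| count
```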

If all three queries return results, your lab environment is ready for Module 1. If any return zero, troubleshoot that specific connector before proceeding — investigating with missing data sources produces incomplete results and builds wrong habits about which tables to check during an investigation.

The lab environment configuration decisions you make here directly affect the quality of your learning experience in every subsequent module. Invest the setup time now — a properly configured lab with validated data pipelines means every exercise in the course returns real results. A misconfigured lab means debugging infrastructure instead of learning security operations. Verify each connector before moving to the next module.

Synthetic data limitations

Sample data packs provide realistic-looking user accounts, email activity, and device events. They do not provide: realistic alert volumes (sample environments generate fewer alerts than production), realistic baseline patterns (800 synthetic users do not behave like 800 real employees), or realistic false positive ratios (sample data is cleaner than production data). These limitations mean that detection rules tuned in the lab environment will require re-tuning when deployed to production.

The NE lab data — available as an alternative through the data generator in tools/lab-data-generator — addresses these limitations by providing 90 days of statistically realistic security data with embedded attack chains. The interactive labs embedded in each module provide the third option: practice investigation and triage decisions directly in the browser without any external data source. All three approaches work. The course content shows the query and the expected output regardless of which data source you use.

The three data paths

This course supports three data paths, each with trade-offs:

Path 1 — Interactive labs (zero setup). The embedded labs in each module provide scenario-based practice directly in your browser. Parameter sandboxes let you tune detection thresholds. Alert simulators present triage queues. Investigation engines walk you through multi-step investigations. This path requires no external environment and is sufficient to complete the course.

Path 2 — Developer tenant with sample data (30 minutes setup). The M365 E5 developer tenant with sample data packs provides a full Sentinel workspace where you can write and execute queries. The data is synthetic but structurally identical to production data. This path provides the query execution experience — you write KQL and see results.


Path 3 — Production environment (existing access required). If you have Sentinel access in your day job, run every course query against your real data. Replace the NE user names and IP addresses with your environment's equivalents. This path provides the highest learning value because the results are operationally meaningful — you discover real findings in your own environment while learning the investigation technique.

Most learners use Path 1 for modules where the interactive lab covers the concept, and Path 2 or 3 for modules where they want to explore the data further. The course content shows every query with its expected output regardless of which path you choose.

Compliance Myth: "Default security settings are sufficient"

The myth: Default security settings are sufficient

The reality: Microsoft's security defaults provide a baseline — MFA for admins, blocking legacy authentication. But defaults do not configure: conditional access policies tailored to your risk profile, Defender for Office 365 anti-phishing policies for your specific impersonation targets, custom detection rules for your environment, or data loss prevention policies for your sensitive data. Defaults prevent the easiest attacks. Custom configuration prevents the attacks targeting YOUR organization.

You load the sample data pack into your Sentinel workspace. You run a KQL query from Module 6 and get zero results, even though the query works in the course screenshots. What troubleshooting steps should you follow?
  • The sample data is outdated — request an updated pack.
  • Three checks in order: (1) Verify the table exists — run the table name alone (e.g., SigninLogs | take 1). If "table not found," the data connector or ingestion is not configured. (2) Check the time range — the query may use "ago(24h)" but the sample data may have been ingested more than 24 hours ago. Expand to "ago(30d)" to verify data exists. (3) Check the filter values — the query may filter on a specific username or IP that exists in the course's NE data but not in your sample data. Replace the specific values with a broader filter to confirm the query logic works, then adapt to your sample data's values.
  • KQL queries are workspace-specific — the course queries only work in the author's workspace.
  • Clear the browser cache and retry — Sentinel caches query results.
Decision point

You manage NE's M365 security stack. Microsoft releases a new Defender feature in preview. The feature promises to reduce AiTM risk by 80%. Do you enable it immediately?

Not in production. Enable in a test tenant or for a pilot group first. Preview features may: change behavior before GA, have undocumented interactions with existing CA policies, or produce unexpected results in specific tenant configurations. The deployment sequence: (1) enable in a test tenant and validate against NE's CA policy set, (2) enable for a pilot group of 10 users for 2 weeks, (3) monitor for FPs and operational impact, (4) roll out to all users after successful pilot. Microsoft's '80% reduction' claim is based on their telemetry across all tenants — NE's specific configuration may produce different results.

Try it: Validate your lab environment

Complete the lab setup steps described in this subsection. Verify: (1) you can sign in to your M365 tenant, (2) the Sentinel workspace is accessible, (3) at least one data connector shows 'Connected' status, and (4) a test KQL query returns results. Screenshot the successful query result — this confirms your lab is ready for the course exercises.

You've set up your M365 tenant and learned the Defender XDR unified portal.

Module 0 got your M365 developer tenant configured with sample data. Module 1 took you through the Defender XDR unified incident queue across endpoint, email, identity, and cloud apps. Now you investigate every major M365 attack type and deploy the detections that catch them next time.

  • 15 investigation and configuration modules — Defender for Endpoint, Purview, Defender for Cloud, Security Copilot, Sentinel workspace design, log ingestion, analytics rules, and threat hunting
  • 5 named attack investigations — AiTM credential phishing, BEC and financial fraud, consent phishing and OAuth grant abuse, token replay and session hijacking, insider threat
  • KQL from fundamentals through advanced hunting — dedicated modules on query language, cross-table joins, statistical analysis, and threat hunting queries
  • SC-200 exam objectives fully covered — every module maps to the January 2026 SC-200 update. The certification is the side effect of operational competence, not the goal
  • Production artefacts per module — detection rules, investigation playbooks, and hardening checklists you deploy to your own tenant