0.6 Lab Setup: Sample Data and Validation
Figure 0.6 — Operational workflow from input through documented output.
Figure — Lab Setup: Sample Data and Validation. Applied to security operations at Northgate Engineering.
A lab environment with 6 test users and no activity generates minimal security data. To practice the investigation techniques in this course, you need realistic data. This subsection shows you how to generate it.
Generating sign-in data
The simplest way to populate your SigninLogs and AADNonInteractiveUserSignInLogs tables is to sign in as your test users.
Step 1: Open an InPrivate/Incognito browser window.
Step 2: Navigate to portal.office.com and sign in as j.morrison@yourdomain.onmicrosoft.com.
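To confirm that the test sign-in reached the workspace, a query along these lines can be run in the Sentinel Logs blade. This is a minimal sketch: the one-hour window and the grouping columns are illustrative choices, and new sign-ins can take several minutes to appear.

```kql
// Sketch: look for recent sign-ins by the test user in both sign-in tables.
// isfuzzy=true lets the query run even if one table does not exist yet.
union isfuzzy=true SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(1h)                      // assumed window; widen if nothing is returned
| where UserPrincipalName startswith "j.morrison@"   // matches the test user in any tenant domain
| summarize SignIns = count(), LastSeen = max(TimeGenerated) by Type, AppDisplayName, ResultType
| sort by LastSeen desc
```

A ResultType of 0 indicates a successful sign-in. Rows with Type equal to AADNonInteractiveUserSignInLogs come from token refreshes and background authentication rather than from the browser prompt.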
Installing the Microsoft 365 sample data packs
If you used the Instant Sandbox (Option 1), sample data is pre-installed. For Options 2 and 3, you can install sample data packs from the Developer Program dashboard, but this requires Developer Program membership.
Alternatively, generate your own:
Email data: Send 20-30 test emails between your users over several days. Include emails with links (to generate EmailUrlInfo data), emails with attachments (to generate EmailAttachmentInfo data), and external emails from your personal account (to generate external sender patterns).
Process data: If you onboard a device to Defender for Endpoint (covered in Module 4), every application you open, every PowerShell command you run, and every network connection generates DeviceProcessEvents and DeviceNetworkEvents. For lab purposes, a single Windows VM or your own workstation onboarded to the developer tenant provides rich data.
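Once some test activity exists, a single query can show which of the tables named above have received rows. This is a sketch under the assumption that the Defender XDR connector is streaming these tables into the Sentinel workspace. The seven-day window is an arbitrary choice.

```kql
// Sketch: row counts for the email and device tables mentioned above.
// isfuzzy=true skips tables that have not been created in the workspace yet.
union isfuzzy=true withsource=TableName
    EmailEvents, EmailUrlInfo, EmailAttachmentInfo,
    DeviceProcessEvents, DeviceNetworkEvents
| where TimeGenerated > ago(7d)
| summarize Rows = count(), LastEvent = max(TimeGenerated) by TableName
| sort by Rows desc
```

A table that is missing from the result has no data in the window. For the device tables, that usually means no device has been onboarded to Defender for Endpoint yet.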
Content Hub solutions
Sentinel Content Hub provides pre-built detection rules, workbooks, and playbooks that populate your workspace with useful configurations.
Step 1: In Sentinel, navigate to Content management → Content Hub.
Step 2: Install these solutions (click each → Install):
- Microsoft 365 — provides analytics rules for M365 threats
- Microsoft Entra ID — provides sign-in analytics and workbooks
- Microsoft Defender XDR — provides incident correlation rules
- UEBA Essentials — provides behavioral analytics
These do not generate data, but they give you pre-built analytics rules and workbooks that you will explore and modify in Modules 7 and 9.
Final validation
Run these queries to confirm your lab environment is ready:
Check available tables:
search *
| where TimeGenerated > ago(7d)
| summarize EventCount = count(), LastEvent = max(TimeGenerated) by Type
| where EventCount > 0
| sort by EventCount desc
Check sign-in data:
SigninLogs
| where TimeGenerated > ago(7d)
| take 5
| project TimeGenerated, UserPrincipalName, IPAddress, Location, ResultType
Check email data:
EmailEvents
| where TimeGenerated > ago(7d)
| take 5
| project TimeGenerated, SenderFromAddress, RecipientEmailAddress, Subject, DeliveryAction
Environment summary
At this point, your lab environment includes:
| Component | Status |
|---|---|
| M365 E5 tenant | Active with 6+ test users |
| Microsoft Defender XDR portal | Accessible at security.microsoft.com |
| Microsoft Entra admin center | Accessible at entra.microsoft.com |
| Azure subscription | Active with free credit |
| Log Analytics workspace | Created and configured |
| Microsoft Sentinel | Enabled on workspace |
| Defender XDR data connector | Connected with event tables |
| Entra ID data connector | Connected for sign-in and audit logs |
| Content Hub solutions | Installed (M365, Entra, Defender XDR, UEBA) |
| Test users | 6+ users with E5 licenses and some activity |
You are ready for Module 1. Every hands-on exercise in the course builds on this environment.
Detection depth: NE-specific implementation
This detection rule addresses a technique that directly threatens NE's operational environment. The implementation accounts for NE's specific infrastructure characteristics:
Telemetry source: The primary data table for this detection ingests approximately 0.5-3.2 GB/day depending on the activity volume. At NE's scale (810 users, 865 devices, 42 servers), the event volume generates a stable baseline that statistical detection methods (percentile analysis from DE9.4) can reliably characterize. Deviations from this baseline represent either environmental changes (new applications, infrastructure modifications) or attacker activity.
Threshold calibration: The threshold was selected using the percentile method: P99 of 30-day historical data establishes the upper bound of normal activity. The production threshold is set at 1.5x P99 to provide margin above normal fluctuation while maintaining detection sensitivity for attack patterns that typically generate 5-50x normal volume.
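The percentile method described above can be sketched in KQL as follows. The source does not name the telemetry table or the aggregation interval, so SigninLogs and the hourly bin are assumptions chosen only for illustration. The 30-day lookback and the 1.5x multiplier come from the text.

```kql
// Sketch: derive a threshold as 1.5x the P99 of 30 days of activity.
// Assumptions: SigninLogs stands in for the unnamed telemetry table,
// and activity is measured as events per hour.
let lookback = 30d;
let multiplier = 1.5;
SigninLogs
| where TimeGenerated > ago(lookback)
| summarize HourlyCount = count() by bin(TimeGenerated, 1h)
| summarize P99 = percentile(HourlyCount, 99)
| extend Threshold = multiplier * P99
```

For example, if P99 comes out at 200 events per hour, the threshold is 300. Hours with no events produce no bin, so on a quiet lab tenant the percentile is computed over active hours only.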
Validating the data pipeline
After loading sample data and enabling connectors, validate that data is flowing correctly before starting Module 1. Run these verification queries in the Sentinel Logs blade:
The first query checks that sign-in data is present: query SigninLogs with a count for the last 24 hours. If the result is zero, the Entra ID connector is not configured correctly or has not had time to ingest (allow 15-30 minutes after connector enablement).
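A minimal form of the sign-in check could look like this. The LastSignIn column is an addition for convenience and is not required by the check itself.

```kql
// Sketch: count sign-in events ingested in the last 24 hours.
SigninLogs
| where TimeGenerated > ago(24h)
| summarize SignInCount = count(), LastSignIn = max(TimeGenerated)
```

A SignInCount of zero points to the connector or to ingestion delay, as described above.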
The second query checks device data: query DeviceProcessEvents for the last 24 hours. This table requires Defender for Endpoint onboarding — if no devices are onboarded in the developer tenant, this table will be empty. Onboard at least one device (a test VM or your workstation connected to the developer tenant) to populate this table.
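A minimal form of the device check could look like this. Grouping by DeviceName is an illustrative choice that shows which onboarded devices are reporting.

```kql
// Sketch: count process events per device over the last 24 hours.
DeviceProcessEvents
| where TimeGenerated > ago(24h)
| summarize ProcessEvents = count(), LastEvent = max(TimeGenerated) by DeviceName
| sort by ProcessEvents desc
```

An empty result means no onboarded device has reported process events in the window.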
Synthetic data limitations
Sample data packs provide realistic-looking user accounts, email activity, and device events. They do not provide: realistic alert volumes (sample environments generate fewer alerts than production), realistic baseline patterns (800 synthetic users do not behave like 800 real employees), or realistic false positive ratios (sample data is cleaner than production data). These limitations mean that detection rules tuned in the lab environment will require re-tuning when deployed to production.
The NE lab data — available as an alternative through the data generator in tools/lab-data-generator — addresses these limitations by providing 90 days of statistically realistic security data with embedded attack chains. The interactive labs embedded in each module provide the third option: practice investigation and triage decisions directly in the browser without any external data source. All three approaches work. The course content shows the query and the expected output regardless of which data source you use.
The three data paths
This course supports three data paths, each with trade-offs:
Path 1 — Interactive labs (zero setup). The embedded labs in each module provide scenario-based practice directly in your browser. Parameter sandboxes let you tune detection thresholds. Alert simulators present triage queues. Investigation engines walk you through multi-step investigations. This path requires no external environment and is sufficient to complete the course.
Path 2 — Developer tenant with sample data (30 minutes setup). The M365 E5 developer tenant with sample data packs provides a full Sentinel workspace where you can write and execute queries. The data is synthetic but structurally identical to production data. This path provides the query execution experience — you write KQL and see results.
The myth: Default security settings are sufficient
The reality: Microsoft's security defaults provide a baseline — MFA for admins, blocking legacy authentication. But defaults do not configure: conditional access policies tailored to your risk profile, Defender for Office 365 anti-phishing policies for your specific impersonation targets, custom detection rules for your environment, or data loss prevention policies for your sensitive data. Defaults prevent the easiest attacks. Custom configuration prevents the attacks targeting YOUR organization.
You manage NE's M365 security stack. Microsoft releases a new Defender feature in preview. The feature promises to reduce AiTM risk by 80%. Do you enable it immediately?
Not in production. Enable in a test tenant or for a pilot group first. Preview features may: change behavior before GA, have undocumented interactions with existing CA policies, or produce unexpected results in specific tenant configurations. The deployment sequence: (1) enable in a test tenant and validate against NE's CA policy set, (2) enable for a pilot group of 10 users for 2 weeks, (3) monitor for FPs and operational impact, (4) roll out to all users after successful pilot. Microsoft's '80% reduction' claim is based on their telemetry across all tenants — NE's specific configuration may produce different results.
Try it: Validate your lab environment
Complete the lab setup steps described in this subsection. Verify: (1) you can sign in to your M365 tenant, (2) the Sentinel workspace is accessible, (3) at least one data connector shows 'Connected' status, and (4) a test KQL query returns results. Screenshot the successful query result; it confirms your lab is ready for the course exercises.