
EI1.11 Building a Sign-In Baseline

60-80 minutes · Module 1 · Free
Operational Objective
How do you know if a sign-in is anomalous if you do not know what normal looks like? A sign-in from Brazil is suspicious for a UK-only company but routine for a company with a São Paulo office. A sign-in at 3 AM is suspicious for a 9-to-5 office worker but normal for a shift-based SOC analyst. Anomaly detection is only as good as the baseline it measures against. This subsection teaches you to build a documented, queryable baseline of normal sign-in behavior for your environment.
Deliverable: A comprehensive sign-in baseline for your environment covering geographic patterns, temporal patterns, device patterns, application usage, and authentication method distribution, stored as queries that can be re-run to detect drift and compared against current activity to identify anomalies.
⏱ Estimated completion: 18 minutes
Figure EI1.11: Security hardening lifecycle (Assess, Benchmark, Remediate, Validate, Monitor), from assessment through continuous monitoring.

Figure: Building a Sign-In Baseline.

Why baselines matter more than rules

Static detection rules use fixed thresholds: "alert if more than 10 failed sign-ins in 5 minutes" or "alert if sign-in from a blocked country." These rules catch known patterns but miss subtle anomalies: a sign-in from Belgium is not from a "blocked country" but may be anomalous if your company has no employees in Belgium.

Baseline-driven detection compares current behavior against established normal patterns. Instead of "alert on sign-in from Russia" (which misses Belgium), the approach is "alert on sign-in from any country this user has never signed in from in the past 30 days." Instead of "alert on more than 10 failed sign-ins" (which misses a slow spray at 5 per hour), the approach is "alert when the hourly failure rate exceeds 200% of the historical average."
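The failure-rate comparison described above can be sketched in KQL. This is a minimal illustration, not a production rule: the 14-day lookback and the `multiplier` value are assumptions chosen for the example, and the `ResultType` filter simply mirrors the convention used by the other queries in this subsection.

```kql
// Sketch: flag hours in the last 24h whose failed sign-in count exceeds
// 200% of the historical hourly average.
// Assumptions: 14-day lookback, 2.0 multiplier. Tune both for your tenant.
let lookback = 14d;
let multiplier = 2.0;
let hourlyFailures = SigninLogs
| where TimeGenerated > ago(lookback)
| where ResultType != 0
| summarize Failures = count() by Hour = bin(TimeGenerated, 1h);
// Historical average excludes the last 24h so the period under test
// does not inflate its own baseline.
// Hours with zero failures produce no row, so this average is computed
// over hours that had at least one failure.
let avgFailures = toscalar(
    hourlyFailures
    | where Hour < ago(1d)
    | summarize avg(Failures));
hourlyFailures
| where Hour >= ago(1d)
| where Failures > multiplier * avgFailures
| extend HistoricalHourlyAvg = avgFailures
| order by Hour desc
```

A slow spray at 5 failures per hour only stands out here if the tenant's normal hourly failure count is low, which is why the threshold is expressed relative to the measured average rather than as a fixed number.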

Building the baseline is the prerequisite. Without it, you have rules. With it, you have detection.

The baseline dimensions

A complete sign-in baseline covers five dimensions:

Geographic baseline: which countries and cities do your users sign in from? What is the expected set? This baseline enables anomaly detection for unexpected locations and impossible travel.

// EI1.11 - Geographic baseline: normal countries per user
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| extend Country = tostring(LocationDetails.countryOrRegion)
| where isnotempty(Country)
| summarize
    Countries = make_set(Country),
    CountryCount = dcount(Country),
    SignInCount = count()
    by UserPrincipalName
| order by CountryCount desc
// Save this result. Any user who later signs in from a country
// not in their baseline set triggers an investigation

Temporal baseline: when do your users sign in? What are the normal working hours? What does weekend activity look like?

// EI1.11 - Temporal baseline: sign-in hours per user
// TimeGenerated is stored in UTC, so the hours below are UTC hours
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| extend HourOfDay = hourofday(TimeGenerated)
| extend DayOfWeek = toint(dayofweek(TimeGenerated) / 1d)  // 0=Sun, 6=Sat
| extend IsWeekend = DayOfWeek in (0, 6)
| summarize
    WeekdayHours = make_set_if(HourOfDay, not(IsWeekend)),
    WeekendActivity = countif(IsWeekend),
    TotalSignIns = count()
    by UserPrincipalName
| extend EarliestNormalHour = array_sort_asc(WeekdayHours)[0]
| extend LatestNormalHour = array_sort_desc(WeekdayHours)[0]
// Users who normally sign in 8-18 on weekdays but suddenly show
// a 3 AM weekend sign-in warrant investigation

Device baseline: which devices and operating systems do your users normally use?

// EI1.11 - Device baseline: normal devices per user
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| extend DeviceOS = tostring(DeviceDetail.operatingSystem)
| extend DeviceId = tostring(DeviceDetail.deviceId)
// Count sign-ins per (OS, device) pair first: an aggregate such as
// count() cannot be nested inside arg_max()
| summarize SignIns = count() by UserPrincipalName, DeviceOS, DeviceId
| summarize
    DeviceOSes = make_set(DeviceOS),
    DeviceCount = dcountif(DeviceId, isnotempty(DeviceId)),
    arg_max(SignIns, DeviceOS)
    by UserPrincipalName
// MostUsedOS is the OS of the user's most-used (OS, device) pair
| project-rename MostUsedOS = DeviceOS, MostUsedSignIns = SignIns
// A user who normally uses Windows 11 appearing on Linux = anomaly
// A user who normally has 1-2 devices appearing with 5 = anomaly

Application baseline: which applications do your users access?

// EI1.11 - Application baseline: normal apps per user
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| summarize
    Apps = make_set(AppDisplayName),
    AppCount = dcount(AppDisplayName)
    by UserPrincipalName
| order by AppCount desc
// A finance user who normally uses Outlook, Teams, and SharePoint
// suddenly accessing "Azure Portal" or "Microsoft Graph Explorer"
// warrants investigation: these are tools used for administration,
// not typical finance work

IP baseline: which IP addresses do your users normally sign in from?

// EI1.11 - IP baseline: normal IPs per user (last 30 days)
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| summarize
    KnownIPs = make_set(IPAddress, 50),
    IPCount = dcount(IPAddress)
    by UserPrincipalName
| order by IPCount desc
// Users with many distinct IPs: likely mobile workers or VPN users
// Users with 1-2 IPs: likely office-based, so a new IP is anomalous
// Cross-reference with named locations in CA for trusted IP identification

The composite baseline query

Combining all five dimensions into a single per-user profile:

// EI1.11 - Comprehensive user baseline (30-day reference)
// Store this result as a saved query and re-run monthly to update
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| extend 
    Country = tostring(LocationDetails.countryOrRegion),
    DeviceOS = tostring(DeviceDetail.operatingSystem),
    HourOfDay = hourofday(TimeGenerated)
| summarize 
    // Geographic
    Countries = make_set(Country, 10),
    CountryCount = dcount(Country),
    // Temporal
    ActiveHours = make_set(HourOfDay, 24),
    // Device
    DeviceTypes = make_set(DeviceOS, 5),
    DeviceOSCount = dcount(DeviceOS),
    // Application
    Apps = make_set(AppDisplayName, 20),
    AppCount = dcount(AppDisplayName),
    // IP
    IPCount = dcount(IPAddress),
    // Volume
    TotalSignIns = count(),
    AvgDailySignIns = count() / 30.0,
    // Risk
    RiskySignIns = countif(RiskLevelDuringSignIn in ("medium", "high")),
    // Time range
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated)
    by UserPrincipalName
| order by TotalSignIns desc

This query produces one row per user with their complete 30-day behavioral profile. Save it. Re-run it monthly. When you need to assess whether a specific sign-in is anomalous, compare the sign-in's properties against this baseline for that user.

Using the baseline for anomaly detection

The baseline enables a new class of detection: deviations from established patterns. Here is the core pattern for baseline-driven anomaly detection:

// EI1.11 - Detect sign-ins from new countries (not in 30-day baseline)
let baseline = SigninLogs
| where TimeGenerated between (ago(30d) .. ago(1d))
| where ResultType == 0
| extend Country = tostring(LocationDetails.countryOrRegion)
| summarize BaselineCountries = make_set(Country) by UserPrincipalName;
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType == 0
| extend Country = tostring(LocationDetails.countryOrRegion)
| where isnotempty(Country)
| join kind=inner baseline on UserPrincipalName
| where not(Country in (BaselineCountries))
| project 
    TimeGenerated, UserPrincipalName, AppDisplayName,
    NewCountry = Country, IPAddress,
    BaselineCountries,
    RiskLevelDuringSignIn
| order by TimeGenerated desc
// Every result is a user signing in from a country they have not
// used in the baseline window (the prior 30 days): a strong anomaly signal.
// Users with no sign-ins in the baseline window are dropped by the inner join.

This pattern (establish baseline, compare current activity, flag deviations) is the foundation of the detection rules in EI13. The detection rules automate this comparison and fire alerts when deviations exceed defined thresholds.
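The same three steps carry over to the other baseline dimensions. As a sketch, here is the pattern applied to applications: it flags a user's first use of an application that does not appear in their prior 30 days. The window lengths are copied from the country query, and `set_has_element()` is used for the per-row membership test; treat the thresholds and output columns as illustrative choices.

```kql
// Sketch: baseline-compare-flag pattern applied to the application dimension
let baselineApps = SigninLogs
| where TimeGenerated between (ago(30d) .. ago(1d))
| where ResultType == 0
| summarize BaselineApps = make_set(AppDisplayName) by UserPrincipalName;
SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType == 0
| where isnotempty(AppDisplayName)
// Inner join: users with no baseline rows are not evaluated
| join kind=inner baselineApps on UserPrincipalName
| where not(set_has_element(BaselineApps, AppDisplayName))
| summarize
    FirstSeen = min(TimeGenerated),
    SignIns = count(),
    IPs = make_set(IPAddress, 10)
    by UserPrincipalName, NewApp = AppDisplayName
| order by FirstSeen desc
```

Expect this to be noisier than the country check, since users pick up new applications far more often than they change countries. Filtering `NewApp` to administrative tools is one way to narrow it.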

Maintaining and refreshing baselines

A baseline is not a one-time exercise. User behavior changes legitimately: people travel, change roles, adopt new applications, and switch devices. A baseline that is never refreshed becomes increasingly inaccurate, producing false positives for legitimate behavior changes and potentially missing real anomalies because the baseline no longer reflects current patterns.

The recommended refresh cadence is monthly. Re-run the composite baseline query at the start of each month using a rolling 30-day window. Compare the new baseline against the previous month's baseline to identify drift: users who have legitimately expanded their geographic footprint, adopted new applications, or changed their working hours.

// EI1.11 - Baseline drift detection
// Compare this month's country set against last month's for each user
let currentBaseline = SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| extend Country = tostring(LocationDetails.countryOrRegion)
| summarize CurrentCountries = make_set(Country) by UserPrincipalName;
let previousBaseline = SigninLogs
| where TimeGenerated between (ago(60d) .. ago(30d))
| where ResultType == 0
| extend Country = tostring(LocationDetails.countryOrRegion)
| summarize PreviousCountries = make_set(Country) by UserPrincipalName;
currentBaseline
| join kind=inner previousBaseline on UserPrincipalName
| extend NewCountries = set_difference(CurrentCountries, PreviousCountries)
| extend DroppedCountries = set_difference(PreviousCountries, CurrentCountries)
| where array_length(NewCountries) > 0 or array_length(DroppedCountries) > 0
| project UserPrincipalName, NewCountries, DroppedCountries,
    CurrentCountries, PreviousCountries
// Results: users whose geographic pattern changed between months
// New countries: investigate or update baseline as legitimate
// Dropped countries: may indicate role change or account compromise remediation

Organizational baseline vs per-user baseline

Individual user baselines are the most precise but are impractical to manage for organizations with thousands of users. An alternative is the organizational baseline: a set of norms that apply across the tenant.

The geographic organizational baseline is the list of countries where the organization has employees, offices, or approved remote workers. Any sign-in from outside this list is anomalous at the organizational level, regardless of individual user history.

// EI1.11 - Organizational baseline: expected countries
// Any sign-in from outside these countries is an organizational anomaly
let orgCountries = dynamic(["US", "GB", "CA", "DE"]);  // Your countries
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType == 0
| extend Country = tostring(LocationDetails.countryOrRegion)
| where Country !in (orgCountries) and isnotempty(Country)
| summarize SignIns = count(), Users = make_set(UserPrincipalName, 10)
    by Country
| order by SignIns desc
// Fast organizational-level anomaly check; no per-user baseline needed
// Useful as a complement to per-user baselines, not a replacement

The temporal organizational baseline is the normal business hours pattern. While individual users may work different hours, the overall tenant should show a predictable daily pattern โ€” high activity during business hours, low activity overnight, reduced activity on weekends. Deviations at the organizational level (a spike in 3 AM sign-ins across many accounts) indicate an attack affecting multiple accounts simultaneously.
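One way to make the tenant-wide temporal baseline queryable is to compare each recent hourly bucket against the average for the same hour of day. The sketch below assumes a 30-day reference window and a three-standard-deviation threshold; both are illustrative starting points rather than recommended values.

```kql
// Sketch: tenant-wide hourly sign-in volume vs. the same hour of day
// over the preceding 30 days. Hours are UTC.
// Assumption: "spike" means more than 3 standard deviations above average.
let hourlyBaseline = SigninLogs
| where TimeGenerated between (ago(31d) .. ago(1d))
| where ResultType == 0
| summarize SignIns = count() by Hour = bin(TimeGenerated, 1h)
// Hours with zero sign-ins produce no row and are not averaged in
| summarize AvgSignIns = avg(SignIns), StdevSignIns = stdev(SignIns)
    by HourOfDay = hourofday(Hour);
SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType == 0
| summarize SignIns = count(), Users = dcount(UserPrincipalName)
    by Hour = bin(TimeGenerated, 1h)
| extend HourOfDay = hourofday(Hour)
| join kind=inner hourlyBaseline on HourOfDay
| where SignIns > AvgSignIns + 3 * StdevSignIns
| project Hour, SignIns, Users, AvgSignIns, StdevSignIns
| order by Hour asc
```

Because weekdays and weekends are averaged together, the standard deviation is wider than it would be for weekdays alone. Splitting the baseline by day type would tighten the threshold at the cost of a longer query.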

The application organizational baseline is the set of applications your organization uses. A new application appearing in sign-in logs that nobody in IT deployed or approved may indicate shadow IT adoption, or consent phishing if the application was granted permissions through the consent flow.
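A sketch of that check: list applications that produced successful sign-ins in the last 24 hours but were not seen anywhere in the tenant during the prior 30 days. It matches on `AppId` rather than display name, since display names can be changed. The window lengths are assumptions carried over from the other queries in this subsection.

```kql
// Sketch: applications new to the tenant in the last 24 hours
let knownApps = SigninLogs
| where TimeGenerated between (ago(30d) .. ago(1d))
| where ResultType == 0
| distinct AppId;
SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType == 0
| where isnotempty(AppId) and AppId !in (knownApps)
| summarize
    FirstSeen = min(TimeGenerated),
    SignIns = count(),
    Users = make_set(UserPrincipalName, 10)
    by AppId, AppDisplayName
| order by FirstSeen asc
```

This only shows that a new application is being signed in to. Whether permissions were granted through a consent flow has to be confirmed separately from the audit trail.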

Try it yourself

Try It: Build Your Lab Baseline

Environment: Your M365 developer tenant with Sentinel workspace.

Exercise: Run the composite baseline query against your developer tenant. Because the tenant is new, the baseline will be limited, but it will show the sign-in patterns from your lab setup activities.

Answer these questions from the baseline results:
  1. How many distinct countries appear in your baseline?
  2. How many distinct applications have you accessed?
  3. What are your active hours?
  4. How many distinct IP addresses have you used?

Save the composite baseline query in your Sentinel workspace (Logs → Save → Save as query). You will re-run this baseline at the end of each module to see how your sign-in patterns change as you configure the lab environment.

⚠ Compliance Myth: "Our Identity Protection baseline is sufficient for anomaly detection"

The myth: Identity Protection builds its own baseline of user behavior and detects anomalies automatically. We do not need to build our own baseline.

The reality: Identity Protection's baseline is a black box. You cannot query it, inspect it, or tune it beyond the coarse risk policy thresholds (low/medium/high). It detects generic anomalies (unfamiliar sign-in properties, atypical travel) but does not detect organization-specific anomalies (a finance user accessing the Azure Portal, a UK employee signing in from Belgium, an after-hours sign-in from a user who never works evenings). Your custom baseline captures the patterns specific to your environment. Custom detection rules built on this baseline catch anomalies that Identity Protection's generic model misses. Both are valuable: Identity Protection as the broad detection layer and custom baselines as the precision layer.

Decision point

A sign-in log shows a successful authentication from an IP in a country where NE has no employees. MFA was satisfied by push notification. The user says they approved the MFA prompt while traveling. Do you accept this explanation?

Verify, do not accept at face value. Check: does the user have a travel request on file? Does the IP geo-location match the claimed travel destination? Does the device fingerprint match the user's enrolled device? Are there other sign-ins from NE's corporate IP in the same time window (which would indicate the user is NOT traveling)? An attacker who stole credentials and is bombarding the user with MFA prompts (MFA fatigue) gets the same 'I approved it' response from a confused user. The investigation confirms or refutes the travel explanation within 5 minutes.
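Two of those checks (other sign-ins in the same window, and whether any came from a corporate IP) can be pulled in a single query. The sketch below uses placeholder values throughout: `targetUser`, `signInTime`, the 12-hour `lookaround`, and the `corporateRanges` CIDR list are all hypothetical and must be replaced with the details of the sign-in under investigation.

```kql
// Sketch: all sign-ins for one user around a suspicious event.
// Every value in the let statements is a placeholder.
let targetUser = "user@example.com";
let signInTime = datetime(2024-01-15 14:30:00);      // UTC
let lookaround = 12h;
let corporateRanges = dynamic(["203.0.113.0/24"]);   // documentation range
SigninLogs
| where TimeGenerated between ((signInTime - lookaround) .. (signInTime + lookaround))
| where UserPrincipalName =~ targetUser
| extend
    Country = tostring(LocationDetails.countryOrRegion),
    DeviceId = tostring(DeviceDetail.deviceId),
    DeviceOS = tostring(DeviceDetail.operatingSystem),
    // IPv4 only: IPv6 addresses are not matched by this function
    FromCorporateIP = ipv4_is_in_any_range(IPAddress, corporateRanges)
| project TimeGenerated, ResultType, Country, IPAddress, FromCorporateIP,
    DeviceId, DeviceOS, AppDisplayName
| order by TimeGenerated asc
```

Rows with `FromCorporateIP` set to true close to the foreign sign-in contradict the travel explanation. A consistent `DeviceId` across the window supports it.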

A sign-in log shows: ResultType 0 (success), MfaDetail 'satisfied by claim', ConditionalAccessStatus 'success', IPAddress from a residential proxy in Nigeria. The user's normal sign-in pattern is UK corporate IPs only. What is the most likely explanation?

  • The user is traveling to Nigeria for business.
  • AiTM session token theft (correct answer). The combination of MFA-by-claim (not interactive MFA challenge), residential proxy IP (not corporate), and successful CA evaluation (the token contains the MFA assertion) is the AiTM signature. The attacker captured the session token via an AiTM proxy, which includes the MFA claim. The token is then replayed from the attacker's infrastructure. Action: immediate session revocation, MFA method audit, and CA policy assessment (why did the compliant device policy not block this?).
  • The user's VPN is routing through a Nigerian exit node.
  • Identity Protection would have blocked this if it were truly malicious.
