In this module
IAM1.3 Data Quality as a Governance Foundation
IAM1.2 mapped the twelve governance-critical attributes and showed which governance mechanisms depend on each one. This section measures the actual coverage — what percentage of your identities have each attribute populated — and defines the remediation strategies for closing the gaps. Data quality isn't a Phase 2 problem you solve later. It's a Phase 1 gate that determines whether lifecycle workflows, dynamic groups, and access reviews work at all when you deploy them in Module 2.
You can't govern what you can't see
A lifecycle workflow that triggers 7 days before employeeHireDate works perfectly — for identities that have a hire date. A dynamic group rule that evaluates department -eq "Finance" populates correctly — for identities that have a department. An access review that routes to the identity's manager produces meaningful certification — for identities that have a manager assigned.
The question isn't whether these mechanisms work. They do. The question is how many of your identities they work for. If employeeHireDate is populated for 20% of your workforce, your joiner workflow automates onboarding for one in five new hires. The other four get whatever manual process Phil is running — which is the process the workflow was supposed to replace.
Data quality is the gap between having governance mechanisms and having governance mechanisms that cover your population. This section measures that gap, calculates the downstream impact, and produces a remediation plan with specific strategies and timelines.
Estimated time: 55 minutes.
Figure IAM1.3 — Data quality thresholds determine governance deployment readiness. Each governance mechanism has a minimum attribute coverage requirement. Deploy below the threshold and the mechanism silently excludes identities with missing attributes.
The admin center doesn't require governance attributes during user creation. Phil creates an account with displayName, UPN, and department — the fields the form prompts for. employeeHireDate, employeeType, and employeeOrgData are blank because nobody told him lifecycle workflows depend on them. Six months later, Rachel deploys lifecycle workflows. They fire for 20% of the workforce and silently skip the rest. The data quality problem was created at provisioning time. It surfaces at governance deployment time.
The data quality audit — portal view first
Before running bulk queries, look at the data quality problem through the portal. This is what most administrators see day-to-day, and it's where you'll verify individual fixes after remediation.
Entra Admin Center
Identity → Users → All users
Select any user who's been in the organization for more than a year. Click Properties, then expand each section:
Identity: Check displayName, userPrincipalName, userType. These are almost always populated — they're required at account creation.
Job info: Check department, job title, employee ID, employee type, employee hire date. This is where governance data lives — and where gaps appear. In most tenants, department and job title are filled, but employee hire date, employee type, and employee ID are blank. The admin center doesn't require these fields during user creation, so they're only populated if someone deliberately sets them afterward.
Contact info: Check manager. Click Manager in the left nav (or the manager field in Properties). If it shows "No manager assigned," this identity is invisible to manager-based access reviews.
Sign-in activity: Scroll to the sign-in card or click Sign-in logs in the left nav. The card shows last interactive and last non-interactive sign-in timestamps. If both are blank, the account has never been used.
Now open 4–5 more users. Count how many have a blank employee hire date. Count how many have no manager. This is the data quality problem at human scale — each individual user looks like a minor omission. Across 810 accounts, those omissions become governance-breaking gaps.
The portal gives you the individual view. Now quantify the problem across your entire population.
The composite data quality score
Connect to Graph and pull the governance-critical attributes for all active members:
Connect-MgGraph -Scopes "User.Read.All", "AuditLog.Read.All",
"User-LifeCycleInfo.Read.All"
$members = Get-MgUser -All -Property id, displayName, accountEnabled,
userType, department, jobTitle, employeeHireDate, employeeLeaveDateTime,
employeeType, employeeOrgData, companyName, signInActivity |
Where-Object { $_.UserType -eq "Member" -and $_.AccountEnabled -eq $true }
$total = $members.Count
Now calculate coverage for each governance-critical attribute:
$coverage = [ordered]@{
"department" = ($members | Where-Object { $_.Department }).Count
"jobTitle" = ($members | Where-Object { $_.JobTitle }).Count
"employeeHireDate" = ($members | Where-Object { $_.EmployeeHireDate }).Count
"employeeLeaveDateTime" = ($members | Where-Object { $_.EmployeeLeaveDateTime }).Count
"employeeType" = ($members | Where-Object { $_.EmployeeType }).Count
"division" = ($members | Where-Object { $_.EmployeeOrgData.Division }).Count
"costCenter" = ($members | Where-Object { $_.EmployeeOrgData.CostCenter }).Count
"companyName" = ($members | Where-Object { $_.CompanyName }).Count
}
# Manager requires a separate call per user
$managerCount = 0
foreach ($m in $members) {
if (Get-MgUserManager -UserId $m.Id -ErrorAction SilentlyContinue) {
$managerCount++
}
}
$coverage["manager"] = $managerCount
Write-Host "=== DATA QUALITY AUDIT ($total active members) ==="
Write-Host ""
$totalScore = 0
$totalPossible = 0
foreach ($attr in $coverage.Keys) {
$pct = [math]::Round($coverage[$attr] / $total * 100)
$status = if ($pct -ge 90) { "GOOD" }
elseif ($pct -ge 70) { "ACCEPTABLE" }
elseif ($pct -ge 50) { "POOR" }
else { "CRITICAL" }
Write-Host " $($attr.PadRight(25)) $($coverage[$attr].ToString().PadLeft(4)) / $total ($pct%) [$status]"
$totalScore += $coverage[$attr]
$totalPossible += $total
}
$compositeScore = [math]::Round($totalScore / $totalPossible * 100)
Write-Host ""
Write-Host " COMPOSITE DATA QUALITY SCORE: $compositeScore%"
=== DATA QUALITY AUDIT (15 active members) ===
 department                 12 / 15 (80%) [ACCEPTABLE]
 jobTitle                   14 / 15 (93%) [GOOD]
 employeeHireDate            8 / 15 (53%) [POOR]
 employeeLeaveDateTime       0 / 15 (0%) [CRITICAL]
 employeeType                0 / 15 (0%) [CRITICAL]
 division                    0 / 15 (0%) [CRITICAL]
 costCenter                  0 / 15 (0%) [CRITICAL]
 companyName                 0 / 15 (0%) [CRITICAL]
 manager                    10 / 15 (67%) [POOR]
 COMPOSITE DATA QUALITY SCORE: 33%
A composite score of 33%. That means two-thirds of the governance attribute surface is empty. The individual scores tell the operational story: department and job title are reasonably covered (they're visible in the admin center and Phil populates them). Everything else — the attributes that governance automation depends on — ranges from poor to absent.
The thresholds:
- GOOD (≥90%): Deploy governance mechanisms that depend on this attribute now. Coverage is sufficient for production.
- ACCEPTABLE (≥70%): Deploy with monitoring. Most identities are covered, but track the exceptions — they'll surface as governance gaps.
- POOR (50–69%): Deploy for a targeted population only. Use the attribute to scope governance for the covered subset while remediating the rest.
- CRITICAL (<50%): Do not deploy governance mechanisms that depend on this attribute until remediation raises coverage above 70%. Deploying a lifecycle workflow that fires for 20% of new hires creates a false sense of automation.
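The four bands can be captured in a small reusable helper so later audits classify coverage the same way. This is a sketch: Get-CoverageStatus is an invented name, and the cut-offs mirror the audit script above.

```powershell
# Sketch: map a coverage percentage to the threshold bands above.
# Get-CoverageStatus is a hypothetical helper, not a Graph SDK cmdlet.
function Get-CoverageStatus {
    param([ValidateRange(0, 100)][int]$Percent)

    if     ($Percent -ge 90) { "GOOD" }
    elseif ($Percent -ge 70) { "ACCEPTABLE" }
    elseif ($Percent -ge 50) { "POOR" }
    else                     { "CRITICAL" }
}

# Classify this section's sample numbers without re-running the full audit
$sample = [ordered]@{ department = 80; employeeHireDate = 53; employeeType = 0 }
foreach ($attr in $sample.Keys) {
    Write-Host "$attr -> $(Get-CoverageStatus $sample[$attr])"
}
```

Keeping the bands in one function means the audit script, the monitoring script, and any reporting all agree on what "ACCEPTABLE" means.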
What breaks at each coverage level
The composite score is a summary. The governance impact is per-attribute. Walk through the cascade of failures that each CRITICAL attribute produces:
employeeHireDate at 53% (POOR) — lifecycle workflows that trigger on hire date automate onboarding for roughly half your workforce. The other half receives manual provisioning. The workflow runs successfully — the completion logs show zero errors. But half your new hires don't appear in the workflow's scope because they have no hire date. You discover this when a new employee reports they never received the welcome email, the TAP, or the group assignments that the workflow was supposed to provide.
Entra Admin Center — Seeing the Gap
Identity → Users → All users → click any user → Properties → Job info
Look at the Employee hire date field. If it's blank, this identity is invisible to every lifecycle workflow that uses hire date as a trigger. Now click the browser back button and check the next user. And the next. The field is blank more often than it's populated — and nobody notices because the admin center doesn't flag it as a problem. The field is optional. The governance mechanism that depends on it is not.
employeeType at 0% (CRITICAL) — lifecycle workflow scoping is impossible. You can't build a joiner workflow that treats employees differently from contractors because no identity has the attribute that distinguishes them. A contractor who should receive 90-day time-limited access receives the same permanent access as a full-time employee — or receives nothing at all if the workflow is scoped to employeeType -eq "Employee" and nobody meets the criteria.
manager at 67% (POOR) — access reviews that route to the identity's manager work for two-thirds of your population. The remaining third has no reviewer. Depending on the review configuration, those identities are either skipped entirely (the default behavior) or routed to a fallback reviewer who doesn't know what the identities do or what access they should hold. Either outcome is a governance gap: access unreviewed, or access rubber-stamped by a reviewer without context.
employeeLeaveDateTime at 0% (CRITICAL) — leaver workflows can't pre-schedule departure tasks. Account disabling happens when someone manually processes the termination — hours, days, or weeks after the employee's last day. In the gap between departure and account disabling, the former employee's credentials remain valid, their access remains active, and their group memberships continue granting permissions.
At Northgate Engineering: Elena Petrova runs the composite data quality audit and presents the 33% score to Rachel Okafor. Rachel's reaction: "So we can't launch lifecycle workflows, we can't scope entitlement management by employee type, and a quarter of our access reviews have no reviewer. What can we launch?" Elena's answer: department-scoped dynamic groups (80% coverage — ACCEPTABLE with monitoring) and job-title-based policies (93% — GOOD). Everything else requires data remediation first. The audit score isn't a failure of governance tooling. It's a failure of the provisioning process that creates identities without the attributes governance needs.
Remediation strategies
Four strategies exist for closing the data quality gap, each with different coverage, effort, and sustainability characteristics.
Strategy 1 — HR system integration (highest value, highest effort)
The gold standard. An HR system (Workday, SuccessFactors, BambooHR, or any system with an API) becomes the authoritative source for identity attributes. When HR processes a new hire, the HR system populates employeeHireDate, department, manager, employeeType, and employeeOrgData automatically. When HR processes a transfer, the attributes update. When HR processes a termination, employeeLeaveDateTime is set.
Entra Admin Center
Identity → Users → User settings → Manage external collaboration settings won't get you there. For HR integration, navigate to:
Identity → Applications → Enterprise applications → New application → search for your HR system (Workday, SuccessFactors)
Microsoft provides pre-built provisioning connectors for Workday and SuccessFactors that map HR attributes to Entra ID user properties. The provisioning configuration — attribute mapping, scoping filters, matching rules — is configured through the Enterprise application's Provisioning blade. This is Module 2 content — IAM2.1 walks through HR-driven provisioning architecture in detail. For now, note whether your organization has an HR system with an API. If it does, Strategy 1 is the target. If it doesn't (like NE, whose HR system has no API), you need Strategies 2–4.
Entra ID supports inbound provisioning from Workday, SuccessFactors, and any SCIM-compliant HR system. Microsoft is also transitioning provisioning connectors from OAuth authorization code grant to workload identity-based authentication (starting May 2026 for SuccessFactors, with a November 2026 deadline). If you're planning HR integration, verify the current authentication requirements in Microsoft Learn before configuring the connector.
Strategy 2 — CSV bulk import (immediate coverage, manual sustainability)
Export your current user list, enrich it with governance attributes from HR data (even if that data lives in spreadsheets), and bulk-update through PowerShell or the admin center.
Entra Admin Center — Bulk Edit (Preview)
Identity → Users → All users → select up to 60 users using the checkboxes → Edit properties (command bar)
The bulk edit panel lets you set a single value across all selected users — for example, setting department to "Engineering" for 40 selected engineering staff. This works well for attributes with a small set of values (department, employeeType) where you're applying the same value to a group of users.
For per-user attributes like employeeHireDate and manager (each user has a unique value), bulk edit doesn't help — you need either individual edits in the portal or a PowerShell CSV update. The portal supports editing one user's properties at a time through the Properties blade, but for 500+ users with individual hire dates, PowerShell is the practical path.
The PowerShell CSV approach for per-user attributes:
# Export current state to CSV for enrichment
$members | Select-Object Id, UserPrincipalName, DisplayName,
Department, EmployeeHireDate, EmployeeType |
Export-Csv -Path "governance-attributes-audit.csv" -NoTypeInformation
Write-Host "Exported $($members.Count) members to governance-attributes-audit.csv"
Write-Host "Add missing values in the CSV, then run the import below."
After enriching the CSV with data from HR (even a spreadsheet export), import the updates:
$updates = Import-Csv -Path "governance-attributes-enriched.csv"
foreach ($row in $updates) {
$params = @{}
if ($row.Department -and $row.Department -ne "") {
$params.Department = $row.Department
}
if ($row.EmployeeType -and $row.EmployeeType -ne "") {
$params.EmployeeType = $row.EmployeeType
}
if ($row.EmployeeHireDate -and $row.EmployeeHireDate -ne "") {
$params.EmployeeHireDate = $row.EmployeeHireDate
}
if ($params.Count -gt 0) {
Update-MgUser -UserId $row.Id @params
Write-Host "Updated: $($row.DisplayName) — $($params.Keys -join ', ')"
}
}
Important constraint: employeeHireDate cannot be updated through application-only (app-only) permissions. The Graph API requires delegated permissions with a signed-in user for this property. If your automation uses a service principal to update user attributes, employeeHireDate updates will fail silently or return a 403 error. You must run the hire date updates as a signed-in admin with delegated User.ReadWrite.All permission.
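Before running the hire-date import, it's worth confirming the session is actually delegated. A minimal sketch: Test-DelegatedAuth is an invented helper, while Get-MgContext is the real Graph SDK cmdlet whose AuthType property distinguishes delegated from app-only sessions.

```powershell
# Sketch: guard hire-date updates against app-only Graph sessions.
# Test-DelegatedAuth is a hypothetical helper name.
function Test-DelegatedAuth {
    param([string]$AuthType)
    $AuthType -eq "Delegated"
}

# Usage before the CSV import loop (requires the Microsoft.Graph module):
#   $ctx = Get-MgContext
#   if (-not $ctx -or -not (Test-DelegatedAuth $ctx.AuthType)) {
#       throw "employeeHireDate updates need a delegated session, not app-only."
#   }
```

Failing fast here is cheaper than discovering after a 500-row import that every hire-date write returned 403.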
Strategy 3 — Graph API enrichment scripts (targeted, automatable)
For attributes with a known derivation rule — for example, setting employeeType to "Contractor" for all users in the "Contractors" OU in on-premises AD, or setting companyName based on the domain suffix — you can write targeted enrichment scripts.
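A derivation rule of the companyName-from-domain variety might look like the sketch below. The domain-to-company map and the example.com domains are invented for illustration; substitute your own, and note the Graph write calls are shown commented for context.

```powershell
# Sketch of a derivation rule: infer companyName from the UPN domain suffix.
# The map below is invented for illustration.
$domainMap = @{
    "northgate.example.com"   = "Northgate Engineering"
    "contractors.example.com" = "Northgate Engineering (Contract)"
}

function Get-DerivedCompanyName {
    param([string]$UserPrincipalName, [hashtable]$Map)
    # Everything after the last "@", lowercased for a stable lookup
    $suffix = ($UserPrincipalName -split "@")[-1].ToLower()
    if ($Map.ContainsKey($suffix)) { $Map[$suffix] } else { $null }
}

# Applying it to members missing companyName (real Graph cmdlets, commented):
# foreach ($u in ($members | Where-Object { -not $_.CompanyName })) {
#     $derived = Get-DerivedCompanyName $u.UserPrincipalName $domainMap
#     if ($derived) { Update-MgUser -UserId $u.Id -CompanyName $derived }
# }
```

Keeping the derivation in a pure function makes the rule testable before it touches the tenant.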
Before running any bulk enrichment, check whether the target accounts are cloud-native or synced from on-premises AD. This matters because synced accounts have their authoritative attribute source in on-premises AD. If you update department through Graph API for a synced account, the change takes effect immediately in Entra ID — but the next Entra Connect sync cycle (typically every 30 minutes) overwrites your change with the on-premises value. For synced accounts, attribute remediation must happen in the on-premises AD source, not in Entra ID.
# Check which accounts are synced vs cloud-native
$synced = ($members | Where-Object { $_.OnPremisesSyncEnabled -eq $true }).Count
$cloud = ($members | Where-Object { -not $_.OnPremisesSyncEnabled }).Count
Write-Host "Cloud-native accounts: $cloud (update via Graph API or portal)"
Write-Host "Synced accounts: $synced (update in on-prem AD only)"
For cloud-native accounts, the enrichment script works directly:
# Set employeeType for all members who don't have one
$noEmpType = $members | Where-Object { -not $_.EmployeeType }
Write-Host "Members without employeeType: $($noEmpType.Count)"
Write-Host "Setting all to 'Employee' (adjust for contractors/interns separately)"
foreach ($user in $noEmpType) {
Update-MgUser -UserId $user.Id -EmployeeType "Employee"
Write-Host "Set employeeType=Employee: $($user.DisplayName)"
}
Entra Admin Center — Individual Update
Identity → Users → All users → select a user → Properties → Edit properties
Expand Job info. Set Employee type to "Employee" (or "Contractor", "Intern", etc). Click Save.
The portal supports the standard values ("Employee", "Contractor", "Intern") and also accepts custom values. The value you set here is the value lifecycle workflows and dynamic groups evaluate. Consistency matters — if some users have "Employee" and others have "employee" (lowercase), the directory holds two different strings. Dynamic group rules in Entra ID compare case-insensitively, and so does PowerShell's default -eq operator, but -ceq and many downstream systems are case-sensitive, so normalize to one canonical casing anyway.
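A small normalization pass can enforce one canonical casing before scoping rules depend on the attribute. This is a sketch: Get-CanonicalEmployeeType and Test-CaseDrift are invented helper names, and the canonical value list is an assumption to adjust for your organization.

```powershell
# Sketch: detect employeeType values that differ from the canonical
# form only by case. The canonical list is an assumption.
$canonical = @("Employee", "Contractor", "Intern")

function Get-CanonicalEmployeeType {
    param([string]$Value, [string[]]$Allowed)
    # PowerShell's -eq is case-insensitive, so "employee" resolves to "Employee"
    $Allowed | Where-Object { $_ -eq $Value } | Select-Object -First 1
}

function Test-CaseDrift {
    param([string]$Value, [string[]]$Allowed)
    # Drift: matches a canonical entry with -eq but not with case-sensitive -cne
    $canon = Get-CanonicalEmployeeType $Value $Allowed
    [bool]($canon -and ($canon -cne $Value))
}
```

A remediation loop would then call Update-MgUser with the canonical value for every member where Test-CaseDrift returns true.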
Strategy 4 — Manual enrichment with governance process (last resort)
For small organizations without HR system integration, the remediation is manual: an admin updates each user's attributes through the portal or a CSV import, and a governance process ensures new accounts are created with complete attributes. This is the least scalable strategy, but it's what NE uses because Phil's HR system has no API.
The governance process addition is critical: a manual enrichment that runs once raises the score today but degrades tomorrow as new accounts are created without the attributes. The process must include a creation checklist — every new account requires department, manager, employeeHireDate, employeeType, and employeeOrgData before the account is considered provisioned.
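The creation checklist can be made mechanical. A sketch, with Test-ProvisioningComplete as an invented name and the required-attribute list taken from this section:

```powershell
# Sketch: the account-creation checklist as a function.
# Test-ProvisioningComplete is a hypothetical helper name.
function Test-ProvisioningComplete {
    param([hashtable]$User)
    $required = @("department", "manager", "employeeHireDate", "employeeType", "employeeOrgData")
    $missing = @($required | Where-Object { -not $User[$_] })
    [pscustomobject]@{
        Complete = ($missing.Count -eq 0)
        Missing  = $missing
    }
}

# A new account isn't "provisioned" until Complete is $true.
# Example with invented values: employeeHireDate and employeeOrgData are
# still outstanding here, so Complete is $false.
$check = Test-ProvisioningComplete @{
    department = "Engineering"; manager = "rachel.okafor"; employeeType = "Employee"
}
$check.Missing
```

Run this as the last step of account creation and the data quality score stops degrading with every new hire.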
Minimum viable data quality thresholds
Not every attribute needs 100% coverage before you can deploy governance. The threshold depends on the governance mechanism and the acceptable coverage gap:
Lifecycle workflows (Module 2): employeeHireDate ≥ 70% before deploying joiner workflows. Below 70%, more than a third of new hires miss onboarding automation — the manual process remains the primary path and the workflow is a supplement, not a replacement.
Dynamic groups (Module 3): department ≥ 80% before deploying department-scoped dynamic groups as access assignment mechanisms. Below 80%, the groups silently exclude too many identities to be reliable. employeeType ≥ 70% before scoping dynamic groups by employment category.
Access reviews (Module 5): manager ≥ 80% before deploying manager-based access reviews as the primary certification mechanism. Below 80%, configure a fallback reviewer for identities without managers and document the gap.
Entitlement management auto-assignment (Module 4): department AND employeeType both ≥ 70% before deploying attribute-based auto-assignment policies. Both attributes must meet the threshold — if department is 80% but employeeType is 0%, auto-assignment scoped by both attributes covers 0% of your population.
Record these thresholds. When you build each governance mechanism, the first step is re-running the data quality audit for the relevant attributes and verifying the coverage meets the threshold. If it doesn't, remediation comes before deployment.
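The thresholds above can be recorded as a checkable gate table. A sketch: the mechanisms and percentages come from this section, while the structure and the Test-DeploymentGate name are assumptions.

```powershell
# Sketch: deployment gates from this section as data, so readiness
# checks are repeatable. Structure and names are illustrative.
$gates = @(
    @{ Mechanism = "Lifecycle workflows (joiner)"; Attributes = @{ employeeHireDate = 70 } }
    @{ Mechanism = "Dynamic groups (department)";  Attributes = @{ department = 80 } }
    @{ Mechanism = "Access reviews (manager)";     Attributes = @{ manager = 80 } }
    @{ Mechanism = "Entitlement auto-assignment";  Attributes = @{ department = 70; employeeType = 70 } }
)

function Test-DeploymentGate {
    param([hashtable]$Gate, [hashtable]$CoveragePct)
    # Ready only if every required attribute meets its threshold;
    # a missing coverage entry counts as blocked
    -not ($Gate.Attributes.Keys | Where-Object { $CoveragePct[$_] -lt $Gate.Attributes[$_] })
}

# Example with the audit numbers from this section:
$pct = @{ department = 80; employeeHireDate = 53; employeeType = 0; manager = 67 }
foreach ($g in $gates) {
    $state = if (Test-DeploymentGate $g $pct) { "READY" } else { "BLOCKED" }
    Write-Host "$($g.Mechanism): $state"
}
```

With the NE numbers, only the department-scoped dynamic group gate passes, which matches Elena's answer to Rachel.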
Your first risk register entry
The data quality audit produces your first risk register entry for the IAM program package. Open the 03-Risk-Register/ folder from IAM0.7 and document the finding.
Every attribute below the ACCEPTABLE threshold (70%) is a risk register item. The format:
Risk ID: DQ-001
Risk: Lifecycle workflows cannot fire for identities missing employeeHireDate. Current coverage: [your percentage]%.
Likelihood: High — every new hire created through manual provisioning without HR integration will be missing this attribute.
Impact: Medium — affected identities receive manual onboarding instead of automated, creating delays and inconsistent access assignment.
Current control: Manual provisioning by Phil as fallback.
Target control: HR system integration (Strategy 1) or mandatory creation checklist (Strategy 4).
Remediation target: Coverage ≥ 70% within 2 weeks (prioritized CSV enrichment), ≥ 90% within 3 months (process change for new hires).
Status: Open — remediation in progress.
Create a DQ-series entry for each CRITICAL or POOR attribute. In a production tenant with 810 members, you might have 4–6 risk register entries from this section alone. That's normal — data quality is the most common governance gap and documenting it as risk is the first step toward closing it.
Entra Admin Center — Verifying Remediation
Identity → Users → All users → select a recently remediated user → Properties → Job info
After running a CSV enrichment or manual update, verify the changes landed. Check Employee hire date, Employee type, and Manager for the updated users. Changes to these fields take effect immediately in the directory but may take 15–30 minutes to propagate to other M365 services (Teams profile, Exchange address book). Run the composite data quality audit script again to confirm the score has improved. Track the score over time — it's the metric that tells you whether data quality is improving or degrading as new accounts are created.
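One way to track the score over time is to append each audit run to a small CSV log. A sketch: the file name and column shape are assumptions, and $compositeScore is the variable computed by the audit script earlier in this section.

```powershell
# Sketch: append each audit run to a trend log.
# Add-ScoreLogEntry is a hypothetical helper name.
function Add-ScoreLogEntry {
    param(
        [string]$Path,
        [int]$Score,
        [datetime]$When = (Get-Date)
    )
    [pscustomobject]@{
        Date           = $When.ToString("yyyy-MM-dd")
        CompositeScore = $Score
    } | Export-Csv -Path $Path -Append -NoTypeInformation
}

# After each audit run:
# Add-ScoreLogEntry -Path "dq-score-history.csv" -Score $compositeScore
```

A falling trend in the log is the early signal that new accounts are being created without governance attributes, before any workflow visibly breaks.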
At Northgate Engineering: Rachel Okafor's remediation plan: (1) Phil adds employeeType to the account creation checklist — every new account gets "Employee" or "Contractor" at creation. (2) Elena runs a one-time CSV enrichment to set employeeType for existing accounts using HR's spreadsheet export — 810 accounts, estimated 4 hours of data matching. (3) Phil starts populating employeeHireDate for new hires from the HR email notification. (4) Elena enriches employeeHireDate for existing accounts where the hire date is known — starting with the 200 most recent hires. Target: employeeType from 0% to 90% in one week (CSV import). employeeHireDate from 53% to 70% in two weeks (prioritized enrichment). manager from 67% to 80% in one week (Phil assigns managers for the 50 accounts missing them). These thresholds are the minimum viable data quality for Module 2 deployment. Rachel creates risk register entries DQ-001 through DQ-004 covering employeeHireDate, employeeType, manager, and employeeLeaveDateTime. Each entry has a remediation timeline and a specific target coverage percentage.
Reusable script — the data quality audit and remediation helpers from this section:
# IAM1.3 — Data Quality Audit
Connect-MgGraph -Scopes "User.Read.All", "AuditLog.Read.All",
"User-LifeCycleInfo.Read.All"
$members = Get-MgUser -All -Property id, displayName, accountEnabled,
userType, department, jobTitle, employeeHireDate, employeeLeaveDateTime,
employeeType, employeeOrgData, companyName |
Where-Object { $_.UserType -eq "Member" -and $_.AccountEnabled -eq $true }
$total = $members.Count
$coverage = [ordered]@{
"department" = ($members | Where-Object { $_.Department }).Count
"jobTitle" = ($members | Where-Object { $_.JobTitle }).Count
"employeeHireDate" = ($members | Where-Object { $_.EmployeeHireDate }).Count
"employeeLeaveDateTime" = ($members | Where-Object { $_.EmployeeLeaveDateTime }).Count
"employeeType" = ($members | Where-Object { $_.EmployeeType }).Count
"division" = ($members | Where-Object { $_.EmployeeOrgData.Division }).Count
"costCenter" = ($members | Where-Object { $_.EmployeeOrgData.CostCenter }).Count
"companyName" = ($members | Where-Object { $_.CompanyName }).Count
}
$managerCount = 0
foreach ($m in $members) {
if (Get-MgUserManager -UserId $m.Id -ErrorAction SilentlyContinue) {
$managerCount++
}
}
$coverage["manager"] = $managerCount
Write-Host "=== DATA QUALITY AUDIT ($total active members) ==="
$totalScore = 0; $totalPossible = 0
foreach ($attr in $coverage.Keys) {
$pct = [math]::Round($coverage[$attr] / $total * 100)
$status = if ($pct -ge 90) { "GOOD" } elseif ($pct -ge 70) { "ACCEPTABLE" }
elseif ($pct -ge 50) { "POOR" } else { "CRITICAL" }
Write-Host " $($attr.PadRight(25)) $($coverage[$attr].ToString().PadLeft(4)) / $total ($pct%) [$status]"
$totalScore += $coverage[$attr]; $totalPossible += $total
}
Write-Host "`n COMPOSITE SCORE: $([math]::Round($totalScore / $totalPossible * 100))%"
# Export for CSV enrichment
$members | Select-Object Id, UserPrincipalName, DisplayName,
Department, EmployeeHireDate, EmployeeType |
Export-Csv -Path "governance-attributes-audit.csv" -NoTypeInformation
Write-Host "`nExported to governance-attributes-audit.csv for enrichment"
Decision-point simulation
Scenario 1. Your data quality audit shows 92% attribute coverage across all members. The remaining 8% (65 users) are missing department because they were created through a legacy onboarding process that didn't require it. The IT team says 92% is "good enough." Is it?
It depends on what consumes the attribute. If department drives dynamic group membership for access assignment, 65 users without a department are 65 users who don't receive their department's standard access — SharePoint sites, Teams channels, app assignments. They either have no access (and submit helpdesk tickets) or were granted access manually (which bypasses the governance model). 92% coverage means 8% ungoverned access. Whether that's acceptable depends on the business impact of those 65 users, not on the percentage.
Scenario 2. You implement a data quality monitoring script that runs weekly and flags users below the governance score threshold. After 3 weeks, the report shows the same 65 users every week — nobody is remediating them. What's wrong with the process?
The monitoring script detects the gap but doesn't assign accountability. A report that nobody acts on is diagnostic without governance. The fix: the script should assign each flagged user to their manager (from the manager property) and send an automated email with the specific missing attributes. Managers who don't remediate within 14 days get escalated to the IT Director. The governance cadence needs enforcement, not just detection. Module 13 builds this automated monitoring and escalation framework.
Scenario 3. HR pushes back on populating employeeHireDate for existing employees because "we'd have to backfill 810 records and we don't have the resources." The field is required for lifecycle workflow triggers. How do you handle this?
You need the field, but you don't need perfect data on day one. Prioritize: set employeeHireDate for all new hires going forward (zero additional HR effort — it's part of the new hire data package). For existing employees, backfill in batches by department over 90 days, starting with departments that will be first to use lifecycle workflows. For users where the original hire date is genuinely unknown, use the createdDateTime from Entra ID as a proxy — it's not the real hire date, but it gives lifecycle workflows a trigger value while the real dates are researched.
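The createdDateTime fallback might be sketched like this. Get-EffectiveHireDate is an invented helper; Get-MgUser and Update-MgUser in the usage comment are the real Graph SDK cmdlets used elsewhere in this section.

```powershell
# Sketch: prefer the real hire date, fall back to the account creation
# timestamp as a proxy. Get-EffectiveHireDate is a hypothetical name.
function Get-EffectiveHireDate {
    param($EmployeeHireDate, [datetime]$CreatedDateTime)
    # Proxy value only; replace with the real hire date when HR confirms it
    if ($EmployeeHireDate) { [datetime]$EmployeeHireDate } else { $CreatedDateTime }
}

# Usage against the tenant (delegated permissions required; see Strategy 2):
# $users = Get-MgUser -All -Property id, displayName, createdDateTime, employeeHireDate
# foreach ($u in $users | Where-Object { -not $_.EmployeeHireDate }) {
#     Update-MgUser -UserId $u.Id -EmployeeHireDate (Get-EffectiveHireDate $u.EmployeeHireDate $u.CreatedDateTime)
# }
```

Tag proxy values somewhere (a sponsor note, a tracking CSV) so the research into real hire dates knows which records are still approximations.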
You're reading the free modules of Identity and Access Management in Microsoft 365
The full course continues with advanced topics, production detection rules, worked investigation scenarios, and deployable artifacts.