1.6 Entity Extraction and Mapping

5 hours · Module 1 · Free
What you already know
Section 1.4 used the "Entities - Get Accounts" action to pull account entities from an incident and feed them into an enrichment query. The extraction worked because the underlying analytics rule had correct entity mapping. This section teaches the full entity model: how analytics rules create entities from query output, how identifiers determine whether Sentinel can uniquely resolve an entity, how to handle incidents containing dozens of entities, and how to extract entity types that have no native connector action.

Scenario

The enrichment playbook fires on a brute force incident. The "Entities - Get Accounts" action returns an empty collection. The For Each loop iterates zero times. The incident comment is blank. The analyst opens the incident, sees the empty comment, and assumes the investigation found nothing suspicious. The playbook ran without errors. Every action completed successfully. The problem is not the playbook. The analytics rule that generated the alert has no entity mapping configured. The playbook extracted exactly what the incident contained, which was nothing.

How entities enter an incident

Entities do not appear in incidents automatically. An analytics rule runs a KQL query that returns columns like UserPrincipalName, IPAddress, and HostName. Those columns are text strings. They become typed entities only when the rule author explicitly maps them in the Analytics wizard under Set rule logic → Alert enhancement → Entity mapping.

The mapping connects a query column to an entity type and identifier. You map UserPrincipalName to Account → Name, or more precisely, split it into Account → Name (the prefix before @) and Account → UPNSuffix (the domain after @). You map IPAddress to IP → Address. You map Computer to Host → HostName and DnsDomain. Each mapping creates one entity type from one or more query columns. A single analytics rule supports up to ten entity mappings, and each alert generated by that rule can contain up to 500 entities collectively, divided equally across all configured mappings.
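The split and the per-alert budget described above can be sketched in a few lines of Python. This is only an illustration: in a real rule the split happens in the KQL query, the helper names `split_upn` and `per_mapping_cap` are hypothetical, and rounding the equal division down is an assumption rather than documented behavior.

```python
def split_upn(upn: str) -> dict:
    """Split a UPN into the Name and UPNSuffix identifiers of an Account entity."""
    name, sep, suffix = upn.partition("@")
    if not sep or not name or not suffix:
        raise ValueError(f"not a UPN: {upn!r}")
    return {"Name": name, "UPNSuffix": suffix}


def per_mapping_cap(mapping_count: int, alert_cap: int = 500) -> int:
    """Entities each mapping may contribute when the alert cap is split equally.

    Assumption: the equal split rounds down.
    """
    if not 1 <= mapping_count <= 10:
        raise ValueError("a rule supports 1 to 10 entity mappings")
    return alert_cap // mapping_count


if __name__ == "__main__":
    print(split_upn("priya.sharma@northgateeng.com"))
    print(per_mapping_cap(3))
```

Under these assumptions, a rule with three mappings leaves each mapping a budget of 166 entities per alert, which is worth remembering when a brute force query returns hundreds of targeted accounts.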

The mapping is the upstream dependency every playbook relies on. If the analytics rule author skipped the entity mapping step, your playbook receives incidents with empty entity collections. If the author mapped the wrong columns, your playbook receives entities with null or incorrect fields. The playbook itself is architecturally correct. The data it receives is incomplete.

[Figure: Entity flow, KQL output → mapping → incident → playbook. Step 1, KQL query: UserPrincipalName, IPAddress, Computer (text strings only). Step 2, entity mapping: UPN → Account (Name + UPNSuffix), IPAddress → IP (Address), maximum 10 mappings per rule. Step 3, incident entities: typed JSON array of Account[], IP[], Host[], maximum 500 per alert. Step 4, Sentinel connector: Get Accounts, Get IPs, Get Hosts, Get URLs return parsed, typed output. If entity mapping is missing at Step 2, the playbook runs successfully, the For Each loop iterates zero times, the enrichment comment is blank, and no error is reported.]

Figure 1.6a: The entity pipeline from query output to playbook extraction. A missing mapping at Step 2 produces no errors downstream. The playbook completes successfully with empty results.

Strong identifiers, weak identifiers, and entity merging

Not all entity mappings produce the same quality of identification. Sentinel distinguishes between strong identifiers that uniquely resolve an entity and weak identifiers that are ambiguous without additional context.

For Account entities, AadUserId (the Entra object GUID) is a strong identifier on its own. It uniquely identifies one account across the entire tenant. The combination of Name + UPNSuffix is also strong because it reconstructs the full UserPrincipalName. But Name alone is weak. A username "jsmith" could be jsmith@northgateeng.com or jsmith@contractor.external.com. Without the domain suffix, the identity is ambiguous. Similarly, SID is strong for domain accounts, but built-in account SIDs (like S-1-5-18 for SYSTEM) are the same on every machine. For built-in accounts, SID + Host becomes the strong identifier combination because it ties the built-in SID to a specific machine.

For IP entities, Address is the only identifier and it is always strong. For Host entities, HostName alone is weak. HostName + DnsDomain is strong because it produces the fully qualified domain name. For URL entities, the Url field is always strong. For FileHash entities, the combination of Algorithm + Value (for example, SHA256 + the hash) is always strong.

This matters for your playbooks because Sentinel uses identifiers to merge entities. When two alerts correlate into a single incident, and both alerts contain an Account entity with AadUserId "a1b2c3d4-e5f6-7890-abcd-ef1234567890," Sentinel recognizes them as the same account and merges them into one entity. The incident shows one Account, not two duplicates. But if one alert maps only Name (weak) and the other maps AadUserId (strong), Sentinel cannot determine whether they represent the same user. The incident shows two separate Account entities, and your playbook's For Each loop processes them both, potentially sending duplicate enrichment queries for the same person.

The practical rule: always map strong identifiers. For Account entities in M365 log sources, map AadUserId when the log table provides it (SigninLogs, AADUserRiskEvents). Map Name + UPNSuffix when AadUserId is unavailable. For Host entities, always map both HostName and DnsDomain. This gives Sentinel the best chance of correctly merging entities across correlated alerts.
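To make the merging behavior concrete, here is a simplified Python model of deduplicating Account entities by strong identifier. It is a sketch for reasoning only, not Sentinel's actual merge algorithm; the functions `strong_key` and `merge_accounts` are hypothetical, and the model deliberately ignores SID-based identifiers.

```python
from typing import Optional


def strong_key(account: dict) -> Optional[tuple]:
    """Return a key built from a strong identifier, or None if only weak ones exist."""
    aad = account.get("AadUserId")
    if aad:
        return ("AadUserId", aad.lower())
    name, suffix = account.get("Name"), account.get("UPNSuffix")
    if name and suffix:
        return ("UPN", f"{name}@{suffix}".lower())
    return None  # Name alone is weak: cannot be resolved to one account


def merge_accounts(accounts: list) -> list:
    """Merge entities that share a strong key; keep weakly identified ones separate."""
    merged = {}
    unresolved = []
    for acc in accounts:
        key = strong_key(acc)
        if key is None:
            unresolved.append(acc)
        elif key in merged:
            # Fill in fields the earlier alert did not provide.
            merged[key].update({k: v for k, v in acc.items() if v is not None})
        else:
            merged[key] = dict(acc)
    return list(merged.values()) + unresolved
```

In this model, two alerts that both carry the same AadUserId collapse into one entity, while an alert that maps only Name stays separate, which is the duplicate your For Each loop would then process twice.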

The entity payload your playbook receives

When the incident trigger fires, the trigger body contains an Entities field: a JSON array of every entity attached to the incident, regardless of type. The "Entities - Get Accounts" action filters this array to return only Account entities with their fields parsed into typed properties. The same pattern applies to Get IPs, Get Hosts, Get URLs, Get FileHashes, and Get DNS.

JSON
{
  "Accounts": [
    {
      "Name": "priya.sharma",
      "UPNSuffix": "northgateeng.com",
      "AadUserId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "DisplayName": "Priya Sharma",
      "Sid": null,
      "NTDomain": null,
      "IsDomainJoined": true,
      "Type": "account"
    }
  ],
  "IPs": [
    {
      "Address": "185.220.101.42",
      "Type": "ip"
    },
    {
      "Address": "10.1.50.23",
      "Type": "ip"
    }
  ],
  "Hosts": [
    {
      "HostName": "WS-SHARMA-01",
      "DnsDomain": "northgateeng.com",
      "OSFamily": "Windows",
      "Type": "host"
    }
  ]
}

The Account entity has both strong identifiers: AadUserId and the reconstructed UPN from Name + UPNSuffix. The IP collection contains two addresses, which is typical. One is the suspicious external address from a Tor exit node. The other is the legitimate internal address of the workstation where the sign-in event was recorded. Your enrichment playbook processes both through the For Each loop. The enrichment logic must distinguish between them. You will typically check whether the address falls in RFC 1918 private ranges (10.x.x.x, 172.16-31.x.x, 192.168.x.x) and skip internal IPs, or you route internal and external addresses to different enrichment paths.
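The internal versus external split can be sketched in Python with the standard `ipaddress` module. The function names `is_rfc1918` and `route_ips` are hypothetical, and the entity shape follows the sample payload above; in a playbook the same test would be expressed as a Condition action or inside the enrichment query.

```python
import ipaddress

# The three RFC 1918 private ranges named in the text.
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """True when the address falls inside one of the RFC 1918 private ranges."""
    ip = ipaddress.ip_address(address)
    return ip.version == 4 and any(ip in net for net in RFC1918)


def route_ips(ip_entities: list) -> tuple:
    """Split IP entities into (external, internal) address lists."""
    external, internal = [], []
    for entity in ip_entities:
        address = entity["Address"]
        (internal if is_rfc1918(address) else external).append(address)
    return external, internal


if __name__ == "__main__":
    sample = [{"Address": "185.220.101.42", "Type": "ip"},
              {"Address": "10.1.50.23", "Type": "ip"}]
    print(route_ips(sample))
```

The explicit network list is a deliberate choice: `ipaddress.ip_address(...).is_private` also reports loopback, link-local, and other reserved ranges as private, which is broader than the RFC 1918 check described here.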

The Host entity shows HostName and DnsDomain as separate fields. If you need the FQDN for a later action (DNS lookup, MDE device query), concatenate them: @{items('For_each_Host')?['HostName']}.@{items('For_each_Host')?['DnsDomain']}. Do not assume the HostName field contains the FQDN. Some analytics rules map only the short hostname.
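A defensive version of that concatenation, sketched in Python, handles the cases where DnsDomain is missing or where HostName already carries the domain. The helper `host_fqdn` is hypothetical; the field names come from the sample payload above.

```python
def host_fqdn(host: dict) -> str:
    """Build an FQDN from a Host entity without doubling or dropping the domain."""
    name = (host.get("HostName") or "").strip()
    domain = (host.get("DnsDomain") or "").strip().strip(".")
    if not name:
        raise ValueError("Host entity has no HostName")
    # Rule mapped only the short name, or HostName already ends with the domain.
    if not domain or name.lower().endswith("." + domain.lower()):
        return name
    return f"{name}.{domain}"
```

When DnsDomain is null the function returns the short name unchanged, so a downstream action should treat the result as a best effort rather than a guaranteed FQDN.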

The July 2026 UPN Name field change

Microsoft announced a standardization change effective July 1, 2026, that directly affects how Account entity properties are populated. Before this change, the Account Name field could contain either the UPN prefix (priya.sharma) or the full UPN (priya.sharma@northgateeng.com), depending on how the analytics rule mapped the source field. This inconsistency created unpredictable behavior in automation rules and playbook conditions that compared the Name field against a specific value.

After July 1, 2026, the Name field will consistently contain only the UPN prefix for all accounts. New dedicated fields provide the full UPN and UPN suffix separately. If your automation rules or playbooks compare Name against a full UPN value (like user@contoso.com), those comparisons will stop matching. Update them to reconstruct the full UPN from Name + UPNSuffix, or use the new FullUPN field when it becomes available.

Review every automation rule condition and every playbook Condition action that references the Account Name property. Any condition that expects a full email-format value in the Name field needs updating before July 2026. The change applies automatically with no opt-in required.
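One way to make a comparison tolerate both shapes of the Name field is to normalize to a full UPN before comparing. The Python sketch below illustrates the logic; `full_upn` and `matches_upn` are hypothetical helpers, and the sketch relies only on the Name and UPNSuffix fields because the exact names of the new dedicated fields are not confirmed here.

```python
from typing import Optional


def full_upn(account: dict) -> Optional[str]:
    """Return a lowercase full UPN for either shape of the Name field."""
    name = (account.get("Name") or "").strip()
    suffix = (account.get("UPNSuffix") or "").strip()
    if "@" in name:
        # Older shape: Name already holds the full UPN.
        return name.lower()
    if name and suffix:
        # Standardized shape: Name is the prefix only.
        return f"{name}@{suffix}".lower()
    return None  # Not enough information to reconstruct a UPN


def matches_upn(account: dict, expected_upn: str) -> bool:
    """Compare an Account entity against an expected full UPN, case-insensitively."""
    upn = full_upn(account)
    return upn is not None and upn == expected_upn.strip().lower()
```

A condition written this way gives the same answer before and after the change, which lets you update playbooks ahead of the deadline without a coordinated cutover.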

Multi-entity aggregation

A brute force detection targeting 30 accounts produces an incident with 30 Account entities. A lateral movement detection across 5 machines produces 5 Host entities and potentially 5 Account entities. Your playbook must handle all of these without producing 30 separate incident comments.

The aggregation pattern uses three steps. First, initialize an array variable before the For Each loop. Second, inside the loop, append each entity's enrichment result to the array using the Append to array variable action. Third, after the loop completes, use a Compose action with the expression join(variables('EnrichmentResults'), '\n\n') to concatenate all results into a single formatted string. If the comment shows a literal \n\n instead of blank lines (the designer's expression editor can store the backslash as a literal character), decodeUriComponent('%0A%0A') is a common substitute for the separator. Add one incident comment containing the joined output. One comment, all entities, clean formatting.
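The three steps map directly onto a short Python sketch. The `enrich` function is a hypothetical stand-in for whatever query or API call runs inside the loop; only the initialize, append, join shape is the point.

```python
def enrich(account: dict) -> str:
    """Hypothetical stand-in for the per-entity enrichment result."""
    return f"Account {account['Name']}@{account['UPNSuffix']}: enrichment result placeholder"


def build_comment(accounts: list) -> str:
    results = []                          # Step 1: Initialize variable (array)
    for account in accounts:              # For Each loop
        results.append(enrich(account))   # Step 2: Append to array variable
    return "\n\n".join(results)           # Step 3: Compose with join()
```

Note that an empty input produces an empty string, which is the blank-comment failure mode this section warns about.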

For performance on high-entity incidents, set concurrency on the For Each loop deliberately. By default, Logic Apps runs For Each iterations in parallel, up to 20 at a time; iterations run one at a time only when you turn on Concurrency control in the For Each action settings and set the degree of parallelism to 1. If each iteration runs a KQL query (approximately 2 seconds) and a Graph API call (approximately 1 second), 30 sequential iterations take 90 seconds. Set the degree of parallelism to 5 instead: 5 iterations run simultaneously, and the loop completes in roughly 18 seconds. Start at 5 and increase cautiously. Graph API rate limits at approximately 130 requests per 20-second window per app identity, and a parallelism setting of 20 (the default) on a 30-entity incident will fire 20 simultaneous Graph calls that may trigger throttling. Parallel iterations also append to the array variable in completion order, so the joined comment may not list entities in incident order.
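The timing arithmetic above can be written as a small Python estimate. This is an idealized model that assumes every iteration takes the same time and that iterations run in full batches; the function name `loop_seconds` is hypothetical.

```python
import math


def loop_seconds(entities: int, parallelism: int, seconds_per_iteration: float = 3.0) -> float:
    """Estimate loop duration: number of batches times the per-iteration time."""
    if entities < 0 or parallelism < 1:
        raise ValueError("entities must be >= 0 and parallelism >= 1")
    batches = math.ceil(entities / parallelism)
    return batches * seconds_per_iteration


if __name__ == "__main__":
    for degree in (1, 5, 20):
        print(degree, loop_seconds(30, degree))
```

With 30 entities at 3 seconds each, the model gives 90 seconds at a degree of 1, 18 seconds at 5, and 6 seconds at 20. The last figure looks attractive but ignores throttling, which is why the text recommends starting at 5.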

Skipping the empty collection check

A playbook that enters a For Each loop without checking whether the entity collection contains any items will iterate zero times when entities are missing. The loop does not error. The Compose action after the loop joins an empty array into an empty string. The Add Comment action posts a blank comment to the incident. The analyst reads the empty comment and concludes the playbook found nothing suspicious. In reality, the playbook had no entities to work with because the upstream analytics rule lacked entity mapping. Add a Condition action before every For Each loop: check `length(body('Entities_-_Get_Accounts')?['Accounts'])` is greater than zero. In the False branch, post a diagnostic comment: "No Account entities found in this incident. Verify entity mapping on the analytics rule." This tells the analyst the extraction failed, not the investigation.
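The guard can be sketched in Python as follows. The function name `accounts_or_diagnostic` is hypothetical; the `Accounts` key and the diagnostic text mirror the expression and comment given above.

```python
DIAGNOSTIC = ("No Account entities found in this incident. "
              "Verify entity mapping on the analytics rule.")


def accounts_or_diagnostic(get_accounts_body: dict) -> tuple:
    """Return (accounts, None) when entities exist, else ([], diagnostic comment)."""
    accounts = get_accounts_body.get("Accounts") or []
    if len(accounts) > 0:          # True branch: continue to the For Each loop
        return accounts, None
    return [], DIAGNOSTIC          # False branch: post the diagnostic comment
```

Treating a missing or null `Accounts` property the same as an empty array keeps the guard from failing on a malformed response body.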

Non-native entity extraction

The Sentinel connector provides six native extraction actions: Get Accounts, Get IPs, Get Hosts, Get URLs, Get FileHashes, and Get DNS. These cover the most common entity types for M365 security automation. But analytics rules can also map entity types that have no native action: Mailbox, MailMessage, Malware, Process, RegistryKey, RegistryValue, SecurityGroup, and AzureResource. When you need to extract these, you build the extraction yourself.

The pattern has three steps. First, initialize an Array variable and set its value to the raw entity collection from the trigger: @triggerBody()?['object']?['properties']?['relatedEntities']. This gives you the complete entity array containing every entity type. Second, add a Filter Array action that filters where the kind property equals the type you need. For Process entities, the filter condition is @equals(item()?['kind'], 'Process'). For Malware entities, use @equals(item()?['kind'], 'Malware'). Third, add a Parse JSON action with the schema for that entity type. The schema defines the typed fields (for Process: processId, commandLine, elevationToken, imageFile; for Malware: malwareName, category). The parsed output feeds into a For Each loop the same way native extraction actions do.

Event Log
// Filter Array — extract Process entities from raw entity collection
// Input: @triggerBody()?['object']?['properties']?['relatedEntities']
//
Filter condition: @equals(item()?['kind'], 'Process')
//
// Filtered output (one Process entity from an endpoint detection):
{
  "kind": "Process",
  "properties": {
    "processId": "7284",
    "commandLine": "powershell.exe -enc SQBFAFgAIAAoA...",
    "elevationToken": "Full",
    "creationTimeUtc": "2026-05-22T09:14:33Z",
    "imageFile": {
      "fileName": "powershell.exe",
      "directory": "C:\\Windows\\System32\\WindowsPowerShell\\v1.0"
    }
  }
}
//
// Parse JSON schema extracts typed fields:
//   processId  → "7284"
//   commandLine → "powershell.exe -enc SQBFAFgAIAAoA..."
//   elevationToken → "Full"
// These become dynamic content in subsequent actions.
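The same filter-then-parse logic can be expressed in Python as a way to check your understanding of the payload shape. The function `filter_by_kind` is a hypothetical helper, and the sample entities follow the `kind` and `properties` structure shown above.

```python
def filter_by_kind(related_entities: list, kind: str) -> list:
    """Keep entities of one kind and return their properties, like Filter Array + Parse JSON."""
    return [
        entity.get("properties", {})
        for entity in related_entities
        if entity.get("kind") == kind
    ]


if __name__ == "__main__":
    entities = [
        {"kind": "Account", "properties": {"accountName": "priya.sharma"}},
        {"kind": "Process", "properties": {"processId": "7284", "elevationToken": "Full"}},
    ]
    for process in filter_by_kind(entities, "Process"):
        print(process["processId"], process["elevationToken"])
```

The comparison is case-sensitive, matching the behavior of equals() in the filter condition, so the kind string must match the payload exactly.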

You will not need non-native extraction for the enrichment playbooks in this module. Enrichment primarily works with accounts and IPs. But collection and investigation playbooks in later modules encounter Process entities from Defender for Endpoint detections and Mailbox entities from Defender for Office 365 alerts. When you reach those modules, the extraction pattern is the same. Only the entity kind and schema change.

To generate the Parse JSON schema for any entity type, run the playbook manually on a test incident that contains the entity type you need. Open the run history, navigate to the Initialize variable action, and copy the entity payload. Paste it into the Parse JSON action's "Use sample payload to generate schema" dialog. Logic Apps generates the correct schema from the sample data. This is faster and more accurate than writing schemas by hand, particularly for complex entity types with nested properties like Process (which includes an imageFile sub-object) or MailMessage (which includes sender, recipient, and attachment arrays).

Auditing entity mapping across your analytics rules

Entity mapping quality degrades silently. An analyst modifies a detection query, adds a new column, removes an old one, but forgets to update the entity mapping. The rule continues to fire. Incidents are created. Playbooks run. But the entity collection in each incident is missing the new column's data, and any playbook action that referenced the removed column receives null values. Nobody notices until an enrichment comment arrives blank on a real incident.

Defender Portal

Microsoft Sentinel → Configuration → Analytics → select rule → Edit → Set rule logic → Alert enhancement → Entity mapping
For every analytics rule that triggers a playbook, verify that each entity type your playbook extracts is mapped with strong identifiers. Account entities should map both Name and UPNSuffix at minimum, with AadUserId when the log source provides it. IP entities should map the Address field. Host entities should map both HostName and DnsDomain. If any playbook-dependent rule is missing entity mappings, the playbook's Get actions return empty collections for that entity type.

Build this audit into your quarterly review from Section 1.5. For each analytics rule in the active rules list, open the entity mapping configuration and check three things. First, does the rule map every entity type that your connected playbooks expect to extract? If your enrichment playbook calls Get Accounts and Get IPs, the rule must map both Account and IP entities. Second, does the mapping use strong identifiers? Name alone is weak for Account. HostName alone is weak for Host. Third, do the mapped column names still match the current KQL query output? If the query was modified to rename Caller to ActorUPN, but the entity mapping still references Caller, the mapping produces null values.
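The three checks can be sketched as a Python function that you could run against exported rule definitions. The input shape (`entityType`, `fieldMappings`, `identifier`, `columnName`) is an assumption about how the mappings are represented in an export, and `audit_rule` and `WEAK_ONLY` are hypothetical names; the weak-identifier table covers only the two cases named in this section.

```python
# Identifier sets that are weak on their own, per the rules in this section.
WEAK_ONLY = {"Account": {"Name"}, "Host": {"HostName"}}


def audit_rule(query_columns: set, entity_mappings: list, expected_types: list) -> list:
    """Return findings for missing types, weak-only mappings, and stale columns."""
    findings = []
    mapped_types = {m["entityType"] for m in entity_mappings}
    # Check 1: every entity type the playbook extracts must be mapped.
    for entity_type in sorted(set(expected_types) - mapped_types):
        findings.append(f"{entity_type}: no mapping, playbook extraction will be empty")
    for mapping in entity_mappings:
        entity_type = mapping["entityType"]
        identifiers = {f["identifier"] for f in mapping["fieldMappings"]}
        # Check 2: the mapping must not rely on a weak identifier alone.
        if identifiers and identifiers <= WEAK_ONLY.get(entity_type, set()):
            findings.append(f"{entity_type}: only weak identifier(s) mapped")
        # Check 3: every mapped column must still exist in the query output.
        for field in mapping["fieldMappings"]:
            if field["columnName"] not in query_columns:
                findings.append(
                    f"{entity_type}.{field['identifier']}: "
                    f"column {field['columnName']!r} not in query output"
                )
    return findings
```

Running it on the Caller to ActorUPN rename described above would flag the stale column, the weak Name-only mapping, and any entity type the playbook expects but the rule never maps.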

Content Hub analytics rules deserve special attention. Rules imported from Content Hub solutions may have minimal entity mapping or use weak identifiers. After importing any Content Hub solution, review every analytics rule it installed and add or strengthen entity mappings before connecting playbooks. Content Hub rules are editable after import.

Automation Principle

Entity extraction is only as good as entity mapping. The best playbook produces empty results when the analytics rule does not map entities correctly. When a playbook produces blank enrichment, check entity mapping first, before debugging any playbook logic. The entity chain is: KQL query returns columns → analytics rule maps columns to entity types with identifiers → incident stores typed entity collections → playbook extraction actions filter by type and parse fields. A break at any point produces empty results with no error.

Next
Section 1.7 covers error handling and retry logic: what happens when an HTTP action fails mid-playbook, how Logic Apps retry policies work, how Scope actions group related steps for structured error handling, and how to ensure the playbook always writes a diagnostic comment to the incident even when enrichment partially fails.