SA1.6 Entity Extraction and Mapping

5 hours · Module 1 · Free
Entity extraction: from incident to actionable data

  1. Incident payload: Entities JSON array with mixed types (Account, IP, Host, URL, File). Zero to many per type. Varies by analytics rule.
  2. Entity actions: Get Accounts, Get IPs, Get Hosts, Get URLs / Get FileHashes. Each returns a typed array.
  3. Defensive check: Condition: length > 0? Yes → proceed. No → skip and comment "No entity found". Prevents null reference errors.
  4. Use entity: first() for a single entity, For Each for multiple. Properties: UPN, IP, Hostname, etc.

Entity properties: what each type provides

  Account: Name, UPNSuffix, AadUserId, Sid, NTDomain, DnsDomain, ObjectGuid
  IP: Address (IPv4/IPv6), Location (country, city, latitude, longitude, ASN)
  Host: HostName, DnsDomain, AzureID, OMSAgentID, OSFamily, OSVersion
  URL: Url (full URL string)
  FileHash: Algorithm (MD5, SHA1, SHA256), HashValue

UPN construction: Account.Name + "@" + Account.UPNSuffix → d.chen@northgateeng.com

Figure SA1.6 — Entity extraction flow: incident payload → entity action → defensive check → use entity. The defensive check prevents the most common playbook failure.

Operational Objective
Entity extraction is the step that breaks most playbooks. The playbook builder tests against one incident type (AiTM — which always has an Account entity), deploys to production, and discovers that half the incidents from other analytics rules do not have Account entities at all. The playbook throws a null reference error and fails. This sub teaches entity extraction comprehensively: what entities exist, how to extract each type, how to handle missing entities, how to process multiple entities, and the expressions that reliably construct UPNs, IP addresses, and hostnames from entity properties.
Deliverable: Entity extraction patterns that handle every scenario: single entity, multiple entities, missing entities, and mixed entity types. Applied to the enrichment playbook from SA1.4.
⏱ Estimated completion: 30 minutes

Why entity extraction fails

When a Sentinel incident trigger fires, it provides the incident payload as a JSON object. Within that payload, entities are stored as a JSON array called Entities. The array contains zero or more entity objects, each with a kind property (Account, Ip, Host, Url, FileHash) and a properties object with entity-specific fields.
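The array shape described above can be explored offline before touching a playbook. The following Python sketch is only a model: the payload values are sample data, and `entities_of_kind` is a hypothetical helper that imitates what a typed "Get ..." action does (filter the Entities array by kind), assuming each entity carries a `kind` and a `properties` object as described.

```python
# Illustrative incident payload. The shape follows the description above
# (an "Entities" array of objects with "kind" and "properties").
# All values are sample data, not output from a real workspace.
incident = {
    "Entities": [
        {"kind": "Account", "properties": {"Name": "d.chen", "UPNSuffix": "northgateeng.com"}},
        {"kind": "Ip", "properties": {"Address": "203.0.113.45"}},
        {"kind": "Ip", "properties": {"Address": "198.51.100.7"}},
    ]
}


def entities_of_kind(payload: dict, kind: str) -> list[dict]:
    """Return every entity of the given kind, or an empty list if none exist.

    Hypothetical helper that models a typed "Get ..." action.
    """
    entities = payload.get("Entities") or []
    return [e for e in entities if e.get("kind") == kind]


accounts = entities_of_kind(incident, "Account")
ips = entities_of_kind(incident, "Ip")
hosts = entities_of_kind(incident, "Host")  # no Host mapped: empty list, not an error

print(len(accounts), len(ips), len(hosts))  # prints: 1 2 0
```

The point of the sketch is the third result: a type that was never mapped comes back as an empty array, and everything downstream has to cope with that.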

The array contents depend entirely on the analytics rule’s entity mapping configuration. An analytics rule that maps AccountName and AccountUPNSuffix from the KQL query results produces Account entities. A rule that maps IPAddress produces IP entities. A rule that maps neither produces an incident with zero entities.

This is the root cause of extraction failures: the playbook assumes the entity exists because it always existed in testing. In production, different analytics rules produce different entity mappings. The playbook must handle every combination.

Extracting Account entities

The Sentinel connector provides “Entities - Get Accounts” — an action that filters the entity array for Account-type entities and returns them as a structured array. Each Account object has properties: Name (the username portion), UPNSuffix (the domain portion), AadUserId (the Entra Object ID), Sid (the Windows SID), and NTDomain (the NetBIOS domain name).

To construct the User Principal Name (UPN) for Graph API queries:

@{concat(first(body('Entities_-_Get_Accounts'))?['properties']?['Name'], '@', first(body('Entities_-_Get_Accounts'))?['properties']?['UPNSuffix'])}

This expression takes the first Account entity, extracts the Name and UPNSuffix properties, and concatenates them with @. For d.chen@northgateeng.com: Name = “d.chen”, UPNSuffix = “northgateeng.com”.

The first() function takes the first element of the array. If the incident has multiple Account entities (a lateral movement alert with both source and target accounts), first() returns only the first one. For enrichment playbooks that should process all accounts, use a “For Each” loop instead of first().
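The difference between first() and a "For Each" loop can be illustrated with a small Python model. This is a sketch under assumptions, not connector code: the two accounts are sample data (the second name is made up for illustration), and `upn_of` is a hypothetical helper that mirrors the concat() expression.

```python
def upn_of(account: dict) -> str:
    """Join Name and UPNSuffix with '@', as the concat() expression does."""
    props = account["properties"]
    return f'{props["Name"]}@{props["UPNSuffix"]}'


# Sample lateral-movement incident with two Account entities (illustrative values).
accounts = [
    {"kind": "Account", "properties": {"Name": "d.chen", "UPNSuffix": "northgateeng.com"}},
    {"kind": "Account", "properties": {"Name": "svc-backup", "UPNSuffix": "northgateeng.com"}},
]

first_only = upn_of(accounts[0])          # equivalent of first(): one account only
all_upns = [upn_of(a) for a in accounts]  # equivalent of For Each: every account

print(first_only)
print(all_upns)
```

With first(), the second account never reaches the enrichment steps; the loop form is what covers both.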

If the array is empty, first() returns null. Because the expression uses the ?[] operator, reading a property from null also yields null instead of throwing, so concat() has no Name or UPNSuffix to join and the result is not a valid UPN. The subsequent KQL query then filters on a nonexistent UPN and returns zero results without throwing an error. This is graceful degradation: the enrichment comment shows “no sign-ins found” rather than the playbook crashing.

For a more robust approach, add a Condition action before the enrichment steps:

length(body('Entities_-_Get_Accounts')) is greater than 0

If true, proceed with enrichment. If false, add a comment: “No Account entity mapped for this incident. Manual enrichment required. Check the analytics rule entity mapping.” This tells the analyst exactly what happened and why.
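The branch logic of that Condition can be modeled in a few lines of Python. This is a hedged sketch: `account_gate` is a hypothetical name, and the function only imitates the true/false decision and the comment text quoted above, not the Logic App action itself.

```python
NO_ACCOUNT_COMMENT = (
    "No Account entity mapped for this incident. Manual enrichment required. "
    "Check the analytics rule entity mapping."
)


def account_gate(accounts: list[dict]) -> tuple[bool, str]:
    """Model the Condition action: proceed only when at least one account exists.

    Returns (proceed, comment). The comment is empty on the "true" branch.
    """
    if len(accounts) > 0:
        return True, ""
    return False, NO_ACCOUNT_COMMENT


proceed, comment = account_gate([])  # incident with no Account entity
print(proceed)
print(comment)
```

Keeping the check as its own step means the "false" branch always leaves a trace on the incident, which is what makes the failure visible to the analyst.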

Extracting IP entities

“Entities - Get IPs” extracts IP-type entities. Each IP has properties: Address (the IP string, e.g., “203.0.113.45”) and Location (an object with country, city, state, latitude, longitude, and ASN).

IP extraction is simpler than Account extraction because the Address property is a single string — no concatenation needed. Reference it as:

@{first(body('Entities_-_Get_IPs'))?['properties']?['Address']}

Use the IP for TI feed queries (VirusTotal, AbuseIPDB), firewall log correlation (KQL against CommonSecurityLog), and geo-location enrichment (the Location property provides country and city directly from the entity mapping).

Multiple IPs are common. A multi-stage attack alert may include the attacker’s external IP, the VPN-assigned internal IP, and the target server IP. Use a “For Each” loop to enrich all IPs:

For Each: body('Entities_-_Get_IPs')
  → HTTP: VirusTotal lookup for current item's Address
  → Compose: aggregate results

Extracting Host entities

“Entities - Get Hosts” extracts Host-type entities. Properties include HostName (the device name, e.g., “DESKTOP-NGE042”), DnsDomain (e.g., “northgateeng.com”), AzureID (the Entra device ID for MDE integration), and OSFamily (Windows, Linux).

For MDE API calls (isolate device, collect investigation package), you need either the AzureID or you need to query MDE to resolve the HostName to a device ID. The AzureID is more reliable because it is a unique identifier, while HostName may have duplicates.

FQDN construction: @{concat(first(body('Entities_-_Get_Hosts'))?['properties']?['HostName'], '.', first(body('Entities_-_Get_Hosts'))?['properties']?['DnsDomain'])} — produces “DESKTOP-NGE042.northgateeng.com”.
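A minimal Python sketch of the same FQDN construction, with guards for missing values. `build_fqdn` is a hypothetical helper, and the fallback to the bare HostName when DnsDomain is absent is a design assumption of this sketch, not something the connector does.

```python
from typing import Optional


def build_fqdn(hosts: list[dict]) -> Optional[str]:
    """Return HostName.DnsDomain for the first Host entity.

    Falls back to the bare HostName when DnsDomain is absent, and to None
    when there is no Host entity or no HostName.
    """
    if not hosts:
        return None
    props = hosts[0].get("properties") or {}
    host_name = props.get("HostName")
    if not host_name:
        return None
    dns_domain = props.get("DnsDomain")
    return f"{host_name}.{dns_domain}" if dns_domain else host_name


full = [{"kind": "Host", "properties": {"HostName": "DESKTOP-NGE042", "DnsDomain": "northgateeng.com"}}]
bare = [{"kind": "Host", "properties": {"HostName": "DESKTOP-NGE042"}}]

print(build_fqdn(full))  # DESKTOP-NGE042.northgateeng.com
print(build_fqdn(bare))  # DESKTOP-NGE042
print(build_fqdn([]))    # None
```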

The multi-entity pattern

Many incidents contain multiple entity types. An AiTM incident typically has: one Account (the compromised user), one or more IPs (the attacker IP and potentially the legitimate user IP), and one Host (the device where the phishing page was accessed). A comprehensive enrichment playbook processes all entity types:

Step 1: Extract all entity types in parallel (Get Accounts, Get IPs, Get Hosts — three parallel branches).

Step 2: For each entity type, check if entities exist (length > 0).

Step 3: For each existing entity, run the appropriate enrichment (Account → sign-in history and risk, IP → TI lookup and geo, Host → device compliance and MDE data).

Step 4: Compose all results into a single enrichment comment.
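The four steps can be sketched end to end in Python. This is an illustrative model only: `entities_of_kind` and `build_comment` are hypothetical helpers, the extraction runs sequentially rather than in parallel branches, and Step 3 is reduced to rendering each entity as text instead of running real enrichment queries.

```python
def entities_of_kind(payload: dict, kind: str) -> list[dict]:
    """Model a typed "Get ..." action: filter the Entities array by kind."""
    return [e for e in (payload.get("Entities") or []) if e.get("kind") == kind]


def build_comment(payload: dict) -> str:
    """Extract each type, check it exists, summarize it, compose one comment."""
    # (label, entity kind, function that renders one entity's properties as text)
    plan = [
        ("Accounts", "Account", lambda p: f'{p.get("Name")}@{p.get("UPNSuffix")}'),
        ("IPs", "Ip", lambda p: str(p.get("Address"))),
        ("Hosts", "Host", lambda p: str(p.get("HostName"))),
    ]
    sections = []
    for label, kind, render in plan:
        found = entities_of_kind(payload, kind)       # Step 1: extract
        if len(found) == 0:                           # Step 2: defensive check
            sections.append(f"{label}: no entity found")
            continue
        values = [render(e.get("properties") or {}) for e in found]  # Step 3 stand-in
        sections.append(f"{label}: " + ", ".join(values))
    return "\n".join(sections)                        # Step 4: single comment


# Sample firewall-style incident: only an IP entity is mapped.
incident = {"Entities": [{"kind": "Ip", "properties": {"Address": "203.0.113.45"}}]}
print(build_comment(incident))
```

For this sample the comment reports the IP and states that no Account or Host entity was found, so a missing type shows up as a line in the comment rather than as a failed run.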

This multi-entity pattern is the foundation for the enrichment pipeline in SA2. The playbook from SA1.4 extracts only Account entities — SA2 expands it to process all types.

⚠ Compliance Myth: "Entity extraction requires custom coding — it's too complex for SOC analysts"

The myth: Entity extraction involves JSON parsing, expression syntax, and Logic App dynamic content that requires developer skills.

The reality: The Sentinel connector provides dedicated actions for each entity type (Get Accounts, Get IPs, Get Hosts). These actions handle the JSON parsing automatically — the output is a structured array that you reference using the dynamic content picker, not raw JSON you parse manually. The expressions in this sub (first(), concat(), length()) are three functions that every Logic App builder learns in 10 minutes. The complexity is not in the syntax — it is in knowing that different analytics rules produce different entity types. This sub teaches that knowledge.

Decision point: Your enrichment playbook processes Account entities. An incident arrives from a firewall analytics rule that maps only IP entities — no Account. The playbook’s defensive check catches the missing Account and adds “No Account entity found.” But the incident DOES have IP entities that the playbook does not process. The decision: should you build one playbook that handles all entity types, or separate playbooks per entity type? One playbook with multi-entity extraction (Account + IP + Host in parallel) is more maintainable and produces a single enrichment comment with all context. Separate playbooks per entity type means multiple enrichment comments and multiple automation rule triggers. The single-playbook approach is recommended — it is what SA2 builds.

Try it: Test entity extraction against different incident types

In your Sentinel workspace, identify three incidents from different analytics rules:

  1. An identity alert (should have Account entity)
  2. A network/firewall alert (should have IP entity)
  3. An endpoint alert (should have Host and possibly Account entity)

For each incident, open the incident JSON (Sentinel → Incidents → select incident → Inspect). Find the Entities array. What entity types are present? What properties does each entity have?

Then test your enrichment playbook against each incident type. Does it handle the missing Account entity gracefully on the firewall alert?
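If you save the incident JSON to a file or paste it into a script, a short Python helper can answer the "what entity types are present?" question quickly. This is a sketch under assumptions: `entity_kinds` is a hypothetical helper, the sample string is made-up data, and it assumes the JSON has a top-level Entities array with a `kind` on each entity as described earlier. Adjust the keys if your export is shaped differently.

```python
import json
from collections import Counter


def entity_kinds(incident_json: str) -> Counter:
    """Count entities by kind in an incident JSON document."""
    payload = json.loads(incident_json)
    return Counter(e.get("kind", "Unknown") for e in payload.get("Entities") or [])


# Sample firewall-style incident (illustrative data only).
exported = '{"Entities": [{"kind": "Ip", "properties": {"Address": "203.0.113.45"}}]}'
kinds = entity_kinds(exported)

print(dict(kinds))
print("Account present:", kinds["Account"] > 0)
```

A Counter returns 0 for kinds that never appear, so the check for a missing Account does not need its own guard.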

Your enrichment playbook uses `first(body('Entities_-_Get_Accounts'))` to extract the user. An incident contains two Account entities: the attacker (the source of lateral movement) and the victim (the target of lateral movement). Which account does the playbook enrich?
  1. Both accounts: `first()` returns all elements.
     Feedback: `first()` returns only the first element of the array. The second account is not enriched.
  2. Only the first account in the array.
     Feedback: The order depends on the analytics rule's entity mapping. If you need to enrich both accounts, replace `first()` with a "For Each" loop that iterates over the entire Entities_-_Get_Accounts array and enriches each account individually.
  3. The attacker account: Sentinel puts the source entity first.
     Feedback: Entity order is not guaranteed to be source-first. The order depends on how the KQL query projects the entity columns and how Sentinel processes the mapping. Never assume ordering.
  4. Neither: the playbook fails on multi-entity incidents.
     Feedback: `first()` works correctly on arrays with 1 or more elements. It fails only on empty arrays (returns null), which the defensive check handles.

Where this goes deeper. SA2 builds the multi-entity enrichment pipeline — Account + IP + Host extraction in parallel, with enrichment queries specific to each entity type. SA5-SA7 extract entities for containment: Account → session revocation, Host → endpoint isolation, IP → firewall block. The entity extraction patterns from this sub are the foundation for every playbook in the course.
