In this module

1.3 Data Types in Depth

3-4 hours · Module 1 · Free

Operational Objective

This subsection teaches data types in depth — a core KQL skill for security investigation and detection engineering in Microsoft Sentinel and Defender XDR. Every concept is demonstrated against security log data from the Northgate Engineering environment.

Deliverable: Working proficiency with the KQL operators and patterns covered in this subsection, demonstrated through security investigation exercises.

Estimated completion: 25 minutes

Figure 1.3 — Operational workflow from input through documented output.

Data Types in Depth

Introduction

Figure — Data Types in Depth. Applied to security investigation workflows at NE.

Every column in every KQL table has a fixed data type. Understanding these types is not academic — it determines which operators work on which columns, why comparisons fail silently, and why some queries produce unexpected results. This subsection covers every data type you encounter in security logs.

The KQL data types

Type	What it stores	Size	Security log examples
`string`	Text of any length	Variable	UserPrincipalName, IPAddress, ResultType, AppDisplayName
`datetime`	Date and time (UTC)	8 bytes	TimeGenerated, CreatedDateTime
`timespan`	Duration	8 bytes	Result of `datetime_diff()`, the `ago()` argument
`int`	32-bit signed integer	4 bytes	Rarely used directly in security logs
`long`	64-bit signed integer	8 bytes	ResultType in some tables, byte counts
`real`	64-bit floating point	8 bytes	Latitude, longitude, risk scores
`bool`	True/false	1 byte	IsInteractive, IsCompliant, IsManaged
`dynamic`	JSON object or array	Variable	LocationDetails, DeviceDetail, AuthenticationDetails
`guid`	128-bit unique identifier	16 bytes	CorrelationId, TenantId
`decimal`	128-bit high-precision decimal	16 bytes	Rare in security logs

The types that matter most for security work

String — the most common type and the most deceptive. ResultType in SigninLogs is a string, even though it contains numeric values like "0" and "50126". This means where ResultType == 0 fails silently (comparing string to int), while where ResultType == "0" works. Always check the data type before writing comparisons.

// These are all equivalent ways to filter to the last 24 hours:
| where TimeGenerated > ago(24h)
| where TimeGenerated > ago(1d)
| where TimeGenerated > datetime(2026-03-22T00:00:00Z)
| where TimeGenerated between (ago(24h) .. now())

{
  "city": "London",
  "state": "England",
  "countryOrRegion": "GB",
  "geoCoordinates": {
    "latitude": 51.5074,
    "longitude": -0.1278
  }
}

// Check the type of a specific column
SigninLogs
| getschema
| where ColumnName == "ResultType"

// Check the runtime type of a value
SigninLogs
| take 1
| project ResultType, TypeOfResult = gettype(ResultType)

// This returns ZERO rows even if there are successful sign-ins
SigninLogs
| where ResultType == 0    // comparing string "0" to int 0

// This works correctly
SigninLogs
| where ResultType == "0"  // comparing string "0" to string "0"

SigninLogs
| extend City = tostring(LocationDetails.city)
| extend Lat = toreal(LocationDetails.geoCoordinates.latitude)
| extend AuthMethod = tostring(
    AuthenticationDetails[0].authenticationMethod)

// If AdditionalDetails is a string containing JSON:
| extend Parsed = parse_json(AdditionalDetails)
| extend Value = tostring(Parsed.someField)

// Timespan literals
print 1d          // 1 day
print 12h         // 12 hours
print 30m         // 30 minutes
print 1d12h30m    // 1 day, 12 hours, 30 minutes

// Timespan from datetime arithmetic
SigninLogs
| take 1
| extend Duration = now() - TimeGenerated  // timespan: how long ago was this event?
| extend DurationHours = Duration / 1h     // convert to hours (real type)

// CorrelationId links related sign-in events
SigninLogs
| where CorrelationId == "a8f5b3c2-1d4e-4a5f-9c7b-3e8d2f1a6c9e"

// Find all events with the same correlation ID
SigninLogs
| where TimeGenerated > ago(1h)
| summarize EventCount = count() by CorrelationId
| where EventCount > 5  // correlations with many events — complex auth flows

SigninLogs
| take 1
| project 
    ResultType_Type = gettype(ResultType),            // "string"
    TimeGenerated_Type = gettype(TimeGenerated),      // "datetime"  
    LocationDetails_Type = gettype(LocationDetails),  // "dynamic"
    IPAddress_Type = gettype(IPAddress)               // "string"

// real is fine for percentages
| extend FailRate = round(100.0 * FailCount / TotalCount, 2)

// real can lose precision with very large integers
print real(9007199254740993)   // → 9007199254740992 (last digit lost)
print long(9007199254740993)   // → 9007199254740993 (exact)

print typeof(1 + tolong(2))        // long
print typeof(tolong(1) + 2.0)      // real
print typeof(2.0 + todecimal(3))   // decimal

// ConditionalAccessPolicies is a dynamic array
// Each element is an object with displayName, result, etc.
SigninLogs
| take 1
| project array_length(ConditionalAccessPolicies)  // how many CA policies evaluated

// LocationDetails is a dynamic object
// Keys: city, state, countryOrRegion, geoCoordinates
SigninLogs
| take 1
| project bag_keys(LocationDetails)  // lists all keys in the object

// DeviceDetail contains objects with nested values
// DeviceDetail.operatingSystem, DeviceDetail.browser, DeviceDetail.isCompliant
SigninLogs
| take 1
| extend 
    OS = tostring(DeviceDetail.operatingSystem),
    Browser = tostring(DeviceDetail.browser),
    TrustType = tostring(DeviceDetail.trustType)

// Each sign-in evaluates multiple CA policies
// mv-expand creates one row per policy
SigninLogs
| where TimeGenerated > ago(1h)
| where ResultType == "0"
| mv-expand CAPolicy = ConditionalAccessPolicies
| extend PolicyName = tostring(CAPolicy.displayName)
| extend PolicyResult = tostring(CAPolicy.result)
| summarize count() by PolicyName, PolicyResult
| sort by count_ desc

Expand for Deeper Context

Datetime — every security event has a TimeGenerated column of type datetime. KQL has rich datetime support: ago() for relative time, between() for ranges, datetime_diff() for intervals, format_datetime() for display, bin() for grouping into time buckets.

Dynamic — the type that contains the most valuable data and causes the most confusion. A dynamic column stores a JSON object or array. LocationDetails in SigninLogs is dynamic:

Accessing nested fields uses dot notation: LocationDetails.city. But the result is also dynamic — you must cast it to string for comparison: tostring(LocationDetails.city) == "London".

Bool — straightforward but with a KQL-specific behavior. In many languages, you write where IsInteractive == true. In KQL, you can write where IsInteractive (the == true is implicit). For false: where not(IsInteractive) or where IsInteractive == false.

Checking data types in practice

When a query produces unexpected results, the first diagnostic is always: are the data types what you expect?

The gettype() function returns the runtime type of a value. Use it when you are not sure whether a column is string, int, or dynamic.

Type mismatches — the silent failures

KQL does not always throw errors on type mismatches. Some comparisons silently return false:

The query engine does not warn you. The first query runs successfully, returns zero rows, and you might conclude there are no successful sign-ins in the time range — a dangerous incorrect conclusion during an investigation.

Rule: always match the literal type to the column type. Check with getschema if unsure.

The dynamic type deep dive

Dynamic columns appear in almost every security table. They contain the richest data — device details, location coordinates, authentication methods, conditional access policy results, risk details.

Accessing nested JSON fields:

Note the difference: object fields use dot notation (LocationDetails.city), array elements use bracket notation (AuthenticationDetails[0]).

When dynamic fields are actually strings:

Some columns that look dynamic are actually strings containing JSON. Before you can use dot notation, you must parse the string:

Module 5 covers parse_json in depth. For now, remember: if dot notation does not work on a column that looks like JSON, try parse_json() first.

The `timespan` type

Timespan represents a duration — the result of subtracting two datetimes or specifying an interval directly:

The division Duration / 1h converts a timespan to a real number representing hours. Similarly, Duration / 1m gives minutes, Duration / 1s gives seconds. This is essential for calculations like "average session duration" or "time between events."

The `guid` type

GUIDs (globally unique identifiers) appear throughout security logs as correlation IDs, tenant IDs, and session IDs. They are 128-bit values displayed as hex strings with dashes: a8f5b3c2-1d4e-4a5f-9c7b-3e8d2f1a6c9e.

GUIDs are compared with == (exact match). They are case-insensitive by nature. Do not use has or contains with GUIDs — use == for exact matches or in for lists.

Type detection in practice — the `gettype()` function

When debugging unexpected query behavior, gettype() reveals the actual runtime type of a value:

Run this on any table to verify your assumptions about column types. The output resolves ambiguity instantly — if you thought ResultType was an integer and gettype shows "string," you have found the source of your comparison bug.

The `real` and `decimal` types — when precision matters

real (also called double) is a 64-bit floating-point number. It handles most numeric calculations but has limited precision — approximately 15 significant digits. For security operations, real is sufficient for percentages, ratios, and scores.

decimal provides 128-bit precision (28-29 significant digits). Use it when calculating financial values or when comparing very large numbers where rounding errors would produce incorrect results. In practice, security log analysis rarely needs decimal — real handles percentages, averages, and ratios without meaningful precision loss.

For counting (events, users, devices), always use long. For ratios and percentages, real is correct. If you encounter a calculation where the last digit matters (comparing exact transaction IDs, for example), use long or decimal.

Type hierarchy and implicit conversion

KQL has an implicit type hierarchy for arithmetic operations. When two different numeric types appear in an expression, KQL automatically promotes the narrower type to the wider type:

int → long → real → decimal

This means: - int + long produces long - long + real produces real - real + decimal produces decimal

The promotion is safe (no data loss going from narrow to wide) but be aware that going from long to real can lose precision for very large integers. If you need to preserve a large integer value (like a correlation ID stored as a number), keep it as long — do not mix it into a real expression.

Dynamic arrays and objects — the deep dive

Dynamic is the most complex type because it contains structured data — arrays, objects, or nested combinations of both. Security logs use dynamic extensively:

Arrays contain ordered lists of values:

Objects contain key-value pairs:

Nested structures combine both:

mv-expand is the operator that unpacks arrays into individual rows:

This shows which conditional access policies are firing most frequently and what their outcomes are. A policy that evaluates thousands of times with result "notApplied" is either misconfigured (scope is wrong) or redundant. A policy with result "failure" is actively blocking access — verify those blocks are intentional.

Understanding dynamic is not optional. The most important investigation data — location, device, authentication method, conditional access outcome — lives inside dynamic columns. If you cannot extract it, you cannot investigate.

Common type pitfalls in security queries

Pitfall 1: Numeric strings in categorisation fields.

Many Entra ID fields store numeric codes as strings: ResultType, ErrorCode, RiskLevelDuringSignIn. Always check with getschema before writing numeric comparisons.

Pitfall 2: Dynamic fields that look like strings.

Some columns appear to contain simple strings but are actually dynamic type. The ConditionalAccessPolicies column in SigninLogs is a dynamic array of objects — it looks like a string in the query results but requires array access operators ([0], mv-expand) to work with.

Pitfall 3: Mixed types in union queries.

When you union two tables, columns with the same name must have compatible types. If SigninLogs.IPAddress is string and OfficeActivity.ClientIP is also string, they are compatible. If one is string and the other is dynamic, the union may fail or produce unexpected results. Always verify types with getschema on both tables before union operations.

Pitfall 4: Implicit type widening.

KQL automatically widens numeric types in expressions: int + long produces long, long + real produces real. This is usually harmless, but be aware that precision can change — real has limited precision for very large integers.

Try it yourself

Run getschema on SigninLogs and identify eve...

Run getschema on SigninLogs and identify every dynamic column. Pick three of them, retrieve a sample row (take 1), and examine the JSON structure. For each, write a query that extracts one nested field using dot notation and tostring().

Common dynamic columns in SigninLogs: LocationDetails, DeviceDetail, ConditionalAccessPolicies, AuthenticationDetails, MfaDetail.

Check your understanding

1. You run SigninLogs | where ResultType == 50126 | count and get 0. You know there are failed sign-ins in the data. What is wrong?

ResultType is a string column. The value "50126" is stored as a string, not an integer. Comparing it to the integer 50126 produces a type mismatch that silently returns false for every row. The correct query is where ResultType == "50126". This is the most common type-related error in KQL security queries — ResultType looks numeric but is stored as string. Always verify column types with getschema before writing comparisons.

There are no sign-ins with ResultType 50126 in the time range

The count operator is not compatible with where filters

String-to-integer comparison silently fails. ResultType is string in SigninLogs. Use "50126" not 50126.

Compliance Myth: "You can learn data types in depth from documentation alone"

The myth: Reading the Microsoft KQL documentation is sufficient to learn this concept. The documentation explains the syntax — what more do you need?

The reality: Documentation teaches SYNTAX. This course teaches APPLICATION. The KQL docs show how summarize works with generic examples. This course shows how summarize reveals a password spray pattern hiding in 45,000 sign-in events. The difference between knowing the syntax and knowing WHEN and WHY to use it in a security context is the difference between an analyst who copies queries and one who writes them. Every operator in this course is taught through a security investigation scenario — not abstract data manipulation.

Troubleshooting

"The query returns an error I do not understand." KQL error messages reference the specific line and operator that failed. Read the error message from left to right: it names the operator, the expected input type, and the actual input type. Most errors are type mismatches (passing a string where a datetime is expected) or field name typos. The getschema operator shows every field name and type for any table: TableName | getschema.

"The query runs but returns unexpected results." Add | take 10 after each operator in the pipeline and examine the intermediate output. This reveals WHERE the data transforms in a way you did not expect. Debug the pipeline stage by stage, not the entire query at once.

Decision point

You are writing a KQL query and cannot remember the exact operator syntax. Do you check the Microsoft documentation or ask Claude?

Check the documentation first for SYNTAX questions (exact parameter names, return types, operator behavior). The KQL documentation at learn.microsoft.com/kusto is the authoritative source. Claude may generate plausible but incorrect syntax — especially for newer operators or recently changed behavior. Use Claude for DESIGN questions: 'I need to detect accounts with sign-ins from more than 3 countries in 24 hours — what is the best KQL approach?' Claude excels at designing the query strategy; the documentation excels at providing the exact syntax.

You've learned how KQL processes data.

K0 gave you the query language's place in the Microsoft security stack. K1 took you through the semantics — tables, operators, the pipe model, and why KQL isn't SQL. Now you write the queries that find what attackers hope you miss.

12 modules of query craft — filtering and shaping, joins and unions, time-series analysis, summarisation, string manipulation, and geospatial analysis
68 KQL exercises — every one with a realistic dataset, a reference solution, and a discussion of alternative approaches
K11 — Threat Hunting with KQL — the course's flagship module. Hypothesis-driven methodology, MITRE ATT&CK-aligned hunting across 7 techniques, UEBA composite risk scoring, and retroactive IOC sweeps
K13 Capstone — The Hunting Lab — three complete investigation scenarios requiring every query skill from the course
Hunt management and ROI metrics — the operating model that justifies KQL hunt programs to leadership

Unlock the full course with Premium See Full Syllabus

← Previous Next →