10.7 Search Jobs and Archived Data
Introduction
Standard KQL queries in the Logs blade search data in the Analytics tier — data that is immediately queryable. But some data lives in the Archive tier (Module 7.4) — retained for compliance at lower cost but not directly queryable. When a hunting hypothesis requires searching 12 months of historical data and the last 9 months are archived, a standard query fails. Search jobs solve this: they run asynchronous KQL queries against archived data and store the results in a new table that you can query at your leisure.
When to use search jobs
Historical IOC hunting. A threat advisory reports IOCs from a campaign that was active 6 months ago. Your Analytics tier retains 90 days. The remaining data is in the Archive tier. A search job queries the archived data for the IOCs.
Extended timeline reconstruction. An investigation reveals a compromised account. The initial compromise may have occurred months before detection. A search job searches archived sign-in logs for the earliest sign-in from the attacker’s infrastructure.
Compliance investigations. Legal or regulatory investigations may require searching data that is older than the Analytics tier retention period.
Post-incident scope extension. After closing an incident, new intelligence suggests the attacker was active earlier than initially assessed. Search jobs extend the investigation timeline into archived data.
Creating a search job
Navigate to Sentinel → Logs → “Search” (the search job interface, distinct from the standard query interface).
Search job configuration:
Table: the table to search (e.g., SigninLogs, SecurityEvent, CommonSecurityLog).
Time range: the period to search. Can span the full retention period including archived data — up to 7 years if configured.
KQL query: the search criteria. Search job KQL supports a subset of operators — primarily where, extend, project, and search. Complex operators like join, summarize, and union are not supported in the search phase (but are available when querying the results table).
Example: search archived sign-in logs for a specific IP.
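A minimal search-job query for this scenario might look like the following sketch (the IP address is an illustrative TEST-NET value; only the restricted operator set — where, extend, project — is used, as required in the search phase):

```kql
// Search-job phase: restricted KQL only (where / extend / project / search)
SigninLogs
| where IPAddress == "203.0.113.47"   // illustrative attacker IP
| project TimeGenerated, UserPrincipalName, IPAddress, ResultType, AppDisplayName
```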
Submit the search job. Sentinel runs the query asynchronously against all data (Analytics + Archive tiers) for the specified time range. This can take minutes to hours depending on the data volume.
Accessing search job results
Search job results are stored in a new table: SigninLogs_SRCH (the original table name with _SRCH suffix). This table appears in the workspace and can be queried with full KQL — including join, summarize, and all other operators.
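Once the job completes, the results table can be analysed with unrestricted KQL. A sketch of a first-pass summary (column names follow the standard SigninLogs schema):

```kql
// Full KQL is available against the results table, including summarize
SigninLogs_SRCH
| summarize SignIns   = count(),
            FirstSeen = min(TimeGenerated),
            LastSeen  = max(TimeGenerated)
    by UserPrincipalName
| order by SignIns desc
```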
Results retention: Search job results tables are retained for 30 days, then automatically deleted. If you need the results longer, export them or copy key findings to bookmarks.
Advanced search job patterns
Pattern 1: Historical IOC sweep. Search 12 months of archived data for a list of IOCs from a threat advisory.
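A sketch of the IOC sweep (the three IP addresses are illustrative placeholders standing in for the advisory's IOC list):

```kql
// Search-job phase: sweep archived sign-ins for any advisory IOC
SigninLogs
| where IPAddress in ("203.0.113.47", "198.51.100.12", "192.0.2.88")  // illustrative IOCs
| project TimeGenerated, UserPrincipalName, IPAddress, ResultType
```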
Set the time range to 12 months. The search job scans all archived SigninLogs for any match. Results in the _SRCH table tell you exactly when (if ever) these IOCs appeared in your environment — even if the activity occurred 11 months ago.
Pattern 2: Historical timeline reconstruction. Determine when a compromised service principal was first abused.
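One way to scope the search job, assuming service principal sign-ins are ingested into the AADServicePrincipalSignInLogs table (the GUID is a placeholder for the compromised service principal's ID):

```kql
// Search-job phase: all activity for one service principal over its lifetime
AADServicePrincipalSignInLogs
| where ServicePrincipalId == "00000000-0000-0000-0000-000000000000"  // placeholder ID
| project TimeGenerated, IPAddress, ResourceDisplayName, ResultType
```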
Set the time range to the service principal’s entire lifetime. The _SRCH results show the complete activity timeline — revealing when the attacker first obtained access and what resources they accessed over the entire compromise period.
Pattern 3: Cross-table historical correlation. Search archived email data alongside archived sign-in data.
Submit two separate search jobs (one for EmailEvents, one for SigninLogs) with the same IOC filter. When both complete, join the _SRCH result tables:
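A sketch of the correlation query, assuming both jobs filtered on the same attacker IP and that the EmailEvents results expose a RecipientEmailAddress column matching the sign-in UPN (the join key is an assumption about your environment):

```kql
// Full KQL against the two results tables: correlate delivery with sign-in
EmailEvents_SRCH
| project EmailTime = TimeGenerated, RecipientEmailAddress
| join kind=inner (
    SigninLogs_SRCH
    | project SignInTime = TimeGenerated, UserPrincipalName
  ) on $left.RecipientEmailAddress == $right.UserPrincipalName
| where SignInTime > EmailTime   // sign-in occurred after email delivery
| project RecipientEmailAddress, EmailTime, SignInTime
```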
This finds users who received a phishing email AND subsequently signed in from the attacker’s IP — correlating email delivery with credential compromise across archived data.
Search jobs vs data restore
Sentinel also supports temporarily restoring archived data to the Analytics tier. Understanding the difference between search jobs and restore helps you choose the right tool.
Search jobs: Run a specific KQL filter against archived data. Results stored in a _SRCH table. Cheaper for targeted queries (you pay for data scanned, not data restored). Best for: IOC hunting, specific entity investigation, focused historical queries.
Data restore: Temporarily moves archived data back to the Analytics tier, making it fully queryable with all KQL operators including joins and complex aggregations. More expensive (you pay for the data volume restored at Analytics tier rates). Best for: broad exploratory analysis of historical data, complex multi-table investigations that require the full KQL language against a large time window, and compliance investigations that need unrestricted access to historical records.
Decision rule: If you know what you are looking for (specific IOCs, specific entity, specific event type), use a search job. If you need to explore historical data without a specific filter, use restore.
Practical search job workflow: step by step
Step 1: Determine whether a search job is needed. Run the query against the Analytics tier first: SigninLogs | where TimeGenerated > ago(90d) | where IPAddress == "203.0.113.47" | count. If results exist, no search job needed — the data is in the Analytics tier. If zero results and the hypothesis requires a longer time range, proceed to Step 2.
Step 2: Estimate the scan volume. Check the Usage table: Usage | where DataType == "SigninLogs" | where TimeGenerated > ago(365d) | summarize MonthlyGB = sum(Quantity) / 1024 by bin(TimeGenerated, 30d). Multiply monthly GB by the number of months to search. If the cost is acceptable, proceed.
Step 3: Write the narrowest possible search query. Every additional filter reduces scan volume and cost. Instead of searching all SigninLogs for 12 months (potentially hundreds of GB), filter by the specific IP, user, or result type you are investigating.
Step 4: Submit the job. Navigate to Sentinel → Logs → Search tab. Enter the table, time range, and KQL query. Submit. Note the job ID.
Step 5: Monitor job progress. Check the Results tab in the Hunting blade. Large search jobs can take 30 minutes to several hours. Do not wait — continue other work and return when the job completes.
Step 6: Analyse _SRCH results. When the job completes, query the results table with full KQL. Apply the analysis techniques from subsection 10.3 (statistical outliers, stacking, temporal analysis) to the historical data.
Step 7: Bookmark and close. Create bookmarks for significant findings. Promote to incident if a threat is confirmed. Document in the hunt record.
Analysing search job results effectively
The _SRCH results table supports full KQL — use it to perform analysis that the search job’s limited KQL could not.
Timeline reconstruction from search results:
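A sketch of the gap analysis (order by serialises the rows, which prev() requires; the IP is illustrative):

```kql
// Gap between consecutive events distinguishes automated from manual activity
SigninLogs_SRCH
| where IPAddress == "203.0.113.47"   // illustrative attacker IP
| order by TimeGenerated asc
| extend GapMinutes = datetime_diff("minute", TimeGenerated, prev(TimeGenerated))
| project TimeGenerated, UserPrincipalName, AppDisplayName, GapMinutes
```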
The gap analysis reveals activity patterns: regular 5-minute intervals suggest automated tool usage. Sporadic access with long gaps suggests manual attacker operation. Activity concentrated in a specific time zone reveals the attacker’s likely location.
User enumeration analysis from search results:
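A sketch of the per-account summary, assuming ResultType == "0" indicates a successful sign-in (the standard SigninLogs convention):

```kql
// Per-account view: targeting order, success, and campaign duration
SigninLogs_SRCH
| summarize Attempts  = count(),
            Successes = countif(ResultType == "0"),   // "0" = success in SigninLogs
            FirstSeen = min(TimeGenerated),
            LastSeen  = max(TimeGenerated)
    by UserPrincipalName
| order by FirstSeen asc   // targeting order reveals the reconnaissance sequence
```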
This reveals: which accounts the attacker targeted first (reconnaissance phase), which accounts were successfully compromised (successful entries), and the total campaign duration (first-seen to last-seen).
Scheduling recurring search jobs
For hypotheses that require periodic historical searches (e.g., monthly IOC sweep against the last 6 months of archived data), automate the search job submission.
Logic App approach: Create a Logic App that runs monthly. The Logic App calls the Sentinel REST API to submit a search job with the IOCs from the latest threat intelligence. When the job completes, the Logic App checks the _SRCH results table: if results exist, it creates a Sentinel incident.
This automates the IOC-driven hunting approach for archived data — ensuring historical IOC sweeps happen monthly without manual analyst intervention.
Search job cost management
Search jobs scan archived data — the cost is proportional to the volume scanned.
Minimise scan volume: Use the narrowest possible time range. If the threat advisory says the campaign was active in November 2025, search November 2025 — not the entire year. Use specific filters (IP, user, event type) to reduce the rows processed.
Estimate cost before submitting: Check the Usage table for the historical volume of the target table: Usage | where TimeGenerated > ago(365d) | where DataType == "SigninLogs" | summarize MonthlyGB = sum(Quantity) / 1024 by bin(TimeGenerated, 30d). Multiply the monthly volume by the number of months you plan to search to estimate the data scanned.
Budget alerting: If your organisation has a monthly budget for search jobs, track cumulative search job costs alongside regular ingestion costs in the monthly cost report (Module 8.10).
Search job management
Navigate to Sentinel → Hunting → Results tab to view active and completed search jobs.
Job states: Running (query executing against archived data), Completed (results available in the _SRCH table), Failed (query error or timeout), Cancelled (manually stopped).
Cost considerations: Search jobs incur charges for scanning archived data. The cost is proportional to the volume of data scanned. Narrow your search criteria (specific table, specific time range, specific filter conditions) to minimise the data scanned and therefore the cost.
Practical limits: Search jobs can take hours for very large time ranges (searching 12 months of SigninLogs for a 10,000-user organisation). Plan accordingly — submit the job and check results later, rather than waiting for completion.
Search jobs in the hunting workflow
Search jobs integrate into the hunting cycle at Step 3 (Execute and analyse) when the hypothesis requires historical data beyond the Analytics tier.
Workflow:
Step 1: Formulate hypothesis (subsection 10.4). Determine the required time range.
Step 2: Check whether the required time range is within the Analytics tier. If yes → standard KQL query. If no → search job.
Step 3: Create the search job with the narrowest possible filter (specific IP, user, or event type). Submit.
Step 4: When the job completes, query the results table with full KQL. Analyse as you would any hunting query results.
Step 5: Bookmark interesting findings. Promote to incident if a threat is confirmed.
Try it yourself
If your workspace has data in the Archive tier (requires retention configuration from Module 7.5), create a search job for a specific table with a time range that extends into the archived period. If no archived data is available, create a search job against the Analytics tier as practice — the workflow is identical. After the job completes, query the _SRCH results table and examine the results.
What you should observe
The search job appears in the Results tab with status "Running" then "Completed." The results table (TableName_SRCH) appears in the workspace. Full KQL is available against the results. In a lab without archived data, the search job still executes against Analytics tier data — the results will match what a standard query returns.
Knowledge check
Check your understanding
1. A threat advisory reports IOCs from a campaign active 8 months ago. Your Analytics tier retains 90 days. How do you search for the IOCs?