Working with Files & Context

15 min · F4

Claude’s power scales with the data you give it. A prompt without data produces generic advice. A prompt with your actual sign-in logs, your IR report draft, or your policy framework produces operational output. This module teaches you to get data into Claude effectively and manage the context window so Claude processes it reliably.


File upload — what works

In claude.ai: Drag and drop files onto the chat input, or click the attachment icon. Multiple files can be uploaded in a single message.

| File Type | How Claude Processes It | Best For |
| --- | --- | --- |
| CSV | Parsed as tabular data — Claude reads columns and rows | Log exports, alert lists, user inventories |
| JSON | Parsed structurally — Claude understands nested objects | API responses, Sentinel incident exports, configuration files |
| PDF | Extracted as text (images in PDFs are also processed) | Policies, compliance frameworks, vendor reports |
| DOCX | Extracted as text with basic structure preserved | IR reports, procedures, meeting notes |
| TXT/MD | Read as-is | KQL query files, notes, documentation |
| PNG/JPG | Processed as images — Claude describes and analyses visual content | Screenshots of portals, architecture diagrams, phishing emails |
| Code files | Read as text with syntax awareness | KQL files, PowerShell scripts, Python, YAML |

The practical limit is context window size, not file type. A 50-page PDF (roughly 15,000 words) uses approximately 20,000 tokens — 10% of the context window. You can comfortably upload 5-10 documents of this size in a single conversation. A 500-page document may exceed the window — upload only the relevant sections.
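As a rough budgeting sketch, you can estimate whether a set of uploads fits before starting. This assumes the ratio above (about 15,000 words to 20,000 tokens); real tokenisation varies by content, so treat the numbers as approximate:

```python
# Rough token-budget check before uploading documents.
# Assumption: ~20,000 tokens per 15,000 English words (the ratio used
# above). Real tokenisation varies by content and language.

CONTEXT_WINDOW = 200_000  # tokens

def estimated_tokens(word_count: int) -> int:
    """Approximate token count for a document of `word_count` words."""
    return int(word_count * 20_000 / 15_000)

def fits(document_word_counts: list[int], reserve: int = 50_000) -> bool:
    """Check uploads fit, reserving tokens for conversation history."""
    total = sum(estimated_tokens(w) for w in document_word_counts)
    return total <= CONTEXT_WINDOW - reserve

# Five 50-page PDFs (~15,000 words each) fit comfortably:
print(fits([15_000] * 5))  # True
```

The `reserve` parameter is an illustrative buffer for the conversation itself; size it to how long you expect the session to run.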


Uploading log data for investigation

Log data is the most common file upload for security work. The format matters for quality of analysis.

CSV is the best format. Claude reads CSV as structured tabular data with column headers. When you export from Sentinel (Advanced Hunting → Export → CSV), the file preserves column names, data types, and formatting. Claude can then query specific columns: “Show me all rows where IPAddress is not in the corporate range.”

JSON works for API exports. If you are exporting Sentinel incidents via the API or Microsoft Graph, the output is JSON. Claude handles nested JSON well — it can traverse object hierarchies and extract specific fields.

Raw paste works for small datasets. For under 100 rows, pasting directly into the prompt is faster than creating a file. Format as a table or paste the raw output — Claude handles both.

What to do before uploading log data:

  1. Sanitise. Remove PII, real usernames, real IP addresses, and tenant identifiers if you are using a non-Team plan. Replace with fictional values (Module 0 sanitisation methodology from the M365 Security Operations course applies here).
  2. Pre-filter. If your Sentinel query returned 50,000 rows, do not upload all 50,000. Filter to the relevant subset — the suspicious sign-ins, the specific user’s activity, or the specific time window. Claude processes 2,000 rows well. At 50,000, you hit context limits and analysis quality degrades.
  3. Include column headers. Always. Claude without column headers guesses at field meanings — sometimes correctly, sometimes not. With headers, Claude references fields accurately.

Sanitisation is not optional

On any plan below Team: assume Anthropic staff can read your input during safety reviews. Production sign-in logs contain real usernames, real IP addresses, and real tenant identifiers. Before uploading: replace usernames with fictional names, replace IPs with RFC 5737 documentation ranges (192.0.2.x, 198.51.100.x, 203.0.113.x), replace tenant domains with a fictional domain. The M365 Security Operations course (Module 0) provides a complete sanitisation methodology.
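A minimal sanitisation sketch along these lines. It uses a consistent mapping, so the same real user or IP always maps to the same fictional value, which preserves analytical patterns while removing identifiers. The column names are illustrative:

```python
# Sanitisation sketch: consistently replace real identifiers with
# fictional ones before upload. Column names (UserPrincipalName,
# IPAddress) are illustrative; match your export's headers.
import itertools

def make_sanitiser():
    user_map: dict[str, str] = {}
    ip_map: dict[str, str] = {}
    user_ids = itertools.count(1)
    ip_ids = itertools.count(1)

    def sanitise_row(row: dict) -> dict:
        row = dict(row)  # do not mutate the caller's row
        user = row.get("UserPrincipalName")
        if user:
            if user not in user_map:
                user_map[user] = f"user{next(user_ids)}@example.com"
            row["UserPrincipalName"] = user_map[user]
        ip = row.get("IPAddress")
        if ip:
            if ip not in ip_map:
                # RFC 5737 documentation range
                ip_map[ip] = f"203.0.113.{next(ip_ids)}"
            row["IPAddress"] = ip_map[ip]
        return row

    return sanitise_row
```

Because the mapping is stable within a run, "five sign-ins by the same user from the same IP" still looks like five sign-ins by one user from one IP after sanitisation.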

Try it yourself

Export a small dataset from your Sentinel workspace (50-100 rows of SigninLogs, CSV format). Sanitise it: replace real usernames, IPs, and domains with fictional values. Upload the sanitised CSV to Claude and ask: "Identify any anomalous sign-in patterns in this dataset. Group findings by category: unusual IPs, unusual times, unusual applications, authentication anomalies." This exercise practises the full workflow: export → sanitise → upload → analyse.

Claude reads the CSV headers, understands the column structure, and produces a categorised analysis. The quality depends on your data — a dataset with no anomalies produces a "no significant findings" result (which is valid). A dataset with a mix of corporate and external IPs, varied time zones, or different authentication methods gives Claude more to work with. The exercise validates that Claude handles your data format correctly and that the sanitisation preserved analytical value while removing identifiers.


Uploading images — portal screenshots

Claude processes images and can describe, analyse, and extract information from them. For security work, this means you can upload screenshots of:

  • Defender XDR alert details
  • Sentinel incident graphs
  • Sign-in log entries from the Entra portal
  • Phishing email screenshots (for analysis of visual elements, sender info, URLs)
  • Architecture diagrams

Practical limitation: Claude reads images but does not have pixel-perfect accuracy on dense UI screenshots. If a screenshot contains a table with 20 columns of small text, Claude may misread some values. For data extraction from dense screenshots, copy-paste the text data directly instead.

Effective image prompting:

I have uploaded a screenshot of a Sentinel incident.
1. What is the incident severity?
2. How many alerts are included?
3. What entities are involved (users, IPs, devices)?
4. Based on the alert titles, what is the likely attack technique?

This focused questioning produces better results than “what do you see?” — Claude analyses the image against specific questions rather than generating a generic description.


Context window management

The context window holds everything: your system prompt, uploaded documents, conversation history, and Claude’s responses. As a conversation progresses, earlier content is pushed further back in the window. Understanding this determines when to continue an existing conversation and when to start a new one.

When to continue a conversation:

  • You are iterating on a single document or query (refining an IR report, tuning a KQL query)
  • You are building on previous analysis in the same investigation
  • The conversation is under 50 messages — context is still fresh

When to start a new conversation:

  • You are switching topics (from KQL writing to policy drafting)
  • The conversation exceeds 50 messages — Claude may lose track of early content
  • You notice Claude contradicting earlier statements or forgetting constraints you set
  • You are starting a new investigation, even if it is related to a previous one

The “lost in the middle” problem: In very long conversations or with very long uploaded documents, Claude processes content at the beginning and end of the input more reliably than content in the middle. If you are uploading a 100-page document and asking about content on page 50, Claude may miss it. Mitigation: break the document into sections and upload the relevant section, or reference the specific section in your prompt: “On page 47, there is a table showing firewall rules. Analyse that table.”
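If the document is plain text or markdown, splitting out the relevant section takes only a few lines of scripting. A sketch assuming markdown-style headings; adjust the pattern to your document's format:

```python
# Extract one section from a long document before upload, instead of
# sending all 100 pages. Assumes markdown-style "# Heading" lines.
import re

def extract_section(text: str, heading: str) -> str:
    """Return the section starting at `heading`, up to the next heading."""
    sections = re.split(r"^#+ ", text, flags=re.MULTILINE)
    for sec in sections:
        if sec.startswith(heading):
            return sec
    return ""
```

Upload the returned section on its own, with a prompt that names it explicitly ("This is the firewall rules section of the network policy. Analyse the rule table.").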


Context recovery — when Claude loses track

Symptoms of context loss: Claude forgets your system prompt constraints (starts adding preamble you told it to skip), contradicts analysis from earlier in the conversation, or asks for information you already provided. This happens in long conversations as earlier content falls out of the effective attention window.

Recovery strategies:

Repeat the key constraint. If Claude starts adding preamble: “Remember: no preamble, deliver output directly.” This costs a few tokens and immediately re-establishes the behaviour.

Summarise and restart. If the conversation is heavily degraded: start a new conversation in the same Project. The project context (system prompt, documents) carries over. Paste a brief summary of where you left off: “We were analysing sign-in logs for j.morrison. The key findings so far: [paste summary]. Continue from here.”

Use the Project, not the conversation, for persistence. Upload reference documents to the Project, not into individual conversations. Documents in the Project are available to every new conversation — they do not degrade with conversation length.


Knowledge checks

Check your understanding

1. You have a Sentinel export with 50,000 rows of sign-in data. Should you upload the entire file to Claude?

No — pre-filter in Sentinel first. 50,000 rows exceeds Claude's effective processing range. Filter to the relevant subset (suspicious IPs, specific user, anomalous time window) and upload that subset. Claude processes 2,000-5,000 rows well. For larger datasets, use Sentinel for the initial analysis and bring the filtered results to Claude for deeper investigation.
Yes — the 200K context window can handle it
Only if you convert it to JSON first

2. You are 60 messages into a conversation and Claude starts forgetting your "no preamble" instruction. What is the best fix?

Start a new conversation in the same Project. The Project system prompt and documents carry over automatically. Paste a brief summary of your progress so far. This gives Claude a fresh context window with all persistent context intact. Continuing to fight context degradation in a 60-message conversation is less effective than starting clean.
Repeat the instruction in every message
Switch to a more powerful model

Key takeaways

CSV is the best format for log data. Column headers, structured rows, clean parsing.

Pre-filter before uploading. Claude handles 2,000-5,000 rows well. Pre-filter in Sentinel for larger datasets.

Sanitise before uploading. On non-Team plans, treat Claude as an external service. Remove PII and tenant identifiers.

Use Projects for persistent context. Upload reference documents to the Project, not individual conversations.

Start new conversations when context degrades. 50+ messages is the typical threshold. The Project context carries over automatically.


Multi-file investigation patterns

Complex investigations involve multiple data sources. Claude handles multiple files in a single conversation — but the prompting approach matters.

Pattern: Cross-source correlation

Upload two files (e.g., sign-in logs and email events) and ask Claude to correlate:

I am uploading two files:
1. signins.csv — sign-in log data for j.morrison, last 7 days
2. emails.csv — email events for j.morrison, last 7 days

Cross-correlate:
- Are there sign-in events from IPs that also appear as email
  sender IPs? (would indicate compromised account sending email)
- Are there sign-in events from IPs not in the email data?
  (would indicate token replay without email activity)
- Timeline: what happened first — the suspicious sign-in or
  the suspicious email?

Present findings as a chronological timeline.

Claude processes both files and produces the correlation. This is the analytical step that takes 30-60 minutes in Sentinel (writing the join query, waiting for results, interpreting the output). Claude does it from the raw data in the conversation.

Limitation: Claude correlates based on field values you describe. If your CSV files use different column names for the same concept (e.g., “IPAddress” in one and “SenderIPv4” in the other), tell Claude explicitly: “IPAddress in file 1 corresponds to SenderIPv4 in file 2.”
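The IP-overlap part of the correlation can also be verified locally as a sanity check on Claude's findings. A sketch assuming the illustrative column names above (`IPAddress` in the sign-in file, `SenderIPv4` in the email file):

```python
# Sanity-check the cross-source IP overlap locally. Column names
# IPAddress / SenderIPv4 are illustrative; map your actual headers
# explicitly, just as you would in the Claude prompt.

def shared_ips(signin_rows: list[dict], email_rows: list[dict]) -> set[str]:
    """IPs that appear in both the sign-in and email datasets."""
    signin_ips = {r["IPAddress"] for r in signin_rows}
    email_ips = {r["SenderIPv4"] for r in email_rows}
    return signin_ips & email_ips
```

Feed it the `csv.DictReader` rows of both files; a non-empty result corresponds to the "compromised account sending email" branch of the prompt above.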

Pattern: Sequential file uploads for large investigations

For investigations that exceed a single conversation’s effective capacity, use sequential conversations in the same Project:

  1. Conversation 1: Upload sign-in data. Ask for anomaly analysis. End with: “Summarise your findings in 5 bullet points.”
  2. Conversation 2: Paste the 5-bullet summary. Upload email data. Ask for email analysis with the sign-in context.
  3. Conversation 3: Paste summaries from both conversations. Ask for the combined investigation timeline and conclusions.

Each conversation gets a fresh context window. The summaries carry the essential findings forward without the raw data bulk. This pattern handles investigations with 10,000+ rows across multiple data sources — far beyond what fits in a single context window.


Structured output for downstream use

When Claude’s output feeds into another system (Sentinel, a report template, a ticketing system), request structured output that is directly pasteable.

For Sentinel custom logs:

Analyse this data. Return your findings as a JSON array
matching this schema:
{
  "timestamp": "ISO 8601",
  "finding": "description",
  "severity": "High|Medium|Low",
  "evidence": "specific log reference"
}
I will import this into Sentinel as a custom watchlist.
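Before importing, it is worth validating the returned JSON against the schema you requested, since model output can occasionally deviate from it. A minimal sketch matching the schema above:

```python
# Validate Claude's JSON output against the requested schema before
# importing it downstream. Field names match the schema in the prompt.
import json

REQUIRED = {"timestamp", "finding", "severity", "evidence"}
SEVERITIES = {"High", "Medium", "Low"}

def validate_findings(raw: str) -> list[dict]:
    """Parse the response; raise ValueError on schema violations."""
    findings = json.loads(raw)
    if not isinstance(findings, list):
        raise ValueError("expected a JSON array")
    for i, item in enumerate(findings):
        missing = REQUIRED - item.keys()
        if missing:
            raise ValueError(f"item {i} missing fields: {missing}")
        if item["severity"] not in SEVERITIES:
            raise ValueError(f"item {i} has invalid severity")
    return findings
```

A validation failure usually means a malformed item or prose mixed into the output; re-prompt with "Return only the JSON array, no other text."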

For ticketing systems:

Summarise this investigation for a ServiceNow incident ticket.
Fields: Short Description (1 line), Description (3-5 sentences),
Impact (business impact), Urgency (1-3), Assignment Group,
Category, Subcategory.

For executive dashboards:

Produce a metrics summary from this data as a markdown table
with columns: Metric | This Week | Last Week | Trend (↑ ↓ →)

Specifying the output format eliminates the reformatting step between Claude’s output and your destination system. The output goes directly where it needs to go.

Check your understanding

3. You need to correlate sign-in logs (15,000 rows) with email events (8,000 rows) and endpoint events (12,000 rows) for a complex investigation. How do you handle this with Claude?

Sequential conversations in the same Project. Pre-filter each dataset in Sentinel to the relevant subset (suspicious IPs, target user, anomalous time window). Upload each dataset in a separate conversation, ask for analysis, and summarise findings. Then combine summaries in a final conversation for the integrated timeline. This handles the volume while keeping each conversation within effective context limits.
Upload all 35,000 rows in one conversation
Only use Claude for the smallest dataset