Working with Files, Data, and Context
File upload capabilities
Claude.ai accepts files through drag-and-drop onto the chat input or the attachment icon. Multiple files can be uploaded in a single message. The practical limit is the context window — not the file type.
CSV is the best format for log data. Claude reads CSV as structured tabular data with column headers. When you export from Sentinel (Advanced Hunting → Export → CSV) or from Defender XDR, the file preserves column names and formatting. Claude can then reference specific columns accurately: “Show me all rows where IPAddress is not in the corporate range and ResultType indicates a failure.”
JSON works for API exports. Sentinel incident exports via the API, Microsoft Graph responses, and configuration files are typically JSON. Claude handles nested JSON structures well — it can traverse object hierarchies and extract specific fields without you flattening the data first.
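If you prefer to pull out only the fields you need before uploading, a few lines of Python are enough. The incident structure below is a simplified, hypothetical stand-in — it is not the exact Microsoft Graph schema, just an illustration of traversing nested JSON:

```python
import json

# Hypothetical, simplified incident export. Field names are illustrative,
# not the exact Microsoft Graph / Sentinel API schema.
incident = json.loads("""
{
  "id": "INC-0001",
  "properties": {
    "severity": "High",
    "owner": {"email": "analyst@northgateeng.com"},
    "alerts": [
      {"name": "Suspicious sign-in", "techniques": ["T1078"]},
      {"name": "Impossible travel", "techniques": ["T1078", "T1110"]}
    ]
  }
}
""")

# Pull specific nested fields without flattening the whole structure first.
severity = incident["properties"]["severity"]
techniques = sorted({t for a in incident["properties"]["alerts"]
                       for t in a["techniques"]})
print(severity, techniques)  # High ['T1078', 'T1110']
```

Claude can do this traversal itself on an uploaded JSON file; pre-extracting is only worthwhile when the export is large and you want to upload a smaller slice.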
PDF documents are extracted as text with basic structure preserved. Images embedded in PDFs are also processed. Upload compliance frameworks, vendor security assessments, policy documents, and audit reports. A 50-page PDF (roughly 15,000 words) uses approximately 20,000 tokens — about 10% of a 200,000-token context window, or 2% of a one-million-token window.
Images (PNG, JPG, WEBP) are processed visually. Upload screenshots of Defender XDR alert details, Sentinel incident graphs, sign-in log entries from the Entra portal, phishing email screenshots (for analysis of visual elements, sender information, and URLs), and architecture diagrams. Claude reads images but does not have pixel-perfect accuracy on dense UI screenshots with small text — for data extraction from complex screenshots, copy-paste the text data directly.
Code files in any language are read with syntax awareness. Upload KQL files, PowerShell scripts, Python modules, YAML configurations, and JSON schemas. Claude understands the structure and can analyze, modify, explain, or debug the code.
Uploading log data for investigation
Log data is the most common file upload for security work. The format and preparation matter significantly for analysis quality.
Pre-filter before uploading. If your Sentinel query returned 50,000 rows, do not upload all 50,000. Filter to the relevant subset — the suspicious sign-ins, the specific user’s activity, or the specific time window. Claude processes 2,000 rows well. At 50,000 rows, you exceed the context window and analysis quality degrades. Pre-filter in Sentinel or Excel before upload.
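Pre-filtering does not require Sentinel or Excel — the standard library is enough. The sketch below assumes the public SigninLogs column names and a hypothetical corporate egress prefix; adjust both to your environment:

```python
import csv, io

# Tiny sanitized stand-in for a large SigninLogs export.
# Column names follow the public SigninLogs schema; values are fictional.
raw = """TimeGenerated,UserPrincipalName,IPAddress,ResultType
2024-05-01T03:12:44Z,j.morrison@northgateeng.com,203.0.113.7,50126
2024-05-01T09:02:10Z,a.chen@northgateeng.com,198.51.100.4,0
2024-05-01T03:15:02Z,j.morrison@northgateeng.com,203.0.113.7,50126
"""

CORPORATE_PREFIX = "198.51.100."  # assumption: your corporate egress range

rows = list(csv.DictReader(io.StringIO(raw)))
# Keep only failed sign-ins (ResultType != 0) from outside the corporate range.
suspicious = [r for r in rows
              if r["ResultType"] != "0"
              and not r["IPAddress"].startswith(CORPORATE_PREFIX)]

with open("filtered.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(suspicious)

print(len(suspicious))  # 2
```

The resulting filtered.csv is the file to upload — small, relevant, and with headers intact.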
Always include column headers. Without column headers, Claude guesses at field meanings — sometimes correctly, sometimes not. With headers, Claude references fields accurately. If you paste data directly into the prompt (rather than uploading a file), include the header row.
For small datasets (under 100 rows), paste directly into the prompt. This is faster than creating and uploading a file. Format as a table or paste the raw output — Claude handles both.
For large datasets (100-2,000 rows), upload as a CSV file. This preserves structure better than pasting, and the file attachment does not count against your message character limit.
For very large datasets (over 2,000 rows), consider using Claude Code or Cowork instead of Claude.ai. Claude Code can process files from your local filesystem without the upload step, and Cowork can work with files in shared folders. Both surfaces handle larger datasets more effectively because they can process files incrementally rather than loading everything into the conversation context at once.
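Incremental processing means splitting the export into batches and handling each one on its own. A minimal sketch of that batching step, using a small stand-in for a 50,000-row file:

```python
import csv, io, itertools

def chunk_csv(reader, size):
    """Yield lists of up to `size` rows; each batch keeps the parsed
    column names, so every chunk is a self-contained dataset."""
    while True:
        batch = list(itertools.islice(reader, size))
        if not batch:
            return
        yield batch

# Tiny stand-in for a large export: 5 rows, chunked into batches of 2.
raw = "Id\n" + "\n".join(str(i) for i in range(5))
reader = csv.DictReader(io.StringIO(raw))
batches = list(chunk_csv(reader, 2))
print([len(b) for b in batches])  # [2, 2, 1]
```

This is essentially what Claude Code does for you when it processes a large file with a script instead of reading it whole into the conversation.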
Sanitization — the non-negotiable step
Before uploading any data from a production environment, sanitize it. This applies to every plan tier below Enterprise with zero data retention.
What to sanitize: replace real usernames with fictional names (j.morrison@northgateeng.com), replace real IP addresses with RFC 5737 documentation ranges (192.0.2.x, 198.51.100.x, 203.0.113.x), replace real tenant domains with a fictional domain (northgateeng.com instead of your real domain), remove or replace device names, incident numbers, and any data that could identify your organization or individuals.
Why sanitization matters per plan tier: on Free and Pro plans, Anthropic may review your input for safety purposes and your data may be used for model training (opt-out available on Pro). On Team plans, data is not used for training by default but staff may access it for limited safety review. On Enterprise with zero data retention, data is not retained. Regardless of plan tier, treat sanitization as a defense-in-depth practice — even on Enterprise plans, sanitizing removes the risk entirely rather than relying on a vendor’s data handling commitments.
Maintain analytical value. Good sanitization replaces identifying data with realistic fictional values — not with redaction marks or “[REDACTED]” strings. Claude analyzes data more effectively when the fictional values look like real data. Use consistent fictional replacements across all data from the same investigation so that patterns remain visible (the same fictional user name for the same real user across multiple log exports).
Worked artifact — sanitization checklist for Claude uploads:
Before uploading any production data to Claude, verify each item:
Replace (do not blur or redact — use realistic fictional values):
- All user principal names and email addresses → fictional names at fictional domain
- All IP addresses → RFC 5737 ranges (192.0.2.x, 198.51.100.x, 203.0.113.x)
- Tenant name and domain → fictional organization name and domain
- Device names and device IDs → generic names (DESKTOP-NGE001)
- Incident numbers → fictional case numbers
- Application names that reveal internal systems → generic names
Preserve (safe to keep as-is):
- Timestamps (essential for timeline analysis)
- Column headers and field names (public Microsoft schema)
- Result types and error codes (public values)
- Authentication methods and protocol details
- Alert severity classifications
- MITRE ATT&CK technique references
Adapt this checklist for your organization. Print it and keep it next to your forensic workstation.
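The checklist's "consistent fictional replacements" rule is easy to automate. The sketch below is one possible approach, not a vetted sanitization tool — it covers only users and IPs, and the fictional domain and RFC 5737 range match the examples above:

```python
import itertools

class Sanitizer:
    """Map identifying values to realistic fictional ones, consistently:
    the same real value always gets the same fictional replacement,
    so patterns stay visible across multiple log exports."""

    def __init__(self, domain="northgateeng.com"):
        self.domain = domain
        self.users = {}
        self.ips = {}
        self._fake_users = (f"user{i:03d}" for i in itertools.count(1))
        self._fake_ips = (f"203.0.113.{i}" for i in itertools.count(1))  # RFC 5737

    def user(self, upn):
        if upn not in self.users:
            self.users[upn] = f"{next(self._fake_users)}@{self.domain}"
        return self.users[upn]

    def ip(self, addr):
        if addr not in self.ips:
            self.ips[addr] = next(self._fake_ips)
        return self.ips[addr]

s = Sanitizer()
print(s.user("real.name@contoso.com"))  # user001@northgateeng.com
print(s.ip("10.1.2.3"))                 # 203.0.113.1
print(s.user("real.name@contoso.com"))  # user001@northgateeng.com (same mapping)
```

Run every export from the same investigation through the same Sanitizer instance (or persist its mapping tables) so the fictional identities stay stable across files.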
Context window management
The context window is the total amount of text Claude can process in a single conversation — your input plus Claude’s output combined. All current models support 200,000 tokens on standard plans, with Opus 4.6 and Sonnet 4.6 supporting up to one million tokens in beta configurations.
200,000 tokens is roughly 150,000 words. In practice, this means you can upload a substantial amount of data — multiple log exports, a compliance framework, and several reference documents — in a single conversation. But the context window is shared between input and output. A conversation with a large uploaded dataset leaves less room for Claude’s response.
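You can sanity-check an upload against the window before sending it. The word-to-token ratio below is the rough rule of thumb from the text (about 4 tokens per 3 English words); real tokenizers vary, so treat this as a planning estimate only:

```python
# Rough planning estimate: ~1 token per 0.75 English words.
# Real tokenizers differ by content type (code and logs tokenize denser).
def estimate_tokens(text: str) -> int:
    words = len(text.split())
    return int(words / 0.75)

CONTEXT_WINDOW = 200_000  # standard-plan window described above

doc = "word " * 15_000  # stand-in for a ~15,000-word document (the 50-page PDF)
used = estimate_tokens(doc)
print(used, f"{used / CONTEXT_WINDOW:.0%}")  # 20000 10%
```

If the estimate approaches a large fraction of the window, pre-filter further or switch surfaces rather than uploading and hoping.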
The “lost in the middle” problem is a known limitation of large language models. Claude’s attention is strongest at the beginning and end of the context window. Information in the middle of a very long input is processed less reliably. For critical analysis: place the most important content (the specific log entries you want analyzed, the key evidence) at the beginning or end of your prompt. Put supplementary context (reference data, background information) in the middle.
When the conversation gets too long, start a new conversation within the same Project. The Project’s system prompt and reference documents carry forward — you do not lose your environment context. But the conversation history resets, freeing the context window for new data.
Choosing the right surface for data work
Different Claude surfaces handle data differently. Choosing the right one for the task improves both speed and quality.
Claude.ai is best for interactive data analysis — when you want to upload a dataset and have a back-and-forth conversation about what it contains. Upload the CSV, ask Claude to identify anomalies, follow up with specific questions about individual entries, ask for a summary of findings. The conversation flow is essential when the analysis is exploratory.
Claude Code is best for scripted data processing — when you have a defined transformation or analysis to apply. Claude Code can read files directly from your filesystem, process them with Python or PowerShell, and write the output. For security work: processing KAPE output, parsing EZTools CSV files, transforming log exports between formats, or running automated checks across multiple files.
Cowork is best for delegated data work — when you want to hand Claude a folder of files and get back finished analysis without a conversation. Give Cowork access to your evidence folder, describe the analysis you need, and check back when it is done. Cowork handles multi-file operations, parallel subtask coordination, and delivers results directly to your filesystem.
Try it: Upload and analyze sanitized log data
Export a small dataset from your Sentinel workspace (50-100 rows of SigninLogs, CSV format). Sanitize it using the checklist above — replace real usernames, IPs, and domains with fictional values. Upload the sanitized CSV to your Security Operations project in Claude.ai. Ask: “Identify any anomalous sign-in patterns in this dataset. Group findings by category: unusual IPs, unusual times, unusual applications, authentication anomalies.” This exercises the full workflow: export → sanitize → upload → analyze. Verify Claude’s findings against your own analysis of the same data.
Knowledge checks
Check your understanding
1. You need to analyze 15,000 rows of sign-in log data. What is the best approach?
2. A colleague uploads production sign-in logs to Claude’s Free tier without sanitizing the data. What are the risks?
3. You have a folder of KAPE output (multiple CSV files from EZTools parsing) that you need processed into a unified investigation timeline. Which Claude surface is most appropriate?
Key takeaways
Sanitize → pre-filter → upload → analyze. This is the workflow for every data analysis task. Sanitization is non-negotiable on any plan below Enterprise with zero data retention.
CSV with headers is the best format for log data. Claude reads structured columns accurately when headers are present.
Pre-filter large datasets. Claude.ai handles up to about 2,000 rows well. For larger datasets, use Claude Code or Cowork.
Choose the right surface. Claude.ai for interactive analysis. Claude Code for scripted processing. Cowork for delegated batch work.
The context window is shared. Large uploads leave less room for Claude’s response. Start new conversations within the same Project when the context fills up.