Working with Files, Data, and Context
File upload capabilities
Claude.ai accepts files through drag-and-drop onto the chat input or the attachment icon. Multiple files can be uploaded in a single message. The practical limit is the context window — not the file type.
CSV is the best format for log data. Claude reads CSV as structured tabular data with column headers. When you export from Sentinel (Advanced Hunting → Export → CSV) or from Defender XDR, the file preserves column names and formatting. Claude can then reference specific columns accurately: “Show me all rows where IPAddress is not in the corporate range and ResultType indicates a failure.”
JSON works for API exports. Sentinel incident exports via the API, Microsoft Graph responses, and configuration files are typically JSON. Claude handles nested JSON structures well — it can traverse object hierarchies and extract specific fields without you flattening the data first.
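If you prefer to pull out only the fields you need before uploading, a few lines of Python are enough. The incident structure below is a simplified, hypothetical stand-in — it is not the exact Microsoft Graph schema, just an illustration of traversing nested JSON:

```python
import json

# Hypothetical, simplified incident export. Field names are illustrative,
# not the exact Microsoft Graph / Sentinel API schema.
incident = json.loads("""
{
  "id": "INC-0001",
  "properties": {
    "severity": "High",
    "owner": {"email": "analyst@northgateeng.com"},
    "alerts": [
      {"name": "Suspicious sign-in", "techniques": ["T1078"]},
      {"name": "Impossible travel", "techniques": ["T1078", "T1110"]}
    ]
  }
}
""")

# Pull specific nested fields without flattening the whole structure first.
severity = incident["properties"]["severity"]
techniques = sorted({t for a in incident["properties"]["alerts"]
                       for t in a["techniques"]})
print(severity, techniques)  # High ['T1078', 'T1110']
```

Claude can do this traversal itself on an uploaded JSON file; pre-extracting is only worthwhile when the export is large and you want to upload a smaller slice.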
PDF documents are extracted as text with basic structure preserved. Images embedded in PDFs are also processed. Upload compliance frameworks, vendor security assessments, policy documents, and audit reports. A 50-page PDF (roughly 15,000 words) uses approximately 20,000 tokens — about 10% of a 200,000-token context window, or 2% of a one-million-token window.
Images (PNG, JPG, WEBP) are processed visually. Upload screenshots of Defender XDR alert details, Sentinel incident graphs, sign-in log entries from the Entra portal, phishing email screenshots (for analysis of visual elements, sender information, and URLs), and architecture diagrams. Claude reads images but does not have pixel-perfect accuracy on dense UI screenshots with small text — for data extraction from complex screenshots, copy-paste the text data directly.
Code files in any language are read with syntax awareness. Upload KQL files, PowerShell scripts, Python modules, YAML configurations, and JSON schemas. Claude understands the structure and can analyze, modify, explain, or debug the code.
Uploading log data for investigation
Log data is the most common file upload for security work. The format and preparation matter significantly for analysis quality.
Pre-filter before uploading. If your Sentinel query returned 50,000 rows, do not upload all 50,000. Filter to the relevant subset — the suspicious sign-ins, the specific user’s activity, or the specific time window. Claude processes 2,000 rows well. At 50,000 rows, you exceed the context window and analysis quality degrades. Pre-filter in Sentinel or Excel before upload.
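Pre-filtering does not require Sentinel or Excel — the standard library is enough. The sketch below assumes the public SigninLogs column names and a hypothetical corporate egress prefix; adjust both to your environment:

```python
import csv, io

# Tiny sanitized stand-in for a large SigninLogs export.
# Column names follow the public SigninLogs schema; values are fictional.
raw = """TimeGenerated,UserPrincipalName,IPAddress,ResultType
2024-05-01T03:12:44Z,j.morrison@northgateeng.com,203.0.113.7,50126
2024-05-01T09:02:10Z,a.chen@northgateeng.com,198.51.100.4,0
2024-05-01T03:15:02Z,j.morrison@northgateeng.com,203.0.113.7,50126
"""

CORPORATE_PREFIX = "198.51.100."  # assumption: your corporate egress range

rows = list(csv.DictReader(io.StringIO(raw)))
# Keep only failed sign-ins (ResultType != 0) from outside the corporate range.
suspicious = [r for r in rows
              if r["ResultType"] != "0"
              and not r["IPAddress"].startswith(CORPORATE_PREFIX)]

with open("filtered.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(suspicious)

print(len(suspicious))  # 2
```

The resulting filtered.csv is the file to upload — small, relevant, and with headers intact.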
Always include column headers. Without column headers, Claude guesses at field meanings — sometimes correctly, sometimes not. With headers, Claude references fields accurately. If you paste data directly into the prompt (rather than uploading a file), include the header row.
For small datasets (under 100 rows), paste directly into the prompt. This is faster than creating and uploading a file. Format as a table or paste the raw output — Claude handles both.
For large datasets (100-2,000 rows), upload as a CSV file. This preserves structure better than pasting, and the file attachment does not count against your message character limit.
For very large datasets (over 2,000 rows), consider using Claude Code or Cowork instead of Claude.ai. Claude Code can process files from your local filesystem without the upload step, and Cowork can work with files in shared folders. Both surfaces handle larger datasets more effectively because they can process files incrementally rather than loading everything into the conversation context at once.
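Incremental processing means splitting the export into batches and handling each one on its own. A minimal sketch of that batching step, using a small stand-in for a 50,000-row file:

```python
import csv, io, itertools

def chunk_csv(reader, size):
    """Yield lists of up to `size` rows; each batch keeps the parsed
    column names, so every chunk is a self-contained dataset."""
    while True:
        batch = list(itertools.islice(reader, size))
        if not batch:
            return
        yield batch

# Tiny stand-in for a large export: 5 rows, chunked into batches of 2.
raw = "Id\n" + "\n".join(str(i) for i in range(5))
reader = csv.DictReader(io.StringIO(raw))
batches = list(chunk_csv(reader, 2))
print([len(b) for b in batches])  # [2, 2, 1]
```

This is essentially what Claude Code does for you when it processes a large file with a script instead of reading it whole into the conversation.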
Sanitization — the non-negotiable step
Before uploading any data from a production environment, sanitize it. This applies to every plan tier below Enterprise with zero data retention.
What to sanitize: replace real usernames with fictional names (j.morrison@northgateeng.com), replace real IP addresses with RFC 5737 documentation ranges (192.0.2.x, 198.51.100.x, 203.0.113.x), replace real tenant domains with a fictional domain (northgateeng.com instead of your real domain), remove or replace device names, incident numbers, and any data that could identify your organization or individuals.
Why sanitization matters per plan tier: on Free and Pro plans, Anthropic may review your input for safety purposes and your data may be used for model training (opt-out available on Pro). On Team plans, data is not used for training by default but staff may access it for limited safety review. On Enterprise with zero data retention, data is not retained. Regardless of plan tier, treat sanitization as a defense-in-depth practice — even on Enterprise plans, sanitizing removes the risk entirely rather than relying on a vendor’s data handling commitments.
Maintain analytical value. Good sanitization replaces identifying data with realistic fictional values — not with redaction marks or “[REDACTED]” strings. Claude analyzes data more effectively when the fictional values look like real data. Use consistent fictional replacements across all data from the same investigation so that patterns remain visible (the same fictional user name for the same real user across multiple log exports).
Worked artifact — sanitization checklist for Claude uploads:
Before uploading any production data to Claude, verify each item:
Replace (do not blur or redact — use realistic fictional values):
- All user principal names and email addresses → fictional names at fictional domain
- All IP addresses → RFC 5737 ranges (192.0.2.x, 198.51.100.x, 203.0.113.x)
- Tenant name and domain → fictional organization name and domain
- Device names and device IDs → generic names (DESKTOP-NGE001)
- Incident numbers → fictional case numbers
- Application names that reveal internal systems → generic names
Preserve (safe to keep as-is):
- Timestamps (essential for timeline analysis)
- Column headers and field names (public Microsoft schema)
- Result types and error codes (public values)
- Authentication methods and protocol details
- Alert severity classifications
- MITRE ATT&CK technique references
Adapt this checklist for your organization. Print it and keep it next to your forensic workstation.
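The checklist's "consistent fictional replacements" rule is easy to automate. The sketch below is one possible approach, not a vetted sanitization tool — it covers only users and IPs, and the fictional domain and RFC 5737 range match the examples above:

```python
import itertools

class Sanitizer:
    """Map identifying values to realistic fictional ones, consistently:
    the same real value always gets the same fictional replacement,
    so patterns stay visible across multiple log exports."""

    def __init__(self, domain="northgateeng.com"):
        self.domain = domain
        self.users = {}
        self.ips = {}
        self._fake_users = (f"user{i:03d}" for i in itertools.count(1))
        self._fake_ips = (f"203.0.113.{i}" for i in itertools.count(1))  # RFC 5737

    def user(self, upn):
        if upn not in self.users:
            self.users[upn] = f"{next(self._fake_users)}@{self.domain}"
        return self.users[upn]

    def ip(self, addr):
        if addr not in self.ips:
            self.ips[addr] = next(self._fake_ips)
        return self.ips[addr]

s = Sanitizer()
print(s.user("real.name@contoso.com"))  # user001@northgateeng.com
print(s.ip("10.1.2.3"))                 # 203.0.113.1
print(s.user("real.name@contoso.com"))  # user001@northgateeng.com (same mapping)
```

Run every export from the same investigation through the same Sanitizer instance (or persist its mapping tables) so the fictional identities stay stable across files.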
Context window management
The context window is the total amount of text Claude can process in a single conversation — your input plus Claude’s output combined. All current models support 200,000 tokens on standard plans, with Opus 4.6 and Sonnet 4.6 supporting up to one million tokens in beta configurations.
200,000 tokens is roughly 150,000 words. In practice, this means you can upload a substantial amount of data — multiple log exports, a compliance framework, and several reference documents — in a single conversation. But the context window is shared between input and output. A conversation with a large uploaded dataset leaves less room for Claude’s response.
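You can sanity-check an upload against the window before sending it. The word-to-token ratio below is the rough rule of thumb from the text (about 4 tokens per 3 English words); real tokenizers vary, so treat this as a planning estimate only:

```python
# Rough planning estimate: ~1 token per 0.75 English words.
# Real tokenizers differ by content type (code and logs tokenize denser).
def estimate_tokens(text: str) -> int:
    words = len(text.split())
    return int(words / 0.75)

CONTEXT_WINDOW = 200_000  # standard-plan window described above

doc = "word " * 15_000  # stand-in for a ~15,000-word document (the 50-page PDF)
used = estimate_tokens(doc)
print(used, f"{used / CONTEXT_WINDOW:.0%}")  # 20000 10%
```

If the estimate approaches a large fraction of the window, pre-filter further or switch surfaces rather than uploading and hoping.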
The “lost in the middle” problem is a known limitation of large language models. Claude’s attention is strongest at the beginning and end of the context window. Information in the middle of a very long input is processed less reliably. For critical analysis: place the most important content (the specific log entries you want analyzed, the key evidence) at the beginning or end of your prompt. Put supplementary context (reference data, background information) in the middle.
When the conversation gets too long, start a new conversation within the same Project. The Project’s system prompt and reference documents carry forward — you do not lose your environment context. But the conversation history resets, freeing the context window for new data.
Choosing the right surface for data work
Different Claude surfaces handle data differently. Choosing the right one for the task improves both speed and quality.
Claude.ai is best for interactive data analysis — when you want to upload a dataset and have a back-and-forth conversation about what it contains. Upload the CSV, ask Claude to identify anomalies, follow up with specific questions about individual entries, ask for a summary of findings. The conversation flow is essential when the analysis is exploratory.
Claude Code is best for scripted data processing — when you have a defined transformation or analysis to apply. Claude Code can read files directly from your filesystem, process them with Python or PowerShell, and write the output. For security work: processing KAPE output, parsing EZTools CSV files, transforming log exports between formats, or running automated checks across multiple files.
Cowork is best for delegated data work — when you want to hand Claude a folder of files and get back finished analysis without a conversation. Give Cowork access to your evidence folder, describe the analysis you need, and check back when it is done. Cowork handles multi-file operations, parallel subtask coordination, and delivers results directly to your filesystem.
Try it: Upload and analyze sanitized log data
Export a small dataset from your Sentinel workspace (50-100 rows of SigninLogs, CSV format). Sanitize it using the checklist above — replace real usernames, IPs, and domains with fictional values. Upload the sanitized CSV to your Security Operations project in Claude.ai. Ask: “Identify any anomalous sign-in patterns in this dataset. Group findings by category: unusual IPs, unusual times, unusual applications, authentication anomalies.” This exercises the full workflow: export → sanitize → upload → analyze. Verify Claude’s findings against your own analysis of the same data.
Knowledge checks
Check your understanding
1. You need to analyze 15,000 rows of sign-in log data. What is the best approach?
2. A colleague uploads production sign-in logs to Claude’s Free tier without sanitizing the data. What are the risks?
3. You have a folder of KAPE output (multiple CSV files from EZTools parsing) that you need processed into a unified investigation timeline. Which Claude surface is most appropriate?
Key takeaways
Sanitize → pre-filter → upload → analyze. This is the workflow for every data analysis task. Sanitization is non-negotiable on any plan below Enterprise with zero data retention.
CSV with headers is the best format for log data. Claude reads structured columns accurately when headers are present.
Pre-filter large datasets. Claude.ai handles up to about 2,000 rows well. For larger datasets, use Claude Code or Cowork.
Choose the right surface. Claude.ai for interactive analysis. Claude Code for scripted processing. Cowork for delegated batch work.
The context window is shared. Large uploads leave less room for Claude’s response. Start new conversations within the same Project when the context fills up.