Safety, Limitations & Responsible Use

15 min · F5

Every previous module taught you what Claude can do. This module teaches you what Claude gets wrong, when to distrust it, and the operational discipline required to use AI tools safely in security operations. This is the module that separates professionals from amateurs — because an amateur uses Claude until it fails. A professional knows where the failures are before they happen.


Hallucination patterns in security contexts

Claude generates statistically probable output, not factual output. When the correct answer is well-represented in training data (common KQL patterns, standard MITRE techniques), Claude is reliable. When the correct answer is rare, ambiguous, or requires current information, Claude generates plausible-sounding output that may be completely wrong.

Pattern 1: Invented table and column names. Claude generates a KQL query referencing AADUserRiskEvents — a table that does not exist. The real table is AADRiskyUsers or IdentityInfo. The query is syntactically perfect. The table name is statistically probable (it sounds like a real Microsoft table). It does not exist in any Sentinel workspace.

How to catch it: Run every query in your lab environment first. If the query fails with “table not found,” the table name is hallucinated. Check the Sentinel schema browser for the correct table.
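This check can be partially scripted: keep an export of your workspace's tables and diff any generated query against it before running anything. The Python sketch below is a minimal illustration under stated assumptions; KNOWN_TABLES is a placeholder for your own schema export, and the crude regex parsing will miss tables referenced via union, externaldata, or saved functions.

```python
import re

# Placeholder: in practice, export this set from your workspace's
# schema browser or the Log Analytics API.
KNOWN_TABLES = {"SigninLogs", "AADRiskyUsers", "IdentityInfo", "SecurityAlert"}

def referenced_tables(kql: str) -> set[str]:
    """Crude extraction of table references: the first identifier in the
    query plus any identifier following a 'join' keyword."""
    tables = set()
    first = re.match(r"\s*([A-Za-z_][A-Za-z0-9_]*)", kql)
    if first:
        tables.add(first.group(1))
    tables.update(
        re.findall(r"\bjoin\s+(?:kind\s*=\s*\w+\s+)?([A-Za-z_][A-Za-z0-9_]*)", kql)
    )
    return tables

def unknown_tables(kql: str) -> set[str]:
    """Tables the query references that your schema export does not contain."""
    return referenced_tables(kql) - KNOWN_TABLES

query = "AADUserRiskEvents | where RiskLevel == 'high'"
print(unknown_tables(query))  # {'AADUserRiskEvents'} -> likely hallucinated
```

A non-empty result does not prove hallucination (your export may be stale), but it tells you exactly which names to confirm in the schema browser before the query goes anywhere near production.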

Pattern 2: Outdated feature descriptions. Claude describes a Defender for Endpoint feature using its pre-May 2025 name or location. Microsoft moves features, renames portals, and restructures navigation frequently. Claude’s description may have been accurate when it was trained but is no longer correct.

How to catch it: When Claude describes a portal location (“Navigate to Settings → Endpoints → Advanced Features”), verify it in the actual portal. If the menu path does not match, check current Microsoft documentation for the updated location.

Pattern 3: Fabricated CVE details. Claude generates a CVE number (CVE-2025-XXXXX) with a description that sounds plausible but does not correspond to a real CVE. This is especially common for hypothetical or illustrative examples where Claude invents a CVE to make the example more concrete.

How to catch it: Verify every CVE number against the National Vulnerability Database (nvd.nist.gov) or MITRE’s CVE list. If the number does not return a result, it is fabricated.
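A format check can catch the most obviously malformed identifiers before you even open NVD, though a well-formed ID proves nothing about existence. A minimal sketch (the function name is our own; the pattern follows the published CVE-YYYY-NNNN format, where the sequence number is four or more digits):

```python
import re

# CVE IDs are "CVE-" + 4-digit year + "-" + 4-or-more-digit sequence number.
CVE_PATTERN = re.compile(r"^CVE-\d{4}-\d{4,}$")

def plausible_cve(cve_id: str) -> bool:
    """Checks only that the identifier is well-formed. A well-formed ID can
    still be fabricated -- existence must be confirmed against nvd.nist.gov."""
    return bool(CVE_PATTERN.match(cve_id))

print(plausible_cve("CVE-2021-44228"))  # True -- but still verify it exists
print(plausible_cve("CVE-25-1234"))     # False -- malformed year
```

Treat this as a pre-filter: anything that fails the shape check is wrong outright, and everything that passes still goes through the NVD lookup.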

Pattern 4: Confident but wrong analysis. Claude analyses a sign-in log entry and concludes “this is definitely token replay” when the evidence is ambiguous. The analysis reads authoritatively. The confidence level is not justified by the evidence. Claude does not say “I’m not sure” often enough — it tends toward confident conclusions regardless of evidence strength.

How to catch it: For any analytical conclusion, ask: “What evidence supports this conclusion? What alternative explanations exist?” If Claude cannot provide specific evidence or dismisses alternatives too quickly, the conclusion may be overconfident.


Knowledge boundaries — what Claude does not know

Your environment. Claude does not know your Sentinel table names, your watchlist contents, your conditional access policies, or your organisation’s network topology. Every query and recommendation must be adapted to your specific environment.

Current events. Claude’s training cutoff is May 2025. Threat actor TTPs, new vulnerabilities, Microsoft product changes, and regulatory updates from the past year are unknown unless Claude uses web search — and web search is not guaranteed to find the most current or accurate information.

Operational context. Claude does not know that the CEO is travelling, that the finance team processes payments on Tuesdays, that the new VPN was deployed last week and changed everyone’s external IP, or that three employees just returned from parental leave and are triggering anomalous sign-in patterns. This operational context — the context that determines whether an alert is a true positive or a false positive — must come from you.


Data privacy and handling

What happens to your data in Claude:

Plan        Data used for training?          Data visible to Anthropic staff?
Free        Yes (by default)                 Yes (for safety review)
Pro         Yes (by default, can opt out)    Yes (for safety review)
Team        No (by default)                  Limited (safety review only)
Enterprise  No                               No (zero data retention available)

For security operations: If you are pasting sign-in logs, alert details, or investigation evidence into Claude, the data handling matters. On Free and Pro plans, assume Anthropic can see your input. This means: sanitise all data before uploading (replace real usernames, IPs, tenant identifiers), never paste credentials, API keys, or secrets, and never upload classified or legally privileged documents.
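A first sanitisation pass can be scripted, though no regex list is complete; treat a script like this as a pre-processing step before a manual review, never as a guarantee. The three patterns below (IPv4 addresses, email-style UPNs, GUIDs) are illustrative assumptions:

```python
import re

def sanitise(log_text: str) -> str:
    """Replace obvious identifiers before pasting log data into an external
    tool. A starting point, not a complete scrubber -- review the output
    manually before uploading."""
    text = re.sub(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", "<IP>", log_text)      # IPv4 addresses
    text = re.sub(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b", "<UPN>", text)    # emails / UPNs
    text = re.sub(r"\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b",
                  "<GUID>", text)                                        # tenant / object GUIDs
    return text

sample = ("User alice@contoso.com signed in from 203.0.113.45 "
          "(tenant 3f2a1b4c-0d5e-4f6a-8b7c-9d0e1f2a3b4c)")
print(sanitise(sample))
# User <UPN> signed in from <IP> (tenant <GUID>)
```

Keep the replacement tokens consistent (<IP>, <UPN>, <GUID>) so Claude's analysis still tracks which entity is which, and extend the pattern list with hostnames, device names, and anything else specific to your environment.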

On Team and Enterprise plans: The data handling is more restrictive, but you should still follow the sanitisation discipline. Defence in depth applies to AI tools just as it applies to any other external service.

Shadow AI risk: If you use Claude for security work without your organisation’s knowledge, you are creating a shadow AI data flow. Log data, investigation details, and incident information flowing to a third-party service without organisational approval is a policy violation in most environments — and potentially a compliance violation. Get organisational approval before using Claude for work-related security analysis. Module S6 covers AI governance frameworks in detail.


When NOT to use Claude

Do not use Claude for real-time incident response decisions. Claude is a tool for analysis and documentation, not a decision-maker during an active incident. The decision to disable a VIP’s account, isolate a server, or trigger a major incident response requires human judgment that accounts for business context Claude cannot access.

Do not use Claude as your sole source of truth. Every Claude output in a security context must be verified. Use Claude to accelerate your work — generate the first draft of a query, the initial structure of a report, the preliminary analysis of log data — then verify and refine with human expertise.

Do not use Claude for tasks requiring legal certainty. If the output will be used in legal proceedings (employment tribunal, regulatory filing, law enforcement referral), every statement must be verified against primary sources. Claude can draft the structure; a human must verify every fact.

Do not paste unredacted production data on Free/Pro plans. If your organisation has not approved Claude for security operations, treat it as an unauthorised external service. The convenience of quick analysis does not justify the data handling risk.


The verification discipline

The single most important habit for AI-assisted security work:

Output → Verify → Deploy. Never Output → Deploy.

This applies to: KQL queries (run in lab first), report sections (verify facts against evidence), detection rules (test against historical data), containment recommendations (assess blast radius), and policy drafts (review against framework requirements).

Claude accelerates the creation step. It does not eliminate the verification step. The professionals who get the most value from Claude are not the ones who use it the most — they are the ones who verify the most consistently.


Building a verification checklist

For each category of Claude output, the verification steps are different:

KQL queries:

  1. Check table names against your Sentinel schema browser
  2. Check column names against the table’s actual schema
  3. Verify the time range matches your investigation window
  4. Run the query in your lab tenant before production
  5. Spot-check 5-10 results — does the data match what you expect?
  6. Verify the watchlist names match your environment

IR report sections:

  1. Every timestamp must match your evidence logs
  2. Every IP address must appear in your investigation data
  3. Every feature description must be verified against the current Defender portal
  4. Every recommendation must be operationally feasible in your environment
  5. No inferences presented as facts (Module S2 covers this in depth)

Detection rules:

  1. All verification steps from KQL queries (above)
  2. Run against 30 days of historical data — how many alerts would it generate?
  3. Review the false positive results — are they manageable?
  4. Verify entity mapping fields exist in the query output
  5. Confirm the MITRE technique mapping against the ATT&CK matrix

PowerShell scripts:

  1. Verify cmdlet names against your installed module version
  2. Check API permissions match your app registration
  3. Run in dev tenant first — never production
  4. Test error handling by deliberately causing failures (disconnect mid-run, use invalid credentials)
  5. Review for hardcoded secrets or credentials
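The last step, reviewing for hardcoded secrets, can be partially automated with a small pattern scan before any human review. The two patterns below are illustrative assumptions only; dedicated scanners ship far larger rule sets and should be preferred where available:

```python
import re

# Illustrative patterns only -- real secret scanners use far larger rule sets.
SECRET_PATTERNS = {
    "secret assignment":     re.compile(r"(?i)(password|secret|apikey|token)\s*=\s*['\"][^'\"]+['\"]"),
    "connection string key": re.compile(r"(?i)AccountKey=[A-Za-z0-9+/=]{20,}"),
}

def find_secrets(script_text: str) -> list[str]:
    """Return the names of any secret patterns found in a script."""
    return [name for name, pattern in SECRET_PATTERNS.items()
            if pattern.search(script_text)]

script = '$apiKey = "sk-live-1234567890abcdef"'
print(find_secrets(script))  # ['secret assignment']
```

A clean scan is not proof of a clean script; it only means none of the patterns you thought to check for were present. The manual review in step 5 still applies.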

The 5-minute rule

Verification typically takes 5 minutes per Claude output. The output itself takes 30 seconds to generate. The temptation is to skip the 5-minute verification because the 30-second generation felt effortless. Resist this temptation. The 5-minute investment prevents the 5-hour recovery from deploying a broken query, a wrong report claim, or a misconfigured rule. The cost of verification is always lower than the cost of correction.


Real-world failure examples

These are patterns that have caused real problems when Claude output was deployed without verification:

Failure 1: The phantom table join. Claude generated a KQL query that joined SigninLogs with a table called “UserDeviceRegistration.” The query syntax was perfect. The table does not exist in Microsoft Sentinel. The analyst deployed the query as a Sentinel analytics rule. The rule never fired — not because there were no matches, but because the query errored silently on the non-existent table. For two weeks, the analyst believed there were no suspicious device registrations. There were.

Failure 2: The confident CVE. Claude included a CVE reference (CVE-2025-29831) in a threat briefing for the CISO. The CVE sounded plausible — “Microsoft Entra ID token validation bypass.” The CISO presented it to the board. A board member looked it up. The CVE does not exist. Trust damage.

Failure 3: The overconfident triage. An analyst pasted an alert into Claude and asked for a triage assessment. Claude concluded: “This is a false positive — the sign-in is from a known VPN provider.” The analyst closed the alert. The IP was not a VPN provider — it was a residential proxy used by the attacker. Claude generated a confident-sounding explanation that happened to be wrong. The attacker maintained access for 3 additional days.

The common thread: Each failure occurred because the human trusted Claude’s output without verification. Not because Claude’s capability was insufficient, but because the verification step was skipped.

Try it yourself

Ask Claude to generate a KQL query for a security scenario you know well — a query pattern you have written before. Compare Claude's output to what you know is correct. Does Claude get the table names right? The column names? The filter logic? This exercise calibrates your sense of where Claude is reliable and where it needs correction in your specific environment.

For common query patterns (SigninLogs filtering, EmailEvents analysis), Claude is typically 85-95% correct. The remaining 5-15% is usually: wrong column names, slightly incorrect filter syntax, or missing your specific watchlist references. This calibration exercise teaches you what to check vs what to trust — which is the foundation of effective AI-assisted security work.

Try it yourself

Ask Claude: "List the 5 most common MITRE ATT&CK techniques used in AiTM phishing campaigns, with their technique IDs." Then verify each technique ID against the ATT&CK website (attack.mitre.org). Are all 5 correct? Are the technique IDs real? This tests Claude's factual accuracy on a topic where you can verify instantly.

Claude typically gets 4 out of 5 correct. The most common error: citing a parent technique when a sub-technique is more accurate (e.g., T1078 instead of T1078.004 for cloud accounts), or citing a technique that is tangentially related rather than directly applicable. This is the verification pattern for any Claude output that references specific identifiers — CVEs, MITRE techniques, Microsoft feature names, compliance control numbers.
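The shape check, at least, can be automated before the manual lookup: ATT&CK technique IDs follow a T#### or T####.### pattern, while tactic IDs (TA####) and truncated sub-technique IDs do not match it. A minimal sketch (the function name is our own):

```python
import re

# Technique IDs: "T" + 4 digits, optionally "." + 3 digits for a sub-technique.
TECHNIQUE_ID = re.compile(r"^T\d{4}(?:\.\d{3})?$")

def well_formed_technique(tid: str) -> bool:
    """Checks the T####(.###) shape only. A well-formed ID can still be the
    wrong technique -- confirm the mapping on attack.mitre.org."""
    return bool(TECHNIQUE_ID.match(tid))

for tid in ["T1078", "T1078.004", "TA0001", "T1078.4"]:
    print(tid, well_formed_technique(tid))
# T1078 True, T1078.004 True, TA0001 False, T1078.4 False
```

As with CVE numbers, this only rules out malformed identifiers; the harder error, a real ID mapped to the wrong behaviour, still requires the human check against the matrix.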


Knowledge checks

Check your understanding

1. Claude generates a KQL query referencing a table called "DeviceAlertEvents." You have never seen this table in your Sentinel workspace. What do you do?

Check your Sentinel schema browser for the table. If it does not exist, it is a hallucination. "DeviceAlertEvents" is not a real table; the data you want likely lives in "AlertEvidence" or "SecurityAlert", both of which do exist. Claude generated a statistically probable table name that does not correspond to a real table. This is the most common hallucination pattern in security KQL generation.
Deploy the query — Claude would not reference a fake table
Ask Claude if the table exists

2. You want to use Claude to analyse sign-in logs from a live incident. Your organisation is on a Pro plan and has not approved Claude for security operations. Should you proceed?

No. On a Pro plan, Anthropic may use your input for training. Pasting production sign-in logs — which contain real usernames, IPs, and tenant identifiers — into an unapproved external service creates a shadow AI data flow. This is likely a policy violation and potentially a compliance issue. Either get organisational approval first, use a Team/Enterprise plan with appropriate data handling, or sanitise the data before uploading.
Yes — the speed benefit outweighs the risk
Yes — sign-in logs are not sensitive data

3. During an active incident at 2am, the attacker is actively exfiltrating data. Claude recommends disabling the compromised account immediately. Should you follow Claude's recommendation?

Use your own judgment. Claude's recommendation may be technically correct, but it cannot assess: is this a VIP account? Is there a business-critical process running under this account? Is the attacker's access actually through this account or through an OAuth application? Will disabling the account trigger a bigger outage than the exfiltration? These are human judgment calls that require operational context Claude does not have. Claude accelerates analysis. Humans make containment decisions.
Yes — disable immediately, Claude is correct
Ignore Claude during active incidents

Key takeaways

Hallucinations are predictable. Table names, feature names, CVE numbers, and overconfident analysis. Know the patterns and verify accordingly.

Data privacy is your responsibility. Sanitise before uploading. Get organisational approval. Use the right plan for the sensitivity level.

Output → Verify → Deploy. This is the discipline. Claude accelerates creation. Humans verify before deployment. No exceptions in security operations.

Claude is a force multiplier, not a replacement. It makes you faster. It does not make decisions for you. The judgment — when to contain, when to escalate, when to trust the analysis — remains human.