Safety, Limitations & Responsible Use

20 min · F5
Module Objective
Every previous module taught what Claude can do. This module teaches what Claude gets wrong, when to distrust it, and the operational discipline required to use AI tools safely in security operations. This is the module that separates professionals from amateurs — because an amateur uses Claude until it fails. A professional knows where the failures are before they happen and builds the verification habit that catches them.
Deliverable: Understanding of Claude's four hallucination patterns in security contexts, knowledge boundaries, data privacy implications per plan tier, situations where Claude should not be used, and the organizational governance required before deploying AI tools in a security team.
⏱ Estimated completion: 20 minutes
[Diagram] Four hallucination patterns in security contexts:
- Invented names — KQL tables, column names, PowerShell cmdlets that sound right but don't exist. Catch: test in lab first.
- Outdated features — portal locations, feature names, menu paths from before the training cutoff. Catch: verify in the actual portal.
- Fabricated CVEs — plausible CVE numbers with convincing but fictional descriptions. Catch: verify against NVD.
- Overconfidence — definitive conclusions from ambiguous evidence; rarely says "I'm not sure." Catch: ask for alternatives.

Output → Verify → Deploy. Every Claude output in a security context must be verified before use.

Hallucination patterns in security contexts

Claude generates statistically probable output, not factual output. When the correct answer is well-represented in training data (common KQL patterns, standard MITRE techniques), Claude is reliable. When the correct answer is rare, ambiguous, or requires current information, Claude generates plausible-sounding output that may be completely wrong.

Invented table and column names. Claude generates a KQL query referencing a table name that sounds like it should exist — it follows Microsoft’s naming conventions, it relates to the right security domain — but the table does not exist in any Sentinel workspace. The query is syntactically perfect. The table name is a hallucination. This happens because Claude learned the pattern of Microsoft table naming (compound nouns in PascalCase) and generated a new name that fits the pattern but does not correspond to a real table. The same applies to column names, PowerShell cmdlets, and API endpoints. How to catch it: run every query in your lab environment first. If the query fails with a “table not found” or “column not found” error, the reference is hallucinated.
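This check can be partially automated before the lab run. The sketch below is a minimal, assumption-laden example: the allowlist is an illustrative subset of real Sentinel tables (populate it from your own workspace schema), and the table-extraction heuristic is deliberately naive — it only catches the source table at the start of each statement, not tables referenced via `join` or `union`.

```python
import re

# Illustrative subset of real Sentinel tables; in practice, export the full
# list from your own workspace schema.
KNOWN_TABLES = {"SigninLogs", "AuditLogs", "SecurityAlert", "SecurityEvent"}

def extract_table_refs(kql: str) -> set:
    """Naive heuristic: the identifier before the first pipe in each
    statement is usually the source table."""
    refs = set()
    for stmt in kql.split(";"):
        m = re.match(r"([A-Za-z_][A-Za-z0-9_]*)\s*(\||$)", stmt.strip())
        if m:
            refs.add(m.group(1))
    return refs

def unknown_tables(kql: str) -> set:
    """Return table references absent from the allowlist — likely
    hallucinated names that must be tested in a lab before deployment."""
    return extract_table_refs(kql) - KNOWN_TABLES

# A plausible-looking but invented table name gets flagged:
query = "SignInRiskEvents | where RiskLevel == 'high'"
print(unknown_tables(query))  # {'SignInRiskEvents'}
```

A passing check does not prove the query is correct — columns, operators, and logic still need the lab run. It only catches the most common failure cheaply.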

Outdated feature descriptions. Claude describes a Defender feature using its pre-May 2025 name, location, or behavior. Microsoft moves features, renames portals, and restructures navigation frequently. A portal path that was correct when Claude’s training data was compiled may no longer exist. How to catch it: when Claude describes a portal location or feature behavior, verify it in the actual portal or against Microsoft’s current documentation. Use web search to check for recent changes.

Fabricated CVE details. Claude generates a CVE number with a description that sounds plausible — correct format, reasonable severity, believable affected product — but the CVE does not exist. This is especially common when Claude is generating illustrative examples or when it is asked about CVEs outside its training data. How to catch it: verify every CVE number against the National Vulnerability Database (nvd.nist.gov) or MITRE’s CVE list.
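CVE verification is easy to script. A hedged sketch, assuming the public NVD API 2.0 endpoint (`services.nvd.nist.gov/rest/json/cves/2.0`, `cveId` parameter): the format check alone proves nothing — fabricated CVE IDs are usually well-formed — so the existence check against NVD is the step that matters. An empty `vulnerabilities` array in the response means the CVE does not exist.

```python
import re

CVE_PATTERN = re.compile(r"^CVE-\d{4}-\d{4,}$")

def is_valid_cve_format(cve_id: str) -> bool:
    """Format check only — a well-formed ID can still be fabricated."""
    return bool(CVE_PATTERN.match(cve_id))

def nvd_lookup_url(cve_id: str) -> str:
    """Build the NVD API 2.0 lookup URL for scripted or manual verification."""
    if not is_valid_cve_format(cve_id):
        raise ValueError("Malformed CVE ID: " + cve_id)
    return "https://services.nvd.nist.gov/rest/json/cves/2.0?cveId=" + cve_id

# Fetch this URL (rate limits apply without an API key); if the response's
# 'vulnerabilities' array is empty, treat the CVE as hallucinated.
print(nvd_lookup_url("CVE-2021-44228"))
```

The same discipline applies to any identifier Claude emits: KB articles, advisory IDs, and vendor bulletin numbers all need a primary-source lookup.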

Overconfident analysis. Claude analyzes ambiguous evidence and reaches a definitive conclusion that is not justified by the data. It declares “this is definitely token replay” when the evidence could support multiple explanations. The analysis reads authoritatively. The confidence is not warranted. Claude tends toward confident conclusions regardless of evidence strength — it does not say “I’m not sure” often enough. How to catch it: for any analytical conclusion, ask Claude “What alternative explanations exist?” and “What evidence would disprove this conclusion?” If Claude dismisses alternatives too quickly or cannot articulate what would change its mind, the conclusion may be overconfident.


Data privacy per plan tier

The data handling implications of each Claude plan tier directly affect what data you can upload and how it is processed. This is not theoretical — it determines whether your use of Claude complies with your organization’s data handling policies.

Free tier: input data may be used for model training. Anthropic staff may review your conversations for safety purposes. Do not upload production security data, investigation evidence, or any data containing PII on the Free tier.

Pro ($20/month): input data is retained. You can opt out of training use in settings. Anthropic staff may access conversations for safety review. Suitable for individual professional use with sanitized data. Not suitable for unsanitized production data without organizational approval.

Team ($30/user/month): data is not used for training by default. Admin controls allow organizational oversight. SOC 2 Type II compliance. Staff access is limited to safety review. This is the minimum tier for organizational deployment where analysts may handle sensitive data.

Enterprise: zero data retention available. SSO integration. Dedicated support. Custom deployment. Private plugin marketplaces. Required for organizations with strict data sovereignty, regulatory requirements, or classified data handling needs.

The practical guidance: use Free for learning and evaluation (including the exercises in this course). Use Pro with sanitized data for individual professional work. Require Team or Enterprise for organizational deployment where analysts handle production security data.


When NOT to use Claude

Do not use Claude for real-time incident response decisions. The decision to disable a VIP’s account during a board meeting, isolate a production server, or invoke the major incident response plan requires human judgment that accounts for business context Claude cannot access. Claude is a tool for analysis and documentation — not a decision-maker during an active incident.

Do not use Claude as your sole source of truth. Every Claude output in a security context must be verified. Use Claude to accelerate your work — generate the first draft of a query, the initial structure of a report, the preliminary analysis of log data — then verify and refine with human expertise and primary sources.

Do not use Claude for tasks requiring legal certainty. If the output will be used in legal proceedings, regulatory filings, or law enforcement referrals, every statement must be verified against primary evidence. Claude can draft the structure. A human must verify every fact.

Do not paste unredacted production data on Free or Pro plans. If your organization has not approved Claude for security operations, treat it as an unauthorized external service. The convenience of quick analysis does not justify the data handling risk.
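Sanitization should be mechanical, not ad hoc. The sketch below is a hypothetical minimal redactor — the three patterns (IPv4 addresses, email addresses, GUIDs) are a starting point only; extend them to cover whatever your organization's sanitization checklist requires (hostnames, usernames, tenant IDs, internal URLs).

```python
import re

# Hypothetical minimal pattern set — extend per your sanitization checklist.
PATTERNS = {
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "GUID": re.compile(
        r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
        r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"),
}

def sanitize(text: str) -> str:
    """Replace common identifiers with placeholder tokens before any upload."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub("[REDACTED-" + label + "]", text)
    return text

log_line = "User alice@contoso.com signed in from 203.0.113.7"
print(sanitize(log_line))
# User [REDACTED-EMAIL] signed in from [REDACTED-IP]
```

Regex redaction is a floor, not a ceiling: it misses free-text identifiers (project names, people's names in ticket notes), so a human review of the sanitized output is still part of the checklist.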

Do not trust Claude’s self-verification. If you ask Claude “is this query correct?” it will almost always say yes — even if the query contains hallucinated references. Claude cannot verify its own output against your environment. Only you can do that, by running the query, checking the documentation, or testing the script.


Shadow AI and organizational governance

If you use Claude for security work without your organization’s knowledge or approval, you are creating a shadow AI data flow. Investigation details, log data, incident information, and policy content flowing to a third-party service without organizational approval is a policy violation in most environments — and potentially a compliance violation under frameworks that require data inventory and third-party risk assessment.

The responsible path is to get organizational approval before using Claude for work-related security analysis. This means presenting the use case to your security leadership, proposing the appropriate plan tier (Team or Enterprise for organizational use), defining what data can and cannot be uploaded, and establishing a review process for AI-generated output before it enters official records.

Module S6 covers AI governance frameworks in detail — including shadow AI detection, acceptable use policies, and the five-component governance framework that security teams need.

Worked artifact — AI tool authorization request template:

To: [CISO / Security Director]
Subject: Authorization to use Claude AI for security operations

Purpose: Request approval to use Anthropic’s Claude AI (Team plan, $30/user/month) as an operational tool for the SOC team.

Use cases: KQL query generation, IR report drafting, detection rule documentation, log analysis (sanitized data only), compliance gap analysis, policy drafting.

Data handling: Team plan provides SOC 2 Type II compliance, no training on our data by default, admin controls for organizational oversight. All production data will be sanitized before upload using the established sanitization checklist. No credentials, API keys, or classified information will be uploaded.

Risk mitigation: All AI-generated output will be verified before deployment or inclusion in official reports. The verification discipline (Output → Verify → Deploy) will be documented in the SOC operating procedures. Usage will be restricted to the approved Team workspace with admin visibility.

Cost: $30/user/month per analyst. Estimated time savings: 5-10 hours/week per analyst on documentation, query generation, and report drafting.

Adapt this template for your organization. The goal is to move Claude from shadow IT to approved tooling with appropriate governance.

Compliance Myth
"Using Claude on a paid plan means our data is safe and we do not need organizational approval."
Production reality: A paid Claude subscription is a contract between you (the individual subscriber) and Anthropic. It is not an organizational authorization. Your organization's data handling policies, third-party risk assessment requirements, and compliance obligations still apply. Using Claude for security work without organizational approval creates a shadow AI data flow — regardless of the plan tier. The Team and Enterprise plans provide the data governance controls organizations need (admin oversight, no training use, SOC 2 compliance), but deploying them requires organizational procurement and approval, not individual signup. Get approval first. The data governance features of the Team plan are the tools your CISO needs to say yes.

Try it: Test Claude's hallucination patterns

Open Claude.ai. Ask Claude to write a KQL query that identifies Entra ID sign-in risk events. Examine the output: what table name did Claude use? Check whether that exact table name exists in your Sentinel workspace schema. If it does not, you have witnessed the most common hallucination pattern firsthand. Then ask Claude: "Are you sure that table name is correct?" — observe that Claude will likely confirm the hallucinated name with confidence. This demonstrates why self-verification does not work and why the human verification step is non-negotiable.


Knowledge checks

Check your understanding

1. Claude generates an IR report that cites CVE-2026-18742 as the vulnerability exploited in the attack you are investigating. You do not recognize the CVE. What should you do?

Verify the CVE against the National Vulnerability Database (nvd.nist.gov). Claude fabricates plausible CVE numbers — correct format, reasonable description, believable affected product — that do not correspond to real vulnerabilities. If the CVE does not appear in NVD, it is a hallucination. Remove it from the report before submission.
Include it — Claude would not generate a fake CVE number
Ask Claude to verify the CVE number

2. Your SOC team wants to start using Claude for investigation work. Analysts will be uploading (sanitized) sign-in logs and drafting IR reports. What is the minimum appropriate plan tier?

Pro ($20/month) per analyst with sanitization
Team ($30/user/month). For organizational SOC deployment, Team provides admin controls (management visibility into usage), no training on your data by default, SOC 2 Type II compliance, and shared Projects. The admin controls are what the CISO needs to approve the deployment — individual Pro accounts lack organizational oversight.
Free tier with careful sanitization is sufficient

3. Claude analyzes a set of sign-in events and concludes "this is definitely a token replay attack." The evidence shows sign-ins from two different IPs within a short time window. What should your next step be?

Accept the conclusion — Claude analyzed the evidence
Challenge the confidence level. Ask Claude "What alternative explanations exist for this pattern?" and "What evidence would disprove token replay?" Claude tends toward overconfident conclusions from ambiguous evidence. Multiple IPs in a short window could indicate token replay, but could also indicate a VPN change, a mobile device switching networks, or legitimate travel. The investigation must evaluate alternatives before concluding.
Enable Extended Thinking to improve the analysis

Key takeaways

Four hallucination patterns affect security work. Invented names (tables, columns, cmdlets), outdated features (portal paths, product names), fabricated CVEs, and overconfident analysis. Know the patterns. Build the verification habits.

Data privacy varies by plan tier. Free and Pro have training and review implications. Team provides organizational governance. Enterprise provides maximum data protection. Choose the tier that matches your data sensitivity.

Self-verification does not work. Claude cannot check its own output against your environment. Only you can verify, by running the query, checking the documentation, or testing the script.

Organizational approval comes before individual use. Using Claude without approval creates shadow AI. Get approval first — present the use case, propose the appropriate tier, define data boundaries.

The verification discipline is the foundation. Output → Verify → Deploy. This is the thread that runs through every module in this course. It is what makes AI-assisted security work trustworthy.