Safety, Limitations & Responsible Use
Hallucination patterns in security contexts
Claude generates statistically probable output, not factual output. When the correct answer is well-represented in training data (common KQL patterns, standard MITRE techniques), Claude is reliable. When the correct answer is rare, ambiguous, or requires current information, Claude generates plausible-sounding output that may be completely wrong.
Invented table and column names. Claude generates a KQL query referencing a table name that sounds like it should exist — it follows Microsoft’s naming conventions, it relates to the right security domain — but the table does not exist in any Sentinel workspace. The query is syntactically perfect. The table name is a hallucination. This happens because Claude learned the pattern of Microsoft table naming (noun + noun, PascalCase) and generated a new name that fits the pattern but does not correspond to a real table. The same applies to column names, PowerShell cmdlets, and API endpoints. How to catch it: run every query in your lab environment first. If the query fails with a “table not found” or “column not found” error, the reference is hallucinated.
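A lightweight pre-flight check can catch some invented names before you even open the lab workspace. The sketch below is a minimal example, assuming you have exported a snapshot of your own workspace's table names (the entries in KNOWN_TABLES are placeholders, not a real schema); it flags query-leading identifiers that do not appear in that snapshot:

```python
import re

# Hypothetical snapshot of table names exported from your own Sentinel
# workspace (for example via the Log Analytics API). These entries are
# placeholders; substitute your real schema.
KNOWN_TABLES = {"SigninLogs", "AuditLogs", "SecurityAlert", "DeviceEvents"}

def unknown_tables(kql: str) -> set:
    """Flag identifiers at the start of a query line that are not in the
    schema snapshot. Crude heuristic: a line beginning with a capitalized
    identifier (rather than a pipe operator) is treated as a table reference."""
    refs = set(re.findall(r"^\s*([A-Z][A-Za-z0-9_]*)", kql, re.MULTILINE))
    return refs - KNOWN_TABLES

# A plausible-sounding but invented table name gets flagged:
generated = "SigninRiskEvents\n| where RiskLevel == 'high'"
print(unknown_tables(generated))  # {'SigninRiskEvents'}
```

This is a coarse heuristic, not a KQL parser; running the query in a lab workspace remains the authoritative check.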
Outdated feature descriptions. Claude describes a Defender feature using its pre-May 2025 name, location, or behavior. Microsoft moves features, renames portals, and restructures navigation frequently. A portal path that was correct when Claude’s training data was compiled may no longer exist. How to catch it: when Claude describes a portal location or feature behavior, verify it in the actual portal or against Microsoft’s current documentation. Use web search to check for recent changes.
Fabricated CVE details. Claude generates a CVE number with a description that sounds plausible — correct format, reasonable severity, believable affected product — but the CVE does not exist. This is especially common when Claude is generating illustrative examples or when it is asked about CVEs outside its training data. How to catch it: verify every CVE number against the National Vulnerability Database (nvd.nist.gov) or MITRE’s CVE list.
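Before reaching for the NVD, a format sanity check can filter out obviously malformed identifiers. The sketch below is a minimal example; note that passing it says nothing about existence, which only nvd.nist.gov or MITRE's CVE list can confirm:

```python
import re
from datetime import date

# CVE IDs are "CVE-<year>-<sequence>", where the sequence is 4 or more digits.
CVE_RE = re.compile(r"^CVE-(\d{4})-(\d{4,})$")

def cve_is_well_formed(cve_id: str) -> bool:
    """Format and year sanity check only. Passing says nothing about whether
    the CVE actually exists; confirm existence against nvd.nist.gov or
    MITRE's CVE list."""
    m = CVE_RE.match(cve_id)
    return bool(m) and 1999 <= int(m.group(1)) <= date.today().year

print(cve_is_well_formed("CVE-2021-44228"))  # True: well-formed, still needs lookup
print(cve_is_well_formed("CVE-21-1"))        # False: malformed identifier
```

A hallucinated CVE will usually pass this check, which is exactly the point: format plausibility is what makes fabricated CVEs convincing, so the database lookup is the step that cannot be skipped.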
Overconfident analysis. Claude analyzes ambiguous evidence and reaches a definitive conclusion that is not justified by the data. It declares “this is definitely token replay” when the evidence could support multiple explanations. The analysis reads authoritatively. The confidence is not warranted. Claude tends toward confident conclusions regardless of evidence strength — it does not say “I’m not sure” often enough. How to catch it: for any analytical conclusion, ask Claude “What alternative explanations exist?” and “What evidence would disprove this conclusion?” If Claude dismisses alternatives too quickly or cannot articulate what would change its mind, the conclusion may be overconfident.
Data privacy per plan tier
The data handling implications of each Claude plan tier directly affect what data you can upload and how it is processed. This is not theoretical — it determines whether your use of Claude complies with your organization’s data handling policies.
Free tier: input data may be used for model training. Anthropic staff may review your conversations for safety purposes. Do not upload production security data, investigation evidence, or any data containing PII on the Free tier.
Pro ($20/month): input data is retained. You can opt out of training use in settings. Anthropic staff may access conversations for safety review. Suitable for individual professional use with sanitized data. Not suitable for unsanitized production data without organizational approval.
Team ($30/user/month): data is not used for training by default. Admin controls allow organizational oversight. SOC 2 Type II compliance. Staff access is limited to safety review. This is the minimum tier for organizational deployment where analysts may handle sensitive data.
Enterprise: zero data retention available. SSO integration. Dedicated support. Custom deployment. Private plugin marketplaces. Required for organizations with strict data sovereignty, regulatory requirements, or classified data handling needs.
The practical guidance: use Free for learning and evaluation (including the exercises in this course). Use Pro with sanitized data for individual professional work. Require Team or Enterprise for organizational deployment where analysts handle production security data.
When NOT to use Claude
Do not use Claude for real-time incident response decisions. The decision to disable a VIP’s account during a board meeting, isolate a production server, or invoke the major incident response plan requires human judgment that accounts for business context Claude cannot access. Claude is a tool for analysis and documentation — not a decision-maker during an active incident.
Do not use Claude as your sole source of truth. Every Claude output in a security context must be verified. Use Claude to accelerate your work — generate the first draft of a query, the initial structure of a report, the preliminary analysis of log data — then verify and refine with human expertise and primary sources.
Do not use Claude for tasks requiring legal certainty. If the output will be used in legal proceedings, regulatory filings, or law enforcement referrals, every statement must be verified against primary evidence. Claude can draft the structure. A human must verify every fact.
Do not paste unredacted production data on Free or Pro plans. If your organization has not approved Claude for security operations, treat it as an unauthorized external service. The convenience of quick analysis does not justify the data handling risk.
Do not trust Claude’s self-verification. If you ask Claude “is this query correct?” it will almost always say yes — even if the query contains hallucinated references. Claude cannot verify its own output against your environment. Only you can do that, by running the query, checking the documentation, or testing the script.
Shadow AI and organizational governance
If you use Claude for security work without your organization’s knowledge or approval, you are creating a shadow AI data flow. Sending investigation details, log data, incident information, or policy content to a third-party service without organizational approval is a policy violation in most environments — and potentially a compliance violation under frameworks that require data inventory and third-party risk assessment.
The responsible path is to get organizational approval before using Claude for work-related security analysis. This means presenting the use case to your security leadership, proposing the appropriate plan tier (Team or Enterprise for organizational use), defining what data can and cannot be uploaded, and establishing a review process for AI-generated output before it enters official records.
Module S6 covers AI governance frameworks in detail — including shadow AI detection, acceptable use policies, and the five-component governance framework that security teams need.
Worked artifact — AI tool authorization request template:
To: [CISO / Security Director]
Subject: Authorization to use Claude AI for security operations
Purpose: Request approval to use Anthropic’s Claude AI (Team plan, $30/user/month) as an operational tool for the SOC team.
Use cases: KQL query generation, IR report drafting, detection rule documentation, log analysis (sanitized data only), compliance gap analysis, policy drafting.
Data handling: Team plan provides SOC 2 Type II compliance, no training on our data by default, admin controls for organizational oversight. All production data will be sanitized before upload using the established sanitization checklist. No credentials, API keys, or classified information will be uploaded.
Risk mitigation: All AI-generated output will be verified before deployment or inclusion in official reports. The verification discipline (Output → Verify → Deploy) will be documented in the SOC operating procedures. Usage will be restricted to the approved Team workspace with admin visibility.
Cost: $30/user/month per analyst. Estimated time savings: 5-10 hours/week per analyst on documentation, query generation, and report drafting.
Adapt this template for your organization. The goal is to move Claude from shadow IT to approved tooling with appropriate governance.
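The sanitization checklist referenced in the template is organization-specific, but a minimal redaction pass often looks something like the sketch below. The patterns and placeholders here are illustrative assumptions, not a complete checklist:

```python
import re

# Illustrative redaction rules: emails, IPv4 addresses, and GUIDs.
# Extend these to match your organization's actual sanitization checklist.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<email>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<ipv4>"),
    (re.compile(r"\b[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}\b"), "<guid>"),
]

def sanitize(text: str) -> str:
    """Replace common identifiers with placeholders before pasting
    log excerpts into Claude."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

line = "user jane.doe@contoso.com signed in from 203.0.113.7"
print(sanitize(line))  # user <email> signed in from <ipv4>
```

Regex redaction is a floor, not a ceiling: it will miss usernames, hostnames, and free-text identifiers, so a human review of the sanitized excerpt before upload is still part of the discipline.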
Try it: Test Claude's hallucination patterns
Open Claude.ai. Ask Claude to write a KQL query that identifies Entra ID sign-in risk events. Examine the output: what table name did Claude use? Check whether that exact table name exists in your Sentinel workspace schema. If it does not, you have witnessed the most common hallucination pattern firsthand. Then ask Claude: "Are you sure that table name is correct?" — observe that Claude will likely confirm the hallucinated name with confidence. This demonstrates why self-verification does not work and why the human verification step is non-negotiable.
Knowledge checks
Check your understanding
1. Claude generates an IR report that cites CVE-2026-18742 as the vulnerability exploited in the attack you are investigating. You do not recognize the CVE. What should you do?
2. Your SOC team wants to start using Claude for investigation work. Analysts will be uploading (sanitized) sign-in logs and drafting IR reports. What is the minimum appropriate plan tier?
3. Claude analyzes a set of sign-in events and concludes "this is definitely a token replay attack." The evidence shows sign-ins from two different IPs within a short time window. What should your next step be?
Key takeaways
Four hallucination patterns affect security work: invented names (tables, columns, cmdlets), outdated features (portal paths, product names), fabricated CVEs, and overconfident analysis. Know the patterns. Build the verification habits.
Data privacy varies by plan tier. Free and Pro have training and review implications. Team provides organizational governance. Enterprise provides maximum data protection. Choose the tier that matches your data sensitivity.
Self-verification does not work. Claude cannot check its own output against your environment. Only you can verify, by running the query, checking the documentation, or testing the script.
Organizational approval comes before individual use. Using Claude without approval creates shadow AI. Get approval first — present the use case, propose the appropriate tier, define data boundaries.
The verification discipline is the foundation. Output → Verify → Deploy. This is the thread that runs through every module in this course. It is what makes AI-assisted security work trustworthy.