AD5.5 Classifying Alerts: TP, FP, and BTP
Figure AD5.5 — The three classification categories. True positive: real threat, investigate and respond. False positive: legitimate activity misidentified, close and submit feedback. Benign true positive: detection is correct but the activity is authorized, close and consider a suppression rule if it recurs.
The classification decision tree
When you open an incident and read the attack story (AD5.4), follow this decision tree:
Step 1 — Is the detected activity real? Did the event actually happen? A phishing email was actually delivered. A sign-in from an unusual IP actually occurred. A DLP match actually detected credit card numbers. If the event didn't happen (rare — usually a system error or telemetry glitch), the incident is invalid and can be closed with a note.
Step 2 — Is the activity malicious? This is where most classification decisions are made. A sign-in from Germany for a user who lives in Germany and is currently traveling is legitimate (FP). A sign-in from Germany for a user who lives in the UK and is in the office today is suspicious (potential TP). The context determines the classification — and the context usually comes from checking with the user.
Step 3 — If malicious, was it blocked? A blocked attack is still a true positive — the attack was real, the classification is TP. The difference between a blocked TP and a successful TP is the response: a blocked TP needs a password reset (the credentials were compromised even though access was denied). A successful TP needs the full incident response procedure (AD1.9, AD2.10).
Step 4 — If legitimate, was it expected? A phishing simulation by the security team triggers real phishing detections. This is a BTP — the detection is correct (it IS a phishing email), but the activity is authorized. A legitimate admin using a remote management tool that triggers a lateral movement detection is also BTP.
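The four steps can be summarized as a small function. This is an illustrative sketch only, not a Defender API: every parameter and function name here is hypothetical.

```powershell
# Hypothetical sketch of the classification decision tree.
# Parameter names are illustrative, not part of any Defender cmdlet.
function Get-IncidentClassification {
    param(
        [bool]$EventReallyHappened,   # Step 1: did the event actually occur?
        [bool]$ActivityIsMalicious,   # Step 2: is the activity malicious?
        [bool]$AttackWasBlocked,      # Step 3: only relevant if malicious
        [bool]$ActivityWasExpected    # Step 4: only relevant if legitimate
    )
    if (-not $EventReallyHappened) {
        return 'Invalid: system error or telemetry glitch, close with a note'
    }
    if ($ActivityIsMalicious) {
        if ($AttackWasBlocked) {
            return 'TP (blocked): reset credentials, they were still compromised'
        }
        return 'TP (successful): run the full incident response procedure'
    }
    if ($ActivityWasExpected) {
        return 'BTP: close, consider a suppression rule if it recurs'
    }
    return 'FP: close and submit feedback'
}
```

Walking the examples below through this function is a quick way to check your reasoning against the tree.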
Practical classification examples
Example 1 — Impossible travel (FP). Alert: "Impossible travel activity" for user s.chen. Sign-in from London at 09:00, then from New York at 09:30. Investigation: check s.chen's calendar — she's traveling to New York this week. The London sign-in was from her VPN (which exits through a London server). The New York sign-in was from the hotel WiFi after she arrived. Classification: False positive — the VPN exit node created the appearance of impossible travel. Close with comment: "FP — user traveling, VPN exit in London, actual location New York. Confirmed by calendar."
Example 2 — Phishing delivered (TP, blocked). Alert: "Phishing email delivered to inbox." Investigation: the email contained a credential harvesting link. ZAP removed it after 45 minutes. URL trace shows no clicks. Classification: True positive — the phishing email was real and was delivered. The attack was blocked (ZAP removed, no clicks). Response: check if other users received the same email, block the sender domain, submit to Microsoft. Close with comment: "TP — phishing email delivered, ZAP removed after 45 min, no clicks. Sender domain blocked."
Example 3 — Bulk download (BTP). Alert: "Unusual volume of file downloads" for user m.thompson. Investigation: m.thompson downloaded 200 files from the project archive SharePoint site. Check with m.thompson's manager — the download was authorized for a project handover to a new team member. Classification: Benign true positive — the detection is correct (200 file downloads IS unusual), but the activity is authorized. Close with comment: "BTP — authorized project handover download, confirmed by manager."
Example 4 — Credential compromise (TP, successful). Alert: "Suspicious sign-in activity" for user r.williams. Sign-in at 01:15 from IP in Eastern Europe. MFA: satisfied by claim (token replay). Post-sign-in activity: inbox rules created forwarding "payment" emails externally. Investigation: r.williams was asleep at 01:15 and didn't approve any MFA prompt. Classification: True positive — confirmed AiTM credential compromise with post-compromise BEC activity. Response: execute AD1.9 immediately — revoke sessions, reset password, remove inbox rules, check for forwarded emails. This is the incident you're monitoring for.
The feedback loop
Every classification you submit feeds back to Microsoft's detection engine. When you classify an incident as FP, Microsoft can use that feedback to tune the detection rule, reducing similar false positives for your tenant and potentially for other M365 customers. When you classify as TP, Microsoft confirms the detection is working correctly and may increase confidence scores for similar patterns.
This feedback loop means your Monday review isn't just monitoring your environment — it's improving the detection system. The more accurately you classify, the fewer false positives you see over time. An incident queue full of unclassified incidents provides zero feedback — the detection engine never learns from your environment.
To submit explicit feedback on email detections, navigate to security.microsoft.com → Email & collaboration → Submissions. Submit emails that were incorrectly classified (phishing marked as clean, legitimate email marked as phishing) directly to Microsoft for review. The submission queue is checked by Microsoft's threat analysis team and the feedback is incorporated into the global detection models.
Building a classification audit trail
Your classification comments are your audit trail. Write them for your future self and for anyone who reviews your incident history (auditor, manager, replacement admin during your holiday).
A good classification comment has three parts: the classification, the evidence, and the action taken:
Bad comment: "FP - closed"

Good comment: "FP — user a.patel confirmed travel to Frankfurt. Sign-in at 14:30 from Deutsche Telekom IP matches hotel WiFi. Calendar shows Frankfurt meetings 14-16 Apr. No suspicious post-sign-in activity."
Bad comment: "TP - password reset"

Good comment: "TP — confirmed AiTM compromise of r.williams. Phishing email received Mon 23:47, token replayed Tue 01:15 from 198.51.100.22 (Eastern Europe). Inbox rules created forwarding 'payment' emails. Actions: sessions revoked 09:15, password reset 09:17, inbox rules deleted 09:20, MFA methods reviewed 09:25. 3 emails forwarded to external address before rules deleted — vendors notified."
The good comment is longer but takes only 2 minutes to write. It's invaluable 3 months later when you're writing the quarterly report, or when an auditor asks "how did you handle the incident in April?"
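If you want a consistent starting point, a reusable skeleton keeps all three parts present. This is a hypothetical helper in the comment style shown above, not a Defender feature; every name in it is illustrative.

```powershell
# Hypothetical helper that assembles a three-part classification comment.
# None of these names come from Defender; they are illustrative only.
function New-ClassificationComment {
    param(
        [string]$Classification,  # TP, FP, or BTP
        [string]$Evidence,        # what you checked and what it showed
        [string]$Action           # what you did before closing
    )
    "$Classification — $Evidence Actions: $Action"
}

# Example usage, mirroring the impossible-travel FP above:
New-ClassificationComment -Classification 'FP' `
    -Evidence 'User confirmed travel; VPN exit in London; calendar corroborates.' `
    -Action 'closed, no response required.'
```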
Tracking classification metrics
After 3 months of consistent classification, calculate your metrics:
# Classification summary for the quarterly report
# (Manual count from your incident log)
$tp = 3 # True positives
$fp = 12 # False positives
$btp = 2 # Benign true positives
$total = $tp + $fp + $btp
Write-Host "=== Q2 2026 CLASSIFICATION METRICS ==="
Write-Host "Total incidents reviewed: $total"
Write-Host "True positives: $tp ($([math]::Round($tp/$total*100))%)"
Write-Host "False positives: $fp ($([math]::Round($fp/$total*100))%)"
Write-Host "Benign true positives: $btp ($([math]::Round($btp/$total*100))%)"
Write-Host ""
Write-Host "FP rate: $([math]::Round($fp/$total*100))% — target: below 70%"

The FP rate tells you about detection quality. A high FP rate (80%+) means the detection system is generating too much noise — consider alert tuning or suppression rules (AD5.6). A very low FP rate (under 20%) means either you have very few detections (good — or is something misconfigured?) or you're misclassifying FPs as TPs (audit your classifications). A typical healthy rate for an E3 environment is 50-70% FP — most detections are false alarms, but the true positives you catch are the ones that matter.
Include the FP rate trend in your quarterly report: "Q2 classification metrics: 17 incidents reviewed, 3 TP (18%), 12 FP (70%), 2 BTP (12%). FP rate stable vs Q1 — detection tuning has reduced impossible travel FPs from 6/quarter to 2/quarter after suppression rules were added." The trend demonstrates that your classification feedback and alert tuning are improving detection quality over time.
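The quarter-over-quarter trend can be produced from the same manual counts. A minimal sketch, with the Q1 figures invented for illustration (only the Q2 numbers come from the metrics above):

```powershell
# Sketch: FP rate per quarter from manually counted incident logs.
# Q1 counts are hypothetical example values; Q2 matches the block above.
$quarters = @(
    [pscustomobject]@{ Name = 'Q1 2026'; TP = 2; FP = 18; BTP = 1 }
    [pscustomobject]@{ Name = 'Q2 2026'; TP = 3; FP = 12; BTP = 2 }
)
foreach ($q in $quarters) {
    $total  = $q.TP + $q.FP + $q.BTP
    $fpRate = [math]::Round($q.FP / $total * 100)
    Write-Host "$($q.Name): $total incidents reviewed, FP rate $fpRate%"
}
```

A falling FP rate across quarters is the evidence you cite in the quarterly report that suppression rules and feedback are working.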
An alert fires: "Anomalous token" for user a.patel. The sign-in log shows a sign-in at 14:30 from a.patel's usual office IP with MFA completed normally. However, the token properties show an unusual claim — the token was issued by a different session than expected. a.patel confirms they were at their desk at 14:30 and signed in normally. What classification?
Option A: True positive — anomalous token is always suspicious.
Option B: False positive — a.patel confirmed the sign-in, usual IP, usual time, MFA completed. The "anomalous" token property may be caused by a session refresh, a browser update, or an Outlook app reconnection that generated a token with slightly different properties. Classify as FP with comment: "FP — user confirmed sign-in, usual location and time, anomalous token likely from session refresh."
Option C: Investigate further — check for any post-sign-in suspicious activity (inbox rule changes, unusual email sends, file downloads) before classifying.
The correct answer is Option C, leading to Option B if no suspicious post-sign-in activity is found. The user confirmation is reassuring but not definitive — AiTM attacks can capture tokens during a legitimate user sign-in. Check the post-sign-in activity: if there are no inbox rule changes, no unusual emails, and no data access anomalies in the hours after the sign-in, the anomalous token was likely a benign session event. Classify as FP. If you DO find suspicious post-sign-in activity, reclassify as TP and respond.
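One quick post-sign-in check is to inspect the mailbox's inbox rules, since rule creation is the indicator in Example 4. A sketch assuming you are connected to Exchange Online PowerShell (Connect-ExchangeOnline); the mailbox address is a placeholder:

```powershell
# Sketch: surface inbox rules that forward, redirect, or delete mail,
# the common post-compromise patterns. Mailbox address is illustrative.
Get-InboxRule -Mailbox 'a.patel@contoso.com' |
    Where-Object { $_.ForwardTo -or $_.RedirectTo -or $_.DeleteMessage } |
    Format-Table Name, Enabled, ForwardTo, RedirectTo
```

An empty result supports the FP classification; any unexpected forwarding rule means you reclassify as TP and respond.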
Try it: Practice classification on real incidents
Open your Defender incident queue. Find 3 resolved incidents (filter Status: Resolved) that were previously classified. For each one:
1. Read the attack story WITHOUT looking at the existing classification
2. Apply the classification decision tree from this subsection
3. Make your own classification decision
4. Check the existing classification — does your decision match?
If your decisions consistently match the existing classifications, your classification skills are calibrated. If they differ, read the classification comments to understand the reasoning — the difference may reveal context you didn't consider (user was traveling, admin was testing, detection was a known false positive pattern).
If your queue has no resolved incidents (fresh tenant), use the examples in this subsection as practice: read each scenario, make the classification decision, and check your answer against the provided classification.
You're reading the free modules of M365 Security: From Admin to Defender
The full course continues with advanced topics, production detection rules, worked investigation scenarios, and deployable artifacts.