1.5 Operational Metrics and KPIs
Introduction
Metrics answer the question every CISO asks: “Is our SOC getting better?” Without metrics, the answer is a feeling. With metrics, the answer is a trend line with evidence.
SOC metrics serve two purposes. Internally, they identify what needs improvement — which detection rules are too noisy, which incident types take too long to contain, where capacity is consumed. Externally, they demonstrate the SOC’s value to the business — how many threats were detected, how quickly they were contained, and what the trend looks like over time.
This subsection defines the core metrics, explains how to measure each from Sentinel data, and establishes targets.
The six core SOC metrics
1. Mean Time to Detect (MTTD)
How long between an adversary’s first action and the SOC’s first alert.
How to measure: From the technical timeline (Module S8, subsection 8.3), calculate: first detection alert timestamp minus first adversary action timestamp. Average across all incidents per month.
Target: Under 15 minutes for high-severity detections. Under 60 minutes for medium-severity.
What it tells you: MTTD reflects detection rule quality and coverage. A high MTTD means either the detection rules are not catching the initial attack stages or the log ingestion pipeline has excessive latency.
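The MTTD calculation described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the two timestamps have already been pulled from each incident's technical timeline; the field names `first_adversary_action` and `first_alert` and the sample incidents are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean

def mean_time_to_detect(incidents):
    """Return the mean gap between first adversary action and first alert."""
    gaps = [
        (inc["first_alert"] - inc["first_adversary_action"]).total_seconds()
        for inc in incidents
    ]
    return timedelta(seconds=mean(gaps))

# Hypothetical incidents for one month; field names are illustrative only.
incidents = [
    {"first_adversary_action": datetime(2024, 5, 2, 9, 0),
     "first_alert": datetime(2024, 5, 2, 9, 10)},
    {"first_adversary_action": datetime(2024, 5, 9, 14, 0),
     "first_alert": datetime(2024, 5, 9, 14, 30)},
    {"first_adversary_action": datetime(2024, 5, 20, 22, 15),
     "first_alert": datetime(2024, 5, 20, 22, 35)},
]

mttd = mean_time_to_detect(incidents)
print(mttd)  # 0:20:00
```

Swapping `mean` for `statistics.median` gives the median variant, which is less sensitive to a single slow detection in a small monthly sample.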
2. Mean Time to Respond (MTTR)
How long between the first detection alert and the completion of containment.
How to measure: From the incident record (Module S8), calculate: containment completion timestamp minus detection alert timestamp. Average across all incidents per month.
Target: Under 30 minutes for Critical, under 2 hours for High, under 8 hours for Medium.
What it tells you: MTTR reflects response process efficiency. A high MTTR means the triage process is too slow, the escalation path has bottlenecks, or containment decisions are delayed by unclear authority.
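Because the MTTR targets differ by severity, it can help to compute MTTR per severity level and compare each against its own target. The sketch below assumes each incident record already carries a severity label and a response time (containment completion minus first detection alert); the record layout and sample values are hypothetical, while the targets are the ones stated above.

```python
from collections import defaultdict
from datetime import timedelta
from statistics import mean

# Targets taken from the text above.
MTTR_TARGETS = {
    "Critical": timedelta(minutes=30),
    "High": timedelta(hours=2),
    "Medium": timedelta(hours=8),
}

def mttr_by_severity(incidents):
    """Group response times by severity and return the mean for each group."""
    buckets = defaultdict(list)
    for inc in incidents:
        buckets[inc["severity"]].append(inc["response_time"].total_seconds())
    return {sev: timedelta(seconds=mean(vals)) for sev, vals in buckets.items()}

# Hypothetical records: response_time = containment complete - first alert.
incidents = [
    {"severity": "Critical", "response_time": timedelta(minutes=25)},
    {"severity": "Critical", "response_time": timedelta(minutes=45)},
    {"severity": "High", "response_time": timedelta(minutes=90)},
]

results = mttr_by_severity(incidents)
for severity, mttr in results.items():
    status = "within target" if mttr <= MTTR_TARGETS[severity] else "over target"
    print(f"{severity}: {mttr} ({status})")
# Critical: 0:35:00 (over target)
# High: 1:30:00 (within target)
```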
3. Dwell Time
How long the adversary had access from initial compromise to complete containment.
How to measure: Dwell time = MTTD + MTTR. It can also be calculated directly from the technical timeline: last containment action timestamp minus first adversary access timestamp.
Target: Under 60 minutes for BEC and credential compromise. Under 4 hours for complex incidents.
What it tells you: Dwell time is the adversary’s window of opportunity. Every reduction in dwell time directly reduces potential damage — fewer emails read, fewer accounts compromised, less data exfiltrated.
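For a single incident, the relationship between the three durations can be checked directly from three timestamps. The following is a small sketch with hypothetical timestamps; the function and parameter names are illustrative.

```python
from datetime import datetime

def dwell_time(first_access, first_alert, containment_done):
    """Return (time to detect, time to respond, dwell time) for one incident."""
    detect = first_alert - first_access
    respond = containment_done - first_alert
    dwell = containment_done - first_access
    return detect, respond, dwell

# Hypothetical incident: access at 09:00, alert at 09:22, contained at 10:07.
detect, respond, dwell = dwell_time(
    datetime(2024, 5, 2, 9, 0),
    datetime(2024, 5, 2, 9, 22),
    datetime(2024, 5, 2, 10, 7),
)

# For one incident, detect + respond always equals dwell.
assert detect + respond == dwell
print(dwell)  # 1:07:00
```

The sum holds exactly per incident and for means taken over the same set of incidents. Medians computed separately for each metric will not always add up exactly, so when reporting medians it is safer to compute dwell time from the timeline.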
4. Alert Volume and Signal-to-Noise Ratio
How many alerts are generated and what proportion are actionable.
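How to measure: Count the alerts generated in the period (alert volume), then divide the number classified as actionable by that total to get the SNR.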
Target: SNR above 30%. Below 30%, more than 70% of the alerts the team triages are noise.
What it tells you: Alert volume shows workload. SNR shows whether that workload is productive. A high-volume, low-SNR environment creates alert fatigue, a common cause of missed detections.
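As a rough illustration, SNR and the noisiest rules can be computed from a list of closed alerts once they have been exported from the workspace. This sketch assumes each alert carries a rule name and a closing classification; the labels are modeled loosely on Sentinel-style incident classifications, and the rule names, data, and the choice to treat only `TruePositive` as actionable are all assumptions.

```python
from collections import Counter

# Classifications counted as actionable; an assumption for this sketch.
ACTIONABLE = {"TruePositive"}

def signal_to_noise(alerts):
    """Return the fraction of alerts whose classification is actionable."""
    if not alerts:
        return 0.0
    actionable = sum(1 for a in alerts if a["classification"] in ACTIONABLE)
    return actionable / len(alerts)

def noisiest_rules(alerts, top_n=3):
    """Return the top_n rules ranked by false positive count."""
    counts = Counter(
        a["rule"] for a in alerts if a["classification"] == "FalsePositive"
    )
    return counts.most_common(top_n)

# Hypothetical export of ten closed alerts.
alerts = (
    [{"rule": "Impossible travel", "classification": "TruePositive"}] * 3
    + [{"rule": "Impossible travel", "classification": "FalsePositive"}] * 4
    + [{"rule": "Mass file download", "classification": "FalsePositive"}] * 2
    + [{"rule": "New inbox rule", "classification": "BenignPositive"}] * 1
)

snr = signal_to_noise(alerts)
print(f"SNR: {snr:.0%}")       # SNR: 30%
print(noisiest_rules(alerts))  # [('Impossible travel', 4), ('Mass file download', 2)]
```

Whether benign positives count as signal or noise is a local policy decision; adjusting `ACTIONABLE` changes the ratio accordingly.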
5. SLA Compliance
What percentage of incidents meet their severity-appropriate triage and containment SLAs.
How to measure: From incident records, compare actual triage and containment times against the SLAs defined in subsection 7.1.
Target: Above 90% for triage SLAs. Above 85% for containment SLAs.
What it tells you: SLA compliance reflects staffing adequacy and process efficiency. Persistent SLA misses indicate either the SLAs are unrealistic (wrong targets) or the team is under-resourced (right targets, insufficient capacity).
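SLA compliance is a simple pass rate: the share of incidents whose actual time was at or under the SLA for their severity. The sketch below uses a hypothetical SLA table and hypothetical incident records; the real thresholds come from the SLA definitions the course references, not from this example.

```python
from datetime import timedelta

# Hypothetical SLA thresholds per severity; replace with your defined SLAs.
SLAS = {
    "Critical": {"triage": timedelta(minutes=15), "contain": timedelta(minutes=30)},
    "High": {"triage": timedelta(minutes=60), "contain": timedelta(minutes=120)},
}

def sla_compliance(incidents, phase):
    """Return the fraction of incidents that met the SLA for the given phase."""
    met = sum(
        1 for inc in incidents if inc[phase] <= SLAS[inc["severity"]][phase]
    )
    return met / len(incidents)

# Hypothetical incident records with actual triage and containment times.
incidents = [
    {"severity": "Critical", "triage": timedelta(minutes=10), "contain": timedelta(minutes=25)},
    {"severity": "Critical", "triage": timedelta(minutes=20), "contain": timedelta(minutes=40)},
    {"severity": "High", "triage": timedelta(minutes=30), "contain": timedelta(minutes=100)},
    {"severity": "High", "triage": timedelta(minutes=45), "contain": timedelta(minutes=150)},
]

print(f"Triage SLA compliance: {sla_compliance(incidents, 'triage'):.0%}")    # 75%
print(f"Containment SLA compliance: {sla_compliance(incidents, 'contain'):.0%}")  # 50%
```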
6. Detection Coverage
What percentage of in-scope MITRE ATT&CK techniques have active, tested detection rules.
How to measure: From the coverage map (Module S2, subsection 2.3): techniques with active detections divided by total techniques in scope.
Target: Context-dependent. Focus on coverage of high-priority techniques rather than a blanket percentage.
What it tells you: Coverage shows the breadth of your detection program. Gaps in coverage represent techniques an adversary can use without detection.
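The coverage ratio and the list of gaps both fall out of basic set operations. This sketch assumes the coverage map has been reduced to two sets of ATT&CK technique IDs; the particular in-scope and covered sets shown are hypothetical.

```python
def detection_coverage(in_scope, covered):
    """Return (coverage ratio, sorted list of uncovered in-scope techniques)."""
    in_scope = set(in_scope)
    # Detections for out-of-scope techniques do not count toward coverage.
    covered_in_scope = set(covered) & in_scope
    gaps = sorted(in_scope - covered_in_scope)
    return len(covered_in_scope) / len(in_scope), gaps

# Hypothetical scope and detections, identified by ATT&CK technique ID.
in_scope = {"T1566", "T1078", "T1110", "T1114", "T1486"}
covered = {"T1566", "T1078", "T1110", "T1059"}  # T1059 is out of scope here

ratio, gaps = detection_coverage(in_scope, covered)
print(f"{ratio:.0%}", gaps)  # 60% ['T1114', 'T1486']
```

The gap list is usually more useful than the percentage, since it names the specific techniques to prioritize.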
The monthly SOC dashboard
Consolidate these metrics into a monthly dashboard for SOC leadership and the CISO:
| Metric | This month | Last month | Trend | Target |
|---|---|---|---|---|
| MTTD (median) | 22 min | 35 min | ↓ Improving | <15 min |
| MTTR (median) | 45 min | 52 min | ↓ Improving | <30 min (Crit) |
| Dwell time (median) | 67 min | 87 min | ↓ Improving | <60 min |
| Alert volume | 412 | 380 | ↑ | — |
| SNR | 34% | 28% | ↑ Improving | >30% |
| SLA compliance (triage) | 91% | 88% | ↑ | >90% |
| SLA compliance (contain) | 87% | 82% | ↑ | >85% |
| Detection coverage | 32/42 (76%) | 28/42 (67%) | ↑ | Prioritized |
| PIR actions completed | 8/10 (80%) | 5/12 (42%) | ↑ | >80% |
One page. Trend arrows. Red/amber/green against targets. The CISO can read this in 2 minutes and know whether the SOC is improving.
Metrics drive behavior — choose carefully
If you measure alert closure speed, analysts will close alerts faster — including by closing them without adequate investigation. If you measure incidents per analyst, analysts will avoid opening incidents. Metrics must be paired: measure closure speed AND reopen rate (incidents that were closed prematurely and had to be reopened). Measure incidents per analyst AND SLA compliance. The paired metric prevents gaming the primary metric.
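Reporting the paired metric next to the primary one makes gaming visible. The sketch below computes median closure time together with reopen rate for two hypothetical months; the field names and figures are illustrative only.

```python
from statistics import median

def paired_closure_metrics(incidents):
    """Return (median minutes to close, reopen rate) for a set of incidents."""
    closure = median([inc["minutes_to_close"] for inc in incidents])
    reopen_rate = sum(1 for inc in incidents if inc["reopened"]) / len(incidents)
    return closure, reopen_rate

# Hypothetical data: closure got much faster, but half the incidents reopened.
last_month = [
    {"minutes_to_close": 40, "reopened": False},
    {"minutes_to_close": 50, "reopened": False},
    {"minutes_to_close": 60, "reopened": False},
    {"minutes_to_close": 70, "reopened": False},
]
this_month = [
    {"minutes_to_close": 10, "reopened": True},
    {"minutes_to_close": 15, "reopened": False},
    {"minutes_to_close": 20, "reopened": True},
    {"minutes_to_close": 25, "reopened": False},
]

for label, data in (("Last month", last_month), ("This month", this_month)):
    closure, reopen_rate = paired_closure_metrics(data)
    print(f"{label}: median close {closure} min, reopen rate {reopen_rate:.0%}")
# Last month: median close 55.0 min, reopen rate 0%
# This month: median close 17.5 min, reopen rate 50%
```

Read alone, the closure time looks like a large improvement; read with the reopen rate, it suggests incidents are being closed before investigation is complete.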
Try it yourself
Measure the signal-to-noise ratio for your Sentinel workspace as described above. What is your current SNR? If it is below 30%, identify the top 3 detection rules by false positive count. These are your highest-priority tuning targets.
Check your understanding
1. Your monthly dashboard shows MTTD improved from 35 minutes to 22 minutes, but MTTR worsened from 45 minutes to 65 minutes. What is the most likely explanation?