OD1.5 Passive Reconnaissance — What's Visible Before the Attack

6-8 hours · Module 1 · Free

What you already know

You've heard of OSINT. You may have run Shodan queries or checked your organisation's exposure through Have I Been Pwned. This sub goes further — it walks the full passive reconnaissance process from the attacker's perspective, with specific examples of what each source reveals, how the attacker uses it to shape their operational plan, and how you can use an LLM to accelerate your own reconnaissance self-assessment.

Operational Objective

Before an attacker sends the first phish or scans the first port, they've spent hours reading your publicly available data. They know your email format, your security stack, your org chart, and potentially your employees' reused passwords — all from sources your SIEM will never see.

The attacker builds their operational plan from this data. If you don't know what's visible about your organisation, you can't assess what attack methods are viable against you. This sub walks you through the five categories of publicly available information that attackers collect, explains what each enables operationally, and gives you a methodology for running the same assessment against yourself — including AI-accelerated interpretation.

Learning Objectives

By the end of this sub you will be able to:

Enumerate the five categories of passive reconnaissance data (infrastructure, people, technology, credentials, documents) and explain what each reveals to the attacker and how it shapes the operational plan. Reconnaissance of this kind is widely assumed to have preceded the 2020 SolarWinds compromise: Orion's customer base and much of its architecture were publicly documented long before the attackers wrote any code. This matters because understanding what's publicly visible about your organisation tells you which attack methods are viable against you before the attacker demonstrates them.

Run a passive reconnaissance self-assessment against your own organisation using DNS records, certificate transparency, LinkedIn, job postings, and breach databases, then use an LLM to interpret the findings at attacker-relevant speed. This matters because the gap between what you think is exposed and what's actually exposed is consistently wider than security teams expect — and every finding is something the attacker already knows.

Prioritise credential exposure as the single highest-impact passive reconnaissance finding and explain why breached passwords and infostealer session tokens provide direct access without any detectable technique. This matters because credential exposure bypasses your entire detection stack: the attacker authenticates as a legitimate user because they have a legitimate password.

THE PASSIVE RECONNAISSANCE LANDSCAPE

INFRASTRUCTURE: DNS records (MX, SPF, TXT), certificate transparency, Shodan / Censys → architecture, providers

PEOPLE: LinkedIn + org chart, email format, conference talks → targets, roles, trust

TECHNOLOGY: job postings, GitHub repos, HTTP headers → stack, defences, gaps

CREDENTIALS: breach databases, infostealer logs, paste sites → passwords, tokens

DOCUMENTS: PDF/DOCX metadata, cached pages, public filings → internal details

All of this before a single packet touches your infrastructure. Your SIEM sees none of it.

Figure OD1.5 — Five categories of publicly available information that attackers collect before the operation begins. Every piece shapes the operational plan. Every piece is also available to you — if you look first.

Your attack surface is public whether you want it to be or not

The information attack surface is everything an attacker can learn without touching your infrastructure — and it's almost always larger than you think.

Passive reconnaissance is invisible to your SIEM. There's no alert for someone reading your LinkedIn page. No log entry for someone querying certificate transparency. No firewall event for someone browsing your job postings. The attacker builds a detailed operational picture in complete silence, and the first time you see them is when the phishing email arrives. By that point, they already know your email format, your security stack, the name of your CFO's executive assistant, and potentially the password your IT director used on a breached third-party service.

You can't prevent the attacker from reading public data. What you can do is know what's exposed, reduce what doesn't need to be, and assume the attacker has already read the rest.

Infrastructure reconnaissance — your DNS tells a story

DNS records are public by design and reveal far more about your infrastructure than most organisations realise.

MX records reveal your email provider immediately. company.com MX → company-com.mail.protection.outlook.com tells the attacker you use Microsoft 365. That single record shapes the phishing strategy: the attacker builds an adversary-in-the-middle (AiTM) proxy targeting the Microsoft login page. If the MX points to a third-party gateway (Proofpoint, Mimecast, Barracuda), the attacker knows which gateway to test against before sending.
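The MX-to-provider inference can be sketched as a simple suffix lookup. This is a minimal illustration, not a complete fingerprint database: the suffix table is an assumption built from commonly seen mail hostnames, and infer_mail_provider is a hypothetical helper name. It expects one line of `dig MX yourdomain.com +short` output at a time.

```python
# Illustrative suffix table (an assumption, not an exhaustive list).
MX_SUFFIXES = {
    "mail.protection.outlook.com": "Microsoft 365",
    "google.com": "Google Workspace",
    "googlemail.com": "Google Workspace",
    "pphosted.com": "Proofpoint",
    "mimecast.com": "Mimecast",
    "barracudanetworks.com": "Barracuda",
}


def infer_mail_provider(mx_line):
    """Map one '<priority> <hostname>.' line from dig to a provider name."""
    parts = mx_line.split()
    if not parts:
        return "unknown"
    # The hostname is the last field; drop the trailing root dot.
    host = parts[-1].rstrip(".").lower()
    for suffix, provider in MX_SUFFIXES.items():
        if host == suffix or host.endswith("." + suffix):
            return provider
    return "unknown"


if __name__ == "__main__":
    for line in ["0 company-com.mail.protection.outlook.com.", "10 mx1.example.net."]:
        print(line, "->", infer_mail_provider(line))
```

Anything that comes back as "unknown" is worth a manual look: it may be a self-hosted mail server or a gateway the table doesn't cover.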

SPF records reveal every third-party service that sends email on your behalf. A typical SPF record might include spf.protection.outlook.com, _spf.google.com, mail.zendesk.com, and servers.mcsv.net. The attacker now knows you use M365, Google, Zendesk, and Mailchimp. Each is a potential phishing pretext: a spoofed Zendesk ticket, a fake Mailchimp campaign update.
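To see how little effort this takes, here is a minimal sketch that pulls the include: mechanisms out of an SPF record and labels the ones named above. The service mapping is an illustrative assumption limited to the four senders in this section, and spf_includes / spf_services are hypothetical helper names. It does not follow nested includes or redirects.

```python
from typing import List, Tuple

# Illustrative mapping limited to the senders named in this section (assumption).
SPF_SERVICES = {
    "spf.protection.outlook.com": "Microsoft 365",
    "_spf.google.com": "Google Workspace",
    "mail.zendesk.com": "Zendesk",
    "servers.mcsv.net": "Mailchimp",
}


def spf_includes(txt_record: str) -> List[str]:
    """Return the domains named by include: mechanisms in one SPF TXT record."""
    # dig prints long TXT records as several quoted strings; join them first.
    record = txt_record.replace('" "', "").strip().strip('"')
    tokens = record.split()
    if not tokens or tokens[0].lower() != "v=spf1":
        return []
    includes = []
    for token in tokens[1:]:
        mechanism = token.lstrip("+-~?")  # drop the optional qualifier
        if mechanism.lower().startswith("include:"):
            includes.append(mechanism[len("include:"):].lower())
    return includes


def spf_services(txt_record: str) -> List[Tuple[str, str]]:
    """Pair each included domain with a service label, if one is known."""
    return [(d, SPF_SERVICES.get(d, "unrecognised sender")) for d in spf_includes(txt_record)]
```

Every "unrecognised sender" is a question for you to answer: is that service still in use, and does it still need to send mail as your domain?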

Certificate transparency logs are the hidden goldmine. Query crt.sh for %.yourdomain.com and you see every publicly trusted certificate logged for your domain, including certificates for hostnames you meant to keep internal: vpn.company.com, jenkins.internal.company.com, staging-api.company.com. Each reveals a service. jenkins means CI/CD. staging-api means a staging environment, probably with weaker controls than production.

Defensive translation: run the same queries against yourself. Query crt.sh for your domains. Audit your DNS TXT records. Read your SPF and ask whether every included sender is still in use. Identify certificates for internal hostnames that shouldn't be public.
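Once you have a list of hostnames from certificate transparency, triage can be sketched as a keyword match on hostname labels. This is a minimal example under stated assumptions: the keyword list is taken from the examples in this section, flag_hostnames is a hypothetical helper, and the input is simply a list of strings however you exported them (a single entry may hold several newline-separated names).

```python
import re

# Keywords drawn from this section's examples (an assumed, non-exhaustive list).
RISKY_LABELS = ("vpn", "jenkins", "staging", "dev", "internal", "admin")


def flag_hostnames(names, domain):
    """Return {hostname: [matched keywords]} for hostnames under `domain`."""
    domain = domain.lower()
    flagged = {}
    for raw in names:
        for name in raw.splitlines():
            # Normalise case and drop a leading wildcard such as "*.".
            host = name.strip().lower().lstrip("*.")
            if host != domain and not host.endswith("." + domain):
                continue  # skip lookalikes such as notcompany.com
            # Compare whole labels (split on dots and hyphens), not substrings.
            labels = re.split(r"[.\-]", host[: -len(domain)])
            hits = [k for k in RISKY_LABELS if k in labels]
            if hits:
                flagged[host] = hits
    return dict(sorted(flagged.items()))


if __name__ == "__main__":
    sample = ["vpn.company.com", "jenkins.internal.company.com", "www.company.com"]
    for host, hits in flag_hostnames(sample, "company.com").items():
        print(host, hits)
```

Matching whole labels avoids flagging developer.company.com for "dev", at the cost of missing names like devportal; adjust the list to your own naming conventions.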

People reconnaissance — LinkedIn is the attacker's org chart

Job titles map to access privileges. The reporting structure reveals trust relationships. Job postings reveal your security stack.

"Senior Finance Manager" has access to payment systems. "Executive Assistant to the CEO" has delegated access to the CEO's mailbox and calendar. The attacker doesn't need to compromise the CEO — they compromise the assistant who has the same data access with less security training.

Job postings are a technical specification of your environment. A posting for "SOC Analyst — Experience with Microsoft Sentinel, CrowdStrike Falcon, and Splunk required" tells the attacker your exact detection stack. They load a CrowdStrike Falcon trial, test their payload against it, and iterate until it evades — before they've touched your environment.

Email format discovery is trivially easy. One employee's email address appearing anywhere public — a conference bio, a GitHub commit, a breach database — reveals the format for the entire organisation. first.last@company.com means the attacker can generate a valid address for any person whose name they know from LinkedIn.
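A short sketch shows why one leaked address is enough. Given a single known name and address pair, the code below infers the format and then produces the address for any other name. The pattern table is an assumption covering a few common conventions, and infer_format / generate_address are hypothetical helpers; names are assumed to contain at least one word.

```python
# A few common local-part conventions (an assumed, non-exhaustive table).
PATTERNS = {
    "first.last": lambda f, l: f + "." + l,
    "firstlast": lambda f, l: f + l,
    "flast": lambda f, l: f[0] + l,
    "first_last": lambda f, l: f + "_" + l,
    "lastf": lambda f, l: l + f[0],
}


def _split_name(full_name):
    """Use the first and last words of the name, lowercased."""
    parts = full_name.lower().split()
    return parts[0], parts[-1]


def infer_format(full_name, email):
    """Return the pattern name that reproduces `email`, or None."""
    first, last = _split_name(full_name)
    local_part = email.split("@")[0].lower()
    for name, build in PATTERNS.items():
        if build(first, last) == local_part:
            return name
    return None


def generate_address(full_name, fmt, domain):
    """Build the address a person would have under the given format."""
    first, last = _split_name(full_name)
    return PATTERNS[fmt](first, last) + "@" + domain


if __name__ == "__main__":
    fmt = infer_format("Jane Doe", "jane.doe@company.com")
    print(fmt, generate_address("John Smith", fmt, "company.com"))
```

Run it against your own staff list and one address you know is public, and you have the same recipient list the attacker would build.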

Defensive translation: audit your job postings — do they name specific security tools? Consider whether "experience with enterprise SIEM and EDR platforms" communicates the same requirement without naming products. Review LinkedIn exposure for IT and security staff.
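The job posting audit can be partly automated. The sketch below scans posting text for product names; the product list is an assumption seeded with the tools mentioned in this sub, and named_tools is a hypothetical helper. Extend the list with whatever your organisation actually runs.

```python
import re

# Seed list taken from the products named in this sub (assumption; extend it).
NAMED_TOOLS = (
    "Microsoft Sentinel",
    "CrowdStrike Falcon",
    "Splunk",
    "Proofpoint",
    "Mimecast",
)


def named_tools(posting_text):
    """Return the products from NAMED_TOOLS that appear in the posting."""
    found = []
    for tool in NAMED_TOOLS:
        # Whole-word, case-insensitive match on the product name.
        if re.search(r"\b" + re.escape(tool) + r"\b", posting_text, re.IGNORECASE):
            found.append(tool)
    return found


if __name__ == "__main__":
    posting = "SOC Analyst: experience with Microsoft Sentinel, CrowdStrike Falcon, and Splunk required"
    print(named_tools(posting))
```

An empty result for the generic wording ("enterprise SIEM and EDR platforms") is the outcome you're aiming for.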

Credential reconnaissance — the single highest-impact finding

If passive reconnaissance has one finding that changes the operation more than any other, it's valid credentials.

Breached passwords provide direct access without exploitation, without phishing, without any technique your detection rules were built to catch. The attacker authenticates as a legitimate user because they are using a legitimate password.

Breach databases are the primary source. When a third-party service is breached, the credentials of every user who registered with a work email address become available. The attacker queries for *@company.com, receives a list of employees with breached credentials, and attempts credential stuffing against M365. For any employee who reused their password, the attacker's login succeeds on the first attempt, with no failed-login alert for that account.

Infostealer malware has created a secondary market that's arguably more dangerous. Infostealers often run on employees' personal devices and capture everything stored in the browser: passwords, cookies, session tokens, autofill data. An infostealer log might contain the employee's M365 password, a valid session cookie (which bypasses MFA), VPN credentials, and a password manager master password. One infected personal device produces enough material to compromise multiple organisational services.

The infection happened on a device you don't manage, don't monitor, and can't detect. The first sign is when those credentials are used against your environment — and if the attacker replays a valid session cookie, there may be no sign at all.

Defensive translation: monitor breach databases for your domain. When breached credentials are found, force password resets and revoke active sessions. More importantly, enforce phishing-resistant MFA (FIDO2/passkeys) so that a valid password alone is insufficient. Implement continuous access evaluation so that stolen tokens can be revoked in near real time rather than remaining valid until they expire.
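When a breach check returns more accounts than you can handle at once, a rough ordering helps. The sketch below is one possible triage, not a standard scoring model: the record layout (email, exposed, date) and the weights are assumptions chosen to reflect the argument above, that session tokens outrank passwords and recent exposure outranks old exposure.

```python
from datetime import date


def triage(records, today, recent_days=365):
    """Order exposure records so the most urgent accounts come first.

    Each record is a dict with hypothetical keys:
      "email":   the affected address
      "exposed": a set of data classes, e.g. {"password", "session_token"}
      "date":    a datetime.date for when the exposure occurred
    """

    def score(record):
        points = 0
        if "session_token" in record["exposed"]:
            points += 4  # a replayed token can bypass MFA
        if "password" in record["exposed"]:
            points += 2  # direct login if the password was reused
        if (today - record["date"]).days <= recent_days:
            points += 1  # recent exposure is more likely still valid
        return points

    # sorted() is stable, so ties keep their original order.
    return sorted(records, key=score, reverse=True)


if __name__ == "__main__":
    sample = [
        {"email": "a@company.com", "exposed": {"email"}, "date": date(2024, 1, 1)},
        {"email": "c@company.com", "exposed": {"password", "session_token"}, "date": date(2024, 5, 1)},
    ]
    for record in triage(sample, today=date(2024, 6, 1)):
        print(record["email"])
```

Work the list from the top: revoke sessions and reset passwords for the token exposures first, then the password-only exposures.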

Using AI to accelerate reconnaissance self-assessment

An LLM compresses the interpretation phase from hours to minutes.

Running a passive reconnaissance assessment produces a lot of raw data — DNS records, certificate logs, LinkedIn profiles, job postings, breach results, HTTP headers. Interpreting that data manually takes hours.

Feed your DNS records, SPF record, and certificate transparency results into Claude and ask: "Based on this data, what can an attacker infer about our infrastructure, email providers, SaaS services, and internal applications? What are the highest-risk findings?" The LLM identifies the operationally significant details faster than manual review.

# STEP 1 — Infrastructure reconnaissance (10 minutes)

# Query MX records — reveals email provider
dig MX yourdomain.com +short

# Query TXT records — reveals SaaS integrations
dig TXT yourdomain.com +short

# Query SPF — reveals authorized email senders
dig TXT yourdomain.com +short | grep "v=spf1"

# Search certificate transparency — reveals internal hostnames
# Visit: https://crt.sh/?q=%25.yourdomain.com
# (%25 is a URL-encoded %, so this searches for %.yourdomain.com)
# Note any hostnames that reveal internal services
# (vpn, jenkins, staging, dev, internal, admin, etc.)

# STEP 2 — Credential exposure check (5 minutes)

# Check Have I Been Pwned for your domain:
# Visit: https://haveibeenpwned.com/DomainSearch
# Note: number of breached accounts, which breaches,
# how recently the credentials were exposed.

# STEP 3 — Technology stack from job postings (10 minutes)

# Find 3 current job postings from your organisation on
# LinkedIn, Indeed, or your careers page.
# Copy the "requirements" or "skills" section from each.

# STEP 4 — AI-accelerated interpretation (15 minutes)

# Paste your DNS records, CT results, and job postings
# into Claude with this prompt:
#
# "I'm conducting a passive reconnaissance self-assessment.
#  Here are the findings:
#  [paste DNS records]
#  [paste CT log results]
#  [paste job posting requirements]
#
#  Based on this data:
#  1. What infrastructure and services can an attacker identify?
#  2. Which findings are highest risk — most directly enabling
#     a specific attack method?
#  3. What should I remediate first?
#  4. What additional recon would an attacker likely conduct?"

# STEP 5 — Document findings
# For each finding, note:
# - What it reveals to the attacker
# - What attack method it enables
# - Whether the exposure is reducible
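If you repeat this assessment regularly, Step 4 can be assembled programmatically. The sketch below fills the prompt from this section with your findings; build_prompt and its three parameters are hypothetical names, and the section labels inside the template are an assumption about how you might lay out the pasted data. It only builds the text and does not call any LLM API.

```python
# Prompt wording follows Step 4; the section labels are an assumed layout.
PROMPT_TEMPLATE = """I'm conducting a passive reconnaissance self-assessment.
Here are the findings:

DNS records:
{dns}

Certificate transparency results:
{ct}

Job posting requirements:
{jobs}

Based on this data:
1. What infrastructure and services can an attacker identify?
2. Which findings are highest risk, most directly enabling a specific attack method?
3. What should I remediate first?
4. What additional recon would an attacker likely conduct?"""


def build_prompt(dns_lines, ct_hostnames, job_requirements):
    """Combine the raw findings from Steps 1 and 3 into one prompt string."""
    return PROMPT_TEMPLATE.format(
        dns="\n".join(dns_lines),
        ct="\n".join(ct_hostnames),
        jobs="\n\n".join(job_requirements),
    )


if __name__ == "__main__":
    print(build_prompt(
        ["0 company-com.mail.protection.outlook.com."],
        ["vpn.company.com", "staging-api.company.com"],
        ["Experience with enterprise SIEM and EDR platforms"],
    ))
```

Keeping the prompt in one place means each quarterly run asks the same questions, which makes the answers comparable over time.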

Hands-on Exercise — Passive Reconnaissance Self-Assessment

Objective: Run a passive reconnaissance assessment against your own organisation and use an LLM to interpret the findings.

Prerequisites: Access to DNS lookup tools (dig or web-based equivalents), a browser, and access to Claude or another LLM for interpretation. No lab VMs required.

Success criteria: You've identified at least five pieces of publicly visible information that are operationally useful to an attacker, with a concrete assessment of what each enables.

Challenge: Check whether your organisation's email format is discoverable from public sources. Search for any employee's email address in Google ("@yourdomain.com" site:linkedin.com OR site:github.com). If you find the format, an attacker can generate valid addresses for your entire company from LinkedIn names. Is there anything you can do about it?

Next: OD1.6 Active Reconnaissance: Probing Without Being Caught. Where passive reconnaissance reads public data, active reconnaissance touches your infrastructure. The attacker's challenge is to gather useful data without triggering your alerts, and most active reconnaissance succeeds because your thresholds are designed for a different problem.

Operational Artifact — Passive Reconnaissance Reference

Five categories and what they enable:

Infrastructure (DNS, CT logs, Shodan). Reveals email provider, SaaS footprint, internal hostnames, exposed services. Enables: targeted phishing, service exploitation, infrastructure mapping.

People (LinkedIn, org charts, email format). Reveals targets, roles, trust relationships, delegation patterns. Enables: targeted social engineering, phishing recipient selection, pretext construction.

Technology (job postings, GitHub, HTTP headers). Reveals SIEM, EDR, email gateway, identity stack, CI/CD pipeline. Enables: pre-tested evasion, targeted tool selection.

Credentials (breach DBs, infostealer logs). Reveals reused passwords and session tokens. Enables: direct access without exploitation — highest impact, lowest attacker effort. Check weekly.

Documents (metadata, cached pages, filings). Reveals internal details, author names, OS versions, internal project names. Enables: targeted content, metadata-based attacks.

Priority action. Check breach databases and infostealer feeds weekly. Enforce phishing-resistant MFA (FIDO2/passkeys). Audit job postings for named security tools. Query crt.sh for internal hostname exposure.

Checkpoint — before moving on

You should be able to do the following without referring back to this sub. If you can't, the sections to re-read are noted.

1. Explain why passive reconnaissance is invisible to your SIEM and enumerate the five categories of publicly available data an attacker collects. (§ Your attack surface is public)

2. Run a basic passive reconnaissance self-assessment against your own organisation (DNS, CT logs, breach check, job postings) and use an LLM to interpret the findings. (§ Hands-on Exercise)

3. Explain why credential exposure is the single highest-priority finding and which defences (phishing-resistant MFA plus breach monitoring) address it. (§ Credential reconnaissance)

You're reading the free modules of offensive-security-for-defenders
