
$DATA Attribute — Resident and Non-Resident

14 hours · Module 1 · Free
Operational Objective
The $DATA attribute is where the file's actual content lives — the document text, the executable code, the image pixels, the spreadsheet data. For the forensic examiner, the $DATA attribute presents two fundamentally different analysis scenarios depending on the file's size. Small files (roughly under 700 bytes) are stored resident — the entire file content is embedded directly within the MFT record, alongside the header and other attributes. Larger files are stored non-resident — the MFT record contains only a data run list that points to clusters on disk where the content is stored. This distinction has profound forensic implications. Resident file data survives file deletion as long as the MFT entry is not reallocated — the content is physically part of the MFT record. Non-resident file data survives deletion only if the clusters it occupied have not been overwritten by new data. Additionally, NTFS supports Alternate Data Streams (ADS) — named $DATA attributes that allow files to carry hidden data payloads invisible to standard directory listings. Malware has used ADS for data hiding since the earliest days of NTFS. This subsection teaches the $DATA attribute structure for both resident and non-resident forms, the data run encoding that maps logical file offsets to physical disk clusters, and ADS detection and analysis.
Deliverable: Ability to identify whether a file's $DATA attribute is resident or non-resident from raw hex, extract resident file content directly from the MFT record, decode data run lists to map file content to physical disk clusters, identify and examine Alternate Data Streams, and assess the recoverability of deleted file data based on $DATA attribute type and cluster allocation status.
Estimated completion: 45 minutes
$DATA ATTRIBUTE: RESIDENT vs NON-RESIDENT

RESIDENT $DATA (file < ~700 bytes)
  • Non-resident flag: 0x00
  • Attribute header (24 bytes), followed by the actual file content, embedded in the MFT record itself
  • Forensic advantage: content survives file deletion (it is in the MFT record) and is recoverable even if clusters are overwritten

NON-RESIDENT $DATA (file > ~700 bytes)
  • Non-resident flag: 0x01
  • Attribute header (64+ bytes with data runs), followed by the data run list (cluster pointers)
  • Points to clusters on disk where content is stored
  • Forensic risk: clusters may be overwritten after deletion; SSD TRIM zeros clusters almost immediately

DATA RUN ENCODING
  • Each run: header byte (low nibble = size of the length field, high nibble = size of the offset field) + length bytes + offset bytes
  • Example: 0x31 means 1 byte for length, 3 bytes for offset: read 1 byte (cluster count), then 3 bytes (starting cluster)
  • Runs are terminated by 0x00. Offsets are signed and relative to the previous run's starting cluster.

ALTERNATE DATA STREAMS (ADS)
  • Named $DATA attributes: type 0x80 with a non-empty name field. A file can carry multiple named streams.
  • Access: filename:streamname (e.g., doc.txt:hidden_payload)
  • Invisible in Explorer, dir, and most tools unless specifically queried
  • Malware use: store payload in ADS of legitimate file
  • Zone.Identifier: browser download tracking in ADS (legitimate)

Figure WF1.4 — $DATA attribute in resident and non-resident forms. Resident data is embedded in the MFT record and survives file deletion. Non-resident data is stored in clusters on disk and may be overwritten after deletion. Alternate Data Streams are named $DATA attributes that provide a hidden data storage mechanism within NTFS.

Resident $DATA — content inside the MFT record

When a file's content is small enough to fit within the MFT record alongside the header, $SI, $FN, and any other attributes, NTFS stores the data directly in the $DATA attribute as resident content. The threshold is approximately 700 bytes, though the exact limit depends on how much space the other attributes consume. A file with a short filename (one $FN attribute with namespace 0x03) has more room for resident data than a file with a long filename (two $FN attributes plus a longer name consuming more bytes).

The resident $DATA attribute has the standard attribute header (type 0x80, length, non-resident flag = 0), followed by the content size and content offset fields specific to resident attributes. The content offset (typically 0x18 from the attribute start) points to where the actual file data begins within the attribute. The content size tells you exactly how many bytes of file data are present.

This is forensically significant for two reasons. First, resident file content is physically part of the MFT record. When the file is deleted, the MFT entry is marked as free but the data remains in the record until the entry is reallocated for a new file. On a volume with many free MFT entries, deleted resident files can persist for weeks or months. Second, resident file content is captured by any tool that extracts the MFT — KAPE's $MFT target, FTK Imager's MFT extraction, or even a raw copy of the MFT file. You don't need the full disk image to recover resident data; you only need the MFT.

Small files that are commonly resident include batch scripts, small configuration files, PowerShell scripts, registry exports, CSV files, small text notes, and short email drafts. In an investigation, these are often among the most interesting files — a malicious batch script, a configuration file for a C2 implant, or a text file listing targeted data. If these files were deleted, their content may be fully recoverable from the MFT alone.

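To extract resident data from a raw MFT record: locate the $DATA attribute (type 0x80) in the attribute chain, confirm the non-resident flag is 0, read the content offset and content size from the attribute header, and copy the specified number of bytes from the content offset. The extracted bytes are the literal file content, with no decompression, decoding, or cluster mapping required. One caveat applies when reading raw records: NTFS overwrites the last two bytes of each sector-sized block of the record (512 bytes on a typical volume) with an update sequence number and keeps the original bytes in the update sequence array in the record header. If the resident content spans offsets 0x1FE–0x1FF of the record, restore those two bytes from the array before extracting.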
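The extraction steps above can be sketched in Python. This is a minimal illustration rather than a complete parser: the function names `apply_fixups` and `extract_resident_data` are hypothetical, the record is assumed to be a single 1,024-byte FILE record already read from the raw MFT, and the sketch does not verify that each sector tail matches the update sequence number.

```python
import struct
from typing import Optional


def apply_fixups(record: bytes, sector_size: int = 512) -> bytes:
    """Restore the last two bytes of each sector from the update sequence array."""
    usa_offset, usa_count = struct.unpack_from("<HH", record, 0x04)
    fixed = bytearray(record)
    # Entry 0 of the array is the update sequence number itself;
    # entries 1..n hold the original bytes for sectors 1..n.
    for i in range(1, usa_count):
        end = i * sector_size
        if end > len(fixed):
            break
        src = usa_offset + 2 * i
        fixed[end - 2:end] = record[src:src + 2]
    return bytes(fixed)


def extract_resident_data(record: bytes) -> Optional[bytes]:
    """Return the unnamed $DATA content if it is resident, else None."""
    record = apply_fixups(record)
    offset = struct.unpack_from("<H", record, 0x14)[0]  # first attribute offset
    while offset + 8 <= len(record):
        attr_type, attr_len = struct.unpack_from("<II", record, offset)
        if attr_type == 0xFFFFFFFF or attr_len == 0:  # end-of-attributes marker
            break
        non_resident = record[offset + 0x08]
        name_len = record[offset + 0x09]
        if attr_type == 0x80 and name_len == 0:  # unnamed $DATA
            if non_resident:
                return None
            # Resident fields: content size at +0x10, content offset at +0x14.
            size, content_off = struct.unpack_from("<IH", record, offset + 0x10)
            start = offset + content_off
            return record[start:start + size]
        offset += attr_len
    return None
```

A caller would read 1,024 bytes at entry_number × 1,024 from the extracted $MFT and pass them to `extract_resident_data`; a `None` result means the record has no unnamed $DATA attribute or that the attribute is non-resident.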

Compliance Myth: "If a file has been deleted from the Recycle Bin, its content is gone"
Deleting a file, even emptying the Recycle Bin, does not destroy the file's content. It marks the MFT entry as available for reuse and the clusters as available for allocation. For resident files (under ~700 bytes), the content remains in the MFT record until a new file is allocated to that entry. For non-resident files, the content remains in the clusters until new data is written to those clusters. On a traditional HDD, deleted file content can persist for days, weeks, or months depending on volume activity. On SSDs with TRIM enabled, non-resident data may be zeroed within minutes of deletion, but resident data in the MFT record persists: TRIM is issued for clusters that the filesystem frees, and freeing an MFT record does not free the clusters of the $MFT file that hold it.

Non-resident $DATA — data run encoding

When a file exceeds the resident threshold, NTFS stores the content in clusters on the disk volume and records the cluster locations in a data run list within the $DATA attribute. The non-resident $DATA attribute header (identified by the non-resident flag = 1 at offset 0x08 of the attribute) contains additional fields not present in resident attributes: the starting VCN (Virtual Cluster Number — the logical offset within the file, usually 0 for the first data run), the ending VCN, the data run offset (where the run list begins within the attribute), the compression unit size, the allocated size (in bytes, rounded up to cluster boundaries), the real size (the actual file content size), and the initialized size (how much of the allocated space contains valid data).
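As a rough sketch of how those header fields can be read, the following Python assumes the commonly documented layout of a non-resident attribute header (starting VCN at 0x10, ending VCN at 0x18, run list offset at 0x20, compression unit at 0x22, and the three sizes at 0x28, 0x30, and 0x38). The class and function names are hypothetical.

```python
import struct
from typing import NamedTuple


class NonResidentHeader(NamedTuple):
    start_vcn: int
    end_vcn: int
    run_list_offset: int
    compression_unit: int
    allocated_size: int
    real_size: int
    initialized_size: int


def parse_non_resident_header(attr: bytes) -> NonResidentHeader:
    """Parse the non-resident fields of one attribute (bytes start at the attribute)."""
    if attr[0x08] != 1:
        raise ValueError("attribute is resident; no data run list present")
    # Two signed 64-bit VCNs, then two 16-bit fields.
    start_vcn, end_vcn, run_off, comp_unit = struct.unpack_from("<qqHH", attr, 0x10)
    # Three unsigned 64-bit sizes, all in bytes.
    allocated, real, initialized = struct.unpack_from("<QQQ", attr, 0x28)
    return NonResidentHeader(start_vcn, end_vcn, run_off, comp_unit,
                             allocated, real, initialized)
```

The difference between `allocated_size` and `real_size` is the slack at the end of the last cluster, which is worth examining separately because it can hold remnants of earlier data.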

The data run list is a compact binary encoding that maps the file's logical extent to physical clusters on the volume. Each run entry specifies a contiguous range of clusters: how many clusters in the run (the length) and where the run starts on disk (the offset). The encoding uses a header byte where the low nibble indicates the number of bytes used for the length and the high nibble indicates the number of bytes used for the offset.

A concrete example: the header byte 0x31 means the length field is 1 byte and the offset field is 3 bytes. If the next bytes are 08 56 34 12, the run length is 0x08 (8 clusters) and the starting cluster is 0x123456 (little-endian reading of 56 34 12). This run describes 8 contiguous clusters starting at cluster 0x123456 on the volume. To find the byte offset, multiply the cluster number by the cluster size (typically 4,096 bytes): 0x123456 × 4,096 = 0x123456000 (4,886,716,416). Cluster numbers count from the start of the NTFS volume, so in a full-disk image add the partition's starting byte offset.

The offset field in subsequent runs is signed and relative to the previous run's starting cluster. If the first run starts at cluster 0x123456 and the second run's offset field contains 0x000100, the second run starts at cluster 0x123456 + 0x000100 = 0x123556. This relative encoding saves space — most runs are near each other on disk (because NTFS tries to allocate contiguous clusters), so the offset differences fit in fewer bytes than absolute cluster numbers.

The run list is terminated by a 0x00 byte. When walking the run list, read the header byte; if it's 0x00, you've reached the end. Otherwise, extract the length and offset according to the nibble sizes, record the run, and advance to the next entry.
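The walk described above can be sketched as a short Python function. The name `decode_data_runs` is hypothetical, and the sketch assumes the bytes passed in begin at the data run offset. It also treats a run whose offset size is zero as sparse (no clusters on disk), which is how NTFS encodes sparse ranges but is not covered in the text above.

```python
from typing import List, Optional, Tuple


def decode_data_runs(run_list: bytes) -> List[Tuple[Optional[int], int]]:
    """Decode a data run list into (starting_cluster, cluster_count) pairs."""
    runs = []
    pos = 0
    lcn = 0  # running absolute cluster number; each offset is relative to it
    while pos < len(run_list) and run_list[pos] != 0x00:
        header = run_list[pos]
        length_size = header & 0x0F  # low nibble: bytes in the length field
        offset_size = header >> 4    # high nibble: bytes in the offset field
        pos += 1
        length = int.from_bytes(run_list[pos:pos + length_size], "little")
        pos += length_size
        if offset_size == 0:
            runs.append((None, length))  # sparse run: nothing stored on disk
            continue
        delta = int.from_bytes(run_list[pos:pos + offset_size], "little", signed=True)
        pos += offset_size
        lcn += delta
        runs.append((lcn, length))
    return runs


if __name__ == "__main__":
    # First run from the worked example, then a second run offset by +0x000100.
    sample = bytes.fromhex("31 08 56 34 12 31 04 00 01 00 00")
    for start, count in decode_data_runs(sample):
        print(hex(start), count, "byte offset in volume:", start * 4096)
```

Reading the offset as a signed little-endian integer is what allows a later run to sit at a lower cluster number than the one before it.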

For fragmented files, multiple runs describe non-contiguous cluster ranges. A heavily fragmented file might have dozens of runs. The data run list maps the complete physical layout of the file on disk — essential for recovering file content from a raw disk image when the filesystem structures are damaged.

Decision point

You are recovering a deleted executable from a forensic image. The MFT record shows the file is non-resident with a data run list indicating the file occupied clusters 50,000–50,127 (128 clusters = 512 KB with 4K clusters). The file was deleted 3 days ago. The volume is an SSD with TRIM enabled in the operating system.

Your options: (A) Navigate to the cluster range in the disk image and extract the data — the clusters should contain the executable content. (B) Check the volume bitmap ($Bitmap, MFT entry 6) to determine whether those clusters have been marked as free and whether TRIM has likely zeroed the data. If TRIM commands were issued, the cluster content may be all zeros regardless of whether new files were written to those clusters.

The correct approach is B. On an SSD with TRIM enabled, the operating system sends TRIM commands to the SSD controller when clusters are freed. The SSD controller may zero the underlying flash blocks at any time after receiving the TRIM command — this can happen within seconds. The volume bitmap shows whether the clusters are marked as free, and examining the raw cluster content tells you whether the data is still present or has been zeroed. On an HDD, option A would be reasonable because clusters retain their data until overwritten. On an SSD, always verify the cluster content before assuming recoverability.
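Option B can be sketched in Python as follows. This is an illustrative check, not a tool: the function names are hypothetical, `bitmap` is assumed to be the already extracted content of $Bitmap (one bit per cluster, least significant bit first), and `volume` is assumed to be a seekable binary file object whose offset 0 is the start of the NTFS volume.

```python
from typing import BinaryIO, Dict


def cluster_is_allocated(bitmap: bytes, lcn: int) -> bool:
    """Return True if the $Bitmap bit for this cluster is set."""
    return bool(bitmap[lcn // 8] & (1 << (lcn % 8)))


def assess_run(volume: BinaryIO, bitmap: bytes, start_lcn: int, count: int,
               cluster_size: int = 4096) -> Dict[str, int]:
    """Count how many clusters in a run are reallocated, zeroed, or still hold data."""
    summary = {"allocated": 0, "free_zeroed": 0, "free_with_data": 0}
    for lcn in range(start_lcn, start_lcn + count):
        if cluster_is_allocated(bitmap, lcn):
            # In use again: any content now belongs to another file.
            summary["allocated"] += 1
            continue
        volume.seek(lcn * cluster_size)
        data = volume.read(cluster_size)
        if len(data) < cluster_size:
            raise ValueError("run extends past the end of the image")
        if data.count(0) == len(data):
            summary["free_zeroed"] += 1  # consistent with TRIM, not proof of it
        else:
            summary["free_with_data"] += 1
    return summary
```

For the scenario above, the call would be `assess_run(volume, bitmap, 50000, 128)`. A result dominated by `free_zeroed` suggests the executable is not recoverable from those clusters.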

Alternate Data Streams

NTFS supports multiple $DATA attributes per MFT record. The default (unnamed) $DATA attribute contains the file's primary content, which is what you see when you open the file. Named $DATA attributes are Alternate Data Streams (ADS): additional data payloads attached to the file but invisible to Windows Explorer and to a plain dir listing (dir shows them only with the /r switch).

An ADS is identified in the MFT record by a $DATA attribute (type 0x80) with a non-zero name length in the attribute header. The name follows the attribute header and precedes the content (for resident ADS) or data run list (for non-resident ADS). The name is in UTF-16LE, just like filenames in $FN attributes.
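A minimal Python sketch of that identification step follows. It assumes the commonly documented header layout (name length in UTF-16 characters at attribute offset 0x09, name offset at 0x0A) and a record whose fixups have already been applied. The function name `list_data_streams` is hypothetical.

```python
import struct
from typing import List, Tuple


def list_data_streams(record: bytes) -> List[Tuple[str, bool]]:
    """Return (stream_name, is_resident) for every $DATA attribute in a record.

    The unnamed default stream is reported with an empty name.
    """
    streams = []
    offset = struct.unpack_from("<H", record, 0x14)[0]  # first attribute offset
    while offset + 8 <= len(record):
        attr_type, attr_len = struct.unpack_from("<II", record, offset)
        if attr_type == 0xFFFFFFFF or attr_len == 0:  # end-of-attributes marker
            break
        if attr_type == 0x80:
            is_resident = record[offset + 0x08] == 0
            name_len = record[offset + 0x09]  # length in UTF-16 characters
            name_off = struct.unpack_from("<H", record, offset + 0x0A)[0]
            raw = record[offset + name_off:offset + name_off + 2 * name_len]
            streams.append((raw.decode("utf-16-le"), is_resident))
        offset += attr_len
    return streams
```

Any entry with a non-empty name is an ADS. Records that spill attributes into an $ATTRIBUTE_LIST would need the extension records walked as well, which this sketch does not attempt.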

To read an ADS from the command line: more < filename:streamname or notepad filename:streamname (the type command does not accept stream syntax). In PowerShell: Get-Content filename -Stream streamname. To list ADS on a file: dir /r filename. To list ADS using PowerShell: Get-Item filename -Stream *. MFTECmd reports ADS in its output; look for entries with the same MFT entry number but different $DATA attribute names.

Forensically, ADS has two significant applications:

Zone.Identifier: legitimate browser tracking. When Internet Explorer, Edge, or Chrome downloads a file, a Zone.Identifier ADS is written to it containing the security zone (ZoneId=3 for the Internet zone) and, on current Windows versions, usually the download URL (HostUrl) and the referring page (ReferrerUrl). This is the mechanism behind the "This file was downloaded from the Internet" warning dialog. When the URL fields are present, the Zone.Identifier ADS is forensic evidence that a file was downloaded from a specific URL. Many users and even some investigators don't realize this metadata exists because it's invisible in normal file operations. In the insider threat scenario (INC-NE-2026-0915), Zone.Identifier streams on files copied to USB can reveal that the files were originally downloaded from cloud storage before being moved to local storage and then to USB, provided the USB volume is NTFS (FAT and exFAT do not store ADS).

Malware data hiding. Attackers store malicious payloads in ADS of legitimate files. A benign-looking text file readme.txt can have an ADS readme.txt:payload.exe containing a full executable. The file's displayed size in Explorer reflects only the primary $DATA attribute — the ADS content is not included. Antivirus solutions have improved their ADS scanning over the years, but some still miss ADS payloads during routine scans. In the MFT, ADS appear as additional $DATA attributes on the same MFT record — the examiner who walks the complete attribute chain will find them.
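Once a Zone.Identifier stream has been extracted, its content is short INI-style text that is easy to parse. The sketch below is a hedged illustration: the function name is hypothetical, the stream is assumed to decode as UTF-8 or plain ASCII, and the sample URLs in the usage example are invented placeholders rather than data from the course scenarios.

```python
from typing import Dict


def parse_zone_identifier(stream: bytes) -> Dict[str, str]:
    """Parse the key=value lines of a Zone.Identifier stream into a dict."""
    text = stream.decode("utf-8", errors="replace")
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("["):  # skip blanks and the section header
            continue
        # Split on the first '=' only, since URLs can contain '=' themselves.
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    sample = (b"[ZoneTransfer]\r\n"
              b"ZoneId=3\r\n"
              b"ReferrerUrl=https://example.com/share\r\n"
              b"HostUrl=https://example.com/download?id=42\r\n")
    print(parse_zone_identifier(sample))
```

A ZoneId of 3 with a populated HostUrl is the combination that ties a file to a specific download source. Older systems and some applications write only the ZoneId line.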

Try it: Identify resident data and ADS in MFTECmd output

Load a KAPE-collected MFT into MFTECmd and examine the output CSV in Timeline Explorer.

1. Filter for resident files: look for entries where the file size is under 700 bytes. In MFTECmd output, resident files have their data within the MFT record, so the "Allocated" and "RealSize" fields will both be small.
2. Find a specific small file (a .txt, .bat, .ps1, or .cfg file under 700 bytes). Note its MFT entry number.
3. In HxD, navigate to that MFT entry (entry_number × 1,024) in the raw MFT. Walk the attribute chain until you find the $DATA attribute (type 0x80). Confirm the non-resident flag is 0. Read the content offset and size from the attribute header. Extract those bytes; they are the literal file content. Compare against the file content on disk (if available) or against what the application that uses the file would expect.
4. Search the MFTECmd output for ADS entries. These appear as records with the same MFT entry number but a non-empty "ADS" or "Zone" column. Zone.Identifier streams are the most common. Examine the content of any Zone.Identifier streams to recover download URLs.

Data attribute and file recovery assessment

When assessing whether a deleted file's content is recoverable, the $DATA attribute type determines your approach:

Resident $DATA (deleted file): High probability of recovery. The content is in the MFT record. As long as the MFT entry has not been reallocated to a new file, the resident data is intact. Check the MFT record flags (offset 0x16) — if flags = 0x00 (file, not in use), the entry is available but not yet reallocated. The content is likely intact. If the flags indicate the entry is now in use (0x01 or 0x03), the record has been reallocated and the previous file's data is gone.

Non-resident $DATA (deleted file, HDD): Moderate probability of recovery. The data run list in the MFT record still points to the clusters that contained the file's data. If those clusters have not been overwritten by new files, the content is intact. Check the volume bitmap to determine whether the clusters are still marked as free (unallocated). Unallocated clusters on an HDD retain their data until new data is written to them.

Non-resident $DATA (deleted file, SSD): Low probability of recovery. TRIM commands may have zeroed the clusters after deletion. Even if the volume bitmap shows the clusters as free, the underlying flash blocks may have been erased by the SSD controller's garbage collection. Extract the clusters and check whether they contain data or are zeroed. If zeroed, the data is unrecoverable from the SSD — though a pre-deletion backup, Volume Shadow Copy, or the USN Journal may provide alternative evidence of the file's existence and characteristics.

Non-resident $DATA (deleted file, fragmented): The data run list shows multiple non-contiguous cluster ranges. Each range must be checked independently — some clusters may be intact while others have been overwritten. Partial recovery is possible: the intact clusters provide fragments of the file content, and depending on the file format, these fragments may be interpretable (text files, log files) or useless (encrypted containers, compressed archives where partial data cannot be decompressed).
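The first check in this assessment, reading the record flags at offset 0x16, can be sketched in Python as below. The function name `record_state` is hypothetical, and the sketch assumes the commonly documented FILE record header layout, with the sequence number at offset 0x10 and flag bits 0x01 (in use) and 0x02 (directory).

```python
import struct
from typing import Dict, Union


def record_state(record: bytes) -> Dict[str, Union[int, bool]]:
    """Report allocation state and sequence number from a FILE record header."""
    if record[0:4] != b"FILE":
        raise ValueError("not a FILE record (bad signature)")
    sequence = struct.unpack_from("<H", record, 0x10)[0]
    flags = struct.unpack_from("<H", record, 0x16)[0]
    return {
        "sequence": sequence,
        "in_use": bool(flags & 0x01),        # clear means the entry is free
        "is_directory": bool(flags & 0x02),
    }
```

A record reporting `in_use` as False still describes the deleted file, so its $DATA attribute (resident content or data run list) is the starting point for each of the recovery paths above.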

You extract the MFT from a forensic image and find a deleted file (flags = 0x00, sequence number 4). The $DATA attribute has the non-resident flag set to 0, a content size of 487 bytes, and the content offset points to data within the MFT record. What is the forensic significance?
The file's content is resident — all 487 bytes are stored directly in the MFT record. Despite the file being deleted, the content is fully recoverable by reading 487 bytes from the content offset within this MFT record. The file has been deleted and reallocated 3 times (sequence number 4, started at 1), but the current data represents the most recent file that occupied this entry before deletion. Extract the bytes and examine the content — for a file this small, it could be a script, configuration file, or text note that may be directly relevant to the investigation.
The file is corrupted — a 487-byte file should not be resident because it exceeds the typical resident threshold. The content at the offset is likely garbage from a previous file.
The file's content is on disk clusters but the MFT record has been overwritten to show it as resident — this is an anti-forensic technique that hides the real cluster locations.
The sequence number of 4 means the data has been overwritten 4 times, so the content in the MFT record is from the fourth version of the file and the earlier versions are lost.
