NTFS Sparse files – another possible quick anti- trick

A number of tricks that cause trouble to sandboxes, as well as malware analysts leverage less known features of NTFS (note: less known to programmers and perhaps reversers than to forensic experts). NTFS is rich in features and malware successfully abused these in the past, and… still does nowadays e.g. storing the code and data inside the Alternate Data Streams, Extended Attributes, toying around with Unicode character set by using RTLO (Right To Left Override) or homographic attacks to hide or obfuscate file names.

What about Sparse files?

The way it works is that one can create a normal file using e.g. CreateFile API then use the FSCTL_SET_SPARSE control code to make this file grow in a perceived size very quickly. The change is instant as the system allocates a chain of clusters for such file inside the $MFT and does so in a smart way without actually using physical clusters that it would normally fill in with data (zeroes). So large these files can become that copying them outside of lab/sandbox will cause a lot of trouble, and who knows, in some cases may even DoS the whole lab device or network.

There is also one more trick that can be done here (while it doesn’t require using sparse files per se it is certainly easier to deliver it with this specific feature being enabled): a slightly more complex malware could artificially generate a new payload – a large PE file (and creating it in a sparse mode would be the fastest way to do so). It would then fill it in with a modified PE header/sections data and sections placed in the vast space of a new file yet in a way that the file can be still executed. There are some constraints against maximum size and available memory of course. Again, it will be impossible to copy such file outside the lab/sandbox w/o either compressing it or shrinking it somehow. It may also be harder to dump its memory and post-process/analyze it efficiently (note that if these artificially created PE sections are large enough malware could fill it in memory with a lot of random data so the memory dump would contain some ‘data’ – imagine how long would it take to generate strings from it).

And last but not least – such trickery may affect forensic evidence processing – such files would be certainly harder to extract. I don’t know what techniques forensics software can use to ensure extraction of sparse files is done efficiently (and how forensic software deals with it today), but well… using sparse files for the output could be probably a good idea? Also, how to browse such files efficiently? Some special mode that removes zeroes from the output and shows ‘islands’ of data? Some food for thought.

No PoCs as it is just a random thought.

Hexacorn

Hexacorn

NTFS Sparse files – another possible quick anti- trick