Clustering | Hexacorn

In my last post I looked at ‘good’ file names. Today I will look at them again.

Sort of…

Over the years I have written a number of yara rules that rely on a peculiar condition: they hit on the internal PE file name that is sometimes preserved inside PE files, both DLL and EXE… If you have ever looked at the internal structure of a PE file you know that its export directory can preserve a programmer-chosen, internal file name that is compiled into the final binary file, and that this internal file name often differs from the file name used at the file system level…

Some threat actors know about it and abuse it, but many don’t, which in some cases allows us to write very precise detection rules… That internal file name is a great forensic and telemetry artifact and it would be a crime not to use it, where applicable…

In my old Yara rules I would usually rely on this (somewhat) esoteric syntax that I copied and pasted from someone else (sorry, I don’t remember who that person was):

strings:
   $dllname = "<filename>"
condition:
   ($dllname at pe.rva_to_offset(uint32(pe.rva_to_offset(pe.data_directories[pe.IMAGE_DIRECTORY_ENTRY_EXPORT].virtual_address)+12)))

which is basically a rudimentary PE file format parsing condition checking if the specific ANSI string is present at a given place inside the file’s export directory (where that internal PE file name resides) and if it matches the string I defined…
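To make the old condition less esoteric, here is a minimal Python sketch of the same walk: find the export directory through the data directories, read the 32-bit Name RVA stored 12 bytes into it, and translate that RVA to a file offset. The function names `rva_to_offset` and `internal_pe_name` are mine, and the RVA translation is a simplified assumption (it only consults the section table), so treat it as an illustration and not as a replacement for yara's PE module.

```python
import struct


def rva_to_offset(rva, sections):
    """Translate an RVA to a raw file offset using the section table."""
    for virt_addr, virt_size, raw_ptr, raw_size in sections:
        span = max(virt_size, raw_size)
        if virt_addr <= rva < virt_addr + span:
            return rva - virt_addr + raw_ptr
    return None


def internal_pe_name(data):
    """Return the export directory's Name string, or None if there is none."""
    try:
        if len(data) < 0x40 or data[:2] != b"MZ":
            return None
        (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
        if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
            return None
        coff = e_lfanew + 4
        (num_sections,) = struct.unpack_from("<H", data, coff + 2)
        (opt_size,) = struct.unpack_from("<H", data, coff + 16)
        opt = coff + 20
        (magic,) = struct.unpack_from("<H", data, opt)
        # Data directories start at +96 (PE32) or +112 (PE32+).
        dirs_rel = 112 if magic == 0x20B else 96
        if opt_size < dirs_rel + 8:
            return None
        # Entry 0 of the data directories is the export directory.
        (export_rva,) = struct.unpack_from("<I", data, opt + dirs_rel)
        if export_rva == 0:
            return None
        sections = []
        table = opt + opt_size
        for i in range(num_sections):
            entry = table + 40 * i
            virt_size, virt_addr, raw_size, raw_ptr = struct.unpack_from(
                "<IIII", data, entry + 8
            )
            sections.append((virt_addr, virt_size, raw_ptr, raw_size))
        export_off = rva_to_offset(export_rva, sections)
        if export_off is None:
            return None
        # The Name RVA lives 12 bytes into the export directory.
        (name_rva,) = struct.unpack_from("<I", data, export_off + 12)
        name_off = rva_to_offset(name_rva, sections)
        if name_off is None:
            return None
        end = data.find(b"\0", name_off)
        if end == -1:
            end = len(data)
        return data[name_off:end].decode("ascii", errors="replace")
    except struct.error:
        # Truncated or malformed file.
        return None
```

Feeding it the bytes of a file, e.g. `internal_pe_name(open(path, "rb").read())`, should give the same string the yara condition compares against, at least for well-formed files.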

After the release of yara 4.0.0 we can use a far simpler construct to define the very same condition, one that leverages the PE module:

pe.dll_name=="<filename>"

Now…

This internal file name preserved in the export directory of many PE files is a bit of a phenomenon because if we just focus on native Windows OS binaries we will discover a lot of interesting bits. Say, we look at the native PE files taken from the Windows 11 system32 directory — we can easily discover a number of PE files where the ‘external’ (file system-based) and ‘internal’ (PE export directory-based/pe.dll_name) file names do not match…
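The comparison itself is trivial once you have (path, internal name) pairs from whatever PE parser you prefer. A rough sketch, assuming a case-insensitive comparison is what we want (the function name `find_name_mismatches` and the `example.dll` entry are made up for illustration):

```python
import ntpath


def find_name_mismatches(pairs):
    """Return (external, internal) tuples where the two names differ.

    `pairs` is an iterable of (file path, internal PE name or None).
    Files without an internal name are skipped.
    """
    mismatches = []
    for path, internal in pairs:
        if not internal:
            continue
        # ntpath handles Windows-style paths on any platform.
        external = ntpath.basename(path)
        if external.lower() != internal.lower():
            mismatches.append((external, internal))
    return mismatches


pairs = [
    (r"C:\Windows\System32\AppVTerminator.dll", "Arnold.dll"),
    (r"C:\Windows\System32\tcblaunch.exe", "winload.sys"),
    (r"C:\hypothetical\example.dll", "EXAMPLE.DLL"),  # made-up entry, names match
]
for external, internal in find_name_mismatches(pairs):
    print(f"{external} -> {internal}")
```

The first two pairs come from this post; the third one only differs in letter case, so it is not reported.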

Here’s a quick & dirty list of such files that I’ve extracted…

And just for a second, let me digress here – I must mention that I generated this quick&dirty file for the purpose of writing this post but then… just eyeballing its content… my attention was immediately drawn to this interesting finding…:

The Windows library AppVTerminator.dll uses the internal file name Arnold.dll. What’s more, the file exports a function called ‘IllBeBack’.

If you ever watched the ’80s/’90s Terminator movie franchise you know this really cannot be a coincidence, and a quick google session that followed led me to this gist by @mcbroom_evan. I really love to be the first to report OS-related interesting facts, peculiarities, and things that make you go ‘hmmm interesting’, but I was simply late in this case! Kudos to you @mcbroom_evan!

Back to our quick & dirty list…

Looking at the internal file names used by many native Windows OS binaries we can immediately see a bit of a pattern:

dll.dll 21
deffile.dll 8
stub.dll 7
SWEEPRX.dll 3
vm3ddevapi-release.exe 3
vm3dum.dll 3
vm3dum10.dll 3
module.dll 3
sb.dll 2
iwb.dll 2
USERCPL.dll 2
smalldll.dll 2
DeviceInfoParser.dll 2
AppxDeploymentExtensions.dll 2
inprocserver.dll 2
winload.sys 2
PACK2.dll 1
Source.dll 1
respub.DLL 1
client.dll 1

Seeing these stats we can speculate that a lot of early code for these native system DLLs might have been created via a simple copy&paste mechanism (dll.dll, deffile.dll, smalldll.dll and stub.dll are hardly unique file names…). Some discrepancies suggest internal struggles with terminology, f.ex. PrintIsolationProxy.dll vs. PrintSandboxProxy.dll, and some are way off (tcblaunch.exe/winload.exe -> winload.sys). I’d like to believe there is a logic to it, but I am not very optimistic.
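Stats like the ones above are a simple tally of the internal names collected from the mismatching files. A minimal sketch, with a made-up input list and hypothetical helper names (`tally_internal_names`, `format_tally`):

```python
from collections import Counter


def tally_internal_names(names):
    """Count how often each internal name occurs, most frequent first."""
    return Counter(names).most_common()


def format_tally(tally):
    """Render the tally as 'name count' lines."""
    return "\n".join(f"{name} {count}" for name, count in tally)


# Made-up input: one internal name per mismatching file.
names = ["dll.dll", "stub.dll", "dll.dll", "deffile.dll", "dll.dll", "stub.dll"]
print(format_tally(tally_internal_names(names)))
```

Names are counted exactly as they appear, so SWEEPRX.dll and sweeprx.dll would be separate entries; lowercase them first if that is not what you want.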

Anyway…

Now that we know what this post is all about, let’s take a stab at a far larger set… that is, legitimate files produced by legitimate vendors – many of their files do include these internal PE file names too, so it would be a crime not to explore this data set…

So, here it is, a list of legitimate internal PE file names you may come across while analyzing samples. Using any of these ‘good’ internal file names as a pe.dll_name=="<filename>" condition in your yara rules will most likely produce FPs… You have been warned 🙂

Note: you can’t use the _file_types_PE_INTERNAL_NAME.zip/_file_types_PE_INTERNAL_NAME files for commercial purposes.

A few days ago Nas kicked off an interesting discussion on Xitter about detection quality. I liked it, so I offered my personal insight. I then added a stupid example to illustrate my point, to which DylanInfosec replied:

Would love to set some time aside and gather some OS log dumps, throw em in a SIEM and test that way or something. I guess crowd validation with a trusted diverse group could work too. Not-for-profit or anything but just to share with the community

This made me think…

I am an old-school data hoarder; for as long as I can remember I have been actively looking for data of interest in a lot of places… And I must confess that the only reason I could immediately provide that stupid mimi-based regex filename search example was because I had access to my private ‘clean’ file names dataset…

You see… over a decade ago I kicked off a personal project that focused on collecting software data from CLEAN sources. While many people in the cybersecurity industry at that time primarily focused on malware collections, I decided to take a step forward and collect data that was most likely clean. So, I wrote a number of web scrapers and downloaders, used VPN and Tor where necessary, and eventually built a large data set of samples that is a collection of (most likely) clean files downloaded from publicly available sources. I didn’t stop there. I took every single sample that I downloaded and got it decompiled, whenever it was possible… then processed all the decompiled files to build a modern, full-blown, Windows-centric clean software data collection set that I believed at that time to be far better than NIST’s.

Now, it’s been a few years and this set is getting older and older, every single day, so perhaps it’s time for it to win some brownie points in the community…

Many of our threat hunting rules depend on file names. The file I am attaching to this post includes a list of many PE file names in my collection that are known to be ‘clean’ (to be precise, these are all file names ending with the following file extensions: ‘exe’, ‘dll’, ‘drv’, ‘ocx’, ‘sys’). It goes without saying that you must treat this list as very suspicious, but I hope it will help you to write better detections…

_files_of_interest.su.zip

And to illustrate the point, let’s run a query that is similar to the one I did for my tweet:

rg -i "mimi.*?\.(dll|exe|sys)" _files_of_interest.su
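If you don’t have ripgrep at hand, the same case-insensitive search can be sketched in Python. The sample names below are made up for illustration (they are not taken from the data set), and `search_names` is just a helper name I picked:

```python
import re

# Same pattern as the rg query above; re.IGNORECASE mirrors rg's -i.
MIMI_RE = re.compile(r"mimi.*?\.(dll|exe|sys)", re.IGNORECASE)


def search_names(lines, pattern=MIMI_RE):
    """Return the lines that match the pattern anywhere, like rg does."""
    hits = []
    for line in lines:
        line = line.rstrip("\r\n")
        if pattern.search(line):
            hits.append(line)
    return hits


# Hypothetical file names, for illustration only.
sample = ["mimikatz.exe", "MIMIlib.DLL", "notepad.exe", "optimizer_mimic.sys"]
for hit in search_names(sample):
    print(hit)

# To search the real list, pass an open file instead of the sample:
#   with open("_files_of_interest.su", encoding="utf-8", errors="replace") as fh:
#       hits = search_names(fh)
```

Every hit on a clean data set is a file name your ‘mimi’ rule would flag on a legitimate system, which is the whole point of the exercise.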

Note: you can’t use the _files_of_interest.zip/_files_of_interest.su files for commercial purposes.

High Fidelity detections are Low Fidelity detections, until proven otherwise, Part 2

High Fidelity detections are Low Fidelity detections, until proven otherwise