The art of overDLLoading

Some time ago I came up with a silly idea: to build an executable that statically imports most of the c:\windows\system32 libraries. It’s a nonsensical programming exercise, but it’s also an interesting challenge.

Forcing a static import of so many libraries into a single executable is actually a non-trivial task, and there are many approaches we can take to do it. Most of the avenues based on high-level languages are problematic though, because they involve a lot of custom library building, which means lots of troubleshooting. After looking at various programming languages I eventually found myself looking at the assemblers available out there. The incredible simplicity with which fasm lets you generate your own, customized import tables immediately caught my attention.
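To give an idea of what the generator side could look like, here is a minimal sketch, not the actual script used for this post. It assumes fasm's standard `library` and `import` macros (pulled in via `win32a.inc`) and a hypothetical `exports` dictionary mapping DLL names to their exported function names, gathered by some other means. The label prefixes (`dll_`, `__`) are made up for illustration, and whether the stock macros cope with tens of thousands of entries is not something this sketch verifies.

```python
import re


def fasm_label(name: str) -> str:
    """Replace characters that cannot appear in a fasm label with '_'."""
    return re.sub(r"[^0-9A-Za-z_]", "_", name)


def dll_label(dll_name: str) -> str:
    """Hypothetical naming scheme: 'KERNEL32.DLL' -> 'dll_KERNEL32'."""
    return "dll_" + fasm_label(dll_name.rsplit(".", 1)[0])


def build_fasm_source(exports: dict) -> str:
    """Emit fasm source whose import section references every listed API."""
    dlls = sorted(d for d in exports if exports[d])  # skip DLLs with no exports
    lines = [
        "format PE GUI 4.0",
        "entry start",
        "include 'win32a.inc'",
        "",
        "section '.text' code readable executable",
        "start:",
        "        ret",
        "",
        "section '.idata' import data readable writeable",
    ]
    # One `library` statement naming every DLL we want in the import directory.
    lib_items = [f"{dll_label(d)},'{d}'" for d in dlls]
    lines.append("  library " + ",\\\n          ".join(lib_items))
    # One `import` statement per DLL; labels are prefixed with the DLL label
    # because many DLLs export identically named functions.
    for d in dlls:
        lib = dll_label(d)
        items = [f"{lib}__{fasm_label(api)},'{api}'" for api in exports[d]]
        lines.append("")
        lines.append(f"  import {lib},\\\n         " + ",\\\n         ".join(items))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = {"KERNEL32.DLL": ["ExitProcess", "Sleep"], "USER32.DLL": ["MessageBoxA"]}
    print(build_fasm_source(demo))
```

The resulting text would then be saved to an .asm file and fed to fasm; exports without a name (ordinal-only) are not handled here.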

With a bit of Python-fu and fasm compilation magic, I was able to build this monster (79K APIs):

I am not 100% sure it is a correct PE file (in terms of all structures being filled in properly), but it seems to run on Windows 11 (with the caveat that it reports a critical error).

If you are wondering what the purpose of this exercise is, I’d like to throw out a few ideas:

  • linking to many OS-dependent libraries could be an interesting guardrail technique
  • it may break tools (it seems to break Python’s pefile module and it causes problems for decompilers)
  • it is a great learning exercise about the PE file format; after so many years of dealing with it I am still surprised how much I don’t know about it

And here’s the import table as seen by IDA:

An attempt to copy these function names to the clipboard pretty much freezes the program.

Looking for the randomness in the most non-AI/ML way…

Here’s some old-school, file name-based research… it is not game changing, it won’t bring any immediate solution, but it’s still worth doing today…

The software we install (focus here is on Windows, as usual) creates a loooot of files, and while many of them seem to be completely random, whimsical in nature, especially with regards to their file names, they do end up forming a corpus of sorts… Or, put differently: when bundled together, all these file names known to be created for legitimate purposes are great material for research.

For this post I collected 1.5M executable file names from Windows. They may not be the full set of file names ‘out there’, but it’s enough to play around with…

I then looked at the statistics of 2-, 3- and 4-character long infixes (ignoring any non-[a-z] characters).
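The counting step can be sketched roughly as follows. This is an illustration under assumptions, not the exact script behind the stats: I assume names are lowercased, every character outside [a-z] is simply dropped (so letters on both sides of a digit or dot become neighbours, and the extension is included), and every occurrence of an infix is counted. The function names are made up.

```python
import re
from collections import Counter


def normalize(file_name: str) -> str:
    """Lowercase the name and drop everything that is not a-z (assumption)."""
    return re.sub(r"[^a-z]", "", file_name.lower())


def count_infixes(file_names, n: int) -> Counter:
    """Count every n-character substring across all normalized file names."""
    counts = Counter()
    for name in file_names:
        letters = normalize(name)
        for i in range(len(letters) - n + 1):
            counts[letters[i:i + n]] += 1
    return counts


if __name__ == "__main__":
    sample = ["FirefoxSetup63.exe", "FirefoxSetup64.0.2.exe"]
    for infix, hits in count_infixes(sample, 4).most_common(5):
        print(infix, hits)
```

Splitting on non-letters instead of dropping them would be an equally defensible choice, and it would change the counts.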

The results are below:

  • How often 2-character long infixes appear in these 1.5M file names: filename_stats_2.txt – as you can see, not very useful…
  • How often 3-character long infixes appear in these 1.5M file names: filename_stats_3.txt – not very useful either…
  • How often 4-character long infixes appear in these 1.5M file names: filename_stats_4.txt – this is better… we definitely can cherry-pick a lot of 4-character long infixes that never appear in the set: filename_stats_4_non-existing.txt
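Finding the infixes that never appear is a set difference against all 26^4 = 456,976 possible 4-letter combinations. A minimal sketch, assuming the observed infixes are already available as a set (the `seen` parameter and the function name are hypothetical):

```python
from itertools import product
from string import ascii_lowercase


def missing_infixes(seen, n: int = 4):
    """Return, in alphabetical order, every a-z string of length n not in `seen`."""
    return [
        candidate
        for candidate in ("".join(t) for t in product(ascii_lowercase, repeat=n))
        if candidate not in seen
    ]


if __name__ == "__main__":
    observed = {"setu", "etup", "tupe", "upex", "pexe"}  # toy data
    absent = missing_infixes(observed)
    print(len(absent), absent[:5])
```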

Using the latter, we can create regex sets:

Using these regex sets you may actually get better at finding randomly named files! You will also find a lot of FPs, of course, but now you have a set of regexes you can tune to your needs…
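How such regex sets could be built and applied is sketched below. The chunking into several smaller alternations, the `chunk_size` value and the function names are my assumptions for illustration; the file name is normalized the same way as assumed for the statistics (lowercased, non-[a-z] characters dropped) before matching.

```python
import re


def build_regex_set(infixes, chunk_size: int = 1000):
    """Compile the infixes into a list of alternation regexes, chunk by chunk."""
    ordered = sorted(infixes)
    return [
        re.compile("|".join(re.escape(s) for s in ordered[i:i + chunk_size]))
        for i in range(0, len(ordered), chunk_size)
    ]


def looks_random(file_name: str, regex_set) -> bool:
    """True if the normalized name contains any never-seen infix."""
    letters = re.sub(r"[^a-z]", "", file_name.lower())
    return any(rx.search(letters) for rx in regex_set)


if __name__ == "__main__":
    regexes = build_regex_set(["qxzj", "zzqk"])  # toy stand-ins for the real list
    for name in ("setup.exe", "aQxZj1.exe"):
        print(name, looks_random(name, regexes))
```

Smaller chunks make it easier to drop or tune the groups that generate the most FPs.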

Can this be used in ML/AI research?

Yes, by all means, but the set of file names used as a base should be a loooot larger and collected in a more meaningful way. One can argue that, f.ex., temporary files created by installers could be excluded; we could also exclude file names that follow certain patterns (f.ex. starting with a dollar ‘$’ or tilde ‘~’, or conforming to a pattern ‘<GUID>.exe’); we could reduce the corpus by understanding versioned file names (f.ex. ‘FirefoxSetup63.exe’, ‘FirefoxSetup64.0.2.exe’, etc.); we could ignore non-English file names (‘Менеджер BIM Сервера GRAPHISOFT 19.exe’, ‘联系汉化作者.exe’, etc.) or artificially created file names that are used by many ‘download/update’ managers (‘ICReinstall_’ as in ‘ICReinstall_any_video_converter.exe’, ‘ICReinstall_driver identifier.exe’, etc.); or … we could also focus entirely on signed installers, or on those compiled within a certain timeframe (f.ex. the last decade).

As I said… it is not game changing, it won’t bring any immediate solution, but it’s still worth doing today…

And I will now answer the ‘why’:

– just to understand how hopeless the whole file name-matching idea is!