String analysis for n00bs

I like to demo this little Windows executable to everyone who thinks they are doing the reverse engineering bit right by running the available automated static and dynamic analysis tools and trusting them blindly.

The sample is a PE32 that is 2560 bytes long. Running ‘strings’ over it produces these results:

!This program cannot be run in DOS mode.
Rich
.text
`.rdata
@.data
8/u
ExitProcess
GetCommandLineA
kernel32.dll
GetStdHandle
WriteFile
Hello World!
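For anyone who has never looked at what ‘strings’ actually does, here is a minimal sketch of the idea in Python. It is not the real tool, just an approximation: it only handles ASCII (no UTF-16), and the default minimum length of 3 is my assumption, picked because the listing above includes a three-character entry.

```python
import re


def extract_ascii_strings(data: bytes, min_len: int = 3) -> list[str]:
    """Return every run of printable ASCII bytes that is at least min_len long."""
    # 0x20-0x7e covers space through tilde; anything else ends the run.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]
```

To try it on a file, read the file in binary mode and pass the bytes in, e.g. extract_ascii_strings(open(path, "rb").read()). Note that it reports whatever byte runs happen to look printable, whether they are data or not.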

Running it from a CLI prints the following text to STDOUT:

Hello World!

One can say that both static and dynamic analysis give us the same output. Based on this info it’s kinda obvious to conclude that this small binary is a simple CLI program that prints out ‘Hello World!’ when executed.

Except that only code analysis reveals that the program behaves differently if we pass a ‘/h’ argument to it.

In that case, dynamic analysis will show the following string being printed to STDOUT:

Hello Baby!!

Static analysis was done right. Default dynamic analysis was done right. And code analysis was done right too. It’s just the automation that failed.

Just a reminder that we can’t blindly trust the automation, because it only sees the obvious. And command line arguments are not the only way to trigger execution of a different branch of code. It could be a guard rail of any sort: time of day, locale of the OS, delayed payload, payload downloaded from a site that is not available at the moment, etc.
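The branch that the automation missed boils down to something like the sketch below. This is only an illustration in Python, not the sample’s actual code: the function name is made up, and unlike the sample, the alternate string here sits in plain sight as a literal.

```python
def greeting(args: list[str]) -> str:
    """Pick the text to print based on the command-line arguments."""
    # A default run (no arguments) never reaches this branch,
    # so a sandbox that just executes the file only sees the other string.
    if "/h" in args:
        return "Hello Baby!!"
    return "Hello World!"
```

Calling greeting([]) models the default sandbox run, and greeting(["/h"]) models the run that only someone who read the code would think of trying. Swap the argument check for a date, locale, or network check and you get the other guard rails mentioned above.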

In the interest of full disclosure: I have not ‘analyzed’ this sample with any AI framework, so I am still hopeful that at least some of them would see through this little mischief.

Good Exports are real

Collecting ‘good’ samples helps to discover a lot of interesting patterns. In my old post I focused on the PDB paths extracted from the DriverPack driver collection. Yesterday I touched on the list of ‘file names associated with good known kernel drivers’, and today I will cover the function names exported by a very large corpus of ‘good’ DLL samples.

You may ask what the value is here, and I can answer that ‘this is what normal looks like’.

How is that useful to the Threat Hunting crowd?

If you monitor rundll32 invocations referencing DLLs and their API functions you may quickly discover a lot of anomalies. Any invocation referring to a non-OS DLL is suspicious. Any invocation referring to a DLL in a suspicious location is… suspicious. Any process using unusual constructs is suspicious. Any process invoking DLL exported functions via ordinal numbers is suspicious. Any process referencing API ordinals via negative or large ordinal numbers is super-suspicious, too.
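A few of these checks can be sketched in Python as below. Treat it as a rough starting point under stated assumptions: the regex only understands the simple ‘rundll32 <dll>,<entry>’ shape, the list of ‘system’ directories is my own placeholder, and the 0xFFFF limit comes from export ordinals being 16-bit values. Bare DLL names without a path are not flagged, because the sketch cannot tell where they resolve.

```python
import re

# Assumption: simplified "rundll32 <dll>,<entry> [args]" syntax.
# Real command lines can be messier than this pattern allows.
RUNDLL32_RE = re.compile(
    r'rundll32(?:\.exe)?"?\s+"?(?P<dll>[^",]+)"?\s*,\s*(?P<entry>\S+)',
    re.IGNORECASE,
)

# Assumption: placeholder list of directories treated as OS locations.
SYSTEM_DIRS = (r"c:\windows\system32", r"c:\windows\syswow64")


def review_rundll32(command_line: str) -> list[str]:
    """Return a list of reasons why a rundll32 invocation looks odd."""
    match = RUNDLL32_RE.search(command_line)
    if not match:
        return ["unusual construct: could not parse <dll>,<entry>"]

    findings = []
    dll = match.group("dll").strip().lower()
    entry = match.group("entry")

    # Only judge the location when a path is actually given.
    if "\\" in dll and not dll.startswith(SYSTEM_DIRS):
        findings.append("DLL outside system directories")

    if entry.startswith("#"):
        findings.append("export invoked by ordinal")
        try:
            ordinal = int(entry[1:])
        except ValueError:
            findings.append("malformed ordinal")
        else:
            # Export ordinals are 16-bit, so anything else is out of range.
            if ordinal < 0 or ordinal > 0xFFFF:
                findings.append("negative or oversized ordinal")
    return findings
```

An empty list means none of these particular tests fired, not that the invocation is clean.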

These are great ‘suspicious’ tests, but we can do more.

The ‘StartW’ export used by Cobalt Strike DLLs is a good example. Invocations of this function are not necessarily ‘suspicious’ by default, because we don’t have a point of reference. There are so many legitimate invocations of rundll32 executing exported functions from so many DLLs that it’s hard to zoom in on this particular function and declare that it’s bad. Again, we need a point of reference of sorts.

The list of functions exported by ‘good’ DLLs is far longer than expected: 11,375,507 unique entries, with many very popular and some occurring only once. You can download an archived text file referencing many ‘good export names’ from here.
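Using such a list as a point of reference can be as simple as a set lookup. The sketch below assumes the text file holds one export name per line, which is my guess at the format; the function names are made up for illustration. The comparison is case-sensitive, since export names are resolved case-sensitively.

```python
from typing import Iterable


def load_known_good(lines: Iterable[str]) -> set[str]:
    """Build a lookup set from lines of text, skipping blank lines."""
    # Assumption: one export name per line, possibly padded with whitespace.
    return {line.strip() for line in lines if line.strip()}


def seen_in_good_corpus(export_name: str, known_good: set[str]) -> bool:
    """True if the export name appears in the known-good set (case-sensitive)."""
    return export_name in known_good
```

With the real file you would pass an open file handle to load_known_good(). A name missing from the set is not automatically bad; it is just one you have no reference for, which makes it worth a closer look.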

There are so many uses for this set:

  • known-good names for threat hunting purposes
  • a very fertile ground for a deeper lolbin research
  • a very fertile ground for discovering new vulnerabilities

The set is watermarked, so you have been warned. You cannot use this set for any commercial purpose. You cannot create any commercial detection based on this data. The only exceptions are fully unlimited use by law enforcement, and use for educational and non-commercial research purposes.