String analysis for n00bs

I like to demo this little windows executable to everyone who thinks they are doing the reverse engineering bit right, by using available automated static and dynamic analysis tools, and trusting them blindly.

The sample is a PE32 that is 2560 bytes long. Running ‘strings’ over it produces these results:

!This program cannot be run in DOS mode.
Rich
.text
`.rdata
@.data
8/u
ExitProcess
GetCommandLineA
kernel32.dll
GetStdHandle
WriteFile
Hello World!

Running it from a CLI gives us the following text being printed out to the STDOUT:

Hello World!

One can say that both static and dynamic analysis give us the same output. Based on this info it’s kinda obvious to conclude that this small binary is a simple CLI program that prints out ‘Hello World!’ when executed.

Except, only code analysis can help us to determine that the program behaves differently if we pass a ‘/h’ argument to it.

In such case, the dynamic analysis will show that the following string is being printed out to the STDOUT:

Hello Baby!!

Static analysis was done right. Default dynamic analysis was done right. And code analysis was done right too. It’s just the automation that failed.

Just a reminder that we can’t blindly trust the automation, because it only sees the obvious. And command line arguments are not the only way to trigger execution of a different branch of code. It could be a guard rail of any sort: time of the day, locale of the OS, delayed payload, payload downloaded from a site that is not available atm, etc.

in the interest of full disclosure: I have not ‘analyzed’ this sample with any AI framework, so am still hopeful that at least some of them would see through this little mischief.

Files of interest

I really like this MalBeacon’s project because it highlights how easy it is to detect many malware families by just looking at the telemetry/forensic data obtained from a file system. And since I like it so much I decided to comb my data looking for file names that may serve a similar purpose… The result is this file.

The set is watermarked hence you have been warned. You cannot use this set for any commercial reason. You cannot create any commercial detection based on this data. The only exceptions are: fully unlimited use by law enforcement, and for educational and non-commercial research purposes only.