Toying with inheritance

When you create a system object, e.g. a file, you can specify whether its handles can be inherited by child processes. We then just need to tell CreateProcess to duplicate these inheritable handles, and the child process can access them as well. It's a very well-known and well-documented piece of functionality.

BUT

With that in mind, a simple idea was born: what if we create a file (open a handle) with one process, then write to the very same file from a different process (or processes), using that inherited handle? The child process doesn't need to formally open the file, because… the handle is already there.

This is how it looks in practice:

We have got two test.exe processes here – 3340 creates/opens the file for writing, and spawns 3584, which writes to the file and closes it.
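For illustration, a minimal C sketch of the two sides could look like this (the file path, the child image name, and passing the raw handle value on the command line are my assumptions, not necessarily how the test.exe PoC does it):

```c
// parent.exe - creates the file with an inheritable handle and spawns
// a child that does the actual writing (sketch; paths are made up)
#include <windows.h>
#include <stdio.h>

int main(void)
{
    // bInheritHandle = TRUE marks the file handle as inheritable
    SECURITY_ATTRIBUTES sa = { sizeof(sa), NULL, TRUE };

    HANDLE hFile = CreateFileA("C:\\test\\dropped.bin", GENERIC_WRITE, 0,
                               &sa, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE) return 1;

    // Inherited handles keep the same value in the child, so we can
    // simply hand the raw handle value over on the command line
    char cmd[MAX_PATH];
    sprintf_s(cmd, sizeof(cmd), "child.exe %p", (void *)hFile);

    STARTUPINFOA si = { sizeof(si) };
    PROCESS_INFORMATION pi;

    // bInheritHandles = TRUE duplicates all inheritable handles into the child
    if (CreateProcessA(NULL, cmd, NULL, NULL, TRUE, 0, NULL, NULL, &si, &pi))
    {
        WaitForSingleObject(pi.hProcess, INFINITE);
        CloseHandle(pi.hProcess);
        CloseHandle(pi.hThread);
    }
    CloseHandle(hFile);
    return 0;
}
```

And the child side – note it never calls CreateFile:

```c
// child.exe - recovers the inherited handle from its command line,
// writes to the file and closes it; no file creation in this process
#include <windows.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc < 2) return 1;
    HANDLE hFile = (HANDLE)(ULONG_PTR)strtoull(argv[1], NULL, 16);
    DWORD written;
    WriteFile(hFile, "hello", 5, &written, NULL);
    CloseHandle(hFile);
    return 0;
}
```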

I have not tested it, but I am wondering if such split handle handling (pun intended) could confuse any of the existing security solutions.

If a solution relies on any sort of dynamic, per-process lookup tables that map handles to file names in real time (a mapping built by intercepting file creation requests), such a solution would not be able to resolve the handle for a spawned child process (the handle is already there, and the file creation operation never happened inside that process). This is probably a rare case, but still…

Also, I believe most security solutions expect one process to manage the whole lifecycle of each created file. The typical pattern goes like this: malware is downloaded, malware runs, malware drops a file, malware executes it, and so on and so forth.

BUT

Could we, for example, write every N-th byte or N-th line of code inside the dropped file using a different process, or a number of them? What if these writes happened at different times (avoiding temporal proximity analysis)? Could such operations be coherently put together and presented on a single timeline?
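To make the idea concrete, a hypothetical child performing such strided writes could look roughly like this (passing the handle value, the child's index, and the stride on the command line is again my own assumption):

```c
// Hypothetical child: writes only every N-th byte of the payload,
// using the inherited handle and absolute offsets via OVERLAPPED
#include <windows.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc < 4) return 1;

    HANDLE hFile = (HANDLE)(ULONG_PTR)strtoull(argv[1], NULL, 16); // inherited handle
    int index  = atoi(argv[2]);   // which offsets this child owns (0..stride-1)
    int stride = atoi(argv[3]);   // N = number of cooperating children

    const char payload[] = "...the real juice...";

    for (int i = index; i < (int)strlen(payload); i += stride)
    {
        // OVERLAPPED carries an absolute file offset even for synchronous
        // I/O; this avoids racing over the shared file pointer, since all
        // inherited copies of the handle refer to the same file object
        OVERLAPPED ov = { 0 };
        ov.Offset = (DWORD)i;
        DWORD written;
        WriteFile(hFile, &payload[i], 1, &written, &ov);
    }
    CloseHandle(hFile);
    return 0;
}
```

Each child could also sleep for an arbitrary, different period before writing, spreading the operations across the timeline.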

At the UI level, EDRs and sandboxes are very process-tree oriented, and their timelines follow this paradigm. Browsing through the timeline of a single process would surely not be enough to see the whole context of all the operations on a file managed by multiple processes. What's more, the main process could write benign data to the file, knowing that these red herring operations are the very first thing analysts look at, while only the spawned children would write the real juice to the target file, e.g. an hour later – long after the main parent process has already died.

There are probably other concurrency issues to explore here, too.

Obviously, forensics will always reveal the actual content of the file, but… over the last few years we have moved many of our processes away from heavy-duty forensics (hard drives), towards lighter forensics (volatile data), towards timeline analysis (EDR/threat hunting). Attacking the assumptions these security solutions rely on is probably one of the first steps towards the more robust anti-timeline techniques we will see in the future.

Enter Sandbox part 25: How to get into an argument

When you begin your programming career, one of the first lessons focuses on reading command line arguments. It seems very trivial, but when you start coding more, and in new languages, you will quickly discover that it's actually less than trivial – and a bit of a mess.

Programming languages use many different ways to access the command line arguments, e.g.:

  • argv
  • wargv
  • args
  • $argv
  • @ARGV
  • arg
  • sys.argv
  • ParamStr
  • Command$
  • WScript.Arguments
  • etc.

I can't count how many times I have googled the proper name/syntax for these over the years – ad hoc programming in different languages makes them quite difficult to remember. Also, some programming languages index arguments from 0, some from 1.

The way to access these parameters also differs. Sometimes they are available as a string, sometimes as an array; sometimes you need to call a function to retrieve specific items for you, and in some cases you need to write your own parser or tokenizer.

And finally, some frameworks require a certain (standard) approach to passing arguments so that a (standard) parsing routine can extract them properly. Then there are quirks – paths with spaces, extra spaces, ANSI vs. Unicode characters – and you have two buffers available for parsing: the path to the actual executable, and its command line. And the first is not always a full path, or is a path expressed in a different way than expected.
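To illustrate the Windows side of this, here is a small C sketch contrasting the two views – the CRT's pre-tokenized argv and the raw command line buffer (a sketch only; note that CommandLineToArgvW follows similar, but not identical, quoting rules to the CRT startup code):

```c
// Sketch: the two views Windows gives a process of its arguments
// (compile as a Unicode console app; link with Shell32.lib)
#include <windows.h>
#include <shellapi.h>   // CommandLineToArgvW
#include <stdio.h>

int wmain(int argc, wchar_t **argv)
{
    // View 1: the CRT has already tokenized the command line;
    // note argv[0] is not always a full path
    wprintf(L"argv[0]: %ls\n", argv[0]);

    // View 2: the raw, untokenized buffer as the process received it
    LPWSTR raw = GetCommandLineW();
    wprintf(L"raw: %ls\n", raw);

    // Re-tokenize the raw buffer ourselves
    int count = 0;
    LPWSTR *parsed = CommandLineToArgvW(raw, &count);
    if (parsed)
    {
        for (int i = 0; i < count; i++)
            wprintf(L"parsed[%d]: %ls\n", i, parsed[i]);
        LocalFree(parsed);
    }
    return 0;
}
```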

It gets even more complicated when you start reversing. This time it's not only the programming languages per se, but also the binaries they produce, and these differ depending on architecture, OS, compiler flavor, version, and optimization settings. It is all very messy.

Grepping a repo of import function names, I came up with this short list of APIs & external or internal symbols/variables:

  • CommandArgs
  • CommandLineToArgvW
  • GetCommandLineA
  • GetCommandLineW
  • g_shell_parse_argv
  • osl_getCommandArg
  • osl_getCommandArgCount
  • rb_argv
  • StringToArgv
  • _acmdln
  • _wcmdln
  • __argc
  • __argv
  • __p__acmdln
  • __p__wcmdln
  • __p___argc
  • __p___argv
  • __p___wargv
  • __wargv

Why would we need these?

Many programs require command line arguments to run. Sandboxes that can't recognize this will fail to produce an accurate report. Not only does some malware use this trick on purpose, there are also tons of good programs that end up in sandbox repositories and never get properly analyzed (e.g. compiled coursework from IT students, or native OS binaries).

Sandboxes that recognize programming frameworks & the way they parse command line arguments are in a better position to analyze such samples. This is because there is at least a theoretical possibility of heuristically determining whether a sample requires command line arguments, or whether it accepts any at all. At the very least, they should hint at that in their reports.

There are some command line arguments that are universal and can be guessed, e.g. /? or /h. Others require a lot of reversing, since a program's logic is often hidden under many layers of code and nested calls.

What kind of heuristics can we come up with?

For instance, if the API called immediately after GetCommandLine is ExitProcess, then chances are the program requires command line arguments.
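The sample-side pattern such a heuristic would target could look roughly like this (the space-based check is deliberately crude and my own simplification):

```c
// The call pattern the heuristic looks for: GetCommandLine followed
// almost immediately by ExitProcess when no arguments are present
#include <windows.h>
#include <string.h>

int main(void)
{
    char *cmd = GetCommandLineA();

    // Crude check: no space after the image name means no arguments were
    // passed (quoted paths containing spaces would break this - a sketch)
    if (strchr(cmd, ' ') == NULL)
        ExitProcess(1);

    // ... the real logic is only reached when arguments are supplied ...
    return 0;
}
```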

If we can determine the location and internal layout of the WinMain or main function, and then also of the argc variable (using e.g. signatures, hooking, emulation, or stack monitoring), we can attempt to trace accesses to this variable. When an access is detected, we can try to analyze the code that uses the variable's value. If our sample exits almost immediately after this comparison, the program most likely requires command line arguments.
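Tracing argc accesses requires real instrumentation, but a much cruder black-box approximation of the same 'exits almost immediately' signal can be had with nothing more than a timed wait (the 500 ms threshold is an arbitrary assumption, and this is not the argc-tracing approach itself):

```c
// Crude black-box proxy: launch the sample with no arguments and
// observe how quickly it terminates
#include <windows.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s <sample.exe>\n", argv[0]); return 1; }

    STARTUPINFOA si = { sizeof(si) };
    PROCESS_INFORMATION pi;

    if (!CreateProcessA(argv[1], NULL, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi))
        return 1;

    // An immediate exit with no arguments is a (weak) hint that the
    // sample checked its command line and bailed out
    DWORD wait = WaitForSingleObject(pi.hProcess, 500);
    DWORD code = 0;
    if (wait == WAIT_OBJECT_0 && GetExitCodeProcess(pi.hProcess, &code))
        printf("exited almost immediately (code %lu) - may require arguments\n", code);
    else
        printf("still running after 500 ms - arguments may be optional\n");

    TerminateProcess(pi.hProcess, 0);   // no-op if it already exited
    CloseHandle(pi.hProcess);
    CloseHandle(pi.hThread);
    return 0;
}
```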

Other possibilities could involve:

  • monitoring of dedicated parsing routines, e.g. the getopt function, but also the many inline functions embedded in popular frameworks
  • string detection for popular arguments, e.g. /s, -embedding
  • string detection for help information, e.g.: usage: (a sketch of this kind of scan follows this list)
  • detection of installer type, version (they usually accept some command line arguments that are predefined)
  • fuzzy comparison against known files (if we know sample X required command line arguments, chances are that a similar file will too)
  • ‘reverse proof’ of no CLI requirement
    • if it calls GUI functions then less likely to wait for arguments (but may still accept them)
    • if it is an installer, then we typically know how to handle it (e.g. using clickers)
    • if it is a driver – no command line arguments
    • if it is a DLL, most likely no command line processing (BUT some of the exported functions do rely on command line arguments!)
  • etc.
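As promised above, here is a naive sketch of the string-detection heuristics (the hint list is my own, and a real implementation would also need to scan for UTF-16 strings):

```c
// Naive string-detection heuristic: scan a binary for tell-tale
// argument-related strings (illustrative hint list)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static const char *hints[] = { "usage:", "Usage:", "-embedding", "getopt" };

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

    FILE *f = fopen(argv[1], "rb");
    if (!f) return 1;

    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fseek(f, 0, SEEK_SET);

    unsigned char *buf = malloc(size);
    if (!buf || fread(buf, 1, size, f) != (size_t)size) return 1;
    fclose(f);

    // Brute-force byte scan; strstr won't work on binaries with NUL bytes
    for (long i = 0; i < size; i++)
        for (size_t h = 0; h < sizeof(hints) / sizeof(hints[0]); h++)
        {
            long len = (long)strlen(hints[h]);
            if (i + len <= size && memcmp(buf + i, hints[h], len) == 0)
                printf("hint '%s' at offset 0x%lx\n", hints[h], i);
        }

    free(buf);
    return 0;
}
```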

Overall, this is a non-trivial task and the chances of offering a generic solution are very poor, but it is a good idea to at least flag such files for manual analysis – either in-house, or in a report for the client.