Not installing the installers, part 2

In the last post I described how we can pull some interesting metadata from decompiled installers. Today I want to discuss one practical example of how this data can enrich analysis, both manual and automatic (f.ex. sandbox, EDR).

Many programs cannot be properly analyzed by sandboxes, because they require command line arguments. While command line options for native Windows OS binaries are usually well documented (well, not really, there is a lot of undocumented stuff, but let’s forget about it for a second), command line options used by goodware are a completely different story. And of course, it’s even worse for malware.

The good news about goodware is that it handles command line arguments in a very predictable way. The string comparisons are usually ‘naive’, direct and not optimized, and often the programs include actual help that can be seen after running the program with the /?, /help, -h, --help arguments. And very often, a search for the ‘usage’ keyword inside the binary can help us find the options that the program recognizes. f.ex. this is what we see inside cscript.exe:

Predictable is good, and can serve at least a few purposes:

  • we can generate a list of known parameter strings that goodware typically uses (and even attribute it to specific software vendors)
  • we can create yara signatures for these
  • we can incorporate this set into EDR command line parsing routines to assess an invocation’s similarity to known good
  • we can also leverage this to run the sample in a sandbox or manually with these easily discovered command line arguments (kinda like assisted fuzzing)
  • we can include these findings in the report itself to hint to analysts that they may need to do some manual reversing (it would seem the program accepts the following command line arguments…)
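The EDR idea from the list above could be sketched roughly like this. It is only a minimal illustration: the function name `knownSwitchRatio` and the tiny set of switches are made up for the example and do not come from the attached list, and a real parser would need to handle quoting and options that take values far more carefully.

```javascript
// Hypothetical scoring helper: what share of the switches on a command line
// appear in a set of switches already seen in known-good software?
function knownSwitchRatio(commandLine, knownSwitches) {
  const switches = commandLine
    .split(/\s+/)
    // Keep only tokens that look like options (/foo or -foo).
    .filter((token) => /^[\/-]/.test(token))
    // Drop an attached value (/log=x, /log:x) and normalize case.
    .map((token) => token.split(/[=:]/)[0].toLowerCase());
  if (switches.length === 0) return null; // nothing to score
  const hits = switches.filter((s) => knownSwitches.has(s)).length;
  return hits / switches.length;
}

// Illustrative data only, not taken from the real list.
const knownSwitches = new Set(['/s', '/quiet', '/norestart']);
const ratio = knownSwitchRatio('setup.exe /S /NORESTART /xyz', knownSwitches);
console.log(ratio);
```

A ratio close to 1 would suggest the invocation only uses switches seen before; what threshold is useful in practice is something that would have to be tuned on real data.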

Looking for typical command line arguments is actually quite difficult. There are a lot of ways to implement string comparisons and, as I explained a long time ago in one of my sandbox series, there are like a gazillion different string functions out there. Plus, different compilers and different optimizations make the code even harder to comprehend. A naive search for /[a-zA-Z_0-9]_/ could work on a binary level, but this is going to hit a lot of FPs. Decompiled scripts can come to the rescue, as they include actual invocations of programs and specify precisely what parameters will be passed to the program.

The attached list focuses on a basic command line argument extraction (just the /foo part) from around 10K decompiled scripts. More advanced analysis would include options taking parameters (f.ex. /foo bar).
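To give an idea of what such a basic extraction might look like, here is a minimal sketch. It is not the script used to build the list: the helper name `extractSwitches`, the regular expression and the two sample command lines are all assumptions made for illustration.

```javascript
// Hypothetical helper: pull "/switch" style tokens out of command lines
// recovered from decompiled installer scripts and count each one.
function extractSwitches(commandLines) {
  const counts = new Map();
  // A switch here is "/" + a letter + letters/digits/underscores,
  // and it must start the line or follow whitespace.
  const switchPattern = /(?:^|\s)(\/[A-Za-z][A-Za-z0-9_]*)/g;
  for (const line of commandLines) {
    for (const match of line.matchAll(switchPattern)) {
      const name = match[1].toLowerCase();
      counts.set(name, (counts.get(name) || 0) + 1);
    }
  }
  return counts;
}

// Made-up invocations in the style of installer scripts.
const sampleLines = [
  '"$INSTDIR\\setup.exe" /S /NORESTART',
  'msiexec.exe /i "$TEMP\\app.msi" /quiet /norestart',
];
const switchCounts = extractSwitches(sampleLines);
for (const [name, count] of switchCounts) {
  console.log(name, count);
}
```

Requiring whitespace before the slash keeps URLs and most Windows paths out of the results, but anything that looks like a Unix-style path would still slip through, so the output needs a manual sanity check.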

You can download it from here.

Delphi API monitoring with Frida, Part 2

In my previous post I demoed a simple example of a Frida-based Delphi API monitor. Let’s look at one more example. This time the strings are stored in a different way, with a 32-bit length preceding the actual text, which is passed via eax/edx to utility functions such as LStrAsg and LStrLAsg. These 2 functions are very interesting, because they are often used to initialize strings in Delphi programs. By simply monitoring their input we can sometimes guess what the program’s functionality is.

Let’s take the 0000E1A0B0CAEFD82B9E71098E92EE08FA3C47B889108B450712B4D9C3AE4D6E sample as an example. It is a small Delphi downloader that is not really worth any attention, but it’s perfect for demoing Frida’s capabilities.

Applying Delphi FLIRT signatures to the sample tells us that LStrAsg is located at 0x00403B3C (0x3B3C) and LStrLAsg is located at 0x00403B80 (0x3B80). We can create a quick&dirty handler for both of these functions. It can look like this:

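onEnter(log, args, state) {
  // edx is the second argument; the 32-bit length sits right before the text
  const edxLen = this.context.edx.sub(4).readS32();
  // Skip empty strings and implausibly long values
  if (edxLen > 0 && edxLen < 256) {
    const edxStr = this.context.edx.readUtf8String(edxLen);
    console.log(JSON.stringify(edxStr));
  }
},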

We take the value of edx (the second argument to both functions), read the length of the string, stored as a 32-bit integer at (edx-4), and then read the UTF-8 string from the location edx points to.
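If the layout is hard to picture, it can be simulated outside of Frida with a plain byte array. This is just a sketch of the idea: the helper names `makeLengthPrefixed` and `readLengthPrefixed` are made up, the test string is arbitrary, and little-endian byte order is assumed because the sample is a 32-bit x86 binary.

```javascript
// Build a fake memory region laid out as described above:
// a 32-bit little-endian length, immediately followed by the characters.
function makeLengthPrefixed(text) {
  const bytes = new TextEncoder().encode(text);
  const memory = new Uint8Array(4 + bytes.length);
  new DataView(memory.buffer).setInt32(0, bytes.length, true);
  memory.set(bytes, 4);
  return memory;
}

// Mirror of the handler logic: `ptr` plays the role of edx and points
// at the first character, so the length lives at ptr - 4.
function readLengthPrefixed(memory, ptr) {
  const view = new DataView(memory.buffer, memory.byteOffset, memory.byteLength);
  const len = view.getInt32(ptr - 4, true);
  // Same sanity bounds as the handler, plus a check against the buffer size.
  if (len <= 0 || len >= 256 || ptr + len > memory.length) return null;
  return new TextDecoder('utf-8').decode(memory.subarray(ptr, ptr + len));
}

const fakeMemory = makeLengthPrefixed('example string');
const recovered = readLengthPrefixed(fakeMemory, 4);
console.log(JSON.stringify(recovered));
```

The bounds check is there for the same reason as in the handler: edx does not always point at a string, and an arbitrary 32-bit value read from (edx-4) should not be trusted as a length.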

If we now run frida-trace (sample renamed to 1.exe) we get the following output:

Not bad. We can almost immediately tell there is network connectivity (a user agent) and a possible destination for the downloaded payload.

Combined with InternetOpenUrlA API monitoring (which should be done by default for any Windows binary), this gives us a really simple and nice answer to the ‘what is this program doing?’ question. That’s what sandboxes are for, right?

frida-trace c:\test\1.exe 1.exe -a 1.exe!3B80 -a 1.exe!3B3C -i wininet.dll!InternetOpenUrlA

Yes, it is THAT simple.
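For those who prefer a standalone script over frida-trace handlers, the same two hooks could probably be written along these lines. Treat it as an untested sketch: the offsets and the 1.exe module name come from the example above, the `HOOKS` table and `installHooks` function are names I made up, and the try/catch is there because edx will not always be a valid pointer.

```javascript
// Offsets from the FLIRT results above; the module name assumes the
// sample was renamed to 1.exe.
const HOOKS = [
  { name: 'LStrAsg', offset: 0x3B3C },
  { name: 'LStrLAsg', offset: 0x3B80 },
];

function installHooks() {
  const base = Process.getModuleByName('1.exe').base;
  for (const hook of HOOKS) {
    Interceptor.attach(base.add(hook.offset), {
      onEnter() {
        const edx = this.context.edx;
        if (edx.isNull()) return;
        try {
          // The 32-bit length is stored right before the text.
          const len = edx.sub(4).readS32();
          if (len > 0 && len < 256) {
            console.log(hook.name + ': ' + JSON.stringify(edx.readUtf8String(len)));
          }
        } catch (e) {
          // edx did not point at readable memory; ignore this call.
        }
      },
    });
  }
}

// Only install the hooks when running inside Frida's runtime.
if (typeof Interceptor !== 'undefined') {
  installHooks();
}
```

Saved as, say, hooks.js, it could be loaded with something like frida -f c:\test\1.exe -l hooks.js.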

Kudos to Frida developers, you have created something truly wonderful!