Delphi API monitoring with Frida, Part 2

In my previous post I have demoed a simple example of Frida-based Delphi API monitor. Let’s look at one more example — this time the strings are stored in a different way, with a 32-bit length preceding the actual text which is passed via eax/edx to the utility functions e.g. LStrAsg and LStrLAsg. These 2 functions are very interesting, because they are often used to initialize strings in Delphi programs. By simply monitoring their input we can sometimes guess what the program’s functionality is.

Let’s take 0000E1A0B0CAEFD82B9E71098E92EE08FA3C47B889108B450712B4D9C3AE4D6E sample as an example. A small Delphi downloader that is not really worth any attention, but it’s perfect to demo Frida’s capabilities.

Applying Delphi Flirt signatures to the sample tells us that LStrAsg is located under 0x00403B3C (0x3B3C) and LStrLAsg is located under 0x00403B80 (0x3B80). We can create a quick&dirty handler for both of these functions – it can look like this:

onEnter(log, args, state) {
  edx_len = this.context.edx.sub(4).readS32();
  if (edx_len>0&&edx_len<256)
  {
    edx_str = this.context.edx.readUtf8String(edx_len);
    console.log(JSON.stringify(edx_str));
  }
},

We are taking the value of edx (second argument to both functions), read the length of the string stored as a 32-bit integer at a memory offset at (edx-4), and then we read the UTF8 string from the edx location.

If we now run Frida-trace (sample renamed to 1.exe) we get the following output:

Not bad. We can almost immediately tell there is a network connectivity (user agent) and possible destination for the downloaded payload.

Combine it with InternetOpenUrlA API monitoring (which should be done by default for any Windows binary), we get a really simple and nice answer to ‘what is this program doing?’ question — that’s what sandboxes are for, right?

frida-trace c:\test\1.exe 1.exe -a 1.exe!3B80 -a 1.exe!3B3C -i wininet.dll!InternetOpenUrlA

Yes, it is THAT simple.

Kudos to Frida developers, you have created something truly wonderful!

Analysing NSRL data set for fun and because… curious, Part 2

This is the second post discussing what we can find inside the NSRL data set.

At this stage we know it’s not only file hashes, but also sections of executables and java .class files stored inside JAR files. Digging a bit more in the file name statistics we find there is another subset of hashes that is quite substantial: MSI tables. They happen to be named in a very specific way i.e. with an exclamation point as a prefix. There are 350K entries like this:

35666 "!_StringData"
26126 "!_StringPool"
14844 "!File"
11991 "!Property"
11530 "!Error"
9987 "!Media"
9545 "!MsiFileHash"
9063 "!Feature"
9026 "!InstallExecuteSequence"
8949 "!Component"
8844 "!Registry"
8822 "!CustomAction"
8781 "!Directory"
8702 "!FeatureComponents"
7535 "!UIText"
7080 "!_Columns"
7009 "!_Tables"
6609 "!Binary"
5837 "!Control"
5540 "!AdvtExecuteSequence"
5528 "!Upgrade"
5500 "!_Validation"
5236 "!RegLocator"
5174 "!ActionText"
5085 "!InstallUISequence"
4843 "!AdminExecuteSequence"
4704 "!CreateFolder"
4638 "!Dialog"
4606 "!RadioButton"
4087 "!AppSearch"

While it may look like not a lot, if we exclude all file names that start with an exclamation mark, dot (a bit unfair, but a good estimate for section names), and .class files we drop the number of entries by nearly 30%:

192,677,750 - all
 57,390,395 - !<filename>, .<filename>, <filename>.class
135,287,355 - excluding !<filename>, .<filename>, <filename>.class

We could further narrow it down by excluding filenames starting with bracketed numbers f.ex. [5]SummaryInformation or underscores f.ex. __DATA__la_symbol_ptr. There is also a substantial number of filenames that are just numbers, numbers with media file extensions (f.ex. 1494.bmp) or are one way or another related to executable resources (manifest.txt, version.txt, CERTIFICATE, VERSION, etc.).

Another area of interest are files that will be most likely always uniquely bound to the NSRL test systems where they were generated and will never appear on other systems f.ex. files with the following properties. <filename>.pyc, <filename>.pyo, .gitignore.

Kudos to whoever is responsible for maintaining the NSRL set. It is an incredibly difficult task to build a list of good hashes. It’s tempting to unpack, decompile, debundle everything and sometimes this activity may just generate a little bit too much noise. I hope that unless I missed something, future versions of the set will include a flag for each entry to indicate whether the file is an embedded resource or a regular file system object. And another useful entry would be a parent. So that one could rebuild a tree of parent-child relationships leading it back to original ancestor file f.ex. if we start with a .msi file that includes a .zip file that includes a bunch of PE files, we could trace it back from the PE file back to .zip file and tgen to .msi and vice versa.

But hey… what does ‘file’ even mean today? Is a .class inside .JAR file a file system object? Or a resource hidden from the file system by the packager abstraction layer?