Sandboxing | Hexacorn

This is just a simple proof of concept that can be extended to build a full-blown Delphi API Monitor.

Delphi lives in its own API ecosystem. Reversing Delphi applications requires a dedicated tool/decompiler (e.g. IDR) and FLIRT signatures, and most of this work relies on the DCU32INT decompiler. When I built my sandbox and wanted to add Delphi support, I created mini-signatures for some of the more crucial Delphi APIs. Any time a Delphi app was analyzed, I’d look for these code patterns, patch them with my API hook, and then observe the results (I described it here).

With the arrival of new reversing tools, we have an opportunity to revisit this topic and rapidly produce a prototype of a Delphi API monitor that is fast, robust and covers most angles.

Before we begin, a couple of points:

Multiple versions of the same API exist:
- sometimes it’s just a different binary encoding of the same function that made it into Delphi DCUs, e.g. LStrFromString:
  - 870C245131C98A0A42E9xxxxxxxxC3
  - FF3424894C240431C98A0A42E9xxxxxxxxC3
- you may also come across differences in API declarations e.g.:
  - LStrLen (const S: AnsiString)
  - LStrLen (const S: string)

Delphi APIs use a different calling convention ("register", a.k.a. Borland fastcall), so we need to take it into account while writing Frida handlers: Delphi passes the first three arguments in the EAX, EDX and ECX registers, so for two-argument functions EAX and EDX are the registers to read
Strings used by Delphi are encoded differently than in C, a common one (ShortString) being the length of the string encoded in the first byte, followed by the actual characters (there are others, e.g. AnsiString, which keeps a 32-bit length right before the character data)
The Frida hooking engine accepts both Win32 module/API names and addresses; addresses need to be provided as RVA offsets within the monitored module, e.g. -a foo.exe!address
For each foo.exe!address, you need a respective handler called sub_<address>.js, e.g. sub_34AF.js.
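To make the two string layouts concrete, here is a minimal sketch in plain JavaScript (runnable under Node.js, no Frida APIs involved) that decodes a ShortString and an AnsiString from a raw byte snapshot. It assumes the AnsiString header keeps a 32-bit reference count at offset -8 and a 32-bit length at offset -4 relative to the first character. The function names are hypothetical. Note that the ShortString length byte is unsigned: a 200-character string has a length byte of 0xC8, which a signed read would turn into -56.

```javascript
// Minimal sketch (hypothetical helpers): decode Delphi strings from a byte
// snapshot, e.g. bytes copied out of the target process. Node.js only.

// ShortString: [length byte][characters...]. The length byte is unsigned
// (0..255); Uint8Array elements are unsigned, so bytes[offset] is safe.
function decodeShortString(bytes, offset = 0) {
  const len = bytes[offset];
  const chars = bytes.subarray(offset + 1, offset + 1 + len);
  return String.fromCharCode(...chars); // each byte as Latin-1 (approximates ANSI)
}

// AnsiString: the variable points at the first character; assumed layout is
// [refcount:int32][length:int32][chars...][NUL], little-endian.
// `ptrOffset` is the index of the first character inside `bytes`.
function decodeAnsiString(bytes, ptrOffset) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const len = view.getInt32(ptrOffset - 4, true); // length sits 4 bytes before
  const chars = bytes.subarray(ptrOffset, ptrOffset + len);
  return String.fromCharCode(...chars);
}

// ShortString "Hello", laid out by hand.
const shortStr = Uint8Array.from([5, 0x48, 0x65, 0x6c, 0x6c, 0x6f]);
console.log(decodeShortString(shortStr)); // Hello

// AnsiString "abc": refcount -1 (string literal), length 3, chars, NUL.
const ansi = Uint8Array.from([
  0xff, 0xff, 0xff, 0xff, 3, 0, 0, 0, 0x61, 0x62, 0x63, 0x00,
]);
console.log(decodeAnsiString(ansi, 8)); // abc
```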

With that, we just need to find an application to test and write our first handler.

The old Resource Hacker is written in Delphi. Using IDA, we can quickly identify one of its comparison functions, PStrCmp, at address 0x004029E0 (RVA=29E0):

An example handler that logs the calls to this API together with their parameters can look like this:

{
  onEnter(log, args, state) {
    // Delphi "register" convention: 1st argument in EAX, 2nd in EDX.
    // Both point to ShortStrings: [unsigned length byte][characters].
    const eaxLen = this.context.eax.readU8();
    const edxLen = this.context.edx.readU8();
    const eaxStr = this.context.eax.add(1).readAnsiString(eaxLen);
    const edxStr = this.context.edx.add(1).readAnsiString(edxLen);
    console.log(this.context.eip + ": " + eaxStr + " " + edxStr);
  },
  onLeave(log, retval, state) {
  }
}

Now if we launch rsold.exe under frida-trace:

frida-trace c:\test\rsold.exe c:\windows\notepad.exe -a rsold.exe!2A64

This tells frida-trace to launch the old Resource Hacker (rsold.exe), make it open the resources of c:\windows\notepad.exe, and add an API hook for PStrCmp (RVA=29E0 -> handlers\rsold.exe\sub_2a64.js). We get a result like this:

Now that we know what we can do with it, there are at least two avenues we can follow:

Write an IDAPython script that exports handlers for a given binary and for our APIs of choice
Use DCU32INT to export code for the functions of interest from as many Delphi/CodeGear/Embarcadero versions as possible, convert it into regular expressions (or leverage YARA) to build signatures, find these signatures inside target Delphi PE files, convert the file offsets of matched hits to RVA offsets, and finally export handlers for all functions of interest (no need for IDA in this case)
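The signature-based avenue boils down to two small steps. The first is scanning the file for byte patterns in which `xx` marks a wildcard byte, such as the two LStrFromString encodings above, where the wildcard bytes hold the displacement of the E9 relative JMP. The second is translating each hit's file offset into an RVA via the PE section table. Here is a minimal sketch, assuming the section headers have already been parsed into plain objects. `compilePattern`, `findAll`, `fileOffsetToRva` and the sample section layout are all hypothetical. A naive byte-by-byte scan is used for clarity; YARA or compiled regular expressions would be the faster route on real samples.

```javascript
// Minimal sketch (hypothetical helpers): wildcard signatures + offset -> RVA.
// Pattern syntax (assumed here): hex byte pairs, "xx" = any single byte.
function compilePattern(pattern) {
  const sig = [];
  for (let i = 0; i < pattern.length; i += 2) {
    const pair = pattern.slice(i, i + 2);
    sig.push(pair.toLowerCase() === "xx" ? null : parseInt(pair, 16));
  }
  return sig; // null entries are wildcards
}

// Return every file offset where the pattern matches.
function findAll(data, pattern) {
  const sig = compilePattern(pattern);
  const hits = [];
  for (let off = 0; off + sig.length <= data.length; off++) {
    let ok = true;
    for (let j = 0; j < sig.length; j++) {
      if (sig[j] !== null && data[off + j] !== sig[j]) { ok = false; break; }
    }
    if (ok) hits.push(off);
  }
  return hits;
}

// Map a raw file offset to an RVA using the section whose raw data holds it.
function fileOffsetToRva(offset, sections) {
  for (const s of sections) {
    if (offset >= s.pointerToRawData && offset < s.pointerToRawData + s.sizeOfRawData) {
      return offset - s.pointerToRawData + s.virtualAddress;
    }
  }
  return null; // not inside any section's raw data (headers not handled here)
}

const LSTR_FROM_STRING = [
  "870C245131C98A0A42E9xxxxxxxxC3",
  "FF3424894C240431C98A0A42E9xxxxxxxxC3",
];

// Hypothetical file image with the first variant planted at offset 0x500.
const file = new Uint8Array(0x2400);
file.set([0x87, 0x0c, 0x24, 0x51, 0x31, 0xc9, 0x8a, 0x0a, 0x42,
          0xe9, 0x12, 0x34, 0x56, 0x78, 0xc3], 0x500);
const sections = [
  { name: ".text", virtualAddress: 0x1000, pointerToRawData: 0x400, sizeOfRawData: 0x2000 },
];

for (const p of LSTR_FROM_STRING) {
  for (const off of findAll(file, p)) {
    const rva = fileOffsetToRva(off, sections).toString(16);
    console.log(`LStrFromString at file offset 0x${off.toString(16)} -> -a foo.exe!${rva} (sub_${rva}.js)`);
  }
}
```

Each hit then becomes one `-a module!rva` argument plus one generated handler file.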

What are interesting APIs to handle?

We could start with strings; these are often great for understanding the inner workings of programs:

LStrCat
LStrFromPWChar
LStrFromPWCharLen
LStrCatN
LStrCat3
LStrSetLength
LStrFromPChar
LStrAsg
LStrCopy
LStrCmp
LStrLAsg
LStrInsert
LStrDelete
LStrArrayClr
LStrToPChar
LStrFromPCharLen
LStrClr
LStrFromWArray
LStrFromWStr
LStrFromArray
LStrFromChar
LStrFromWChar
LStrFromUStr
LStrAddRef
LStrToString
LStrFromString
LStrEqual
LStrPos
LStrLen
LStrFromLenStr
LStrOfChar

File operations are of interest as well, e.g.:

ChangeFileExt
CreateDir
DateTimeToFileDate
DeleteFile
DiskFree
DiskSize
ExpandFileName
ExpandUNCFileName
ExtractFileDir
ExtractFileDrive
ExtractFileExt
ExtractFileName
ExtractFilePath
FileAge
FileClose
FileCreate
FileDateToDateTime
FileExists
FileGetAttr
FileGetDate
FileOpen
FileRead
FileSearch
FileSeek
FileSetAttr
FileSetDate
FileWrite
FindClose
FindFirst
FindNext
GetCurrentDir
RemoveDir
RenameFile
SetCurrentDir

Okay, we can dump heap buffers. What’s next?

What about a sandbox-like IOC generator & payload dumper? In its most basic version, we will run a sample and our handlers will spit out the names of all files opened by the analyzed program. They will also dump the buffers read from and written to these files. And for good measure, we will try to convert some of the file creation flags/arguments passed to the APIs so we get a more readable log.

To dump the list of files being opened, I will focus on handling the CreateFileA and CreateFileW APIs. I chose these APIs for a couple of reasons:

They are very commonly used and are easy to test
CreateFileA & CreateFileW exist inside kernel32.dll
CreateFileA & CreateFileW exist inside kernelbase.dll
You may hook them all, or pick just one of each pair; too many hooks is not good, though, so this duplication introduces its own challenges
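If you do hook both the kernel32.dll and kernelbase.dll copies, one logical call can appear twice in the log whenever one copy forwards into the other. One way to tame this is a per-thread nesting counter that only lets the outermost call through. Below is a minimal sketch of that bookkeeping in plain JavaScript. It assumes each handler's onEnter/onLeave calls enter()/leave() with the current thread id (in a Frida handler, `this.threadId`), and that entries and exits nest properly per thread. `CallGuard` is a hypothetical helper, not a Frida API.

```javascript
// Minimal sketch (hypothetical helper): log only the outermost of nested
// hooked calls on a thread, e.g. a kernel32 CreateFileW that forwards
// into the kernelbase copy, when both copies are hooked.
class CallGuard {
  constructor() {
    this.depth = new Map(); // threadId -> current nesting depth
  }

  // Returns true if this entry is the outermost one and should be logged.
  enter(threadId) {
    const d = (this.depth.get(threadId) || 0) + 1;
    this.depth.set(threadId, d);
    return d === 1;
  }

  // Returns true if this exit closes the outermost call.
  leave(threadId) {
    const d = (this.depth.get(threadId) || 0) - 1;
    if (d <= 0) {
      this.depth.delete(threadId);
      return true;
    }
    this.depth.set(threadId, d);
    return false;
  }
}

// Simulated sequence: thread 100 makes one nested (forwarded) call,
// thread 200 makes a single call in between.
const guard = new CallGuard();
const events = [];
if (guard.enter(100)) events.push("enter 100 (kernel32)");
if (guard.enter(100)) events.push("enter 100 (kernelbase)"); // suppressed
if (guard.enter(200)) events.push("enter 200");
if (guard.leave(100)) events.push("leave 100 (kernelbase)"); // suppressed
if (guard.leave(200)) events.push("leave 200");
if (guard.leave(100)) events.push("leave 100 (kernel32)");
console.log(events);
```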

To test the handlers (a copy is provided at the end of this post), just run:

frida-trace -i RtlFreeHeap -i RtlAllocateHeap -i KERNEL32.DLL!CreateFileA -i KERNEL32.DLL!CreateFileW -i KERNEL32.DLL!WriteFile -i KERNEL32.DLL!ReadFile -f <exe> > log

As with buffers, we will store file handles in a table at the time a file is created/opened. We will then look up these handles when the file is read or written, so we can log actual file names rather than just file handles. In my old sandbox, I used injected code relying on NtQueryObject, executed in the context of the target process whenever the Read/Write APIs were called. But then again, I had to inject my code into that process and hook the APIs before the malicious implant took over. Pretty complicated.
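The handle-to-name table is essentially a map populated in CreateFile's onLeave (where the returned handle is known) and consulted in the ReadFile/WriteFile onEnter. A minimal sketch of that bookkeeping in plain JavaScript, keyed by PID and handle; `HandleTable` and its methods are hypothetical names. Since the frida-trace command in this post does not hook CloseHandle, the sketch simply lets a later CreateFile that returns a reused handle value overwrite the old entry. Failed opens (INVALID_HANDLE_VALUE) are recorded in the object list but never mapped.

```javascript
// Minimal sketch (hypothetical helper): remember which name each file handle
// was opened with, so ReadFile/WriteFile records can show names, not handles.
const INVALID_HANDLE_VALUE = 0xffffffff; // (HANDLE)-1 as a 32-bit value

class HandleTable {
  constructor() {
    this.names = new Map(); // "pid:handle" -> file name
    this.objects = [];      // every CreateFile attempt, successful or not
  }

  static key(pid, handle) {
    return `${pid}:${handle >>> 0}`; // treat -1 and 0xFFFFFFFF alike
  }

  // Call from the CreateFileA/W onLeave handler, once the handle is known.
  onCreate(pid, handle, fileName) {
    const h = handle >>> 0;
    const hex = h.toString(16).toUpperCase().padStart(8, "0");
    this.objects.push(`${pid} 0x${hex} ${fileName}`);
    if (h !== INVALID_HANDLE_VALUE) {
      // Handle values get reused, so a newer open overwrites the entry.
      this.names.set(HandleTable.key(pid, h), fileName);
    }
  }

  // Call from the ReadFile/WriteFile onEnter handler.
  lookup(pid, handle) {
    const name = this.names.get(HandleTable.key(pid, handle));
    return name !== undefined ? name : `<unknown handle 0x${(handle >>> 0).toString(16)}>`;
  }
}

const table = new HandleTable();
table.onCreate(4242, 0x1c4, "C:\\test\\config.ini");
table.onCreate(4242, -1, "C:\\test\\missing.dll"); // failed open
console.log(table.lookup(4242, 0x1c4)); // C:\test\config.ini
console.log(table.objects.join("\n"));
```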

Anyway… since we can map file handles to file names, we can now write the buffers/arguments to separate files (one file stores the list of files/objects, the other the actual file buffers). And for the fun of it, we will dump file buffers in hex and include the PID, the TID and, of course, the file name in our log:
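A minimal formatting sketch for such a buffer log: a header line with PID, TID, API and file name, followed by a 16-bytes-per-row hex dump. The layout and the `hexDump`/`formatBufferLog` names are assumptions, not necessarily the format the downloadable handlers use. Inside a Frida handler the bytes could come from `new Uint8Array(buf.readByteArray(size))` (for ReadFile, in onLeave, after the data has actually been read), the PID from `Process.id` and the TID from `this.threadId`. Frida also ships a built-in `hexdump()` helper if its default layout is good enough.

```javascript
// Minimal sketch (hypothetical layout): header + 16-bytes-per-row hex dump.
function hexDump(bytes) {
  const rows = [];
  for (let off = 0; off < bytes.length; off += 16) {
    const chunk = bytes.subarray(off, off + 16);
    const hex = Array.from(chunk, (b) => b.toString(16).padStart(2, "0")).join(" ");
    // Printable ASCII as-is, everything else as "."
    const ascii = Array.from(chunk, (b) =>
      b >= 0x20 && b < 0x7f ? String.fromCharCode(b) : ".").join("");
    // 16 bytes -> 47 hex characters; pad short rows so the ASCII column lines up.
    rows.push(`${off.toString(16).padStart(8, "0")}  ${hex.padEnd(47)}  ${ascii}`);
  }
  return rows.join("\n");
}

function formatBufferLog(pid, tid, api, fileName, bytes) {
  return `[PID=${pid} TID=${tid}] ${api} ${fileName} (${bytes.length} bytes)\n${hexDump(bytes)}`;
}

const data = Uint8Array.from([0x4d, 0x5a, 0x90, 0x00, 0x68, 0x69]);
console.log(formatBufferLog(4242, 4300, "WriteFile", "C:\\test\\out.bin", data));
```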

The list of objects (and the file handle to file name mapping) created via the CreateFile APIs is stored in the objects_list.txt file:

You may notice that some of the handles are 0xFFFFFFFF (INVALID_HANDLE_VALUE); these failed to open. It’s an interesting result: you will see not only existing files being accessed, but also those that don’t exist. Let me reiterate: these are calls to the CreateFile API to access _some_ files or directories that may not be present on the system. Pretty much like Procmon, but a bit easier to read, and far easier to adapt the output to our needs. The value of such a log in security research cannot be overstated: it can help find references to non-existing files, phantom libraries, anti-debugging strings (e.g. device names), etc.

Finally, our attribute/flags resolution code works as well:

The screenshot below shows how this works in practice: the dwDesiredAccess value of 0x80000000 is translated to ‘GENERIC_READ’:
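The flag translation is a table lookup over well-known constants from the Windows SDK headers. Here is a minimal sketch covering the generic rights in dwDesiredAccess, plus dwShareMode and dwCreationDisposition. The function and table names are hypothetical, and a fuller version would add specific access rights and the FILE_ATTRIBUTE_* and FILE_FLAG_* values. JavaScript bitwise operators work on signed 32-bit integers, hence the `>>> 0` needed to compare against 0x80000000.

```javascript
// Minimal sketch (hypothetical names; constant values from the Windows SDK).
const GENERIC_RIGHTS = [
  [0x80000000, "GENERIC_READ"],
  [0x40000000, "GENERIC_WRITE"],
  [0x20000000, "GENERIC_EXECUTE"],
  [0x10000000, "GENERIC_ALL"],
];
const SHARE_MODES = [
  [0x1, "FILE_SHARE_READ"],
  [0x2, "FILE_SHARE_WRITE"],
  [0x4, "FILE_SHARE_DELETE"],
];
const DISPOSITIONS = {
  1: "CREATE_NEW",
  2: "CREATE_ALWAYS",
  3: "OPEN_EXISTING",
  4: "OPEN_ALWAYS",
  5: "TRUNCATE_EXISTING",
};

// Decode a bitmask into "A|B|0x..." form; bits not in the table stay as hex.
function decodeFlags(value, table) {
  let rest = value >>> 0; // force unsigned 32-bit
  const names = [];
  for (const [bit, name] of table) {
    if (((rest & bit) >>> 0) === bit) {
      names.push(name);
      rest = (rest & ~bit) >>> 0;
    }
  }
  if (rest !== 0) names.push("0x" + rest.toString(16).toUpperCase());
  return names.length ? names.join("|") : "0";
}

// dwCreationDisposition is an enumeration, not a bitmask.
function decodeDisposition(value) {
  return DISPOSITIONS[value] || "0x" + (value >>> 0).toString(16);
}

console.log(decodeFlags(0x80000000, GENERIC_RIGHTS)); // GENERIC_READ
console.log(decodeFlags(0x3, SHARE_MODES));           // FILE_SHARE_READ|FILE_SHARE_WRITE
console.log(decodeDisposition(3));                    // OPEN_EXISTING
```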

Now, before we get too excited about our ‘building our own sandbox’ experience… let me mention that there are caveats. One of them is that Frida doesn’t always work. For the purpose of this article, I tried to run my handlers against the pafish.exe executable and… it just got stuck:

I wanted to test pafish because it refers to a number of device names associated with virtualized guest OSes, which it uses to detect virtualization:

and

So I thought I could output all these referenced device names and show how cool the handlers can be. Then the main pafish.exe process got stuck, and that’s about it. So, you have been warned.

Still, I have never worked with such a rapid prototyping tool and hooking engine in one. It’s amazing what you can do with a few lines of JavaScript.

You can download my test handlers from here.


Delphi API monitoring with Frida

Memory buffers for… initiated, part 3 – Frida(y) edition