Enter Sandbox 26: Logs from 1.6M sandboxed samples – a bit of history, a bit of a requiem

July 20, 2019 in Sandboxing

Now that some of you have downloaded the file, we can have a look at its origin.

When I started coding my sandbox program I never thought it would go this far, or that I would keep improving it for so many years. It’s probably the biggest software project I have ever written and maintained, and… I rebuilt it at least 3 times from scratch ;). Each time I changed the direction & added lots of automation, new algorithms, etc.

While the goal was always pretty vague – to make my malware analysis faster – the real focus was always on… log readability & actionability. This requirement was always my #1 for a very simple reason – I needed a triage tool to tell me a story. If my tool can’t tell me a good story then it is a bad tool. Any other program that I need to engage to triage malware –> fail. As a result my tool failed many times, but each failure meant an improvement and fewer failures in the future.

It did work this way for many years and supported my personal and commercial work very well, but as the trends changed, the project itself started to get a bit rusty. It doesn’t work with 64-bit programs, and there is no support for OSX, Linux, Android, Metro, Java, Powershell, etc. And using it on newer versions of Windows (where it often breaks) only exposes what a big hack it really is/was. Time for it to retire.

I covered many aspects of sandboxing in my Enter Sandbox series. What follows is a bit of a summary of how all this stuff links together, including a bit of history.

The whole project is implemented in asm32. The automation / building is done primarily with perl. Some bits are maintained in Excel. And many bits are the result of months of processing large corpora of samples & painstaking analysis.

Anyway… back to the beginning.

In the early 2000s there were two main reliable ways to intercept API calls: debugging functions, or API hooking. Debugging was not an option at that time because of Armadillo, which was still quite big back then, and… my experiments highlighted a big issue which was… debugging API performance. It was not terrible, but there were too many wrappers & too much dependency on the OS –> not enough control/speed. On the other hand, patching APIs was fast. It was not easy, but it worked like a charm. While today there are plenty of libraries to help with this task in a coder-friendly way + you can literally learn & implement it instantly, back in the day you had to research and write most of it yourself – including your own length disassembler, injection module, IPC stack, handlers, etc. – lots of hours spent with a debugger to figure it out & make it work.
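The core of the inline-patching trick is tiny: overwrite the first bytes of the target function with a relative jump into your handler. A minimal sketch of just the offset arithmetic, in Python with made-up addresses (a real hook obviously also has to save the stolen bytes, build a trampoline, and write the patch into the target process):

```python
# Sketch: building the classic 5-byte x86 E9 rel32 jump used for inline
# API hooking. Addresses are hypothetical; on a real system they come from
# the loaded module, and the patch is written with OS memory-write APIs.
import struct

def make_jmp_patch(hook_from: int, hook_to: int) -> bytes:
    """Build an E9 rel32 jump from hook_from to hook_to.

    rel32 is relative to the address right after the 5-byte instruction.
    """
    rel32 = (hook_to - (hook_from + 5)) & 0xFFFFFFFF
    return b"\xE9" + struct.pack("<I", rel32)

# Example: redirect an API at 0x7C80AE40 to a handler at 0x10001000.
patch = make_jmp_patch(0x7C80AE40, 0x10001000)
```

This is also exactly why you need your own length disassembler: the 5 overwritten bytes rarely line up with instruction boundaries, so the trampoline has to copy whole instructions out of the way first.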

You see… when you hook an API, you may assume that there is only one way to do it. When you focus on the Windows API only, it’s relatively straightforward for functions with a fixed number of arguments. For C functions that use a variable number of arguments it is tougher + you have a number of libraries to hook in a kinda generic way (e.g. all the MFC versions). Then you have Delphi, with no prologues and epilogues that would be easy to patch. Then for COM you need to not only hook COM object instantiation APIs/methods, but also intercept callbacks from interfaces. Then there is .NET that pretends not to be a wrapper for Win32, but it actually is. For the Nullsoft installer API you need to build a dedicated engine too. For any less common libraries, e.g. specific Nullsoft plug-ins, Python libraries, etc., you need to analyze them one by one and add dedicated hooks too. And finally, functions that can’t be found via any exposed pointers can be hooked as well, but it’s very tricky – I had to build my own primitive FLIRT-like detection library to support this functionality. The inline functions are one thing, but let’s not forget RunPE, process hollowing, or code loaded manually, e.g. a manually mapped ntdll.dll. For these you need to discover them loaded/built/mapped in memory first, and then parse and hook functions after finding their code either via a manually resolved import table or signatures. It’s fun. Many debugging hours of fun.
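The FLIRT-like part is conceptually simple: byte signatures with wildcard positions, scanned over memory to find functions that expose no pointers. A minimal sketch (the signature and dump below are made up; my actual implementation was asm32 and far more involved):

```python
# Sketch of a primitive FLIRT-like matcher: a byte signature with wildcard
# positions (None) scanned over a memory dump to locate statically linked
# or inlined functions. The signature and the "dump" are illustrative.
def scan(memory: bytes, signature: list) -> list:
    """Return every offset where `signature` matches; None = wildcard byte."""
    hits = []
    n = len(signature)
    for off in range(len(memory) - n + 1):
        if all(sig is None or memory[off + i] == sig
               for i, sig in enumerate(signature)):
            hits.append(off)
    return hits

# Hypothetical x86 prologue: push ebp / mov ebp, esp / sub esp, <imm32>
sig = [0x55, 0x8B, 0xEC, 0x81, 0xEC, None, None, None, None]
dump = bytes([0x90, 0x90,                          # padding
              0x55, 0x8B, 0xEC, 0x81, 0xEC,        # prologue
              0x40, 0x01, 0x00, 0x00,              # imm32 (wildcarded)
              0xC3])
print(scan(dump, sig))  # -> [2]
```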

Then you need to handle tons of different types of arguments. If we just focus on strings, they are passed as ANSI, Unicode, PUNICODE_STRING, UTF8, MBCS, Pascal, BSTR, VARARG, and often handled via completely different calling conventions (different registers, for starters). And if that’s not enough, you want to intercept not only the calls, but also their returns. Again, there are many ways of doing it, but each little thing causes a new headache. (To be clear: not only intercepting the return code, but also the modified buffers.)
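To give a flavor of the string zoo, here is a sketch of a few of the decoders a hook handler ends up needing. The buffer layouts are simplified illustrations (e.g. a real BSTR pointer points past its length prefix), not exact ABI:

```python
# Sketch: decoding a few string flavors a hook handler has to deal with.
# Layouts are simplified for illustration, not byte-exact ABI.
import struct

def read_ansi(buf: bytes) -> str:       # NUL-terminated ANSI
    return buf.split(b"\x00", 1)[0].decode("latin-1")

def read_wide(buf: bytes) -> str:       # NUL-terminated UTF-16LE (Unicode)
    return buf.decode("utf-16-le").split("\x00", 1)[0]

def read_pascal(buf: bytes) -> str:     # 1-byte length prefix, no terminator
    return buf[1:1 + buf[0]].decode("latin-1")

def read_bstr(buf: bytes) -> str:       # 4-byte byte-count prefix, UTF-16LE
    n = struct.unpack_from("<I", buf)[0]
    return buf[4:4 + n].decode("utf-16-le")

print(read_ansi(b"socket\x00junk"))                                      # socket
print(read_wide("socket\x00".encode("utf-16-le")))                       # socket
print(read_pascal(b"\x06socket"))                                        # socket
print(read_bstr(struct.pack("<I", 12) + "socket".encode("utf-16-le")))   # socket
```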

I mentioned readability. To achieve that, I had to exclude lots of calls. At first it was just selective API logging. When that was not enough, I started adding basic value-based filtering to exclude the noisiest stuff. When this was not enough, I started excluding based on the callee address. For Windows API calls, I wouldn’t report NT functions. For C functions, I would not report Windows API functions. In other words, if a higher-level wrapper is called I try to log it instead of the low-level stuff that does the dirty work (and is noisy). When this was not enough, I started splitting logs into multiple files. One would log everything, another would log ‘story’ calls, another – memory buffers only, another one – anything that could quickly give me IOCs, or other hints. Hints often included addresses where to put a breakpoint in a debugger if I wanted to further analyze the sample manually. And so on and so forth. The report file I shared is the combined content of all ‘story’ reports.
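The wrapper-level filtering idea can be sketched in a few lines. Assuming each intercepted call record carries the module its return address falls into (the layering and records below are made up), the ‘story’ log keeps only calls made directly by the payload:

```python
# Sketch: callee/caller-based filtering. Keep calls whose return address
# falls inside the payload; drop calls made by system DLLs doing the
# dirty work (payload -> CreateFileA stays, kernel32 -> NtCreateFile goes).
# Records and module names are illustrative.
def story_calls(records):
    """Filter a call trace down to payload-originated calls only."""
    return [r for r in records if r["caller_mod"] == "payload"]

calls = [
    {"caller_mod": "payload",  "callee_mod": "kernel32", "api": "CreateFileA"},
    {"caller_mod": "kernel32", "callee_mod": "ntdll",    "api": "NtCreateFile"},
    {"caller_mod": "payload",  "callee_mod": "msvcrt",   "api": "fopen"},
]
for r in story_calls(calls):
    print(r["api"])
# prints CreateFileA and fopen; the NtCreateFile noise is dropped
```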

Then came a long process of building data enrichment layers. If the function returned a value or buffer content, I would include it. It sounds trivial, but it’s not always easy to do properly (e.g. APIs modify input buffers in unpredictable ways, some programs call APIs in a buggy way and you don’t want this to break the session, etc.).

To minimize the size of the report I would try to report return values on the same line as the call, if possible, e.g.:

GetProcAddress (mod=KERNEL32.dll, api=GetProcAddress)=7C80AE40

A good case for data enrichment is automatic API name resolution. If an API was resolved via an ordinal, I would add the API name to the log to speed up the interpretation/analysis – no one wants to resolve these manually:

GetProcAddress (mod=WS2_32.dll, api=#23 (socket))=71AB4211
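The enrichment behind that line is just an ordinal-to-name lookup against the module’s export table plus some formatting. A sketch (the export map here is a tiny hand-made excerpt, not a real dump):

```python
# Sketch: enriching a GetProcAddress log line when the API is requested
# by ordinal. EXPORTS is a tiny illustrative excerpt of an export table.
EXPORTS = {("WS2_32.dll", 23): "socket", ("WS2_32.dll", 3): "closesocket"}

def fmt_getprocaddress(mod: str, api, addr: int) -> str:
    if isinstance(api, int):                      # requested by ordinal
        name = EXPORTS.get((mod, api), "?")
        api = f"#{api} ({name})"
    return f"GetProcAddress (mod={mod}, api={api})={addr:08X}"

print(fmt_getprocaddress("WS2_32.dll", 23, 0x71AB4211))
# -> GetProcAddress (mod=WS2_32.dll, api=#23 (socket))=71AB4211
```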

Having API calls is one thing; having them with the appropriate flags shown makes the story much easier to read, so I added constant resolvers too, both for specific values and for bitmasks. Where applicable, paths would be converted to more readable forms using existing environment variables.
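Both of those resolvers are small lookups at heart. A sketch of each, with tiny illustrative tables (real ones cover hundreds of constants and all the standard environment variables):

```python
# Sketch: two enrichment helpers. Tables below are illustrative excerpts.
GENERIC = {0x80000000: "GENERIC_READ", 0x40000000: "GENERIC_WRITE"}

def fmt_flags(value: int, table: dict) -> str:
    """Turn a bitmask into OR-ed constant names; leftovers stay as hex."""
    names = [n for bit, n in table.items() if value & bit]
    rest = value & ~sum(b for b in table if value & b)
    if rest:
        names.append(hex(rest))
    return "|".join(names) if names else "0"

def compact_path(path: str, env: dict) -> str:
    """Replace a path prefix with a matching environment variable."""
    for var, val in sorted(env.items(), key=lambda kv: -len(kv[1])):
        if path.upper().startswith(val.upper()):
            return "%" + var + "%" + path[len(val):]
    return path

print(fmt_flags(0xC0000000, GENERIC))   # GENERIC_READ|GENERIC_WRITE
print(compact_path(r"C:\Windows\System32\cmd.exe", {"WINDIR": r"C:\Windows"}))
# -> %WINDIR%\System32\cmd.exe
```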

I also created an Excel dictionary with a curated list of interesting terms which I kept updating: process names, window classes, IPs, mutexes, atoms, etc. I used a very early version of this database in hexdive. To ensure these keywords were looked up quickly during run-time, I added support for the Aho-Corasick algorithm, and later added the PCRE library too. And with regards to output, I experimented with an SQLite database as well, but it was way too slow.
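Aho-Corasick is what makes a dictionary of thousands of keywords affordable at run-time: the whole set is matched in a single pass over each intercepted string. A minimal sketch of the automaton (my version was asm32; keyword list here is a tiny made-up sample):

```python
# Sketch: minimal Aho-Corasick - build a trie with failure links, then
# match every dictionary keyword in one pass over the input string.
from collections import deque

def build(keywords):
    trie = [{}]                      # node -> {char: next node}
    out = [set()]                    # node -> keywords ending here
    for kw in keywords:
        node = 0
        for ch in kw:
            if ch not in trie[node]:
                trie.append({})
                out.append(set())
                trie[node][ch] = len(trie) - 1
            node = trie[node][ch]
        out[node].add(kw)
    fail = [0] * len(trie)           # failure links (BFS order)
    q = deque(trie[0].values())
    while q:
        u = q.popleft()
        for ch, v in trie[u].items():
            f = fail[u]
            while f and ch not in trie[f]:
                f = fail[f]
            nxt = trie[f].get(ch, 0)
            fail[v] = nxt if nxt != v else 0
            out[v] |= out[fail[v]]   # inherit matches from the suffix state
            q.append(v)
    return trie, out, fail

def search(text, trie, out, fail):
    hits, node = [], 0
    for i, ch in enumerate(text.lower()):
        while node and ch not in trie[node]:
            node = fail[node]
        node = trie[node].get(ch, 0)
        for kw in out[node]:
            hits.append((i - len(kw) + 1, kw))
    return hits

automaton = build(["ahnlab", "ollydbg", "wireshark"])
print(search("AhnLab V3 Lite", *automaton))  # -> [(0, 'ahnlab')]
```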

The aforementioned exclusions/whitelisting are actually a big part of the project and took a lot of time to develop. Most sandboxes actually do a very bad job at exclusions, especially when it comes to Registry entries. They just output everything, even if the reported Registry activity is a result of clean code executed inside OS libraries. When these are initialized, or when certain APIs are called, they simply generate a lot of noise. If this noise is not excluded it all ends up in the final, very lengthy report. Same goes for the NT API, and sometimes the Win32 API. While I hook many of them, most are reported only if they are accessed from the main payload. We really don’t need to log _everything_.

The Excel database of interesting artifacts serves another purpose. If I know the names/classes of popular windows, process names, or popular mutexes, etc. – I can look for them in string comparison functions or specific API calls (e.g. FindWindow). If any of these are found, it is a very quick way to highlight interesting hits, e.g. a sample looking for a specific AV process name, browser window, reversing tool, etc.

The context of reported API calls is also important. If a function is called that requires a callback, I hook that callback and later report all callback calls, e.g. EnumWindowsProc. While a bit noisy, this enriches the output a lot – if anything happens when this callback is executed it’s very easy to spot whether any specific window or class is targeted _inside_ the callback. For example, below we see window enumeration that is looking for an AhnLab AV window:

CBK::EnumWindowsProc (wnd=… [class=…, title=…], lparam=…)
VC::vc_strstr4 (substring=…, string=ahnlab)
CBK::EnumWindowsProc (wnd=… [class=…, title=…], lparam=…)

Sometimes a hint of such activity can help to quickly tweak your environment to force malware to do stuff that is executed conditionally.
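The callback-interception idea in miniature: a real sandbox swaps the callback pointer passed to EnumWindows for its own thunk, which logs the invocation and then delegates to the sample’s original callback. A plain Python wrapper stands in for that thunk here; the window list and callback are made up:

```python
# Sketch: intercepting a sample-supplied enumeration callback. The wrapper
# plays the role of the injected thunk; windows/callback are illustrative.
log = []

def hook_callback(cbk, name):
    def thunk(*args):
        log.append(f"CBK::{name} {args}")   # report the callback call
        return cbk(*args)                   # then delegate to the original
    return thunk

windows = [(0x1A, "AhnLab V3"), (0x2B, "Notepad")]

def enum_windows_proc(hwnd, title):
    # sample logic: stop enumeration once the AV window is found
    return "ahnlab" not in title.lower()

wrapped = hook_callback(enum_windows_proc, "EnumWindowsProc")
for hwnd, title in windows:
    if not wrapped(hwnd, title):
        break
for line in log:
    print(line)
```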

Another issue that I wanted to resolve pretty early was the disappearance of artifacts – their volatility, if you will. I addressed it in a number of ways:

  • If a file was about to be deleted, I would intercept the deletion and copy it
  • If memory was released, I would intercept it before releasing and dump the buffer to a file (this helps a lot to recover payloads, encrypted strings, encrypted code e.g. perl, WinBatch, etc.)
  • If a decompression function was called, I would write its buffers to a file before and after decompression
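The second bullet is the workhorse. Conceptually it is just a hooked free routine that snapshots the buffer before letting the allocator have it back; a sketch, with a dict standing in for the heap and for the dump files:

```python
# Sketch: snapshotting a buffer before release, the way a hooked
# RtlFreeHeap/free handler would. The "heap" dict and file naming
# scheme are illustrative stand-ins.
import hashlib

dumps = {}

def hooked_free(heap: dict, addr: int):
    buf = heap.get(addr, b"")
    if buf:
        # dump the content before the allocator can reuse/overwrite it
        name = f"freed_{addr:08X}_{hashlib.md5(buf).hexdigest()[:8]}.bin"
        dumps[name] = bytes(buf)
    heap.pop(addr, None)          # then let the original free proceed

heap = {0x00350000: b"MZ\x90\x00...decrypted payload..."}
hooked_free(heap, 0x00350000)
print(list(dumps))
```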

As a side note here… run-time memory buffers are probably the most overlooked forensic/reversing artifacts ever. I can’t count how many times I was able to rebuild payloads (often encrypted with some crazy encryptors and wrappers) by… simply reassembling the MZ/PE header and sections dumped from memory at the time these buffers were freed. All thanks to malware authors being good enough programmers to release buffers w/o wiping them out first. And I don’t mean virtual memory functions only (they are easy), but also a number of heap and framework-based functions, e.g. RtlFreeHeap, free, etc.
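The reassembly itself is mechanical once you have the dumps: lay the header down at offset 0 and copy each section to its raw file offset. A toy sketch (real offsets come from the dumped IMAGE_SECTION_HEADER entries; the header and sections below are placeholders):

```python
# Sketch: stitching a PE back together from buffers dumped at free() time.
# A real rebuild reads PointerToRawData/SizeOfRawData from the dumped
# section headers; the values and data here are toy placeholders.
def rebuild_pe(header: bytes, sections):
    """sections: list of (raw_offset, data) taken from the dumped headers."""
    size = max(off + len(data) for off, data in sections)
    image = bytearray(max(size, len(header)))
    image[:len(header)] = header
    for off, data in sections:
        image[off:off + len(data)] = data
    return bytes(image)

header = b"MZ" + b"\x00" * 0x3E          # toy DOS header only
pe = rebuild_pe(header, [(0x400, b".text code"), (0x600, b".data bytes")])
print(pe[:2], len(pe))
```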

Another subset of logging is focused on tricks. Process hollowing, APC, thread context modification, rarely used APIs (e.g. ResumeProcess), sideloading, printer drivers, etc. – any technique that would inject code into another process is intercepted, and then the target process is hooked before the injection itself is completed. This way I can monitor child processes spawned by a monitored process, as well as processes directly or indirectly affected (e.g. if a service is started I start hooking services.exe).

The result of all this patchwork is that the final reports contain a relatively small subset of all the callbacks actually intercepted by the sandbox. And… they often tell the story immediately. For trivial cases: time spent to triage a sample is 2-5 minutes; time spent in a debugger: 0.

Don’t get me wrong. I spent more time with analyzed samples, but having all these pointers easily available within the first few seconds of triage is an enormous help, and almost always a success story with a client.

Last, but not least – I implemented tons of bugs 🙂 If you find some weird stuff in the reports – it’s most likely my sloppy coding 🙂
