Enter Sandbox – part 18: How to tell the story + some thoughts about sandbox 2.0

Many sandbox vendors use various function call interception techniques. Their marketing brochures often highlight why their technology stack is better than the one used by other products. Some of the most common arguments are:

using bare metal is better than emulation or guest OS as it can evade anti-* tricks
kernel mode interception is better than the user-mode, because it ‘sees’ it all
solutions that don’t influence the test environment and tested programs are better than those that do (since the programs that modify the environment can be used to detect the test environment)
static ‘hooking’ points are better than dynamic ones as they focus on very specific areas of activity (this one I made up, but I want to highlight the point that static hooking is very 200x)
etc.

They rarely focus on the audience though – i.e. the people who actually read these reports on an almost daily basis. Usability – an aspect often neglected in the past – is now becoming really important.

I like to think of a perfect sandbox as one that tells the best story and is open to many audiences; the best ones can also drill into the nuances and support more advanced analysts when asked to.

One that covers a combination of system service call hooking, API hooking, COM hooking, inline signatures to hook Delphi calls and many other popular functions that – while not being exported as APIs via DLLs – are easily recognizable as they are simple static versions of popular and often (re-)used code. Add to it known anti-* tricks. Add to it support for all the classes of the VB language (VB, VBA, VBS), internal APIs used by popular installers, APIs exported by popular libraries (sqlite, zlib, openssl, etc.), support for localization, snapshots and differential comparisons (before and after execution), signature scans, memory analysis, etc., and you can get a really nice output that can cater for many tastes and needs. Add to it parallel execution on 2 or more distinct systems (e.g. XP, Win7, Win10, or systems with the network disabled vs. enabled), post-processing with cross-system diff, and you can get a far richer output than from just a single tested OS that has easy access to the network… Add to it the ability to preserve dumped PE modules, memory snapshots, and… provide the interactivity, and we are well on the way to the Sandbox 2.0.
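The before/after snapshot comparison and the cross-system diff mentioned above can be sketched in a few lines. This is only an illustration under assumptions: snapshots are modelled as plain `{path: content_hash}` dictionaries, and the function names `diff_snapshots` and `cross_system_diff` are made up for this example, not taken from any product.

```python
def diff_snapshots(before, after):
    """Compare two snapshots given as {path: content_hash} dicts (assumed format)."""
    before_keys, after_keys = set(before), set(after)
    return {
        "created": sorted(after_keys - before_keys),
        "deleted": sorted(before_keys - after_keys),
        "modified": sorted(
            k for k in before_keys & after_keys if before[k] != after[k]
        ),
    }


def cross_system_diff(created_by_system):
    """Split created items into those seen on every system and per-system extras.

    created_by_system maps a hypothetical system label (e.g. "XP", "Win10")
    to the list of items created during the run on that system.
    """
    sets = [set(items) for items in created_by_system.values()]
    common = set.intersection(*sets) if sets else set()
    return {
        "common": sorted(common),
        "only_on": {
            name: sorted(set(items) - common)
            for name, items in created_by_system.items()
        },
    }


# Made-up data, for illustration only.
changes = diff_snapshots(
    {"a.txt": "h1", "b.txt": "h2"},
    {"b.txt": "h3", "c.txt": "h4"},
)
per_os = cross_system_diff({"XP": ["x.dll", "y.tmp"], "Win10": ["x.dll"]})
```

Behaviour that shows up only on one system (or only with the network enabled) then stands out in `only_on` instead of being buried in a single flat log.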

I am actually a big fan of modifying the environment in which the test sample runs. I think it’s necessary and, in many cases, inevitable, as long as we want the output to be the most useful. Hiding away from it and observing stuff from a kernel level makes us lose a lot of very interesting, contextual information and makes a lot of interceptions much harder.

Let me give you a couple of examples:

There are classes of APIs that return user-mode pointers, which need to be intercepted within the process space dynamically the moment they become available e.g.
- COM interfaces
- Callbacks for many functions e.g. timers, some winsock functions, enumeration functions, etc.
- Inline functions returning pointers, structures
There are many wrapper libraries that require hooking of library functions that sit at an even higher level than the standard OS DLLs
- the best examples are C functions e.g. fopen, fseek, etc.
- since they use an internal system to track handles, you need to be able to track the files they open, and their mapping to both the internal handles and the ones provided by the Win API
- this may be pretty hard on a kernel level, because the mapping system may change over time and from sample to sample (since these are internal structures that may change between compiler versions)
- it is often much easier to obtain a handle of a file by calling an existing user mode API: _get_osfhandle in the context of a monitored process
Live and dynamic patching may be a bit tricky from a kernel level (it’s possible, but keeping that logic outside the VM, emulator, etc. is not very easy to manage).
AutoClickers… you can’t run away from it if you want to handle GUI programs (e.g. installers)
Taking screenshots is also easier from a user mode component
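The point about C runtime handles can be illustrated from user mode. The sketch below is not sandbox code: it uses Python's own file objects as a stand-in for `FILE*` streams, and on Windows it calls `msvcrt.get_osfhandle`, Python's wrapper around the CRT's `_get_osfhandle`. The `track_open_file` helper and the table layout are assumptions made for this example.

```python
import os
import tempfile


def os_level_handle(file_obj):
    """Return (crt_fd, os_handle) for an open file object."""
    fd = file_obj.fileno()  # C-runtime-style descriptor
    if os.name == "nt":
        import msvcrt  # only available on Windows
        return fd, msvcrt.get_osfhandle(fd)  # Win32 HANDLE value behind the fd
    return fd, fd  # on POSIX the descriptor already is the OS-level handle


def track_open_file(table, file_obj):
    """Record the path -> runtime descriptor -> OS handle mapping (hypothetical layout)."""
    fd, handle = os_level_handle(file_obj)
    table[fd] = {"path": file_obj.name, "crt_fd": fd, "os_handle": handle}
    return table[fd]


# Small demo with a temporary file.
with tempfile.NamedTemporaryFile() as fp:
    table = {}
    entry = track_open_file(table, fp)
```

Asking the runtime inside the process for the mapping avoids having to parse compiler-specific internal structures from the outside.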

Coming back to the Sandbox 2.0 vision.

Today’s sandboxes are a total mess. Most of them go for the easy, low-hanging fruit, which makes the life of sandbox analysts hell.

I think the fundamental issue is that there exists a solid misconception about who is using these sandbox results. Apart from fully automated, sandbox-based products (e.g. all mail attachments executed inline via a sandbox before being delivered to the user, if non-malicious), there is a growing number of junior SOC analysts who actually use these products on a daily basis. When they see lots of information, often contradicting itself, let alone a crazy number of signature- and reputation-based claims, they just… start guessing. This is not good for the security industry.

Again, let me provide an example (it’s made up, but I witnessed exactly the same scenario for a different .exe) …

If you submit the good ol’ rar.exe, you will see that some of the sandboxes claim it’s definitely clean, because it’s whitelisted. Yet, some may also include conflicting information that the file has been seen in correlation with some malicious files. Bad guys often use rar.exe and bundle it with actual, real malware or hacking tools. As a side-effect of this activity they game reputation systems by skewing the stats for rar.exe. Junior analysts seeing such a correlation get fooled and assume the rar.exe file is actually bad! I kid you not. If it is 100% clean, why not state it in just a single sentence ‘this file is CLEAN’?

I think there is a great need for more output scenarios from sandboxes. And also, more accountability. It’s no longer enough to just drop a 100 MB XML/JSON output on analysts and expect them to draw their own conclusions. There should be different outputs, targeting different audiences. They should separate things that are 100% bad from common things that are mislabeled as bad (IsDebuggerPresent _is_ not really a malicious API c’mon…, same goes for a bunch or a combination of Registry, WinSock, etc. functions).

Static, all-inclusive and contextless results need to go through a proper cosmetic surgery.

And back to the topic… how to tell the story?

For automated processing – output 0 or 1 (block: no/yes)
For Junior Analysts – output ‘bad’, ‘good’, ‘ask senior engineer for help’
For Senior Analysts – output high-level flow of events so that they can quickly read through the output and understand what the program is doing; an example is provided below; also, let them dig deeper – provide lower-level output, allow them to download files, memory dumps, pcaps, etc. – they will be able to make a call based on all this info
For Senior Management – please don’t…
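The audience split above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the verdict labels, the event format with `summary` and `artifacts` fields, and the choice not to block on an unknown verdict. There is deliberately no branch for senior management, as per the list.

```python
VERDICTS = ("malicious", "clean", "unknown")


def tell_story(audience, verdict, events):
    """Render one analysis result for a given audience (hypothetical format)."""
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")

    if audience == "automation":
        # 1 = block, 0 = don't block; not blocking on 'unknown' is an assumption.
        return 1 if verdict == "malicious" else 0

    if audience == "junior":
        return {
            "malicious": "bad",
            "clean": "good",
            "unknown": "ask senior engineer for help",
        }[verdict]

    if audience == "senior":
        # High-level flow first, plus pointers to downloadable artifacts.
        return {
            "verdict": verdict,
            "flow": [e["summary"] for e in events],
            "artifacts": sorted(
                {a for e in events for a in e.get("artifacts", [])}
            ),
        }

    raise ValueError(f"no output defined for audience: {audience!r}")


# Made-up events, for illustration only.
events = [
    {"summary": "drops a file into the temp folder", "artifacts": ["dropped.bin"]},
    {"summary": "connects to a remote host", "artifacts": ["traffic.pcap"]},
]
senior_view = tell_story("senior", "unknown", events)
```

The same underlying result feeds all three views; only the level of detail changes.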

And really… API monitors have allowed excluding API calls that originate from common user-mode libraries for something like 10-15 years. Please add this functionality to the sandboxes. Who cares that msctf.dll is creating mutexes prefixed with CTF. It’s BAU. It’s NORMAL. Add filters based on data stacking at least… please.
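Both filters asked for here are simple to sketch. The example below assumes a made-up event format (`caller_module`, `api`, `arg`), an illustrative list of "common" modules, and an arbitrary 50% threshold for the stacking filter; none of these come from a real sandbox.

```python
from collections import Counter

# Illustrative only; a real list would be much longer and configurable.
COMMON_OS_MODULES = {"msctf.dll", "ntdll.dll", "kernel32.dll", "user32.dll"}


def drop_os_library_calls(events, noisy_modules=COMMON_OS_MODULES):
    """Keep only calls that do not originate from common OS libraries."""
    return [e for e in events if e["caller_module"].lower() not in noisy_modules]


def stack_filter(events, baseline_reports, max_share=0.5):
    """Data stacking: drop events seen in more than max_share of baseline reports."""
    if not baseline_reports:
        return list(events)
    seen_in = Counter()
    for report in baseline_reports:
        # Count each (api, arg) pair once per report.
        for key in {(e["api"], e["arg"]) for e in report}:
            seen_in[key] += 1
    total = len(baseline_reports)
    return [
        e for e in events if seen_in[(e["api"], e["arg"])] / total <= max_share
    ]


# Made-up events; the mutex names are placeholders, not real ones.
ctf = {"caller_module": "MSCTF.dll", "api": "CreateMutexW", "arg": "CTF.Example"}
own = {"caller_module": "sample.exe", "api": "CreateMutexW", "arg": "MySampleMutex"}
baseline = [[ctf], [ctf], [ctf, own]]

by_module = drop_os_library_calls([ctf, own])
by_stacking = stack_filter([ctf, own], baseline)
```

The module filter needs the caller to be recorded with each call; the stacking filter only needs a pile of earlier reports to compare against.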

And last, but not least. Make these stories readable. An API log is hard to read. If you add a little bit of narrative, even the most junior analysts can walk through it and pick up some bits.
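Adding a narrative can be as simple as mapping each logged call to a sentence template. The sketch below is a minimal illustration: the API names are real Windows APIs, but the event format, the argument names and the wording of the templates are assumptions made for this example.

```python
# Sentence templates keyed by API name; argument names are hypothetical.
TEMPLATES = {
    "CreateWindowExW": "creates a window of class '{class_name}' titled '{title}'",
    "RegOpenKeyExW": "opens the Registry key '{key}'",
    "CreateFileW": "opens the file '{path}'",
}


def narrate(event):
    """Turn one logged API call into a readable sentence."""
    template = TEMPLATES.get(event["api"])
    if template is not None:
        try:
            return "The program " + template.format(**event.get("args", {})) + "."
        except KeyError:
            pass  # an expected argument is missing; fall back to the generic form
    return f"The program calls {event['api']}."


def narrate_log(events):
    """Narrate a whole log, one sentence per call, in order."""
    return [narrate(e) for e in events]


# Made-up log entries, for illustration only.
story = narrate_log([
    {"api": "CreateWindowExW", "args": {"class_name": "Button", "title": "7"}},
    {"api": "CreateFileW", "args": {}},
    {"api": "GetTickCount"},
])
```

Calls without a template still appear in the story, just in a generic form, so nothing is silently dropped.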

An example of a narrated API log on a program level (excluding calls from common OS libraries) is shown below; you can almost ‘see’ the calculator interface being built here… button after button. Isn’t that cool? This is the closest one can get to generating an automated, dynamic reverse engineering output similar to what one could obtain by manual inspection of the program under a user-mode debugger, and after many hours of analysis. If a sandbox can produce such output within a few secs, we now enter a discussion about ROI-oriented sandbox analysis beating any malware analyst out there. At least for preliminary analysis, and supporting further digging. So much time saved!

Sandbox 3.0 will be able to decompile a piece of code and map these calls to the pseudocode of the program and show the dynamic calls as it walks through that pseudocode in an interactive session. Because why not. We will get there pretty soon.

See Log1

The very same example, when drilled down to show NT APIs as well (just the first few lines) – this is much less readable because of the additional, noisy OS-library-driven calls:

See Log2

Hexacorn
