I am going to start a new series about sandboxes and sandbox evasions. I will use data I gathered over the last 10 years, together with the experience of actually getting my hands dirty and coding my own monitor from scratch. I never really considered it a sandbox, as it does way more than that, but I’ll use ‘sandbox’ here, because everyone already knows what it is.
Creating a good sandbox is a very challenging task. Not only is it technically challenging, but you also need to be very selective. One area where you have to be really specific about what you do is the list of APIs that you need to intercept, because:
- if you miss some – you may lose vital information from the report, or fail to intercept one of the many ‘escape’ mechanisms that modern malware uses for evasion purposes (heaven’s gate, tricks to launch code under a different process while fooling the sandbox/AV monitor into thinking that nothing is going on, etc.)
- if you monitor too much – you will get a headache trying to understand the output
There are many ‘schools’ of what to intercept. Some people prefer kernel-mode hooks and monitor stuff at a high level (or a low level, depending on where we observe it from). They ‘see’ everything, but they miss the context of the execution (process, thread, window procedure, etc.). The user-mode monitoring fans are better off when it comes to the context, but they may miss the more complex stuff. In some approaches the monitoring of APIs/services is also supported by extra checks, e.g. $MFT and Registry analysis pre- and post-session, and outside-sandbox analysis of disk/file system/memory. Plus, of course, network stuff.
I am personally a big fan of user-mode only monitoring. It has worked pretty well for me for the last 10 years, and while it may miss stuff, I believe that wherever evasive or kernel-mode stuff is involved you simply need to get your hands dirty and do manual analysis. This, btw., is actually the fun part of the malware analyst job.
Note: I am mainly talking about the manual, in-depth analysis of malware and not a general-purpose sandbox that is commercially ‘required’ to ‘see’ it all. The latter is actually quite a headache to manage and I do not envy sandbox companies that need to worry about it.
Okay. So, if we focus on user-mode monitoring we definitely need to know what to monitor.
One approach that can be taken to figure out what APIs to monitor is… very naive statistics – naive, because it is based on a very simple principle. This is the topic I will cover today.
Most malware nowadays is either packed or somehow protected. Once it executes, the wrapper launches the actual payload and during this phase it often resolves the APIs. Later on it may inject stuff into other processes, some more APIs may get resolved, and so on and so forth. It can get pretty messy.
Now, there are plenty of methods to resolve the APIs, including calling GetProcAddress and/or LdrGetProcedureAddress, or simply walking through the export tables of the respective libraries and finding the required API addresses. You can also do pattern searching, brute-forcing and some fancy algorithmic API address discovery, but these are exotic cases and we don’t need to care about them.
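To make the ‘walking through the export tables’ bit a little more concrete, here is a minimal sketch in Python. It does not parse a real PE file; `ExportTable`, `TOY_EXPORTS`, `resolve_by_name` and all the addresses are made up for illustration. It only mimics the three parallel arrays of an export directory (names, name ordinals, function RVAs) and ignores forwarded exports and ordinal-only exports.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExportTable:
    """Toy stand-in for the three parallel arrays of a PE export directory."""
    names: List[str]          # like AddressOfNames: exported function names
    name_ordinals: List[int]  # like AddressOfNameOrdinals: index into `functions` per name
    functions: List[int]      # like AddressOfFunctions: RVAs of the exported functions


def resolve_by_name(table: ExportTable, image_base: int, api: str) -> Optional[int]:
    """Return the address of `api`, or None if the name is not exported."""
    for i, name in enumerate(table.names):
        if name == api:
            # The i-th name maps to an index in the function array,
            # which in turn holds the RVA of the function.
            rva = table.functions[table.name_ordinals[i]]
            return image_base + rva
    return None


# Hypothetical data, not taken from any real DLL.
TOY_EXPORTS = ExportTable(
    names=["CreateFileW", "GetProcAddress", "LoadLibraryA"],
    name_ordinals=[2, 0, 1],
    functions=[0x1000, 0x2000, 0x3000],
)

if __name__ == "__main__":
    addr = resolve_by_name(TOY_EXPORTS, 0x10000000, "GetProcAddress")
    print(hex(addr) if addr is not None else "not found")
```

The point is that code resolving APIs this way never calls GetProcAddress at all, so a monitor that watches only that API will not see these lookups.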
I mentioned that this is going to be about naive statistics – this is why we will only look at GetProcAddress. All we need is data.
As long as we execute a large number of samples while monitoring this particular API, we can get a nice and fairly representative picture of the popularity of certain APIs. These APIs need to be screened manually and then a subset of them can be selected for monitoring.
So, looking at the results of 150K+ sandboxed samples I came up with the following list of APIs (top 100 are listed):
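The counting itself can be sketched in a few lines of Python. This is only an illustration, not the actual tooling behind the numbers: the `(sample_id, api_name)` record format, the `api_popularity` name and the sample data are all assumptions. It also assumes that each API is counted once per sample, so that a single sample resolving the same API in a loop does not skew the ranking; counting raw calls instead would be an equally naive alternative.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def api_popularity(records: Iterable[Tuple[str, str]], top: int = 100) -> List[Tuple[str, int]]:
    """Rank API names by the number of distinct samples that resolved them.

    `records` is a stream of (sample_id, api_name) pairs, one per logged
    GetProcAddress call (hypothetical log format).
    """
    seen = set()
    counts: Counter = Counter()
    for sample_id, api in records:
        key = (sample_id, api)
        if key in seen:
            continue  # this sample already contributed to this API
        seen.add(key)
        counts[api] += 1
    return counts.most_common(top)


# Made-up records for three imaginary samples.
RECORDS = [
    ("s1", "LoadLibraryA"),
    ("s1", "LoadLibraryA"),
    ("s1", "VirtualAlloc"),
    ("s2", "LoadLibraryA"),
    ("s2", "CreateProcessW"),
    ("s3", "VirtualAlloc"),
    ("s3", "LoadLibraryA"),
]

if __name__ == "__main__":
    for api, n in api_popularity(RECORDS, top=2):
        print(f"{n}\t{api}")
```

The output of something like this is only a starting point; the manual screening is still needed to decide which of the popular APIs are actually worth intercepting.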