Enter Sandbox part 21: Intercepting Buffers #2, Abusing the freedom of buffers, after their release

In my last post I mentioned the magic word ‘buffer’. If you follow the series, you now know that strings are great buffers to look at, and… that ‘there is more’.

There is indeed.

If Steve Ballmer had ever been the CEO of a sandbox company, I bet his famous ‘Developers’ video would now be known as ‘Buffers’.

Apart from monitoring string functions, the most successful results in my dynamic malware analysis came from monitoring selected memory-oriented functions. And no, no, not the ones that merely allocate memory, but the ones that actually release the previously allocated memory blocks, or change their protection rights.

You see, most malware that uses in-memory payloads or encrypted configs follows a very well-established pattern:

  • allocate memory
  • unpack/decompress payload/config to it
  • use some runPE module to resolve the imported APIs (if code)
  • transfer execution to the new entry point (if code)
  • in some cases temporary buffers are used, and they are often freed after use

This pattern is highly prevalent, and it almost begs us to start monitoring the VirtualFree and VirtualProtect functions.

Why?

At the time VirtualAlloc returns, the block is allocated, but there is nothing in it yet. By the time the VirtualProtect or VirtualFree APIs are called, the actual juicy code/data is already there. Most of the time.

As a side note: obviously, you can, and should, expand the described monitoring coverage to the NT functions as well (NtFreeVirtualMemory, etc.).
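To make this concrete, here is a minimal sketch of such a hook, assuming a Microsoft Detours-based monitor (any hooking framework will do); DumpBuffer is a hypothetical helper that writes the bytes out for the analyst:

#include <windows.h>

extern void DumpBuffer(const void *addr, size_t size);  /* hypothetical dump helper */

static BOOL (WINAPI *Real_VirtualProtect)(LPVOID, SIZE_T, DWORD, PDWORD) = VirtualProtect;

/* VirtualProtect conveniently receives the size as an argument, so the hook
   can dump the exact span whose protection is about to change */
BOOL WINAPI Hook_VirtualProtect(LPVOID lpAddress, SIZE_T dwSize,
                                DWORD flNewProtect, PDWORD lpflOldProtect)
{
    if (flNewProtect & (PAGE_EXECUTE | PAGE_EXECUTE_READ |
                        PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY))
        DumpBuffer(lpAddress, dwSize);  /* buffer is being made executable: juicy */
    return Real_VirtualProtect(lpAddress, dwSize, flNewProtect, lpflOldProtect);
}

/* attach once, e.g. from the injected monitor's init:
   DetourTransactionBegin();
   DetourUpdateThread(GetCurrentThread());
   DetourAttach((PVOID *)&Real_VirtualProtect, Hook_VirtualProtect);
   DetourTransactionCommit(); */

A VirtualFree hook looks the same, except that the size of the released block has to be looked up first – more on that caveat towards the end of this post.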

You may not believe it, but before malware authors became malware programmers, they were just… well… programmers. And I don’t know any C programmer who wouldn’t be taught these two fundamental principles of memory management during their C training courses:

  • if you allocate some memory, you must free it when no longer needed
  • if you change permissions, there must be a reason for it, i.e. the memory block has already been filled with something

The second one I actually made up, but it not only fits my narrative – it is actually aligned with the way most malware is written. Allocate a buffer. Copy stuff into it. Change permissions. Execute it / interpret it / use it (e.g. a decrypted config). Or move it somewhere else.
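For illustration, the whole pattern condenses to a few lines of C – a generic sketch, where payload and payload_len stand in for the decrypted blob, and exactly the sequence the monitoring described here waits for:

#include <windows.h>
#include <string.h>

/* the classic sequence a monitor wants to catch: alloc -> copy -> reprotect -> execute */
void run_payload(const unsigned char *payload, size_t payload_len)
{
    void *buf = VirtualAlloc(NULL, payload_len, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (!buf)
        return;
    memcpy(buf, payload, payload_len);                          /* fill the buffer */
    DWORD old;
    VirtualProtect(buf, payload_len, PAGE_EXECUTE_READ, &old);  /* <-- buffer is 'juicy' here */
    ((void (*)(void))buf)();                                    /* transfer execution */
}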

In my experience there are not that many malware programmers who can disengage from these two code patterns. Obviously, this only applies to old-school programming languages where it actually matters how you manage memory – and to nicely written shellcode.

Okay. So… the moment the buffers are freed, or their permissions are changed, we can jump on them, dump them, and harvest the juicy code/data.

There is even more good news.

In case the malware author doesn’t free buffers or change permissions, there is still hope. At the time the memory is allocated, you can set up an event that will later trigger the monitor to dump the memory of the allocated buffer.

For example, you can dump the block when:

  • instruction pointer is within the previously allocated block (i.e. the code is being executed from a dynamically allocated buffer!)
  • certain amount of time passed after allocation
  • checksums of the memory block change
  • certain number of memory writes to the region occurred (see the write-watch sketch right after this list)
  • a number of APIs were called after allocation
  • network connection was initiated (i.e. payload is ‘working’)
  • program terminates or crashes
  • etc. etc.
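One way to implement the ‘memory writes occurred’ trigger is the OS’s own write-watch mechanism. A sketch, under the assumption that the monitor’s VirtualAlloc hook can add the MEM_WRITE_WATCH flag to the allocations it wants to track:

#include <windows.h>
#include <stdio.h>

/* allocate a region whose written pages the OS will track for us */
void *alloc_watched(SIZE_T size)
{
    return VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT | MEM_WRITE_WATCH,
                        PAGE_READWRITE);
}

/* poll: did anyone write to the region since the last reset? */
int was_written(void *base, SIZE_T size)
{
    PVOID pages[256];
    ULONG_PTR count = 256;
    DWORD granularity;
    if (GetWriteWatch(WRITE_WATCH_FLAG_RESET, base, size,
                      pages, &count, &granularity) == 0 && count > 0) {
        printf("%lu written page(s) - time to dump\n", (unsigned long)count);
        return 1;
    }
    return 0;
}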

And there is even more good news.

Many script/code obfuscators that rely on hiding code behind security-by-obscurity tricks are written in high-level programming languages, and these languages make extensive use of heap functions whenever they deal with memory blocks.

A-ha.

While Virtual memory functions are cool to monitor, what about heap functions?

Bingo.

In most cases you can access the hidden code/data processed by these ‘obfuscators’ almost instantly. Just wait for the functions that release memory back to the heap to be called, and dump the content of these allocated, but no longer needed, memory blocks.
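A heap-release hook is even simpler than the virtual memory one, because the heap manager knows the exact block size. A minimal sketch, with DumpBuffer again being a hypothetical helper:

#include <windows.h>

extern void DumpBuffer(const void *addr, size_t size);  /* hypothetical dump helper */

static BOOL (WINAPI *Real_HeapFree)(HANDLE, DWORD, LPVOID) = HeapFree;

BOOL WINAPI Hook_HeapFree(HANDLE hHeap, DWORD dwFlags, LPVOID lpMem)
{
    if (lpMem) {
        SIZE_T size = HeapSize(hHeap, dwFlags, lpMem);  /* exact size of the dying block */
        if (size != (SIZE_T)-1)
            DumpBuffer(lpMem, size);                    /* grab it before it is gone */
    }
    return Real_HeapFree(hHeap, dwFlags, lpMem);
}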

Also, if you are wondering about monitoring the GlobalAlloc and LocalAlloc APIs and their respective GlobalFree and LocalFree releasing functions – at this stage they are just wrappers around the heap functions. You can of course monitor them separately too (it may help with malware family fingerprinting).

We have mentioned virtual memory functions (served by both Win32 and NT APIs) and heap functions – what about the stack?

Yes, by all means.

This is yet another great source of intel. If you use a debugger on a regular basis, you know that the stack is a great source of information. If you can build a tree of the calls that led to e.g. a crash, you have a lot of information to investigate and troubleshoot the issue.

And it can be extended to data buffer inspection, e.g. looking at the local variables of the calling code.

Anytime you intercept an API call, you can inspect the stack buffers and see what interesting information can be found there. Again, very often it will be strings of all sorts (including those built on the stack by more obfuscated code), pointers to strings, pointers to pointers to strings, offsets to structures in memory, sometimes function callbacks, etc. All of it can support manual analysis a lot.

Listing the hexadecimal values of what is currently on the stack (and of the buffers some stack values point to) before and after an API is called is really useful – meaning e.g. 10-20 dwords/qwords beyond the actual API arguments, which can of course be interpreted easily, because we know what arguments are passed to the APIs.
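A sketch of such a stack dumper – sp would be captured in the hook’s prologue (e.g. via the MSVC _AddressOfReturnAddress() intrinsic), and the slot count is an arbitrary choice:

#include <windows.h>
#include <stdio.h>

/* hex-dump 'slots' pointer-sized values from a captured stack pointer and
   peek at whatever readable memory a slot happens to point to */
void dump_stack(const ULONG_PTR *sp, size_t slots)
{
    MEMORY_BASIC_INFORMATION mbi;
    for (size_t i = 0; i < slots; i++) {
        printf("[sp+%02x] %p", (unsigned)(i * sizeof(ULONG_PTR)), (void *)sp[i]);
        if (VirtualQuery((LPCVOID)sp[i], &mbi, sizeof(mbi)) == sizeof(mbi) &&
            mbi.State == MEM_COMMIT &&
            !(mbi.Protect & (PAGE_NOACCESS | PAGE_GUARD))) {
            const unsigned char *p = (const unsigned char *)sp[i];
            printf("  -> %02x %02x %02x %02x", p[0], p[1], p[2], p[3]);  /* a taste of the target */
        }
        printf("\n");
    }
}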

The natural progression takes us towards more obscure areas: hooking malloc, free, calloc, operator new, various constructors, and COM-oriented stuff, e.g. the CoTaskMemAlloc and CoTaskMemFree functions, COM interfaces, etc.

The scope is very big.

Is it worth it?

Yes, this trick worked for me on a number of occasions, and primarily… it saved me a lot of time; instead of trying to reverse engineer the whole thing, I would just wait for these functions to be called, dump the code, edit it a bit, beautify it, and analyze it.

And if you ever used Flypaper from HBGary, you’re gonna love it. Sandboxes offering such a granular API-interception level, or even inline-function monitoring, take what Flypaper did to the next level. You will see as many buffers as possible. You can inspect them on a timeline. You can literally copy & paste stuff out of them: actual code, configs, decrypted URLs and other IOCs – and you can break apart C2 more easily as well.

And last, but not least. The difficult part.

When we talk about monitoring memory functions, there is one caveat I need to mention. Most of these functions, when intercepted, will require you to estimate the size of the buffer being released. You need the proper size, or you will be dumping never-ending pages of random memory data. Trust me, w/o a proper size you will be dumping hundreds of megabytes of garbage.

For strings, you can calculate the length, or use predefined structures that hold the length of the string buffer. For a generic memory buffer it’s much harder. You may of course use various heuristics, exclude padding zeroes, etc., but… the best option is to obtain the actual, real size of the buffer.

I can think of three approaches here…

You can track memory allocation functions, register the requested sizes, and track their changes (e.g. via the realloc-style functions). Pretty hard to do properly.
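A naive sketch of that bookkeeping – a real monitor would need a lock-protected map plus handling for realloc and partial frees, hence the ‘pretty hard’:

#include <windows.h>

#define MAX_TRACKED 4096
static struct { LPVOID base; SIZE_T size; } g_allocs[MAX_TRACKED];

/* call from the allocation hook */
void track_alloc(LPVOID base, SIZE_T size)
{
    for (int i = 0; i < MAX_TRACKED; i++)
        if (g_allocs[i].base == NULL) {
            g_allocs[i].base = base;
            g_allocs[i].size = size;
            return;
        }
}

/* call from the free hook; returns 0 if the block was never seen */
SIZE_T lookup_alloc(LPVOID base)
{
    for (int i = 0; i < MAX_TRACKED; i++)
        if (g_allocs[i].base == base)
            return g_allocs[i].size;
    return 0;
}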

Or…

You can interpret the actual memory of the process to calculate the size.

Or…

It’s really handy if your sandbox monitor can actually call the dedicated API functions that provide this information ad hoc, within the context of a given process and thread. So, your callback for a ‘free’ function is called, and you call the ‘give_me_the_size_of_this_block_given_the_address’ function. With the retrieved size, you can dump a properly sized buffer.

For instance:

  • for heap functions you can call RtlSizeHeap
  • for Virtual memory functions you can call VirtualQuery (both of these are sketched right after this list)
  • for COM functions, you can call respective APIs or methods, if they exist
  • for any high-level-language wrappers you need to find (often inline) a wrapper function that will tell you the size of the allocated buffer based on its address
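Two of these lookups in sketch form – VirtualQuery is a documented Win32 API, while RtlSizeHeap has to be resolved from ntdll at run-time (useful when your hooks sit below kernel32):

#include <windows.h>

/* size of the committed region containing addr */
SIZE_T virtual_region_size(LPCVOID addr)
{
    MEMORY_BASIC_INFORMATION mbi;
    if (VirtualQuery(addr, &mbi, sizeof(mbi)) == sizeof(mbi) && mbi.State == MEM_COMMIT)
        return mbi.RegionSize;
    return 0;
}

/* RtlSizeHeap is exported by ntdll.dll but not declared in the SDK headers */
typedef SIZE_T (NTAPI *RtlSizeHeap_t)(PVOID HeapHandle, ULONG Flags, PVOID BaseAddress);

SIZE_T heap_block_size(PVOID heap, PVOID block)
{
    static RtlSizeHeap_t pRtlSizeHeap;
    if (!pRtlSizeHeap)
        pRtlSizeHeap = (RtlSizeHeap_t)GetProcAddress(
            GetModuleHandleA("ntdll.dll"), "RtlSizeHeap");
    return pRtlSizeHeap ? pRtlSizeHeap(heap, 0, block) : 0;
}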

It’s quite invasive and somewhat expensive, but it works really well.

To conclude… buffers are everywhere and it’s worth looking at them, collecting them, and offering them to analysts:

  • Any sort of memory functions that belong to the documented Windows API
  • Any sort of functions that are detectable inline (statically linked libs, Delphi, etc.)
  • Mapping files and sections, Unmapping files and sections.
  • Crypto functions (CryptDecrypt, CryptEncrypt, CryptDeriveKey, CryptHashData, CryptProtectData) – see the CryptDecrypt sketch after this list
  • File Writing, File Reading functions. File Seeking functions.
  • Internet Read, Write functions.
  • Copying memory buffers
  • Filling in memory buffers with zeroes or other values
  • Compression/Decompression, built-in, and well-known copypasta code (or, family-based) that can be hooked inline
  • Encoding/Decoding, as above
  • Database queries
  • WMI queries
  • String operations of any sort, including translation (Unicode->MBCS, DBCS, ANSI, and vice versa)
  • Hash calculation – in and out buffers (e.g. A_SHAFinal|Init|Update)
  • Resource buffers
  • GUI elements (not only desktop screenshots, but also window elements, including invisible ones, icons, bitmaps, menus, dialog boxes, property sheets, etc.)
  • Bitmaps of any sort (BitBlt, StretchBlt, etc.)
  • Video buffers of any sort (capCreateCaptureWindow)
  • DirectX/OpenGL buffers
  • Console buffers
  • MessageBox buffers
  • Programming language-specific buffer APIs (e.g. VB __vbaCopyBytes)
  • and tons of others
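To pick just one item from the list, a sketch of a CryptDecrypt hook – one interception point yields both ciphertext and plaintext, because the API decrypts in place (DumpBuffer remains our hypothetical helper):

#include <windows.h>
#include <wincrypt.h>

extern void DumpBuffer(const void *addr, size_t size);  /* hypothetical dump helper */

static BOOL (WINAPI *Real_CryptDecrypt)(HCRYPTKEY, HCRYPTHASH, BOOL, DWORD,
                                        BYTE *, DWORD *) = CryptDecrypt;

BOOL WINAPI Hook_CryptDecrypt(HCRYPTKEY hKey, HCRYPTHASH hHash, BOOL Final,
                              DWORD dwFlags, BYTE *pbData, DWORD *pdwDataLen)
{
    DumpBuffer(pbData, *pdwDataLen);                    /* ciphertext going in */
    BOOL ok = Real_CryptDecrypt(hKey, hHash, Final, dwFlags, pbData, pdwDataLen);
    if (ok)
        DumpBuffer(pbData, *pdwDataLen);                /* plaintext coming out */
    return ok;
}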

It all asks for an interception.

It all asks for depth of analysis that goes beyond your regular sandbox output.

It all asks to be configurable.

Modern sandboxes intercept a lot of artifacts created by the samples. I value most those that can preserve not only information about high-level artifacts, but also full snapshots of file content, Registry buffers, network operations, and memory dumps – including properly dumped PE files, where available – as well as windows. The more the merrier.

A sandbox used as a tool to determine whether a sample is bad or good is old news.

A sandbox that actively supports the reverser who will take all these dumped buffers and finish the analysis of the sample is much better news.

And what’s in it for sandbox companies?

Better detection capabilities. Expansion of the audience from just analysts to hardcore reversers. Expansion of the possible market to QA/QC/test labs. Providing black-box support for debugging and localization.

Perhaps simply ‘being ahead of the curve’, too?

Enter Sandbox part 20: Intercepting Buffers, f.ex. Python code from compiled binaries

In my previous post in this series I mentioned that looking at ‘dynamic’ strings processed by the analyzed sample adds a lot of value.

We shouldn’t really think of strings as strings. We should think of them as buffers. And intercepting interesting buffers is actually what makes sandboxes so useful. Strings are a big part of it, but, as usual, there is more.

In some older posts I have already demonstrated how often it is the case that knowing where to look allows us to extract very interesting buffers, and often – the actual code of the hidden program/script:

It applies to:

  • Delphi programs – hooking inline comparison functions helps with extracting info about the command-line arguments accepted by the program (manual analysis would be quite painful, even with IDR, or dedicated IDA scripts and FLIRT signatures; these are, admittedly, a game-changer for static analysis of such binaries, but why can’t we just extract this data with a sandbox?)
  • Nullsoft Installers – intercepting actual Nullsoft installation scripts
  • Perl2Exe – POS malware is easy to analyze when you extract the Perl script that _is_ the actual malware

The very same applies to WinBatch and many other ‘script to exe’ solutions that basically try to hide the script using the good ol’ security-by-obscurity method.

And anyone who has looked at modern (emphasis on ‘interesting’) malware knows that most of the juicy code is hidden in memory buffers allocated temporarily during run-time, in tons of randomly generated garbage code, or in code that is virtualized. No matter what technique is used to slow down the analysis though, tracking these buffers is often the key to quickly determining what a sample is doing.

Admittedly, it is relatively easy to monitor the copypasta code, but much harder to handle creations coming from more advanced malware authors. They actively try to make this tracking harder. Not only do they strip the MZ/PE headers and section names, sometimes use their own PE loader, or rely on shellcode-only code – some use hundreds of small buffers that are hard to keep track of. And then there are noise generators that will make the analysis of events intercepted by even the best-placed hook really hard (e.g. string operations that don’t mean a thing, but may trigger various detections, or will simply be truncated due to the sheer number of API calls). The latter is actually another anti-trick: call an API enough times and it will stop being logged. For every clever monitoring idea, there is a way to make it less clever.

Anyway… talking about buffers in general is a subject for another post. In this short text I will show how placing a good hook works very well with some Python programs that got converted to .exe. In this particular instance I will describe my thought process for the analysis of an old PyInstaller-ed sample (note: it may not apply to all versions of PyInstaller; the sample I am talking about is from ~6 years ago!).

I remember looking at this particular sample a few years back and scratching my head. I knew it was a wrapper, but was not sure how to bite into it. At that time there was not much body of knowledge available on how to analyze this sort of sample, and no good static decrypters/code extractors were available (at least the ones I tried didn’t work), so I was looking for some quick wins using the good ol’ reversing trick – cheating.

I quickly noticed that python27.dll was loaded early during the program’s execution. Looking at the function names resolved by the program via GetProcAddress, I hypothesized that some of them could be monitored to retrieve the source code that I assumed was present inside the sample:

Py_NoSiteFlag, Py_OptimizeFlag, Py_VerboseFlag, Py_Initialize, Py_Finalize, Py_IncRef, Py_DecRef, PyImport_ExecCodeModule, PyRun_SimpleString, PyString_FromStringAndSize, PySys_SetArgv, Py_SetProgramName, PyImport_ImportModule, PyImport_AddModule, PyObject_SetAttrString, PyList_New, PyList_Append, Py_BuildValue, PyFile_FromString, PyString_AsString, PyDict_GetItemString, PyErr_Clear, PyErr_Occurred, PyErr_Print, PyObject_CallObject, PyObject_CallMethod, PyThreadState_Swap, Py_NewInterpreter, Py_EndInterpreter, PyInt_AsLong, PySys_SetObject
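As an aside, one way to observe such resolutions (not necessarily how it was done here) is a GetProcAddress hook that simply logs every name-based lookup – ordinal lookups are passed as small pseudo-pointers and are skipped in this sketch:

#include <windows.h>
#include <stdio.h>

static FARPROC (WINAPI *Real_GetProcAddress)(HMODULE, LPCSTR) = GetProcAddress;

FARPROC WINAPI Hook_GetProcAddress(HMODULE hModule, LPCSTR lpProcName)
{
    /* values <= 0xFFFF are ordinals, not strings */
    if ((ULONG_PTR)lpProcName > 0xFFFF)
        printf("GetProcAddress: %s\n", lpProcName);
    return Real_GetProcAddress(hModule, lpProcName);
}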

My attention immediately focused on the PyRun_SimpleString function:

int PyRun_SimpleString(const char *command)

This is a simplified interface to PyRun_SimpleStringFlags() below, leaving the PyCompilerFlags* argument set to NULL.

int PyRun_SimpleStringFlags(const char *command, PyCompilerFlags *flags)

Executes the Python source code from command in the __main__ module according to the flags argument.
[...]

I hypothesized, a.k.a. hoped, that monitoring it would get me the Python code executed by the program. I added a quick hook for this function to my program, and… lo and behold, I was immediately able to see the results:

PyRun_SimpleString:
import sys

PyRun_SimpleString:
del sys.path[:]

PyRun_SimpleString:
sys.path.append(r"<path>")

PyRun_SimpleString:
sys.path.append(r"<other path>")

PyRun_SimpleString:
# Copyright (C) —
<some bootstrap pyinstaller code>

[…]

PyRun_SimpleString:
from Crypto.Cipher import AES;
from base64 import b64decode as hAtayw;
import os;
import base64;
import ctypes;
from Crypto.Cipher import AES as Ahquye
exec(hAtayw("

[…]
the actual encoded malicious code followed! From there it was easy-peasy…
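For the curious, such a hook can be as short as this sketch – assumptions: the cdecl calling convention (as in CPython) and a Detours-style attach after the DLL is loaded:

#include <windows.h>
#include <stdio.h>

typedef int (*PyRun_SimpleString_t)(const char *command);
static PyRun_SimpleString_t Real_PyRun_SimpleString;

int Hook_PyRun_SimpleString(const char *command)
{
    printf("PyRun_SimpleString:\n%s\n\n", command);   /* the 'hidden' script, in the clear */
    return Real_PyRun_SimpleString(command);
}

/* resolve, then detour with your framework of choice:
   Real_PyRun_SimpleString = (PyRun_SimpleString_t)GetProcAddress(
       GetModuleHandleA("python27.dll"), "PyRun_SimpleString"); */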

This simple hook has served me many times since, and I have been able to quickly analyze many samples that were ‘protected’ this way.

Sometimes the simplest things work.

Monitoring crucial functions is not one of these things, unfortunately, because you first need to discover what these crucial functions are.

I hope this post and others in this series help…