Using Virtual Machine tools for Guest OS fingerprinting

A popular way of binding samples to a specific machine is by taking a hardware fingerprint of the system, and sending it to the server. The server then encrypts the payload using a key that is derived from the system fingerprint and sends it back. The payload will only run on a system for which the payload loader can extract the same fingerprint as used to encrypt the payload.
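
To make the binding concrete, here is a minimal Python sketch of the key-derivation step only. The fingerprint field names, the salt and the iteration count are my assumptions for illustration; nothing above says how the key is actually derived, and no encryption is shown.

```python
import hashlib


def derive_key(fingerprint_fields, salt=b"example-salt", iterations=100_000):
    """Derive a 32-byte key from a dict of fingerprint fields (hypothetical names)."""
    # Sort the fields so the same machine always yields the same byte string.
    canonical = "|".join(
        f"{name}={fingerprint_fields[name]}" for name in sorted(fingerprint_fields)
    )
    return hashlib.pbkdf2_hmac("sha256", canonical.encode("utf-8"), salt, iterations)


machine_a = {"disk_serial": "WD-1234", "mac": "00:0c:29:aa:bb:cc"}
machine_b = {"disk_serial": "WD-9999", "mac": "00:0c:29:aa:bb:cc"}

key_a = derive_key(machine_a)
# Same fields in a different order: the key must not change.
key_a_again = derive_key(dict(reversed(list(machine_a.items()))))
key_b = derive_key(machine_b)
```

The same fields give the same key regardless of collection order, while a single differing field gives an unrelated key, which is why the payload only decrypts where the loader reads back the same fingerprint.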

It turns out you can bind the sample not only to the hardware fingerprint, but also to the custom, guest-only properties assigned to guest operating system snapshots (okay, in some way it is still a hardware fingerprint).

For VMware, you can use the rpctool.exe / vmware-rpctool.exe / vmware-guestd tool to either set or retrieve the value of guest OS properties, e.g.:

  • rpctool "info-set guestinfo.foo 1234"
  • rpctool "info-get guestinfo.foo"

Malware could either use the tool, or talk to the RPC interface directly.
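
From the sandbox operator's side, it is worth knowing what a guest can read back. Below is a rough Python sketch that wraps the command-line tool; the helper names are mine, and which tool name is present (if any) depends on the guest and the installed VMware Tools version.

```python
import shutil
import subprocess

# Candidate tool names taken from the text above; availability depends on the guest.
RPCTOOL_CANDIDATES = ("vmware-rpctool", "rpctool")


def build_rpctool_args(tool, command, key, value=None):
    """Build the argv list; the whole RPC command is passed as one argument."""
    rpc = f"{command} {key}" if value is None else f"{command} {key} {value}"
    return [tool, rpc]


def get_guestinfo(key, timeout=5):
    """Return the value of a guest property, or None if it cannot be read."""
    for name in RPCTOOL_CANDIDATES:
        tool = shutil.which(name)
        if tool is None:
            continue
        try:
            result = subprocess.run(
                build_rpctool_args(tool, "info-get", key),
                capture_output=True, text=True, timeout=timeout,
            )
        except (OSError, subprocess.TimeoutExpired):
            continue
        if result.returncode == 0:
            return result.stdout.strip()
    return None
```

Calling get_guestinfo("guestinfo.foo") outside a VMware guest simply returns None, since none of the candidate tools will be found.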

There are various possibilities. If the VM is part of a farm, malware authors could enumerate existing guest OS properties in the same way they list well-known sandbox hostnames, and then decide, based on an analysis of the incoming information (e.g. stats), whether the VM is deemed ‘infectable’. They could also set their own guest properties and only continue deploying future updates to the malware if the property still exists and contains the expected value.

So, it could be yet another potential anti-sandbox trick/evasion.

There are more Guest OS tools.

Detecting a VM is a popular way to evade sandboxes. With a growing number of environments that rely on VMs, the way the detection is done may soon change – perhaps it will have to be a little more refined than before? I.e. it is OK if it is a VM, just exclude some of them?

For example, using the VMwareToolboxCmd.exe tool, one can retrieve the time of the host machine:

  • VMwareToolboxCmd.exe stat hosttime

If the host and guest times are not synchronized, what could be the reason?

An alternative method can check if the time synchronization is enabled at all:

  • VMwareToolboxCmd.exe timesync status
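
The two checks above can be combined into one small test. This Python sketch only parses tool output that has already been captured; the "06 Sep 2011 10:58:40" style timestamp and the Enabled/Disabled status strings are my assumptions about the output format, and the 2-minute threshold is arbitrary. Time zone differences between host and guest are ignored here.

```python
from datetime import datetime, timedelta

# Assumed output format of "stat hosttime", e.g. "06 Sep 2011 10:58:40".
# Month names are parsed with the current locale (English assumed).
HOSTTIME_FORMAT = "%d %b %Y %H:%M:%S"


def clock_drift(hosttime_output, guest_now):
    """Return guest time minus host time as a timedelta."""
    host_time = datetime.strptime(hosttime_output.strip(), HOSTTIME_FORMAT)
    return guest_now - host_time


def timesync_enabled(status_output):
    """Interpret the output of "timesync status" (assumed Enabled/Disabled)."""
    return status_output.strip().lower() == "enabled"


# Canned example values instead of live tool output.
drift = clock_drift("06 Sep 2011 10:58:40\n", datetime(2011, 9, 6, 11, 3, 40))
unsynced_guest = abs(drift) > timedelta(minutes=2) and not timesync_enabled("Disabled")
```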

Yet another way to detect possible customized properties of the guest OS is by looking at its scripts e.g.:

  • VMwareToolboxCmd.exe script power default

The ‘power’ refers to an event that makes the script run, and can be replaced with other event names, e.g. ‘resume’, ‘suspend’, ‘shutdown’, each associated with a respective script. I have described using these as a possible persistence trick a long time ago.

Turns out that using the VMwareToolboxCmd.exe tool one could modify the paths of the default scripts (i.e. it is a new persistence trick). Their content, if changed, could also be an additional fingerprint signature.
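
For defenders, the same commands make a quick integrity check. The Python sketch below compares the ‘current’ script path with the ‘default’ one for each event; the runner is injected so the logic can be tried without VMware Tools, and the paths in the fake output are made up for illustration.

```python
import subprocess

SCRIPT_EVENTS = ("power", "resume", "suspend", "shutdown")


def run_vmware_toolbox(args, tool="VMwareToolboxCmd.exe"):
    """Run the toolbox command and return its standard output."""
    return subprocess.run(
        [tool, *args], capture_output=True, text=True, timeout=10
    ).stdout


def find_modified_scripts(run_toolbox):
    """Return {event: (default_path, current_path)} for events whose path changed."""
    modified = {}
    for event in SCRIPT_EVENTS:
        default = run_toolbox(["script", event, "default"]).strip()
        current = run_toolbox(["script", event, "current"]).strip()
        # Windows paths are case-insensitive, so compare them that way.
        if current.lower() != default.lower():
            modified[event] = (default, current)
    return modified


def fake_toolbox(args):
    """Stand-in runner with invented paths; only 'resume' has been changed."""
    _, event, which = args
    if event == "resume" and which == "current":
        return "C:\\Users\\Public\\evil.bat\n"
    return f"C:\\Program Files\\VMware\\VMware Tools\\{event}-default.bat\n"


modified = find_modified_scripts(fake_toolbox)
```

On a real guest you would pass run_vmware_toolbox instead of fake_toolbox.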

And one more command to cover: testing the availability of a new version of VMware Tools. The version of VMware Tools, as well as the fact that it is not being updated, can help to enrich the profile of a guest OS:

  • VMwareToolboxCmd.exe upgrade status

There are probably more possibilities out there…

Enter Sandbox part 19: The string theory is cool, but the practice is not

Monitoring string functions inside a sandbox is really helpful. This is because strings are probably the most important buffers that we want to see being processed by the analyzed programs. They are on such a nice, high level of abstraction that we intuitively understand their meaning, often even their context, and all of it without much effort. This saves us the time otherwise needed to understand the inner workings of samples.

By peeking at strings we can extract a lot of valuable information, e.g. about how a program processes its command line arguments (the number of arguments, discovery of available options when strings are compared directly), how dynamic strings are built, how conversions are done, and what conditions are tested, including a variety of anti-* tricks. Monitoring strings helps with automated extraction of additional IOCs as well, e.g. URLs that are not being actively used, yet have been built and stored in memory, etc.
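
As a small illustration of the IOC angle, here is a Python sketch that pulls URLs out of a list of strings captured by string-function hooks. The captured values and the URL pattern are made up for the example; a real sandbox would feed in its own log.

```python
import re

# Deliberately simple pattern: scheme, "://", then anything up to whitespace or a quote.
URL_PATTERN = re.compile(r"\b(?:https?|ftp)://[^\s\"'<>]+", re.IGNORECASE)


def extract_urls(captured_strings):
    """Return unique URLs, in first-seen order, from an iterable of strings."""
    seen = []
    for text in captured_strings:
        for url in URL_PATTERN.findall(text):
            if url not in seen:
                seen.append(url)
    return seen


captured = [
    "GET /index.html",
    "http://example.com/gate.php?id=1",
    "fallback=http://example.org/update",
    "http://example.com/gate.php?id=1",
]
urls = extract_urls(captured)
```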

The problem is… it is almost impossible to cover them all! Hmm let me rephrase that last sentence – it’s almost impossible to cover even the basics!

If you ask any programmer how many string functions are offered by their favorite programming language, I bet the answer will oscillate between a few dozen and… say… a hundred? And if they are aware of the more archaic issues related to character encoding, and have worked with various programming languages… maybe they will double that number, maybe even triple it.

It turns out that the ‘old’ Windows API framework alone offers more than 380 of these. And together with libs offered by various programming frameworks this number easily multiplies.

Over the years I have made many attempts to build a comprehensive list of all string functions that are offered via the Windows API and popular libraries.

It should be easy, right?

  • Pick up a number of the most commonly imported DLLs e.g.:
    • msvcrt.dll and its variants
    • kernel32.dll
    • user32.dll
    • advapi32.dll
    • oleaut32.dll
    • ole32.dll
    • maybe throw in the ntdll.dll as well
  • look at their exported functions
  • cherry-pick these APIs that are associated with string processing
  • and… you are done.
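
The cherry-picking step could be sketched in Python like this. The name patterns are my own rough heuristic, not a complete rule set, and the export names are hard-coded here; in practice they would come from a PE parser run over the DLLs listed above.

```python
import re

# Heuristic prefixes for string-related exports (an assumption for illustration).
STRING_API_PATTERN = re.compile(
    r"^(?:_?str|_?wcs|_mbs|_mbc|lstr|_?isw?[a-z]"
    r"|_?tow?(?:lower|upper)|Sys\w*String|CompareString)"
)


def pick_string_apis(export_names):
    """Keep only the export names that look like string functions."""
    return [name for name in export_names if STRING_API_PATTERN.match(name)]


exports = [
    "CreateFileW", "lstrcpynW", "strtok_s", "_mbsnbcpy_s_l", "SysAllocString",
    "iswdigit", "VirtualAlloc", "_strrev", "IsTextUnicode",
]
picked = pick_string_apis(exports)
```

Note that IsTextUnicode slips through these patterns, so a list built this way still needs a manual pass.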

BUT

Such an exercise brings up more questions than answers. Which ones do we include? How many function sets are actually out there?

Let’s see…

Copying, moving, length calculation, comparison, substring extraction, character position finding from the left and right side, parsing, tokenization, concatenation, reversing, searching, replacement, regexes, ANSI versions, Unicode versions, UTF-8 versions, NT API, Windows API, case sensitivity variants, string memory allocation and release functions, integer to string, string to integer, long integer to string, string to long integer, time/date to string, string to time/date, string formatting, string trimming, lower case, upper case and many other conversion functions, a gazillion wrappers to cater for various character sets, endianness, or to address certain classes of vulnerabilities, or specific ad-hoc needs of programmers or frameworks (e.g. Variant type, strings used by COM, etc.).
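
To see how quickly the variants multiply, here is a small Python sketch that folds CRT names into families by stripping the _s (secure), _l (explicit locale) and _dbg (debug) suffixes. The suffix list is an assumption covering only the CRT-style names; A/W variants and other naming schemes are not handled.

```python
import re
from collections import defaultdict

# One or more of the known suffixes at the very end of the name.
SUFFIX_PATTERN = re.compile(r"(?:_s|_l|_dbg)+$")


def family_of(name):
    """Strip the variant suffixes to get the base function name."""
    return SUFFIX_PATTERN.sub("", name)


def group_by_family(names):
    """Map each base name to the list of variants seen for it."""
    families = defaultdict(list)
    for name in names:
        families[family_of(name)].append(name)
    return dict(families)


names = [
    "_mbslwr", "_mbslwr_l", "_mbslwr_s", "_mbslwr_s_l",
    "_strdup", "_strdup_dbg", "strcpy", "strcpy_s",
]
families = group_by_family(names)
```

Eight names collapse into three families here, and a hook written for one family member usually needs only small changes for the others.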

And these are just ‘standard’ APIs, without:

  • C++ functions (with its overloaded constructors)
  • Visual Basic
  • Visual Basic for Applications
  • Visual Basic Script
  • JavaScript
  • tons of other wrappers that exist either within native OS libs and programs (.NET, PowerShell, Office, Shell APIs, crypto APIs, OLE/COM wrappers, multiple versions of msvcrt, kernelbase and api- wrappers)
  • popular frameworks (Qt)
  • popular libraries (PCRE, SQLite3)
  • exports from popular DLLs supporting programming environments like python, perl (yes, we can monitor these too!)
  • different compilers e.g. MinGW, or Borland/Delphi/CodeGear/Embarcadero that rely heavily on inline functions
  • kernel functions
  • internal functions that can be recognized via debug symbols
  • inline/internal functions re-used by malware, if signatures and hooks can be applied to them (e.g. string encryption/decryption functions)
  • plus… tons of duplicated code, thanks to static linking, and your good ol’ copypasta

One can argue that many window-oriented functions or messages could also be included, as they offer extra insight into the program’s inner workings. Hence functions operating on resources that are the building blocks of the UI (ribbons, menus, labels, buttons, etc.), dynamically created UI elements, as well as any messages that have to do with text (WM_*, EM_*, etc.) could also be included. Going further, we can also include more advanced, or shall we say higher-level, functions, e.g. XML processing APIs, database APIs, and any APIs or methods processing a syntax (e.g. WQL in WMI). And yes, we can argue that many of them will eventually reach out to the lower-level string APIs that operate on actual text, but hey… an API tree like this will be a great time-saver.

If we take a step further, the need to monitor all strings can be more precisely defined as seeing all strings processed on the highest possible level (i.e. on the program nesting level, not in intermediate libraries). It is extremely difficult to do, but perhaps one day… in Sandboxes 3.0.
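
One crude approximation of ‘the highest possible level’ is to keep only the string calls whose return address falls inside the analyzed program’s own image. The Python sketch below shows the idea on invented data; the event fields, addresses and image range are all hypothetical, and a real sandbox would take them from its hooks and the loader.

```python
from dataclasses import dataclass


@dataclass
class StringEvent:
    api: str      # name of the hooked string function
    caller: int   # return address of the call (hypothetical field)
    text: str     # the string argument that was captured


def from_main_module(events, image_base, image_size):
    """Keep events called directly from the program's own image."""
    end = image_base + image_size
    return [event for event in events if image_base <= event.caller < end]


events = [
    StringEvent("lstrcmpiA", 0x00401A2C, "/install"),
    StringEvent("strlen", 0x77C41234, "C:\\Windows\\system32"),
    StringEvent("wcscat", 0x00402F10, "\\svchost.exe"),
]
top_level = from_main_module(events, image_base=0x00400000, image_size=0x20000)
```

The strlen call made from inside a system library is dropped, while the two calls made by the sample itself are kept.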

You see where it is going?

The other end of the rope is the inevitable noise, and the performance hit that such in-depth monitoring would certainly bring… Still, for specific samples such in-depth analysis would give a lot of time back to reversers who otherwise need to manually deconstruct the business logic of the samples.

Nearly 390 string functions are listed below. There are more, but I can’t list them all; because if you program sandboxes, you need to do your homework yourself 🙂

IsTextUnicode, CompareStringA, CompareStringEx, CompareStringOrdinal, CompareStringW, IdnToAscii, IdnToUnicode, lstrcat, lstrcatA, lstrcatW, lstrcmp, lstrcmpA, lstrcmpW, lstrcmpi, lstrcmpiA, lstrcmpiW, lstrcpy, lstrcpyA, lstrcpyW, lstrcpyn, lstrcpynA, lstrcpynW, lstrlen, lstrlenA, lstrlenW, __isascii, __toascii, _isalnum_l, _isalpha_l, _isatty, _iscntrl_l, _isctype, _isctype_l, _isdigit_l, _isgraph_l, _isleadbyte_l, _islower_l, _ismbbalnum, _ismbbalnum_l, _ismbbalpha, _ismbbalpha_l, _ismbbgraph, _ismbbgraph_l, _ismbbkalnum, _ismbbkalnum_l, _ismbbkana, _ismbbkana_l, _ismbbkprint, _ismbbkprint_l, _ismbbkpunct, _ismbbkpunct_l, _ismbblead, _ismbblead_l, _ismbbprint, _ismbbprint_l, _ismbbpunct, _ismbbpunct_l, _ismbbtrail, _ismbbtrail_l, _ismbcalnum, _ismbcalnum_l, _ismbcalpha, _ismbcalpha_l, _ismbcdigit, _ismbcdigit_l, _ismbcgraph, _ismbcgraph_l, _ismbchira, _ismbchira_l, _ismbckata, _ismbckata_l, _ismbcl0, _ismbcl0_l, _ismbcl1, _ismbcl1_l, _ismbcl2, _ismbcl2_l, _ismbclegal, _ismbclegal_l, _ismbclower, _ismbclower_l, _ismbcprint, _ismbcprint_l, _ismbcpunct, _ismbcpunct_l, _ismbcspace, _ismbcspace_l, _ismbcsymbol, _ismbcsymbol_l, _ismbcupper, _ismbcupper_l, _ismbslead, _ismbslead_l, _ismbstrail, _ismbstrail_l, _isspace_l, _isupper_l, _iswalnum_l, _iswalpha_l, _iswcntrl_l, _iswctype_l, _iswdigit_l, _iswgraph_l, _iswlower_l, _iswprint_l, _iswpunct_l, _iswspace_l, _iswupper_l, _iswxdigit_l, _isxdigit_l, _mbcasemap, _mbccpy, _mbccpy_l, _mbccpy_s, _mbccpy_s_l, _mbcjistojms, _mbcjistojms_l, _mbcjmstojis, _mbcjmstojis_l, _mbclen, _mbclen_l, _mbctohira, _mbctohira_l, _mbctokata, _mbctokata_l, _mbctolower, _mbctolower_l, _mbctombb, _mbctombb_l, _mbctoupper, _mbctoupper_l, _mbctype, _mblen_l, _mbsbtype, _mbsbtype_l, _mbscat, _mbscat_s, _mbscat_s_l, _mbschr, _mbschr_l, _mbscmp, _mbscmp_l, _mbscoll, _mbscoll_l, _mbscpy, _mbscpy_s, _mbscpy_s_l, _mbscspn, _mbscspn_l, _mbsdec, _mbsdec_l, _mbsdup, _mbsicmp, _mbsicmp_l, _mbsicoll, _mbsicoll_l, _mbsinc, _mbsinc_l, _mbslen, _mbslen_l, 
_mbslwr, _mbslwr_l, _mbslwr_s, _mbslwr_s_l, _mbsnbcat, _mbsnbcat_l, _mbsnbcat_s, _mbsnbcat_s_l, _mbsnbcmp, _mbsnbcmp_l, _mbsnbcnt, _mbsnbcnt_l, _mbsnbcoll, _mbsnbcoll_l, _mbsnbcpy, _mbsnbcpy_l, _mbsnbcpy_s, _mbsnbcpy_s_l, _mbsnbicmp, _mbsnbicmp_l, _mbsnbicoll, _mbsnbicoll_l, _mbsnbset, _mbsnbset_l, _mbsnbset_s, _mbsnbset_s_l, _mbsncat, _mbsncat_l, _mbsncat_s, _mbsncat_s_l, _mbsnccnt, _mbsnccnt_l, _mbsncmp, _mbsncmp_l, _mbsncoll, _mbsncoll_l, _mbsncpy, _mbsncpy_l, _mbsncpy_s, _mbsncpy_s_l, _mbsnextc, _mbsnextc_l, _mbsnicmp, _mbsnicmp_l, _mbsnicoll, _mbsnicoll_l, _mbsninc, _mbsninc_l, _mbsnlen, _mbsnlen_l, _mbsnset, _mbsnset_l, _mbsnset_s, _mbsnset_s_l, _mbspbrk, _mbspbrk_l, _mbsrchr, _mbsrchr_l, _mbsrev, _mbsrev_l, _mbsset, _mbsset_l, _mbsset_s, _mbsset_s_l, _mbsspn, _mbsspn_l, _mbsspnp, _mbsspnp_l, _mbsstr, _mbsstr_l, _mbstok, _mbstok_l, _mbstok_s, _mbstok_s_l, _mbstowcs_l, _mbstowcs_s_l, _mbstrlen, _mbstrlen_l, _mbstrnlen, _mbstrnlen_l, _mbsupr, _mbsupr_l, _mbsupr_s, _mbsupr_s_l, _mbtowc_l, _strcmpi, _strcoll_l, _strdate, _strdate_s, _strdup, _strdup_dbg, _strerror, _strerror_s, _stricmp, _stricmp_l, _stricoll, _stricoll_l, _strlwr, _strlwr_l, _strlwr_s, _strlwr_s_l, _strncoll, _strncoll_l, _strnicmp, _strnicmp_l, _strnicoll, _strnicoll_l, _strnset, _strnset_s, _strrev, _strset, _strset_s, _strtime, _strtime_s, _strtod_l, _strtoi64, _strtoi64_l, _strtol_l, _strtoui64, _strtoui64_l, _strtoul_l, _strupr, _strupr_l, _strupr_s, _strupr_s_l, _strxfrm_l, _tolower, _tolower_l, _toupper, _toupper_l, _towlower_l, _towupper_l, isalnum, isalpha, iscntrl, isdigit, isgraph, isleadbyte, islower, isprint, ispunct, isspace, isupper, iswalnum, iswalpha, iswascii, iswcntrl, iswctype, iswdigit, iswgraph, iswlower, iswprint, iswpunct, iswspace, iswupper, iswxdigit, isxdigit, strcat, strcat_s, strchr, strcmp, strcoll, strcpy, strcpy_s, strcspn, strerror, strerror_s, strftime, strlen, strncat, strncat_s, strncmp, strncpy, strncpy_s, strnlen, strpbrk, strrchr, strspn, strstr, strtod, 
strtok, strtok_s, strtol, strtoul, strxfrm, wcscat, wcscat_s, wcschr, wcscmp, wcscoll, wcscpy, wcscpy_s, wcscspn, wcsftime, wcslen, wcsncat, wcsncat_s, wcsncmp, wcsncpy, wcsncpy_s, wcsnlen, wcspbrk, wcsrchr, wcsrtombs, wcsrtombs_s, wcsspn, wcsstr, wcstod, wcstok, wcstok_s, wcstol, wcstombs, wcstombs_s, wcstoul, wcsxfrm, SysAllocString, SysAllocStringByteLen, SysAllocStringLen, SysFreeString, SysReAllocString, SysReAllocStringLen, SysReleaseString, SysStringByteLen, SysStringLen, ToAscii, ToAsciiEx, ToUnicode, ToUnicodeEx, WCSToMBEx