Enter Sandbox part 25: How to get into argument

June 11, 2019 in File Formats ZOO, Malware Analysis, Sandboxing

When you begin your programming career, one of the first lessons focuses on reading command line arguments. It seems trivial, but once you start coding more, and in new languages, you quickly discover that it is anything but trivial and a bit of a mess.

Programming languages use many different ways to access the command line arguments, e.g.:

  • argv
  • wargv
  • args
  • $argv
  • @ARGV
  • arg
  • sys.argv
  • ParamStr
  • Command$
  • WScript.Arguments
  • etc.

I can’t count how many times I googled the proper name/syntax for these over the years – ad hoc programming in different languages makes them quite difficult to remember. Also, some programming languages start indexing arguments from 0, some from 1.

The way to access these parameters also differs. Sometimes they are available as a string, sometimes as an array; sometimes you need to call a function to retrieve specific items, and in some cases you need to write your own parser or tokenizer.

And finally, some frameworks require a certain (standard) approach to passing arguments so that a (standard) parsing routine can extract them properly. Then there are quirks – paths with spaces, extra spaces, ANSI and Unicode characters – and the fact that you have two buffers available for parsing: the path to the actual executable, and its command line. And the first is not always a full path, or it is a path expressed in a different way than expected.
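To illustrate why a raw command line string needs a tokenizer at all, here is a minimal sketch in Python. The function name `split_command_line` is made up for this example, and it honours double quotes only; it deliberately ignores the backslash-escaping rules that real parsers (such as CommandLineToArgvW or a C runtime) apply, so treat it as an illustration of the problem rather than a faithful reimplementation.

```python
from typing import List


def split_command_line(cmdline: str) -> List[str]:
    """Split a raw command line into arguments, honouring double quotes only.

    Simplified on purpose: no backslash escapes, no single quotes.
    """
    args: List[str] = []
    current: List[str] = []
    in_quotes = False
    have_token = False  # lets an empty quoted string ("") count as an argument

    for ch in cmdline:
        if ch == '"':
            # Quotes toggle "quoted mode" and are not copied into the argument.
            in_quotes = not in_quotes
            have_token = True
        elif ch in " \t" and not in_quotes:
            # Unquoted whitespace ends the current argument (if there is one).
            if have_token:
                args.append("".join(current))
                current, have_token = [], False
        else:
            current.append(ch)
            have_token = True

    if have_token:
        args.append("".join(current))
    return args


if __name__ == "__main__":
    raw = '"C:\\Program Files\\app.exe"  /s   "my file.txt"'
    print(split_command_line(raw))
```

Note how the path with a space survives only because of the quotes, and how the runs of extra spaces between arguments are swallowed; both are exactly the kind of quirks mentioned above.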

It gets even more complicated when you start reversing. This time it’s not only programming languages per se, but also the binaries they produce, and these differ depending on architecture, OS, compiler flavor, version, and optimization settings. It is all very messy.

Grepping a repo of import function names, I came up with this short list of APIs and external or internal symbols/variables:

  • CommandArgs
  • CommandLineToArgvW
  • GetCommandLineA
  • GetCommandLineW
  • g_shell_parse_argv
  • osl_getCommandArg
  • osl_getCommandArgCount
  • rb_argv
  • StringToArgv
  • _acmdln
  • _wcmdln
  • __argc
  • __argv
  • __p__acmdln
  • __p__wcmdln
  • __p___argc
  • __p___argv
  • __p___wargv
  • __wargv

Why would we need these?

Many programs require command line arguments to run. Sandboxes that can’t recognize this will fail to produce an accurate report. Not only does some malware use this trick on purpose, there are also tons of good programs that end up in sandbox repositories and never get properly analyzed (e.g. compiled work from IT students, or native OS binaries).

Sandboxes that recognize programming frameworks and the way they parse command line arguments are in a better position to analyze such samples. This is because there is at least a theoretical possibility of heuristically determining whether a sample requires command line arguments, or whether it accepts any at all. At the very least, sandboxes should hint at that in their reports.

There are some command line arguments that are universal and can be guessed, e.g. /? or /h. Others require a lot of reversing, since a program’s logic is often hidden under many layers of code and nested calls.

What kind of heuristics can we come up with?

For instance, if the API called immediately after GetCommandLine is ExitProcess, then chances are this program requires command line arguments.
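A minimal sketch of this heuristic in Python, assuming the sandbox already produces an ordered list of API names for the sample. The function name `exits_right_after_cmdline` and the `max_gap` tolerance are made up for this example, and real traces are likely to be noisier (for instance, runtime startup code may query the command line on its own), so the result is a hint, not a verdict.

```python
from typing import Iterable

CMDLINE_APIS = {"GetCommandLineA", "GetCommandLineW"}
EXIT_APIS = {"ExitProcess"}


def exits_right_after_cmdline(trace: Iterable[str], max_gap: int = 0) -> bool:
    """Return True if an exit API follows a command line API closely.

    max_gap is the number of unrelated calls tolerated in between
    (0 means the exit must be the very next call).
    """
    gap = None  # calls seen since the last command line API; None = not tracking
    for api in trace:
        if api in CMDLINE_APIS:
            gap = 0
        elif gap is not None:
            if api in EXIT_APIS and gap <= max_gap:
                return True
            gap += 1
            if gap > max_gap:
                gap = None  # too far from the command line query; stop tracking
    return False


if __name__ == "__main__":
    print(exits_right_after_cmdline(["GetCommandLineW", "ExitProcess"]))
    print(exits_right_after_cmdline(["GetCommandLineA", "CreateFileW", "ExitProcess"]))
```

Raising `max_gap` trades precision for recall: a program that prints a usage message before exiting will make a few calls between the two APIs.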

If we can determine the location and internal layout of the WinMain or main function, and then also of the argc variable (using e.g. signatures, hooking, emulation, or stack monitoring), we can attempt to trace access to this variable. When access is detected, we can try to analyze the code that uses the variable’s value, typically a comparison against an expected argument count. If our sample exits almost immediately after this comparison, the program most likely requires command line arguments.

Other possibilities could involve:

  • monitoring of dedicated parsing routines, e.g. the getopt function, but also the many inline functions that are embedded in popular frameworks
  • string detection for popular arguments, e.g. /s, -embedding
  • string detection for help information, e.g.: usage:
  • detection of installer type, version (they usually accept some command line arguments that are predefined)
  • fuzzy comparison against known files (if we know sample X required command line arguments, chances are that a similar file will too)
  • ‘reverse proof’ of no CLI requirement
    • if it calls GUI functions then less likely to wait for arguments (but may still accept them)
    • if it is an installer, then we typically know how to handle it (e.g. using clickers)
    • if it is a driver – no command line arguments
    • if it is a DLL, most likely no command line processing (BUT some of the exported functions do rely on command line arguments!)
  • etc.
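The two string-detection ideas from the list above can be sketched as a simple scan over the raw bytes of a sample. This is a Python illustration under stated assumptions: the function name `find_cli_hints` is made up, the marker list contains only the examples mentioned above, and the scan covers both plain ASCII and UTF-16LE encodings. Short markers such as /s will produce plenty of false positives, so hits should be treated as weak hints only.

```python
from typing import List

# Illustrative markers only, taken from the examples in the list above.
HINTS = [b"usage:", b"/s", b"-embedding"]


def find_cli_hints(data: bytes) -> List[str]:
    """Return the markers found in data as ASCII or UTF-16LE, ignoring case."""
    lowered = data.lower()  # bytes.lower() only touches ASCII letters
    found: List[str] = []
    for hint in HINTS:
        text = hint.decode("ascii")
        wide = text.encode("utf-16-le")  # same marker as a wide string
        if hint in lowered or wide in lowered:
            found.append(text)
    return found


if __name__ == "__main__":
    blob = b"\x00\x01Usage: tool.exe <file>\x00" + "-Embedding".encode("utf-16-le")
    print(find_cli_hints(blob))
```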

Overall this is a non-trivial task and the chances of offering a generic solution are very poor, but it is a good idea to at least flag the file for manual analysis, either in-house or in a report for the client.

Enter Sandbox part 24: Intercepting Buffers #3 – The Punto H & magic points

January 19, 2019 in Archaeology, Batch Analysis, Sandboxing

I mentioned that monitoring buffers is the key to quickly understanding the inner workings of software. It doesn’t work all the time, but in the majority of cases it does. What’s more, in ‘desperately’ challenging cases it may help to gain access to the internals of highly obfuscated code, sometimes even virtualized, and may help to understand large, bulky programs that are really hard to analyze using ‘static’ tools.

Now, we are so used to primarily monitoring APIs, and the buffers that these APIs handle, that we often forget that there are many additional places where the monitoring could take place.

I listed a lot of examples in the past. And there are always more ideas. Think of it – your sandbox is your baby. You know every single bit of it. You control its existence. You can extract hard-coded addresses for certain functions, or patch some code. You can modify the OS any way you want. You can even replace every single OS file, disable OS anti-tampering code, introduce clever redirections, callbacks – the sky is the limit really. It is a controlled environment. Let’s be adventurous with that.

And yes, this is hard, and perhaps sounds like a very abstract idea, but these are some of the many available possibilities that may actually work well if applied to modern sandboxes, which typically focus on inspecting the guest system from the outside (as opposed to old API monitors).

You may ask – it all sounds nice, so why don’t I provide a more specific example? I am glad you asked. This is the topic of this post.

I personally find the Punto.H / Point.H trick to be one of the best examples of such cleverly placed breakpoints. The trick was developed by a community mainly focused on the art of software cracking and… a very looooooong time ago (the trick is often attributed to Ricardo Narvaja). And yes, it sounds archaic, and it really is.

How does it work?

The old shareware applications usually asked for a serial key. Most of them, especially in the early days, would just ask for a string provided by the user. Once entered, the serial would be retrieved from the UI control (edit box) and tested with the program’s serial verification routine. If the serial was OK, the program would be reconfigured as ‘registered’.

Shareware programs were very popular back then, but many of them were quite bulky, plus there were no decompilers yet, and it was quite a pain to analyze them. Observant reversers noticed that by intercepting the calls to the internal function called ‘hmemcpy’ they could see all the data being sent between the program UI and its internals. The first letter of the function name gave the technique its name: ‘Punto.H’ (since it was very popular among Spanish-speaking crackers, I opted to use the Spanish name in this article, instead of the English ‘Point.H’).

So, catching these buffers was pretty much the first step to cracking serials. Once you got the buffer, you could track it and eventually reach the actual routine that was processing it. And then you could either patch the code to bypass the serial check or, as more advanced reversers did, write a serial generator that would produce strings the program would accept. It sounds pretty simple, but typically required many hours of work. The Punto.H trick simplified the cracking process a lot.
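The first step described above (catch the buffer that carries the entered serial) can be sketched as a filter over already captured copy operations. This Python example does not hook hmemcpy or any other function; the `CopyEvent` record, its fields and the `find_marker_copies` helper are all hypothetical and stand in for whatever a monitor would log. The idea is simply to type a recognizable marker as the serial and then look for copies that carry it, in ASCII or UTF-16LE.

```python
from typing import Iterable, List, NamedTuple


class CopyEvent(NamedTuple):
    """Hypothetical record logged for each intercepted memory copy."""
    caller: int   # address of the code that requested the copy
    data: bytes   # bytes that were copied


def find_marker_copies(events: Iterable[CopyEvent], marker: str) -> List[CopyEvent]:
    """Return the copy events whose data contains the marker string."""
    needles = (marker.encode("ascii"), marker.encode("utf-16-le"))
    return [e for e in events if any(n in e.data for n in needles)]


if __name__ == "__main__":
    marker = "SERIAL-1234-MARK"  # the fake serial typed into the edit box
    log = [
        CopyEvent(0x401000, b"window title"),
        CopyEvent(0x402ABC, b"SERIAL-1234-MARK\x00"),
        CopyEvent(0x403000, marker.encode("utf-16-le")),
    ]
    for hit in find_marker_copies(log, marker):
        print(hex(hit.caller))
```

The caller addresses of the matching events are the starting points for tracking the buffer towards the routine that processes it.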

Again, software protection is really different now, but this technique still illustrates the point: you need to look for good places where you can add breakpoints for monitoring. Punto.H was so popular that even today there are still many plug-ins that implement this technique and its clones, often introducing additional breakpoints for other software platforms, for example: