Enter Sandbox – part 4: In search of Deus Ex Machina

June 12, 2015 in Batch Analysis, Clustering, Malware Analysis, Sandboxing

When we talk about sandboxing we can’t avoid talking about its limitations. The first thing that comes to mind in this context is usually evasion (or evasions, really, since there are so many of them), but that’s actually not the most important part. The important (and depressing) part is that sandboxes are nearly identical to antivirus in many respects. They share the very same flawed concept: the hope that detection of badness can somehow be codified. They are reactive, even if this is masked by the depth of analysis they can provide and their ability to actually “see” what a given sample is doing. And mind you, I am not trying to bash sandboxes here (they are extremely useful); my point is about being _realistic_ about what they can and cannot do. If you buy one, you need to make an informed decision.

Let me explain.

First, a bit of history. Around 2007 I was responsible for processing large volumes of samples coming from a new behavioral engine implemented by my employer at that time. My first reaction to this avalanche of suspicious files was that there was really a lot of crap out there that I would never have imagined existed. I bet this is the first thought of anyone who has ever had to deal with large quantities of samples, and… it gets worse. The samples I was getting had already been _highlighted_ by the engine as suspicious.

Who knows what else is out there.

Let’s face it: there are gazillions of files out there that defy logic and assumptions, and that fool your parsers and rules. For every million samples that you have just ‘covered’ with your new engine update there is another million of… people doing weird stuff. They are either coding their legitimate apps in some very unique, creative way and coming up with some “clever” software choices, or intentionally trying to obfuscate and break stuff to make your life difficult (as a malware analyst, let alone as an automated system).

It is one thing to sandbox a sample and see what it does [a.k.a. manually review the report]; it is another to automatically decide whether it is good or bad. In my first attempts, I started looking for patterns on the file level [static analysis]. Obviously, it didn’t go very far, thanks to wrappers, protectors, and packers of every sort (often used by legitimate apps as well). I remember going as far as implementing a primitive version of fuzzy file comparison based on rules created specifically for dedicated families (e.g. if the only difference between two files was just a URL, or a small area of config, then my rule would calculate a hash of the file excluding this area and mark the sample as identical if such a hash had already been ‘seen’). I later got help from a very talented developer who took these ideas much further (he was a much better coder than I am) and added a lot of interesting ‘detection’ features to the ‘sample sorting’ script, but at the end of the day we both felt it was pretty mundane work. Yes, it worked for many simple installers and samples – and it still works today – but it has yet to prove itself as a reliable decision maker.
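The fuzzy comparison idea above can be sketched roughly like this (a minimal reconstruction, not the original script; the family name, offsets, and rule format are all invented for illustration):

```python
import hashlib

# Hypothetical per-family rules: byte ranges known to vary between
# otherwise-identical samples (e.g. an embedded URL or config blob).
# Offsets are made up for this example.
FAMILY_RULES = {
    "family_x": [(0x400, 0x480)],  # skip the embedded config area
}

def fuzzy_hash(data: bytes, excluded_ranges) -> str:
    """Hash the file contents while skipping the family-specific
    variable regions, so samples differing only there collide."""
    h = hashlib.sha256()
    pos = 0
    for start, end in sorted(excluded_ranges):
        h.update(data[pos:start])
        pos = end
    h.update(data[pos:])
    return h.hexdigest()

seen = set()

def is_known_variant(data: bytes, family: str) -> bool:
    """Return True if a sample with the same 'masked' hash was
    already processed; otherwise remember it for next time."""
    digest = fuzzy_hash(data, FAMILY_RULES[family])
    if digest in seen:
        return True
    seen.add(digest)
    return False
```

A new sample whose only difference falls inside the excluded range hashes to the same digest as one already seen, so it can be sorted into the known family without a byte-for-byte match.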

Static analysis on the file format level can’t take us very far, so I started running the samples in my own primitive sandbox, and the resulting log helped me cherry-pick similar actions carried out by various samples. This definitely improved the ‘family’ or ‘outbreak’ detections, and I can even claim some successes there, but it was not even close to being strong enough to clearly distinguish between good and evil. I kept expanding on it and started defining rules that would flag samples for further review. To give you an example: if CreateProcess, WriteProcessMemory, and CreateRemoteThread all happen, flag the sample as potentially bad (code injection). More and more rules, and more and more ambiguity.
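A minimal sketch of what such a flagging rule might look like, assuming the sandbox log has been reduced to an ordered list of API names per process (this trace format is a simplification invented here, not a real sandbox log schema):

```python
# The classic injection sequence from the example above.
INJECTION_SEQUENCE = ["CreateProcess", "WriteProcessMemory", "CreateRemoteThread"]

def flag_code_injection(api_trace) -> bool:
    """Return True if the calls in INJECTION_SEQUENCE appear in
    order (not necessarily adjacent) in the recorded trace."""
    it = iter(api_trace)
    # Each 'any' consumes the shared iterator, enforcing ordering.
    return all(any(call == wanted for call in it)
               for wanted in INJECTION_SEQUENCE)

# A made-up trace with unrelated calls interleaved:
trace = ["RegOpenKey", "CreateProcess", "VirtualAllocEx",
         "WriteProcessMemory", "CreateRemoteThread", "CloseHandle"]
```

Of course this is exactly the kind of rule that breeds ambiguity: legitimate software triggers the same sequence, which is the point of the paragraph above.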

Just as many clever researchers constantly prove that AV is a piece of mierda, these samples were doing the same to my efforts – without any security-related research behind them. They were already out there, often for many years; it was just the fact that I had not seen them before that made my life miserable, because I had to catch up. And catching up with all this requires a lot of resources.

To conclude: neither a sandbox nor AV can provide enough insight (even if a sandbox takes us further) to support a good decision. Maybe it’s just that detecting bad stuff is really a terrible idea. While I am not the biggest fan of whitelisting, there are moments when I think that going totalitarian on all unapproved software and files is really the only way to go. Perhaps there is no room for democracy in the security industry.

So, I have spent a couple of paragraphs talking about limitations of sandboxes that come as a natural consequence of the ingenuity of software developers (whether on the bad or the good side of the fence, it doesn’t matter), without mentioning a single evasion. Even if I didn’t think of it, and would never have admitted it at the time, efficient sandboxing cannot exist without creating behavioral signatures in the very same way you do in regular AV work. Find the pattern, codify it, move on. That’s why I am now returning to what I said earlier: sandboxes are nearly identical to antivirus in many respects. Yes, they have some advantages, but it’s naive to see them as a solution to all the problems. A sandbox is just yet another security control out there. And it is often bypassed – the funniest part is that this probably happens more often accidentally and unintentionally than through deliberate ‘anti-sandbox’ tricks. I will come back to this topic in the future.

All this babbling requires some specific examples to highlight the issues I was talking about:

  • static analysis fails not only because of wrappers and protectors, but also because executable file properties are inspected at run-time, and affecting them at run-time leads to code paths that are established only when the code is actually running
  • dynamic analysis is limited by the business logic of the application being run; to date, most malware follows an idiot logic of ‘get there asap’ and does all the malicious stuff w/o much thinking; occasional evasions are just a distraction from this general trend
  • a couple of non-evasive ideas that break most of the sandboxes (and reversers ;)); these ideas are all parts of legitimate software you can find everywhere:
    • command line arguments
    • give an application any sense of interactivity and it fools every single sandbox on the planet
    • using any proprietary UI framework kills all autoclickers
    • using non-English language in your application will (with a few exceptions) instantly confuse any western reverser and also kill the autoclickers
    • APIs relying on ANSI code pages make life difficult when combined with non-English languages; guessing which ANSI code page is used is not easy and requires a dedicated engine
    • non-Latin alphabets are an instant kill for many reversers
    • scripting languages are hard to cope with properly (monitoring native functions won’t help at such a high level), e.g. AutoIt
    • there are legitimate cases for injecting data into a child process using memory-writing functions (e.g. writing a copy of the environment block into a child process’ memory)
    • there are gazillions of versions of the libraries used by software – they are often compiled from the very same (or slightly modified) source code, but with various options; it is hard to distinguish (in a generic way) whether they handle ANSI or Unicode, and attempts to intercept inline functions require dedicated signatures (note that compilation may turn them into many different forms, e.g. code optimized /in many ways, depending on options/ or targeting a specific processor or architecture); different versions of the same compiler may also produce different results
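To make the first bullet on that list concrete, here is a tiny, hypothetical illustration of the command-line-argument problem: the interesting code path is gated behind a switch (the "/install" switch and the strings are invented for this sketch), so a sandbox that detonates the file with no arguments – the usual default – records a harmless no-op.

```python
import sys

def main(argv) -> str:
    """The real behavior only runs when a specific, undocumented
    argument is passed. With no arguments, the program exits
    cleanly and the sandbox report looks entirely benign."""
    if len(argv) < 2 or argv[1] != "/install":
        # This branch is what the sandbox sees.
        return "nothing to do"
    # ... the actual payload would live here, never reached
    # during a default argument-less detonation ...
    return "payload path taken"

if __name__ == "__main__":
    print(main(sys.argv))
```

There is no anti-sandbox trick involved: the sample simply has a business logic the sandbox never satisfies, which is exactly the non-evasive failure mode described above.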
