Enter Sandbox – part 18: How to tell the story + some thoughts about sandbox 2.0

Many sandbox vendors use various function call interception techniques. Their marketing brochures often highlight why their technology stack is better than the one used by other products. The most common arguments are:

  • using bare metal is better than emulation or guest OS as it can evade anti-* tricks
  • kernel mode interception is better than the user-mode, because it ‘sees’ it all
  • solutions that don’t influence the test environment and tested programs are better than those that do (since the programs that modify the environment can be used to detect the test environment)
  • static ‘hooking’ points are better than dynamic ones as they focus on very specific areas of activity (this one I made up, but I want to highlight the point that static hooking is very 200x)
  • etc.

They rarely focus on the audience though – i.e. the guys who are actually reading these reports on an almost daily basis. Usability – an aspect often neglected in the past – is now becoming really important.

I like to think of a perfect sandbox as one that tells the best story and is open to many audiences; the best ones can also drill into the nuances and support more advanced analysts when asked to.

One that covers a combination of system service call hooking, API hooking, COM hooking, and inline signatures to hook Delphi calls and many other popular functions that – while not being exported as APIs via DLLs – are easily recognizable, as they are simple static versions of popular and often (re-)used code. Add to it known anti-* tricks. Add to it support for all the classes of the VB language (VB, VBA, VBS), internal APIs used by popular installers, APIs exported by popular libraries (sqlite, zlib, openssl, etc.), localization support, snapshots and differential comparisons (before and after execution), signature scans, memory analysis, etc., and you can get a really nice output that can cater for many tastes and needs. Add to it parallel execution on two or more distinct systems (e.g. XP, Win7, Win10, or systems with the network disabled vs. enabled), post-processing with a cross-system diff, and you can get a far richer output than from just a single tested OS that has easy access to the network… Add to it the ability to preserve dumped PE modules and memory snapshots, and… provide interactivity, and we are well on the way to Sandbox 2.0.
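
To make the snapshot and cross-system diff idea a bit more tangible, here is a minimal sketch in Python. It assumes each snapshot is just a set of artifact names (file paths, registry keys, mutexes); the function names and the sample data are made up for illustration and do not come from any real sandbox.

```python
from typing import Dict, Set

Snapshot = Set[str]  # e.g. file paths, registry keys, mutex names


def diff_snapshots(before: Snapshot, after: Snapshot) -> Dict[str, Set[str]]:
    """Differential comparison on one system: what appeared, what vanished."""
    return {"added": after - before, "removed": before - after}


def cross_system_diff(added_per_system: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Split new artifacts into those seen on every system and the rest.

    The 'common' entry holds artifacts created on all systems; each system
    entry holds what that system created beyond the common set.
    """
    systems = list(added_per_system)
    if systems:
        common = set.intersection(*(added_per_system[s] for s in systems))
    else:
        common = set()
    result = {"common": common}
    for name in systems:
        result[name] = added_per_system[name] - common
    return result


if __name__ == "__main__":
    # Hypothetical artifacts from the same sample run on two systems.
    xp = diff_snapshots({"a.txt"}, {"a.txt", "drop.exe", "xp_only.dll"})
    win10 = diff_snapshots({"a.txt"}, {"a.txt", "drop.exe"})
    print(cross_system_diff({"xp": xp["added"], "win10": win10["added"]}))
```

Anything that lands outside the common set is a good candidate for a closer look, as it hints at OS-specific or network-dependent behavior.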

I am actually a big fan of modifying the environment in which the test sample runs. I think it’s necessary and, in many cases, inevitable, as long as we want the output to be as useful as possible. Shying away from it and observing stuff only from the kernel level makes us lose a lot of very interesting, contextual information and makes a lot of interceptions much harder.

Let me give you a couple of examples:

  • There are classes of APIs that return user-mode pointers, which need to be intercepted within the process space dynamically the moment they become available e.g.
    • COM interfaces
    • Callbacks for many functions, e.g. timers, some winsock functions, enumeration functions, etc.
    • Inline functions returning pointers, structures
  • There are many wrapper libraries that require hooking of library functions that sit at an even higher level than the standard OS DLLs
    • the best examples are C runtime functions, e.g. fopen, fseek, etc.
    • since they use an internal system to track handles, you need to be able to track the files they open, and their mapping to both the internal handles and the ones provided by the Win API
    • this may be pretty hard on a kernel level, because the mapping system may change over time and from sample to sample (since these are internal structures that may change between compiler versions)
    • it is often much easier to obtain the handle of a file by calling an existing user-mode API, _get_osfhandle, in the context of the monitored process
  • Live and dynamic patching may be a bit tricky from a kernel level (it’s possible, but keeping that logic outside the VM, emulator, etc. is not very easy to manage).
  • AutoClickers… you can’t run away from them if you want to handle GUI programs (e.g. installers)
  • Taking screenshots is also easier from a user mode component
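
The wrapper-library point can be illustrated with a small in-process analogy in Python. It is not how a real sandbox hooks fopen; it only shows that a hook living inside the monitored process can log the high-level call and ask the runtime for the lower-level handle. The names monitor_open and hooked_open are invented for this sketch; msvcrt.get_osfhandle is Python's Windows-only wrapper around _get_osfhandle and is only called when running on Windows.

```python
import builtins
import os
from contextlib import contextmanager


@contextmanager
def monitor_open(log):
    """Temporarily hook builtins.open inside the current process."""
    original_open = builtins.open

    def hooked_open(file, *args, **kwargs):
        handle = original_open(file, *args, **kwargs)
        # Record the high-level call plus the runtime's own descriptor.
        record = {"api": "open", "path": str(file), "fd": handle.fileno()}
        if os.name == "nt":
            # Map the CRT descriptor to the underlying Win32 handle.
            import msvcrt
            record["os_handle"] = msvcrt.get_osfhandle(handle.fileno())
        log.append(record)
        return handle

    builtins.open = hooked_open
    try:
        yield log
    finally:
        # Always restore the original function, even on errors.
        builtins.open = original_open


if __name__ == "__main__":
    calls = []
    with monitor_open(calls):
        with open(os.devnull, "w") as fh:
            fh.write("hello")
    print(calls)
```

The mapping between the wrapper's descriptor and the OS handle comes for free here, because the hook asks the runtime itself instead of parsing its internal structures from the outside.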

Coming back to the Sandbox 2.0 vision.

Today’s sandboxes are a total mess. Most of them go for the easy, low-hanging fruit, and that makes the life of sandbox analysts hell.

I think the fundamental issue is a solid misconception about who is using these sandbox results. Apart from fully automated, sandbox-based products (e.g. all mail attachments executed inline via a sandbox before being delivered to the user, if non-malicious), there is a growing number of junior SOC analysts who actually use these products on a daily basis. When they see lots of information, often contradicting itself, let alone a crazy number of signature- and reputation-based claims, they just… start guessing. This is not good for the security industry.

Again, let me provide an example (it’s made up, but I witnessed exactly the same scenario for a different .exe) …

If you submit the good ol’ rar.exe, you will see that some of the sandboxes claim it’s definitely clean, because it’s whitelisted. Yet, some may also include conflicting information that the file has been seen in correlation with some malicious files. Bad guys often use rar.exe and bundle it with actual, real malware or hacking tools. As a side effect of this activity, they game reputation systems by skewing the stats for rar.exe. Junior analysts seeing such a correlation get fooled and assume the rar.exe file is actually bad! I kid you not. If it is 100% clean, why not state it in just a single sentence: ‘this file is CLEAN’?
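
A minimal sketch of how such conflicting signals could be collapsed into one sentence. The Signals fields and the precedence order (an exact whitelist hit beats a mere co-occurrence statistic) are my assumptions for illustration, not a description of how any existing product decides.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    whitelisted: bool = False          # exact hash matches a known-good file
    malicious_behaviour: bool = False  # something bad observed during the run
    seen_with_malware: bool = False    # reputation: co-occurrence only


def verdict(s: Signals) -> str:
    """Collapse the signals into a single, non-contradictory statement."""
    if s.whitelisted:
        # Co-occurrence with malware says nothing about a known-good binary.
        return "this file is CLEAN"
    if s.malicious_behaviour:
        return "this file is BAD"
    if s.seen_with_malware:
        return "UNKNOWN: seen alongside malicious files, not evidence by itself"
    return "UNKNOWN"


if __name__ == "__main__":
    # The rar.exe case: whitelisted, yet often bundled with malware.
    print(verdict(Signals(whitelisted=True, seen_with_malware=True)))
```

The correlation data can still be kept for the senior analyst who wants to dig, but it should not leak into the headline.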

I think there is a great need for more output scenarios from sandboxes. And also, more accountability. It’s no longer enough to just drop a 100 MB XML/JSON output on analysts and expect them to draw their own conclusions. There should be different outputs, targeting different audiences. And they should separate things that are 100% bad from common things that are mislabeled as bad (IsDebuggerPresent _is_ not really a malicious API, c’mon…; the same goes for a bunch, or a combination, of Registry, WinSock, etc. functions).

Static, all-inclusive and contextless results need to go through proper cosmetic surgery.

And back to the topic… how to tell the story?

  • For automated processing – output 0 or 1 (block: no/yes)
  • For Junior Analysts – output ‘bad’, ‘good’, ‘ask senior engineer for help’
  • For Senior Analysts – output high-level flow of events so that they can quickly read through the output and understand what the program is doing; an example is provided below; also, let them dig deeper – provide lower-level output, allow them to download files, memory dumps, pcaps, etc. – they will be able to make a call based on all this info
  • For Senior Management – please don’t…
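
The audience split above can be sketched as one function with several renderers. The score range (0.0 to 1.0), the two thresholds and all names below are hypothetical; a real product would derive its verdict from far more than a single number.

```python
from enum import Enum
from typing import List


class Audience(Enum):
    AUTOMATION = "automation"
    JUNIOR = "junior analyst"
    SENIOR = "senior analyst"


BAD_AT = 0.8   # hypothetical threshold: at or above this we call it bad
GOOD_AT = 0.2  # hypothetical threshold: at or below this we call it good


def tell_story(score: float, events: List[str], audience: Audience) -> str:
    """Render the same analysis result differently for each audience."""
    if audience is Audience.AUTOMATION:
        return "1" if score >= BAD_AT else "0"  # block: yes/no
    if audience is Audience.JUNIOR:
        if score >= BAD_AT:
            return "bad"
        if score <= GOOD_AT:
            return "good"
        return "ask senior engineer for help"
    # Senior analysts get the high-level flow of events, numbered.
    return "\n".join(f"{i}. {event}" for i, event in enumerate(events, 1))


if __name__ == "__main__":
    flow = ["drops a file to %TEMP%", "runs it", "connects to a remote host"]
    for audience in Audience:
        print(audience.value, "->", tell_story(0.5, flow, audience))
```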

And really… API monitors have allowed excluding API calls that originate from common user-mode libraries for like 10-15 years. Please add this functionality to the sandboxes. Who cares that msctf.dll is creating mutexes prefixed with CTF? It’s BAU. It’s NORMAL. Add filters based on data stacking at least… please.
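
Both filters are easy to sketch. The record layout (caller, api, arg), the module list, the mutex name and the 50% prevalence cut-off are assumptions made up for this example.

```python
from collections import Counter

# Hypothetical exclusion list of common OS modules.
COMMON_MODULES = {"msctf.dll", "user32.dll", "kernel32.dll"}


def drop_os_library_calls(calls):
    """Keep only the calls that do not originate from common OS libraries."""
    return [c for c in calls if c["caller"].lower() not in COMMON_MODULES]


def stack_filter(calls, baseline_reports, max_prevalence=0.5):
    """Data stacking: drop (api, arg) pairs that show up in more than
    max_prevalence of the baseline reports, i.e. the boring, normal stuff."""
    seen = Counter()
    for report in baseline_reports:
        # Count each pair once per report, however often it repeats.
        seen.update({(c["api"], c["arg"]) for c in report})
    limit = max_prevalence * len(baseline_reports)
    return [c for c in calls if seen[(c["api"], c["arg"])] <= limit]


if __name__ == "__main__":
    calls = [
        {"caller": "msctf.dll", "api": "CreateMutexW", "arg": "CTF.Example"},
        {"caller": "sample.exe", "api": "RegOpenKeyExW", "arg": "HKCU\\Software"},
        {"caller": "sample.exe", "api": "CreateFileW", "arg": "C:\\drop.bin"},
    ]
    baseline = [
        [{"caller": "a.exe", "api": "RegOpenKeyExW", "arg": "HKCU\\Software"}],
        [{"caller": "b.exe", "api": "RegOpenKeyExW", "arg": "HKCU\\Software"}],
    ]
    print(stack_filter(drop_os_library_calls(calls), baseline))
```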

And last, but not least: make these stories readable. An API log is hard to read. If you add a little bit of narrative, even the most junior analysts can walk through it and pick up some bits.
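
Adding a narrative can be as simple as a template per API. A rough sketch: the API names are real Win32 functions, but the record layout, the argument names and the sentences are invented for this example.

```python
# One sentence template per API; argument names are hypothetical.
TEMPLATES = {
    "CreateWindowExW": "creates a '{cls}' window titled '{title}'",
    "CreateFileW": "opens the file '{path}'",
    "RegSetValueExW": "writes the registry value '{name}'",
}


def narrate(calls):
    """Turn raw API records into short, readable sentences."""
    lines = []
    for call in calls:
        template = TEMPLATES.get(call["api"])
        if template is None:
            # No story for this API yet: fall back to the raw name.
            lines.append(f"calls {call['api']}")
        else:
            lines.append(template.format(**call["args"]))
    return lines


if __name__ == "__main__":
    log = [
        {"api": "CreateWindowExW", "args": {"cls": "Button", "title": "7"}},
        {"api": "CreateWindowExW", "args": {"cls": "Button", "title": "8"}},
        {"api": "GetTickCount", "args": {}},
    ]
    for line in narrate(log):
        print("The program", line)
```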

An example of a narrated API log on a program level (excluding calls from common OS libraries) is shown below; you can almost ‘see’ the calculator interface being built here… button after button. Isn’t that cool? This is the closest one can get to generating an automated, dynamic reverse engineering output similar to what one could obtain by manual inspection of the program under a user-mode debugger, and after many hours of analysis. If a sandbox can produce such output within a few secs, we now enter a discussion about ROI-oriented sandbox analysis beating any malware analyst out there. At least for preliminary analysis, and supporting further digging. So much time saved!

Sandbox 3.0 will be able to decompile a piece of code and map these calls to the pseudocode of the program and show the dynamic calls as it walks through that pseudocode in an interactive session. Because why not. We will get there pretty soon.

See Log1

The very same example, when drilled down to show NT APIs as well (just the first few lines) – this is so much less readable because of the additional, noisy OS-library-driven calls:

See Log2

Requiem for the infosec of 90s and 2000s

Browsing through the results of my recent GDPR experiment took my mind away from the original idea.

Why?

Well…

Once I got quite a batch of PNG files from many web sites, I started manually browsing through the results to ensure the script worked okay, and that I could see the actual content (screenshots) and could assess/judge it.

WEB A.D. 2018

After clicking through a lot of snapshots from what appears to be the top of the top most visited domains on the web a pattern started to emerge. And it’s actually a sad one…

Most of the web sites belonged to a few categories only:

  • advertising
  • advanced advertising
  • advanced vertical AI-driven video revenue-enhancing sustainable predictive advertising monetization platforms

The rest are your usual suspects:

  • multi-billion large companies: IT, media, etc.
  • OS, browsers, some software in general, lots of mobile apps
  • VPN, video conferencing, security companies
  • financial companies
  • games, porn

and…

  • 403s, 404s, captchas, some web sites don’t render and report an error, some even block me 🙂

plus

  • real nuggets – web sites not updated for ~10 years or more; it’s almost nostalgic to see them still hanging there

The majority of web sites come in English, Russian, Chinese… other languages are scarce. Most of the web sites look pretty much the same, uniform in looks and content. I may be old-fashioned, but the web of 2018 doesn’t look as exciting as it did back in the 90s and 2000s.

Now… for the really sad part.

Where are we now?

It is obvious that the Web has changed, and has become monopolized and uniform. But the saddest thing to me is that the most prevalent theme of the screenshots, and one that pretty much drives the narrative of this post, is just these two words:

  • monetization
  • tracking

Some boring personal take

I remember when, in the early noughties, I joined, and eventually ended up leading, a team created specifically to target adware, spyware and trackware. Our mission was simple: combat a plague of adware, spyware, rogue software, dodgy installers, dodgy web sites, and dodgy affiliate programs. The focus was primarily on desktop software. That war was kinda won at some stage when the number of adware, spyware, and fake/rogue antispyware applications dropped, lots of adware sites ceased to exist, and lots of dodgy software was killed (remember Bonzi Buddy, Dollar Revenue, or Klik team?).

Then along came the APT, and introduced a dramatic shift of focus for many security companies. The money you can make on APT is much bigger than on old-school malware or spyware. The ‘low’ flying threats stopped being really attended to, and as a result the quiet progression of this ‘branch’ of dodgy industries stayed kinda under the radar. The authors of the ‘low’ priority stuff regrouped and developed in many directions, taking into account changes in hardware and in human behavior associated with browsing the internet. Obviously, smartphones and tablets acted as a huge influence and a catalyst for software houses (including both good and bad ones) that took content and application development that was traditionally happening on desktop/laptop computers to the web and/or smartphone apps.

Enter 2018 and the majority of web, apps, and social media sites track our every moment, build our profiles and deliver us ‘customized’ content a.k.a. ads, or attempt to manipulate us into buying freemium content. These apps are almost always online for NO apparent reason at all.

Over the last 10 years creators of traditional desktop OSs and software had to think quickly about how to adapt to the true game changer that the iPhone certainly was, and how to generate revenue from a more and more challenging market. They learned from the failures of adware/spyware and took it to the next level. So now it’s all legally sound, the EULA is there, and the legal team is big and can fight the world… yet… we still see the same old, same old… e.g. an opt-in as a default.

The plague affects not only the application layer, but also the OS, many reputable (in the 2000s) software-downloading web sites, freeware turned downware/bundleware, etc. If you are looking for examples: we know Windows 8 and 10 produce lots of telemetry; they also install potentially unwanted apps by default, and communicate with lots of servers very frequently and w/o us knowing what is really being sent out. We also know iOS collects a lot of data that both Apple and apps can leverage to deliver better ‘content’ and ‘experience’ (e.g. health data, which was controversial at some stage). And there are obviously apps on the smartphones – a completely different area with a lot of ‘crazy’ privacy issues around it.

We very rapidly entered the world of portable supercomputers, got surrounded by myriads of sensors, and every single thing that we do is already, or can quickly be, collected, classified, matched against databases, stored, processed, and… sold. And guess what… this is not necessarily a bad thing per se, there are lots of benefits there, and I am actually not preaching against it. I just want to know about it all. ‘It’ being every single bit that is being sent out. As a user I want to be empowered to inspect ‘it’ (in general terms, e.g. by looking at raw data, logs, etc., and including any means of automation, e.g. via dedicated blocking software). And yes, I will actually allow some of it to fly. But I want to be the one who makes that call…

Basically, we lost sight of what is being transmitted out ages ago, and then… it got far worse.

What do we need?

We need a DATA FIREWALL. We need data models that protect users. We need accountability.

Just think of it for a moment and on a more generic level – don’t we want to inspect every piece of software that sends data out? Software doing an update check? An editor? A CAD application? A new version of your favorite game on a smartphone? What does it send out? What about the metadata your AV or EDR is sending out? Lists of metadata feeding large databases to help with the ‘AI’ that is often backed by a simple reputation check? Look at your logs. You may see GBs, TBs of data. No one knows. So… if you are a bored researcher this is a great area for analysis!

Back to the data. It goes out. No one knows what’s in it.

Shouldn’t there be some common protocol for this sort of stuff?
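
Purely as a thought experiment, such a protocol could start with every outbound transmission declaring what it is, and a user-owned policy making the call. Everything in the sketch below (the Outbound record, the category names, the DataFirewall class) is hypothetical; no such standard exists as far as I know.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outbound:
    """A declared outbound transmission (hypothetical manifest)."""
    app: str
    destination: str
    category: str   # e.g. "update-check", "telemetry", "crash-report"
    size_bytes: int


class DataFirewall:
    """The user decides which categories may leave; everything is logged."""

    def __init__(self, allowed_categories):
        self.allowed = set(allowed_categories)
        self.audit_log = []

    def decide(self, tx: Outbound) -> bool:
        allowed = tx.category in self.allowed
        # Accountability: record every decision, allowed or not.
        self.audit_log.append((tx, "allow" if allowed else "block"))
        return allowed


if __name__ == "__main__":
    fw = DataFirewall(allowed_categories={"update-check"})
    fw.decide(Outbound("editor", "updates.example.com", "update-check", 512))
    fw.decide(Outbound("editor", "stats.example.com", "telemetry", 4096))
    for tx, action in fw.audit_log:
        print(action, tx.app, tx.category, tx.destination, tx.size_bytes)
```

The hard part is of course not the policy engine, but getting software to declare honestly what it sends. That is where the accountability has to come from.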

Other areas we should be looking at

The threat landscape has evolved, its categorization matured, but also got really blurred. And, while perhaps a bit general, the ‘monetization’ category itself, the one that drove the first ‘malware for money’ campaigns, is still not a part of the popular threat frameworks!

Same goes for (often) unauthorized / clandestine content collection / aggregation by software that is harvesting data from the systems where the software is simply running. Legitimate or not, it’s just not right w/o users having a right to oversee it. Antivirus companies at least categorize many of these as PUAs, but it’s not enough.

We don’t need APT-only frameworks. We need all-inclusive frameworks.

Where are we now #2?

Add to it the fact that most of the software now can’t even run or be installed offline, or be trusted*** and we now operate in a panopticonish environment where we quickly progress towards the state where NONE of the decisions about hardware and software we purchase (or download for ‘free’) belong to us!!!

Since we allow it, there is no issue. Right?

GDPR is one thing, the real world is another – w/o some sort of (I can’t believe I am saying this) blockchain-like technology to account for any data operation – the accountability supported by the hardware, software, and the network layer, I think we are going to see big breaches for many years ahead.

*** Think of Supply chain attacks for starters. Then think of the ‘free’ websites. Admins often download bundled ‘freeware’ and install it using their admin creds; they believe the ‘free’ downloading sites so much that they can’t fathom the idea that they are voluntarily running PUAs with admin privileges! Recent infosec drama around Filezilla is a good example of ‘free’ going possibly rogue. You CAN’T trust. You MUST verify before you install. Talk to your CSIRT guys to validate the downloads for you!

There are a lot of discussions around ‘how to sell’ the CSIRT to the board of directors. What value does the team add? Being a cost centre is always problematic. However… IMHO we reached the stage where some sort of technical function in the company must be involved in every single decision that involves EXTERNAL code and data sources, on a level similar to compliance. Will that be the CSIRT? I don’t know, but who else can answer this simple question – is this safe to do/download/run? W/o clearance from that team, I think the risk is not managed well.

Rant du jour

Why are there no new threat categories created to block monetization-oriented behavior, telemetry, OS fingerprinting, and any data collecting, storing, and processing? No matter what the purpose, legitimate or not.

Why do we need tools like Destroy-Windows-10-Spying and run them on this new shiny OS?

Why are the configuration settings being dumbed down, and the ‘disable this stuff completely’ buttons harder and harder to find? Often there is no way to disable them at all!

Why do we still see opt-in boxes ticked ON by default to change your default search provider, browser, etc.? The default should be opt-out, with the user able to change it manually.

When did we start to feel uncomfortable updating the OS and software? Why can’t I have an option to opt out of updates? It sounds counter-intuitive and insecure, but not being able to do it puts us in an under-privileged position as users. There are lots of arguments to patch/update, but there are also many against – there is no silver bullet, and it depends on circumstances (on one occasion I stopped my Windows updates as the update was causing my system to BSOD; you can’t force updates on that system, because I won’t be able to work). Giving options to users is perhaps old-fashioned, but still very important!

Why do we need, more than ever, security researchers to poke around in every new update to OSs and apps? And not to discover the traditional security vulns, but to look for yet another mechanism implemented to bypass/violate our privacy? If the OS or software decides to send stuff out w/o a full understanding of the risk by the user, perhaps the latter needs to be protected, by default?

The true blame game question 😉

How come we, the infosec, allowed this to happen?

Some more areas to focus on…

If you browse the internet today, the amount of annoyances and nagging that was traditionally very visible on the desktop computer now often passes through security monitors w/o a single question. We do have the option of blocking domains and popups, or using ad blockers, but my point is that we are lacking more web-centered categories or detection/containment security mechanisms. The best illustration of this point is coin miners. One day all security vendors woke up to realize their solutions can’t really fully protect the users from such content. It does not infect, it does not drop malware at all (other than cached files), it is not phishing, yet it is a threat and steals CPU cycles in such a clever way.

And the most important question of it all: where is the AI that allowed this to happen??? 😉

Some final thoughts, and suggestions

The APT is not your only enemy. The focus shift from desktop to web and smartphone apps has happened. And this requires more work to be done on content analysis, and on building APIs that support and enforce certain ways of doing things. The browser or the portable device is literally our second (and sometimes even first) desktop now. Judge what you see, and classify it according to user experience and risk. So that these legal, border-line legal and non-malicious, yet annoying thingies can also be blocked.

It’s things like:

  • PAW – Potentially Annoying Web sites (intrusive UI, sign-up popups, ‘better install the app than use the web site’ nags, nagging chats, autoplaying videos, detection of mouse going to a toolbar, or a specific corner of the screen, and reacting to it to keep the user on site, etc.)
  • Clickbait Farms (e.g. searching for file or threat names leads to lots of them)
  • Paywall web sites (I am fine with them, but if I don’t want to pay, why not remove them from my search results /as an option/)
  • Web sites using clickbait techniques / promoting clickbait content
  • Web sites promoting Fake News
  • Web sites full of affiliate links generated using legitimate, often stolen content and preying on gullible users to make that affiliate click to Amazon, or other large site
  • Web sites offering borderline, inflammatory language or visuals
  • Web sites engaging multiple ad trackers
  • Web sites engaging in interaction tracking
  • Web sites and apps promoting freemium content
  • Web sites using obfuscation techniques in the code (not bad per se, but can be suspicious)
  • Blocking comments on sites e.g. Youtube; it’s actually healthy to do so 🙂
  • etc.
    and of course, good stuff too…
  • Clean web sites (yes, no ads, no freemium, etc.) – this could be enabled by default
  • ‘Well-Behaving’ web sites (ads, but kept under control since they follow industry standards, non-aggressive GUI, etc.)

This is just a dump of ideas – a far better place to start is probably the categorization already offered by popular proxy vendors. The metadata has already been there for years.

The freemium model is here to stay. If the app needs to nag the user, let’s recognize it as a valid marketing tool and actually allow the user to be nagged, but the app needs to go via a proper ‘nagging’ API to do so. One that keeps the user in control and allows setting arbitrary rules, e.g. the user can disable freemium notifications manually and automatically (e.g. after 3 nags)! So, if I download an ad-supported app that can be upgraded to premium, I can still have it installed 3 years later w/o seeing a single nag (while of course still seeing ads that can’t be disabled).
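
A toy version of such a ‘nagging’ API, just to show how little is needed to keep the user in control. The NagBroker class and its methods are invented for this sketch; the limit of 3 nags comes from the example above.

```python
from collections import defaultdict


class NagBroker:
    """Apps must ask the broker before nagging; the user sets the rules."""

    def __init__(self, max_nags=3):
        self.max_nags = max_nags
        self.disabled = set()          # apps silenced manually or automatically
        self.shown = defaultdict(int)  # nags displayed so far, per app
        self.displayed = []            # what the user actually got to see

    def disable(self, app_id):
        """Manual rule: the user never wants nags from this app."""
        self.disabled.add(app_id)

    def request_nag(self, app_id, message):
        """Return True if the nag was shown, False if it was suppressed."""
        if app_id in self.disabled:
            return False
        if self.shown[app_id] >= self.max_nags:
            # Automatic rule: enough is enough.
            self.disabled.add(app_id)
            return False
        self.shown[app_id] += 1
        self.displayed.append((app_id, message))
        return True


if __name__ == "__main__":
    broker = NagBroker(max_nags=3)
    for attempt in range(5):
        print(attempt, broker.request_nag("game", "Go premium!"))
```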

And then we have the GDPR. Wouldn’t it be cool to have a standardized way to DISABLE all the cookie- and GDPR-related notifications with a bunch of browser or account settings? Let the user decide.

As for the search engines… What about a standardized way to do some post-processing of the search results? I know, Google, Bing, Yandex, etc. are great tools, but wouldn’t it be good to have an additional level of filtering by security vendors _and_ based not only on a domain blacklist, but also on a content blacklist – on the browsing device itself? Not to kill the ads, but to kill obvious sites that are PAWs shown in the results (those that search engine providers don’t feel like removing, or may not be empowered to remove).
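
On the device, such post-processing could look roughly like the sketch below. The result layout, the keyword rules in classify and the category names are placeholders; a real content classifier would be the vendor's job.

```python
def classify(result):
    """Toy content classifier: keyword match on the snippet (made-up rules)."""
    text = result["snippet"].lower()
    categories = set()
    if "sign up to continue" in text:
        categories.add("PAW")
    if "you won't believe" in text:
        categories.add("clickbait")
    return categories


def post_process(results, domain_blacklist, blocked_categories):
    """Drop results by domain blacklist first, then by content category."""
    kept = []
    for result in results:
        if result["domain"] in domain_blacklist:
            continue
        if classify(result) & blocked_categories:
            continue
        kept.append(result)
    return kept


if __name__ == "__main__":
    results = [
        {"domain": "docs.example.org", "snippet": "Reference manual"},
        {"domain": "nag.example.com", "snippet": "Sign up to continue reading"},
        {"domain": "bad.example.net", "snippet": "Free download"},
    ]
    print(post_process(results, {"bad.example.net"}, {"PAW", "clickbait"}))
```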

And so that I don’t forget – the trackers (web bugs) from popular social media web sites do need a dedicated category as well… so that they can be safely blocked. Just because I visit the site I don’t want F, T, etc. to track all the info on what led me to enter the site and track my interaction with the GUI.

Lots of it is already possible on a proxy level, using noscript, ad blockers, etc., but we need that on the very basic random user level as well. And enabled by default or, if that is not possible, the user can be guided to do so in an option-safe environment (also, if the protection opt-in is invoked by the user, such an action potentially protects the vendor from legal issues /I am not a lawyer, so this needs to be evaluated/).

I think the game of the 90s and 2000s was kinda easy, as it was about plain-vanilla criminals, individual small companies that were simply dodgy, and affiliate programs that were too obviously ‘ill-centered’ and very transparent; content back then was very easy to classify.

Today it’s much harder: big vendors are untouchable. They incorporated lots of old adware and tracking tactics into their products and everyone agrees to it, because otherwise you won’t get the greatest, latest version of the OS, or software. I think it’s not what users want. At least some of them. I hope.

Last words

So… if you are a security vendor, there are lots of new territories to conquer. Lots of new products can be developed and sold. Perhaps we can go back to the 90s and 2000s for a moment and actually think of the user.

Update 2019-01: I recently came across this article that paints a future that is even more gloomy and dramatic…