Monitoring the clipboard – a quick anti-sandbox trick

Many existing anti-sandbox tricks rely on timers, detecting mouse movement, checking for the presence of security tools, detecting virtualization, etc. While the list of existing tricks is long, I don’t recall seeing clipboard monitoring mentioned in this context, and I was curious whether anyone had discussed it before. A quick Google search didn’t bring up any results, so I thought I would at least describe the high-level idea. (FWIW, most of the stuff I found online refers to malware monitoring the clipboard in order to steal data that is copied to it; this includes an in-depth post by Michael Ligh, who discusses it in the context of the Volatility framework.)

Btw., if you know of any malware that is already using this trick, it would be great if you could let me know. Thanks!

As per Microsoft, there are three ways to check if the clipboard content has changed; all of them rely on dedicated APIs and, in some cases, require processing of window messages:

  • Monitoring GetClipboardSequenceNumber return value changes
  • AddClipboardFormatListener + WM_CLIPBOARDUPDATE message
  • SetClipboardViewer + WM_DRAWCLIPBOARD message

There are at least two ways to incorporate these functions in an anti-sandbox routine:

  • One can use the GetClipboardSequenceNumber API in a way similar to the rdtsc / GetTickCount trick and stall code execution until a decent number of clipboard changes has occurred (under the assumption that a real person is actually using the system and CTRL+C/CTRL+V will generate enough changes to trigger the payload)
  • Using AddClipboardFormatListener / SetClipboardViewer requires creating a worker window that responds to the respective clipboard change window messages; when they arrive, the program increases an internal counter, and only once the threshold is met does it execute the payload

Both are very easy to implement, and I won’t be providing PoC code as you can grab it from MSDN and/or popular coding forums.

So, if you write sandboxes, you may consider monitoring the use of these APIs and triggering an appropriate playbook that generates a sequence of clipboard changes to trigger the code execution.

It’s worth mentioning that all of these APIs have their Nt equivalents, which are handled by win32u.dll/win32kfull.sys:

  • NtUserGetClipboardSequenceNumber
  • NtUserAddClipboardFormatListener
  • NtUserSetClipboardViewer

So it may be worth monitoring them at this level too.

What can you do with 250K sandbox reports?

I was recently asked about the data I released for the New Year celebration. The question was: okay, what can I do with all this alleged goodness?

Well…

For starters, this is the first time (at least to my knowledge) someone dumped 250K reports of sandboxed samples. The reports are not perfect, but can help you to understand the execution flow for many malware (and in more general terms: software) samples.

What does it mean in practice?

Let’s have a look…

Say you want to see all the possible device names that these 250K reports include.

Why would you need that?

This could tell you what anti-analysis tricks malware samples use, and what device names are used to fingerprint the OS. Perhaps some of these devices are not even documented yet!

grep -iE \\\\\.\\ Sandbox_250k_logs_Happy_New_Year_2018

gives you this:

You can play around with the output, but writing a perl/python script to extract these is probably a better idea.
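A minimal Python sketch of such a script follows. It assumes only that device paths appear in the logs with the usual \\.\ prefix; the sample lines, the character class used for device names, and the names DEVICE_RE and device_names are made up for illustration, and the real report layout may differ.

```python
import re
from collections import Counter

# Matches the \\.\ prefix and captures the name that follows it.
# The character class is a guess at what a device name may contain;
# widen it if your logs show other characters.
DEVICE_RE = re.compile(r"\\\\\.\\([\w{}$#-]+)")

def device_names(lines):
    """Count device names (case-insensitively) across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for name in DEVICE_RE.findall(line):
            counts[name.lower()] += 1
    return counts

# Hypothetical sample lines, not taken from the data set.
sample = [
    r"API::CreateFileA (lpFileName=\\.\PhysicalDrive0)",
    r"API::CreateFileA (lpFileName=\\.\SICE)",
    r"API::CreateFileW (lpFileName=\\.\sice)",
    r"API::Sleep (dwMilliseconds=1000)",
]
device_counts = device_names(sample)
for name, n in device_counts.most_common():
    print(n, name)
```

To run it over the real data, pass an open file object instead of the sample list, e.g. open("Sandbox_250k_logs_Happy_New_Year_2018", errors="replace"), assuming the data set is a single text file, as the grep commands suggest.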

Okay, what about the most popular functions resolved using the GetProcAddress API?

Something like this could help:

grep -iE API::GetProcAddress Sandbox_250k_logs_Happy_New_Year_2018 | 
cut -d: -f3 | cut -d, -f2 | cut -d= -f2 | cut -d")" -f1

This will give you a list of all APIs:

We can save the result to a file by redirecting the output of that command to e.g. ‘gpa.txt’:

grep -iE API::GetProcAddress Sandbox_250k_logs_Happy_New_Year_2018 | 
cut -d: -f3 | cut -d, -f2 | cut -d= -f2 | cut -d")" -f1 > gpa.txt

This will take a while.

You can now sort it:

sort gpa.txt > gpa.txt.s

The resulting file gpa.txt.s can then be analyzed further: count the occurrences of each API, then sort the counts in descending order to show the most popular APIs:

cat gpa.txt.s | uniq -c | sort -rn | more

All the above commands could be combined into one single ‘caterpillar’, but using intermediate files is sometimes handy. It facilitates further searches later on… It also speeds things up.
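For comparison, here is a rough Python sketch that does the filtering, field extraction and counting in a single pass. It mimics the same four cut steps rather than assuming a precise log format; the sample lines and the helper names cut, resolved_name and count_resolved are hypothetical.

```python
from collections import Counter

def cut(text, delim, field):
    """Mimic `cut -d<delim> -f<field>` for a single 1-based field."""
    if delim not in text:
        return text  # cut passes through lines that lack the delimiter
    parts = text.split(delim)
    return parts[field - 1] if field <= len(parts) else ""

def resolved_name(line):
    """Apply the same four cut steps as the shell pipeline."""
    for delim, field in ((":", 3), (",", 2), ("=", 2), (")", 1)):
        line = cut(line, delim, field)
    return line.strip()

def count_resolved(lines):
    """Count the names extracted from GetProcAddress lines."""
    counts = Counter()
    for line in lines:
        if "api::getprocaddress" in line.lower():  # same filter as grep -i
            counts[resolved_name(line.rstrip("\n"))] += 1
    return counts

# Hypothetical sample lines; the real report layout may differ.
sample = [
    "API::GetProcAddress (hModule=0x77000000, lpProcName=RegOpenKeyExA)",
    "API::GetProcAddress (hModule=0x77000000, lpProcName=LoadLibraryW)",
    "API::GetProcAddress (hModule=0x76000000, lpProcName=RegOpenKeyExA)",
    "API::Sleep (dwMilliseconds=1000)",
]
gpa_counts = count_resolved(sample)
for name, n in gpa_counts.most_common():
    print(n, name)
```

Like the cut pipeline, this breaks on lines where an earlier field itself contains one of the delimiters, so treat the counts as approximate.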

Coming back to our last query, we could look for all APIs that include ‘Reg’ as a prefix/infix/suffix – this can give us a rough idea of which Registry APIs are resolved most frequently:

cat gpa.txt.s | uniq -c | sort -rn | grep -E "Reg" | more

How would you interpret the results?

There are some FPs there, e.g. GetThemeBackgroundRegion, but it’s not a big deal. ANSI APIs (those with the ‘A’ at the end) are still more popular than the Unicode ones (more precisely, the ‘Wide’ ones, with the ‘W’ at the end). Or… the dataset we have at hand is biased towards older samples that were compiled without Unicode in mind. So… be careful; any interpretation is heavily biased by the data.
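The same two questions can be asked of a table of counts in Python. The sketch below is only an illustration: the numbers are invented, the function names with_substring and ansi_vs_wide are hypothetical, and the ‘ends with A or W’ test is a crude heuristic that will miscount names that merely happen to end in those letters.

```python
from collections import Counter

def with_substring(counts, needle):
    """Return (name, count) pairs whose name contains `needle`, most frequent first."""
    return [(name, n) for name, n in counts.most_common() if needle in name]

def ansi_vs_wide(counts):
    """Sum the counts of names ending in 'A' versus 'W' (a crude heuristic)."""
    ansi = sum(n for name, n in counts.items() if name.endswith("A"))
    wide = sum(n for name, n in counts.items() if name.endswith("W"))
    return ansi, wide

# Made-up counts for illustration only; not taken from the data set.
example = Counter({
    "RegOpenKeyExA": 5,
    "RegCloseKey": 4,
    "Sleep": 3,
    "RegOpenKeyExW": 2,
    "GetThemeBackgroundRegion": 1,
})
reg_hits = with_substring(example, "Reg")
ansi_total, wide_total = ansi_vs_wide(example)
print(reg_hits)
print(ansi_total, wide_total)
```

Swapping the substring test for name.startswith("Reg") would drop FPs such as GetThemeBackgroundRegion, at the cost of missing names that carry ‘Reg’ elsewhere.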

But see? This is all an open book!

Again, I want to emphasize that all the searches can be done in many ways. It’s also possible you will find some flaws in my queries. It’s OK. This is data for playing around!

Now, imagine you want to see all the DELPHI APIs that we intercepted:

grep -iE Delphi:: Sandbox_250k_logs_Happy_New_Year_2018 | more

or, all inline functions from Visual C++:

grep -iE VC:: Sandbox_250k_logs_Happy_New_Year_2018 | more

or, all the rows with the ‘http://’ in it (highlighting possible URLs):

grep -iE http:// Sandbox_250k_logs_Happy_New_Year_2018 | more

You can also see what debug strings samples send:

grep -iE API::OutputDebugString Sandbox_250k_logs_Happy_New_Year_2018 | more

You can check what values are used by the Sleep functions:

grep -iE API::Sleep Sandbox_250k_logs_Happy_New_Year_2018 | more

and the windows that anti-AV/anti-analysis tools search for:

grep -iE API::FindWindow Sandbox_250k_logs_Happy_New_Year_2018 | more

etc. etc.
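Since each grep above re-reads the whole data set, several of these searches can be run in one pass. Below is a small Python sketch under the assumption that a case-insensitive substring match is good enough (as with grep -i); the marker list simply repeats the patterns used above, while the sample lines and the name bucket_lines are hypothetical.

```python
from collections import defaultdict

# The same patterns as in the grep commands above.
MARKERS = (
    "Delphi::",
    "VC::",
    "http://",
    "API::OutputDebugString",
    "API::Sleep",
    "API::FindWindow",
)

def bucket_lines(lines, markers=MARKERS):
    """Group lines under every marker they contain, ignoring case like grep -i."""
    lowered = [(marker, marker.lower()) for marker in markers]
    buckets = defaultdict(list)
    for line in lines:
        haystack = line.lower()
        for marker, needle in lowered:
            if needle in haystack:
                buckets[marker].append(line.rstrip("\n"))
    return dict(buckets)

# Hypothetical sample lines; the real report layout may differ.
sample = [
    "Delphi::LStrCmp",
    "VC::memcpy",
    "API::Sleep (dwMilliseconds=1000)",
    "API::FindWindowA (lpClassName=OLLYDBG)",
    "API::OutputDebugStringA (lpOutputString=http://example.test/x)",
]
buckets = bucket_lines(sample)
for marker, hits in buckets.items():
    print(marker, len(hits))
```

This keeps every matching line in memory, which may be too much for the full data set; writing each bucket to its own file as you go is a straightforward alternative.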

The sky is the limit.

You can look at the beginning of every single sample and identify the ‘dynamic’ flow of the WinMain procedure for many different compilers, discover various environment variables used by different samples, cluster APIs from specific libraries, observe techniques like process hollowing, observe the distribution of WriteProcessMemory to understand how many samples use RunPE for code injection and execution and how many rely on position-independent code (PIC). You can see which startup points are most frequently used (it’s not always HKCU\…\Run!), what mechanisms are used to launch code in a foreign process (e.g. RtlCreateUserThread, APC functions), how many processes are suspended before code is injected into them (CREATE_SUSPENDED), etc. etc.

Again… this data can remain dead data, or you can bring it to life by being creative and mining it in any possible way…

If you have any questions, feel free to DM me on Twitter, or ping me directly via email.

Note: commercial use of this data is prohibited. I only mention it because not only is it most likely tempting to abuse it, but you may actually be better off using a different data set. If you want to use it commercially, I could provide you with 1.6M Unicode-based reports for analysis with more details included. Get in touch to find out more 🙂