What can you do with 250K sandbox reports?

January 20, 2018 in Batch Analysis, Clustering, Malware Analysis, Sandboxing

I was recently asked about the data I released for the New Year celebration. The question was: okay, what can I do with all this alleged goodness?


For starters, this is the first time (at least to my knowledge) someone dumped 250K reports of sandboxed samples. The reports are not perfect, but can help you to understand the execution flow for many malware (and in more general terms: software) samples.

What does it mean in practice?

Let’s have a look…

Say you want to see all the possible driver names that these 250k include.

Why would you need that?

This could tell you what anti-analysis tricks malware samples use, what device names are used to fingerprint the OS. Perhaps some of these devices are not even documented yet!

grep -iE \\\\\.\\ Sandbox_250k_logs_Happy_New_Year_2018

gives you this:

You can play around with the output, but writing a perl/python script to extract these is probably a better idea.

Okay, what about the most popular function resolved using the GetProcAddress API?

Something like this could help:

grep -iE API::GetProcAddress Sandbox_250k_logs_Happy_New_Year_2018 | 
cut -d: -f3 | cut -d, -f2 | cut -d= -f2 | cut -d) -f1

This will give you a list of all APIs:

We can save the result to a file by redirecting the output of that command to e.g. ‘gpa.txt’:

grep -iE API::GetProcAddress Sandbox_250k_logs_Happy_New_Year_2018 | 
cut -d: -f3 | cut -d, -f2 | cut -d= -f2 | cut -d) -f1 > gpa.txt

This will take a while.

You can now sort it:

sort gpa.txt > gpa.txt.s

The resulting file gpa.txt.s can be then further analyzed – sorting by number of API occurrence, then sort the results in a descending order showing the most popular APIs:

cat gpa.txt.s | uniq -c | sort -r | more

All the above commands could be combined into a one, single ‘caterpillar’, but using intermediate files is sometimes handy. It facilitates further searcher later on… It also speeds things up.

Coming back to our last query, we could inquire for all APIs that include ‘Reg’ prefix/infix/suffix – this can give us some rough idea of what popular Registry APIs are resolved the most frequently:

cat gpa.txt.s | uniq -c | sort -r | grep -E "Reg" | more

How would you interpret the results?

There are some FPs there e.g. GetThemeBackgroundRegion, but it’s not a big deal. ANSI APIs (these with the ‘A’ at the end) are still more popular than the Unicode ones (more precisely, ‘Wide’ ones, with the ‘W’ at the end). Or, … the dataset we have at hand is biased towards older samples that were compiled w/o Unicode in mind. So… be careful… interpretation is very biased really.

But see? This is all an open book!

Again, want to emphasize that all the searches can be done in many ways.  It’s also possible you will find some flaws in my queries. It’s OK. This is a data for playing around!

Now, imagine you want to see all the DELPHI APIs that we intercepted:

grep -iE Delphi:: Sandbox_250k_logs_Happy_New_Year_2018 | more

or, all inline functions from Visual C++:

grep -iE VC:: Sandbox_250k_logs_Happy_New_Year_2018 | more

or, all the rows with the ‘http://’ in it (highlighting possible URLs):

grep -iE http:// Sandbox_250k_logs_Happy_New_Year_2018 | more

You can also see what debug strings samples send:

grep -iE API::OutputDebugString Sandbox_250k_logs_Happy_New_Year_2018 | more

You can check what values are used by the Sleep functions:

grep -iE API::Sleep Sandbox_250k_logs_Happy_New_Year_2018 | more

and Windows searched by Anti-AV/Anti-analysis tools:

grep -iE API::FindWindow Sandbox_250k_logs_Happy_New_Year_2018 | more

etc. etc.

The sky is the limit.

You can look at the beginning of the every single sample and identify the ‘dynamic’ flow of the WinMain procedure for many different compilers, discover various environment variables used by different samples, cluster APIs from specific libraries, observe techniques like process hollowing, observe the distribution of WriteProcessMemory to understand how many sample use a RunPE for code injection and execution, and how many rely on Position-independent-code (PIC), you can see what startup points are the most frequently used (it’s not always HKCU\…\Run!) , what mechanisms are used to launch code in a foreign/process (e.g. RtlCreateUserThread, APC functions), how many processes are suspended before code is injected to them (CREATE_SUSPENDED), etc. etc.

Again… this data can remain a dead data, or you can make it alive by being creative and mining it in any possible way…

If you have any questions feel free to DM me on Twiter, or ping me directly via email.

Note: commercial use of this data is prohibited; I only mention it, because not only it’s most likely temping to abuse it, but you may be actually better off using a different data set. If you want to use it commercially I could provide you with 1.6M Unicode-based reports for analysis with more details included. Get in touch to find out more 🙂

Yet another way to hide from Sysinternals’ tools, part 1.5

January 19, 2018 in Anti-*, Anti-Forensics, Autostart (Persistence), Compromise Detection, Forensic Analysis, Silly

This little trick can be used to prank your friend more than using it as a real nation-state pwning technique, but it’s worth documenting, as usual, so here it goes…

I mentioned previously the Autoruns program registers the file type HKCR\Autoruns.Logfile.1  / HKCR\.ARN. The file stores the information autoruns grabbed from the system. You can save the autoruns log, and you can load them.

The last bit is the interesting part – if we can force the system to redirect all autoruns instances to one we can control, and also one that will always load the preserved data from the .arn file (instead of loading the fresh data set directly from the system), we will be able to fool the user that the state of the system has not changed.

So… the recipe goes like this:

  • Remove HKCR\Autoruns.Logfile.1 and HKCR\.ARN registry entries
  • Save autoruns.exe as e.g. c:\test\AutoNOruns.exe
  • Run c:\test\AutoNOruns.exe
    • This will create new association for .ARN file in Registry (ones that point to c:\test\AutoNOruns.exe)
    • This will also enumerate all autoruns entries on the system
      • Save these results to e.g. c:\test\AutoNOruns.arn
  • Modify registry key
    App Paths\autoruns.exe
    to point to
  • Add some ‘bad’ entry to e.g. HKCU\Run
  • Run autoruns from the terminal, or via Windows+R
  • The new ‘bad’ entry won’t be shown.


  • It takes observable time to load c:\test\AutoNOruns.arn
  • Refreshing the view (F5) will unhide all the ‘hidden’ entries as Autoruns will refresh the view directly from the system
  • Double-clicking autoruns.exe is not routed via App Paths key so autoruns.exe will run properly

So, there you have it. The first Autoruns Rootkit ;)))

It’s superlame and has so many caveats that it’s impossible to treat it seriously, but maybe you will be able to fool someone 🙂