Happy New Year 2018 & Get yourself logs from 250K sandboxed samples

Update 2

Please use this link:

https://mega.nz/#!LItwzAAL!NqcMVEnIqd17x5guL0V55gwjy8Q3xQMuSyeP-DelbRE

Update

Turns out I had a bug in my script and on the first go I exported fewer than 250K sessions (228K only), so I had to fix the dump and re-upload it. If you downloaded it previously, sorry, you will need to do it one more time 🙂

Thanks to @hrbrmstr for spotting and reporting the issue!

Old post

Happy New Year 2018!

Unless you are one of the companies or organizations doing commercial sample analysis and sandboxing, it is almost impossible to get access to normalized data logs from sandboxing sessions. If you want to do analysis, you need to either scrape data from the web or run your own sandbox. To fill the gap, I decided to release logs from 250,000 sandbox sessions.

  • The file contains logs from 250K sandboxed sessions (250K unique samples).
  • 32-bit PEs only. All executed offline (no access to the network).
  • Sometimes it may not be 100% accurate – I ran various sessions, with various settings/timeouts.
  • You’ll find traces of Windows API, NT API, VC and Delphi inline functions, COM, Visual Basic, string functions, Nullsoft APIs, Anti-VM tricks, etc. – and various stuff I discussed or will discuss in the Enter Sandbox series.

Have a look, run some analysis, crunch data – share results.

Link: https://mega.nz/#!XYcnTAyD!VvwOo9JBkqmRNPu5liSusl3tpC0kBpbRT6E8tfOejF0

File sizes (sha1 hashes):

34,515,244,109 Sandbox_250k_logs_Happy_New_Year_2018
               (30010C605B451CEA6483B93B299FA9758747B1DF)
   993,911,182 Sandbox_250k_logs_Happy_New_Year_2018.7z
               (D73FDAAC08B95536FA2702D327C8F3143A9A666C)
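
Since the archive is close to 1 GB and unpacks to roughly 34 GB, it is worth checking the download against the hashes listed above before crunching anything. A minimal sketch, assuming the files are stored under the names shown above (the helper names `sha1_of_file` and `verify` are made up for this example):

```python
import hashlib

# SHA-1 values copied from the listing above.
EXPECTED = {
    "Sandbox_250k_logs_Happy_New_Year_2018":
        "30010C605B451CEA6483B93B299FA9758747B1DF",
    "Sandbox_250k_logs_Happy_New_Year_2018.7z":
        "D73FDAAC08B95536FA2702D327C8F3143A9A666C",
}


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the uppercase hex SHA-1 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Chunked reads keep memory usage flat even for the 34 GB log.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def verify(path, name):
    """Compare the file at `path` with the published hash for `name`."""
    return sha1_of_file(path) == EXPECTED[name]
```

For example, `verify("Sandbox_250k_logs_Happy_New_Year_2018.7z", "Sandbox_250k_logs_Happy_New_Year_2018.7z")` should return `True` for an intact download.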

Note: This data cannot be used for commercial purposes.

If you like this release, you may also want to revisit my older data dumps.

File format:

  • The file starts with a short header (easy to spot)
  • It is followed by a ### SAMPLE #<number> line
  • Then the actual logs start.
    • The lines start with [PID][TID][ADDRESS]
    • The API groups are prefixed with group prefixes, e.g. API::, DELPHI::, VC:: (the latter two refer to inline functions)
    • The parameters are NOT named / structured according to the Windows API docs; this is because the log focuses on extracting the most useful information and avoids cluttering the log with useless/unused function arguments (but even this is only partially true, because this tool grew organically over the years and was not an orchestrated effort to make OCDs happy 😉 – if I were to write it again, obviously it would be perfect 😉)

Example:

### SAMPLE #00000001
[1980][252][00422c77]API::GetSystemTimeAsFileTime (lpSystemTimeAsFileTime=0012FFB0)
[1980][252][0041c305]API::GetModuleHandleW (lpModuleName=kernel32.dll)=7C800000
[1980][252][0041c315]API::GetProcAddress (mod=KERNEL32.dll, api=FlsAlloc)=00000000
[1980][252][0041c328]API::GetProcAddress (mod=KERNEL32.dll, api=FlsFree)=00000000
[1980][252][0041c33b]API::GetProcAddress (mod=KERNEL32.dll, api=FlsGetValue)=00000000
[1980][252][0041c34e]API::GetProcAddress (mod=KERNEL32.dll, api=FlsSetValue)=00000000
[1980][252][0041c361]API::GetProcAddress (mod=KERNEL32.dll, api=InitializeCriticalSectionEx)=00000000
[1980][252][0041c374]API::GetProcAddress (mod=KERNEL32.dll, api=CreateSemaphoreExW)=00000000
[1980][252][0041c387]API::GetProcAddress (mod=KERNEL32.dll, api=SetThreadStackGuarantee)=00000000
[1980][252][0041c39a]API::GetProcAddress (mod=KERNEL32.dll, api=CreateThreadpoolTimer)=00000000
[1980][252][0041c3ad]API::GetProcAddress (mod=KERNEL32.dll, api=SetThreadpoolTimer)=00000000
[1980][252][0041c3c0]API::GetProcAddress (mod=KERNEL32.dll, api=WaitForThreadpoolTimerCallbacks)=00000000
[1980][252][0041c3d3]API::GetProcAddress (mod=KERNEL32.dll, api=CloseThreadpoolTimer)=00000000
[1980][252][0041c3e6]API::GetProcAddress (mod=KERNEL32.dll, api=CreateThreadpoolWait)=00000000
[1980][252][0041c3f9]API::GetProcAddress (mod=KERNEL32.dll, api=SetThreadpoolWait)=00000000
[1980][252][0041c40c]API::GetProcAddress (mod=KERNEL32.dll, api=CloseThreadpoolWait)=00000000
[1980][252][0041c41f]API::GetProcAddress (mod=KERNEL32.dll, api=FlushProcessWriteBuffers)=00000000
[1980][252][0041c432]API::GetProcAddress (mod=KERNEL32.dll, api=FreeLibraryWhenCallbackReturns)=00000000
[1980][252][0041c445]API::GetProcAddress (mod=KERNEL32.dll, api=GetCurrentProcessorNumber)=00000000
[1980][252][0041c458]API::GetProcAddress (mod=KERNEL32.dll, api=GetLogicalProcessorInformation)=7C861E6F
[1980][252][0041c46b]API::GetProcAddress (mod=KERNEL32.dll, api=CreateSymbolicLinkW)=00000000
[1980][252][0041c47e]API::GetProcAddress (mod=KERNEL32.dll, api=SetDefaultDllDirectories)=00000000
[1980][252][0041c491]API::GetProcAddress (mod=KERNEL32.dll, api=EnumSystemLocalesEx)=00000000
[1980][252][0041c4a4]API::GetProcAddress (mod=KERNEL32.dll, api=CompareStringEx)=00000000
[1980][252][0041c4b7]API::GetProcAddress (mod=KERNEL32.dll, api=GetDateFormatEx)=00000000
[1980][252][0041c4ca]API::GetProcAddress (mod=KERNEL32.dll, api=GetLocaleInfoEx)=00000000
[1980][252][0041c4dd]API::GetProcAddress (mod=KERNEL32.dll, api=GetTimeFormatEx)=00000000
[1980][252][0041c4f0]API::GetProcAddress (mod=KERNEL32.dll, api=GetUserDefaultLocaleName)=00000000
[1980][252][0041c503]API::GetProcAddress (mod=KERNEL32.dll, api=IsValidLocaleName)=00000000
[1980][252][0041c516]API::GetProcAddress (mod=KERNEL32.dll, api=LCMapStringEx)=00000000
[1980][252][0041c529]API::GetProcAddress (mod=KERNEL32.dll, api=GetCurrentPackageId)=00000000
[1980][252][0041ab40]API::GetCommandLineW = "_0000034AD55817135B1B1C4AE97CD449.exe"
[1980][252][004228f9]API::GetModuleFileNameW (mod=00000000, namebuf=%SYSTEM%\_0000034AD55817135B1B1C4AE97CD449.exe, buflen=260)
[1980][252][00424495]API::MultiByteToWideChar (CodePage=000004E4,dwFlags=MB_PRECOMPOSED [00000001, 1],lpMultiByteStr= 
[1980][252][0042450c]API::MultiByteToWideChar (CodePage=000004E4,dwFlags=MB_PRECOMPOSED [00000001, 1],lpMultiByteStr= 
[1980][252][0041be2d]API::MultiByteToWideChar (CodePage=000004E4,dwFlags=MB_PRECOMPOSED [00000001, 1],lpMultiByteStr= 
[1980][252][0041bea1]API::MultiByteToWideChar (CodePage=000004E4,dwFlags=MB_PRECOMPOSED [00000001, 1],lpMultiByteStr= 
[1980][252][0041bf83]API::WideCharToMultiByte (cp= [000004E4, 1252],fl= [00000000, 0],wide= 
[1980][252][0041be2d]API::MultiByteToWideChar (CodePage=000004E4,dwFlags=MB_PRECOMPOSED [00000001, 1],lpMultiByteStr= 
[1980][252][0041bea1]API::MultiByteToWideChar (CodePage=000004E4,dwFlags=MB_PRECOMPOSED [00000001, 1],lpMultiByteStr= 
[1980][252][0041bf83]API::WideCharToMultiByte (cp= [000004E4, 1252],fl= [00000000, 0],wide= 
[1980][252][0041c543]API::SetUnhandledExceptionFilter (0042247D)
[1980][252][00401dde]VC::vc_strlen1 (lpString=\/)
[1980][252][0040df7c]API::GetTempPathA (namebuf=C:\DOCUME~1\USERNAME\LOCALS~1\Temp\, buflen=260)
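
The format described above is simple enough to parse line by line. Below is a rough sketch based only on the excerpt shown here; the regular expressions, the function names `parse_log` and `count_calls`, and the decision to keep everything after the function name as an unparsed `rest` string are my assumptions. The full dump may well contain line shapes this does not cover, and those are silently skipped.

```python
import re
from collections import Counter

# "### SAMPLE #00000001" marks the start of a new session.
SAMPLE_RE = re.compile(r"^### SAMPLE #(\d+)")

# "[PID][TID][ADDRESS]GROUP::Function <rest>"
LINE_RE = re.compile(
    r"^\[(?P<pid>[^\]]+)\]\[(?P<tid>[^\]]+)\]\[(?P<addr>[^\]]+)\]"
    r"(?P<group>\w+)::(?P<name>\w+)(?P<rest>.*)$"
)


def parse_log(lines):
    """Yield (sample_id, record) pairs for every recognised log line."""
    sample_id = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        marker = SAMPLE_RE.match(line)
        if marker:
            sample_id = marker.group(1)
            continue
        match = LINE_RE.match(line)
        if match:
            # Arguments are not uniformly structured, so `rest` stays raw.
            yield sample_id, match.groupdict()


def count_calls(lines):
    """Count how often each (group, function) pair appears."""
    counts = Counter()
    for _, record in parse_log(lines):
        counts[(record["group"], record["name"])] += 1
    return counts
```

Because `parse_log` is a generator, it can be fed an open file handle directly (for instance one opened with `errors="replace"`, as the encoding of the dump is not stated), so the 34 GB file never has to fit in memory.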

Enter Sandbox – part 15: rE[mn]u[mn]eration games

Observing malware is one thing. Observing the very same malware in a rich context is another.

The traditional approach to sandboxes focuses on scoring the sample’s badness and extracting IOCs, and not that much on in-depth analysis. It’s understandable, because in-depth analysis is not the ultimate goal. Still… being able to extract more information that may help with manual analysis is always welcome. And it’s actually getting better – the competition is slowly changing the landscape, and newer sandboxes support memory dumping and PE file rebuilding, show nice process / thread trees and various graphs, etc… and put more and more hooks in place. And then again, even if they intercept the most popular APIs, inline functions, or even virtual tables, it may still not be enough.

I thought, what would happen if I intercepted not only the most popular APIs that are used by malware, but also those that are less frequently looked at, and in particular those that may help to understand the flow of events in a better context – enriching the data that the sandbox presents and making in-depth analysis easier.

What are these APIs?

Let me show you an example…

Imagine you intercept the function CreateToolhelp32Snapshot to take a note of the fact that the malware is enumerating processes. This may add to the ‘badness’ weight, but on its own is not a malicious feature per se. Lots of ‘clean’ processes enumerate processes.

What if we not only did that, but also intercepted Process32First and Process32Next?

This could be the result (output is simplified to demo the idea):

CreateToolhelp32Snapshot
Process32First: [System Process]
Process32Next: System
Process32Next: smss.exe
Process32Next: csrss.exe
Process32Next: winlogon.exe
Process32Next: services.exe
Process32Next: lsass.exe
Process32Next: svchost.exe
Process32Next: svchost.exe
Process32Next: svchost.exe
Process32Next: svchost.exe
Process32Next: svchost.exe
Process32Next: spoolsv.exe
Process32Next: explorer.exe
Opens Process: %WINDOWS%\explorer.exe
VirtualAllocEx: %WINDOWS%\explorer.exe
NtWriteVirtualMemory: %WINDOWS%\explorer.exe
VirtualAllocEx: %WINDOWS%\explorer.exe
NtWriteVirtualMemory: %WINDOWS%\explorer.exe
VirtualAllocEx: %WINDOWS%\explorer.exe
NtWriteVirtualMemory: %WINDOWS%\explorer.exe
VirtualAllocEx: %WINDOWS%\explorer.exe
NtWriteVirtualMemory: %WINDOWS%\explorer.exe
VirtualAllocEx: %WINDOWS%\explorer.exe
NtWriteVirtualMemory: %WINDOWS%\explorer.exe
CreateRemoteThread: %WINDOWS%\explorer.exe
NtResumeThread: %WINDOWS%\explorer.exe

Analysing a log like this tells you straight away that the malware is enumerating processes, and when it finds explorer.exe, it injects a bunch of buffers into it (possibly mapping sections of the PE payload?), and then creates a remote thread. As a result, the explorer.exe process is now hosting a malicious payload.

While the code injection into explorer.exe can be deduced from manual dynamic analysis, or may even be obvious when we are evaluating the process tree and network connections from a report generated by a sandbox, there is a subtle difference. The context these 2 additional intercepted APIs provide allows us to be quite certain that the malware is specifically looking for explorer.exe, and not for some other process.
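
To make the reading of such a log a bit more mechanical, here is a small sketch that walks the simplified output above and flags any target for which the classic injection steps appear in order. The stage list, the `name: target` line shape, and the function names are assumptions taken from this simplified demo output, not from any real sandbox report format.

```python
# Stages that must appear in this order for the same target process.
INJECTION_STAGES = [
    "Opens Process",
    "VirtualAllocEx",
    "NtWriteVirtualMemory",
    "CreateRemoteThread",
]


def parse_event(line):
    """Split 'Name: target' into (name, target); target may be empty."""
    name, _, target = line.partition(":")
    return name.strip(), target.strip()


def find_injection_targets(lines):
    """Return targets that saw every injection stage, in order."""
    progress = {}  # target -> index of the next stage we expect
    flagged = []
    for line in lines:
        name, target = parse_event(line)
        if not target or name not in INJECTION_STAGES:
            continue  # enumeration calls and unrelated events
        stage = progress.get(target, 0)
        if stage < len(INJECTION_STAGES) and name == INJECTION_STAGES[stage]:
            progress[target] = stage + 1
            if progress[target] == len(INJECTION_STAGES):
                flagged.append(target)
        # Repeated or out-of-order stages (e.g. several alloc/write
        # rounds) are simply ignored rather than resetting progress.
    return flagged
```

Run over the log above, this should return a single entry, `%WINDOWS%\explorer.exe`. It is deliberately naive: it only tells you that the sequence happened, while the enumeration lines before it tell you how the target was picked.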

It also tells us HOW the process is found.

And mind you, this is actually not a trivial question if you are doing in-depth malware analysis.

There are cases where this determination is very important. Having the ability to quickly determine if we are missing some target process on the test system can save us a lot of time spent on mundane manual analysis. This is actually one of the first questions your customer will ask you, especially when it comes to targeted attacks. Delivering the results without missing stuff is a big responsibility!

When you look at malware that is highly targeted, e.g. malware that is targeting Point of Sale systems, running it through a sandbox may _not_ give you any good results, because you either won’t see the process enumeration at all, or may miss the name of the process that the malware is looking for. The malware will look ‘broken’ to us. I can’t count how many times I wasted time on manual analysis and even incorrectly concluded that the malware was ‘broken’ while looking at heavily obfuscated, or bloatwarish malware samples. Until I started looking at the context of the early exit.

It is really helpful to be able to cheat a bit.

For the case of the process enumeration one can not only intercept the Process32First and Process32Next functions, but also enhance the results with the interception of string comparison functions.

If we get lucky, the result could look like this:

Process32First: [System Process]
lstrcmpiA ([System Process], explorer.exe)
Process32Next: System
lstrcmpiA (System, explorer.exe)
Process32Next: smss.exe
lstrcmpiA (smss.exe, explorer.exe)
Process32Next: csrss.exe
lstrcmpiA (csrss.exe, explorer.exe)
Process32Next: winlogon.exe
lstrcmpiA (winlogon.exe, explorer.exe)
Process32Next: services.exe
lstrcmpiA (services.exe, explorer.exe)
Process32Next: lsass.exe
lstrcmpiA (lsass.exe, explorer.exe)
Process32Next: vmacthlp.exe
lstrcmpiA (vmacthlp.exe, explorer.exe)
Process32Next: svchost.exe
lstrcmpiA (svchost.exe, explorer.exe)
Process32Next: svchost.exe
lstrcmpiA (svchost.exe, explorer.exe)
Process32Next: svchost.exe
lstrcmpiA (svchost.exe, explorer.exe)
Process32Next: svchost.exe
lstrcmpiA (svchost.exe, explorer.exe)
Process32Next: svchost.exe
lstrcmpiA (svchost.exe, explorer.exe)
Process32Next: spoolsv.exe
lstrcmpiA (spoolsv.exe, explorer.exe)
Process32Next: PERSFW.exe
lstrcmpiA (PERSFW.exe, explorer.exe)
Process32Next: explorer.exe
lstrcmpiA (explorer.exe, explorer.exe)

That makes in-depth malware analysis super easy, doesn’t it?
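
With output like the one above, pulling out the name the malware is hunting for can be automated. The sketch below pairs each enumerated process name with the string comparison that follows it and treats the other argument as the sought name; it then reports sought names that never showed up during enumeration, which is the "target process is missing on the test system" case. The line shapes and function names are assumptions based on this simplified output only.

```python
import re
from collections import Counter

ENUM_RE = re.compile(r"^Process32(?:First|Next):\s*(.+)$")
CMP_RE = re.compile(r"^lstrcmpiA \((.+), (.+)\)$")


def sought_process_names(lines):
    """Count the names that enumerated processes are compared against."""
    counts = Counter()
    current = None  # most recently enumerated process name
    for raw in lines:
        line = raw.strip()
        enum = ENUM_RE.match(line)
        if enum:
            current = enum.group(1)
            continue
        cmp_ = CMP_RE.match(line)
        if cmp_ and current is not None:
            left, right = cmp_.group(1), cmp_.group(2)
            # lstrcmpiA is case-insensitive, so compare the same way.
            if left.lower() == current.lower():
                counts[right] += 1
            elif right.lower() == current.lower():
                counts[left] += 1
    return counts


def missing_targets(lines):
    """Sought names that never appeared in the enumeration (pass a list)."""
    seen = set()
    for raw in lines:
        enum = ENUM_RE.match(raw.strip())
        if enum:
            seen.add(enum.group(1).lower())
    return [n for n in sought_process_names(lines) if n.lower() not in seen]
```

On the log above, `sought_process_names` should rank explorer.exe first and `missing_targets` should come back empty. If a sample compares every process against a name that is not running on the box, that name ends up in `missing_targets`, which is exactly the hint you want before declaring a sample ‘broken’.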

I think there is a potential market for supporting in-depth malware analysis with sandbox technology – make the interception configurable (offer a list of APIs to monitor, allow time to run to be selected manually, rebuild files, perhaps give live access to the analysis box, etc.).

Reversing ykS is the limit.

And while I do commercial in-depth analysis and I may be shooting myself in the foot here, I can’t stress enough how important ROI is for both you and the customer.