Happy New Year 2018 & Get yourself logs from 250K sandboxed samples

Update 2

Please use this link:

https://mega.nz/#!LItwzAAL!NqcMVEnIqd17x5guL0V55gwjy8Q3xQMuSyeP-DelbRE

Update

Turns out I had a bug in my script, and on the first go I exported fewer than 250K sessions (only 228K), so I had to fix the dump and re-upload it. If you downloaded it previously, sorry, you will need to do it one more time 🙂

Thanks to @hrbrmstr for spotting and reporting the issue!

Old post

Happy New Year 2018!

Unless you are one of the companies or organizations doing commercial sample analysis and sandboxing, it is almost impossible to get access to normalized data logs from sandboxing sessions. If you want to do analysis, you need to either scrape data from the web or run your own sandbox. To fill the gap, I decided to release logs from 250,000 sandbox sessions.

  • The file contains logs from 250K sandboxed sessions (250K unique samples).
  • 32-bit PEs only. All executed offline (no network access).
  • Sometimes it may not be 100% accurate – I ran various sessions, with various settings/timeouts.
  • You’ll find traces of Windows API, NT API, VC and Delphi inline functions, COM, Visual Basic, string functions, Nullsoft APIs, Anti-VM tricks, etc. – and various stuff I discussed or will discuss in the Enter Sandbox series.

Have a look, run some analysis, crunch data – share results.

Link: https://mega.nz/#!XYcnTAyD!VvwOo9JBkqmRNPu5liSusl3tpC0kBpbRT6E8tfOejF0

File sizes (sha1 hashes):

34,515,244,109 Sandbox_250k_logs_Happy_New_Year_2018
               (30010C605B451CEA6483B93B299FA9758747B1DF)
   993,911,182 Sandbox_250k_logs_Happy_New_Year_2018.7z
               (D73FDAAC08B95536FA2702D327C8F3143A9A666C)
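
Since the download is large, it may be worth checking it against the SHA-1 values listed above before crunching the data. Below is a minimal sketch; the helper names (sha1_of_file, verify) are made up for illustration, and the post does not say whether the listed hashes belong to the first upload or to the fixed one, so a mismatch does not necessarily mean a corrupted download.

```python
import hashlib

# Hashes copied from the listing above (file name -> SHA-1).
EXPECTED_SHA1 = {
    "Sandbox_250k_logs_Happy_New_Year_2018":
        "30010C605B451CEA6483B93B299FA9758747B1DF",
    "Sandbox_250k_logs_Happy_New_Year_2018.7z":
        "D73FDAAC08B95536FA2702D327C8F3143A9A666C",
}

def sha1_of_file(path, chunk_size=1024 * 1024):
    """Hash the file in chunks so a 34 GB log never has to fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()

def verify(path, name):
    """Return True if the file at `path` matches the hash listed for `name`."""
    return sha1_of_file(path) == EXPECTED_SHA1[name]
```

Usage would be something like verify("Sandbox_250k_logs_Happy_New_Year_2018.7z", "Sandbox_250k_logs_Happy_New_Year_2018.7z"), assuming the archive sits in the current directory.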

Note: This data cannot be used for commercial purposes.

If you like this release, you may also want to re-visit my older data dumps:

File format:

  • The file starts with a short header (easy to spot).
  • The header is followed by a ### SAMPLE #<number> marker.
  • Then the actual logs start.
    • The lines start with [PID][TID][ADDRESS].
    • The API groups are prefixed with group prefixes, e.g. API::, DELPHI::, VC:: (the latter two refer to inline functions).
    • The parameters are NOT named / structured according to the Windows API docs; the log focuses on extracting the most useful information and avoids cluttering the output with useless/unused function arguments. Even this is only partially true, though: the tool grew organically over the years and was not an orchestrated effort to make OCDs happy 😉 If I were to write it again, obviously it would be perfect 😉

Example:

### SAMPLE #00000001
[1980][252][00422c77]API::GetSystemTimeAsFileTime (lpSystemTimeAsFileTime=0012FFB0)
[1980][252][0041c305]API::GetModuleHandleW (lpModuleName=kernel32.dll)=7C800000
[1980][252][0041c315]API::GetProcAddress (mod=KERNEL32.dll, api=FlsAlloc)=00000000
[1980][252][0041c328]API::GetProcAddress (mod=KERNEL32.dll, api=FlsFree)=00000000
[1980][252][0041c33b]API::GetProcAddress (mod=KERNEL32.dll, api=FlsGetValue)=00000000
[1980][252][0041c34e]API::GetProcAddress (mod=KERNEL32.dll, api=FlsSetValue)=00000000
[1980][252][0041c361]API::GetProcAddress (mod=KERNEL32.dll, api=InitializeCriticalSectionEx)=00000000
[1980][252][0041c374]API::GetProcAddress (mod=KERNEL32.dll, api=CreateSemaphoreExW)=00000000
[1980][252][0041c387]API::GetProcAddress (mod=KERNEL32.dll, api=SetThreadStackGuarantee)=00000000
[1980][252][0041c39a]API::GetProcAddress (mod=KERNEL32.dll, api=CreateThreadpoolTimer)=00000000
[1980][252][0041c3ad]API::GetProcAddress (mod=KERNEL32.dll, api=SetThreadpoolTimer)=00000000
[1980][252][0041c3c0]API::GetProcAddress (mod=KERNEL32.dll, api=WaitForThreadpoolTimerCallbacks)=00000000
[1980][252][0041c3d3]API::GetProcAddress (mod=KERNEL32.dll, api=CloseThreadpoolTimer)=00000000
[1980][252][0041c3e6]API::GetProcAddress (mod=KERNEL32.dll, api=CreateThreadpoolWait)=00000000
[1980][252][0041c3f9]API::GetProcAddress (mod=KERNEL32.dll, api=SetThreadpoolWait)=00000000
[1980][252][0041c40c]API::GetProcAddress (mod=KERNEL32.dll, api=CloseThreadpoolWait)=00000000
[1980][252][0041c41f]API::GetProcAddress (mod=KERNEL32.dll, api=FlushProcessWriteBuffers)=00000000
[1980][252][0041c432]API::GetProcAddress (mod=KERNEL32.dll, api=FreeLibraryWhenCallbackReturns)=00000000
[1980][252][0041c445]API::GetProcAddress (mod=KERNEL32.dll, api=GetCurrentProcessorNumber)=00000000
[1980][252][0041c458]API::GetProcAddress (mod=KERNEL32.dll, api=GetLogicalProcessorInformation)=7C861E6F
[1980][252][0041c46b]API::GetProcAddress (mod=KERNEL32.dll, api=CreateSymbolicLinkW)=00000000
[1980][252][0041c47e]API::GetProcAddress (mod=KERNEL32.dll, api=SetDefaultDllDirectories)=00000000
[1980][252][0041c491]API::GetProcAddress (mod=KERNEL32.dll, api=EnumSystemLocalesEx)=00000000
[1980][252][0041c4a4]API::GetProcAddress (mod=KERNEL32.dll, api=CompareStringEx)=00000000
[1980][252][0041c4b7]API::GetProcAddress (mod=KERNEL32.dll, api=GetDateFormatEx)=00000000
[1980][252][0041c4ca]API::GetProcAddress (mod=KERNEL32.dll, api=GetLocaleInfoEx)=00000000
[1980][252][0041c4dd]API::GetProcAddress (mod=KERNEL32.dll, api=GetTimeFormatEx)=00000000
[1980][252][0041c4f0]API::GetProcAddress (mod=KERNEL32.dll, api=GetUserDefaultLocaleName)=00000000
[1980][252][0041c503]API::GetProcAddress (mod=KERNEL32.dll, api=IsValidLocaleName)=00000000
[1980][252][0041c516]API::GetProcAddress (mod=KERNEL32.dll, api=LCMapStringEx)=00000000
[1980][252][0041c529]API::GetProcAddress (mod=KERNEL32.dll, api=GetCurrentPackageId)=00000000
[1980][252][0041ab40]API::GetCommandLineW = "_0000034AD55817135B1B1C4AE97CD449.exe"
[1980][252][004228f9]API::GetModuleFileNameW (mod=00000000, namebuf=%SYSTEM%\_0000034AD55817135B1B1C4AE97CD449.exe, buflen=260)
[1980][252][00424495]API::MultiByteToWideChar (CodePage=000004E4,dwFlags=MB_PRECOMPOSED [00000001, 1],lpMultiByteStr= 
[1980][252][0042450c]API::MultiByteToWideChar (CodePage=000004E4,dwFlags=MB_PRECOMPOSED [00000001, 1],lpMultiByteStr= 
[1980][252][0041be2d]API::MultiByteToWideChar (CodePage=000004E4,dwFlags=MB_PRECOMPOSED [00000001, 1],lpMultiByteStr= 
[1980][252][0041bea1]API::MultiByteToWideChar (CodePage=000004E4,dwFlags=MB_PRECOMPOSED [00000001, 1],lpMultiByteStr= 
[1980][252][0041bf83]API::WideCharToMultiByte (cp= [000004E4, 1252],fl= [00000000, 0],wide= 
[1980][252][0041be2d]API::MultiByteToWideChar (CodePage=000004E4,dwFlags=MB_PRECOMPOSED [00000001, 1],lpMultiByteStr= 
[1980][252][0041bea1]API::MultiByteToWideChar (CodePage=000004E4,dwFlags=MB_PRECOMPOSED [00000001, 1],lpMultiByteStr= 
[1980][252][0041bf83]API::WideCharToMultiByte (cp= [000004E4, 1252],fl= [00000000, 0],wide= 
[1980][252][0041c543]API::SetUnhandledExceptionFilter (0042247D)
[1980][252][00401dde]VC::vc_strlen1 (lpString=\/)
[1980][252][0040df7c]API::GetTempPathA (namebuf=C:\DOCUME~1\USERNAME\LOCALS~1\Temp\, buflen=260)
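
The format described above lends itself to line-by-line parsing. The sketch below is one way to do it, not the tool that produced the logs: the function names (parse_log, count_calls) and the regular expressions are assumptions derived only from the example lines shown here, so records that look different elsewhere in the dump will simply be skipped.

```python
import re
from collections import Counter

# "### SAMPLE #00000001" opens the log of one sample.
SAMPLE_RE = re.compile(r"^### SAMPLE #(\d+)")

# "[PID][TID][ADDRESS]GROUP::Name <rest of the line>"
CALL_RE = re.compile(
    r"^\[(?P<pid>\d+)\]\[(?P<tid>\d+)\]\[(?P<addr>[0-9A-Fa-f]+)\]"
    r"(?P<group>\w+)::(?P<name>\w+)(?P<rest>.*)$"
)

def parse_log(lines):
    """Yield one dict per logged call, tagged with its sample number.

    Anything before the first sample marker (the header) is ignored,
    as is any line that does not match the assumed call layout.
    """
    sample_id = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        marker = SAMPLE_RE.match(line)
        if marker:
            sample_id = marker.group(1)
            continue
        call = CALL_RE.match(line)
        if call and sample_id is not None:
            record = call.groupdict()
            record["rest"] = record["rest"].strip()
            record["sample"] = sample_id
            yield record

def count_calls(records):
    """Count how often each (group, function name) pair occurs."""
    return Counter((r["group"], r["name"]) for r in records)
```

Because parse_log is a generator, it can be fed an open file object directly, which matters for a 34 GB log. The text encoding of the dump is not stated in the post, so opening it with errors="replace" is probably a safe starting point.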

Enter Sandbox – part 15: The muddy, heavy water world of atomic formats…

The sample analysis process typically covers the most common forensic suspects, including mutexes, event names, and atoms. However, there is one more sub-artifact sitting on the same bench as the last one I have listed, one that often escapes the scrutiny of sandboxes and malware analysts: the clipboard format.

The clipboard format is registered using the RegisterClipboardFormat function – it allows applications to interchange data as long as they understand the format. The registration is implemented with the use of atoms as explained in this presentation.

Sandboxes and analysts inspecting the calls to RegisterClipboardFormat can use the received data in many ways. It can help determine the file type of a sample and its modules, identify a malware/adware family or perhaps a programming framework, and in some cases heuristically detect the capabilities of the tested sample. I have listed a few example clipboard formats below. One set that immediately stands out is the Delphi clipboard formats:

  • Delphi Picture
  • Delphi Component
  • ControlOfs<HEX-STRING> (e.g. ControlOfs00400000000007A8)

Finding these in the API calls or even in memory is a good indication that there is a Delphi application running.

The same goes for ATL samples:

  • WM_ATLGETCONTROL
  • WM_ATLGETHOST
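
The Delphi and ATL markers above can be turned into a simple heuristic. The sketch below is only an illustration: the table and function names (FRAMEWORK_HINTS, guess_frameworks) are made up, the patterns cover just the format names listed in this post, and how the names are collected (API logs, memory strings) is left to the caller.

```python
import re

# Framework -> patterns for clipboard format names mentioned in this post.
FRAMEWORK_HINTS = {
    "Delphi": [
        re.compile(r"^Delphi (Picture|Component)$"),
        re.compile(r"^ControlOfs[0-9A-Fa-f]+$"),
    ],
    "ATL": [
        re.compile(r"^WM_ATLGET(CONTROL|HOST)$"),
    ],
}

def guess_frameworks(format_names):
    """Return a sorted list of frameworks hinted at by the given format names."""
    hits = set()
    for name in format_names:
        for framework, patterns in FRAMEWORK_HINTS.items():
            if any(pattern.match(name) for pattern in patterns):
                hits.add(framework)
    return sorted(hits)
```

A hit is a hint rather than proof; format names are plain strings, and any program can register any of them.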

There are also malware/adware-specific formats, e.g.:

  • AmInst__Runing
  • yimomotoTec Picture
  • yimomotoTec Component
  • PowerSpider
  • RinLoggerInstance
  • SatoriWM_SetNetworkShareableFlag
  • Transfer_File_Success_Doyo
  • 180StartDownload

… RAT-related formats:

  • WinVNC.Update.Mouse
  • WinVNC.Update.DrawRect
  • WinVNC.Update.CopyRect
  • WinVNC.AddClient.Message
  • UltraVNC.Viewer.FileTransferSendPacketMessage

… test formats:

  • Hey, this is unicough single instance test
  • UWM_GAMETESTCMD_{75AEED17-2310-4171-94C6-08AC4438E814}_MSG
  • Message.My.Super.Puper.Test.Program.XXX
  • KSDB_TEST: Message communciation between Agent and its TEST host client.
  • FONT-TEST

… various functionality-related formats:

  • WM_HTML_GETOBJECT
  • RasDialEvent
  • EXPLORER.EXEIsDebuggerPresentExEdLl
  • winmm_devicechange
  • WM_HOOKEX_RK
  • UWM_KEYHOOK_MSG-968C3043-1128-43dc-83A9-55122C8D87C1
  • Transfer_File_Success_Doyo
  • 3rdeye_tb_hacking_dll
  • keyhook_msg

… P2P program formats:

  • EMULE-{4EADC6FC-516F-4b7c-9066-97D893649570}
  • KazaaNewSearch

… possible hints on programmer’s mother tongue:

  • Karte ziehen
  • querodarmeucu

… random:

  • trhgtehgfsgrfgtrwegtre
  • frgjbfdkbnfsdjbvofsjfrfre
  • hgtrfsgfrsgfgregtregtr
  • gsegtsrgrefsfsfsgrsgrt

A short list of the top 30 formats I collected from my sample set:

 46894 TaskbarCreated
 30020 commdlg_FindReplace
 27886 Delphi Picture
 27886 Delphi Component
 27491 commdlg_help
 13920 WM_ATLGETCONTROL
 13914 WM_ATLGETHOST
 11000 3
  8395 commctrl_DragListMsg
  7445 1
  6909 WM_GETCONTROLNAME
  5475 FileName
  5020 Embedded Object
  4899 Link Source
  4885 Rich Text Format
  4787 Object Descriptor
  4652 commdlg_ColorOK
  4576 OwnerLink
  4574 Embed Source
  4569 Link Source Descriptor