In the last three posts I talked about batch analysis, clustering, and applying these techniques to an APT sample set.
Batch processing is a step necessary for retrieving ‘clusterable’ data from samples in an automated fashion.
Clustering is a way of putting these samples into buckets, potentially grouping them into some families.
I want to see whether it is possible to cluster these samples reliably without relying on any assumptions or prior knowledge (from the white paper or other blogs). It is an interesting experiment, and I am curious whether I will ever get close to the already known clusters. Quite frankly, I don't know yet. We shall see.
The clustering I have done so far focused on dynamic analysis and, a little, on source code analysis. In this post I will take code analysis further, this time focusing on the disassembled .asm files generated, as usual, by IDA Pro.
The resulting assembly listing is easy to parse because each line holds a single instruction. This makes it possible to split the code into blocks on function boundaries and, by keeping only the calls to APIs or to other subroutines (including calls via registers), to extract a simplified version of each procedure, e.g.
sub_401000 proc near ; CODE XREF: _main+20Ap
[...]
lea ecx, [esp+310h+szLongPath]
push 104h ; nSize
push ecx ; lpFilename
push 0 ; hModule
call ds:GetModuleFileNameA
lea edx, [esp+310h+szLongPath]
push 104h ; cchBuffer
lea eax, [esp+314h+szLongPath]
push edx ; lpszShortPath
push eax ; lpszLongPath
call ds:GetShortPathNameA
lea ecx, [esp+310h+Parameters]
push offset String2 ; "/c del "
push ecx ; lpString1
call ds:lstrcpyA
mov esi, ds:lstrcatA
lea edx, [esp+310h+szLongPath]
lea eax, [esp+310h+Parameters]
push edx ; lpString2
push eax ; lpString1
call esi ; lstrcatA
lea ecx, [esp+310h+Parameters]
push offset s->>>nul ; " >>NUL"
push ecx ; lpString1
call esi ; lstrcatA
mov esi, ds:ShellExecuteA
push 0 ; nShowCmd
push offset Directory ; lpDirectory
lea edx, [esp+318h+File]
push offset Parameters ; "/c del wuauclt.exe"
push edx ; lpFile
push offset Operation ; "open"
push 0 ; hwnd
call esi ; ShellExecuteA
push 0 ; nShowCmd
push offset Directory ; lpDirectory
lea eax, [esp+318h+File]
push offset s->CDelSvchost_exe ; "/c del svchost.exe"
push eax ; lpFile
push offset Operation ; "open"
push 0 ; hwnd
call esi ; ShellExecuteA
[...]
retn
sub_401000 endp
becomes
GetModuleFileNameA
GetShortPathNameA
lstrcpyA
lstrcatA
lstrcatA
ShellExecuteA
ShellExecuteA
ShellExecuteA
and can be written as a single line of code
GetModuleFileNameA|GetShortPathNameA|lstrcpyA|lstrcatA|lstrcatA|ShellExecuteA|ShellExecuteA|ShellExecuteA
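The flattening step can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the tooling used for this post: the function names (`flatten_procedures`, `normalize_target`) are hypothetical, procedures are assumed to be delimited by `proc` / `endp` lines, register calls are assumed to carry the resolved name in the trailing comment, and `sub_XXXXXX` targets are collapsed to `sub`.

```python
import re

# Assumed listing conventions: "name proc near" opens a procedure,
# "name endp" closes it, and calls look like "call target ; comment".
PROC_START = re.compile(r"^\s*(\S+)\s+proc\b")
PROC_END = re.compile(r"^\s*(\S+)\s+endp\b")
CALL = re.compile(r"^\s*call\s+([^;]+?)\s*(?:;\s*(.*))?$")
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def normalize_target(operand, comment):
    """Reduce a call operand to a short, comparable token."""
    # For "call esi ; lstrcatA" use the name from the comment.
    if operand in REGISTERS and comment and comment.strip():
        return comment.strip()
    # Drop the segment prefix of imported APIs ("ds:GetModuleFileNameA").
    if operand.startswith("ds:"):
        operand = operand[3:]
    # Local subroutines differ per sample, so collapse them.
    if re.fullmatch(r"sub_[0-9A-Fa-f]+", operand):
        return "sub"
    if re.fullmatch(r"nullsub_\d+", operand):
        return "nullsub"
    return operand


def flatten_procedures(asm_text):
    """Map each procedure name to its '|'-joined sequence of call targets."""
    calls_by_proc = {}
    current = None
    for line in asm_text.splitlines():
        start = PROC_START.match(line)
        if start:
            current = start.group(1)
            calls_by_proc[current] = []
            continue
        if PROC_END.match(line):
            current = None
            continue
        if current is None:
            continue
        call = CALL.match(line)
        if call:
            token = normalize_target(call.group(1).strip(), call.group(2))
            calls_by_proc[current].append(token)
    return {name: "|".join(calls) for name, calls in calls_by_proc.items()}
```

Calling `flatten_procedures` on the text of one .asm file returns one flattened line per procedure, which is the form used in the rest of this post.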
Applying this method on procedure boundaries to each disassembled program, I eventually ended up with a shortened and flattened version of the code of each sample. I then built a histogram of the most common sequences of such code blocks across all files and got the following stats (count, followed by the sequence):
5514 |sub
2507 |sub|sub
1332 |sub|sub|sub
860 |sub|sub|sub|sub
558 |__security_check_cookie(x)
479 |__security_check_cookie(x)|__security_check_cookie(x)
475 |sub|sub|sub|sub|sub
392 |sub|sub|sub|sub|sub|sub
353 |operator delete(void *)
276 |sub|operator delete(void *)
269 |sub|sub|sub|sub|sub|sub|sub
235 |sub|sub|sub|sub|sub|sub|sub|sub
185 |sub|sub|sub|sub|sub|sub|sub|sub|sub
168 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
165 |__alloca_probe|sub|sub
137 |eax
132 |sub|sub|ecx
132 |__alloca_probe|sub
130 |_atexit
123 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
110 |_chkstk|sub|sub
108 |strlen|operator delete(void *)|operator new(uint)|strcpy
106 |nullsub
106 |__alloca_probe
101 |_chkstk|sub
97 |eax|sub
92 |__alloca_probe|sub|sub|sub|sub
91 |__alloca_probe|sub|sub|sub
88 |_chkstk|sub|sub|sub
88 |__alloca_probe|sub|sub|sub|sub|sub|sub
85 |__alloca_probe|sub|sub|sub|sub|sub
80 |exception const &)
75 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
73 |strlen
73 |_chkstk|sub|sub|sub|sub|sub
72 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
71 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
71 |_Tidy(bool,uint)
69 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
68 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
68 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
68 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
68 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub
68 |InternetCloseHandle|InternetCloseHandle|InternetCloseHandle
67 |sub|eax
63 |_chkstk|sub|sub|sub|sub|sub|sub
62 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
62 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub
61 |free
60 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
60 |allocator<char>>(char const *)|_atexit
59 |sub|_CxxThrowException(x,x)
56 |_CxxThrowException
56 |InternetReadFile
55 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
55 |_chkstk
55 |SetUnhandledExceptionFilter
52 |operator new(uint)|exception(char const * const &)|_CxxThrowException(x,x)
52 |operator delete(void *)|_CxxThrowException(x,x)
52 |_flsall
51 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
51 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub
50 |_chkstk|sub|sub|sub|sub
49 |j_free
48 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
47 |sub|sub|_CxxThrowException(x,x)
47 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
45 |sub|sub|sub|sub|eax
44 |strchr|strchr
44 |malloc|sub|sub|free
43 |dword ptr [ecx+8]
42 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
40 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
40 |sub|_Split(void)|_wmemmove|sub|_Eos(uint)|_Split(void)|_Tidy(bool)|sub
40 |operator delete(void *)|operator delete(void *)
40 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub
40 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
40 |ReadFile|_memcpy_0
39 |sub|_CxxThrowException
39 |GetModuleFileNameA|GetShortPathNameA|GetEnvironmentVariableA|lstrcpyA|lstrcatA|lstrcatA|GetCurrentProcess|SetPriorityClass|GetCurrentThread|SetThreadPriority|ShellExecuteExA|SetPriorityClass|SetProcessPriorityBoost|SHChangeNotify|GetCurrentProcess|SetPriorityClass|GetCurrentThread|SetThreadPriority
38 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
38 |_chkstk|sub|sub|sub|sub|sub|sub|sub
37 |GetCurrentProcess|OpenProcessToken|LookupPrivilegeValueA|AdjustTokenPrivileges|CloseHandle|GetLastError
36 |sub|sub|dword ptr [eax]|sub|sub|sub
36 |sub|ecx
36 |dword ptr [ecx+4]
36 |_memset|sub|__security_check_cookie(x)
35 |sub|sub|__security_check_cookie
35 |sub|operator delete(void *)|operator delete(void *)|operator delete(void *)|operator delete(void *)
35 |__invalid_parameter_noinfo
34 |operator new(uint)
34 |_free
34 |_LocaleUpdate(localeinfo_struct *)|___strgtold12_l|sub|__security_check_cookie(x)
33 |sub|sub|eax|sub
33 |sub|operator delete(void *)|operator delete(void *)
33 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
33 |__errno|__invalid_parameter
32 |operator delete(void *)|operator new(uint)
32 |memset
31 |operator new(uint)|sub
31 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
30 |eax|sub|sub|sub|sub
30 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
30 |__EH_prolog|_Tidy(bool)|_strlen|sub|sub|_CxxThrowException(x,x)
30 |SetServiceStatus
28 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
27 |sub|_Split(void)|_memcpy|sub|_Eos(uint)|_Split(void)|_Tidy(bool)|sub
27 |strlen|sub
27 |memcpy
27 |_strcmpi|memset|memset|CreateToolhelp32Snapshot|Process32First|sprintf|strcat|Process32Next|CloseHandle|_strcmpi|OpenSCManagerA|EnumServicesStatusExA|operator new(uint)|CloseServiceHandle|strcat|EnumServicesStatusExA|sprintf|strcat|operator delete(void *)|CloseServiceHandle|_strcmpi|GetLogicalDrives|sprintf|strcat|sprintf|strcat|lstrcatA|GetDriveTypeA|strcat|GetVolumeInformationA|strcat|strcat|sprintf|strcat
27 |_strcmpi|atoi|OpenProcess|TerminateProcess|CloseHandle|strcat|_strcmpi|OpenSCManagerA|OpenServiceA|GetLastError|strcat|CloseServiceHandle|ControlService|GetLastError|strcat|CloseServiceHandle|CloseServiceHandle
27 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub
27 |GetProcAddress
27 |GetExitCodeProcess|PeekNamedPipe|Sleep|ReadFile|CloseHandle|CloseHandle|memset|strcpy|strlen
26 |sub|sub|sub|sub|_memcpy_s
26 |sub|eax|sub|eax|sub
26 |sub|_Tidy(bool)|_Tidy(bool)|sub
26 |strstr|strchr|operator new(uint)|strchr|strchr|strchr|strchr|strchr|strchr|strchr|strchr|strchr|operator delete(void *)
26 |strlen|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
26 |sprintf|HttpAddRequestHeadersA|HttpSendRequestA|GetLastError|InternetQueryOptionA|InternetSetOptionA|sprintf
26 |__ld12cvt
26 |___strgtold12|sub
26 |__EH_prolog3|sub|sub|_CxxThrowException(x,x)
26 |InternetOpenA|InternetSetOptionA|InternetSetOptionA|InternetSetOptionA|InternetConnectA|HttpOpenRequestA|strlen|HttpAddRequestHeadersA
26 |$+5
25 |rand
25 |malloc|CreatePipe|CreatePipe|CloseHandle|CloseHandle|CloseHandle|CloseHandle|free|sub|CloseHandle|CloseHandle
25 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
25 |__invalid_parameter_noinfo|__invalid_parameter_noinfo
25 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
25 |URLDownloadToFileA|strcat
24 |sub|sub|sub|sub|sub|GetProcAddress|sub|sub|sub
24 |sub|edx|sub
24 |sub|_Split(void)|_wmemmove|sub|_Eos(uint)|_Split(void)|sub|sub
24 |shutdown|closesocket
24 |send
24 |fopen|fseek|fread|fseek|ftell|fseek|fread|fclose|fclose|fread|fclose|sub
24 |edx
24 |dword ptr [eax+40h]
24 |_beginthreadex|CloseHandle
24 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
24 |GetModuleHandleA|GetProcAddress
23 |unknown_libname_1
23 |sub|sub|sub|sub|operator delete(void *)
23 |sub|OpenProcess|TerminateProcess|Sleep|CloseHandle|sub
23 |strlen|CreateFileA|strlen|operator new(uint)|memset|WriteConsoleInputA|operator delete(void *)|CloseHandle
23 |strcat|sub|WaitForSingleObject|strcat|strcat|strlen|sub
23 |j_free|j_free
23 |j_free|_CxxThrowException
23 |LoadStringA|sub
23 |CloseHandle
22 |~type_info(void)|operator delete(void *)
22 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
22 |sub|operator new(unsigned __int64)|exception(char const * const &)|_CxxThrowException|sub|sub|j_free
22 |operator new(uint)|operator new(uint)|sub
22 |operator new(uint)|operator delete(void *)
22 |operator delete(void *)|operator delete(void *)|operator delete(void *)
22 |exception(char const * const &)
22 |eax|sub|sub|sub
22 |GetCurrentProcess|GetCurrentProcess|DuplicateHandle|CreateProcessA|CloseHandle
22 |CompareStringA
22 |$+5|sub|sub
21 |sub|_wcslen|sub|sub|sub|sub
21 |sprintf|sprintf|sub
21 |malloc|recv|sub|sub|_strnicmp|WriteFile|recv|free|ExitThread|SetEvent|free|ExitThread
21 |malloc|PeekNamedPipe|ReadFile|sub|sub|_itoa|send|sub|Sleep|PeekNamedPipe|free|ExitThread
21 |_strcmpi|memset|CreateProcessA|strcat|CloseHandle|_strcmpi|OpenSCManagerA|strcat|OpenServiceA|GetLastError|strcat|CloseServiceHandle|StartServiceA|GetLastError|strcat|CloseServiceHandle|CloseHandle
21 |__get_sse2_info
21 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
21 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
21 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
21 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
21 |GetCurrentProcess|OpenProcess|GetLastError|sprintf|strcat|OpenProcessToken|memset|sprintf|CreateProcessAsUserA|strcat|CloseHandle|CloseHandle|GetLastError|sprintf|strcat|CloseHandle|GetLastError|sprintf|strcat|CloseHandle
21 |CreateEventA|CreateEventA|sub|WaitForSingleObject|CloseHandle
21 |$+5|sub
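A histogram like the one above can be produced with very little code. The sketch below is only an assumption about how it could be done: `sequence_histogram` is a hypothetical name, and the input is assumed to be a dictionary mapping each .asm file name to the list of its flattened procedure lines.

```python
from collections import Counter


def sequence_histogram(flattened_by_file):
    """Count how often each flattened call sequence occurs across all files.

    flattened_by_file: dict of file name -> list of '|'-joined sequences
    (an assumed layout, one list entry per procedure).
    Returns (sequence, count) pairs, most frequent first.
    """
    counts = Counter()
    for sequences in flattened_by_file.values():
        # Procedures that make no calls flatten to an empty string; skip them.
        counts.update(seq for seq in sequences if seq)
    return counts.most_common()
```

Note that this counts every occurrence, so a sequence repeated many times inside a single sample inflates its count. Counting each sequence once per file would be an equally reasonable choice.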
Using these shortened procedures to generate clusters gives some promising results, e.g.:
sub DeleteFileW DeleteFileA
1328eaceb140a3863951d18661b097af.asm
31e5e58dbdfad05175613e795298ebb5.asm
6f9992c486195edcf0bf2f6ee6c3ec74.asm
c99fa835350aa9e2427ce69323b061a9.asm
e476e4a24f8b4ff4c8a0b260aa35fc9f.asm
ea1b44094ae4d8e2b63a1771a3e61fd5.asm
fc1937c1aa536b3744ebdfb1716fd54d.asm

LoadLibraryA GetProcAddress GetProcAddress GetProcAddress
3f8682ab074a097ebbaadbf26dfff560.asm
4b19a2a6d40a5825e868c6ef25ae445e.asm
54d5d171a482278cc8eacf08d9175fd7.asm
56de2854ef64d869b5df7af5e4effe3e.asm
75dad1ccabae8adeb5bae899d0c630f8.asm
8462a62f13f92c34e4b89a7d13a185ad.asm

htons socket connect closesocket
468ff2c12cffc7e5b2fe0ee6bb3b239e.asm
727a6800991eead454e53e8af164a99c.asm
bd8b082b7711bc980252f988bb0ca936.asm
db05df0498b59b42a8e493cf3c10c578.asm
e1b6940985a23e5639450f8391820655.asm

ecx eax dword ptr [esi+10h] sub ecx eax sub sub sub sub sub sub sub sub
12f25ce81596aeb19e75cc7ef08f3a38.asm
268eef019bf65b2987e945afaf29643f.asm
468ff2c12cffc7e5b2fe0ee6bb3b239e.asm
4c6bddcca2695d6202df38708e14fc7e.asm
5a728cb9ce56763dccb32b5298d0f050.asm
727a6800991eead454e53e8af164a99c.asm
8e8622c393d7e832d39e620ead5d3b49.asm
bd8b082b7711bc980252f988bb0ca936.asm
c6a4bb1a4e4f69ec71855d70d6960859.asm
db05df0498b59b42a8e493cf3c10c578.asm
e1b6940985a23e5639450f8391820655.asm
ef8e0fb20e7228c7492ccdc59d87c690.asm

LoadLibraryA GetProcAddress sub sub strstr strchr GetSystemDirectoryA time srand malloc sub sub strncmp Sleep sub Sleep sub Sleep CreatePipe CreatePipe GetStartupInfoA CreateProcessA GetLastError _snprintf sub CreateProcessA CreateThread CreateThread WaitForMultipleObjects GetExitCodeThread TerminateThread GetExitCodeThread TerminateThread GetExitCodeProcess TerminateProcess sub sub GetLastError _snprintf sub CloseHandle CloseHandle CloseHandle CloseHandle sub sub Sleep PeekNamedPipe ReadFile sub
0dd3677594632ce270bcf8af94819caf.asm
270d42f292105951ee81e4085ea45054.asm
523f56515221161579ee6090c962e5b1.asm
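One plausible way to arrive at groupings of this shape is an inverted index from each flattened sequence to the files that contain it. This is a guess at the mechanics, not a description of the exact procedure used here. The name `group_files_by_sequence` and the `min_files` threshold are my own assumptions, as is the input layout (file name mapped to a list of flattened sequences).

```python
from collections import defaultdict


def group_files_by_sequence(flattened_by_file, min_files=2):
    """Group files that share an identical flattened call sequence.

    flattened_by_file: dict of file name -> list of '|'-joined sequences.
    Returns a dict of sequence -> sorted list of files containing it,
    keeping only sequences seen in at least `min_files` different files.
    """
    files_by_sequence = defaultdict(set)
    for filename, sequences in flattened_by_file.items():
        for seq in sequences:
            if seq:  # ignore procedures without any calls
                files_by_sequence[seq].add(filename)
    return {
        seq: sorted(files)
        for seq, files in files_by_sequence.items()
        if len(files) >= min_files
    }
```

Very short sequences such as a lone `sub` will match almost every file, so in practice one would probably also filter on sequence length or on how rare the sequence is.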
Notably, the disassembled code, after some selective processing and normalization, can be treated in the same way as source code that students submit for assessment at university and... be checked for plagiarism. A common technique used for this purpose relies on measuring cosine similarity. I am currently playing with it and will write more about my findings in another post.
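For reference, cosine similarity itself is simple to compute. The sketch below assumes each sample is represented as a bag of its flattened procedure sequences. That representation is my assumption for illustration, and how the features are best chosen is exactly what still needs experimenting with.

```python
import math
from collections import Counter


def cosine_similarity(sequences_a, sequences_b):
    """Cosine similarity of two samples given as lists of flattened sequences.

    Each list becomes a term-frequency vector (sequence -> count).
    Returns a value between 0.0 (nothing shared) and 1.0 (same proportions).
    """
    vec_a, vec_b = Counter(sequences_a), Counter(sequences_b)
    # Only sequences present in both samples contribute to the dot product.
    dot = sum(vec_a[key] * vec_b[key] for key in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sample is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Raw counts let very common sequences such as `sub` dominate the score, so some form of weighting (for example TF-IDF) is a likely next step.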
Thanks for reading!