Clustering and Batch Analysis of APT1 sampleset, part 3

Part 1, Part 2, Part 3

In the last three posts I talked about batch analysis, clustering, and applying these techniques to the APT1 sampleset.

Batch processing is a necessary step for retrieving ‘clusterable’ data from samples in an automated fashion.

Clustering is a way of putting these samples into buckets, potentially grouping them into some families.

I want to see whether it is possible to cluster these samples reliably without using any assumptions or prior knowledge (retrieved from the white paper or other blogs). It is an interesting experiment, and I am curious whether I will get anywhere close to the already known clusters. Quite frankly, I don’t know yet. We shall see.

The clustering I have done so far focused on dynamic analysis and, to a lesser extent, on source code analysis. In this post I will exploit code analysis further, this time focusing on the disassembled .asm files generated, as usual, by IDA Pro.

The resulting assembly code is quite easy to parse, as each line contains only one instruction. This makes it possible to group the code into blocks on function boundaries and, by keeping only the calls to APIs or to other subroutines (including calls via registers), to extract a simplified version of each procedure, e.g.:

sub_401000    proc near        ; CODE XREF: _main+20Ap
[...]

lea    ecx, [esp+310h+szLongPath]
push    104h        ; nSize
push    ecx        ; lpFilename
push    0        ; hModule
call    ds:GetModuleFileNameA

lea    edx, [esp+310h+szLongPath]
push    104h        ; cchBuffer
lea    eax, [esp+314h+szLongPath]
push    edx        ; lpszShortPath
push    eax        ; lpszLongPath
call    ds:GetShortPathNameA

lea    ecx, [esp+310h+Parameters]
push    offset String2    ; "/c del "
push    ecx        ; lpString1
call    ds:lstrcpyA

mov    esi, ds:lstrcatA
lea    edx, [esp+310h+szLongPath]
lea    eax, [esp+310h+Parameters]
push    edx        ; lpString2
push    eax        ; lpString1
call    esi ; lstrcatA

lea    ecx, [esp+310h+Parameters]
push    offset s->>>nul    ; " >>NUL"
push    ecx        ; lpString1
call    esi ; lstrcatA

mov    esi, ds:ShellExecuteA
push    0        ; nShowCmd
push    offset Directory ; lpDirectory
lea    edx, [esp+318h+File]
push    offset Parameters ; "/c    del wuauclt.exe"
push    edx        ; lpFile
push    offset Operation ; "open"
push    0        ; hwnd
call    esi ; ShellExecuteA

push    0        ; nShowCmd
push    offset Directory ; lpDirectory
lea    eax, [esp+318h+File]
push    offset s->CDelSvchost_exe ; "/c    del svchost.exe"
push    eax        ; lpFile
push    offset Operation ; "open"
push    0        ; hwnd
call    esi ; ShellExecuteA

[...]
retn
sub_401000    endp

becomes

GetModuleFileNameA
GetShortPathNameA
lstrcpyA
lstrcatA
lstrcatA
ShellExecuteA
ShellExecuteA
ShellExecuteA

and can be written as a single line of code

GetModuleFileNameA|GetShortPathNameA|lstrcpyA|lstrcatA|lstrcatA|ShellExecuteA|ShellExecuteA|ShellExecuteA
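
The flattening step shown above could be sketched roughly as follows. This is only an illustration, not the script actually used for this post: the function names (`flatten_procedures`, `normalize_target`), the regular expressions and the normalization rules (collapse every `sub_XXXXXX` into `sub`, strip segment prefixes such as `ds:`, prefer the API name from the IDA comment for calls via registers) are assumptions inferred from the example.

```python
import re

# Assumed patterns for IDA-style listings: '<name> proc near', '<name> endp', 'call <target> ; comment'
PROC_START = re.compile(r"^\s*(\S+)\s+proc\b")
PROC_END = re.compile(r"^\s*(\S+)\s+endp\b")
CALL = re.compile(r"^\s*call\s+([^;]+?)\s*(?:;\s*(.*))?$")
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def normalize_target(operand, comment):
    """Reduce a call operand to a short token (hypothetical normalization rules)."""
    operand = operand.strip()
    # For 'call esi ; lstrcatA' prefer the API name found in the comment.
    if operand in REGISTERS and comment:
        return comment.strip()
    # Drop a segment prefix such as 'ds:'.
    operand = re.sub(r"^[a-z]s:", "", operand)
    # Collapse every auto-named subroutine into a generic 'sub'.
    if operand.startswith("sub_"):
        return "sub"
    return operand


def flatten_procedures(asm_text):
    """Return {procedure name: 'call1|call2|...'} for one .asm listing."""
    result = {}
    current, calls = None, []
    for line in asm_text.splitlines():
        start = PROC_START.match(line)
        if start:
            current, calls = start.group(1), []
            continue
        if current is None:
            continue  # ignore anything outside a procedure
        if PROC_END.match(line):
            result[current] = "|".join(calls)
            current = None
            continue
        call = CALL.match(line)
        if call:
            calls.append(normalize_target(call.group(1), call.group(2)))
    return result
```

Calls through a register without a comment (e.g. a plain `call eax`) are kept as the register name, which is consistent with tokens such as `eax` and `ecx` appearing in the flattened output.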

Applying this methodology on procedure boundaries to each disassembled program, I ended up with a shortened and flattened version of the code of each sample. I then built a histogram of the most common call sequences across all the procedures from all files and got the following stats:

   5514 |sub
   2507 |sub|sub
   1332 |sub|sub|sub
    860 |sub|sub|sub|sub
    558 |__security_check_cookie(x)
    479 |__security_check_cookie(x)|__security_check_cookie(x)
    475 |sub|sub|sub|sub|sub
    392 |sub|sub|sub|sub|sub|sub
    353 |operator delete(void *)
    276 |sub|operator delete(void *)
    269 |sub|sub|sub|sub|sub|sub|sub
    235 |sub|sub|sub|sub|sub|sub|sub|sub
    185 |sub|sub|sub|sub|sub|sub|sub|sub|sub
    168 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
    165 |__alloca_probe|sub|sub
    137 |eax
    132 |sub|sub|ecx
    132 |__alloca_probe|sub
    130 |_atexit
    123 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
    110 |_chkstk|sub|sub
    108 |strlen|operator delete(void *)|operator new(uint)|strcpy
    106 |nullsub
    106 |__alloca_probe
    101 |_chkstk|sub
     97 |eax|sub
     92 |__alloca_probe|sub|sub|sub|sub
     91 |__alloca_probe|sub|sub|sub
     88 |_chkstk|sub|sub|sub
     88 |__alloca_probe|sub|sub|sub|sub|sub|sub
     85 |__alloca_probe|sub|sub|sub|sub|sub
     80 |exception const &)
     75 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     73 |strlen
     73 |_chkstk|sub|sub|sub|sub|sub
     72 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     71 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     71 |_Tidy(bool,uint)
     69 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     68 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     68 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     68 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     68 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub
     68 |InternetCloseHandle|InternetCloseHandle|InternetCloseHandle
     67 |sub|eax
     63 |_chkstk|sub|sub|sub|sub|sub|sub
     62 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     62 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub
     61 |free
     60 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     60 |allocator<char>>(char const *)|_atexit
     59 |sub|_CxxThrowException(x,x)
     56 |_CxxThrowException
     56 |InternetReadFile
     55 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     55 |_chkstk
     55 |SetUnhandledExceptionFilter
     52 |operator new(uint)|exception(char const * const &)|_CxxThrowException(x,x)
     52 |operator delete(void *)|_CxxThrowException(x,x)
     52 |_flsall
     51 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     51 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub
     50 |_chkstk|sub|sub|sub|sub
     49 |j_free
     48 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     47 |sub|sub|_CxxThrowException(x,x)
     47 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     45 |sub|sub|sub|sub|eax
     44 |strchr|strchr
     44 |malloc|sub|sub|free
     43 |dword ptr [ecx+8]
     42 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     40 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     40 |sub|_Split(void)|_wmemmove|sub|_Eos(uint)|_Split(void)|_Tidy(bool)|sub
     40 |operator delete(void *)|operator delete(void *)
     40 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub
     40 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     40 |ReadFile|_memcpy_0
     39 |sub|_CxxThrowException
     39 |GetModuleFileNameA|GetShortPathNameA|GetEnvironmentVariableA|lstrcpyA|lstrcatA|lstrcatA|GetCurrentProcess|SetPriorityClass|GetCurrentThread|SetThreadPriority|ShellExecuteExA|SetPriorityClass|SetProcessPriorityBoost|SHChangeNotify|GetCurrentProcess|SetPriorityClass|GetCurrentThread|SetThreadPriority
     38 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     38 |_chkstk|sub|sub|sub|sub|sub|sub|sub
     37 |GetCurrentProcess|OpenProcessToken|LookupPrivilegeValueA|AdjustTokenPrivileges|CloseHandle|GetLastError
     36 |sub|sub|dword ptr [eax]|sub|sub|sub
     36 |sub|ecx
     36 |dword ptr [ecx+4]
     36 |_memset|sub|__security_check_cookie(x)
     35 |sub|sub|__security_check_cookie
     35 |sub|operator delete(void *)|operator delete(void *)|operator delete(void *)|operator delete(void *)
     35 |__invalid_parameter_noinfo
     34 |operator new(uint)
     34 |_free
     34 |_LocaleUpdate(localeinfo_struct *)|___strgtold12_l|sub|__security_check_cookie(x)
     33 |sub|sub|eax|sub
     33 |sub|operator delete(void *)|operator delete(void *)
     33 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     33 |__errno|__invalid_parameter
     32 |operator delete(void *)|operator new(uint)
     32 |memset
     31 |operator new(uint)|sub
     31 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     30 |eax|sub|sub|sub|sub
     30 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     30 |__EH_prolog|_Tidy(bool)|_strlen|sub|sub|_CxxThrowException(x,x)
     30 |SetServiceStatus
     28 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     27 |sub|_Split(void)|_memcpy|sub|_Eos(uint)|_Split(void)|_Tidy(bool)|sub
     27 |strlen|sub
     27 |memcpy
     27 |_strcmpi|memset|memset|CreateToolhelp32Snapshot|Process32First|sprintf|strcat|Process32Next|CloseHandle|_strcmpi|OpenSCManagerA|EnumServicesStatusExA|operator new(uint)|CloseServiceHandle|strcat|EnumServicesStatusExA|sprintf|strcat|operator delete(void *)|CloseServiceHandle|_strcmpi|GetLogicalDrives|sprintf|strcat|sprintf|strcat|lstrcatA|GetDriveTypeA|strcat|GetVolumeInformationA|strcat|strcat|sprintf|strcat
     27 |_strcmpi|atoi|OpenProcess|TerminateProcess|CloseHandle|strcat|_strcmpi|OpenSCManagerA|OpenServiceA|GetLastError|strcat|CloseServiceHandle|ControlService|GetLastError|strcat|CloseServiceHandle|CloseServiceHandle
     27 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub
     27 |GetProcAddress
     27 |GetExitCodeProcess|PeekNamedPipe|Sleep|ReadFile|CloseHandle|CloseHandle|memset|strcpy|strlen
     26 |sub|sub|sub|sub|_memcpy_s
     26 |sub|eax|sub|eax|sub
     26 |sub|_Tidy(bool)|_Tidy(bool)|sub
     26 |strstr|strchr|operator new(uint)|strchr|strchr|strchr|strchr|strchr|strchr|strchr|strchr|strchr|operator delete(void *)
     26 |strlen|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     26 |sprintf|HttpAddRequestHeadersA|HttpSendRequestA|GetLastError|InternetQueryOptionA|InternetSetOptionA|sprintf
     26 |__ld12cvt
     26 |___strgtold12|sub
     26 |__EH_prolog3|sub|sub|_CxxThrowException(x,x)
     26 |InternetOpenA|InternetSetOptionA|InternetSetOptionA|InternetSetOptionA|InternetConnectA|HttpOpenRequestA|strlen|HttpAddRequestHeadersA
     26 |$+5
     25 |rand
     25 |malloc|CreatePipe|CreatePipe|CloseHandle|CloseHandle|CloseHandle|CloseHandle|free|sub|CloseHandle|CloseHandle
     25 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     25 |__invalid_parameter_noinfo|__invalid_parameter_noinfo
     25 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     25 |URLDownloadToFileA|strcat
     24 |sub|sub|sub|sub|sub|GetProcAddress|sub|sub|sub
     24 |sub|edx|sub
     24 |sub|_Split(void)|_wmemmove|sub|_Eos(uint)|_Split(void)|sub|sub
     24 |shutdown|closesocket
     24 |send
     24 |fopen|fseek|fread|fseek|ftell|fseek|fread|fclose|fclose|fread|fclose|sub
     24 |edx
     24 |dword ptr [eax+40h]
     24 |_beginthreadex|CloseHandle
     24 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     24 |GetModuleHandleA|GetProcAddress
     23 |unknown_libname_1
     23 |sub|sub|sub|sub|operator delete(void *)
     23 |sub|OpenProcess|TerminateProcess|Sleep|CloseHandle|sub
     23 |strlen|CreateFileA|strlen|operator new(uint)|memset|WriteConsoleInputA|operator delete(void *)|CloseHandle
     23 |strcat|sub|WaitForSingleObject|strcat|strcat|strlen|sub
     23 |j_free|j_free
     23 |j_free|_CxxThrowException
     23 |LoadStringA|sub
     23 |CloseHandle
     22 |~type_info(void)|operator delete(void *)
     22 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     22 |sub|operator new(unsigned __int64)|exception(char const * const &)|_CxxThrowException|sub|sub|j_free
     22 |operator new(uint)|operator new(uint)|sub
     22 |operator new(uint)|operator delete(void *)
     22 |operator delete(void *)|operator delete(void *)|operator delete(void *)
     22 |exception(char const * const &)
     22 |eax|sub|sub|sub
     22 |GetCurrentProcess|GetCurrentProcess|DuplicateHandle|CreateProcessA|CloseHandle
     22 |CompareStringA
     22 |$+5|sub|sub
     21 |sub|_wcslen|sub|sub|sub|sub
     21 |sprintf|sprintf|sub
     21 |malloc|recv|sub|sub|_strnicmp|WriteFile|recv|free|ExitThread|SetEvent|free|ExitThread
     21 |malloc|PeekNamedPipe|ReadFile|sub|sub|_itoa|send|sub|Sleep|PeekNamedPipe|free|ExitThread
     21 |_strcmpi|memset|CreateProcessA|strcat|CloseHandle|_strcmpi|OpenSCManagerA|strcat|OpenServiceA|GetLastError|strcat|CloseServiceHandle|StartServiceA|GetLastError|strcat|CloseServiceHandle|CloseHandle
     21 |__get_sse2_info
     21 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     21 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     21 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     21 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
     21 |GetCurrentProcess|OpenProcess|GetLastError|sprintf|strcat|OpenProcessToken|memset|sprintf|CreateProcessAsUserA|strcat|CloseHandle|CloseHandle|GetLastError|sprintf|strcat|CloseHandle|GetLastError|sprintf|strcat|CloseHandle
     21 |CreateEventA|CreateEventA|sub|WaitForSingleObject|CloseHandle
     21 |$+5|sub
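
A histogram like the one above can be built with a plain counter over the flattened procedures. The sketch below is an assumption about the counting rule (one count per procedure occurrence, procedures without any calls skipped), and the names `signature_histogram` and `flattened_by_file` are made up for illustration.

```python
from collections import Counter


def signature_histogram(flattened_by_file):
    """Count how often each flattened call sequence occurs across all files.

    flattened_by_file: {filename: [signature, ...]}, one signature string
    (e.g. 'sub|sub|eax') per procedure in that file.
    """
    counts = Counter()
    for signatures in flattened_by_file.values():
        # Skip procedures that make no calls at all (empty signature).
        counts.update(sig for sig in signatures if sig)
    # Most frequent sequences first, as in the listing above.
    return counts.most_common()


if __name__ == "__main__":
    demo = {
        "a.asm": ["sub", "sub|sub", ""],
        "b.asm": ["sub", "DeleteFileW"],
    }
    for signature, count in signature_histogram(demo):
        print(f"{count:7d} |{signature}")
```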

Using these shortened procedures to generate clusters gives some promising results, e.g.:

sub
DeleteFileW
DeleteFileA

1328eaceb140a3863951d18661b097af.asm
31e5e58dbdfad05175613e795298ebb5.asm
6f9992c486195edcf0bf2f6ee6c3ec74.asm
c99fa835350aa9e2427ce69323b061a9.asm
e476e4a24f8b4ff4c8a0b260aa35fc9f.asm
ea1b44094ae4d8e2b63a1771a3e61fd5.asm
fc1937c1aa536b3744ebdfb1716fd54d.asm

LoadLibraryA
GetProcAddress
GetProcAddress
GetProcAddress

3f8682ab074a097ebbaadbf26dfff560.asm
4b19a2a6d40a5825e868c6ef25ae445e.asm
54d5d171a482278cc8eacf08d9175fd7.asm
56de2854ef64d869b5df7af5e4effe3e.asm
75dad1ccabae8adeb5bae899d0c630f8.asm
8462a62f13f92c34e4b89a7d13a185ad.asm

htons
socket
connect
closesocket

468ff2c12cffc7e5b2fe0ee6bb3b239e.asm
727a6800991eead454e53e8af164a99c.asm
bd8b082b7711bc980252f988bb0ca936.asm
db05df0498b59b42a8e493cf3c10c578.asm
e1b6940985a23e5639450f8391820655.asm

ecx
eax
dword ptr [esi+10h]
sub
ecx
eax
sub
sub
sub
sub
sub
sub
sub
sub

12f25ce81596aeb19e75cc7ef08f3a38.asm
268eef019bf65b2987e945afaf29643f.asm
468ff2c12cffc7e5b2fe0ee6bb3b239e.asm
4c6bddcca2695d6202df38708e14fc7e.asm
5a728cb9ce56763dccb32b5298d0f050.asm
727a6800991eead454e53e8af164a99c.asm
8e8622c393d7e832d39e620ead5d3b49.asm
bd8b082b7711bc980252f988bb0ca936.asm
c6a4bb1a4e4f69ec71855d70d6960859.asm
db05df0498b59b42a8e493cf3c10c578.asm
e1b6940985a23e5639450f8391820655.asm
ef8e0fb20e7228c7492ccdc59d87c690.asm

LoadLibraryA
GetProcAddress
sub
sub
strstr
strchr
GetSystemDirectoryA
time
srand
malloc
sub
sub
strncmp
Sleep
sub
Sleep
sub
Sleep
CreatePipe
CreatePipe
GetStartupInfoA
CreateProcessA
GetLastError
_snprintf
sub
CreateProcessA
CreateThread
CreateThread
WaitForMultipleObjects
GetExitCodeThread
TerminateThread
GetExitCodeThread
TerminateThread
GetExitCodeProcess
TerminateProcess
sub
sub
GetLastError
_snprintf
sub
CloseHandle
CloseHandle
CloseHandle
CloseHandle
sub
sub
Sleep
PeekNamedPipe
ReadFile
sub

0dd3677594632ce270bcf8af94819caf.asm
270d42f292105951ee81e4085ea45054.asm
523f56515221161579ee6090c962e5b1.asm
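
One simple way to arrive at buckets like the ones listed above is to invert the mapping: for every flattened procedure, collect the files that contain it, and keep only the sequences shared by several files. This is a sketch under that assumption, not necessarily how the clusters in this post were produced; `cluster_by_signature`, `flattened_by_file` and the `min_files` threshold are hypothetical.

```python
from collections import defaultdict


def cluster_by_signature(flattened_by_file, min_files=2):
    """Group files that share an identical flattened procedure.

    flattened_by_file: {filename: [signature, ...]}
    Returns {signature: sorted list of files containing it}.
    """
    files_by_signature = defaultdict(set)
    for filename, signatures in flattened_by_file.items():
        for signature in signatures:
            if signature:  # ignore procedures without calls
                files_by_signature[signature].add(filename)
    # Keep only the sequences seen in at least `min_files` different files.
    return {
        signature: sorted(files)
        for signature, files in files_by_signature.items()
        if len(files) >= min_files
    }
```

With this approach a file can land in more than one bucket, as 468ff2c12cffc7e5b2fe0ee6bb3b239e.asm does above. Very generic sequences (a lone `sub`, for instance) would group almost everything together, so in practice they would probably need to be filtered out first.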

Notably, the disassembled code, after some selective processing and normalization, can be treated in the same way as source code that students submit for their assessments at uni and… be checked for plagiarism. A common technique used for this purpose relies on measuring cosine similarity. I am currently playing with it and will write more about my findings in another post.
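
For reference, cosine similarity between two samples can be computed by treating each one as a bag of call tokens: sim(A, B) = (A · B) / (|A| |B|), where A and B are the token count vectors. The snippet below is a minimal sketch of that formula only; the choice of tokens (individual call names) and the function name are assumptions, and findings from real samples are left for the follow-up post.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists, using raw token counts."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    # Dot product over the tokens the two samples have in common.
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sample is treated as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    first = ["htons", "socket", "connect", "closesocket"]
    second = ["socket", "connect", "send", "closesocket"]
    print(cosine_similarity(first, second))
```

A value close to 1 means the two samples use nearly the same calls in nearly the same proportions; a value of 0 means they share no calls at all. Note that this bag-of-tokens view ignores the order of the calls.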

Thanks for reading!