Part 1, Part 2, Part 3
In the last three posts I talked about batch analysis, clustering, and applying these techniques to an APT sample set.
Batch processing is a necessary step for retrieving ‘clusterable’ data from samples in an automated fashion.
Clustering is a way of putting these samples into buckets, potentially grouping them into families.
I want to see whether it is possible to cluster these samples reliably without relying on any assumptions or prior knowledge (e.g. from the white paper or other blogs). It is an interesting experiment and I am curious whether I will get anywhere close to the already known clusters. Quite frankly, I don’t know yet. We shall see.
So far my clustering has focused on dynamic analysis and, to a lesser extent, on source code analysis. In this post I will take code analysis further, this time focusing on the disassembled .asm files generated, as usual, by IDA Pro.
The resulting assembly code is easy to parse because each line contains a single instruction. This makes it possible to split the code into blocks on procedure boundaries and, by recording every call to an API or to another subroutine (including calls via registers), to extract a simplified version of each procedure, e.g.:
sub_401000 proc near ; CODE XREF: _main+20Ap
[...]
lea ecx, [esp+310h+szLongPath]
push 104h ; nSize
push ecx ; lpFilename
push 0 ; hModule
call ds:GetModuleFileNameA
lea edx, [esp+310h+szLongPath]
push 104h ; cchBuffer
lea eax, [esp+314h+szLongPath]
push edx ; lpszShortPath
push eax ; lpszLongPath
call ds:GetShortPathNameA
lea ecx, [esp+310h+Parameters]
push offset String2 ; "/c del "
push ecx ; lpString1
call ds:lstrcpyA
mov esi, ds:lstrcatA
lea edx, [esp+310h+szLongPath]
lea eax, [esp+310h+Parameters]
push edx ; lpString2
push eax ; lpString1
call esi ; lstrcatA
lea ecx, [esp+310h+Parameters]
push offset s->>>nul ; " >>NUL"
push ecx ; lpString1
call esi ; lstrcatA
mov esi, ds:ShellExecuteA
push 0 ; nShowCmd
push offset Directory ; lpDirectory
lea edx, [esp+318h+File]
push offset Parameters ; "/c del wuauclt.exe"
push edx ; lpFile
push offset Operation ; "open"
push 0 ; hwnd
call esi ; ShellExecuteA
push 0 ; nShowCmd
push offset Directory ; lpDirectory
lea eax, [esp+318h+File]
push offset s->CDelSvchost_exe ; "/c del svchost.exe"
push eax ; lpFile
push offset Operation ; "open"
push 0 ; hwnd
call esi ; ShellExecuteA
[...]
retn
sub_401000 endp
becomes
GetModuleFileNameA
GetShortPathNameA
lstrcpyA
lstrcatA
lstrcatA
ShellExecuteA
ShellExecuteA
ShellExecuteA
and can be written as a single line:
GetModuleFileNameA|GetShortPathNameA|lstrcpyA|lstrcatA|lstrcatA|ShellExecuteA|ShellExecuteA|ShellExecuteA
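A minimal Python sketch of this extraction step, assuming IDA's default .asm output where each procedure is delimited by a "name proc" line and a "name endp" line. The function names (extract_procedures, normalize_target, flatten) and the normalization rules are hypothetical and not necessarily what the original scripts did: "call ds:X" becomes X, unnamed sub_XXXXXX targets collapse to "sub", an indirect call that IDA has annotated with a comment (call esi ; lstrcatA) takes the comment, and anything else (e.g. eax) is kept verbatim.

```python
import re

# IDA marks procedure boundaries with "<name> proc ..." and "<name> endp".
PROC_START = re.compile(r"^\s*(\S+)\s+proc\b")
PROC_END = re.compile(r"^\s*(\S+)\s+endp\b")
# A call instruction: group 1 is the operand, group 2 the optional IDA comment.
CALL = re.compile(r"^\s*call\s+([^;]+?)\s*(?:;\s*(.*))?$")


def normalize_target(operand, comment):
    """Reduce a call operand to a short token (hypothetical normalization rules)."""
    operand = operand.strip()
    if operand.startswith("ds:"):      # call ds:GetModuleFileNameA -> imported API
        return operand[3:]
    if operand.startswith("sub_"):     # call sub_401200 -> unnamed local subroutine
        return "sub"
    if comment:                        # call esi ; lstrcatA -> use IDA's annotation
        return comment.strip()
    return operand                     # e.g. "eax" or "dword ptr [ecx+8]"


def extract_procedures(lines):
    """Return {procedure name: [call tokens]} for one IDA .asm listing."""
    procs, current, calls = {}, None, []
    for line in lines:
        if current is None:
            start = PROC_START.match(line)
            if start:
                current, calls = start.group(1), []
            continue
        if PROC_END.match(line):
            procs[current] = calls
            current = None
            continue
        call = CALL.match(line)
        if call:
            calls.append(normalize_target(call.group(1), call.group(2)))
    return procs


def flatten(calls):
    """Write one procedure's call sequence on a single line."""
    return "|".join(calls)


listing = """sub_401000 proc near ; CODE XREF: _main+20Ap
push 104h ; nSize
call ds:GetModuleFileNameA
call ds:GetShortPathNameA
call ds:lstrcpyA
mov esi, ds:lstrcatA
call esi ; lstrcatA
call esi ; lstrcatA
mov esi, ds:ShellExecuteA
call esi ; ShellExecuteA
call esi ; ShellExecuteA
call sub_401200
call eax
retn
sub_401000 endp""".splitlines()

for name, calls in extract_procedures(listing).items():
    print(name, flatten(calls))
# prints: sub_401000 GetModuleFileNameA|GetShortPathNameA|lstrcpyA|lstrcatA|lstrcatA|ShellExecuteA|ShellExecuteA|sub|eax
```

Relying on IDA's comments for register calls is what turns "call esi" back into lstrcatA; without them such calls would only show up as register names.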
Applying this method to every procedure of every disassembled program, I eventually obtained a shortened, flattened version of each sample's code. I then built a histogram of the most common call sequences across all procedures from all files and got the following stats (occurrence count followed by the sequence; 'sub' denotes a call to a local sub_XXXXXX routine and a register name such as 'eax' an indirect call via that register):
5514 |sub
2507 |sub|sub
1332 |sub|sub|sub
860 |sub|sub|sub|sub
558 |__security_check_cookie(x)
479 |__security_check_cookie(x)|__security_check_cookie(x)
475 |sub|sub|sub|sub|sub
392 |sub|sub|sub|sub|sub|sub
353 |operator delete(void *)
276 |sub|operator delete(void *)
269 |sub|sub|sub|sub|sub|sub|sub
235 |sub|sub|sub|sub|sub|sub|sub|sub
185 |sub|sub|sub|sub|sub|sub|sub|sub|sub
168 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
165 |__alloca_probe|sub|sub
137 |eax
132 |sub|sub|ecx
132 |__alloca_probe|sub
130 |_atexit
123 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
110 |_chkstk|sub|sub
108 |strlen|operator delete(void *)|operator new(uint)|strcpy
106 |nullsub
106 |__alloca_probe
101 |_chkstk|sub
97 |eax|sub
92 |__alloca_probe|sub|sub|sub|sub
91 |__alloca_probe|sub|sub|sub
88 |_chkstk|sub|sub|sub
88 |__alloca_probe|sub|sub|sub|sub|sub|sub
85 |__alloca_probe|sub|sub|sub|sub|sub
80 |exception const &)
75 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
73 |strlen
73 |_chkstk|sub|sub|sub|sub|sub
72 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
71 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
71 |_Tidy(bool,uint)
69 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
68 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
68 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
68 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
68 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub
68 |InternetCloseHandle|InternetCloseHandle|InternetCloseHandle
67 |sub|eax
63 |_chkstk|sub|sub|sub|sub|sub|sub
62 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
62 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub
61 |free
60 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
60 |allocator<char>>(char const *)|_atexit
59 |sub|_CxxThrowException(x,x)
56 |_CxxThrowException
56 |InternetReadFile
55 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
55 |_chkstk
55 |SetUnhandledExceptionFilter
52 |operator new(uint)|exception(char const * const &)|_CxxThrowException(x,x)
52 |operator delete(void *)|_CxxThrowException(x,x)
52 |_flsall
51 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
51 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub
50 |_chkstk|sub|sub|sub|sub
49 |j_free
48 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
47 |sub|sub|_CxxThrowException(x,x)
47 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
45 |sub|sub|sub|sub|eax
44 |strchr|strchr
44 |malloc|sub|sub|free
43 |dword ptr [ecx+8]
42 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
40 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
40 |sub|_Split(void)|_wmemmove|sub|_Eos(uint)|_Split(void)|_Tidy(bool)|sub
40 |operator delete(void *)|operator delete(void *)
40 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub
40 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
40 |ReadFile|_memcpy_0
39 |sub|_CxxThrowException
39 |GetModuleFileNameA|GetShortPathNameA|GetEnvironmentVariableA|lstrcpyA|lstrcatA|lstrcatA|GetCurrentProcess|SetPriorityClass|GetCurrentThread|SetThreadPriority|ShellExecuteExA|SetPriorityClass|SetProcessPriorityBoost|SHChangeNotify|GetCurrentProcess|SetPriorityClass|GetCurrentThread|SetThreadPriority
38 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
38 |_chkstk|sub|sub|sub|sub|sub|sub|sub
37 |GetCurrentProcess|OpenProcessToken|LookupPrivilegeValueA|AdjustTokenPrivileges|CloseHandle|GetLastError
36 |sub|sub|dword ptr [eax]|sub|sub|sub
36 |sub|ecx
36 |dword ptr [ecx+4]
36 |_memset|sub|__security_check_cookie(x)
35 |sub|sub|__security_check_cookie
35 |sub|operator delete(void *)|operator delete(void *)|operator delete(void *)|operator delete(void *)
35 |__invalid_parameter_noinfo
34 |operator new(uint)
34 |_free
34 |_LocaleUpdate(localeinfo_struct *)|___strgtold12_l|sub|__security_check_cookie(x)
33 |sub|sub|eax|sub
33 |sub|operator delete(void *)|operator delete(void *)
33 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
33 |__errno|__invalid_parameter
32 |operator delete(void *)|operator new(uint)
32 |memset
31 |operator new(uint)|sub
31 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
30 |eax|sub|sub|sub|sub
30 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
30 |__EH_prolog|_Tidy(bool)|_strlen|sub|sub|_CxxThrowException(x,x)
30 |SetServiceStatus
28 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
27 |sub|_Split(void)|_memcpy|sub|_Eos(uint)|_Split(void)|_Tidy(bool)|sub
27 |strlen|sub
27 |memcpy
27 |_strcmpi|memset|memset|CreateToolhelp32Snapshot|Process32First|sprintf|strcat|Process32Next|CloseHandle|_strcmpi|OpenSCManagerA|EnumServicesStatusExA|operator new(uint)|CloseServiceHandle|strcat|EnumServicesStatusExA|sprintf|strcat|operator delete(void *)|CloseServiceHandle|_strcmpi|GetLogicalDrives|sprintf|strcat|sprintf|strcat|lstrcatA|GetDriveTypeA|strcat|GetVolumeInformationA|strcat|strcat|sprintf|strcat
27 |_strcmpi|atoi|OpenProcess|TerminateProcess|CloseHandle|strcat|_strcmpi|OpenSCManagerA|OpenServiceA|GetLastError|strcat|CloseServiceHandle|ControlService|GetLastError|strcat|CloseServiceHandle|CloseServiceHandle
27 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub
27 |GetProcAddress
27 |GetExitCodeProcess|PeekNamedPipe|Sleep|ReadFile|CloseHandle|CloseHandle|memset|strcpy|strlen
26 |sub|sub|sub|sub|_memcpy_s
26 |sub|eax|sub|eax|sub
26 |sub|_Tidy(bool)|_Tidy(bool)|sub
26 |strstr|strchr|operator new(uint)|strchr|strchr|strchr|strchr|strchr|strchr|strchr|strchr|strchr|operator delete(void *)
26 |strlen|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
26 |sprintf|HttpAddRequestHeadersA|HttpSendRequestA|GetLastError|InternetQueryOptionA|InternetSetOptionA|sprintf
26 |__ld12cvt
26 |___strgtold12|sub
26 |__EH_prolog3|sub|sub|_CxxThrowException(x,x)
26 |InternetOpenA|InternetSetOptionA|InternetSetOptionA|InternetSetOptionA|InternetConnectA|HttpOpenRequestA|strlen|HttpAddRequestHeadersA
26 |$+5
25 |rand
25 |malloc|CreatePipe|CreatePipe|CloseHandle|CloseHandle|CloseHandle|CloseHandle|free|sub|CloseHandle|CloseHandle
25 |_chkstk|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
25 |__invalid_parameter_noinfo|__invalid_parameter_noinfo
25 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
25 |URLDownloadToFileA|strcat
24 |sub|sub|sub|sub|sub|GetProcAddress|sub|sub|sub
24 |sub|edx|sub
24 |sub|_Split(void)|_wmemmove|sub|_Eos(uint)|_Split(void)|sub|sub
24 |shutdown|closesocket
24 |send
24 |fopen|fseek|fread|fseek|ftell|fseek|fread|fclose|fclose|fread|fclose|sub
24 |edx
24 |dword ptr [eax+40h]
24 |_beginthreadex|CloseHandle
24 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
24 |GetModuleHandleA|GetProcAddress
23 |unknown_libname_1
23 |sub|sub|sub|sub|operator delete(void *)
23 |sub|OpenProcess|TerminateProcess|Sleep|CloseHandle|sub
23 |strlen|CreateFileA|strlen|operator new(uint)|memset|WriteConsoleInputA|operator delete(void *)|CloseHandle
23 |strcat|sub|WaitForSingleObject|strcat|strcat|strlen|sub
23 |j_free|j_free
23 |j_free|_CxxThrowException
23 |LoadStringA|sub
23 |CloseHandle
22 |~type_info(void)|operator delete(void *)
22 |sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
22 |sub|operator new(unsigned __int64)|exception(char const * const &)|_CxxThrowException|sub|sub|j_free
22 |operator new(uint)|operator new(uint)|sub
22 |operator new(uint)|operator delete(void *)
22 |operator delete(void *)|operator delete(void *)|operator delete(void *)
22 |exception(char const * const &)
22 |eax|sub|sub|sub
22 |GetCurrentProcess|GetCurrentProcess|DuplicateHandle|CreateProcessA|CloseHandle
22 |CompareStringA
22 |$+5|sub|sub
21 |sub|_wcslen|sub|sub|sub|sub
21 |sprintf|sprintf|sub
21 |malloc|recv|sub|sub|_strnicmp|WriteFile|recv|free|ExitThread|SetEvent|free|ExitThread
21 |malloc|PeekNamedPipe|ReadFile|sub|sub|_itoa|send|sub|Sleep|PeekNamedPipe|free|ExitThread
21 |_strcmpi|memset|CreateProcessA|strcat|CloseHandle|_strcmpi|OpenSCManagerA|strcat|OpenServiceA|GetLastError|strcat|CloseServiceHandle|StartServiceA|GetLastError|strcat|CloseServiceHandle|CloseHandle
21 |__get_sse2_info
21 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
21 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
21 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
21 |__alloca_probe|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub|sub
21 |GetCurrentProcess|OpenProcess|GetLastError|sprintf|strcat|OpenProcessToken|memset|sprintf|CreateProcessAsUserA|strcat|CloseHandle|CloseHandle|GetLastError|sprintf|strcat|CloseHandle|GetLastError|sprintf|strcat|CloseHandle
21 |CreateEventA|CreateEventA|sub|WaitForSingleObject|CloseHandle
21 |$+5|sub
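The histogram above is essentially a frequency count of flattened sequences. A Python sketch of that counting step, assuming each sample has already been reduced to a mapping of procedure name to call list; the input layout and the sequence_histogram name are hypothetical, and skipping procedures that make no calls at all is an assumption. The printed lines mimic the "count |sequence" format used above.

```python
from collections import Counter


def sequence_histogram(samples):
    """Count flattened call sequences over every procedure of every sample.

    samples: {file name: {procedure name: [call tokens]}} (hypothetical layout).
    """
    counts = Counter()
    for procs in samples.values():
        for calls in procs.values():
            if calls:  # assumption: procedures that make no calls are skipped
                counts["|".join(calls)] += 1
    return counts


samples = {
    "a.asm": {"sub_401000": ["sub"], "sub_401100": ["sub", "sub"], "sub_401200": []},
    "b.asm": {"sub_402000": ["sub"], "start": ["GetModuleHandleA", "GetProcAddress"]},
}
for seq, n in sequence_histogram(samples).most_common():
    print(n, "|" + seq)
```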
Using these shortened procedures to generate clusters gives some promising results, e.g. (each call sequence is followed by the samples that contain it):
sub|DeleteFileW|DeleteFileA
  1328eaceb140a3863951d18661b097af.asm
  31e5e58dbdfad05175613e795298ebb5.asm
  6f9992c486195edcf0bf2f6ee6c3ec74.asm
  c99fa835350aa9e2427ce69323b061a9.asm
  e476e4a24f8b4ff4c8a0b260aa35fc9f.asm
  ea1b44094ae4d8e2b63a1771a3e61fd5.asm
  fc1937c1aa536b3744ebdfb1716fd54d.asm
LoadLibraryA|GetProcAddress|GetProcAddress|GetProcAddress
  3f8682ab074a097ebbaadbf26dfff560.asm
  4b19a2a6d40a5825e868c6ef25ae445e.asm
  54d5d171a482278cc8eacf08d9175fd7.asm
  56de2854ef64d869b5df7af5e4effe3e.asm
  75dad1ccabae8adeb5bae899d0c630f8.asm
  8462a62f13f92c34e4b89a7d13a185ad.asm
htons|socket|connect|closesocket
  468ff2c12cffc7e5b2fe0ee6bb3b239e.asm
  727a6800991eead454e53e8af164a99c.asm
  bd8b082b7711bc980252f988bb0ca936.asm
  db05df0498b59b42a8e493cf3c10c578.asm
  e1b6940985a23e5639450f8391820655.asm
ecx|eax|dword ptr [esi+10h]|sub|ecx|eax|sub|sub|sub|sub|sub|sub|sub|sub
  12f25ce81596aeb19e75cc7ef08f3a38.asm
  268eef019bf65b2987e945afaf29643f.asm
  468ff2c12cffc7e5b2fe0ee6bb3b239e.asm
  4c6bddcca2695d6202df38708e14fc7e.asm
  5a728cb9ce56763dccb32b5298d0f050.asm
  727a6800991eead454e53e8af164a99c.asm
  8e8622c393d7e832d39e620ead5d3b49.asm
  bd8b082b7711bc980252f988bb0ca936.asm
  c6a4bb1a4e4f69ec71855d70d6960859.asm
  db05df0498b59b42a8e493cf3c10c578.asm
  e1b6940985a23e5639450f8391820655.asm
  ef8e0fb20e7228c7492ccdc59d87c690.asm
LoadLibraryA|GetProcAddress|sub|sub|strstr|strchr|GetSystemDirectoryA|time|srand|malloc|sub|sub|strncmp|Sleep|sub|Sleep|sub|Sleep|CreatePipe|CreatePipe|GetStartupInfoA|CreateProcessA|GetLastError|_snprintf|sub|CreateProcessA|CreateThread|CreateThread|WaitForMultipleObjects|GetExitCodeThread|TerminateThread|GetExitCodeThread|TerminateThread|GetExitCodeProcess|TerminateProcess|sub|sub|GetLastError|_snprintf|sub|CloseHandle|CloseHandle|CloseHandle|CloseHandle|sub|sub|Sleep|PeekNamedPipe|ReadFile|sub
  0dd3677594632ce270bcf8af94819caf.asm
  270d42f292105951ee81e4085ea45054.asm
  523f56515221161579ee6090c962e5b1.asm
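The exact clustering rule is not spelled out here; one simple way to get groupings of this shape is to bucket samples by identical flattened procedures. The Python sketch below does that under stated assumptions: cluster_by_sequence is a hypothetical name, min_len and min_size are hypothetical thresholds, and the filter that ignores sequences made only of 'sub' calls is an assumption (such sequences are too generic to say much about a family).

```python
from collections import defaultdict


def cluster_by_sequence(samples, min_len=3, min_size=2):
    """Group files that contain an identical flattened procedure.

    samples: {file name: {procedure name: [call tokens]}} (hypothetical layout).
    min_len / min_size are hypothetical thresholds: ignore very short
    sequences and keep only groups with at least min_size files.
    """
    buckets = defaultdict(set)
    for name, procs in samples.items():
        for calls in procs.values():
            # Sequences made only of unnamed 'sub' calls are too generic to cluster on.
            if len(calls) >= min_len and any(c != "sub" for c in calls):
                buckets["|".join(calls)].add(name)
    return {seq: sorted(files) for seq, files in buckets.items() if len(files) >= min_size}


samples = {
    "a.asm": {"p1": ["htons", "socket", "connect", "closesocket"], "p2": ["sub", "sub", "sub"]},
    "b.asm": {"p7": ["htons", "socket", "connect", "closesocket"], "p9": ["sub", "sub", "sub"]},
    "c.asm": {"p1": ["LoadLibraryA", "GetProcAddress"]},
}
for seq, files in cluster_by_sequence(samples).items():
    print(seq)
    for f in files:
        print("  " + f)
```

Because a sample can contain many procedures, the same file can land in several clusters, which is also visible in the results above.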
Notably, the disassembled code, after some selective processing and normalization, can be treated the same way as source code that students submit for assessment at university and… checked for plagiarism. One of the most common techniques used for this purpose relies on measuring cosine similarity. I am currently playing with it and will write more about my findings in another post.
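A rough Python sketch of the cosine similarity idea, under stated assumptions: each sample is turned into a bag of call n-grams (bigrams by default) taken from its flattened procedures, and two samples are compared by dot(a, b) / (|a| |b|). The feature choice and the function names (to_vector, cosine_similarity) are hypothetical; n-grams are just one option, and whole flattened procedures or single API names would work with the same formula.

```python
import math
from collections import Counter


def to_vector(procs, n=2):
    """Bag of call n-grams over all procedures of one sample (hypothetical features)."""
    vec = Counter()
    for calls in procs.values():
        for i in range(len(calls) - n + 1):
            vec["|".join(calls[i:i + n])] += 1
    return vec


def cosine_similarity(a, b):
    """cos(a, b) = a.b / (|a| |b|); 0.0 when either vector is empty."""
    dot = sum(count * b[key] for key, count in a.items())  # Counter returns 0 for missing keys
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


x = to_vector({"p1": ["GetModuleFileNameA", "GetShortPathNameA", "lstrcpyA", "lstrcatA"]})
y = to_vector({"p1": ["GetModuleFileNameA", "GetShortPathNameA", "lstrcpyA", "ShellExecuteA"]})
print(round(cosine_similarity(x, y), 3))  # prints 0.667 (2 of 3 bigrams shared)
```

Scores close to 1.0 would flag pairs of samples worth a closer look, much like near-identical student submissions.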
Thanks for reading!