HAPI – API extractor

Note

This software has been discontinued. Please use HexDive (it has all HAPI features plus lots more).

Also, check out our other tools.

Old post

In one of my previous posts (Extracting Strings from PE sections), I demonstrated (ya… right, what a big word) how easy it is to extract the sections of a PE file into separate files using 7-Zip, so that they can later be used for targeted strings analysis. As I mentioned, splitting a file into sections can be really useful, as it helps to reduce the number of random string-like non-strings we see in the output of ‘strings’-type tools. To be on the safe side, though, you may want to refer to my original post to learn more about the caveats of this approach, as there are cases where it may not be such a good idea.
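For readers who have not seen that post, here is a minimal sketch of what a ‘strings’-type tool does. It reports every run of printable ASCII or UTF-16LE characters that is at least `MIN_LEN` long (4 is an assumed, commonly used threshold). Any run of printable bytes qualifies, whether or not it is a real string, and that is where the noise comes from. Running it on one section at a time simply gives it less data to misread.

```python
import re

MIN_LEN = 4  # assumed minimum string length
# Runs of printable ASCII bytes (space through tilde).
ASCII_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)
# Runs of printable characters, each followed by a zero byte (UTF-16LE text).
UTF16_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LEN)


def extract_strings(data: bytes) -> list[str]:
    """Return ASCII strings first, then UTF-16LE strings, found in data."""
    found = [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in UTF16_RUN.finditer(data)]
    return found
```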

There are many other techniques that can help in noise reduction and I am going to demonstrate one more today.

Analyzing Portable Executable (PE) files usually kicks off with running multiple static analysis tools, including ‘strings’ and other tools that can help determine which APIs are used by a sample. One can use tools like PEDump, LordPE, PETools, Stud_PE, Dependency Walker, and lots of others that process the sample’s import/export tables and help in guessing what specific functionality is embedded in it.

Now, before we proceed further – three warnings here.

  • You should never, ever conclude your malware analysis with the output of ‘strings’ or PE parsing tools. This is the first step to shooting yourself in the foot. Always do code analysis. I will come back to this topic in a separate post in the future.
  • Ensure you actually know how these PE tools work. I know I don’t need to say this, but I once saw a person using the Dependency Walker tool and analyzing a malicious file by looking at the full list of functions exported from one of the operating system DLLs. That DLL was not even linked directly to the malware; it was referenced only by a DLL that was directly linked to the malicious .exe. In other words, sample.exe was linked to kernel32.dll, and kernel32.dll links to ntdll.dll. The guy was looking at the pane listing all functions exported by ntdll.dll. And while he was right that ntdll.dll does contain a lot of APIs typically used by malware, he was completely off track! Oh, boy…
  • Obviously, APIs can often be found outside the import table, since many packers, protectors, and wrappers move them from import tables to internal data structures. They are often visible only when the memory of the protected process is dumped to a file; thus, none of the typical PE parsing tools can ‘see’ them.
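To make the Dependency Walker warning concrete, here is a toy sketch that separates a sample’s direct imports from everything pulled in transitively. The `IMPORTS` map and the helper names are hypothetical. The map mirrors the simplified chain from the text (sample.exe → kernel32.dll → ntdll.dll); a real tool would read it from each module’s import table.

```python
from collections import deque

# Hypothetical import map: module -> DLLs it imports directly.
IMPORTS = {
    "sample.exe": ["kernel32.dll"],
    "kernel32.dll": ["ntdll.dll"],
    "ntdll.dll": [],
}


def direct_deps(module: str) -> set[str]:
    """DLLs listed in the module's own import table."""
    return set(IMPORTS.get(module, []))


def all_deps(module: str) -> set[str]:
    """Breadth-first walk over the whole dependency tree, as a tree view shows it."""
    seen, queue = set(), deque(IMPORTS.get(module, []))
    while queue:
        dll = queue.popleft()
        if dll not in seen:
            seen.add(dll)
            queue.extend(IMPORTS.get(dll, []))
    return seen
```

Only `direct_deps("sample.exe")` says anything about what the sample itself links to. ntdll.dll shows up in `all_deps` only because kernel32.dll needs it.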

So, now back to the original topic.

One simple noise reduction technique that is well known and used by many analysts is based on lists of patterns. These can be keywords, ANSI or Unicode strings, regular expressions and, practically speaking, any string of bytes that is unique and can help identify interesting stuff inside the samples. This technique is used to some extent by projects like Yara and PEiD, and of course, it is used extensively by antivirus and IDS software. Having a good pattern list that identifies a certain class of artifacts inside a file is a very attractive idea, and I must confess that I have been using such lists myself for a number of years.
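A single API name can appear in a binary in more than one encoding, so a pattern list usually has to cover both forms. The following is a minimal sketch that assumes plain ASCII names, so the ANSI form is just the ASCII bytes. It treats ‘Unicode’ as UTF-16LE, the encoding Windows uses for wide strings. The helper names are hypothetical.

```python
def encodings_of(name: str) -> dict[str, bytes]:
    """ANSI (ASCII) and Unicode (UTF-16LE) byte forms of an API name."""
    return {"ansi": name.encode("ascii"), "unicode": name.encode("utf-16-le")}


def find_pattern(data: bytes, name: str) -> list[tuple[int, str]]:
    """Return sorted (offset, encoding) pairs for every occurrence of name."""
    hits = []
    for label, raw in encodings_of(name).items():
        start = data.find(raw)
        while start != -1:
            hits.append((start, label))
            start = data.find(raw, start + 1)
    return sorted(hits)
```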

After thinking one day about how to improve the typical ‘strings’ analysis process, I cooked up a little program that focuses on one class of such patterns: APIs.

First, I built a list of over 50,000 clean API names, including:

  • Windows APIs
  • native APIs
  • kernel-mode APIs

All of these are exported or imported by native Windows programs, drivers and DLLs. I combined them into one large list. I then created a program that uses this list and searches for all of these names inside the analyzed binary (note again: most of the time I run it on memory dumps, since many malicious samples come protected).

Yup. It’s that simple.

Now, you may be asking yourself: searching for 10-15 strings with a naive method (i.e. walking through the whole data 10-15 times, once per string, or even using a single regular expression) works well, but for 50,000 or more strings we probably need to do better.

You are right.
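For comparison, here is a minimal sketch of the naive approach, which makes one full pass over the data for every pattern. It works, but the total work grows with the number of patterns times the size of the data, and that is what hurts at 50,000 names. `naive_search` is a hypothetical helper name.

```python
def naive_search(data: bytes, patterns: list[bytes]) -> dict[bytes, list[int]]:
    """Return {pattern: [offsets]}, scanning the whole buffer once per pattern."""
    hits = {}
    for p in patterns:                  # N patterns -> N full passes over data
        offsets = []
        i = data.find(p)
        while i != -1:
            offsets.append(i)
            i = data.find(p, i + 1)     # i + 1 so overlapping matches are kept
        if offsets:
            hits[p] = offsets
    return hits


if __name__ == "__main__":
    sample = b"..LoadLibraryA..GetProcAddress..LoadLibraryA"
    print(naive_search(sample, [b"LoadLibraryA", b"GetProcAddress", b"VirtualAlloc"]))
```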

This is a non-trivial problem, and a naive algorithm doesn’t scale here. Luckily, there are smart people out there who have already figured it out. I looked around and researched various multi-pattern search algorithms, eventually deciding to use a very well-known one: Aho-Corasick. It uses a very clever method of finding patterns. It advances through a trie (extended with so-called failure links) every time a new character is read from the input, so it can search for a large set of patterns simultaneously, in a single pass over the data (well, it’s more complicated than that, but let’s just say it is very fast, even for 50k patterns).
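Below is a compact textbook Aho-Corasick in Python, offered as a sketch rather than HAPI’s actual implementation. `build_automaton` adds every pattern to a trie and then computes failure links breadth-first. `search` feeds the input one byte at a time, follows failure links on a mismatch, and reports every pattern that ends at the current position, overlapping ones included. The function names are hypothetical, and patterns are assumed to be non-empty byte strings.

```python
from collections import deque


def build_automaton(patterns):
    """Build the trie plus failure links for a list of non-empty byte patterns."""
    goto = [{}]  # goto[state][byte] -> next state; state 0 is the root
    fail = [0]   # fail[state] -> state of the longest proper suffix that is a trie prefix
    out = [[]]   # out[state] -> patterns ending at this state
    for p in patterns:
        state = 0
        for b in p:
            if b not in goto[state]:
                goto[state][b] = len(goto)
                goto.append({})
                fail.append(0)
                out.append([])
            state = goto[state][b]
        out[state].append(p)

    # Breadth-first pass: depth-1 nodes keep fail = 0 (the root); deeper
    # nodes derive their link from the parent's link.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for b, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and b not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(b, 0)
            out[t] = out[t] + out[fail[t]]  # inherit matches ending in the suffix
    return goto, fail, out


def search(data, automaton):
    """Scan data once; return (offset, pattern) for every match, overlaps included."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, b in enumerate(data):
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        for p in out[state]:
            hits.append((i - len(p) + 1, p))
    return hits


if __name__ == "__main__":
    ac = build_automaton([b"CreateProcessW", b"Process", b"Process32Next"])
    for offset, name in search(b"\x00CreateProcessW\x00Process32Next\x00", ac):
        print(offset, name.decode())
```

The data is read exactly once, however many patterns are loaded. Note that a short name like `Process` is also reported inside `CreateProcessW`, which is worth keeping in mind when reading this kind of output.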

Since building the search trie that the Aho-Corasick algorithm relies on takes quite some time, I precompiled it and included it directly in the executable. So, here it is: a simple tool that extracts known API names from a given binary.
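Precompiling boils down to doing the expensive construction once and storing the result. Here is a minimal Python analogue that assumes `pickle` as the storage format. HAPI embedded its trie in the executable, but how exactly is not described. The sketch builds the trie tables offline, writes them to a file, and loads them at startup. Failure links would be stored the same way. The helper and file names are hypothetical. Only load pickles you created yourself, since unpickling untrusted data can execute code.

```python
import os
import pickle
import tempfile


def build_trie(patterns):
    """Goto table and per-node outputs (the slow part worth precomputing)."""
    goto, out = [{}], [[]]
    for p in patterns:
        state = 0
        for b in p:
            if b not in goto[state]:
                goto[state][b] = len(goto)
                goto.append({})
                out.append([])
            state = goto[state][b]
        out[state].append(p)
    return goto, out


def save(tables, path):
    with open(path, "wb") as f:
        pickle.dump(tables, f, protocol=pickle.HIGHEST_PROTOCOL)


def load(path):
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    tables = build_trie([b"LoadLibraryA", b"LoadLibraryW", b"GetProcAddress"])
    path = os.path.join(tempfile.mkdtemp(), "api_trie.pkl")  # hypothetical file name
    save(tables, path)
    print(len(load(path)[0]), "trie nodes loaded from", path)
```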

I hope you will find it useful.

Usage:

hapi <filename>


Example

Used on a random malicious sample, it produces the following results:

————————————————————–
HAPI v0.1 (c) Hexacorn 2012. All rights reserved.
Visit us at https://www.hexacorn.com
————————————————————–
DnsQuery_A
DnsRecordListFree
EnumDeviceDrivers
GetDeviceDriverBaseNameA
UuidToStringW
SRRemoveRestorePoint
SRSetRestorePointA
ConvertStringSidToSidA
GetAdaptersInfo
IsUserAdmin
InternetOpenUrlA
HttpOpenRequestA
InternetCloseHandle
InternetConnectA
InternetOpenA
InternetSetOptionA
InternetQueryOptionA
HttpQueryInfoA
HttpSendRequestA
InternetReadFile
HttpAddRequestHeadersA
memmove
memcmp
_itoa
malloc
free
memset
wcstombs
strtok
mbstowcs
strlen
_itow
srand
rand
memcpy
wcsrchr
tolower
towlower
atoi
strcpy
__dllonexit
_onexit
_XcptFilter
_initterm
_amsg_exit
exit
_adjust_fdiv
lstrlenA
lstrcpyA
lstrcatA
CreateFileA
DeviceIoControl
CloseHandle
GetVersionExA
CreateFileW
WriteFile
FlushFileBuffers
GetFileSize
VirtualAlloc
ReadFile
VirtualFree
CreateThread
GetModuleFileNameW
lstrcpyW
lstrlenW
OpenMutexW
WaitForSingleObject
WaitForMultipleObjects
GetExitCodeThread
SetFilePointer
SetEndOfFile
CreateMutexW
ReleaseMutex
GetModuleFileNameA
DisableThreadLibraryCalls
ExitProcess
LoadLibraryW
Sleep
GetLastError
InitializeCriticalSection
DeleteCriticalSection
EnterCriticalSection
lstrcatW
LeaveCriticalSection
GetCurrentThreadId
TerminateThread
GetSystemTimeAsFileTime
GetProcAddress
GetModuleHandleA
OpenProcess
RaiseException
VirtualAllocEx
WriteProcessMemory
CreateRemoteThread
VirtualFreeEx
CreateToolhelp32Snapshot
Process32First
lstrcmpiA
Process32Next
GetCurrentProcess
FreeLibrary
LoadLibraryA
lstrcmpiW
GetWindowsDirectoryA
GetVolumeInformationA
GetSystemTime
SystemTimeToFileTime
GetTickCount
GetLogicalDriveStringsW
GetDriveTypeW
DeleteFileW
CreateDirectoryW
LocalFree
CreateProcessW
OpenMutexA
OpenEventA
GetCurrentThread
SetFileTime
CreateEventW
TerminateProcess
DeleteFileA
WideCharToMultiByte
HeapAlloc
GetProcessHeap
HeapFree
SetFileAttributesW
InterlockedIncrement
InterlockedDecrement
GetVersion
InterlockedExchange
InterlockedCompareExchange
RtlUnwind
QueryPerformanceCounter
GetCurrentProcessId
UnhandledExceptionFilter
SetUnhandledExceptionFilter
CallNextHookEx
SetWindowsHookExA
PostMessageA
wsprintfA
CharUpperW
GetSystemMetrics
RegQueryValueExW
RegSetValueExW
RegFlushKey
RegCloseKey
RegOpenKeyExW
OpenProcessToken
LookupPrivilegeValueA
AdjustTokenPrivileges
GetTokenInformation
RegCreateKeyExW
SetEntriesInAclA
SetSecurityInfo
DuplicateTokenEx
OpenSCManagerA
OpenServiceA
ControlService
ChangeServiceConfigA
AllocateAndInitializeSid
CheckTokenMembership
FreeSid
InitializeSecurityDescriptor
SetSecurityDescriptorDacl
SetTokenInformation
GetLengthSid
SetThreadToken
IsValidSid
ConvertSidToStringSidW
RegDeleteValueW
RegQueryValueW
RegQueryInfoKeyW
RegEnumKeyExW
RegEnumValueW
RegDeleteKeyW
CloseServiceHandle
QueryServiceConfigA
QueryServiceStatusEx
StartServiceA
SHGetFolderPathW
SHGetFolderPathA
CoCreateInstance
CoInitialize
CoUninitialize
CoCreateGuid
CoTaskMemFree
_except_handler3
_local_unwind2
_CxxThrowException
DllCanUnloadNow
DllGetClassObject

Okay, it’s not random. It’s the same one I used in my Anti-forensics – live examples post 🙂