Clustering and Advanced/In-depth Malware Analysis with HexDive Pro

A few months ago I introduced a new tool called HexDive. The tool speeds up the analysis of strings extracted from portable executable (PE) files. It does so by showing only those strings that are most relevant from a malware analysis perspective.
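To make the idea concrete, here is a minimal sketch of static string extraction followed by a relevance filter. It is not HexDive's actual ranking logic, which is not described here: the keyword list `INTERESTING` and the 4-character minimum are illustrative assumptions.

```python
import re

# Illustrative relevance hints (assumption): HexDive's real ranking is not public.
INTERESTING = ("http", "VirtualAlloc", "CreateRemoteThread", "\\Run", ".exe", "cmd")


def extract_ascii_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII (0x20-0x7E) at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [run.decode("ascii") for run in re.findall(pattern, data)]


def relevant_strings(data: bytes) -> list[str]:
    """Keep only strings containing one of the assumed 'interesting' substrings."""
    hints = [hint.lower() for hint in INTERESTING]
    return [s for s in extract_ascii_strings(data)
            if any(hint in s.lower() for hint in hints)]


if __name__ == "__main__":
    sample = b"\x00MZ\x90junk http://example.com/a\x00kernel32\x00VirtualAlloc\x00ab\x00"
    print(relevant_strings(sample))  # ['junk http://example.com/a', 'VirtualAlloc']
```

A plain `strings` pass would also return `kernel32`; the filter is what trims the list down to the entries an analyst is likely to care about.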

Strings extracted directly from a PE file certainly have some value, but that value is limited by many factors, including:

  • Compression (code and/or data is decompressed only when the program is executed)
  • Encryption (code and/or data is decrypted only when the program is executed)
  • Obfuscation (code and/or data is hidden among a lot of junk code and data)
  • Wrapping (code and/or data is hidden deep inside the file and ‘unwrapped’ only when the program is executed)
  • Dynamic code loading (code injects and shellcode that may be hidden using the techniques described above)
  • The environment (code and/or data is not part of the malware itself, but is extracted from the system on which it is executed)
  • The nature of run-time (the code and data seen depend on the environment and on the code branches taken inside the malware)
  • Anti-analysis tricks (what we see depends heavily on the malware’s ability to detect that it is running inside a sandbox or under monitoring tools, e.g. a debugger)
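The encryption point is easy to demonstrate. In this sketch the single-byte XOR key `0xA5` and the URL are made-up examples: XOR with a key whose high bit is set pushes every printable byte above 0x7F, so a static strings pass over the encrypted blob finds nothing, while the same pass over the decrypted run-time buffer recovers the string.

```python
import re

PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def xor_bytes(data: bytes, key: int) -> bytes:
    """Single-byte XOR; applying it twice with the same key restores the input."""
    return bytes(b ^ key for b in data)


plain = b"http://c2.example/gate.php"  # hypothetical C2 URL
on_disk = xor_bytes(plain, 0xA5)       # what a static scan of the file sees (assumed key)
in_memory = xor_bytes(on_disk, 0xA5)   # what exists only after run-time decryption

print(PRINTABLE_RUN.findall(on_disk))    # []
print(PRINTABLE_RUN.findall(in_memory))  # [b'http://c2.example/gate.php']
```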

To address this, HexDive Pro takes analysis to the next level and allows you to extract many run-time artifacts produced by a running program.

This includes:

  • API calls and their parameters
  • Hex dumps and strings extracted from buffers allocated during run-time (including the stack)
  • Code Injects and shellcodes
  • Wrapped code
  • Screenshots of all windows
  • Very specific features of the malware that can help to uniquely identify it
  • and it can do a few other things that I will keep secret for now, but will reveal in future posts 🙂

To demonstrate what HexDive Pro can do, all I have to do is point to what I have posted over the last few months.

In fact, most of the clustering, batch analysis and malware analysis posts were heavily influenced by results provided by HexDive Pro. The results the tool has provided so far helped me to:

  • … discover the hidden code inside ZeroAccess
  • … cluster ZeroAccess samples I have in my collection to find out which contain code using Extended Attributes (NTFS) and to create a list of all known EA names used by this malware
  • … cluster the APT sample set in many ways
  • … instantly discover strings in Flame malware
  • and others, more or less influenced by it (including various statistics)

The results of these experiments helped me a lot in tweaking the code so that it is as useful as possible.

On the surface, HexDive Pro works like a typical API monitor: it runs malware under its control and uses various tricks to intercept traces of its execution. Going deeper, it combines the best pieces of Application Monitor, HexDive, HMFT and Hstrings, and also leverages information from numerous databases of artifacts (both static and dynamic) that I have gathered over years of malware analysis.

All of these combined efforts produce a tool that makes it possible to gain in-depth knowledge about the analyzed malware within 30 to 180 seconds.

In fact, the APT1 clustering data I posted here was generated quite quickly using HexDive Pro. The results posted were just the tip of the iceberg: the output contained all the juice one can otherwise extract manually only after hours of painstaking analysis. Multiply that by the number of samples and the performance gain is tremendous.

Anyone who does malware analysis professionally knows how tedious in-depth analysis can be. Anyone who doesn’t is forced to rely on write-ups by antivirus companies, help from peers, and search engines.

With HexDive Pro you will often be able to learn more about a piece of malware than you can read online, and you will also be able to verify what you read in AV write-ups. On occasion, the tool will also fail miserably, which could mean that you have stumbled upon a new trick to inject code, a new trick to escape tracing, or a new 0day that helps the malware run free. Or there may be a bug.

Such is the life of software like this 🙂

Last, but not least, the intended audience for the tool is:

  • Forensic investigators who don’t have malware analysis skills.
  • Beginners and intermediate level malware analysts.
  • Anyone who wants to do batch analysis and clustering of their sample sets.
  • Anyone who wants to analyze not only malware, but any Windows software (32-bit only); the tool provides an in-depth look into the internal workings of software applications and may be useful in security/vulnerability assessments.
  • Hardcore malware analysts may benefit from the tool as well, but they probably already have adequate or better private tools of their own.

I have tested it extensively. Since it is a private tool that evolved from a few API monitors I wrote in the past, many other tools and scripts I have written, and my own experience doing in-depth malware analysis, I hope it will be useful to the community.

The first version is coming soon. Stay tuned!

Note: The software will be available commercially only.

Some more examples

The following artifacts are extracted instantly:

  • List of APIs resolved during run-time:
    • Gets Procedure Address: WS2_32.dll, accept
    • Gets Procedure Address: WS2_32.dll, bind
    • Gets Procedure Address: WS2_32.dll, closesocket
    • Gets Procedure Address: WS2_32.dll, connect
    • Gets Procedure Address: WS2_32.dll, getpeername
    • Gets Procedure Address: WS2_32.dll, getsockname
    • Gets Procedure Address: WS2_32.dll, getsockopt
  • User agents used by malware
  • Information about stealing capabilities of malware (e.g. targeted applications)
  • Files that malware tries to find on the system (e.g. to actually run)
  • Various tricks to escape analysis/HIPS
  • Various tricks to detect monitoring tools
  • Access to physical devices (memory, drives), usually to bypass HIPS and infect the MBR
  • Buffers (read/written files, read/written memory, etc.)
Injected/wrapped .exe
4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 - MZ..............
B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 - ........@.......
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 - ................
00 00 00 00 00 00 00 00 00 00 00 00 E0 00 00 00 - ................
0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68 - ........!..L.!Th
69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F - is program canno
74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20 - t be run in DOS
6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00 - mode....$.......
D7 52 82 ED 93 33 EC BE 93 33 EC BE 93 33 EC BE - .R...3...3...3..
10 3B B1 BE 94 33 EC BE 93 33 ED BE 8A 33 EC BE - .;...3...3...3..
10 3B B0 BE 92 33 EC BE 1D 3B B3 BE 97 33 EC BE - .;...3...;...3..
10 3B B2 BE 92 33 EC BE 10 3B B6 BE 92 33 EC BE - .;...3...;...3..
52 69 63 68 93 33 EC BE 00 00 00 00 00 00 00 00 - Rich.3..........
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 - ................
50 45 00 00 4C 01 06 00 01 A6 4A 46 00 00 00 00 - PE..L.....JF....
00 00 00 00 E0 00 02 21 0B 01 05 0C 00 90 00 00 - .......!........
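A buffer like the one above can be recognized as a PE image by following the DOS header's `e_lfanew` field (offset 0x3C) to the `PE\0\0` signature and reading the COFF file header behind it. The sketch below does this with Python's `struct` module on the first 256 bytes of the dump, arranged in file order (offsets 0x00 to 0xFF). It is a minimal illustration of the file format, not how HexDive Pro itself carves images out of memory.

```python
import struct

# First 256 bytes of the dumped buffer in file order (offsets 0x00-0xFF).
DUMP = """
4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00
B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 E0 00 00 00
0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68
69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F
74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20
6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00
D7 52 82 ED 93 33 EC BE 93 33 EC BE 93 33 EC BE
10 3B B1 BE 94 33 EC BE 93 33 ED BE 8A 33 EC BE
10 3B B0 BE 92 33 EC BE 1D 3B B3 BE 97 33 EC BE
10 3B B2 BE 92 33 EC BE 10 3B B6 BE 92 33 EC BE
52 69 63 68 93 33 EC BE 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50 45 00 00 4C 01 06 00 01 A6 4A 46 00 00 00 00
00 00 00 00 E0 00 02 21 0B 01 05 0C 00 90 00 00
"""


def parse_pe_header(buf: bytes) -> dict:
    """Validate the MZ/PE signatures and return a few COFF/optional-header fields."""
    if buf[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", buf, 0x3C)
    if buf[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature at e_lfanew")
    coff = e_lfanew + 4  # IMAGE_FILE_HEADER follows the 4-byte signature
    machine, sections, timestamp = struct.unpack_from("<HHI", buf, coff)
    opt_size, characteristics = struct.unpack_from("<HH", buf, coff + 16)
    (opt_magic,) = struct.unpack_from("<H", buf, coff + 20)  # 0x10B = PE32, 0x20B = PE32+
    return {
        "e_lfanew": e_lfanew,
        "machine": machine,  # 0x14C = i386
        "sections": sections,
        "timestamp": timestamp,
        "opt_header_size": opt_size,
        "characteristics": characteristics,
        "opt_magic": opt_magic,
    }


if __name__ == "__main__":
    for field, value in parse_pe_header(bytes.fromhex(DUMP)).items():
        print(f"{field:16} 0x{value:X}")
```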
                                               
MBR code
33 C0 8E D0 BC 00 7C FB 50 07 50 1F FC BE 1B 7C - 3.....|.P.P....|
BF 1B 06 50 57 B9 E5 01 F3 A4 CB BD BE 07 B1 04 - ...PW...........
38 6E 00 7C 09 75 13 83 C5 10 E2 F4 CD 18 8B F5 - 8n.|.u..........
83 C6 10 49 74 19 38 2C 74 F6 A0 B5 07 B4 07 8B - ...It.8,t.......
F0 AC 3C 00 74 FC BB 07 00 B4 0E CD 10 EB F2 88 - ..<.t...........
4E 10 E8 46 00 73 2A FE 46 10 80 7E 04 0B 74 0B - N..F.s*.F..~..t.
80 7E 04 0C 74 05 A0 B6 07 75 D2 80 46 02 06 83 - .~..t....u..F...
46 08 06 83 56 0A 00 E8 21 00 73 05 A0 B6 07 EB - F...V...!.s.....
BC 81 3E FE 7D 55 AA 74 0B 80 7E 10 00 74 C8 A0 - ..>.}U.t..~..t..
B7 07 EB A9 8B FC 1E 57 8B F5 CB BF 05 00 8A 56 - .......W.......V
00 B4 08 CD 13 72 23 8A C1 24 3F 98 8A DE 8A FC - .....r#..$?.....
43 F7 E3 8B D1 86 D6 B1 06 D2 EE 42 F7 E2 39 56 - C..........B..9V
0A 77 23 72 05 39 46 08 73 1C B8 01 02 BB 00 7C - .w#r.9F.s......|
8B 4E 02 8B 56 00 CD 13 73 51 4F 74 4E 32 E4 8A - .N..V...sQOtN2..
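The MBR dump above shows only the beginning of the sector. A simple heuristic for flagging such a buffer, assuming it holds exactly one 512-byte sector, is to check for the 0x55 0xAA boot signature at offset 510 and for sane boot-indicator bytes (0x00 or 0x80) in the four partition entries starting at 0x1BE. This is an illustrative check, not HexDive Pro's detection logic.

```python
MBR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xAA"
PARTITION_TABLE_OFFSET = 0x1BE
PARTITION_ENTRY_SIZE = 16


def looks_like_mbr(buf: bytes) -> bool:
    """Heuristic: one 512-byte sector, 55 AA at offset 510, sane boot indicators."""
    if len(buf) != MBR_SIZE or buf[510:512] != BOOT_SIGNATURE:
        return False
    for i in range(4):
        # The first byte of each partition entry is the boot indicator.
        status = buf[PARTITION_TABLE_OFFSET + PARTITION_ENTRY_SIZE * i]
        if status not in (0x00, 0x80):
            return False
    return True


if __name__ == "__main__":
    # First row of the dump above, zero-padded to a full sector (padding is made up).
    code = bytes.fromhex("33 C0 8E D0 BC 00 7C FB 50 07 50 1F FC BE 1B 7C")
    sector = code + bytes(510 - len(code)) + BOOT_SIGNATURE
    print(looks_like_mbr(sector))  # True
```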

UVWATAUAVAWH – Meet The Pushy String

The title of this post is not a secret message and I am not intoxicated.

UVWATAUAVAWH happens to be the most popular string extracted from all .exe, .dll and .sys OS files on my 64-bit Windows system. The string is so common, and at the same time so suspicious-looking, that if you google it you will find people theorizing that it has something to do with BSODs or that it is part of some internal ZeroAccess secret language.
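A statistic like this can be reproduced with a short script. The sketch below walks a directory tree, extracts printable ASCII runs from every .exe, .dll and .sys file, and counts in how many files each string occurs. The 6-character minimum and counting each string once per file are assumptions, not necessarily how the statistic above was produced.

```python
import os
import re
from collections import Counter

EXTENSIONS = {".exe", ".dll", ".sys"}
ASCII_RUN = re.compile(rb"[\x20-\x7e]{6,}")  # assumed minimum string length


def count_strings(root: str) -> Counter:
    """Map each string to the number of matching files under root that contain it."""
    counts: Counter = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):  # unreadable dirs are skipped
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in EXTENSIONS:
                continue
            try:
                with open(os.path.join(dirpath, name), "rb") as fh:
                    data = fh.read()
            except OSError:
                continue  # locked or access-denied system files
            # A set, so a string is counted once per file, not once per occurrence.
            counts.update(set(ASCII_RUN.findall(data)))
    return counts


if __name__ == "__main__":
    for string, files in count_strings(r"C:\Windows\System32").most_common(10):
        print(files, string.decode("ascii"))
```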

If you convert the characters into hex:

UVWATAUAVAWH

you will get a string of bytes like these:

55 56 57 41 54 41 55 41 56 41 57 48

and these can also be decoded as 64-bit instructions:

U  - push    rbp
V  - push    rsi
W  - push    rdi
AT - push    r12
AU - push    r13
AV - push    r14
AW - push    r15
H  - REX.W prefix, the first byte of the sub rsp, xxx instruction
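The mapping above can be checked with a tiny table-driven decoder. It is deliberately limited, a sketch rather than a disassembler: it understands only the one-byte push opcodes 0x50-0x57, their 0x41 (REX.B) forms that push r8-r15, and the common `48 83 EC ib` encoding of `sub rsp, imm8`; anything else is emitted as a raw `db` byte. For real work a disassembler library such as Capstone is the better choice.

```python
LOW_REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
HIGH_REGS = ["r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_prologue(code: bytes) -> list[str]:
    """Decode push reg / sub rsp, imm8 sequences; everything else becomes 'db XX'."""
    out, i = [], 0
    while i < len(code):
        b = code[i]
        if 0x50 <= b <= 0x57:  # push rax..rdi
            out.append(f"push {LOW_REGS[b - 0x50]}")
            i += 1
        elif b == 0x41 and i + 1 < len(code) and 0x50 <= code[i + 1] <= 0x57:
            out.append(f"push {HIGH_REGS[code[i + 1] - 0x50]}")  # REX.B selects r8..r15
            i += 2
        elif code[i:i + 3] == b"\x48\x83\xec" and i + 3 < len(code):
            out.append(f"sub rsp, 0x{code[i + 3]:X}")  # REX.W + 83 /5 ib
            i += 4
        else:
            out.append(f"db {b:02X}")
            i += 1
    return out


if __name__ == "__main__":
    text = "UVWATAUAVAWH"
    print(text.encode("ascii").hex(" ").upper())  # 55 56 57 41 54 41 55 41 56 41 57 48
    # Assumed tail 83 EC 20 completes the 'H' into sub rsp, 0x20.
    for line in decode_prologue(text.encode("ascii") + b"\x83\xec\x20"):
        print(line)
```

Without the assumed tail, the final `H` decodes as `db 48`: on its own it is only a prefix, which is why it shows up at the end of every printable variant.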

The sequence is a very typical function prologue in 64-bit code, so typical that it appears all over the place together with its variants (see below). The ‘vowelized’ look of these strings reminds me of an interesting paper about shellcode that looks like English text.

UVWATAUAVAWH
WATAUH
WATAUAVAWH
SUVWATAUAVAWH
SUVWATH
VWATAUAVH
SUVWATAUH
ATAUAVH
USVWATAUAVAWH
UVWATAUH
SUVWATAUAVH
SVWATAUAVAWH
USVWATH
USVWATAUH
USVWATAUAVH
VWATAUAVAWH
WAVAWH
ATAUAVAWH
VWATAUAWH
WATAVH
UVWATAUAVH
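Since every variant is a run of push instructions followed by the REX.W byte, the whole family can be described by one regular expression. The sketch below (pattern and helper names are illustrative) requires at least one r12-r15 push, which every variant listed above has; it locates matches in a binary blob and maps each letter back to the register it pushes. Because it works on raw bytes it will also hit unrelated text that happens to match, so hits are candidates, not proof.

```python
import re

# S/U/V/W = push rbx/rbp/rsi/rdi; AT/AU/AV/AW = push r12..r15; trailing H = REX.W.
PROLOGUE = re.compile(rb"[SUVW]*(?:A[TUVW])+H")
TOKEN = re.compile(rb"A[TUVW]|[SUVW]")
REGISTERS = {b"S": "rbx", b"U": "rbp", b"V": "rsi", b"W": "rdi",
             b"AT": "r12", b"AU": "r13", b"AV": "r14", b"AW": "r15"}


def find_prologues(data: bytes) -> list[tuple[int, bytes]]:
    """Return (offset, match) for every push-sequence-plus-REX.W string in data."""
    return [(m.start(), m.group()) for m in PROLOGUE.finditer(data)]


def pushed_registers(prologue: bytes) -> list[str]:
    """Translate a matched string (minus the trailing H) into the pushed registers."""
    return [REGISTERS[tok] for tok in TOKEN.findall(prologue[:-1])]


if __name__ == "__main__":
    blob = b"\xcc\xccUVWATAUAVAWH\x83\xec\x20"
    for offset, match in find_prologues(blob):
        print(hex(offset), match.decode(), pushed_registers(match))
```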