Clustering and Advanced/In-depth Malware Analysis with HexDive Pro

June 6, 2013 in Batch Analysis, Forensic Analysis, HexDive Pro, Malware Analysis, Software Releases

A few months ago I introduced a new tool called HexDive. The tool speeds up the analysis of strings extracted from Portable Executable (PE) files by showing only those strings that are most relevant from a malware analysis perspective.
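
The basic idea can be sketched in a few lines of Python: pull runs of printable ASCII out of a binary (as the classic `strings` utility does) and keep only those that match a list of "interesting" patterns. This is a minimal sketch, not HexDive's code: the pattern list below is a hypothetical stand-in, and the tool's actual relevance database is not reproduced here.

```python
import re
from typing import List

# Hypothetical relevance patterns; a real tool would rely on a much larger curated list
INTERESTING = [
    re.compile(r"https?://", re.I),                      # URLs
    re.compile(r"\.(exe|dll|sys)\b", re.I),              # names of executable files
    re.compile(r"^(CreateRemoteThread|WriteProcessMemory|VirtualAllocEx)$"),  # injection APIs
    re.compile(r"CurrentVersion\\Run", re.I),            # common autostart registry key
]

def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of at least min_len printable ASCII characters (like `strings`)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

def relevant_strings(data: bytes) -> List[str]:
    """Keep only the extracted strings that match at least one relevance pattern."""
    return [s for s in extract_strings(data) if any(p.search(s) for p in INTERESTING)]

sample = b"\x00\x01MZ\x90\x00junk\x00http://example.com/gate.php\x00\xff\xffkernel32.dll\x00abc\x00"
print(relevant_strings(sample))  # ['http://example.com/gate.php', 'kernel32.dll']
```

The filter is the part that matters: raw `strings` output for a real PE file runs to thousands of lines, and most of them are noise.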

Strings extracted directly from a PE file certainly have some value, but that value is limited by many factors, including:

  • Compression (code and/or data is decompressed only when the program is executed)
  • Encryption (code and/or data is decrypted only when the program is executed)
  • Obfuscation (code and/or data is hidden among a lot of junk code and data)
  • Wrapping (code and/or data is hidden deep inside the file and ‘unwrapped’ only when the program is executed)
  • Dynamic code loading (code injections and shellcode that may be hidden using the techniques described above)
  • The environment (code and/or data is not part of the malware itself, but is extracted from the system on which it is executed)
  • The nature of run-time (the code and data seen depend on the environment and on the code branches taken inside the malware)
  • Anti-analysis tricks (what we see depends heavily on the malware’s ability to detect that it is running inside a sandbox or under monitoring tools, e.g. a debugger)
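
To illustrate the encryption point, here is a minimal Python sketch assuming a simple single-byte XOR scheme (chosen only for brevity; real malware uses many different schemes). The key and URL are hypothetical. A static string scan of the encoded blob does not reveal the URL, while the same scan over the decoded buffer, which is what a run-time monitor gets to see once the malware has decrypted it, does.

```python
import re
from typing import List

def printable_runs(data: bytes, min_len: int = 4) -> List[bytes]:
    """Return runs of at least min_len printable ASCII bytes (like `strings`)."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)

def xor_bytes(data: bytes, key: int) -> bytes:
    """Single-byte XOR; applying it twice with the same key restores the input."""
    return bytes(b ^ key for b in data)

KEY = 0x5A                                   # hypothetical key
plaintext = b"http://example.com/gate.php"   # hypothetical C&C URL
encoded = xor_bytes(plaintext, KEY)          # what is stored inside the file
on_disk = b"\x00" + encoded + b"\x00"        # the encoded blob as a static scan sees it
in_memory = xor_bytes(encoded, KEY)          # the buffer after run-time decryption

static_hits = [s for s in printable_runs(on_disk) if b"http" in s]
runtime_hits = [s for s in printable_runs(in_memory) if b"http" in s]
print(static_hits)   # []
print(runtime_hits)  # [b'http://example.com/gate.php']
```

Note that XOR with 0x5A happens to map many ASCII letters to other printable bytes, so a static scan still reports "strings", just meaningless ones.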

To address this, HexDive Pro takes analysis to the next level and makes it possible to extract many run-time artifacts produced by a running program.

This includes:

  • API calls and their parameters
  • Hex dumps and strings extracted from buffers allocated at run time (including the stack)
  • Code injections and shellcode
  • Wrapped code
  • Screenshots of all windows
  • Very specific features of the malware that can help to uniquely identify it
  • and a few other things that I will keep secret for the moment, but will reveal in upcoming posts 🙂
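
Hex dumps like the ones at the end of this post use a common layout: 16 bytes per row in hex, then a " - " separator and the same bytes as ASCII, with non-printable bytes shown as ".". A minimal Python sketch that reproduces that layout (the function name is hypothetical; this is not HexDive Pro's code):

```python
from typing import List

def hexdump_rows(data: bytes, width: int = 16) -> List[str]:
    """Format data as 'HEX BYTES - ASCII' rows, width bytes per row."""
    rows = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Printable ASCII (0x20-0x7E) is shown as-is, everything else as '.'
        ascii_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        rows.append(f"{hex_part} - {ascii_part}")
    return rows

mbr_start = bytes.fromhex("33 C0 8E D0 BC 00 7C FB 50 07 50 1F FC BE 1B 7C")
for row in hexdump_rows(mbr_start):
    print(row)
# 33 C0 8E D0 BC 00 7C FB 50 07 50 1F FC BE 1B 7C - 3.....|.P.P....|
```

A partial final row is not padded, so its ASCII column starts earlier than in full rows; a production formatter would usually pad it for alignment.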

To demonstrate what HexDive Pro can do, all I need to do is point to what I have posted over the last few months.

In fact, most of the clustering, batch analysis and malware analysis posts were heavily influenced by the results provided by HexDive Pro. So far, the results from the tool have helped me to:

  • … discover the hidden code inside ZeroAccess
  • … cluster the ZeroAccess samples in my collection to find out which of them contain code that uses NTFS Extended Attributes (EA), and to create a list of all known EA names used by this malware
  • … cluster the APT sample set in many ways
  • … instantly discover strings in the Flame malware
  • and other results, more or less influenced by the tool (including various statistics)
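
One simple way to turn per-sample artifacts into clusters like these is an inverted index from each artifact to the samples that exhibit it. The Python sketch below assumes the artifacts (for example, EA names or API calls) have already been extracted into sets; the sample and artifact names are hypothetical placeholders, and HexDive Pro's own clustering logic is not described in this post.

```python
from collections import defaultdict
from typing import Dict, List, Set

def cluster_by_artifact(samples: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each artifact to the sorted list of samples in which it was observed."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for sample, artifacts in samples.items():
        for artifact in artifacts:
            index[artifact].add(sample)
    return {artifact: sorted(names) for artifact, names in index.items()}

# Hypothetical per-sample artifact sets
samples = {
    "sample_a": {"ea:EA_NAME_1", "api:API_1"},
    "sample_b": {"ea:EA_NAME_1"},
    "sample_c": {"api:API_2"},
}
clusters = cluster_by_artifact(samples)
print(clusters["ea:EA_NAME_1"])  # ['sample_a', 'sample_b']
```

Each artifact then defines a cluster, and questions like "which samples use EA?" become a single dictionary lookup.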

These experiments helped me a lot in tuning the code to make it as useful as possible.

On the surface, HexDive Pro works like a typical API monitor: it runs malware under its control and uses various tricks to intercept traces of its execution. Going deeper, it combines the best pieces of Application Monitor, HexDive, HMFT and Hstrings, and also leverages information from numerous databases of artifacts (both static and dynamic) that I have gathered over years of malware analysis.
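
API monitor output is typically a text trace with one event per line. As a sketch of the kind of post-processing that makes such traces easier to read, the Python below summarizes lines in the "Gets Procedure Address: DLL, function" form shown in the examples later in this post and groups the resolved functions by DLL. The line format is taken from those examples; the regex and function names are assumptions.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches e.g. "Gets Procedure Address: WS2_32.dll, connect"
GETPROC_LINE = re.compile(r"Gets Procedure Address:\s*(?P<dll>[^,]+?)\s*,\s*(?P<func>\S+)")

def group_resolved_apis(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group dynamically resolved functions by DLL (DLL names lower-cased)."""
    resolved: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        m = GETPROC_LINE.search(line)
        if not m:
            continue  # not a procedure-address event
        dll, func = m.group("dll").lower(), m.group("func")
        if func not in resolved[dll]:  # keep first-seen order, skip duplicates
            resolved[dll].append(func)
    return dict(resolved)

trace = [
    "Gets Procedure Address: WS2_32.dll, accept",
    "Gets Procedure Address: WS2_32.dll, connect",
    "Gets Procedure Address: WS2_32.dll, connect",
]
print(group_resolved_apis(trace))  # {'ws2_32.dll': ['accept', 'connect']}
```

A group of WS2_32.dll socket functions resolved at run time, as in the example list below, hints at network functionality even when the import table does not show it.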

Combined, these efforts produce a tool that makes it possible to gain in-depth knowledge about the analyzed malware within 30-180 seconds.

In fact, the APT1 clustering data I posted here was generated quite quickly using HexDive Pro. The posted results were just the tip of the iceberg: the output contained all the juice one can otherwise extract manually only after hours of painstaking analysis. Multiply that by the number of samples and the performance gain is tremendous.

Anyone who does malware analysis professionally knows how tedious in-depth analysis can be. Anyone who doesn’t is forced to rely on write-ups from antivirus companies, help from peers, and search engines.

With HexDive Pro you will often be able to learn more about a piece of malware than you can read online, and you will also be able to verify what you read in AV write-ups. On occasion, the tool will also fail miserably, which could mean that you have stumbled upon a new code injection trick, a new trick to escape tracing, or a new 0day that helps the malware run free. Or there may be a bug.

Such is the life of software like this 🙂

Last but not least, the intended audience for the tool includes:

  • Forensic investigators who don’t have malware analysis skills.
  • Beginner and intermediate-level malware analysts.
  • Anyone who wants to do batch analysis and clustering of their sample sets.
  • Anyone who wants to analyze not only malware but any Windows software (32-bit only); the tool provides an in-depth look into the internal workings of software applications and may be useful in security/vulnerability assessments.
  • Hardcore malware analysts may benefit from the tool as well, but they probably already have adequate or better private tools of their own.

I have tested it extensively, and since it’s a private tool that evolved from a few API monitors I wrote in the past, as well as many other tools and scripts I have written and my own experience with in-depth malware analysis, I hope it will be useful to the community.

The first version is coming soon. Stay tuned!

Note: The software will be available commercially only.

Some more examples

The following artifacts are extracted instantly:

  • List of APIs extracted at run time:
    • Gets Procedure Address: WS2_32.dll, accept
    • Gets Procedure Address: WS2_32.dll, bind
    • Gets Procedure Address: WS2_32.dll, closesocket
    • Gets Procedure Address: WS2_32.dll, connect
    • Gets Procedure Address: WS2_32.dll, getpeername
    • Gets Procedure Address: WS2_32.dll, getsockname
    • Gets Procedure Address: WS2_32.dll, getsockopt
  • User agents used by the malware
  • Information about the malware’s stealing capabilities (e.g. targeted applications)
  • Files that the malware tries to find on the system (e.g. to actually run)
  • Various tricks to escape analysis/HIPS
  • Various tricks to detect monitoring tools
  • Access to physical devices (memory, drives), usually to bypass HIPS and infect the MBR
  • Buffers (read/written files, read/written memory, etc.)
Injected/wrapped .exe
4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 - MZ..............
B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 - ........@.......
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 - ................
00 00 00 00 00 00 00 00 00 00 00 00 E0 00 00 00 - ................
0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68 - ........!..L.!Th
69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F - is program canno
74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20 - t be run in DOS 
6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00 - mode....$.......
D7 52 82 ED 93 33 EC BE 93 33 EC BE 93 33 EC BE - .R...3...3...3..
10 3B B1 BE 94 33 EC BE 93 33 ED BE 8A 33 EC BE - .;...3...3...3..
10 3B B0 BE 92 33 EC BE 1D 3B B3 BE 97 33 EC BE - .;...3...;...3..
10 3B B2 BE 92 33 EC BE 10 3B B6 BE 92 33 EC BE - .;...3...;...3..
52 69 63 68 93 33 EC BE 00 00 00 00 00 00 00 00 - Rich.3..........
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 - ................
50 45 00 00 4C 01 06 00 01 A6 4A 46 00 00 00 00 - PE..L.....JF....
00 00 00 00 E0 00 02 21 0B 01 05 0C 00 90 00 00 - .......!........
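
Once a buffer like this is captured, its headers can be decoded directly. In the bytes above, e_lfanew (the little-endian DWORD at offset 0x3C) is 0xE0, and the "PE\0\0" signature does appear at 0xE0. A minimal Python sketch using the standard PE header layout (the function name is hypothetical, not HexDive Pro's API) decodes the COFF file header that follows: Machine 0x14C (i386), six sections, and a Characteristics value of 0x2102, whose IMAGE_FILE_DLL bit (0x2000) indicates that, as transcribed, this image is a DLL.

```python
import struct

IMAGE_FILE_DLL = 0x2000  # Characteristics bit: the image is a DLL

def parse_pe_header(buf: bytes) -> dict:
    """Decode the e_lfanew pointer and the COFF file header of a PE image."""
    if buf[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew: little-endian 32-bit offset of the PE signature, stored at 0x3C
    (e_lfanew,) = struct.unpack_from("<I", buf, 0x3C)
    if buf[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # IMAGE_FILE_HEADER (20 bytes) follows the 4-byte signature
    (machine, n_sections, timestamp, _symtab, _n_symbols,
     opt_size, characteristics) = struct.unpack_from("<HHIIIHH", buf, e_lfanew + 4)
    # The optional header starts with its Magic word (0x10B = PE32, 0x20B = PE32+)
    (magic,) = struct.unpack_from("<H", buf, e_lfanew + 24)
    return {
        "e_lfanew": e_lfanew,
        "machine": machine,
        "sections": n_sections,
        "timestamp": timestamp,
        "optional_header_size": opt_size,
        "characteristics": characteristics,
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
        "magic": magic,
    }

# Rebuild only the rows of the dump above that the parser needs; the rest stay zero
buf = bytearray(0x100)
buf[0x00:0x10] = bytes.fromhex("4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00")
buf[0x30:0x40] = bytes.fromhex("00 00 00 00 00 00 00 00 00 00 00 00 E0 00 00 00")
buf[0xE0:0xF0] = bytes.fromhex("50 45 00 00 4C 01 06 00 01 A6 4A 46 00 00 00 00")
buf[0xF0:0x100] = bytes.fromhex("00 00 00 00 E0 00 02 21 0B 01 05 0C 00 90 00 00")
info = parse_pe_header(bytes(buf))
print(hex(info["machine"]), info["sections"], hex(info["characteristics"]), info["is_dll"])
# 0x14c 6 0x2102 True
```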
                                               
MBR code
33 C0 8E D0 BC 00 7C FB 50 07 50 1F FC BE 1B 7C - 3.....|.P.P....|
BF 1B 06 50 57 B9 E5 01 F3 A4 CB BD BE 07 B1 04 - ...PW...........
38 6E 00 7C 09 75 13 83 C5 10 E2 F4 CD 18 8B F5 - 8n.|.u..........
83 C6 10 49 74 19 38 2C 74 F6 A0 B5 07 B4 07 8B - ...It.8,t.......
F0 AC 3C 00 74 FC BB 07 00 B4 0E CD 10 EB F2 88 - ..<.t...........
4E 10 E8 46 00 73 2A FE 46 10 80 7E 04 0B 74 0B - N..F.s*.F..~..t.
80 7E 04 0C 74 05 A0 B6 07 75 D2 80 46 02 06 83 - .~..t....u..F...
46 08 06 83 56 0A 00 E8 21 00 73 05 A0 B6 07 EB - F...V...!.s.....
BC 81 3E FE 7D 55 AA 74 0B 80 7E 10 00 74 C8 A0 - ..>.}U.t..~..t..
B7 07 EB A9 8B FC 1E 57 8B F5 CB BF 05 00 8A 56 - .......W.......V
00 B4 08 CD 13 72 23 8A C1 24 3F 98 8A DE 8A FC - .....r#..$?.....
43 F7 E3 8B D1 86 D6 B1 06 D2 EE 42 F7 E2 39 56 - C..........B..9V
0A 77 23 72 05 39 46 08 73 1C B8 01 02 BB 00 7C - .w#r.9F.s......|
8B 4E 02 8B 56 00 CD 13 73 51 4F 74 4E 32 E4 8A - .N..V...sQOtN2..
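
The MBR code above contains the bytes 81 3E FE 7D 55 AA, which decode to the x86 instruction cmp word [0x7DFE], 0xAA55: the boot code checks for the 0x55 0xAA boot signature at the end of the sector it has loaded at 0x7C00 (0x7C00 + 0x1FE). A tool watching raw disk writes could apply similar checks to flag MBR overwrites. A minimal Python sketch, with hypothetical function names:

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xAA"  # the word 0xAA55 stored little-endian at offset 0x1FE

def has_boot_signature(sector: bytes) -> bool:
    """True if a 512-byte sector ends with the 0x55 0xAA boot signature."""
    return len(sector) == SECTOR_SIZE and sector[0x1FE:0x200] == BOOT_SIGNATURE

def find_signature_check(code: bytes) -> int:
    """Offset of 'cmp word [0x7DFE], 0xAA55' (81 3E FE 7D 55 AA) in code, or -1."""
    return code.find(bytes.fromhex("81 3E FE 7D 55 AA"))

sector = bytearray(SECTOR_SIZE)
sector[0x1FE:0x200] = BOOT_SIGNATURE
print(has_boot_signature(bytes(sector)))  # True

row = bytes.fromhex("BC 81 3E FE 7D 55 AA 74 0B 80 7E 10 00 74 C8 A0")
print(find_signature_check(row))  # 1
```

The signature check alone says nothing about whether boot code is malicious; it only confirms that a written buffer is shaped like a boot sector.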