Clustering and Advanced/In-depth Malware Analysis with HexDive Pro

A few months ago I introduced a new tool called HexDive. The tool speeds up analysis of strings that are extracted from portable executable (PE) files. It does so by showing only those strings that are the most relevant from a malware analysis perspective.

Strings extracted directly from a PE file certainly have some value, but it’s limited by many factors including:

  • Compression (code and/or data is decompressed only when the program is executed)
  • Encryption (code and/or data is decrypted only when the program is executed)
  • Obfuscation (code and/or data is hidden among a lot of junk code and data)
  • Wrapping (code and/or data is hidden deep inside the file and ‘unwrapped’ only when the program is executed)
  • Dynamic code loading (code injects and shellcode that may be hidden using the techniques described above)
  • The environment (code and/or data is not a part of the malware itself, but is extracted from the system on which it is executed)
  • The nature of run-time (the code and data seen depend on the environment and on the code branches taken inside the malware)
  • Anti-analysis tricks (what we see depends heavily on the malware’s ability to detect that it is running inside a sandbox or under monitoring tools, e.g. a debugger)
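
To make the first two limitations concrete, here is a minimal Python sketch of static string extraction. It is not HexDive's code; the function name, the sample buffer, the URL and the XOR key are all made up for illustration. It shows that a trivial one-byte XOR is already enough to hide every string from a static scan.

```python
import re


def extract_ascii_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII characters that are at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Hypothetical buffer: two strings surrounded by non-printable bytes.
plain = b"\x00\x01http://example.test/gate.php\x00\xffkernel32.dll\x00"

# The same buffer "encrypted" with a one-byte XOR key (0xA5 is an arbitrary choice).
encoded = bytes(b ^ 0xA5 for b in plain)

plain_strings = extract_ascii_strings(plain)
encoded_strings = extract_ascii_strings(encoded)

print(plain_strings)    # the strings are visible in the plain buffer
print(encoded_strings)  # nothing of the required length survives the XOR
```

The encoded buffer only gives up its strings once the program decodes it in memory, which is why looking at run-time buffers pays off.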

To address this, HexDive Pro takes analysis to the next level and lets you extract many run-time artifacts produced by a running program.

This includes:

  • API calls and their parameters
  • Hex dumps and Strings extracted from buffers allocated during the run-time (including stack)
  • Code Injects and shellcodes
  • Wrapped code
  • Screenshots of all windows
  • Very specific features of the malware that can help to uniquely identify it
  • and it can do a few other things that I will keep secret for the moment, but will reveal in upcoming posts 🙂

To demonstrate what HexDive Pro can do, all I have to do is point to what I have posted in the last few months.

In fact, most of the clustering, batch analysis and malware analysis posts were heavily influenced by results provided by HexDive Pro. The results the tool provided thus far helped me to:

  • … discover the hidden code inside ZeroAccess
  • … cluster ZeroAccess samples I have in my collection to find out which contain code using Extended Attributes (NTFS) and to create a list of all known EA names used by this malware
  • … cluster the APT sample set in many ways
  • … instantly discover strings in Flame malware
  • and others, more or less influenced by it (including various statistics)
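
As a rough illustration of what clustering by artifacts can look like, here is a minimal Python sketch. It is not how HexDive Pro does it; the sample names, the artifact labels (`ea:NAME_1`, `api:connect` and so on) and both helper functions are hypothetical. The idea is simply to invert a sample-to-artifacts mapping and to compare samples by the overlap of their artifact sets.

```python
from collections import defaultdict


def cluster_by_artifact(samples: dict[str, set[str]]) -> dict[str, list[str]]:
    """Invert sample -> artifacts into artifact -> sorted list of samples."""
    clusters = defaultdict(list)
    for name, artifacts in samples.items():
        for artifact in artifacts:
            clusters[artifact].append(name)
    return {artifact: sorted(names) for artifact, names in clusters.items()}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two artifact sets (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical artifact sets; real ones would come from the analysis output.
samples = {
    "sample_a": {"ea:NAME_1", "api:connect"},
    "sample_b": {"ea:NAME_1", "ea:NAME_2", "api:connect"},
    "sample_c": {"api:bind"},
}

clusters = cluster_by_artifact(samples)
similarity_ab = jaccard(samples["sample_a"], samples["sample_b"])
similarity_ac = jaccard(samples["sample_a"], samples["sample_c"])

print(clusters["ea:NAME_1"])  # samples sharing this artifact
print(similarity_ab, similarity_ac)
```

The inverted index answers questions such as "which samples use this EA name", while the similarity score gives a crude way to group samples that share most of their artifacts.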

The results of these experiments helped me a lot to tweak the code so that it is as useful as possible.

On the surface, HexDive Pro works like a typical API monitor – running malware under its control and using various tricks to intercept traces of its execution. Going deeper, it combines the best pieces of Application Monitor, Hex Dive, HMFT and Hstrings, and also leverages information from numerous databases of artifacts (both static and dynamic) I have gathered over years of malware analysis.
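
The core idea behind any API monitor, recording a call and its parameters before forwarding it to the real function, can be shown with a toy Python sketch. This is only an analogy and not Windows API hooking; `monitored`, `call_log` and the stand-in `get_proc_address` function are hypothetical names invented for the example.

```python
import functools

# Collected trace, one line per intercepted call.
call_log: list[str] = []


def monitored(func):
    """Wrap func so every call and its parameters are logged, then forwarded."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_log.append(f"{func.__name__} args={args!r} kwargs={kwargs!r}")
        return func(*args, **kwargs)
    return wrapper


@monitored
def get_proc_address(module: str, name: str) -> str:
    # Stand-in for a real API; it just returns a label instead of an address.
    return f"{module}!{name}"


result = get_proc_address("WS2_32.dll", "connect")

print(result)
print(call_log)
```

A real monitor has to do the equivalent at the level of native code, and malware actively tries to evade it, which is where the "various tricks" come in.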

All of these combined efforts produce a tool that makes it possible to gain an in-depth knowledge about the analyzed malware within 30-180 seconds.

In fact, the APT1 clustering data I posted here was generated pretty quickly using HexDive Pro. The results posted were just the tip of the iceberg, as the output contained all the juice one can extract manually only after hours of painstaking analysis. If you multiply that by the number of samples, the performance gain is tremendous.

Anyone who does malware analysis professionally knows how tedious in-depth analysis can be. Anyone who doesn’t is forced to rely on writeups from the antivirus companies, peers’ help and search engines.

With HexDive Pro you will often be able to learn more about a piece of malware than you can read online, and you will also be able to verify what you read in AV writeups. On occasion, the tool will also fail miserably, which could mean that you have stumbled upon a new trick to inject code, a new trick to escape tracing, or a new 0day that helps the malware run free. Or there may be a bug.

Such is the life of software like this 🙂

Last, but not least – the audience for the tool is:

  • Forensic investigators who don’t have malware analysis skills.
  • Beginners and intermediate level malware analysts.
  • Anyone who wants to do batch analysis and clustering of their samplesets.
  • Anyone who wants to analyze not only malware, but any Windows software (32-bit only); the tool provides an in-depth look into the internal workings of software applications and may be useful in security/vulnerability assessments.
  • Hardcore malware analysts may benefit from the tool as well, but they probably already have adequate or better private tools of their own.

I have tested it extensively. It is a private tool that evolved from a few API monitors I wrote in the past, from many other tools and scripts I have written, and from my own experience doing in-depth malware analysis, so I hope it will be useful to the community.

The first version is coming soon. Stay tuned!

Note: The software will be available commercially only.

Some more examples

The following artifacts are extracted instantly:

  • List of APIs extracted during run-time:
    • Gets Procedure Address: WS2_32.dll, accept
    • Gets Procedure Address: WS2_32.dll, bind
    • Gets Procedure Address: WS2_32.dll, closesocket
    • Gets Procedure Address: WS2_32.dll, connect
    • Gets Procedure Address: WS2_32.dll, getpeername
    • Gets Procedure Address: WS2_32.dll, getsockname
    • Gets Procedure Address: WS2_32.dll, getsockopt
  • User agents used by malware
  • Information about stealing capabilities of malware (e.g. targeted applications)
  • Files that malware tries to find on the system (e.g. to actually run)
  • Various tricks to escape analysis/HIPS
  • Various tricks to detect monitoring tools
  • Access to PhysicalDevices (memory, drives) – usually bypassing HIPS and infecting MBR
  • Buffers (read/written files, read/written memory, etc.)
Injected/wrapped .exe:
4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 - MZ..............
B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 - ........@.......
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 - ................
00 00 00 00 00 00 00 00 00 00 00 00 E0 00 00 00 - ................
0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68 - ........!..L.!Th
69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F - is program canno
74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20 - t be run in DOS 
6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00 - mode....$.......
D7 52 82 ED 93 33 EC BE 93 33 EC BE 93 33 EC BE - .R...3...3...3..
10 3B B1 BE 94 33 EC BE 93 33 ED BE 8A 33 EC BE - .;...3...3...3..
10 3B B0 BE 92 33 EC BE 1D 3B B3 BE 97 33 EC BE - .;...3...;...3..
10 3B B2 BE 92 33 EC BE 10 3B B6 BE 92 33 EC BE - .;...3...;...3..
52 69 63 68 93 33 EC BE 00 00 00 00 00 00 00 00 - Rich.3..........
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 - ................
50 45 00 00 4C 01 06 00 01 A6 4A 46 00 00 00 00 - PE..L.....JF....
00 00 00 00 E0 00 02 21 0B 01 05 0C 00 90 00 00 - .......!........

MBR code:
33 C0 8E D0 BC 00 7C FB 50 07 50 1F FC BE 1B 7C - 3.....|.P.P....|
BF 1B 06 50 57 B9 E5 01 F3 A4 CB BD BE 07 B1 04 - ...PW...........
38 6E 00 7C 09 75 13 83 C5 10 E2 F4 CD 18 8B F5 - 8n.|.u..........
83 C6 10 49 74 19 38 2C 74 F6 A0 B5 07 B4 07 8B - ...It.8,t.......
F0 AC 3C 00 74 FC BB 07 00 B4 0E CD 10 EB F2 88 - ..<.t...........
4E 10 E8 46 00 73 2A FE 46 10 80 7E 04 0B 74 0B - N..F.s*.F..~..t.
80 7E 04 0C 74 05 A0 B6 07 75 D2 80 46 02 06 83 - .~..t....u..F...
46 08 06 83 56 0A 00 E8 21 00 73 05 A0 B6 07 EB - F...V...!.s.....
BC 81 3E FE 7D 55 AA 74 0B 80 7E 10 00 74 C8 A0 - ..>.}U.t..~..t..
B7 07 EB A9 8B FC 1E 57 8B F5 CB BF 05 00 8A 56 - .......W.......V
00 B4 08 CD 13 72 23 8A C1 24 3F 98 8A DE 8A FC - .....r#..$?.....
43 F7 E3 8B D1 86 D6 B1 06 D2 EE 42 F7 E2 39 56 - C..........B..9V
0A 77 23 72 05 39 46 08 73 1C B8 01 02 BB 00 7C - .w#r.9F.s......|
8B 4E 02 8B 56 00 CD 13 73 51 4F 74 4E 32 E4 8A - .N..V...sQOtN2..
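
For readers who want to reproduce this kind of output on their own buffers, here is a minimal Python sketch. It is an assumption about how such dumps can be produced and checked, not HexDive Pro's code; `hexdump_rows` and `looks_like_pe` are hypothetical helpers. The first formats a buffer in the same "hex - ASCII" layout, the second checks whether a buffer starts with an executable by following the `MZ` header's pointer at offset 0x3C to the `PE\0\0` signature.

```python
import struct


def hexdump_rows(data: bytes, width: int = 16) -> list[str]:
    """Format data as rows of hex bytes followed by ' - ' and an ASCII column."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        ascii_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        rows.append(f"{hex_part} - {ascii_part}")
    return rows


def looks_like_pe(buf: bytes) -> bool:
    """Check for 'MZ' at offset 0 and 'PE\\0\\0' at the offset stored at 0x3C."""
    if len(buf) < 0x40 or buf[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", buf, 0x3C)
    return buf[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"


# First row of the injected .exe dump shown above.
first_row = bytes.fromhex("4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00")
rows = hexdump_rows(first_row)

# Synthetic buffer with the same layout as the dump: PE header at offset 0xE0.
fake_pe = bytearray(0x100)
fake_pe[0:2] = b"MZ"
struct.pack_into("<I", fake_pe, 0x3C, 0xE0)
fake_pe[0xE0:0xE4] = b"PE\x00\x00"

is_pe = looks_like_pe(bytes(fake_pe))
is_pe_without_header = looks_like_pe(b"MZ" + bytes(0x40))

print(rows[0])
print(is_pe, is_pe_without_header)
```

In the dump above the value at offset 0x3C is `E0 00 00 00`, and the `PE..L` row does sit at offset 0xE0, which is what the check relies on.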

The Hades haz you. Phantom (유령) – The DFIR drama from South Korea

The way the movies portray hacking, forensics, security research and coding is obviously metaphoric and usually made as visually rich as possible to ensure the audience ‘gets it’ and, as a bonus, can see how cool the process is. Anyone who has spent a few sleepless nights with Olly and IDA Pro, worked around the clock on forensic cases, reviewed vulnerability reports or source code, or worked on a particular algorithm in their head for a few weeks before actually sitting down and writing the code knows that the reality is a bit more boring 🙂

If you ask a random security pro what ‘the best’ hacking movies are, they will surely laugh, pointing out at least a few from the following list:

...and perhaps at some stage they will suddenly become a bit more serious and mention that ‘but Matrix did show NMAP in action’.

Luckily, there are actually movies out there that beat all the above-mentioned productions in terms of technical accuracy and show a relatively realistic representation of IT security work.

This post is about one of them.

A while ago I happened to stumble upon a Korean TV drama called “Phantom” (also known as “Ghost“) that made my jaw drop. The drama was produced by the Korean network SBS.

The plot of the drama is simple – The Hades haz you 🙂

[Image: Hades logo]

Copyright notice: The picture of the Hades logo was taken from a clip on YouTube. The copyright belongs to SBS.

Okay, the plot is a bit more complicated – it’s “Face/Off” meets “Jason Bourne” meets “CSI”.

Or

Evil Hackers from Korea and Hong Kong vs. Forensic guys from Korean Police.

Since this is not IMDB, just a short note on the drama itself – I have already described bits of the plot and I don’t want to spoil it, so I won’t add more information here. The music is all right. The acting is so-so (the lead characters are a little bit too stiff and rarely smile). There are gaps in the story as well, but it’s a TV drama after all, and it’s Korean, so there is lots of melodrama ‘by default’. There is also very heavy product placement, but if this is the only way to get funds to make TV dramas then so be it.

Okay, back to ‘technical’ stuff.

What makes this particular TV drama stand out is the attention to detail. While they didn’t completely escape the typical Hollywood clichés (computers with the evidence are thrown out of the window, logic bombs with a progress bar, etc.), the makers really did their homework and put quite an effort into demonstrating how typical hacking works, and how forensic guys investigate it.

Lots of scenes are shot in the forensic lab or at the crime scene: in internet coffee shops, data centers, etc. We also witness the actual data acquisition, evidence analysis (HDD, mobile, CCTV footage, video manipulation analysis, social media, Event Logs) and, most importantly, lots of popular DFIR/RCE software being used to ‘understand’ the data and code. This is really not just a single random tool or a hand-made HTML page that is supposed to look like ‘analysis results’. Quite the opposite – many of the most common tools from the DFIR/RCE/pentesting arsenal somehow found their way into the drama.

The software I remember seeing includes:

  • EnCase
  • WinHex
  • Metasploit
  • OllyDbg
  • DCode
  • SecureCRT
  • Wireshark
  • XRY
  • BackTrack
  • Process Explorer

and lots more (I wish I had taken notes!).

Last, but not least – there are also realistic attacks being used as a part of the plot including, but not limited to:

  • 0Day exploits (using documents from Hangul Word Processor)
  • malware infections
  • billboard hacking
  • spoofed emails
  • identity theft
  • SCADA attacks
  • car hacking
  • hacking back in real time
  • DDoS attacks
  • Wi-Fi hacking
  • social engineering

and lo and behold – even STUXNET is mentioned!

Thumbs up South Korea!!!