tl;dr: HexDive reduces the time needed for strings review by extracting only selected strings from analyzed samples, omitting much of the junk seen in the output of a typical strings tool. As a bonus, the extracted strings are classified.
HexDive is a new toy of mine. I liked the way HAPI worked, but I always planned to write something a bit smarter than a tool that just exports known APIs from the analyzed files. HAPI was actually a first test of an idea I had had for a very long time, but my ongoing research was not complete by the time I wrote it. What I wanted to write was a tool whose output immediately gives an analyst the power to classify file functionality on the spot. It may also help automation, which can be driven by cherry-picked known strings from the analyzed file. It may (and hopefully will) help a lot with batch analysis.
Similar projects exist, of course, but their databases are very small. More advanced projects are usually private (AV companies use them). To do it right, a large database of good keywords, malware-related and otherwise, is needed. This can’t be obtained easily, as there are literally tons of samples and each contains lots of strings. So one needs to be selective and decide which strings exported from a sample or a memory dump are the good ones (or the bad ones). Often, dynamic analysis is needed, with process instrumentation helping to pick out the interesting stuff. This is tough, and it took me over a year of collecting different artifacts from 250,000 unique samples, as well as taking notes from various places on the web and from my own system. My notes file now contains lots of data and I am slowly working through it. And just to be clear, the data I am looking for are not the file names of known malware, but the stuff that is common amongst malware files: registry keys, etc.
I am finishing the testing, and there is still a lot of work in updating the precompiled forests of tries (no, it’s not a typo :), but I am already happy to present an excerpt from the output of the first beta version. The first public version of the tool will be published within a week or so.
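For readers unfamiliar with tries, here is a minimal sketch of how keyword lookup with classification could work. It is not HexDive's actual implementation: the `TrieNode`, `build_trie` and `scan` names, the tiny `KEYWORDS` database (entries borrowed from the output excerpt in this post), the case-sensitive matching and the use of a single trie rather than a forest are all assumptions made for illustration.

```python
class TrieNode:
    __slots__ = ("children", "label")

    def __init__(self):
        self.children = {}   # next character -> TrieNode
        self.label = None    # (category, subcategory) if a keyword ends here


def build_trie(keywords):
    """Build a trie from a {keyword: (category, subcategory)} mapping."""
    root = TrieNode()
    for word, label in keywords.items():
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.label = label
    return root


def scan(text, root):
    """Yield (offset, keyword, label) for every keyword occurrence in text."""
    for start in range(len(text)):
        node = root
        for end in range(start, len(text)):
            node = node.children.get(text[end])
            if node is None:
                break            # no keyword continues with this character
            if node.label is not None:
                yield start, text[start:end + 1], node.label


# Hypothetical mini-database; a real one would hold many thousands of entries.
KEYWORDS = {
    "SeDebugPrivilege": ("ACL", "Privileges"),
    "%USERPROFILE%": ("Environment variable", "User Profile"),
    "avp.exe": ("anti-routine", "process name"),
    ".exe": ("File Extension", "-"),
}

# Made-up buffer standing in for the contents of a sample.
sample = "junk\x00SeDebugPrivilege\x00xx%USERPROFILE%\\avp.exe\x00"
hits = list(scan(sample, build_trie(KEYWORDS)))
for offset, word, (category, sub) in hits:
    # The "A" prefix only mimics the record layout of the excerpt.
    print(f"A|{category}|{sub}|{word}")
```

Every keyword shares its prefix path with related keywords, so the scan abandons a candidate as soon as no known keyword can continue. That is what makes it practical to match a large database against every offset of a sample.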
--------------------------------------------------------------
hexdive v0.1 (c) Hexacorn 2012. All rights reserved.
Visit us at https://www.hexacorn.com
--------------------------------------------------------------
A|ACL|Privileges|SeDebugPrivilege
A|Environment variable|User Profile|%USERPROFILE%
A|Directory|Program Files directory (32-bit)|Program Files
A|Interesting keywords|-|Explorer
A|api|generic|RtlAnsiStringToUnicodeString
A|anti-routine|process name|avp.exe
A|ACL|Privileges|SeDebugPrivilege
A|Interesting keywords|-|Userinit
A|IRC|-|PING
A|IRC|-|PONG
A|IRC|-|JOIN
A|Placeholder|IP|%d.%d.%d.%d
A|Environment variable|Date|%date%
A|File Extension|-|.com
U|anti-routine|process name|avp.exe
U|File Extension|-|.exe
U|Interesting keywords|-|desktop.ini
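Since the output is meant to feed automation and batch analysis, here is a rough sketch of how such records could be consumed. It assumes one record per line with four pipe-separated fields, as in the excerpt above; the `Record`, `parse_line` and `summarize` names are mine, and treating the leading `A`/`U` as an opaque prefix (presumably ANSI vs. Unicode) is a guess rather than documented behaviour.

```python
from collections import Counter
from typing import NamedTuple


class Record(NamedTuple):
    prefix: str        # first field, "A" or "U" in the excerpt
    category: str
    subcategory: str
    value: str         # the string found in the sample


def parse_line(line):
    """Split one output line into a Record, or return None if it does not fit."""
    # maxsplit=3 keeps any further "|" characters inside the value field.
    parts = line.rstrip("\r\n").split("|", 3)
    if len(parts) != 4 or parts[0] not in ("A", "U"):
        return None      # banner lines, separators, blank lines
    return Record(*parts)


def summarize(lines):
    """Return the parsed records and a per-category hit count."""
    records = [r for r in map(parse_line, lines) if r is not None]
    return records, Counter(r.category for r in records)


excerpt = """\
hexdive v0.1 (c) Hexacorn 2012. All rights reserved.
A|ACL|Privileges|SeDebugPrivilege
A|anti-routine|process name|avp.exe
A|IRC|-|PING
A|IRC|-|PONG
A|Placeholder|IP|%d.%d.%d.%d
U|anti-routine|process name|avp.exe
""".splitlines()

records, per_category = summarize(excerpt)
for category, count in per_category.most_common():
    print(f"{category}: {count}")
```

A per-category count like this is one simple way to triage a batch: a sample with hits in, say, both the IRC and anti-routine categories can be queued for a closer look.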