HexDive 0.6 – new strings and more -Context…

Update

Pedro asked me about the APIs commonly used by keyloggers, which I mentioned in the context of one of the screenshots. The APIs I had in mind were MonitorFromPoint and GetMonitorInfoA (used for taking screenshots on multiple monitors) and a few others that can be seen both on the screenshot and inside the example_hdive_qC.txt file. My original statement was ambiguous for a few reasons (the APIs can be part of a clean framework or unit/module, a keylogger is not an infostealer, etc.), so I am clarifying it for future readers.

Last, but not least – obviously the only way to confirm that any APIs highlighted by HexDive are used for malicious purposes is by doing more in-depth analysis – the only thing HexDive does is identification of APIs and strings of interest for the malware analyst 🙂

Old post

The new version is 25% larger (what bloatware! :)) as it brings in a huge number of new strings:

  • PE Section names and other packer identifiers
  • Installer-related strings
  • Identifiers of script-to-exe type tools e.g. perl2exe, py2exe, exerb, winbatch
  • Lots of known CLSID strings

It is slowly getting to the point where I wanted it to be when I started writing it. I also think I finally got it right on how to present the data extracted from a file in a way that:

  • shows as many interesting strings as possible
  • makes the output as readable as possible
  • still provides information about each string’s context
  • allows the string to be quickly found in a hex editor
  • allows for easy parsing in full-output mode
  • avoids missing strings as much as possible

So, with all that said, the new contextual output is introduced in this version.

It works the same way as the old -c option, but it removes keywords and duplicated lines from the output (not perfectly, but well enough). I must mention here that contextual output requires a wide screen (a terminal of at least 120 columns), but I hope that if you do malware analysis you have this available 🙂 (feel free to let me know if you need a narrower output, so I can accommodate that in a future version).

The new contextual output option is the capitalized version of -c, i.e. -C. You can run it in many ways, e.g.

hdive -C
hdive -aC
hdive -afC

See the example below and, as usual, I would be grateful if you let me know whether it works for you or if you spot any issues.

Example Session

This is a sample of a new malware, downloaded quite recently.

Running hdive on it first:

hdive -C // note capital letter

 

The file is UPXd, and we can see some Borland strings (Boolean/False/True/Char/etc.).
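As a rough illustration of what spotting a packer identifier means, here is a minimal Python sketch that looks for the well-known UPX markers (the section names UPX0 and UPX1, and the UPX! signature) in a file's raw bytes. This is not how HexDive is implemented; the helper name `find_upx_markers` is hypothetical and exists only for this example.

```python
# Hypothetical helper for illustration only; not part of HexDive.
UPX_MARKERS = (b"UPX0", b"UPX1", b"UPX!")


def find_upx_markers(data: bytes) -> dict:
    """Return each UPX marker present in data, mapped to its first offset."""
    hits = {}
    for marker in UPX_MARKERS:
        offset = data.find(marker)
        if offset != -1:
            hits[marker.decode("ascii")] = offset
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_upx.py <path-to-file>
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as handle:
            for name, offset in find_upx_markers(handle.read()).items():
                print(f"{offset:08X}  {name}")
```

Finding these markers only suggests that UPX was used; the strings are trivial to alter, so treat a hit as a hint rather than proof.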

We can unpack it using upx.exe

upx -d test\sample.exe -o test\sample.exe.unpacked

…and then run hdive again:

hdive -qC test\sample.exe.unpacked

Now it looks much better and it’s definitely Borland.

Scrolling down we can see lots of juicy info, including APIs that are commonly used by keyloggers:

Going further, we can see Winsock functions and strings, as well as Delphi components (units) listed together with ‘username’ and ‘password’:

Finally, there are lots of HTTP-related strings, as well as another unit name from Borland:

There are more interesting strings there – you can see the full output of the commands in the attached text files; read on.

Out of curiosity, I compared the output of the following commands:

  • strings -q -n 6 // a minimum length of 6 is usually a good choice, as it removes a lot of junk
  • hdive -q
  • hdive -qC

on the very same sample and then compared the file sizes and number of lines in each file.

These are the results:

dir example_*
2012-10-19  01:24            17,185 example_hdive_q.txt
2012-10-19  01:24            61,364 example_hdive_qC.txt
2012-10-19  01:24            58,199 example_strings_qn6.txt

wc -l example*
   1336 example_hdive_q.txt
    529 example_hdive_qC.txt
   3777 example_strings_qn6.tx

It would seem (and mind you, it is a very subjective statement :)) that hdive can be quite a time saver! Instead of reviewing over 3.5K lines, you end up reviewing about 35% of them and immediately get juicy keywords and their context (this can of course still be improved).
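The percentage can be checked directly from the wc -l figures listed above. The short Python sketch below only redoes that arithmetic; the helper name `share_of_baseline` is made up for this example.

```python
# Line counts taken from the "wc -l" output in this post.
line_counts = {
    "strings -q -n 6": 3777,
    "hdive -q": 1336,
    "hdive -qC": 529,
}


def share_of_baseline(count: int, baseline: int) -> float:
    """Return count as a percentage of baseline, rounded to one decimal."""
    return round(100.0 * count / baseline, 1)


baseline = line_counts["strings -q -n 6"]
for command, count in line_counts.items():
    share = share_of_baseline(count, baseline)
    print(f"{command:16} {count:5d} lines  {share:5.1f}%")
```

By this count, hdive -q produces roughly 35.4% of the lines that strings does, and hdive -qC roughly 14.0%, although the contextual lines are much wider.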

You can download the files here:

  • examples:

Enjoy!

HexDive 0.5 – Adding a bit of a context…

It’s time for a new version of HexDive!

Today’s changes introduce many new keywords and some new features + bug fixes:

Keywords:

  • Delphi package/library/unit names (I posted some subset of this list previously)
  • Compiler-related strings (not really that useful for malware analysis, but they may help to identify compiler-specific strings)
  • Copyright banners (I posted some previously)
  • Registry key/value names (also posted some previously)
  • More information stealing-related strings (some more software targeted by infostealers, including some old ones e.g. The Bat, ICQ, AOL, etc.)
  • Game-related strings (to highlight malware targeting various computer games)
  • A lot of new generic malware strings (from the top of the histogram of all strings extracted from 1M+ samples); many of these strings are not categorized yet, but still – better to have them picked up than to wait for the classification to be complete 🙂 – use the -a option to see what ‘juicy’ stuff is being picked up

New features:

  • The output produced by -a option now includes physical offsets and may include context (see next point)
  • I added a new experimental feature that shows the context of the strings – basically, some bytes before and after the string in the file; this should help to quickly assess the potential usefulness of the string and its context; it may also help to find other strings that are not picked up by HexDive for various reasons and that are stored in the file in close proximity to a found string. To see the context, use the new command line option ‘-c’. See the example below to see how it works in practice and how to use it to quickly locate strings of interest in a hex viewer.
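To make the idea of context more concrete, here is a minimal Python sketch that reports the physical offset of a string together with a few bytes on either side of it. It is only an approximation of the concept: the function name `show_context`, the default width of 16 bytes, and the use of '.' for non-printable bytes are all assumptions made for this example, not HexDive's actual behavior.

```python
def show_context(data: bytes, needle: bytes, width: int = 16) -> list:
    """Return (offset, context) for every occurrence of needle in data.

    The context is the needle plus up to `width` bytes on each side,
    with non-printable bytes shown as '.' (an assumption of this sketch).
    """
    results = []
    start = 0
    while True:
        offset = data.find(needle, start)
        if offset == -1:
            break
        lo = max(0, offset - width)
        hi = min(len(data), offset + len(needle) + width)
        context = "".join(
            chr(b) if 32 <= b < 127 else "." for b in data[lo:hi]
        )
        results.append((offset, context))
        start = offset + 1
    return results


sample = b"\x00\x01user\x00password\x00\xffhost"
for offset, context in show_context(sample, b"password", width=4):
    print(f"{offset:08X}  {context}")
```

Neighbouring strings such as "user" and "host" show up in the context even though only "password" was searched for, which is the effect described above.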

Bug fixes:

  • sometimes some strings were not picked up due to a bug in the processing algorithm; this affected strings that were using mixed lower/uppercase; should be fixed now; note: this bugfix introduces a side-effect that makes the output a bit noisier (e.g. New, NEW, NeW are being picked up; I may introduce some filtering of the output if it becomes an issue)
  • sometimes some strings were printed twice – should be fixed now
  • strings at the end of the file were not picked up – should be fixed now
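The side effect of the mixed-case fix can be illustrated with a case-insensitive search. The Python sketch below uses the standard `re` module and is not HexDive's matching algorithm; `find_keyword` is a hypothetical name.

```python
import re


def find_keyword(data: bytes, keyword: bytes) -> list:
    """Return (offset, matched_bytes) for every case-insensitive hit."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return [(m.start(), m.group()) for m in pattern.finditer(data)]


# All three spellings are reported, which is why the output gets noisier.
for offset, match in find_keyword(b"New NEW NeW old", b"new"):
    print(offset, match.decode("ascii"))
```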

You can download the current version of HexDive here.

If your .exe download is blocked, you can try a zip file.

Example of strings with a context

When run with the -c option, HexDive shows a string with its context:

At the moment, it shows the string in one row, then the actual context of the string in the next row, and finally 10 hexadecimal values that you can copy and paste into the Search/Find dialog of your favorite hex viewer to quickly locate the string of interest and its context without worrying about Unicode/ANSI/non-printable values.
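The following Python sketch shows why a hex pattern sidesteps the Unicode/ANSI question: the same text yields different bytes depending on how it is stored, while the pattern describes the bytes exactly. It assumes the 10 values are the first 10 bytes starting at the string's offset, which may not match what HexDive prints; `hex_search_pattern` is a hypothetical helper.

```python
def hex_search_pattern(raw: bytes, count: int = 10) -> str:
    """Format the first `count` bytes as space-separated hex values."""
    return " ".join(f"{b:02X}" for b in raw[:count])


text = "password"
# ANSI (single-byte) storage of the string.
print(hex_search_pattern(text.encode("ascii")))
# Unicode (UTF-16LE) storage, as commonly found in Windows binaries.
print(hex_search_pattern(text.encode("utf-16-le")))
```

Pasting either pattern into a hex viewer's search box finds the exact bytes, with no need to pick a text encoding in the search dialog.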