February | 2012 | Hexacorn

Running typical ‘strings’ tools over a suspicious file provides lots of useful information.The output typically provides an immediate clue what the file’s purpose is e.g. is it a text file, binary file, what is its file format, character encoding, is it compressed, what APIs , file names and URLs it is referring to and so on and so forth. If you are lucky, you may sometimes get a visual output as well e.g. an ASCII art as it is in a case of well-known web shell r57.

RUStrings0

Now, the problem with ‘strings’ tools is that they are usually monolingual. They extract English strings in ANSI and Unicode format, but forget about other languages. That is, they are unable to recognize strings that are non-English. Of course, it is non-trivial to write a tool that will recognize strings in a few dozens of languages, as they all use various types of character encodings and each character can occupy not only a single byte, but in many cases multiple bytes.

RUStrings.pl is a simple perl script that tries to address this issue and while it focuses only on Russian strings, it can be relatively easily extended to cover other languages. The strings it extracts include

ANSI
Unicode
4 different Russian character encodings

The output will contain Cyrillic characters and has to be viewed with a proper program supporting various character encodings.

Compare the following:

obtained via ‘strings’:

RUStrings1

and

via ‘RUStrings’:

RUStrings2
In case you are wondering what tool I am using to preview these – it is Total Commander’s built-in Lister viewer – it has a very cool feature that allows changing the character encoding on the spot making Cyryllic (and others) characters ‘visible’.

Download

RUStrings.pl

One of the first things we do when we analyze malware is strings extraction. This is a good approach, but there is a problem – neither Sysinternals’ strings nor UNiX/cygwin version provide an ability to extract strings from a specific PE section. Being able to extract strings this way may be handy. It may simplify static analysis and even more importantly, it helps to avoid noise coming from bad strings. Examples of bad strings are sequences of machine instructions coming from a code section that are interpreted as actual strings. The same goes for ‘strings’ from resource section. This part of file often contain bitmaps, icons and other data that often holds a lot of data that ‘looks’ like strings. We may not want to see these in the output.

So, having an ability to extract strings from each section separately would be certainly helpful. There are many way to do so – if you like to code, you can write your own script.

Or…You can just use a simple method presented below.

It turns out that 7zip has an ability to extract sections from PE files. It is available from both GUI and command line. GUI is option is straightforward, as per the command line, use the following:

“c:\Program Files\7-Zip\7zG.exe” x <filename> -osections

Example for Notepad.exe is shown below. Note that 7zip also extracts resources into a subdirectory – another handy feature.

We can now extract strings from .text section only:

Note:

There are executables for which extracting strings from specific sections won’t help and may even make you miss something or draw wrong conclusions; these include Borland applications (code and data is mangled together), position-independent code (shellcodes, viruses, code injects), etc.

Hexacorn

Hexacorn

Monthly Archives: February 2012

RUStrings – extracting Russian strings from files

Extracting Strings from PE sections