IDAPython – making strings decompiler-friendly

Update

As pointed out by 0stracon, there is an option in Hex-Rays that makes it print all strings as literals. Go to the Hex-Rays Decompiler Analysis Options and untick ‘Print only constant string literals’.

To make the change permanent, clear the HO_CONST_STRINGS bit in HEXOPTIONS in hexrays.cfg:

#define HO_CONST_STRINGS   0x0040   // Only print string literals if they reside
                                    // in read-only memory (e.g. .rodata segment).
                                    // When off, all strings are printed as literals.
                                    // You can override decompiler's decision by
                                    // adding 'const' or 'volatile' to the
                                    // string variable's type declaration
HEXOPTIONS               = 0x....   // Combination of HO_... bits
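
As a rough illustration of the bit arithmetic involved, the sketch below clears the HO_CONST_STRINGS bit from a HEXOPTIONS value. The bit value 0x0040 comes from the config excerpt above; the function name and the starting value are made up for the example and are not the real defaults.

```python
HO_CONST_STRINGS = 0x0040  # bit value quoted in the hexrays.cfg excerpt


def print_all_strings(hexoptions):
    """Return hexoptions with the HO_CONST_STRINGS bit cleared."""
    return hexoptions & ~HO_CONST_STRINGS


# Hypothetical starting value, for illustration only (not a real default).
example = 0x0041
print(hex(print_all_strings(example)))  # prints 0x1
```

The value you write back into hexrays.cfg is whatever your existing HEXOPTIONS combination is, minus that one bit.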

I was not aware of this option and reinvented the wheel 🙂

Old post

One of the features of IDA is its ability to recognize strings. This is a great feature, especially when you combine it with the power of the Hex-Rays decompiler: together they can produce very nice pseudocode.

There is only one annoying bit there: if strings are recognized and defined inside a writable segment, they will not be presented by the decompiler as strings, but as variable names referring to strings.

Let’s have a look at the example.

In the example below (a Dexter sample), IDA recognizes the string “UpdateMutex:”:

strings_1

When we now switch to the decompiler view, we see that the decompiler shows it as s__Updatemutex:

strings_1a

(the ‘s__’ prefix comes from the string prefix I typically use, i.e. ‘s->’, which the decompiler ‘flattens’ to ‘s__’). The name s__Updatemutex refers to the string “UpdateMutex:”, as shown below:

strings_2

Obviously, decompiled code that refers to the actual string is much more readable. Here is the same piece of code as above, with the data referred to by the actual strings:

strings_2a

In order to make the decompiler use the actual strings (not the references), we have two options:

  • Make the segment where the string is recognized read-only (by disabling ‘Write’ in segment properties):

strings_3

Unfortunately, this confuses the decompiler and makes the output untrustworthy (it is often truncated). You will also receive a friendly reminder that you are doing something stupid 😉 a.k.a. a red card from the decompiler’s authors:

strings_3a

  • The second option is the ‘proper’ fix: tell IDA that the string is read-only, i.e. constant, by changing the string’s type from the existing one to the same type prefixed with the keyword ‘const’:

strings_4

Since strings are static most of the time, it is handy to convert all the strings in IDA to read-only, i.e. to retype all of them using the ‘const’ trick.

This is exactly what the strings_to_const.py script is intended to do.

It enumerates all segments, finds all strings recognized by IDA (note the comment about the prefix I use, you may need to adapt it to your needs), and then converts them to read-only.
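
The steps described above can be sketched roughly as follows. This is not the actual strings_to_const.py, only a minimal approximation of the same idea. It assumes an IDA 7.x style IDAPython API, single-byte C strings (wide strings would need a different element type), and that the string names start with the ‘s->’ prefix mentioned earlier. The helper names and the placeholder name in the declaration are made up for the example.

```python
# Assumption: string names start with this prefix (the post mentions 's->').
# Set it to "" to retype every defined string literal.
STRING_PREFIX = "s->"


def const_char_decl(size):
    """Build a C declaration for a read-only char array of `size` bytes."""
    # 'str_' is a throwaway placeholder name; the assumption is that only
    # the type part of the declaration matters when it is applied.
    return "const char str_[%d];" % size


def retype_strings_as_const(prefix=STRING_PREFIX):
    """Retype matching string literals as const char arrays.

    Must be run inside IDA; the IDAPython modules are imported lazily so
    that the helper above can be used and tested outside of it.
    """
    import idautils
    import idc

    changed = 0
    for seg_start in idautils.Segments():
        seg_end = idc.get_segm_end(seg_start)
        for ea in idautils.Heads(seg_start, seg_end):
            if not idc.is_strlit(idc.get_full_flags(ea)):
                continue  # not a defined string literal
            if not idc.get_name(ea).startswith(prefix):
                continue  # name does not carry the expected prefix
            size = idc.get_item_size(ea)  # size of the defined item in bytes
            if idc.SetType(ea, const_char_decl(size)):
                changed += 1
    return changed
```

Inside IDA you would call retype_strings_as_const() from the script window and then refresh the pseudocode view. Try it on a copy of the database first, since it retypes many items in one go.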

The result?

See below – before and after:

strings_before_after

Heaven’s gate and a chameleon code (x86/64)

The so-called heaven’s gate is not only a built-in feature of 64-bit Windows, but also a neat reversing trick. Malware authors can use it (and do) to temporarily switch code execution between 32-bit (WOW64) mode and 64-bit long mode. While in 64-bit mode, the code executes 64-bit instructions, and this can be used to run some funny stuff before returning to 32-bit code (e.g. to detect a debugger).
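
For readers who have not seen the gate itself: in its simplest form the switch is a far jump from 32-bit code into the 64-bit code segment. The sketch below only builds the bytes of such a jump. The selector values (0x33 for 64-bit code, 0x23 for 32-bit code under WOW64) are general knowledge rather than something taken from the sample, and the target address and function name are hypothetical.

```python
import struct

CS_64BIT = 0x33  # 64-bit code segment selector under WOW64
CS_32BIT = 0x23  # 32-bit code segment selector under WOW64


def far_jump(target, selector=CS_64BIT):
    """Encode `jmp far selector:target` (opcode EA, ptr16:32) for 32-bit code."""
    # Layout: opcode byte, 32-bit offset, 16-bit selector, all little-endian.
    return struct.pack("<BIH", 0xEA, target, selector)


gate = far_jump(0x00401000)  # hypothetical target address
print(gate.hex())  # prints ea001040003300
```

Real samples often reach the same result with a push of the selector followed by a far return instead of a direct far jump, so the exact byte pattern varies.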

The trick is very old and many blogs describe how to mix 32- and 64-bit code execution with it, which is why it is only a part of the topic I am going to talk about today.

A few years back I was looking at a sample that used the heaven’s gate trick, but apart from this, it also contained another trick – a chameleon code – a stream of bytes that could be executed as both 32- and 64-bit code, depending on the context. I found it to be quite cool and took a mental note of that malware family.

I recently came across a different sample from the same malware family and, since its analysis reminded me of that supercool trick, I thought it would be nice to write a post about it.

The sample hash is E4AB5596CB8FBE932670A6A5420E7AB9 (note it is old, from 2013).

Note: before it reaches the heaven’s gate/chameleon code, the sample will try to stop you with a number of known and lesser-known anti-reversing tricks (they are quite creative; I won’t describe them in detail so as not to spoil the fun in case you want to take a stab at the sample yourself).

The 32-bit code right before jumping far to 64-bit code:

heavensgate1

Immediately after the far jump we land in 64-bit code.

heavensgate2

Note the offsets of instructions on both screenshots.

Btw. while I am not the biggest fan of windbg for day-to-day work, its ability to reverse such chameleon code ‘on the fly’ comes in really handy.

After some more jumps and calls the code eventually ends in these 2 places (left 32-bit, right 64-bit – 2 different VMs):

heavensgate3

We can compare the opcodes and their meanings side by side:

heavensgate4

They both execute in their respective modes (32- and 64-bit).
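
To give a feel for how the same bytes can mean different things in the two modes, here is a toy decoder for one well-known four-byte sequence. The bytes are NOT taken from the sample; they are a textbook illustration. In 32-bit mode 0x40 is `inc eax`, while in 64-bit mode it is a REX prefix that attaches to the next instruction. The decoder understands only these few opcodes and is nowhere near a real disassembler.

```python
def decode(code, bits):
    """Decode a tiny subset of x86 as 32-bit or 64-bit code (toy example)."""
    out = []
    i = 0
    while i < len(code):
        if code[i:i + 2] == b"\x31\xc0":
            out.append("xor eax, eax")
            i += 2
        elif code[i] == 0x40 and bits == 32:
            out.append("inc eax")  # 0x40 is a one-byte INC in 32-bit mode
            i += 1
        elif code[i] == 0x40 and bits == 64 and code[i + 1:i + 2] == b"\x90":
            out.append("nop")  # 0x40 is an empty REX prefix in 64-bit mode
            i += 2
        elif code[i] == 0x90:
            out.append("nop")
            i += 1
        else:
            raise ValueError("opcode not covered by this toy decoder")
    return out


CHAMELEON = bytes([0x31, 0xC0, 0x40, 0x90])
print(decode(CHAMELEON, 32))  # ['xor eax, eax', 'inc eax', 'nop']
print(decode(CHAMELEON, 64))  # ['xor eax, eax', 'nop']
```

Executed as 32-bit code the sequence leaves 1 in eax, executed as 64-bit code it leaves 0, so the same bytes can branch differently depending on the mode they run in.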

The inability to distinguish between code and data is a well-known fact. The ability to write a program that is identical at the binary level and executes flawlessly on two different architectures is a completely different animal.

For what it’s worth – it was written in fasm.