IDA, function alignment and signatures that don’t work…

More and more malware is compiled with recent versions of Visual Studio, often as 64-bit portable executables. It’s well known that official FLIRT signatures may not yet be available for some of these libraries. To address this, I often compile my own sigs based on the available SDK and VS libraries. I have done it a few times before, and since I recently came across a standard function that my libs didn’t recognize, I decided to build yet another sig file.

A standard, routine task.

I quickly identified the version of VC the portable executable was built with, got the appropriate libcmt.lib, built the .pat file, confirmed that the signature was present and matched ‘my’ unrecognized function, and compiled the final .sig file.

To my surprise, the sigs didn’t work, and despite my efforts I couldn’t get them to work. I eventually asked Hex-Rays for support, and they identified the root cause of the issue and provided a detailed explanation: the alignment bytes (thanks to Ilfak for the help).

To explain what this is about, let’s look at the following example of the memset function:

The code is an excerpt from the memset .obj file inside libcmt.lib.

You can immediately notice a sequence of CC (int 3) bytes at the top of the file.

When you create a .pat signature file for it, it will look like this:

CCCCCCCCCCCC66660F1F840000000000488BC14983F80872530FB6D249B90101 DA B01D 00FA :0010 memset :003E@ mset10 :004E@ mset20 :0060@ mset30 :006C@ mset40 :0071@ mset50 :007B@ mset60 :0087@ mset70 :0090@ mset80 :00C0@ mset90

As you can see, the signature includes the alignment bytes (the CCs and the padding that follows them at the front of the sig; memset itself only starts at offset 0x0010).
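To make the layout of that .pat line easier to see, here is a minimal Python sketch that splits it into fields. It assumes the usual .pat layout: 64 hex characters of leading pattern (with '..' for variable bytes), then the CRC16 length, the CRC16 and the module length, followed by ':offset name' pairs in which a trailing '@' marks a local name. `parse_pat_line` and `hexdump` are hypothetical helpers written for this illustration. They are not part of the FLAIR tools.

```python
# memset line from the .pat file above (joined back into a single line).
PAT_LINE = (
    "CCCCCCCCCCCC66660F1F840000000000488BC14983F80872530FB6D249B90101 "
    "DA B01D 00FA :0010 memset :003E@ mset10 :004E@ mset20 :0060@ mset30 "
    ":006C@ mset40 :0071@ mset50 :007B@ mset60 :0087@ mset70 :0090@ mset80 "
    ":00C0@ mset90"
)


def parse_pat_line(line):
    """Split a .pat module line into pattern bytes, header fields and names."""
    fields = line.split()
    pattern_hex, crc_len, crc16, module_len = fields[:4]
    # Two hex digits per byte; '..' marks a variable (e.g. relocated) byte.
    pattern = [
        None if pattern_hex[i:i + 2] == ".." else int(pattern_hex[i:i + 2], 16)
        for i in range(0, len(pattern_hex), 2)
    ]
    names = []
    i = 4
    # Names come as ':offset name' pairs; a trailing '@' marks a local name.
    while i + 1 < len(fields) and fields[i].startswith(":"):
        offset_tok = fields[i]
        offset = int(offset_tok[1:].rstrip("@"), 16)
        names.append((offset, fields[i + 1], offset_tok.endswith("@")))
        i += 2
    return pattern, int(crc_len, 16), crc16, int(module_len, 16), names


def hexdump(values):
    """Render pattern bytes as hex, keeping '..' for variable bytes."""
    return " ".join(".." if v is None else "%02X" % v for v in values)


pattern, crc_len, crc16, module_len, names = parse_pat_line(PAT_LINE)
# The first non-local name is the real entry point of the function.
entry = next(offset for offset, name, local in names if not local)
print("memset entry:", hex(entry))
print("padding     :", hexdump(pattern[:entry]))
print("memset code :", hexdump(pattern[entry:]))
```

For this line, the script reports an entry of 0x10. The 16 bytes in front of memset are therefore padding: six CC bytes followed by 66 66 0F 1F 84 00 00 00 00 00, which is a 10-byte multi-byte NOP.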

If you now create a .sig file from such a .pat file, you will get a signature file that fails to recognize many statically linked copies of memset. If you run IDA with the -z4 option, you may get messages stating that the function was skipped (‘skip func’).

The reason for this behavior is that the alignment is present not only in the .obj files, but also inside the portable executable files.

As such, you may come across a code sequence like this (inside a sample):

The actual memset function is prefixed with an alignment added by the compiler (but different from the one inside the .obj file), and IDA has already recognized this particular alignment sequence: it has been properly named and wrapped up. However, this wrapped-up alignment overlaps with the full code of the actual function. Remember from the .pat file that the function has to be prefixed with the CC sequence representing the alignment inside the .obj file! As a result of this overlap, the FLIRT signature fails to recognize the memset function.

Let’s look at the binary one more time:

This is how memset is ‘remembered’ by the .sig/.pat files (from the .obj file):

And this is how it is present inside the sample – the highlighted part is the actual alignment that IDA already recognized and wrapped up for this particular sample:

In other words, the wrapped-up alignment ‘stole’ a few bytes that would otherwise be part of the ‘remembered’ alignment of memset, which the .sig needs in order to recognize the function.
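The overlap can be illustrated with a toy Python model. This is NOT how IDA implements FLIRT. The model only encodes the idea described above: a signature module whose first byte falls inside an item that IDA has already defined (here, an auto-created align) cannot be applied. The byte layout is hypothetical. It consists of a RET ending the previous function, four extra CC bytes (an arbitrary number), and then the memset prefix exactly as it appears in the .obj, taken from the .pat line. `build_sample`, `cc_runs` and `module_applicable` are made-up helpers.

```python
# Toy model only: this is NOT how IDA implements FLIRT matching.
OBJ_PADDING = bytes([0xCC] * 6) + bytes.fromhex("66660F1F840000000000")
MEMSET_HEAD = bytes.fromhex("488BC14983F80872530FB6D249B90101")


def build_sample(extra_cc):
    """Hypothetical layout: a RET ending the previous function, `extra_cc`
    padding bytes, then memset's module exactly as it appears in the .obj."""
    return b"\xC3" + b"\xCC" * extra_cc + OBJ_PADDING + MEMSET_HEAD


def cc_runs(data):
    """(start, end) of every run of 0xCC bytes, i.e. what an automatically
    created 'align' item would cover in this model."""
    runs, i = [], 0
    while i < len(data):
        if data[i] != 0xCC:
            i += 1
            continue
        j = i
        while j < len(data) and data[j] == 0xCC:
            j += 1
        runs.append((i, j))
        i = j
    return runs


def module_applicable(module_start, items):
    """In this model a module that starts strictly inside an item is skipped."""
    return not any(start < module_start < end for start, end in items)


sample = build_sample(extra_cc=4)
module_start = sample.find(OBJ_PADDING + MEMSET_HEAD)  # where the sig's bytes begin
aligns = cc_runs(sample)
print("module start:", module_start, "align items:", aligns)
print("applied with aligns defined:", module_applicable(module_start, aligns))
print("applied after undefining them:", module_applicable(module_start, []))
```

In this model, the align item covers offsets 1 to 10 while the memset module starts at offset 5. The module is rejected until the align item is removed, which mirrors the second workaround.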

There are at least two solutions:

  • create signatures without the alignment bytes (this needs a fix to IDA's pcf.exe tool)
  • undefine the alignment directives created by IDA

The first one may be addressed in future versions of IDA. The second option is actually very easy: if you come across a similar situation, consider running the script below first. It’s a quick & ugly hack that removes the alignments IDA adds automatically. Once they are removed, the sigs should work (unless the issue is completely different, of course).

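import ida_bytes
import idautils
import idc

# Quick & ugly hack: undefine the 'align' directives IDA created over
# 0xCC (int 3) padding so that FLIRT signatures that include the .obj
# alignment bytes get a chance to match. IDA 7.x+ API, Python 3.
for seg_ea in idautils.Segments():
    print("Segment %s" % idc.get_segm_name(seg_ea))
    ea = idc.get_segm_start(seg_ea)
    end = idc.get_segm_end(seg_ea)

    while ea < end:
        b = ida_bytes.get_byte(ea)
        if b == 0xCC:
            disasm = idc.generate_disasm_line(ea, 0)
            if disasm.startswith('align'):
                print("%08X: %s, %x" % (ea, disasm, b))
                # Turn the align item back into plain undefined bytes
                ida_bytes.del_items(ea, ida_bytes.DELIT_SIMPLE)
        ea += 1

print("Done")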

Why decompiling LUA scripts doesn’t work all the time…

In one of my posts this year, I presented a bunch of decompiled Lua scripts associated with the Flame malware. The scripts were decompiled using the Lua decompiler. Since the decompilation process is non-trivial, this brings us to the subject of this post: how to work with a tool that doesn’t work all the time.

First of all, the Lua Decompiler is only available as source code, so you need to compile it yourself. This can be quite a big obstacle.

I won’t go into details on how to compile it, but I will mention that on a plain vanilla Ubuntu ISO (v16.0) it worked like a charm, though only after updating the environment with the developer tools and fixing a few things here and there (think: at least 2h of research and work). Most of the steps involve installing additional (missing) packages. If you have never compiled open-source software before, you are in for a lot of fun and lots of googling (think: 4-8h of your life 😉).

Secondly, compiled Lua scripts are a pain in the neck.

Why?

They store the sizes of various types in the header of the compiled Lua script, and these sizes affect the way the decompiler works.

Yes, you heard that right.

To decompile a byte-coded Lua script, you need a build of the Lua Decompiler that _matches_ the settings in the header of the compiled script!

Below is a fragment of the Lua Decompiler code that deals with this. The header of a compiled Lua script is not fixed; it depends on the CPU architecture and the compiler settings:

/*
* make header
*/
void luaU_header (char* h)
{
 int x=1;
 memcpy(h,LUA_SIGNATURE,sizeof(LUA_SIGNATURE)-1);
 h+=sizeof(LUA_SIGNATURE)-1;
 *h++=(char)LUAC_VERSION;
 *h++=(char)LUAC_FORMAT;
 *h++=(char)*(char*)&x;                /* endianness */
 *h++=(char)sizeof(int);
 *h++=(char)sizeof(size_t);
 *h++=(char)sizeof(Instruction);
 *h++=(char)sizeof(lua_Number);
 *h++=(char)(((lua_Number)0.5)==0);        /* is lua_Number integral? */
}
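The same header can be decoded on the analysis side to see which settings a given compiled script expects. Below is a minimal Python sketch. It assumes the Lua 5.1 layout produced by luaU_header above: the 4-byte "\033Lua" signature followed by eight one-byte fields. `parse_lua51_header` is a hypothetical helper, and the example bytes describe a typical 32-bit little-endian build. They are not taken from the Flame files.

```python
import struct

LUA_SIGNATURE = b"\x1bLua"  # "\033Lua", as in LUA_SIGNATURE
HEADER_SIZE = 12            # 4-byte signature + 8 one-byte fields


def parse_lua51_header(data):
    """Decode the 12-byte header emitted by luaU_header (Lua 5.1 layout)."""
    if len(data) < HEADER_SIZE or data[:4] != LUA_SIGNATURE:
        raise ValueError("not a compiled Lua 5.1 chunk")
    (version, fmt, endianness, sz_int, sz_size_t,
     sz_instruction, sz_number, integral) = struct.unpack("8B", data[4:HEADER_SIZE])
    return {
        "version": "%d.%d" % (version >> 4, version & 0x0F),  # 0x51 -> "5.1"
        "format": fmt,                                        # LUAC_FORMAT
        "little_endian": endianness == 1,                     # *(char*)&x with x=1
        "sizeof(int)": sz_int,
        "sizeof(size_t)": sz_size_t,
        "sizeof(Instruction)": sz_instruction,
        "sizeof(lua_Number)": sz_number,
        "integral_lua_Number": integral == 1,
    }


# Illustrative header of a 32-bit little-endian build with default settings
# (double lua_Number); these bytes are NOT taken from the Flame samples.
example = bytes.fromhex("1b 4c 75 61 51 00 01 04 04 04 08 00")
for key, value in parse_lua51_header(example).items():
    print("%-20s %s" % (key, value))
```

To check a real file, pass the function the first 12 bytes of the compiled script. A sizeof(size_t) of 8 would suggest a 64-bit build, and 4 would suggest a 32-bit one.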

The header of one of the Flame files is shown below:

You can quickly work out that most of the types are 4 bytes long, i.e. 32-bit, so you need a 32-bit build of LuaDec compiled for this particular version of the bytecode. In my tests I actually compiled various versions of LuaDec and kept them for further use.

That’s it.

The best advice I can give you is to get LuaDec yourself and either compile it on a system whose architecture matches the settings of your compiled *.lua files, or tweak LuaDec's build settings to achieve the same result (I am not claiming this is possible, as I have not tried it).
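Before spending hours on a build, it may be worth checking whether a machine's native type sizes would produce a matching header at all. Below is a rough Python heuristic that assumes Lua 5.1 and default luaconf.h settings (Instruction as a 32-bit unsigned int, lua_Number as a double). It reports the C type sizes of the toolchain the Python interpreter was built with. These usually, but not necessarily, match what a native LuaDec build on the same machine would use. `native_lua51_header` and `settings_match` are hypothetical helpers.

```python
import struct
import sys


def native_lua51_header(version=0x51, fmt=0):
    """Return the 12-byte header luaU_header would write if Lua (and hence a
    LuaDec build) were compiled with this Python interpreter's native C type
    sizes. Assumes default luaconf.h: Instruction is a 32-bit unsigned int and
    lua_Number is a double (so the 'integral' byte is 0)."""
    return b"\x1bLua" + bytes([
        version,
        fmt,
        1 if sys.byteorder == "little" else 0,  # endianness byte
        struct.calcsize("i"),                   # sizeof(int)
        struct.calcsize("N"),                   # sizeof(size_t)
        struct.calcsize("I"),                   # sizeof(Instruction), assumed unsigned int
        struct.calcsize("d"),                   # sizeof(lua_Number), assumed double
        0,                                      # lua_Number is not integral
    ])


def settings_match(compiled_header):
    """Compare only the six architecture-dependent bytes (offsets 6-11)."""
    return compiled_header[6:12] == native_lua51_header()[6:12]


# Illustrative 32-bit little-endian header (not from the Flame samples).
header_32 = bytes.fromhex("1b 4c 75 61 51 00 01 04 04 04 08 00")
print("host settings bytes:", native_lua51_header()[6:].hex())
print("native build should match:", settings_match(header_32))
```

On a typical 64-bit host, sizeof(size_t) is 8, so a header like the one above would not match. That is the situation where a separate 32-bit LuaDec build is needed.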

I am not sure why Lua scripts are compiled this way; it seems pretty much nonsensical, as the format is not very portable. However, if the interpreter for a given compiled Lua script is bundled into the final malicious package, the devs don’t really need to care: it simply works out of the box for them.

Reversers, as is often the case, don’t have it that easy…