IDA, function alignment and signatures that don’t work…

More and more malware is compiled with recent Visual Studio versions, often as 64-bit portable executables. It’s well known that official FLIRT signatures may not yet be available for some of these libraries. To address this, I often compile my own sigs from the available SDK and VS libraries. I have done it a few times before, and since I recently came across a standard function that my libs didn’t recognize, I decided to build yet another sig file.

A standard, routine task.

I quickly identified the version of VC the portable executable was built with, got the appropriate libcmt.lib, built the .pat file, confirmed the signature is present and matches ‘my’ unrecognized function, and compiled the final .sig file.

To my surprise, the sigs didn’t work, and despite my efforts I couldn’t make them work. I eventually asked Hex-Rays for support, and they identified the root cause and provided a detailed explanation: the alignment bytes (thanks to Ilfak for the help).

To explain what it is, you have to look at the following example of the memset function:

The code is an excerpt from a memset .obj file inside the libcmt.lib.

You can immediately notice the sequence of CC bytes (int 3) at the top of the file.

When you create a .pat signature file for it, it will look like this:

CCCCCCCCCCCC66660F1F840000000000488BC14983F80872530FB6D249B90101
DA B01D 00FA :0010 memset :003E@ mset10 :004E@ mset20 :0060@ mset30
:006C@ mset40 :0071@ mset50 :007B@ mset60 :0087@ mset70 :0090@ mset80
:00C0@ mset90

As you can see, the signature includes the alignment bytes (the few CCs at the front of the sig).
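To make the layout of that entry concrete, here is a minimal sketch that splits it into fields outside IDA. It assumes the usual .pat field order (32 leading bytes, CRC length, CRC16, total length, then offset/name pairs) and that a trailing `@` on an offset marks a local name. The function name `parse_pat_line` is made up for this example, and the entry is shortened to its first three names.

```python
# One .pat entry joined onto a single line (later local names omitted for brevity).
PAT_LINE = (
    "CCCCCCCCCCCC66660F1F840000000000488BC14983F80872530FB6D249B90101 "
    "DA B01D 00FA :0010 memset :003E@ mset10 :004E@ mset20"
)


def parse_pat_line(line):
    """Split a .pat entry into leading bytes, lengths and named offsets."""
    fields = line.split()
    entry = {
        # First 32 bytes of the module. This entry has no variable ('..') bytes,
        # which bytes.fromhex() would not accept.
        "lead": bytes.fromhex(fields[0]),
        "crc_len": int(fields[1], 16),    # number of bytes covered by the CRC
        "crc": int(fields[2], 16),        # CRC16 value
        "total_len": int(fields[3], 16),  # total module length in bytes
        "names": [],                      # (offset, name, is_local) tuples
    }
    rest = fields[4:]
    # Names come as ":OFFSET name" pairs. Assumption: '@' marks a local name.
    for off, name in zip(rest[0::2], rest[1::2]):
        if not off.startswith(":"):
            break  # anything after the names is ignored in this sketch
        entry["names"].append((int(off[1:].rstrip("@"), 16), name, off.endswith("@")))
    return entry


entry = parse_pat_line(PAT_LINE)
memset_off = next(off for off, name, _ in entry["names"] if name == "memset")
padding = entry["lead"][:memset_off]
cc_run = len(padding) - len(padding.lstrip(b"\xcc"))

print("memset starts at offset 0x%X" % memset_off)
print("leading CC bytes: %d of %d padding bytes" % (cc_run, len(padding)))
```

The public name sits at offset 0x10, so everything before it in the pattern is padding: the six CC bytes, followed by ten bytes (66 66 0F 1F 84 ...) that decode as a long NOP.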

If you now create a .sig file from such a .pat file, you will get a signature file that will not work for many statically linked occurrences of memset. If you run IDA with the -z4 option, you may get messages stating that the function was skipped (‘skip func’).

The reason for this behavior is that the alignment is present not only in the .obj files, but also inside the portable executable files.

As such, you may come across a code sequence like this (inside a sample):

The actual memset function is prefixed with alignment bytes added by the compiler (but different from the ones inside the .obj file), and IDA has already recognized this particular alignment sequence: it has been properly named and wrapped up. However, this wrapped-up alignment overlaps with the full code of the actual function (remember from the .pat file that it has to be prefixed with the CC sequence representing the alignment inside the .obj file!). As a result of this overlap, the FLIRT signature fails to recognize the memset function.

Let’s look at the binary one more time:

This is how memset is ‘remembered’ by the .sig/.pat files (from the .obj file):

And this is how it is present inside the sample – the highlighted part is the actual alignment that IDA already recognized and wrapped up for this particular sample:

That wrapped-up alignment basically ‘stole’ a few bytes that would normally be part of the ‘remembered’ alignment of the memset function recognizable by the .sig.
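The overlap can be shown with a little arithmetic. The sketch below takes the memset offset (0x10) and the module length (0xFA) from the .pat entry; the addresses of the function and of the align item are hypothetical, chosen only for illustration. It does not model how IDA decides to skip a function. It only shows that the range the signature describes and the range of the align item intersect.

```python
def overlap(a, b):
    """Return the half-open intersection of two (start, end) ranges, or None."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


MEMSET_OFFSET = 0x10  # ':0010 memset' in the .pat entry
MODULE_LEN = 0xFA     # total module length from the .pat entry

# Hypothetical addresses inside a sample, not taken from a real file.
memset_ea = 0x140001010
align_item = (0x140001006, 0x140001010)  # an 'align' item created by IDA

# The signature describes the whole module, padding included, so the range it
# has to match starts MEMSET_OFFSET bytes before the function entry.
module_start = memset_ea - MEMSET_OFFSET
window = (module_start, module_start + MODULE_LEN)

clash = overlap(window, align_item)
print("signature range: 0x%X-0x%X" % window)
print("overlap with align item: 0x%X-0x%X (%d bytes)"
      % (clash[0], clash[1], clash[1] - clash[0]))
```

With a signature built without the padding, the range would start at the function entry itself and would not touch the align item.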

There are at least two solutions:

  • create signatures w/o alignment bytes (needs a fix to IDA’s pcf.exe tool)
  • undefine the alignments done by IDA

The first one may be addressed in future versions of IDA. The second option is actually very easy: if you come across a similar situation, consider running the script below first. It’s a quick & ugly hack that removes the alignments that IDA adds automatically. Once these are removed, the sigs should work (unless the issue is completely different, of course).

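# Written against the legacy (pre-IDA 7) IDAPython API and Python 2.
import idautils
import idc

# Walk every segment and undefine the 'align' items IDA created over CC padding.
for s in idautils.Segments():
    segname = str(idc.SegName(s)).rstrip('\x00')
    print "Segment %s" % segname
    i = idc.SegStart(s)

    while i < idc.SegEnd(s):
        b = idc.Byte(i)
        # Only int 3 (0xCC) padding is of interest here.
        if b == 0xCC:
            a = idc.GetDisasm(i)
            if a.startswith('align'):
                print "%08lX: %s, %x" % (i, a, b)
                # Turn the align item back into unexplored bytes.
                idc.MakeUnkn(i, 0)
        i = i + 1
print "Done"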

Creating IDT/IDS files for IDA from MS libraries with symbols

In the reversing world, we regularly come across samples linked to OS APIs imported from well-known libraries. Occasionally, however, we come across files that import in a slightly different way: not by name but by ordinal. Samples linking to MFC libraries are a good example.

When loaded into IDA, such samples contain lots of autogenerated function names, e.g. mfc_1234. This is pretty annoying. Luckily, there are plenty of descriptions of and solutions to this problem: we need an IDT or an IDS file. An IDT file (or its compressed version, IDS) is a ‘translator’ between ordinal numbers and actual API names. Many of these ship with a default IDA installation, but not all… One can generate them by hand using existing scripts, and if MS symbols exist for a given library, one can try to generate them automagically using a simple script I am attaching to this post.

This is the recipe:

  • Ensure your IDA is set up to use symbols from Microsoft
  • Open the MS library you analyze
  • Load its symbols from the MS web site (you are either asked, or they are loaded automatically – depends on your config)
  • When the database is fully loaded and autoanalysis is completed, launch the following script:
# Written against the legacy (pre-IDA 7) IDAPython API and Python 2.
import os
import idaapi
import idc

idb = idc.GetIdbPath()

print "Original IDB: %s" % idb

# Swap the database extension (.idb or .i64) for .idt.
idt = os.path.splitext(idb)[0] + '.idt'

dll = idc.GetInputFile()

print "Saving to %s" % idt

f = open(idt, 'wb')
# Ordinal 0 carries the name of the library itself.
f.write("0 Name=%s\n" % (dll))
for i in xrange(idaapi.get_entry_qty()):
    eo = idc.GetEntryOrdinal(i)
    # Resolve the entry point itself instead of the i-th function,
    # which is unrelated to the i-th export.
    a = idc.GetEntryPoint(eo)
    if a != idc.BADADDR:
        nm = idc.GetFunctionName(a)
        # Entries without a function name are skipped.
        if nm != '':
            f.write("%d Name=%s\n" % (eo, nm))
f.close()
print "done!"
  • Now you should have the IDT file autogenerated next to the database (normally the same directory as the library), e.g.
    • mfcXYZ.idb
    • mfcXYZ.idt (this is the IDT file)
  • You can now
    • Open a sample linking to the MS library via ordinals
    • Load the newly created IDT file
    • All mfc_1234 function names should be automatically converted to the respective function/method names
  • You can also use zipids.exe to convert the IDT file to IDS, but it’s not necessary
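To illustrate what the generated file contains and how it acts as a translator, here is a minimal sketch that runs outside IDA. It only understands the two line shapes the script above writes (`0 Name=<library>` and `<ordinal> Name=<function>`); real IDT files may carry more than that. The ordinals, the function names and the helpers `parse_idt` and `resolve` are all made up for this example.

```python
import re

# Hypothetical IDT content; the ordinals and function names are invented.
IDT_TEXT = """\
0 Name=mfcXYZ.dll
1234 Name=ExampleFunctionA
1235 Name=ExampleFunctionB
"""


def parse_idt(text):
    """Return (library_name, {ordinal: function_name}) for simple IDT text."""
    library = None
    names = {}
    for line in text.splitlines():
        m = re.match(r"^(\d+)\s+Name=(\S+)$", line.strip())
        if not m:
            continue  # lines in any other shape are ignored in this sketch
        ordinal, name = int(m.group(1)), m.group(2)
        if ordinal == 0:
            library = name  # ordinal 0 names the library itself
        else:
            names[ordinal] = name
    return library, names


def resolve(placeholder, names):
    """Translate an autogenerated 'prefix_ordinal' name if the ordinal is known."""
    _, _, ordinal = placeholder.rpartition("_")
    if ordinal.isdigit() and int(ordinal) in names:
        return names[int(ordinal)]
    return placeholder


library, names = parse_idt(IDT_TEXT)
print("%s: %d ordinals" % (library, len(names)))
print("mfc_1234 -> %s" % resolve("mfc_1234", names))
```

An ordinal that is missing from the file is left as it was, which matches what you see in IDA when an IDT file does not cover a given import.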