Going reverse on reversing tools…

One of the oldest and most popular reversing tools is IDA Pro (usually bundled with its multiple decompilers and plug-ins). Over the years, the creators of this tool have introduced a lot of substantial changes to the software and, in parallel, a lot of changes have been introduced to the programming frameworks that newer IDA Pro versions rely on (namely, Python).

I bet some old OG IDA Pro users and reversers still remember the time they spent writing their first IDC scripts… and I bet some younger OGs reminisce about similarly memorable moments with their early IDAPython scripts based on Python 2.x. Unfortunately, with the many changes introduced to Python and IDAPython over the years, we now live in a world where multiple parallel IDA Pro universes exist. A world in which many very useful plug-ins and scripts that just… used to ‘work’… today fail to work, and they do so miserably…

Over the last 10 years or so, I’ve spent a substantial amount of time trying to port many of these older IDAPython-based plug-ins/scripts to that ‘latest, newest version’ of IDAPython du jour. It actually takes a lot of time, mainly because I am not the best person for this kind of upgrading task, and because the frequent changes on so many fronts are really hard to keep up with. So… in many of the instances where I actually tried, I ended up just throwing in the towel… It’s just not worth it, and I must say that I have arrived at this sad conclusion on more than one occasion…

BUT I STILL LOVE SOME OF THESE OLD PLUG-INS!

Then one day it hit me.

The answer to all our IDA plug-in code incompatibility problems is… keeping multiple IDA Pro versions installed at the same time! And then, using the IDA Pro version these plug-ins or scripts were created for – run them, collect their output and… incorporate this output into our ‘working’ IDA database, typically created by ‘the latest and greatest’ version of IDA.

It may sound stupid, but it actually does work quite well!

Today I temporarily install older IDA Pro versions on my test malware box on a regular basis. They are often a few years old, totally obsolete, but they offer one important feature to me – they still run that specific old plug-in/script code for me!

How does it work in practice?

In generic terms, you just install the old version of IDA, open the sample you are working on in that old IDA version, and then run the actual code (plug-in or script) that this particular IDA Pro version supports. After that, you export the database as an IDC script (File -> Produce File -> Dump database to IDC file). Then you edit that exported IDC script so that it only calls the functions or snippets of code that introduce the changes you want… Then you import it into your ‘current’/‘working’ IDA database by running it there as a script…

And yeah, at first sight, this exported IDC script may look messy, but it’s easy to navigate, plus we can quickly notice that we can comment out all the unnecessary function calls inside its main function, and then we just focus on the functions we really want to execute – some of them adding structures, enums, naming locations, adding comments, etc.

The main function of a typical IDA-exported IDC script looks like this:

static main(void)
{
        // set 'loading idc file' mode
        set_inf_attr(INF_GENFLAGS, INFFL_LOADIDC|get_inf_attr(INF_GENFLAGS));
        GenInfo();            // various settings
        Segments();           // segmentation
        Enums();              // enumerations
        Structures();         // structure types
        ApplyStrucTInfos();   // structure type infos
        Patches();            // manual patches
        SegRegs();            // segment register values
        Bytes();              // individual bytes (code,data)
        Functions();          // function definitions
        // clear 'loading idc file' mode
        set_inf_attr(INF_GENFLAGS, ~INFFL_LOADIDC&get_inf_attr(INF_GENFLAGS));
}

Many of these high-level functions include call outs to similarly named, second-level functions – it’s actually really easy to follow and edit them. For example, if the main function calls out a Bytes function it’s most likely we will see it making call outs to multiple functions prefixed with the word Bytes:

static Bytes(void) {
        Bytes_0();
        Bytes_1();
        Bytes_2();
        Bytes_3();
        Bytes_4();
        end_type_updating(UTP_STRUCT);
}

You can quickly eyeball all of these functions’ bodies and decide which code/function call to comment out…
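If the export is large, a rough first pass can be automated. The following is a minimal sketch, not something IDA provides: summarize_idc is a hypothetical helper name, and the sample text is made up for illustration. It counts which calls appear inside each static function of an exported IDC file, so it is easy to see which Bytes_N chunk carries names or comments. It is deliberately naive (it does not understand string literals), so treat its output as a hint only.

```python
import re
from collections import Counter

# 'static Name(' at the start of a line opens a new IDC function.
FUNC_RE = re.compile(r"^static\s+(\w+)\s*\(")
# Any identifier followed by '(' is treated as a call (naive on purpose).
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def summarize_idc(idc_text):
    """Map each 'static name(...)' function to a Counter of calls in its body."""
    summary = {}
    current = None
    for line in idc_text.splitlines():
        header = FUNC_RE.match(line)
        if header:
            current = header.group(1)
            summary[current] = Counter()
            continue
        if current is None:
            continue
        code = line.split("//", 1)[0]  # drop trailing // comments
        summary[current].update(CALL_RE.findall(code))
    return summary


# Made-up sample in the style of an exported IDC file (not real IDA output).
SAMPLE = '''\
static Bytes_0(void) {
        set_name(0x401000, "start");
        set_cmt(0x401005, "decrypts the config", 0);
        set_name(0x401020, "decrypt_config");
}

static Bytes(void) {
        Bytes_0();
}
'''

if __name__ == "__main__":
    for func, calls in summarize_idc(SAMPLE).items():
        print(func, dict(calls))
```

On a real export you would read the .idc file from disk and pass its text to summarize_idc instead of SAMPLE.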

It usually takes less than 5-10 minutes to do so, and as a result you cherry-pick the exact metadata you want to import into your ‘working’ database (usually the one created with the latest available version of IDA Pro).
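The commenting-out step can also be scripted. This is only a sketch under a few assumptions: cherry_pick_main is a hypothetical name, the export has a main function shaped like the one shown earlier, and only the argument-less top-level calls (Enums();, Bytes();, and so on) are candidates for commenting out. Calls with arguments, such as set_inf_attr(...), are left alone.

```python
import re

# Matches lines such as '        Enums();              // enumerations'
NOARG_CALL_RE = re.compile(r"^(\s*)([A-Za-z_]\w*)\(\);(.*)$")


def cherry_pick_main(idc_text, keep):
    """Comment out every argument-less call in main() whose name is not in keep."""
    out = []
    in_main = False
    for line in idc_text.splitlines():
        if line.startswith("static main("):
            in_main = True
        elif in_main and line.startswith("}"):
            in_main = False
        call = NOARG_CALL_RE.match(line) if in_main else None
        if call and call.group(2) not in keep:
            indent, name, rest = call.groups()
            line = "%s// %s();%s" % (indent, name, rest)
        out.append(line)
    return "\n".join(out) + "\n"


# Shortened copy of the main function shown earlier in the post.
SAMPLE_MAIN = '''\
static main(void)
{
        set_inf_attr(INF_GENFLAGS, INFFL_LOADIDC|get_inf_attr(INF_GENFLAGS));
        Enums();              // enumerations
        Structures();         // structure types
        Bytes();              // individual bytes (code,data)
        Functions();          // function definitions
        set_inf_attr(INF_GENFLAGS, ~INFFL_LOADIDC&get_inf_attr(INF_GENFLAGS));
}
'''

if __name__ == "__main__":
    # Keep only the enums and structures; everything else gets commented out.
    print(cherry_pick_main(SAMPLE_MAIN, keep={"Enums", "Structures"}))
```

The second-level functions (Bytes_0 and friends) are not touched here; trimming those is still a manual job.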

In the last few years I have used this approach many times, often during time-sensitive malware analysis engagements, and am happy to report that it does work quite well.

Aka: don’t fight the system, use it.

The other avenue to pursue here is to make subtle, cosmetic modifications to the old plug-ins’ code so that it generates code that is ‘compatible’ with the most up-to-date version of IDA and its IDAPython modules.

We can do it, because the output of many IDAPython reversing scripts is pretty predictable:

  • new names
  • new labels
  • new comments
  • list of offsets for code/data patching and actual patches
  • extracted / decrypted configs, strings
  • etc.

It’s actually very easy to add small code snippets to the old plug-ins that will generate a precise list of instructions written for the latest version of IDAPython, and then save them into a temporary IDAPython script file. Such dynamically generated code can then be executed within the context of the database opened with the latest version of IDA Pro. Easy peasy.
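As an illustration of the idea, here is a minimal sketch of such a snippet. Everything in it is an assumption made for the example: emit_idapython is a hypothetical helper, the addresses and names are invented, and the emitted calls (idc.set_name, idc.set_cmt, ida_bytes.patch_byte) are the IDA 7.x-era spellings as far as I know, so check them against the IDAPython version you actually target. The generator itself sticks to %-formatting so it should also run under the Python 2 interpreter of an old IDA.

```python
def emit_idapython(names, comments, patches):
    """Turn collected results into the text of a script for a newer IDAPython.

    names:    iterable of (address, name)
    comments: iterable of (address, comment text)
    patches:  iterable of (address, byte value)
    """
    lines = ["import idc", "import ida_bytes", ""]
    for ea, name in names:
        lines.append("idc.set_name(0x%X, %r)" % (ea, name))
    for ea, text in comments:
        # Third argument 0 = regular (non-repeatable) comment.
        lines.append("idc.set_cmt(0x%X, %r, 0)" % (ea, text))
    for ea, value in patches:
        lines.append("ida_bytes.patch_byte(0x%X, 0x%02X)" % (ea, value))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Invented results, standing in for whatever the old plug-in computed.
    script = emit_idapython(
        names=[(0x401000, "decrypt_config")],
        comments=[(0x401010, "xor key, 4 bytes")],
        patches=[(0x401020, 0x90)],
    )
    print(script)
```

In the old plug-in you would call something like this at the very end, write the returned text to a .py file, and then run that file in the new IDA.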

The bottom line is this:

  • make multiple versions of your reverse engineering tools work for you

Hunting for Windows API prototypes and descriptions…

Over the years I have made a lot of attempts to systematically extract Windows API information from various sources, but primarily, of course, from Microsoft help documentation available at different times, in different forms and file formats. If you need to ask… I really needed an ‘actionable’ dump of these for my API monitor, and I also wanted to have it all available for quick & dirty reference, for both coding and reversing purposes. Plus, as I will explain later, for other purposes. Unsurprisingly, this strange journey ended up being closely aligned with the never-ending changes to Microsoft help system, and it naturally ended up with me fighting a ‘lost by default’, bitter battle against the odds, for many years…

~20 years ago win32.hlp was THE file you needed and wanted. It included descriptions of many Windows API functions and was a gold mine when it came to understanding the myriads of parameters, return values, and context required to use most of these popular Windows APIs properly. Interestingly, one could decompile the content of that .hlp file into a super large RTF file. The result was a bit difficult to parse, but lots of textual data could be made accessible this way, kinda programmatically, and kinda easily.

HLP files were the WinHelp files. Microsoft Help system 1.0.

Next, if I remember correctly, some of the Microsoft DDKs started including .chm files. One could decompile these to get access to raw, yet kinda uniformly formatted HTML files, and these could be parsed as well. I don’t recall this format really taking off, but I may be mistaken.

CHM files were the Microsoft Compiled HTML Help files. Microsoft Help system 1.x.

Then came the HxS files. I loved them very much, because these were JUICY. Decompiling them was not difficult, and as a result you would get lots of very nicely formatted data files for parsing. I think it was also the first time XML was used for Windows API help, but again, I may be mistaken. Sadly, I don’t have many of my working files left from these times.

HxS files were the Microsoft Help 2 files. Microsoft Help system 2.0.

And then the Help files migrated one more time. This time to a local, online system…

http://127.0.0.1:47873/help/<version>/ms.help...

The address above was where all the juice was stored. By sending a set of additional requests one could enumerate all the pages, one by one, and many of these covered functions, methods, structures, etc… These could then be saved and parsed. Interestingly, while requesting these pages we were able to choose the format of the delivered pages, and XML was both a novelty at that time and something we wanted very much! That was probably the first time ever that Windows API information was stored, and made accessible, in such a consistent and parsable format to everyone who _knew_!

It was Microsoft Help Viewer aka Microsoft Help system 3.x.

Today API help is no longer that interesting (okay, that is a lie!), but thankfully, it is stored primarily online. Interestingly, after all the different formats from the past, it is now stored in Markdown format (*.md).

Now, the main reason I am writing about the history of help files is to bring your attention to msdocsviewer. This is a new IDA plug-in written by Alexander Hanel. Once you install this plug-in, all you have to do is go to any Windows API referenced in the code you analyze in IDA and then press CTRL+SHIFT+Z. A panel with all the information about that ‘highlighted’ API will pop up. You can dock that panel and then continue pressing CTRL+SHIFT+Z on other API functions to see their details as you go along. In my eyes, as of 2023, this is the best Windows API helper that has ever been written. Idascope was cool, Mandiant’s plug-in was cool, but now we have msdocsviewer and it’s TRULY COOL. It works like a charm and I highly recommend it.

I will end this post with a few data dumps.

You may think this is the end of the post, but it’s not. If you look at the file content of 2013_apis.zip/list_final8 you will notice one thing: not only did I extract the function information that is typically available (a prototype), but I also tried to extract information about all the constants that a particular function’s parameter or argument would refer to. Hence, e.g. for CreateFile, I would generate this information:

TITLE=CreateFile function
  FUN=CreateFile
    ARG=_In_      LPCTSTR lpFileName,
    ARG=_In_      DWORD dwDesiredAccess,
    ARG=_In_      DWORD dwShareMode,
    ARG=_In_opt_  LPSECURITY_ATTRIBUTES lpSecurityAttributes,
    ARG=_In_      DWORD dwCreationDisposition,
    ARG=_In_      DWORD dwFlagsAndAttributes,
    ARG=_In_opt_  HANDLE hTemplateFile
    RET=HANDLE WINAPI
      PAR=lpFileName  [in]
      PAR=dwDesiredAccess  [in]
      PAR=dwShareMode  [in]
      VALUES=
           VAL=0
           VAL=FILE_SHARE_DELETE
           VAL=FILE_SHARE_READ
           VAL=FILE_SHARE_WRITE
      PAR=lpSecurityAttributes  [in, optional]
      PAR=dwCreationDisposition  [in]
      VALUES=
           VAL=CREATE_ALWAYS
           VAL=CREATE_NEW
           VAL=OPEN_ALWAYS
           VAL=OPEN_EXISTING
           VAL=TRUNCATE_EXISTING
      PAR=dwFlagsAndAttributes  [in]
      PAR=hTemplateFile  [in, optional]
    LIB=Kernel32.lib
    DLL=Kernel32.dll
    HDR=FileAPI.h (include Windows.h); WinBase.h on Windows Server 2008 R2, Windows 7, Windows Server 2008, Windows Vista, Windows Server 2003, and Windows XP (include Windows.h)
    UNI=CreateFileW
    ANS=CreateFileA
    MINC=Windows XP
    MINS=Windows Server 2003

Do you see where it is heading?

Yes, I was writing all these parsers with one thing in mind. If I can use this information not only to build a list of APIs, their arguments and their in/out properties, but ALSO to reference the constants they refer to, or expect, then I may be in a position to generate stubs for handling some of the hooked APIs in my sandbox that (with minor edits) can give me ‘string’ representations of immediate values, or bit masks, for most APIs!
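To make this more concrete, the record format shown above can be read back with a few lines of code. This is a minimal sketch, not the original parser: parse_api_record is a hypothetical name, the sample is a shortened copy of the CreateFile record, and it assumes the KEY=value layout is as regular as in that one example.

```python
def parse_api_record(text):
    """Parse one TITLE/FUN/ARG/PAR/VAL record into a plain dict."""
    api = {"args": [], "params": {}}
    current = None  # parameter that the following VAL= lines belong to
    for raw in text.splitlines():
        line = raw.strip()
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if key == "FUN":
            api["name"] = value
        elif key == "ARG":
            api["args"].append(value.rstrip(","))
        elif key == "PAR":
            current = value.split()[0]  # 'dwShareMode  [in]' -> 'dwShareMode'
            api["params"][current] = []
        elif key == "VAL" and current is not None:
            api["params"][current].append(value)
        elif key in ("RET", "LIB", "DLL", "UNI", "ANS"):
            api[key.lower()] = value
    return api


# Shortened copy of the CreateFile record from this post.
SAMPLE_RECORD = '''\
TITLE=CreateFile function
  FUN=CreateFile
    ARG=_In_      LPCTSTR lpFileName,
    ARG=_In_      DWORD dwShareMode,
    RET=HANDLE WINAPI
      PAR=lpFileName  [in]
      PAR=dwShareMode  [in]
      VALUES=
           VAL=0
           VAL=FILE_SHARE_DELETE
           VAL=FILE_SHARE_READ
           VAL=FILE_SHARE_WRITE
    DLL=Kernel32.dll
    UNI=CreateFileW
'''

if __name__ == "__main__":
    record = parse_api_record(SAMPLE_RECORD)
    print(record["name"], record["params"]["dwShareMode"])
```

Parameters without a VALUES= block simply end up with an empty list, which is a convenient signal that there is nothing to decode for them.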

And it worked! It was a HUGE helper at that time, as I could just generate these stubs, edit them a bit, and within minutes I would be in a position to support yet another API w/o going through the painful process of analysing the documentation of each API individually. And by ‘handling’ I mean adding code that showed both the decimal/hexadecimal values passed to, or returned by, a function and their string equivalents, where applicable. Of course I had to correct some of these automatically generated stubs, but it was far easier than doing everything from scratch, for each and every API I wanted to hook.
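A stub of this kind boils down to a small lookup. The sketch below is an illustration rather than the original sandbox code: describe is a hypothetical helper, and the numeric values are the usual Windows SDK definitions for these CreateFile constants. Note that the dump does not say whether a parameter is a set of flags or a single choice, so that decision (the flags argument here) is exactly the sort of ‘minor edit’ that has to be made by hand.

```python
# Values as defined in the Windows SDK headers.
FILE_SHARE = {
    "FILE_SHARE_READ": 0x1,
    "FILE_SHARE_WRITE": 0x2,
    "FILE_SHARE_DELETE": 0x4,
}
CREATION_DISPOSITION = {
    "CREATE_NEW": 1,
    "CREATE_ALWAYS": 2,
    "OPEN_EXISTING": 3,
    "OPEN_ALWAYS": 4,
    "TRUNCATE_EXISTING": 5,
}


def describe(value, table, flags):
    """Render a raw argument value as 'hex (symbolic form)'."""
    if flags:
        names = []
        known = 0
        for name, bit in sorted(table.items(), key=lambda kv: kv[1]):
            known |= bit
            if bit and (value & bit) == bit:
                names.append(name)
        leftover = value & ~known  # bits that no constant explains
        if leftover:
            names.append("0x%X" % leftover)
        label = "|".join(names) if names else "0"
    else:
        label = next((n for n, v in table.items() if v == value), "?")
    return "0x%X (%s)" % (value, label)


if __name__ == "__main__":
    print(describe(3, FILE_SHARE, flags=True))
    print(describe(3, CREATION_DISPOSITION, flags=False))
```

Unknown bits are kept in the output instead of being dropped, because in a sandbox log an unexplained value is often the most interesting part.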

And when I worked on my Frida monitor, I used the very same principle, hence some of the code covers constants pretty well. In my eyes, a good sandbox is one that understands both arguments and return values well, and then presents them to the user to better contextualize what is actually happening…