Update

It’s been a while since I wrote anything here. I was on holiday and then moved to a new place right after coming back. I have finally settled into the new apartment and am looking forward to playing with some new ideas.

So, here is a short update:

  • I fixed a silly bug in HAPI – I had mixed up the CR & LF characters in the output, which looked awkward to say the least, not to mention the potential parsing issues; thx to Pedro L. for spotting this and notifying me
  • HAPI may occasionally print some strings that do not look like APIs, e.g. ‘version’; this is not a bug, but a feature 😉 It turns out that there is such an API exported by one of the Microsoft DLLs. Since I don’t want to miss any API, I made a trade-off and include all of them. I still use a few heuristics to avoid printing many of them, but some will occasionally get through, so please always verify the output manually. And for the curious – some Microsoft programmers decided to name certain APIs using one or two characters; I dunno why they do stuff like this, but there are legitimate system DLLs exporting functions named ‘u’, ‘vo’, etc.
  • I recently discovered that Symantec’s VBN files can be encrypted not only with 0x5A, but also with 0xA5; these files are still handled by DeXRAY, since it relies on the XRAYS technique, which searches for and extracts encrypted executables without needing to know the specific key; but if you parse VBN files yourself, knowing that 0xA5 is also in use may save you some time
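If you do parse such files yourself, the two-key check can be sketched roughly as below. This is only a minimal illustration, assuming the data is protected with a plain single-byte XOR using one of the two keys mentioned above; it is not how DeXRAY works, and the function names are made up for this example. A 2-byte match is weak evidence on its own, so treat any hit as a candidate to verify.

```python
CANDIDATE_KEYS = (0x5A, 0xA5)  # the two single-byte keys mentioned above


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)


def find_xored_mz(data: bytes, keys=CANDIDATE_KEYS):
    """Return (key, offset) of the earliest 'MZ' encoded with a candidate key.

    Returns None if no candidate key produces a match.
    """
    best = None
    for key in keys:
        needle = xor_bytes(b"MZ", key)  # what 'MZ' looks like once encoded
        offset = data.find(needle)
        if offset != -1 and (best is None or offset < best[1]):
            best = (key, offset)
    return best
```

Once a candidate is found, xor_bytes(data[offset:], key) gives the decoded bytes, which should then be checked as a proper executable.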

MZ File format flavors & malware

Analyzing files starting with the ‘MZ’ magic value can be called the “daily bread” of reverse engineers. The reason for this is pretty simple – if you look at the top of your average executable file, you will notice that the majority of them start with these 2 magic letters. Since it’s the most common file format that malware analysts work with, in this post I will take a deeper (but still high-level) look at files of this type.

There are so many types of executables starting with ‘MZ’ that looking at the first 2 bytes is often not enough. In fact, there are so many flavors of MZ files that it’s pretty hard to list them all, but let’s try anyway:

  • 16-bit, 32-bit and 64-bit executables
  • PC and mobile executables
  • x86, x64, IA64, AMD64, etc.
  • .NET
  • Executables for Windows 3.1 and Windows 9x/NT (‘NE’ vs. ‘PE’)
  • Drivers for Windows 3.1/Windows 9x and Windows NT (‘LE’ vs. ‘PE’)
  • GUI applications and console applications
  • User mode executables (processes, services – usually saved as files with the .exe or .scr extension) and Dynamic-Link Libraries (saved as files with the .dll extension; others are saved as .cpl, .ocx, .vbx, etc.)
  • User mode executables (processes) and services (service processes)
  • Kernel mode drivers (.sys, .drv) and kernel mode libraries (also saved with a .sys file extension)
  • Standard DLLs and COM DLLs (e.g. ActiveX, Browser Helper Objects)
  • Standard DLLs and Service DLLs (loaded by svchost.exe)
  • Dedicated DLL files (e.g. LSP, Shell extensions, deskbands, Plugins, MSGINA, windows hooks, etc.)
  • Old-school standalone executables (‘DOS type’)
  • Files produced by various compilers: Microsoft Visual Studio, Borland Delphi, Visual Basic, mingw32, gcc and many more.
  • Files produced by various script compilers e.g. perl2exe, py2exe, php2exe, AutoIt, WinBatch, etc.
  • Installers e.g. Nullsoft, InnoSetup, Wise, Vyse, etc.
  • Resource-only files e.g. fonts
  • Executables with overlays
  • Executables with appended data
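To show why the first 2 bytes are not enough, here is a rough sketch that separates a few of the flavors above by following the e_lfanew field at offset 0x3C to the secondary signature (‘NE’, ‘LE’, ‘PE’) and then reading a handful of PE header fields. The function name, the returned dictionary and the short machine table are assumptions made for this example; real files (especially malicious ones) can be truncated or malformed, so a proper parser needs far more validation.

```python
import struct

MACHINES = {0x014C: "x86", 0x8664: "x64 (AMD64)", 0x0200: "IA64", 0x01C0: "ARM"}
SUBSYSTEMS = {1: "native", 2: "GUI", 3: "console"}
IMAGE_FILE_DLL = 0x2000


def classify_mz(data: bytes) -> dict:
    """Very rough first-pass classification of a file that starts with 'MZ'."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return {"format": "not MZ"}
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    sig = data[e_lfanew:e_lfanew + 4]
    if sig[:2] == b"NE":
        return {"format": "NE"}
    if sig[:2] in (b"LE", b"LX"):
        return {"format": sig[:2].decode()}
    if sig != b"PE\0\0":
        return {"format": "DOS MZ"}  # no recognised new-style header
    opt = e_lfanew + 24  # 4-byte signature + 20-byte COFF header
    if len(data) < opt + 70:
        return {"format": "PE (truncated)"}
    (machine,) = struct.unpack_from("<H", data, e_lfanew + 4)
    (characteristics,) = struct.unpack_from("<H", data, e_lfanew + 22)
    (magic,) = struct.unpack_from("<H", data, opt)
    (subsystem,) = struct.unpack_from("<H", data, opt + 68)
    return {
        "format": {0x10B: "PE32", 0x20B: "PE32+"}.get(magic, "PE (unknown magic)"),
        "machine": MACHINES.get(machine, hex(machine)),
        "dll": bool(characteristics & IMAGE_FILE_DLL),
        "subsystem": SUBSYSTEMS.get(subsystem, str(subsystem)),
    }
```

Calling classify_mz(open(path, "rb").read()) on a sample gives a quick first triage; it says nothing about packers, installers or script compilers, which need their own checks.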

From a malware analysis point of view, we also have to include another categorization, one closely related to the “extra” file properties often added by malware authors, including:

  • compression (packing)
  • encryption
  • wrapping
  • obfuscation
  • protection
  • corruption
  • virtualization
  • misleading information
  • anti-techniques
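One common, rough way to spot the first two properties (compression and encryption) is to measure byte entropy. This is a generic heuristic rather than anything specific to this post: values close to 8 bits per byte suggest compressed or encrypted data, but they prove nothing by themselves, and the function name below is just an illustration.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

It is usually more informative to compute this per section or per fixed-size block than over the whole file, since a small packed payload can hide inside an otherwise ordinary executable.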

Finally, we can use as a classifier the presence and the content of the following metadata:

  • Rich header
  • Number of Sections
  • Characteristics of Sections (writable, readable, executable, etc.)
  • Characteristics of Import and export table
  • Debugging information (including timestamps and paths to .PDB files)
  • Resources information
  • Digital signatures
  • Appended data
  • Compiler-specific information e.g. debug information, or PACKAGEINFO for Delphi applications

This is all super high-level, but as you may guess, analyzing each kind of executable on this list requires a completely different approach.

 

Update #1:

Fixed a mistake related to NE/PE – NE files were replaced by PE files on 32-bit Windows; thx to Imaginative (one of the best reversers I know) for picking it up 🙂

Update #2:

Just to clarify: NE files still run on Win XP, and this file format is still used to store .fon files (thx Ange @ corkami.com – he is one of the best binary magicians out there!)