Analyzing files starting with the ‘MZ’ magic value is the daily bread of reverse engineers. The reason is simple: look at the top of your average executable file and you will notice that the majority of them start with these 2 magic letters. Since it’s the most common file format that malware analysts work with, in this post I will take a deeper (but still high-level) look at files of this type.
There are so many types of executables starting with ‘MZ’ that looking at the first 2 bytes is often not enough. In fact, there are so many flavors of MZ files that it’s pretty hard to list them all, but let’s try anyway:
- 16-bit, 32-bit and 64-bit executables
- PC and mobile executables
- x86, x64 (AMD64), IA64, etc.
- .NET
- Executables for Windows 3.1 and Windows 9x/NT (‘NE’ vs. ‘PE’)
- Drivers for Windows 3.1/Windows 9x and Windows NT (‘LE’ vs. ‘PE’)
- GUI applications and console applications
- User mode executables (processes, services; usually saved as files with the .exe or .scr extension) and Dynamic-Link Libraries (saved as files with the .dll extension; others are saved as .cpl, .ocx, .vbx, etc.)
- User mode executables (processes) and services (service processes)
- Kernel mode drivers (.sys, .drv) and kernel mode libraries (also saved with a .sys file extension)
- Standard DLLs and COM DLLs (e.g. ActiveX, Browser Helper Objects)
- Standard DLLs and Service DLLs (loaded by svchost.exe)
- Dedicated DLL files (e.g. LSP, Shell extensions, deskbands, Plugins, MSGINA, windows hooks, etc.)
- Old-school standalone executables (‘DOS type’)
- Files produced by various compilers: Microsoft Visual Studio, Borland Delphi, Visual Basic, mingw32, gcc and many more.
- Files produced by various script compilers e.g. perl2exe, py2exe, php2exe, AutoIt, WinBatch, etc.
- Installers e.g. Nullsoft, InnoSetup, Wise, Vyse, etc.
- Resource-only files e.g. fonts
- Executables with overlays
- Executables with appended data
- …
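To illustrate why the first 2 bytes are not enough, here is a minimal Python sketch that follows the `e_lfanew` field at offset 0x3C of the DOS header and looks at the signature it points to. The function name `classify_mz` and the returned labels are made up for this post, the machine table is deliberately incomplete, and real-world (especially malformed) samples need far more careful checks.

```python
import struct

# Partial map of COFF Machine values; anything else is reported as hex.
MACHINES = {0x014C: "i386", 0x8664: "AMD64", 0x0200: "IA64",
            0x01C0: "ARM", 0xAA64: "ARM64"}
IMAGE_FILE_DLL = 0x2000  # COFF Characteristics flag


def classify_mz(data: bytes) -> str:
    """Return a coarse, illustrative label for a buffer that may be an MZ file."""
    if data[:2] != b"MZ":
        return "not MZ"
    if len(data) < 0x40:
        return "DOS MZ"
    # e_lfanew: file offset of the 'new' header (NE, LE, LX or PE)
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if e_lfanew == 0 or e_lfanew + 4 > len(data):
        return "DOS MZ"
    sig = data[e_lfanew:e_lfanew + 4]
    if sig[:2] == b"NE":
        return "NE"
    if sig[:2] in (b"LE", b"LX"):
        return "LE/LX"
    if sig != b"PE\0\0" or e_lfanew + 26 > len(data):
        return "DOS MZ"
    # COFF header follows the 4-byte PE signature
    (machine,) = struct.unpack_from("<H", data, e_lfanew + 4)
    (characteristics,) = struct.unpack_from("<H", data, e_lfanew + 22)
    # Optional header magic: 0x10B = PE32, 0x20B = PE32+
    (magic,) = struct.unpack_from("<H", data, e_lfanew + 24)
    bits = {0x10B: "PE32", 0x20B: "PE32+"}.get(magic, "PE")
    arch = MACHINES.get(machine, hex(machine))
    kind = "DLL" if characteristics & IMAGE_FILE_DLL else "EXE"
    return f"{bits} {arch} {kind}"
```

Calling `classify_mz(open(path, "rb").read())` on a handful of samples is usually enough to split them into the DOS, NE, LE and PE buckets before any deeper analysis.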
From a malware analysis point of view, we also have to include another categorization, closely related to the “extra” file properties often added by malware authors, including:
- compression (packing)
- encryption
- wrapping
- obfuscation
- protection
- corruption
- virtualization
- misleading information
- anti-techniques
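Some of these properties, compression and encryption in particular, tend to show up as high byte entropy. The sketch below is a common rule-of-thumb heuristic, not something this list prescribes: it computes the Shannon entropy of a buffer in bits per byte. The helper name `shannon_entropy` is hypothetical, and how high a value counts as "suspicious" is a judgment call that depends on the sample set.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte buffer, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Plain code and text usually land well below 8 bits per byte, while packed or encrypted data gets close to it, so running this per section is often more telling than running it on the whole file.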
Finally, we can use as a classifier the presence and the content of the following metadata:
- Rich header
- Number of Sections
- Characteristics of Sections (writable, readable, executable, etc.)
- Characteristics of Import and Export tables
- Debugging information (including timestamps and paths to .PDB files)
- Resources information
- Digital signatures
- Appended data
- Compiler-specific information e.g. debug information, or PACKAGEINFO for Delphi applications
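A few of these items can be pulled straight from the PE headers. The following is a rough Python sketch, assuming a well-formed PE file already read into memory. It walks the section table, reports each section's name and read/write/execute flags, and counts the bytes that follow the last section's raw data (the appended data). The name `describe_sections` is made up for illustration. Note that a digital signature is also stored after the last section, so a non-zero count is not suspicious by itself.

```python
import struct

IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def describe_sections(data: bytes):
    """Return ([(name, 'rwx'-style flags), ...], appended_byte_count)."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not an MZ file")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE file")
    (num_sections,) = struct.unpack_from("<H", data, e_lfanew + 6)
    (opt_size,) = struct.unpack_from("<H", data, e_lfanew + 20)
    # Section table starts right after the optional header
    table = e_lfanew + 24 + opt_size
    end_of_image = table + 40 * num_sections
    sections = []
    for i in range(num_sections):
        off = table + 40 * i  # each section header is 40 bytes
        name = data[off:off + 8].rstrip(b"\0").decode("ascii", "replace")
        raw_size, raw_ptr = struct.unpack_from("<II", data, off + 16)
        (flags,) = struct.unpack_from("<I", data, off + 36)
        perms = (("r" if flags & IMAGE_SCN_MEM_READ else "-")
                 + ("w" if flags & IMAGE_SCN_MEM_WRITE else "-")
                 + ("x" if flags & IMAGE_SCN_MEM_EXECUTE else "-"))
        sections.append((name, perms))
        end_of_image = max(end_of_image, raw_ptr + raw_size)
    appended = max(0, len(data) - end_of_image)
    return sections, appended
```

A section that is both writable and executable, an unusually low number of sections, or a large amount of appended data are the kinds of details worth a second look.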
It is super high-level, but as you may guess, analyzing each kind of executable on this list requires a completely different approach.
Update #1:
fixed a mistake related to NE/PE – NE files have been replaced by PE files on 32-bit Windows; thx to Imaginative (one of the best reversers I know) for picking it up 🙂
Update #2:
Just to clarify: NE files still run on Win XP + this file format is being used to store .fon files (Thx Ange @ corkami.com – he is one of the best binary magicians out there!)