String analysis for n00bs

I like to demo this little Windows executable to everyone who thinks they are doing the reverse engineering bit right by running the available automated static and dynamic analysis tools and trusting them blindly.

The sample is a PE32 that is 2560 bytes long. Running ‘strings’ over it produces these results:

!This program cannot be run in DOS mode.
Rich
.text
`.rdata
@.data
8/u
ExitProcess
GetCommandLineA
kernel32.dll
GetStdHandle
WriteFile
Hello World!
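For readers who have never looked inside a ‘strings’-style tool, here is a minimal sketch of the idea in Python. It is not the actual ‘strings’ implementation; the function name ascii_strings and the min_len parameter are made up for illustration, and it only looks for printable ASCII runs (no UTF-16, no tabs). The sample bytes below are hand-made, not taken from the real file.

```python
import re

def ascii_strings(data: bytes, min_len: int = 4) -> list:
    """Return every run of printable ASCII that is at least min_len bytes long."""
    # 0x20..0x7e is the printable ASCII range; {N,} means "N or more of them".
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

if __name__ == "__main__":
    # Hand-made bytes: a real string plus a fragment of code that happens
    # to be printable (0x38 0x2F 0x75 reads as "8/u").
    sample = b"\x00\x01Hello World!\x00\x80\x38/u\x05\xff"
    print(ascii_strings(sample, min_len=3))  # ['Hello World!', '8/u']
    print(ascii_strings(sample))             # ['Hello World!']
```

Note that a short hit like ‘8/u’ only shows up when the minimum length is 3 or lower, and that such hits are often just code bytes that happen to be printable.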

Running it from a CLI prints the following text to STDOUT:

Hello World!

One can say that both static and dynamic analysis give us the same output. Based on this info, it seems kinda obvious that this small binary is a simple CLI program that prints out ‘Hello World!’ when executed.

Except that only code analysis can tell us that the program behaves differently if we pass a ‘/h’ argument to it.

In that case, dynamic analysis will show the following string being printed to STDOUT:

Hello Baby!!
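How can a second message hide from ‘strings’? The sample's actual code is not shown here, so the following Python sketch is purely a guess at one possible mechanism: the replacement text is stored XOR-ed with a key (0x80 is an assumed value) and patched over the visible string only when ‘/h’ is present. The names HIDDEN, KEY and greeting are hypothetical.

```python
# Hypothetical reconstruction; the real sample may work quite differently.
KEY = 0x80  # assumed single-byte XOR key
# "Baby!!" XOR 0x80: none of these bytes is printable ASCII,
# so a strings-style scan of the file would not report them.
HIDDEN = bytes([0xC2, 0xE1, 0xE2, 0xF9, 0xA1, 0xA1])

def greeting(command_line: str) -> bytes:
    """Return the text the program would write for a given command line."""
    text = bytearray(b"Hello World!")
    # Skip the program name, then look for the switch among the arguments.
    if "/h" in command_line.split()[1:]:
        # Overwrite "World!" (offsets 6..11) with the decoded hidden bytes.
        text[6:] = bytes(b ^ KEY for b in HIDDEN)
    return bytes(text)

if __name__ == "__main__":
    print(greeting("sample.exe"))     # b'Hello World!'
    print(greeting("sample.exe /h"))  # b'Hello Baby!!'
```

A sandbox that runs the file with no arguments only ever exercises the first branch, which is exactly the blind spot described above.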

Static analysis was done right. Default dynamic analysis was done right. And code analysis was done right too. It’s just the automation that failed.

Just a reminder that we can’t blindly trust the automation, because it only sees the obvious. And command line arguments are not the only way to trigger execution of a different branch of code. It could be a guard rail of any sort: time of day, locale of the OS, a delayed payload, a payload downloaded from a site that is not available at the moment, etc.

In the interest of full disclosure: I have not ‘analyzed’ this sample with any AI framework, so I am still hopeful that at least some of them would see through this little mischief.

ZydisInfo – the disassembler that breaks the code, twice

The moment I heard of machine code and its opcodes… I fell in love. Being able to understand machine code from just looking at the binary (okay, mostly its hexadecimal representation) felt like magic. And since many simple x86 assembly instructions are quite easy to decipher, I really liked the fact I could not only ‘read some of the code’ by just looking at binary, but also use that knowledge to patch code here and there, too.

Of course, today everyone knows about nopping code with 0x90, or changing conditional jumps from 0x74 or 0x75 to 0xEB, but back then it was something special. Unfortunately, once you learn the basics, this feeling doesn’t last long, because the opcodes got … complicated, and they did so pretty quickly, too. The FPU, MMX, SSEn, and AVXn instructions are not for the faint-hearted, and it takes a lot of effort to understand them on a mathematical level, let alone memorize their opcodes. On top of that, new CPUs arrived, bytecode in many different forms became a thing, and we now have code virtualizers as well, so it’s really prohibitive to even think of learning any of it… unless you are a dedicated low-level code fan.
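The two classic patches mentioned above are simple enough to sketch in a few lines of Python. This is only an illustration on a hand-made byte buffer; the helper names force_jump and nop_out are made up, and the sketch covers short (rel8) jumps only.

```python
NOP = 0x90
JZ_SHORT, JNZ_SHORT, JMP_SHORT = 0x74, 0x75, 0xEB

def force_jump(code: bytearray, offset: int) -> None:
    """Turn a short JZ/JNZ at offset into an unconditional short JMP."""
    if code[offset] not in (JZ_SHORT, JNZ_SHORT):
        raise ValueError("no short JZ/JNZ at offset %d" % offset)
    # Only the opcode byte changes; the rel8 displacement stays as it was.
    code[offset] = JMP_SHORT

def nop_out(code: bytearray, offset: int, length: int) -> None:
    """Overwrite length bytes starting at offset with NOPs."""
    code[offset:offset + length] = bytes([NOP]) * length

if __name__ == "__main__":
    # test eax, eax / jz +5 / call rel32 / ret
    code = bytearray.fromhex("85 C0 74 05 E8 00 00 00 00 C3")
    force_jump(code, 2)
    print(code.hex(" "))  # 85 c0 eb 05 e8 00 00 00 00 c3
    nop_out(code, 4, 5)   # wipe the 5-byte call
    print(code.hex(" "))  # 85 c0 eb 05 90 90 90 90 90 c3
```

Both patches keep the instruction lengths intact, which is why they are so popular: nothing after the patched bytes needs to move.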

Still, even in 2023 it really helps to know some of the most important opcodes, at least in the x86/x64 world. Malware uses many tricks to obfuscate code, abuses opcodes to force incorrect disassembly, or triggers exceptions on undocumented instructions. Patching is also still a thing, and knowing at least a subset of the most popular opcodes helps to quickly understand what is going on. For example, if some random routine is looking for specific byte values that correspond to known opcodes, it’s really handy to recognize them and quickly make an educated guess that we are looking at some sort of length disassembler, or a hooking/unhooking routine…
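To make the ‘routine looking for specific byte values’ case concrete, here is a rough Python sketch of the kind of check a hook detector might perform on the first bytes of a function. It is not taken from any particular tool; the function name hook_pattern and the short list of patterns are assumptions, and real detectors check far more.

```python
from typing import Optional

def hook_pattern(prologue: bytes) -> Optional[str]:
    """Name the redirect pattern at the start of a function, if any."""
    if not prologue:
        return None
    if prologue[0] == 0xE9:                      # E9 xx xx xx xx
        return "jmp rel32"
    if prologue[0] == 0xEB:                      # EB xx
        return "jmp rel8"
    if prologue[:2] == b"\xFF\x25":              # FF 25 xx xx xx xx
        return "jmp [mem]"
    if len(prologue) >= 6 and prologue[0] == 0x68 and prologue[5] == 0xC3:
        return "push imm32; ret"                 # 68 xx xx xx xx C3
    return None

if __name__ == "__main__":
    # mov edi, edi / push ebp / mov ebp, esp: a common clean 32-bit prologue
    print(hook_pattern(bytes.fromhex("8B FF 55 8B EC")))     # None
    print(hook_pattern(bytes.fromhex("E9 10 20 30 40")))     # jmp rel32
    print(hook_pattern(bytes.fromhex("68 78 56 34 12 C3")))  # push imm32; ret
```

Seeing constants like 0xE9, 0xEB or 0xFF 0x25 compared against the first bytes of an API is usually a strong hint about what the routine is doing.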

Let’s admit it though – we can’t learn it all, so, it’s time to cheat a bit and then hopefully win some…

Knowing how complicated all of this became, for a long time I dreamed of a tool that takes a series of bytes, interprets it as code, and breaks it down into smaller chunks, so that the respective parts of the alleged machine instruction are clearly deconstructed, described, and represented: the prefixes, the opcode itself, the operation direction, the size of the argument, the MOD, REG, and R/M fields, the SIB, IMM, and DISP parts, etc., all extracted and presented to the user in a nice way…

And after thinking about it for a long time, it was only last week that I asked about a tool like this…

Thanks to Steve Eckels, we now know that such a tool does exist! It’s called ZydisInfo, and it was created by Joel Höner et al (with Florian Bernd creating most of ZydisInfo, as per this tweet).

Over the last few days I spent some time playing around with ZydisInfo and I am really impressed. This is a fantastic educational tool that many students and assembler lovers will find absolutely delightful to work with.

Let’s see a few examples:

ZydisInfo -64 “90” (NOP)

no surprise here…

ZydisInfo -64 “74 01” (short jump)

no surprise here either…

ZydisInfo -64 “67 8B 04 C1” (mov eax, dword ptr ds:[ecx+eax*8])

a more complicated example and it still works like a charm…
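To show what the breakdown of that last example involves, here is a hand-rolled Python sketch that splits the ModRM and SIB bytes of ‘67 8B 04 C1’ into their bit fields. This is not how Zydis works internally; it is a toy that handles only this one instruction shape (optional 0x67 prefix, opcode 0x8B, MOD=0, R/M=4, no REX, no displacement), and every name in it is made up.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def split_modrm(b: int):
    """ModRM byte layout: MOD (bits 7-6), REG (bits 5-3), R/M (bits 2-0)."""
    return (b >> 6) & 3, (b >> 3) & 7, b & 7

def split_sib(b: int):
    """SIB byte layout: scale (bits 7-6, as 1/2/4/8), index (5-3), base (2-0)."""
    return 1 << ((b >> 6) & 3), (b >> 3) & 7, b & 7

def decode_mov_r32_sib(raw: bytes) -> str:
    """Decode [67] 8B /r with MOD=0 and R/M=4 (a SIB byte follows)."""
    i = 1 if raw[0] == 0x67 else 0   # 0x67 = address-size override prefix
    if raw[i] != 0x8B:               # 0x8B = MOV r32, r/m32 (REG is the target)
        raise ValueError("only opcode 0x8B is handled")
    mod, reg, rm = split_modrm(raw[i + 1])
    if mod != 0 or rm != 4:
        raise ValueError("only MOD=0, R/M=4 (SIB) is handled")
    scale, index, base = split_sib(raw[i + 2])
    if index == 4 or base == 5:
        raise ValueError("no-index and disp32-base forms are not handled")
    return "mov %s, dword ptr [%s+%s*%d]" % (
        REG32[reg], REG32[base], REG32[index], scale)

if __name__ == "__main__":
    raw = bytes.fromhex("67 8B 04 C1")
    print(split_modrm(raw[2]))      # (0, 0, 4)
    print(split_sib(raw[3]))        # (8, 0, 1)
    print(decode_mov_r32_sib(raw))  # mov eax, dword ptr [ecx+eax*8]
```

Doing this by hand for a single addressing form already takes a page of code, which is a good reminder of how much work the real tool saves.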

Isn’t that cool?

Joel et al, you really killed it! Touché!