Reversing w/o reversing – how to become Alex in practice, Part 2
April 12, 2019 in Archaeology, Malware Analysis
My post from yesterday was written in a hurry so I didn’t have a chance to cover everything. So, time for the part II.
Okay, let’s start from the old stuff.
The really old stuff
There is a great web site called vetusware.com. It is collecting stuff that is abandonware. When you start searching the page you will get stuck and will spend hours downloading some really esoteric software. There is tones of 16-bit software. There is also lots of 32-bit software from 90s and noughties. There _are_very old SDK, DDK packages there. You do want to download them in case they include descriptions, definitions that have been removed in later versions of SDK/DDK.
You just need to go there and start downloading. The gems you can find include software from early days of Microsoft, Borland, Wordstar, IBM, OS/2 and so so and so forth. This is where it all started. For PC, at least.
The echoes of Int 21h
This one is for Alex. Just kidding – the thing is that before the internet took the shape it has today many coders and reverses relied on just a bunch of knowledge sources.
One of the most important things you wanted to put your hands on back in a day was Ralf Brown’s Interrupt List. This is a nostalgic piece of beauty. It was way ahead of its time and is to date one of the best ever compilations of descriptions of programming interface of tones of APIs. It was a Bible for DOS coders.
The fact these API functions were called or executed via software interrupts doesn’t matter. Ralf collected an impressive collection of knowledge in one piece. There is Microsoft DOS int 21h, int 25h, int 26h, there is VESA for graphic cards, there are low-level int 13h functions of HDD, there are hardware interrupts int 08h, int 09h, there are interrupts internally used by viruses, as well as extensions used by various software, and so on and so forth.
If you ever need a reference for analysing the 16-bit code, the Ralf’s Interrupt List it is.
Echos of early 32-bit coding
Okay, today you have StackOverflow, and everyone programs in QT, .NET, Electron, etc.. Back in a day it was Win32 API, MFC, AFC, Borland, Delphi, Code Gear, and finally Embarcadero. And of course, Alex’s favorite – Visual Basic (I don’t mention Java, because Java people are from a different camp). And people talking about programming either talked on Usenet, IRC, or on web sites like codeproject.com, or codeguru.com.
Many early malware creations were borrowing code from these two sites I mentioned, because the code quality was decent, and most importantly – this was the only place apart from some articles in MSDN, Dr Dobbs, sometimes Usenet where you could find some ‘juice’ back then.
Even today you will find a lot of great articles there. Even if old, they do cover the foundation of many technologies we take today for granted or have already forgotten about e.g.:
- DDE (Dynamic Data Exchange)
- OLE (Object Linking and Embedding)
- COM (Component Object Model)
- MFC (Microsoft Foundation Class), and
- AFC (Application Foundation Classes).
There is also a lot of information and code in pure Windows API – it makes it much more valuable than some easy to digest .NET code that hides a lot of details from you (this is not to say .NET is bad; not at all, and quite the opposite; what you can do with .NET via PowerShell today is absolutely amazing). Still, good to look at the old-school stuff if you want to know how ‘raw’ COM interfaces work. There are multiple layers today that both simplify and obfuscate a lot, but when you start digging you will get to the bottom of it.
On this note, you should also get familiar with tools like OleView, OleWoo that allow you to analyze interfaces embedded inside many system DLLs. And there is also OleView’s .NET equivalent from James Forshaw called OleViewDotNet.
Old Software
It’s great to get access to old software. You gonna like OldApps.com, and oldversion.com. If you need to do diffs between versions of the same software, or play around with the legacy software to see if it can still be used e.g. connected to some old legacy servers — this is a good place to start.
I cover why you need a repo of clean software below. Read on.
Old Software and PADs
Ever heard of PAD files?
Back in a day everyone was selling shareware. To sell shareware you had to publish and promote it. Publishing on one site was easy, publishing on 200 sites is hard. And updating all this was even tougher.
This is why some clever shareware authors (Association of Software Professionals) came up with an idea of a PAD file.
It’s basically a XML file that includes vital information about the software e.g. name, vendor name, web site, and also places you can download the software from. PAD stands for Portable Application Description and apart from the page I linked to you can read more about it on wikipedia. For us, the most important is the juice and these are actual PAD files, and the more the merrier.
Why?
Every single one leads you to an executable. And its future updates.
If you can collect a large sampleset of legitimate software you can actually build a nice repo of so-called clean samples. This can help you to extract e.g. clean strings, signatures of clean functions, actual authenticode signatures of vendors (if software is signed), feed your engine with a list of clean URLs, download stuff on regular basis to ensure they are whitelisted, and so on and so forth.
From today’s perspective there are many caveats, of course. There are many cases of PADs being abused by PUA, adware, etc. Secondly, we now are very aware of supply-chain attacks, so can’t fully trust all the downloaded binaries. Nevertheless PADs are an ‘easy win’ when it comes to a source of many clean samples. Yes, you need them. Even if just for testing your yara sigs, AV definitions, etc.
And… that’s it for the part 2. And there is a part III coming 🙂
Comments are closed.