Reversing w/o reversing – how to become Alex in practice, Part 3

If you managed to read part 1 and part 2, you are probably buying new HDDs now to make room for all this goodness 🙂

You now have access to SDKs, DDKs, old and new OS ISOs, copies of old 16- and 32-bit software from 10-25 years ago, and documentation for 16-bit assembly; you know where to look for the code of old programs, etc.

PAD Seed

I mentioned PAD files. I know, it’s hard to find them. Try this PAD Repository from QArchive. It’s from 2012, so many links are most likely dead, but it will give you something to start with.

Even moar repos

Alex mentioned ReactOS. Download it.

Then Wine. Download it.

Then MinGW. Download it.

Then Qt. Download it.

Download any available source code of cool reversing tools e.g. Process Hacker.

I really need to emphasize/reiterate that SourceForge and GitHub have tons of sources available. It’s almost guaranteed you will find a lot of gems there. I would say that together with their older siblings CodeProject and CodeGuru these four sites cover the entire Windows ‘codesphere’ one way or another. At least the userland.

Seriously, I am amazed how often in the past (especially when I was looking for some esoteric functionality that I had randomly stumbled upon) I was able to find some interesting code related to the topic on one of these sites.

For the easy wins…

Today there are so many code examples demonstrating all the possible code injection techniques on Windows. They are so common and all over the place that they are almost… banal at this stage. But yes, if you have never looked at them, there you go: today you can study these code snippets from any possible angle. Go for keywords like: QueueUserAPC, NtQueueApcThread, NtResumeProcess, SetThreadContext, etc.

But this is trivial. Let’s not talk trivial.

When I looked at iPhone apps for one of my clients (around 2013) I couldn’t find many tools or docs about some specific files that I was finding on iOS and that were used by the app. After googling around I eventually landed in a GitHub repo that not only knew how to parse the file format, but also included full source code! I didn’t port it to a Windows-friendly programming language, but just reading it was enough for me to understand what these files contained. Interestingly, the very same code had disappeared when I was looking for it a few years later.

The golden rule is to download & hamster. Always. You never know how long it will be there.

Another example is IME (Input Method Editor). Most people outside of Asia have no clue what an IME is. It turns out there is a whole community that focuses on this area, because… well, they want to make typing in Asian languages easy and efficient. You can almost take it for granted that you will find some of the best articles (and code) about IME… in Japanese.

Language is NOT a barrier

Yes. Japanese. Yes. Most of us don’t speak it.

But… with translation services available today you can easily translate pages written in almost any non-English language and at least ‘get’ the meaning of the article, or blog post. Plus, code snippets are universal. Plus, Twitter and other social media make asking questions easier.

So… don’t be put off by the language when you search for stuff. Make sure your search engine shows results from all sites, not only English. You will be nicely surprised how much good stuff is posted in Russian, Chinese, Japanese, Korean, Spanish, French, German, and other languages.

Again, for a quick story to illustrate the point – I am usually excited when I discover some new persistence technique. I always try to do my homework to see if anyone has published anything about it before so I can reference it in my posts. After googling around, more than once I discovered that someone had posted an article about the same trick 10-15 years earlier! These articles are not always written in the context of persistence, but you can imagine my disappointment when I see I was not the first one to look at it. And in many cases it was not in English. Bummer!

Language is not a barrier #2

Don’t look for C code only, or ASM, or python. Look for _any_ code, ideas. Today it’s so easy to port things or use very high-level interfaces that it’s all about ‘knowing what’s out there’. And reading code is usually easy. Understanding it is a bit harder. But it’s actual programming that is the hardest.

Yup, you don’t need to know every programming language. I have touched probably 20 programming languages so far, but I am still a really poor coder, because I don’t know any of them on a developer level. One can argue that reversers are more engineers than coders & they simply use what’s available to piece things together. It’s comforting to think of it as a great excuse to be a jack of all trades and a poor coder 😉

But seriously…

Once you read, or even eyeball, code relying on Win API, NT API, or a COM interface, trust me, you will remember it in context.

And even more code…

How?

The same trick as that unique file name – always know precisely what you are looking for first and make it unique enough. Then build a Google dork.

For example, Wine contains lots of definitions of constants that are often not yet present in available Microsoft SDKs, or on MSDN. Again, to illustrate the point – it was Wine and its tool winedump that helped me when I first wrote about the GCTL debug sections. There was no other documentation about it at that time. These guys are often ahead of the curve.

And to give you a practical example of how to look for some low-level code: yes, it’s trivial, but you need to start somewhere. If you want to see if there are any new constants available for e.g. NtQuerySystemInformation, or if you just want to find some Nt code using it – just google any two constants used by this function. You will quickly find lots of useful information. Another trick is to add the keyword Define. This helps to find actual header files.
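A minimal sketch of the two-constants trick in Python. The helper name build_dork is made up for illustration, and SystemProcessInformation and SystemHandleInformation are simply two well-known information classes of NtQuerySystemInformation picked as an example. All it does is quote each constant so the search engine matches it exactly and optionally append the keyword define.

```python
def build_dork(constants, want_headers=True):
    """Build a search query from known constant names (hypothetical helper)."""
    if len(constants) < 2:
        # A single constant is rarely unique enough; the post suggests two.
        raise ValueError("use at least two constants to keep the query specific")
    # Quote every constant so the engine looks for the exact token.
    terms = ['"%s"' % c for c in constants]
    if want_headers:
        # Adding 'define' biases the results toward actual header files.
        terms.append("define")
    return " ".join(terms)


if __name__ == "__main__":
    print(build_dork(["SystemProcessInformation", "SystemHandleInformation"]))
```

Paste the resulting string into the search engine of your choice; adding a third constant usually narrows things down even more.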

Ultimate Gurus

Time to introduce Geoff Chappell and Raymond Chen. Their content is pure signal, no noise. I can’t count how many times I have relied on information they presented on their blogs/sites.

The Old New Thing blog from Raymond Chen is a gold mine of anecdotes from the Microsoft trenches. Over the years he has patiently explained and contextualized so many Windows Internals quirks and answered so many ‘stupid Microsoft’ things that it’s really a humbling lesson for all of us who don’t see the big picture that such a large company has to deal with.

And Geoff is one of the pioneers of independent Windows internals research. I don’t know how much time he spent documenting all this undocumented stuff, but we can just simply read it on his site and for that, he deserves an award. Or perhaps good holidays 🙂

Old zines & underground stuff

Download Phrack. Download stuff from Tuts4you. There are many other good zine and cracking collections available online that you can find easily, but they are of questionable legal status, so I am not mentioning them.

Your Repo

What to do with all this source code/data goodness?

I usually take all the sources I can get my hands on and unpack them into one single, large repository. I install SDKs and DDKs in a VM and copy all installed files to the same repo. I don’t have a backup of it, so if my drive goes kaput, I will lose all of this. But I can always re-create it as I keep the ISOs/ZIPs elsewhere.

When I want to look for existing API prototypes, discover new constants or structures, collect new GUIDs/CLSIDs, new interfaces, new API names I just crawl this beast with a simple command line, or using Total Commander or a dedicated script.

Sometimes the script runs overnight and in the morning I have the results. Yup, it’s a lot of files to parse, but that is its strength. The more files from various sources, the better the chances you will get a hit. And yes, there are source code management/search suites that may help, but I have not used them.
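For what it’s worth, a ‘dedicated script’ of this kind does not need to be fancy. Below is a rough Python sketch, not my actual script: the function name search_repo, the list of file extensions and the example regex are all assumptions made for illustration. It walks the whole repository and reports every line that matches a regular expression.

```python
import os
import re
import sys


def search_repo(root, pattern, extensions=(".h", ".c", ".cpp", ".idl", ".txt", ".rtf")):
    """Yield (path, line_number, line) for each line under root matching pattern."""
    rx = re.compile(pattern)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Only look at file types that are likely to hold text (assumed list).
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps the crawl going on odd encodings.
                with open(path, "r", encoding="utf-8", errors="replace") as fh:
                    for number, line in enumerate(fh, 1):
                        if rx.search(line):
                            yield path, number, line.rstrip()
            except OSError:
                # Unreadable file: skip it and carry on.
                continue


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: script.py <repo root> <regex>
    for path, number, line in search_repo(sys.argv[1], sys.argv[2]):
        print("%s:%d: %s" % (path, number, line))
```

A pattern such as `#\s*define\s+System\w+Information\b` would, for example, list every header that defines something looking like a system information class.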

And how to use it?

For example, mapping CLSIDs to their interface name is very handy when you reverse code that is using COM. Frank Boldewin created a really cool script for it a very loooong time ago. All these new CLSIDs from all the possible sources will make this script work much better.
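To illustrate how such a mapping can be harvested from the repo (this is not Frank’s script, just a sketch under assumptions): many headers declare GUIDs with the DEFINE_GUID macro, so a regular expression can turn them into a GUID-to-name dictionary. The function name parse_define_guids is made up, and the regex only handles the plain form with eleven hex literals; anything fancier is ignored.

```python
import re

# DEFINE_GUID(name, l, w1, w2, b1, ..., b8) with hex literals only (assumption).
DEFINE_GUID_RX = re.compile(
    r"DEFINE_GUID\s*\(\s*(\w+)\s*,\s*"
    r"((?:0x[0-9A-Fa-f]+L?\s*,\s*){10}0x[0-9A-Fa-f]+L?)\s*\)"
)


def parse_define_guids(text):
    """Return {'{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}': name} found in text."""
    result = {}
    for name, body in DEFINE_GUID_RX.findall(text):
        # Drop an optional 'L' suffix and convert every field to an integer.
        parts = [int(p.strip().rstrip("Ll"), 16) for p in body.split(",")]
        d1, d2, d3, b = parts[0], parts[1], parts[2], parts[3:]
        guid = "{%08X-%04X-%04X-%02X%02X-%s}" % (
            d1, d2, d3, b[0], b[1], "".join("%02X" % x for x in b[2:])
        )
        result[guid] = name
    return result


if __name__ == "__main__":
    sample = ("DEFINE_GUID(IID_IUnknown, 0x00000000L, 0x0000, 0x0000, "
              "0xC0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x46);")
    for guid, name in parse_define_guids(sample).items():
        print(guid, name)
```

Run over every header in the repo, the resulting dictionary can be dumped to whatever format your GUID-naming script or plugin expects.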

Everything you can pull out of the repo in a consistent, systematic way is a win. It will save you lots of reversing time in the future.

And quite frankly, perhaps it’s really time we have some cloud solution to share data like this between reversers? Maybe the Ghidra community will somehow enable it. Dunno.

And if I can’t find something I google around. Again, a good dork is a name that is very specific. I remember discovering some new constant names for some Nt function by just googling for header files that include already known constants (usually at least 2, a technique I explained earlier in this post). Yup, it is that easy. It always comes back to search-fu.

Moar code

Okay, there is still more.

Ever heard of the Undocumented Functions web site?

Visit it today. Grab what you can. At least bookmark it. Yes, it’s superold, probably not accurate anymore. But this is one of the original sources where it all started. And you can get the .chm file too. Do you see where it is hosted? CodeProject. Sounds familiar?

MSDN Help files

Another source of great knowledge, parsing pleasure and all that reversing jazz is the MSDN help.

Depending on the help version used to ship the documentation you can actually decompress/decompile the files and extract a lot of useful data from the help files and drop them into your local repository. It will be then searchable.

I have done it with a couple iterations of MSDN/SDK help.

WinHelp files (.hlp) are easy to decompile with e.g. HelpDeco; today .hlp files have mostly historical value, but don’t underestimate them; if you find any, even one from the 90s, please include the decompressed version (.rtf) in your repo; there may be some surprises… many API prototypes magically disappear in newer versions of help, so having a point of reference from a long time ago is great; you will always come across malware samples using old functions that are no longer documented in MSDN
The chm files are easy to decompile with hh.exe (hh.exe -decompile <folder> <file.chm>); they are also pretty old by today’s standards, but again, have a look as they may include some cool info from the past glory times (e.g. Netbios stuff is not documented in newer versions of MSDN, but may still be covered in these old ‘chum’ files)
The hxs files can be decompiled easily too (hxcomp -u <file.hxs>); these are pretty good as they are relatively new; some IDA plugins actually use them
There is also a version of MSDN help shipped with older Visual Studio that is web-based (localhost); it can be queried over HTTP (data can be retrieved as XML/HTML and is easy to parse)
Most of the modern MSDN Help is delivered online; still, you can download sections of it via a very handy PDF Download feature

All of these can be used to generate a rich local database containing references to tons of keywords: APIs, API parameter names, data flow/direction for arguments (in/out/both), GUIDs, interface names and their definitions, constant names and values, etc.
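A very small sketch of what such a local database could look like, assuming the decompiled help ended up as HTML pages on disk. Everything here (the keywords table, the function names, the ‘identifier of at least four characters’ rule) is an assumption made for illustration; a real database would store much richer data such as prototypes and argument directions.

```python
import re
import sqlite3
from html.parser import HTMLParser

# Anything that looks like an identifier of 4+ characters (arbitrary cut-off).
IDENT_RX = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]{3,}\b")


class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML page, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def open_index(db_path=":memory:"):
    """Create (or open) the keyword index; the schema is hypothetical."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS keywords ("
        "word TEXT, path TEXT, PRIMARY KEY (word, path))"
    )
    return conn


def index_page(conn, path, html):
    """Record every identifier-like word seen on one help page."""
    words = set(IDENT_RX.findall(html_to_text(html)))
    conn.executemany(
        "INSERT OR IGNORE INTO keywords (word, path) VALUES (?, ?)",
        [(word, path) for word in words],
    )
    conn.commit()


def lookup(conn, word):
    """Return the pages that mention the given keyword."""
    rows = conn.execute(
        "SELECT path FROM keywords WHERE word = ? ORDER BY path", (word,)
    )
    return [row[0] for row in rows]
```

Feed index_page with every page you decompiled, and lookup(conn, "Netbios") will tell you which old help pages still talk about it.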

And since everything needs a story – I actually used these decompiled Help files ~15 years ago to generate a list of many API prototypes that I used in my API monitor (for context: I hooked APIs in a generic way based on the prototypes, as opposed to hooking each API with a dedicated callback, which would have required a lot of effort for the 11K APIs my tool covered).

Web Archive again

If you don’t use Web Archive you are doing it wrong.

This is an extremely rich repository of good, old stuff. As long as you know what to look for you can:

Read old blogs that are ‘dead’ (offline)
Dox security researchers that have practiced OPSEC for the last 15 years, but had a web page on AngelFire or Geocities in 1995 with all the personal details present 😉
Download old software often not present anywhere else
Download old ISOs, .exe, zip and other files, as long as they are archived
In the worst case scenario, at least find file names (and then you can google dork them)

It’s pretty obvious, but many people are not used to it. This is literally a copy of the Internet of the 90s and noughties preserved for us to use.

Say, you want to download all the versions of Sysinternals tools e.g. psexec. You can obviously google dork it, but web.archive.org allows you to grab all the copies it archived in one go.

How?

They obviously visited sysinternals many times before and each time mirrored its content.

All you have to do today is just use their schema to discover all links to all copies of a particular web page (including links to tools). The following URL will show you all the snapshots of sysinternals.com on the web.archive.org:

http://web.archive.org/web/timemap/link/http://www.sysinternals.com/

You can use this http://web.archive.org/web/timemap/link/<URL> schema for any URL really, including links to binary files.

For instance, the Sysinternals’ PSTools.zip link leads to this list.

These then can be downloaded in a batch job…
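A batch job like that could be sketched in Python along these lines. It assumes the timemap comes back in the usual link format, i.e. entries that look like <URL>; rel="memento"; datetime="...", and it treats every entry whose rel contains the word memento as a snapshot. The function names, the output folder layout and the one-second delay are my own choices for this sketch, not anything the Web Archive prescribes.

```python
import os
import re
import time
import urllib.request

TIMEMAP_PREFIX = "http://web.archive.org/web/timemap/link/"
# One entry of the link-format timemap: <url>; rel="...".
LINK_RX = re.compile(r'<([^>]+)>;\s*rel="([^"]*)"')


def parse_timemap(text):
    """Return the snapshot URLs (entries whose rel contains 'memento')."""
    return [url for url, rel in LINK_RX.findall(text) if "memento" in rel.split()]


def fetch_timemap(url):
    """Download the timemap for the given original URL."""
    with urllib.request.urlopen(TIMEMAP_PREFIX + url, timeout=60) as resp:
        return resp.read().decode("utf-8", errors="replace")


def download_snapshots(snapshot_urls, out_dir, delay=1.0):
    """Save every snapshot as <timestamp>_<file name> in out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    for snap in snapshot_urls:
        match = re.search(r"/web/(\d{14})", snap)
        stamp = match.group(1) if match else "unknown"
        name = "%s_%s" % (stamp, os.path.basename(snap) or "index")
        # Replace characters that are awkward in file names.
        name = re.sub(r"[^\w.\-]", "_", name)
        try:
            urllib.request.urlretrieve(snap, os.path.join(out_dir, name))
        except OSError as exc:
            # Dead or throttled snapshot: report it and keep going.
            print("skipped", snap, exc)
        time.sleep(delay)  # be gentle with the archive


if __name__ == "__main__":
    snapshots = parse_timemap(fetch_timemap("http://www.sysinternals.com/"))
    print(len(snapshots), "snapshots found")
```

Swap the URL for the link to a binary (e.g. the PSTools.zip one) and pass the resulting list to download_snapshots to actually fetch the copies.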

So now you have copies of all known instances of psexec. It _could_ be useful in hunting, building a list of hashes for dual-purpose tools (DFIR analysis), etc.
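Building that list of hashes is the easy part. A minimal sketch, assuming the downloaded copies sit in one folder (archives such as PSTools.zip would need to be unpacked first if you want hashes of the executables inside); the function names are made up for this example.

```python
import hashlib
import os


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def hash_folder(folder):
    """Return {file name: SHA-256} for every regular file in folder."""
    hashes = {}
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            hashes[name] = sha256_file(path)
    return hashes


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 2:
        for name, value in hash_folder(sys.argv[1]).items():
            print(value, name)
```

Identical hashes across snapshots simply mean the file did not change between crawls, so the number of unique values tells you how many distinct builds you actually collected.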

Final word

This is the final part of this short series. As you can imagine it was fun to write, because:

Alex is a great researcher; following his steps is actually extremely difficult given the emergence of so many infosec camps (one can’t do it all anymore, specialization is inevitable)
still, one can try; and given the availability of information today I believe (or more: hope) that some researchers can catch up and overtake; these posts try to point in the right direction on how to collect information, but less on how to ingest it; this is a tough one and requires a lot of sacrifice from an interested individual
nostalgia 🙂
- it has lots of references to ‘good old days’
- there were fewer people involved back then so everyone ‘knew’ each other, even if just by their handle/online presence
- we (reversers) used to have a really tough time getting access to stuff; it’s not only about tools that were scarce, but also documentation that had to be re-created in parallel, and often underground
- plus, there was a strong anti-reversing sentiment (primarily related to cracking software)
offered a chance to take a look at the future of reversing
- activities that ‘old reversers’ take for granted (their BAU) may be a complete novelty to many newcomers; in the era of ‘easy availability’ it’s easy to just focus on using available cool tools, but knowing how we got there is important
- Why? The thing is that we need more tools, new ideas, and perhaps more online collaboration to combine the resources everyone is sitting on

I personally doubt it’s easy to replicate Alex’s success. Today’s ‘ecosystem’ is more and more reversing-unfriendly. Sooner or later we may need to jailbreak everything before we can even launch a debugger. Sad, but true.

And this is why we need more hardcore reversers, tool builders, and fundamentally – all these naughty people that cultivate the spirit of understanding of how things work under the hood…

Good luck with your reversing!

Hexacorn
