Reversing w/o reversing – how to become Alex in practice, Part 3

April 14, 2019 in Archaeology, Malware Analysis

If you managed to read part 1 and part 2, you are probably buying new HDDs now to make room for all this goodness 🙂

You now have access to SDKs, DDKs, old and new OS ISOs, copies of old 16- and 32-bit software from 10-25 years ago, and documentation for 16-bit assembly, and you know where to look for the code of old programs, etc.

PAD Seed

I mentioned PAD files. I know, it’s hard to find them. Try this PAD Repository from QArchive. It’s from 2012, so many links are most likely dead, but it will give you something to start with.

Even moar repos

Alex mentioned ReactOS. Download it.

Then Wine. Download it.

Then MinGW. Download it.

Then QT. Download it.

Download any available source code of cool reversing tools e.g. Process Hacker.

I really need to emphasize/reiterate that SourceForge and GitHub have tons of sources available. It’s almost guaranteed you will find a lot of gems there. I would say that together with their older siblings CodeProject and CodeGuru these four sites cover the entire Windows ‘codesphere’ one way or another. At least the userland.

Seriously, I am amazed how many times in the past (especially when I was looking for some esoteric functionality that I randomly stumbled upon) I was almost always able to find some interesting code related to the topic on one of these sites.

For the easy wins…

Today there are so many code examples demonstrating all the possible code injection techniques on Windows. They are so common and all over the place that they are almost… banal at this stage. But yes, if you have never looked at them, there you go: today you can study these code snippets from any possible angle. Go for keywords like: QueueUserAPC, NtQueueApcThread, NtResumeProcess, SetThreadContext etc.

But this is trivial. Let’s not talk trivial.

When I looked at iPhone apps for one of my clients (around 2013) I couldn’t find many tools or docs about some specific files I was finding on iOS that were used by the app. After googling around I eventually landed in a GitHub repo that not only knew how to parse the file format, but also included full source code! I didn’t port it to a Windows-friendly programming language, but just reading it was enough for me to understand what these files contained. Interestingly, the very same code had disappeared when I looked for it a few years later.

The golden rule is to download & hamster. Always. You never know how long it will be there.

Another example is IME (Input Method Editors). Most people outside of Asia have no clue what IME is. It turns out there is a whole community that focuses on working in this area, because… well, they want to make typing in Asian languages easy and efficient. You can almost take it for granted that you will find one of the best articles (and code) about IME… in Japanese.

Language is NOT a barrier

Yes. Japanese. Yes. Most of us don’t speak it.

But… with translation services available today you can easily translate pages written in almost any non-English language and at least ‘get’ the meaning of the article, or blog post. Plus, code snippets are universal. Plus, Twitter and other social media make asking questions easier.

So… don’t be put off by the language when you search for stuff. Make sure your search engine shows results from all sites, not only English. You will be nicely surprised how much good stuff is posted in Russian, Chinese, Japanese, Korean, Spanish, French, German, and other languages.

Again, a quick story to illustrate the point – I am usually excited when I discover some new persistence technique. I always try to do my homework to see if anyone published anything about it before so I can reference it in my posts. After googling around, more than once I discovered that someone had posted an article about the same trick 10-15 years earlier! They are not always in the context of persistence, but you can imagine my disappointment when I see I was not the first one to look at it. And in many cases it was not in English. Bummer!

Language is not a barrier #2

Don’t look for C code only, or ASM, or python. Look for _any_ code, ideas. Today it’s so easy to port things or use very high-level interfaces that it’s all about ‘knowing what’s out there’. And reading code is usually easy. Understanding it is a bit harder. But it’s actual programming that is the hardest.

Yup, you don’t need to know every programming language. I have touched probably 20 programming languages so far, but I am still a really poor coder, because I don’t know any of them on a developer level. One can argue that reversers are more engineers than coders & they simply use what’s available to piece things together. It’s comforting to think of it as a great excuse to be a jack of all trades and a poor coder 😉

But seriously…

Once you read, or even eyeball, code relying on Win API, NT API, or a COM interface, trust me, you will remember it in context.

And even more code…

How?

The same trick as that unique file name – always know precisely what you are looking for first and make it unique enough. Then build a Google dork.

For example, Wine contains lots of definitions of constants that are often not yet present in available Microsoft SDKs, or on MSDN. Again, to illustrate the point – it was Wine and its tool winedump that helped me when I first wrote about the GCTL debug sections. There was no other documentation about it at that time. These guys are often ahead of the curve.

And to give you a practical example of how to look for some low-level code. Yes, it’s trivial, but you need to start somewhere. If you want to see if there are any new constants available for e.g. NtQuerySystemInformation, or if you just want to find some Nt code using it – just google any two constants used by this function. You will quickly find lots of useful information. Another trick is to add the keyword Define. This helps to find actual header files.
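To make the two-constants trick repeatable, here is a minimal sketch of a query builder. The function name build_dork and its add_define switch are made up for illustration; the only thing taken from the text above is the idea of quoting at least two known constants and optionally adding the word define.

```python
def build_dork(constants, add_define=False):
    """Build a search query from at least two known constant names."""
    if len(constants) < 2:
        raise ValueError("use at least two constants to keep the query specific")
    # Quote each constant so the search engine matches it literally.
    parts = ['"{}"'.format(name) for name in constants]
    if add_define:
        # The extra keyword tends to surface header files rather than prose.
        parts.append("define")
    return " ".join(parts)


if __name__ == "__main__":
    # Two well-known NtQuerySystemInformation information classes.
    print(build_dork(["SystemBasicInformation", "SystemProcessInformation"], add_define=True))
```

Paste the resulting string into whatever search engine you prefer; the more unusual the two constants, the cleaner the results tend to be.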

Ultimate Gurus

Time to introduce Geoff Chappell and Raymond Chen. Their content is all signal and no noise. I can’t count how many times I relied on information they presented on their blogs/sites.

The Old New Thing blog from Raymond Chen is a gold mine of anecdotes from the Microsoft trenches. Over the years he patiently explained and contextualized so many Windows Internals quirks and answered so many ‘stupid Microsoft’ things that it’s really a humbling lesson for all of us who don’t see the big picture such a large company has to deal with.

And Geoff is one of the pioneers of independent Windows internals research. I don’t know how much time he spent documenting all this undocumented stuff, but we can just simply read it on his site and for that, he deserves an award. Or perhaps good holidays 🙂

Old zines & underground stuff

Download Phrack. Download stuff from Tuts4you. There are many other good zine and cracking collections available online that you can find easily, but they are of questionable legal status, so I am not mentioning them.

Your Repo

What to do with all this source code/data goodness?

I usually take all the sources I can get my hands on and unpack them into one single, large repository. I install SDKs and DDKs in a VM and copy all installed files to the same repo. I don’t have a backup of it, so if my drive goes kaput, I will lose all of this. But I can always re-create it as I keep the ISOs/ZIPs elsewhere.

When I want to look for existing API prototypes, discover new constants or structures, or collect new GUIDs/CLSIDs, new interfaces, or new API names, I just crawl this beast with a simple command line, or using Total Commander or a dedicated script.

Sometimes the script runs overnight and in the morning I have the results. Yup, it’s a lot of files to parse, but that is your strength. The more files from various sources, the better your chances of getting a hit. And yes, there are source code management/search suites that may help, but I have not used them.
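A ‘dedicated script’ for this can be very small. The following is only a sketch of the idea, not the script I use: the function name crawl, the default extension list and the latin-1 decoding are all assumptions you should adjust to your own repo.

```python
import os
import re


def crawl(root, pattern, extensions=(".h", ".idl", ".c", ".cpp", ".txt", ".rtf")):
    """Yield (path, line_number, line) for every matching line under root."""
    regex = re.compile(pattern)
    wanted = tuple(extensions)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.lower().endswith(wanted):
                continue
            path = os.path.join(dirpath, name)
            try:
                # Old sources come in assorted encodings; latin-1 decodes any byte.
                with open(path, "r", encoding="latin-1") as handle:
                    for number, line in enumerate(handle, 1):
                        if regex.search(line):
                            yield path, number, line.rstrip("\r\n")
            except OSError:
                continue  # unreadable file, move on


if __name__ == "__main__":
    # Hypothetical repo location; replace with your own.
    for hit in crawl(r"D:\repo", r"\bSystem\w+Information\b"):
        print("%s:%d: %s" % hit)
```

Because it is a generator, you can redirect the output to a file and let it run overnight without holding all hits in memory.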

And how to use it?

For example, mapping CLSIDs to their interface name is very handy when you reverse code that is using COM. Frank Boldewin created a really cool script for it a very loooong time ago. All these new CLSIDs from all the possible sources will make this script work much better.
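One way to harvest such mappings from your repo is to pull DEFINE_GUID macros out of header files. This is a rough sketch and not Frank’s script: parse_define_guids is a made-up name, and the regex assumes the macro arguments contain no comments or nested parentheses.

```python
import re

# DEFINE_GUID(name, l, w1, w2, b1, ..., b8) as used in Windows headers.
_DEFINE_GUID = re.compile(r"DEFINE_GUID\s*\(\s*(\w+)\s*,([^)]*)\)", re.S)


def parse_define_guids(text):
    """Map canonical GUID strings to the names given in DEFINE_GUID macros."""
    result = {}
    for name, body in _DEFINE_GUID.findall(text):
        try:
            # Strip integer suffixes such as L or UL before parsing as hex.
            values = [int(part.strip().rstrip("uUlL"), 16) for part in body.split(",")]
        except ValueError:
            continue  # not a plain list of hex literals, skip it
        if len(values) != 11:
            continue
        guid = "{%08X-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X}" % tuple(values)
        result[guid] = name
    return result


if __name__ == "__main__":
    header = "DEFINE_GUID(CLSID_ShellLink, 0x00021401L, 0, 0, 0xC0,0,0,0,0,0,0,0x46);"
    print(parse_define_guids(header))
```

Feed it every header in the repo and you get a lookup table you can merge into whatever CLSID/IID list your tooling already uses.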

Everything you can pull out of the repo in a consistent, systematic way is a win. It will save you lots of reversing time in the future.

And quite frankly, perhaps it’s really time we had some cloud solution to share data like this between reversers? Maybe the Ghidra community will somehow enable it. Dunno.

And if I can’t find something I google around. Again, a good dork is a name that is very specific. I remember discovering some new constant names for some Nt function by just googling for header files that include already known constants (usually at least 2, a technique I explained earlier in this post). Yup, it is that easy. It always comes back to search-fu.

Moar code

Okay, there is still more.

Ever heard of the Undocumented Functions web site?

Visit it today. Grab what you can. At least bookmark it. Yes, it’s super old and probably not accurate anymore. But this is one of the original sources where it all started. And you can get the .chm file too. Do you see where it is hosted? CodeProject. Sounds familiar?

MSDN Help files

Another source of great knowledge, parsing pleasure and all that reversing jazz is the MSDN help.

Depending on the help format used to ship the documentation you can actually decompress/decompile the files, extract a lot of useful data from them and drop it into your local repository. It will then be searchable.

I have done it with a couple of iterations of MSDN/SDK help.

  • WinHelp files (.hlp) are easy to decompile with e.g. HelpDeco; today .hlp files have a historical value, but don’t underestimate them; if you find any, even one from the 90s, please include the decompressed version (.rtf) in your repo; there may be some surprises… many API prototypes magically disappear in newer versions of help, so having a point of reference from a long time ago is great; you will always come across malware samples using these old functions that are no longer documented in MSDN
  • The chm files are easy to decompile with hh.exe (hh.exe -decompile <folder> <file.chm>); they are also pretty old by today’s standards, but again, have a look as they may include some cool info from the past glory times (e.g. Netbios stuff is not documented in newer versions of MSDN, but may still be covered in these old ‘chum’ files)
  • The hxs files can be decompiled easily too (hxcomp -u <file.hxs>); these are pretty good as they are relatively new; some IDA plugins actually use them
  • There is also a version of MSDN help shipped with older Visual Studio that is web-based (localhost); it can be queried using HTTP protocol (data can be retrieved as XML/HTML and is easy to parse)
  • Most modern MSDN Help is delivered online; still, you can download sections of it via a very handy PDF Download feature

All of these can be used to generate a rich local database containing references to tons of keywords: APIs, API parameter names, data flow/direction for arguments (in/out/both), GUIDs, interface names and their definitions, constant names and values, etc.

And since everything needs a story – I actually used these decompiled Help files ~15 years ago to generate a list of many API prototypes that I used in my API monitor (for context: I hooked APIs in a generic way based on the prototypes, as opposed to hooking each API with a dedicated callback, which would have required a lot of effort for the 11K APIs my tool covered).
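To give a feel for what ‘generating a list of prototypes’ can look like, here is a simplified sketch of a prototype parser. It is not the code from my API monitor; parse_prototype, the calling convention list and the direction annotation list are assumptions, and real help files contain plenty of prototypes this will not handle (function pointers, unnamed parameters, macros).

```python
import re

_CONVENTIONS = {"WINAPI", "NTAPI", "APIENTRY", "CALLBACK", "__stdcall", "__cdecl"}
_DIRECTIONS = {"IN", "OUT", "_In_", "_Out_", "_Inout_", "_In_opt_", "_Out_opt_", "_Inout_opt_"}
_PROTO = re.compile(
    r"^\s*(?P<ret>[\w\s\*]+?)\s*\b(?P<name>\w+)\s*\((?P<params>[^)]*)\)\s*;?\s*$", re.S
)


def parse_prototype(text):
    """Split a simple C prototype into return type, convention, name and parameters."""
    text = re.sub(r"//[^\n]*", "", text)  # old help files annotate arguments with // comments
    match = _PROTO.match(text)
    if not match:
        return None
    ret_tokens = match.group("ret").split()
    params = []
    for raw in match.group("params").split(","):
        tokens = raw.split()
        if not tokens or tokens in (["void"], ["VOID"]):
            continue
        direction = []
        while tokens and tokens[0] in _DIRECTIONS:
            direction.append(tokens.pop(0))
        declaration = " ".join(tokens)
        name_match = re.search(r"(\w+)$", declaration)
        name = name_match.group(1) if name_match else ""
        params.append({
            "direction": " ".join(direction),
            "type": declaration[:len(declaration) - len(name)].strip(),
            "name": name,
        })
    return {
        "name": match.group("name"),
        "return_type": " ".join(t for t in ret_tokens if t not in _CONVENTIONS),
        "convention": " ".join(t for t in ret_tokens if t in _CONVENTIONS),
        "params": params,
    }


if __name__ == "__main__":
    print(parse_prototype("BOOL WINAPI CloseHandle(_In_ HANDLE hObject);"))
```

Once every prototype is reduced to a structure like this, a generic hook only needs the argument count, types and directions to log a call, which is the whole point of doing it from data instead of writing one callback per API.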

Web Archive again

If you don’t use Web Archive you are doing it wrong.

This is an extremely rich repository of good, old stuff. As long as you know what to look for you can:

  • Read old blogs that are ‘dead’ (offline)
  • Dox security researchers who have practiced OPSEC for the last 15 years, but had a web page on AngelFire or Geocities in 1995 with all the personal details present 😉
  • Download old software often not present anywhere else
  • Download old ISOs, .exe, zip and other files, as long as they are archived
  • In the worst case scenario, at least find file names (and then you can google dork them)

It’s pretty obvious, but many people are not very used to it. This is literally a copy of the Internet of the 90s and noughties preserved for us to use.

Say you want to download all the versions of Sysinternals tools e.g. psexec. You can obviously google dork it, but web.archive.org allows you to grab all the copies it archived in one go.

How?

They obviously visited sysinternals many times before and each time mirrored its content.

All you have to do today is use their schema to discover all links to all copies of a particular web page (including links to tools). The following URL will show you all the snapshots of sysinternals.com on web.archive.org:

  • http://web.archive.org/web/timemap/link/http://www.sysinternals.com/

You can use this http://web.archive.org/web/timemap/link/<URL> schema for any URL really, including links to binary files.

For instance, the Sysinternals’ PSTools.zip link leads to this list.

These can then be downloaded in a batch job…
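A batch job like that could start from something as simple as the sketch below. It assumes the timemap comes back in the link format, where each snapshot is listed as <snapshot-url>; rel="memento"; the function names are made up, and the id_ suffix used to request the archived bytes without the Wayback page rewriting is an assumption you should verify before relying on it.

```python
import re
import urllib.request

# One memento entry: <http://web.archive.org/web/<14-digit timestamp>/<original URL>>; rel="...memento..."
_MEMENTO = re.compile(
    r'<(https?://web\.archive\.org/web/(\d{14})/([^>]+))>;\s*rel="[^"]*memento[^"]*"'
)


def fetch_timemap(url):
    """Download the link-format timemap for url (network access required)."""
    timemap_url = "http://web.archive.org/web/timemap/link/" + url
    with urllib.request.urlopen(timemap_url) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_timemap(text):
    """Return (timestamp, original_url, snapshot_url) for each snapshot listed."""
    return [(ts, original, snapshot) for snapshot, ts, original in _MEMENTO.findall(text)]


def raw_snapshot_url(timestamp, original_url):
    # Assumption: 'id_' after the timestamp returns the file as it was archived.
    return "http://web.archive.org/web/{}id_/{}".format(timestamp, original_url)


if __name__ == "__main__":
    for ts, original, _snapshot in parse_timemap(fetch_timemap("http://www.sysinternals.com/")):
        print(raw_snapshot_url(ts, original))
```

Pipe the printed URLs into your favourite downloader, and please throttle the requests; it is a free service.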

So now you have copies of all known instances of psexec. It _could_ be useful in hunting, building a list of hashes for dual-purpose tools (DFIR analysis), etc.
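Turning a folder of downloaded copies into a hash list takes a few lines. This is a minimal sketch; hash_tree is a made-up name and SHA-256 is just a default, so swap in whatever your hunting platform expects.

```python
import hashlib
import os


def hash_tree(root, algorithm="sha256"):
    """Return {relative_path: hex_digest} for every file under root."""
    digests = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            digest = hashlib.new(algorithm)
            with open(path, "rb") as handle:
                # Read in 1 MB chunks so large files do not need to fit in memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            digests[os.path.relpath(path, root)] = digest.hexdigest()
    return digests


if __name__ == "__main__":
    # Hypothetical folder holding the archived copies.
    for path, digest in sorted(hash_tree("psexec_copies").items()):
        print(digest, path)
```

Many snapshots will turn out to be byte-identical, so it is worth grouping by digest before you feed the list anywhere.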

Final word

This is the final part of this short series. As you can imagine it was fun to write, because:

  • Alex is a great researcher; following his steps is actually extremely difficult given the emergence of so many infosec camps (one can’t do it all anymore, specialization is inevitable)
  • still, one can try; and given the availability of information today I believe (or more: hope) that some researchers can catch up and overtake; these posts try to lead in the right direction on how to collect information, but less on how to ingest it; this is a tough one and requires a lot of sacrifice from an interested individual
  • nostalgia 🙂
    • it has lots of references to ‘good old days’
    • there were fewer people involved back then so everyone ‘knew’ each other, even if just by their handle/online presence
    • we (reversers) used to have a really tough time getting access to stuff; it’s not only about tools that were scarce, but also documentation that had to be re-created in parallel, and often underground
    • plus, there was a strong anti-reversing sentiment (primarily related to cracking software)
  • offered a chance to take a look at the future of reversing
    • activities that ‘old reversers’ take for granted (their BAU) may be a complete novelty to many newcomers; in the era of ‘easy availability’ it’s easy to just focus on using available cool tools, but knowing how we got there is important
    • Why? the thing is that we need more tools, new ideas, and perhaps more online collaboration to combine resources everyone is sitting on

I personally doubt it’s easy to replicate Alex’s success. Today’s ‘ecosystem’ is more and more reversing-unfriendly. Sooner or later we may need to jailbreak everything before we can even launch a debugger. Sad, but true.

And this is why we need more hardcore reversers, tool builders, and fundamentally – all these naughty people that cultivate the spirit of understanding of how things work under the hood…

Good luck with your reversing!

Reversing w/o reversing – how to become Alex in practice, Part 2

April 12, 2019 in Archaeology, Malware Analysis

My post from yesterday was written in a hurry so I didn’t have a chance to cover everything. So, time for part II.

Okay, let’s start with the old stuff.

The really old stuff

There is a great web site called vetusware.com. It collects abandonware. When you start searching the site you will get stuck and will spend hours downloading some really esoteric software. There is tons of 16-bit software. There is also lots of 32-bit software from the 90s and noughties. There _are_ very old SDK, DDK packages there. You do want to download them in case they include descriptions and definitions that have been removed in later versions of the SDK/DDK.

You just need to go there and start downloading. The gems you can find include software from the early days of Microsoft, Borland, Wordstar, IBM, OS/2 and so on and so forth. This is where it all started. For PC, at least.

The echoes of Int 21h

This one is for Alex. Just kidding – the thing is that before the internet took the shape it has today many coders and reversers relied on just a handful of knowledge sources.

One of the most important things you wanted to get your hands on back in the day was Ralf Brown’s Interrupt List. This is a nostalgic piece of beauty. It was way ahead of its time and is to date one of the best ever compilations of descriptions of the programming interfaces of tons of APIs. It was a Bible for DOS coders.

The fact these API functions were called or executed via software interrupts doesn’t matter. Ralf assembled an impressive collection of knowledge in one piece. There is Microsoft DOS int 21h, int 25h, int 26h, there is VESA for graphic cards, there are low-level int 13h functions of HDD, there are hardware interrupts int 08h, int 09h, there are interrupts internally used by viruses, as well as extensions used by various software, and so on and so forth.

If you ever need a reference for analysing 16-bit code, Ralf’s Interrupt List is it.

Echoes of early 32-bit coding

Okay, today you have StackOverflow, and everyone programs in QT, .NET, Electron, etc. Back in the day it was Win32 API, MFC, AFC, Borland, Delphi, CodeGear, and finally Embarcadero. And of course, Alex’s favorite – Visual Basic (I don’t mention Java, because Java people are from a different camp). And people discussing programming did so on Usenet, IRC, or on web sites like codeproject.com or codeguru.com.

Many early malware creations were borrowing code from these two sites I mentioned, because the code quality was decent, and most importantly – this was the only place apart from some articles in MSDN, Dr Dobbs, sometimes Usenet where you could find some ‘juice’ back then.

Even today you will find a lot of great articles there. Even if old, they do cover the foundation of many technologies we take for granted today or have already forgotten about e.g.:

  • DDE (Dynamic Data Exchange)
  • OLE (Object Linking and Embedding)
  • COM (Component Object Model)
  • MFC (Microsoft Foundation Classes), and
  • AFC (Application Foundation Classes).

There is also a lot of information and code in pure Windows API – it makes it much more valuable than some easy to digest .NET code that hides a lot of details from you (this is not to say .NET is bad; not at all, and quite the opposite; what you can do with .NET via PowerShell today is absolutely amazing). Still, good to look at the old-school stuff if you want to know how ‘raw’ COM interfaces work. There are multiple layers today that both simplify and obfuscate a lot, but when you start digging you will get to the bottom of it.

On this note, you should also get familiar with tools like OleView, OleWoo that allow you to analyze interfaces embedded inside many system DLLs. And there is also OleView’s .NET equivalent from James Forshaw called OleViewDotNet.

Old Software

It’s great to get access to old software. You are gonna like OldApps.com, and oldversion.com. If you need to do diffs between versions of the same software, or play around with legacy software to see if it can still be used, e.g. connected to some old legacy servers, this is a good place to start.

I cover why you need a repo of clean software below. Read on.

Old Software and PADs

Ever heard of PAD files?

Back in the day everyone was selling shareware. To sell shareware you had to publish and promote it. Publishing on one site was easy, publishing on 200 sites was hard. And keeping all of them updated was even tougher.

This is why some clever shareware authors (Association of Software Professionals) came up with an idea of a PAD file.

It’s basically an XML file that includes vital information about the software e.g. name, vendor name, web site, and also places you can download the software from. PAD stands for Portable Application Description and apart from the page I linked to you can read more about it on Wikipedia. For us, the most important thing is the juice, and that means actual PAD files, the more the merrier.

Why?

Every single one leads you to an executable. And its future updates.
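Pulling the juice out of a PAD file is a few lines of XML parsing. Treat this as a sketch: the element names below (XML_DIZ_INFO root, Program_Info, Company_Info, Web_Info/Download_URLs) are how I remember the PAD layout, so check them against the real files you collect, and parse_pad is just a name picked for this example.

```python
import xml.etree.ElementTree as ET

# Assumed PAD element paths; verify against the files in your collection.
_FIELDS = {
    "name": "Program_Info/Program_Name",
    "version": "Program_Info/Program_Version",
    "vendor": "Company_Info/Company_Name",
    "website": "Company_Info/Company_WebSite_URL",
}


def parse_pad(xml_text):
    """Extract program name, vendor and download URLs from PAD XML text."""
    # PAD files come from strangers: parse them in a sandbox or VM.
    root = ET.fromstring(xml_text)
    info = {key: (root.findtext(path) or "").strip() for key, path in _FIELDS.items()}
    urls = []
    node = root.find("Web_Info/Download_URLs")
    if node is not None:
        # Every non-empty child (primary, secondary, ...) is a candidate link.
        urls = [child.text.strip() for child in node if child.text and child.text.strip()]
    info["download_urls"] = urls
    return info
```

Run it over the whole PAD collection and you end up with a list of vendor names and download links to feed into a downloader.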

If you can collect a large sample set of legitimate software you can actually build a nice repo of so-called clean samples. This can help you to extract e.g. clean strings, signatures of clean functions, actual authenticode signatures of vendors (if the software is signed), feed your engine with a list of clean URLs, download stuff on a regular basis to ensure it is whitelisted, and so on and so forth.

From today’s perspective there are many caveats, of course. There are many cases of PADs being abused by PUA, adware, etc. Secondly, we are now very aware of supply-chain attacks, so we can’t fully trust all the downloaded binaries. Nevertheless PADs are an ‘easy win’ when it comes to a source of many clean samples. Yes, you need them. Even if just for testing your yara sigs, AV definitions, etc.

And… that’s it for part 2. And there is a part III coming 🙂
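As an example of the ‘clean strings’ part, the sketch below pulls printable ASCII and UTF-16LE strings out of a binary blob, in the spirit of the classic strings tool. The function names and the minimum length of 4 are arbitrary choices for illustration.

```python
import re


def ascii_strings(data, min_length=4):
    """Return printable ASCII runs of at least min_length characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [m.decode("ascii") for m in pattern.findall(data)]


def wide_strings(data, min_length=4):
    """Return UTF-16LE runs (printable ASCII interleaved with zero bytes)."""
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_length)
    return [m.decode("utf-16-le") for m in pattern.findall(data)]


def clean_strings(paths):
    """Collect the set of strings seen across a list of known-clean files."""
    seen = set()
    for path in paths:
        with open(path, "rb") as handle:
            data = handle.read()
        seen.update(ascii_strings(data))
        seen.update(wide_strings(data))
    return seen
```

A set built this way from thousands of clean binaries makes a handy exclusion list when you are choosing strings for a signature.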