Putting .inf files and NSRL database to a better use

When you look at a large repository of clean files, there is always an opportunity to find something interesting. For instance, a list of precursors to forensic artifacts that one can find in legitimate software installation packages, both pre- and post-install.

Why might these come in handy?

Well, while this will never be a 100%-reliable solution, these may help to automate at least some of the digital forensic triage process. And by that, I mean e.g. exclusions via file names or their clusters (as opposed to hashes).

I wrote about it a long time ago in the context of filelighting, but there are perhaps other, simpler avenues to pursue as well. The filelighting idea focuses on looking for file names referenced by files residing in the installed program folder. We can expand it to pre-install directories as well: temporarily created folders, manually unpacked drivers, software package installation folders, etc. And while some of this is no longer that important (after all, more and more updates and installations happen in the background, often without the user's knowledge, via app stores, and frankly, people probably download less software directly today than, say, 10 years ago), it still does happen a lot, and if we can help with some automation... why not?

One of the most interesting sources of information about software packages is the good old-fashioned .inf file. The other is the good ol' NSRL database. Yes, the latter focuses primarily on post-install, but we should use whatever is available.

The .inf files reference everything that is there to be installed, often in many configurations, and they provide a list of created/modified files and directories, but also Registry keys, service names – you name it. It's a gold mine of information about what 'good' Windows software looks like. It's a gold mine of forensic artifact precursors. The NSRL database is kinda similar; it's a superset of everything good, really, but it's also obviously limited to the data available in a dump.

Let’s have a look.

The top of the .inf file usually includes a [Version] section. You can find a description of the .inf file format elsewhere; here, we are focused on stats only. I must note that parsing .inf files is not as easy as it may seem: they heavily rely on self-referencing, multiple .inf files can be merged together, and there is also a mechanism of string substitution (tokens) in play. Lots of quirks to take care of.
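The token substitution alone needs its own little resolver. A minimal sketch in Python; the case-insensitive lookup against a pre-parsed [Strings] mapping is my assumption about typical .inf behavior, not a full implementation:

```python
import re

def resolve_tokens(value, strings):
    """Replace %Token% references with values from the [Strings] section.

    `strings` maps lowercase token names to their values; unknown
    tokens are left untouched.
    """
    return re.sub(r'%([^%]+)%',
                  lambda m: strings.get(m.group(1).lower(), m.group(0)),
                  value)
```
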

The top occurrences of fields within this section are as follows:

  • Class
  • DriverVer
  • Provider
  • CatalogFile
  • Signature
  • ClassGuid

Combing .inf files for, say, the CatalogFile field can give us a list of all legitimate .cat files out there (with the obvious caveat that the list is only as good as our 'good files' repo). Still, this may come in handy for filename-based exclusions. There is a double-edged sword lying somewhere there, of course: if you are a bad guy, knowing what good file names are available in legit software packs will serve your nefarious purposes very well, as you may surely pick a file name for your payload from the list... Oh well...
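A crude scanner like the sketch below will pull CatalogFile values out of a directory of .inf files. Note it deliberately ignores the merging and token quirks mentioned earlier, and the encoding handling is a guess at the two common cases (ANSI vs. BOM-prefixed UTF-16):

```python
import re
from pathlib import Path

# CatalogFile may carry a platform decoration, e.g. CatalogFile.NTamd64
CAT_LINE = re.compile(r'^\s*CatalogFile(?:\.\w+)?\s*=\s*([^;\r\n]+)',
                      re.IGNORECASE | re.MULTILINE)

def catalog_files(inf_dir):
    """Collect unique .cat file names referenced by .inf files under inf_dir."""
    names = set()
    for inf in Path(inf_dir).rglob('*.inf'):
        raw = inf.read_bytes()
        # .inf files ship as ANSI or BOM-prefixed UTF-16
        if raw[:2] in (b'\xff\xfe', b'\xfe\xff'):
            text = raw.decode('utf-16')
        else:
            text = raw.decode('cp1252', errors='replace')
        names.update(m.strip() for m in CAT_LINE.findall(text))
    return sorted(names)
```
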

The NSRL database is well known, so it doesn't need any introduction. What is interesting about the set is an often-forgotten ProductCode field. This is an indicator of where the file/hash tuple comes from. If you cluster the set by ProductCode, you may end up with clusters of file names that belong to a specific product. For example, if we look at, say, product code 196184, we get this result. As a side note, some of the file names appear to be section names of executables, so the drill-down the NSRL guys use seems to go really deep.

So… there you have it… parse your good .inf files, enrich them with clusters of file names extracted from the NSRL set, and you may generate a nice cluster-based exclusion list! Happy filelighting!

Bonus:

Okay, not everything is rosy. Here's a list of .cat file names I have collected during this exercise. Lots of them. I think they can only make sense in the context of either a software installation package (hint: the one with the .inf file) or a ProductCode in NSRL.

Mapping Chrome extension IDs to their names

It’s been a long time since I did any forensic research, so today is the day.

No one has coined this old phrase yet: your forensic investigation results are only as good as your understanding and context of the data you see. But it's hard to disagree with it.

EDRs and forensic analysis tools give us a lot of data to work with, but these often lack that specific context, and despite all the goodness they provide, I think vendors can still do a bit better.

Take Chrome browser extensions as an example.

EDR logs are typically very process- and file-system-centric, and when it comes to browser extensions, the most common things we see are artifacts like this:

profile\Default\Extensions\cjpalhdlnbpafiamejdnhcphjbkeiagm\…

What the heck is cjpalhdlnbpafiamejdnhcphjbkeiagm?

It is an extension ID (in some weird parallel universe, they are kinda an equivalent of ActiveX CLSIDs). Thanks to Twitter (thanks, Ziyad!), today I learned how extension IDs are actually generated. It doesn't help with forensic analysis of an extension ID though: yes, you can search for its meaning/mapping online, or if you are lucky and have installed the very same extension in your browser, you may find a reference to this specific ID manually on your file system. And eventually, pair it with the actual name of the extension: uBlock Origin.
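For the curious, the generation scheme is simple enough to sketch: as I understand it, the ID is derived from the first 16 bytes of the SHA-256 hash of the extension's DER-encoded public key, with each hex nibble mapped onto the letters a-p:

```python
import hashlib

def extension_id(public_key_der: bytes) -> str:
    """Derive a Chrome extension ID from a DER-encoded public key."""
    # First 128 bits of SHA-256 over the public key
    digest = hashlib.sha256(public_key_der).digest()[:16]
    # Each hex nibble 0..15 becomes a letter a..p
    return ''.join(chr(ord('a') + n)
                   for byte in digest
                   for n in (byte >> 4, byte & 0x0F))
```
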

But…

There are many problems with manual analysis like this. Throughout the years, there have been at least 400-500K Chrome extensions out there, maybe even more, many with a short life span and either already deleted by their authors or forcibly removed from the Chrome Web Store by Google themselves.

Obviously, it would be nice if we could somehow collect the info about all the extension IDs ever registered and use this info to enrich our searches, whether in an IR or DF context. Right, Google?

Luckily, someone already did the hard work for us: the chrome-extensions-archive project provides tools to collect and archive Google Chrome extensions. Unfortunately, though, the project has been suspended for a while now and I am not sure if it will ever be revived. FWIW, some parts of the old crx.dam.io website are still available online, preserved by the Web Archive, if you need to access them.

I've been using the aforementioned code to collect the list of extension IDs for a few years now, and every once in a while I revisit the Google site to refresh the set and update my local lookup table. Other sources I have used are a bunch of Google Chrome clone sites, primarily in China, that on occasion prove useful for filling in some extension ID gaps, especially for older or short-lived extensions.
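Applying such a lookup table is straightforward. The sketch below assumes a simple id-to-name dictionary (any backing CSV layout is hypothetical) and pulls the 32-character, a-p-alphabet ID out of an EDR file path:

```python
import re

# Extension IDs are 32 characters drawn from the letters a-p,
# sitting under an Extensions\<id>\ path component
EXT_ID = re.compile(r'[\\/]Extensions[\\/]([a-p]{32})(?=[\\/]|$)',
                    re.IGNORECASE)

def name_for_path(file_path, lookup):
    """Return the extension name for an EDR file path, if known.

    `lookup` is a dict mapping extension IDs to names, e.g. loaded
    from a (hypothetical) id,name CSV.
    """
    m = EXT_ID.search(file_path)
    if not m:
        return None
    return lookup.get(m.group(1).lower())
```
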

At this very moment, the lookup table is nearly 340K entries strong, and since it's holiday time, I have decided to release the data to anyone who is interested.

Unlike many of my previous research data dump releases, this one is non-public. If you want it, please ping me from a trustworthy email and we'll take it from there!

Update

See part 2; I have released the file publicly.