Week of Data Dumps, Part 1 – device names

Reversing is not only hours spent analyzing code. It’s also about collecting interesting data so that it can be used to quickly determine other programs’ functionality in the future.

Recognizing unique strings, GUIDs (classes, interfaces, references to strings, classes of devices, etc.), device names, exported and imported APIs, API names that get resolved dynamically, references to mutexes, atoms, window classes, window names, environment variables, and other OS features has always been a great shortcut in analysis. In an older post I explained a generic way to collect such interesting strings from executables: look at clusters of strings that reside in close proximity to each other, where at least one string is on a list of ‘interesting strings’. I called the resulting collections ‘string islands’. Using this approach I collected many strings and, of course, added some manually. As a result I had a lot of them aggregated in one place, which in turn allowed me to add them to my personal sandbox…
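To make the ‘string islands’ idea a bit more concrete, here is a minimal sketch in Python. It is not the tool I used; the function names, the `max_gap` threshold of 16 bytes, and the ASCII-only string extraction are all assumptions chosen for illustration.

```python
import re


def extract_strings(data: bytes, min_len: int = 4):
    """Yield (start, end, text) for each run of printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for m in pattern.finditer(data):
        yield m.start(), m.end(), m.group().decode("ascii")


def string_islands(data: bytes, interesting, max_gap: int = 16, min_len: int = 4):
    """Group neighbouring strings into islands and keep the interesting ones.

    Two strings belong to the same island when the gap between the end of
    one and the start of the next is at most max_gap bytes (an assumed,
    tunable threshold). An island is kept only if at least one of its
    strings appears in `interesting` (compared case-insensitively).
    """
    islands, current, last_end = [], [], None
    for start, end, text in extract_strings(data, min_len):
        if last_end is not None and start - last_end > max_gap:
            islands.append(current)
            current = []
        current.append(text)
        last_end = end
    if current:
        islands.append(current)

    wanted = {s.lower() for s in interesting}
    return [isl for isl in islands if any(s.lower() in wanted for s in isl)]
```

Given a buffer where `CreateMutexA` and `OpenMutexA` sit next to each other and an unrelated string lives far away, calling `string_islands(data, {"CreateMutexA"})` keeps only the island holding the two mutex API names. A real collector would also want UTF-16 strings, which this sketch ignores.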

This data is now a bit obsolete, so it’s time to release it publicly. With this post I am kicking off a week of ‘data dumps’ which will walk through a number of ‘string islands’ collections that I have built over the years. Not all of them are very useful today, or even trustworthy, and the quality can always be improved, but hey… maybe someone will find them useful…

Here’s the first one: a list of many device driver names

DriverPack – Clean PDB paths

Unique PDB debug paths embedded inside malware are useful for detecting other variants of the same malicious family (this does not apply to more advanced malware families, where the authors wipe the paths out, use a randomized string, or use a programming language and compiler that don’t leave these forensic artifacts behind).

The very same approach can be used to classify ‘good’ files. The only problem is finding a nice, sorted sample set of clean files from which we can extract a larger list of ‘good’ PDB paths.

Luckily, there exist very well organized sample sets of good, clean files that can be downloaded easily and quickly. DriverPack, for instance. After you download their torrent you get 32 GB of popular driver files that are neatly sorted and placed in subdirectories named after both the classes of drivers (audio, video, etc.) and the vendors, aka the companies providing the software added to the pack.

The bonus is that many of these files are relatively fresh (although you will find a lot of oldies there too).

Running a simple parser over the extracted files, I created a quick and dirty list of clean PDB paths mapped to vendor names in no time. How useful is that? Again, you can build automated YARA rules, use the list in offline analysis, and speed up triage in forensic investigations without relying on hash sets, fuzzy hash sets, etc.
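For the curious, a parser along these lines could look roughly like the sketch below. It is not my original script. It assumes the PDB path is stored in a CodeView ‘RSDS’ record (signature, 16-byte GUID, 4-byte age, then a NUL-terminated path), scans the raw bytes for that pattern instead of walking the PE debug directory, handles ASCII paths only, and guesses the vendor from a hypothetical `<root>/<class>/<vendor>/...` directory layout. The function names and the file-extension filter are likewise illustrative.

```python
import os
import re
from collections import defaultdict

# "RSDS" + 20 bytes (GUID + age) + printable path ending in .pdb + NUL.
RSDS = re.compile(rb"RSDS.{20}([\x20-\x7e]*\.[pP][dD][bB])\x00", re.DOTALL)


def pdb_paths(data: bytes):
    """Return every ASCII PDB path found in RSDS-like records in data."""
    return [m.group(1).decode("ascii") for m in RSDS.finditer(data)]


def collect(root: str):
    """Walk root and map vendor name -> set of PDB paths.

    The vendor is taken from the second directory level below root,
    which is an assumption about how the pack is laid out.
    """
    by_vendor = defaultdict(set)
    for dirpath, _dirs, files in os.walk(root):
        parts = os.path.relpath(dirpath, root).split(os.sep)
        vendor = parts[1] if len(parts) > 1 else "unknown"
        for name in files:
            if not name.lower().endswith((".sys", ".dll", ".exe")):
                continue
            with open(os.path.join(dirpath, name), "rb") as fh:
                by_vendor[vendor].update(pdb_paths(fh.read()))
    return by_vendor
```

Pointing `collect()` at the extracted pack gives a dictionary that can be dumped straight to a text file, one vendor per section. Because the raw scan does no PE validation, it can pick up false positives; a stricter version would parse the debug directory properly.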