When good URLs are bad for business

Analyzing memory dumps comes with a price: ‘good’ information overload. One thing that annoys me a lot is running URL/domain extraction tools over a memdump and finding tons of legitimate URLs that make it harder to find the juicy stuff I am after. I mean things like:

http://www.w3.org/2001/XMLSchema-instance
http://www.w3.org/2000/svg
http://www.w3.org/1999/xlink
http://www.w3.org/XML/1998/namespace
http://www.w3.org/1999/xhtml
http://www.w3.org/2000/xmlns/
http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
http://update.microsoft.com
http://schemas.microsoft.com/rtc/2009/05/simplejoinconfdoc

There are a lot of ‘good’ URLs embedded in manifests, various resources (e.g. HTML/XML/JSON/CSS files) and certificates, and many are introduced as a side effect of linking with static libraries, which often include copyright information and a URL to the author’s page. And of course, there is vendor information, either directly in the resources or in the binary or its config files.
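For illustration only, a very crude version of the extraction step could look like the sketch below. It is not what any particular tool does: the names `ASCII_URL`, `WIDE_URL` and `extract_urls` are hypothetical, and the allowed character set is a simplification (it excludes quotes and angle brackets, and may keep trailing punctuation such as `)` or `,`). Because memory often holds strings in both ASCII and UTF-16LE form, it searches for both.

```python
import re

# Hypothetical, simplified set of characters allowed after "http(s)://".
# Quotes, whitespace, angle brackets and control bytes end the match.
_URL_CHARS = rb"[A-Za-z0-9\-._~:/?#\[\]@!$&()*+,;=%]"

# ASCII URLs: scheme followed by a run of allowed characters.
ASCII_URL = re.compile(rb"https?://" + _URL_CHARS + rb"+")

# UTF-16LE ("wide") URLs: every ASCII character is followed by a 0x00 byte.
WIDE_URL = re.compile(
    rb"h\x00t\x00t\x00p\x00(?:s\x00)?:\x00/\x00/\x00(?:" + _URL_CHARS + rb"\x00)+"
)


def extract_urls(data: bytes) -> set[str]:
    """Return the unique ASCII and UTF-16LE URLs found in a raw memory buffer."""
    urls = {m.group().decode("ascii") for m in ASCII_URL.finditer(data)}
    urls |= {m.group().decode("utf-16-le") for m in WIDE_URL.finditer(data)}
    return urls
```

Run over a real dump, the output of something like this is exactly where the w3.org and schemas.microsoft.com noise shows up.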

Memory dump analysis is not the only thing that suffers from this. The same goes for network log analysis: lots of requests that ‘hide’ the juicy stuff are related to authentication checks, downloads from certificate stores, etc.

To help with analysis, I started building a small repository of these ‘good’ URLs (at the moment primarily related to certificates). I extracted them from my ‘good’ sample repository, so I believe all of them are legitimate. If you find any errors, please let me know.
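One way such a list might be used is as a simple exclusion filter over extracted URLs. The sketch below assumes a plain-text list with one URL per line, blank lines and `#` comments ignored; that format, and the helper names `normalize`, `load_known_good` and `interesting`, are hypothetical and not necessarily how the repository is laid out. Matching is exact after light normalization (lowercased scheme and network location, trailing slash and fragment dropped).

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Lowercase scheme and netloc, drop a trailing slash and any fragment."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def load_known_good(lines) -> set[str]:
    """Build a set of normalized 'good' URLs, skipping blanks and # comments."""
    return {
        normalize(line)
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    }


def interesting(urls, known_good: set[str]) -> list[str]:
    """Return, sorted, the URLs that are NOT on the known-good list."""
    return sorted(u for u in urls if normalize(u) not in known_good)
```

Exact matching is deliberately conservative: whitelisting whole hosts would hide less noise-free but potentially abused paths. The same filter could be run over URLs pulled from network or proxy logs.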

You can download the repo here.

Hexacorn