
Motu – yet another string extractor

June 17, 2015 in Malware Analysis, motu, Software Releases

String extraction is daily bread for many of us, and there are many tools we can use. In the past I wrote a couple of them myself.

Today I am posting a simple proof-of-concept string extractor which I called motu. I came up with the name by looking for some fancy names for ‘an island’. According to Wikipedia, a motu is a reef islet formed by broken coral and sand, surrounding an atoll. If you google for motu you will find a lot of very picturesque photos.

So it’s a short name, and has some pretty pictures 😉

Anyway, back to the idea.

Algorithms extracting strings often work in a streaming FIFO fashion – data in, data out (if data meets criteria). I thought it would be interesting to extract strings first, do some internal crunching and output them as clusters.

The simple idea I came up with is to look at strings inside a file not as separate chunks, but as parts of clusters that are a bit like islands (hence the motu :)) separated from each other. We read data from the file and, if it looks like a printable string, we assign it to the current island. If the distance between the place the string was taken from and the place the previous string was taken from is significant (larger than the configured distance), we treat the string as belonging to a new island. In other words, strings that sit close to each other are output together, while strings that are sparse or far apart are dropped. In the end we only print islands that have at least N strings, and at least one string that contains [a-z]{length_of_the_string}. The latter is just to improve the output quality.
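The island idea can be sketched in a few lines of Python. This is only an illustration and not the actual motu script: the function names are made up, the gap is assumed to be measured from the end of the previous string to the start of the next one, and [a-z]{length_of_the_string} is assumed to mean a run of lowercase letters as long as the minimum string length.

```python
import re
from typing import List, Tuple


def extract_strings(data: bytes, min_len: int = 4) -> List[Tuple[int, str]]:
    """Return (offset, text) for each run of printable ASCII of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


def group_islands(strings: List[Tuple[int, str]], max_gap: int = 128) -> List[List[str]]:
    """Start a new island whenever the gap to the previous string exceeds max_gap.

    Assumption: the gap runs from the end of the previous string
    to the start of the current one.
    """
    islands: List[List[str]] = []
    prev_end = None
    for offset, text in strings:
        if prev_end is None or offset - prev_end > max_gap:
            islands.append([])
        islands[-1].append(text)
        prev_end = offset + len(text)
    return islands


def keep_island(island: List[str], min_strings: int = 10, min_len: int = 4) -> bool:
    """Keep islands with enough strings and at least one lowercase run of min_len letters."""
    lowercase_run = re.compile(r"[a-z]{%d}" % min_len)
    return len(island) >= min_strings and any(lowercase_run.search(s) for s in island)


def motu(data: bytes, min_len: int = 4, max_gap: int = 128,
         min_strings: int = 10) -> List[List[str]]:
    """Extract strings, group them into islands, and return only the islands worth printing."""
    islands = group_islands(extract_strings(data, min_len), max_gap)
    return [island for island in islands if keep_island(island, min_strings, min_len)]
```

Calling motu(open(path, "rb").read()) with the defaults mirrors the settings used in the examples further down (length 4, distance 128, at least 10 strings per island).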

The pros are that you see far fewer junk strings [printable, but not really meaningful]; the cons are that you will miss some strings. Still, it may be quicker to review the file produced by motu than the output of a typical strings tool. In my tests I was getting various results – some very encouraging, some absolute rubbish.

In any case, I hope the idea can be taken further. For example, given a list of seeds (known good strings taken either from histograms, or even from a dictionary), we could look only for islands that contain these seeds and output just the ‘good’ islands instead of everything. Another idea would be to take all islands (including ones with just one string), sort them by the number of strings per island, and output everything. This would ensure all strings are visible, with the highest-quality islands at the top of the output (so one could eyeball the top of the resulting file carefully and pay less attention while skimming through the rest).
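Both follow-up ideas are easy to prototype once islands are available as lists of strings. The sketch below is hypothetical and not part of motu: the function names are invented, and matching seeds as case-insensitive substrings is an assumption.

```python
from typing import Iterable, List


def islands_with_seeds(islands: List[List[str]], seeds: Iterable[str]) -> List[List[str]]:
    """Keep only islands in which at least one string contains a known-good seed.

    Assumption: a seed matches as a case-insensitive substring.
    """
    lowered = [seed.lower() for seed in seeds]
    return [
        island
        for island in islands
        if any(seed in text.lower() for text in island for seed in lowered)
    ]


def rank_islands(islands: List[List[str]]) -> List[List[str]]:
    """Order islands by number of strings, largest first.

    sorted() is stable, so islands of equal size keep their order in the file.
    """
    return sorted(islands, key=len, reverse=True)
```

With ranking, nothing is thrown away: single-string islands simply sink to the bottom of the output.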

You can download the script here. If you find any bugs, or have an idea for improvements, please let me know. Thanks.

Examples:

  • Notepad
    • Strings (length = 4)
      vs
    • motu (length=4, distance=128, number of strings/island=at least 10)
  • Some random Delphi sample
    • Strings (length = 4)
      vs
    • motu (length=4, distance=128, number of strings/island=at least 10)

Introducing filighting and the future of DFIR tools, part 2

April 11, 2015 in Clustering, Forensic Analysis, Software Releases, Visualisation

In yesterday’s post I described a simple clustering algorithm that could be used to group files that contain references to each other. Today I am posting the source code of the program that generated the data in that post, together with a demo that shows how powerful such clustering could be if combined with proper visualization techniques.

In the example I have shown, I used a relatively small folder where Total Commander was installed. The resulting cluster looks like this:

You can play with it interactively here.

Imagine that someone adds files to the Total Commander folder. Since they are not referenced by any other file in this folder, they will create separate clusters. After adding 3 such files:

  • orphan1.txt
  • orphan2.txt
  • orphan3.txt

we get the following clusters:

You can play with it interactively here (you need to drag the orphans away to get the same result as shown on the screenshot).

Finally, we can imagine that a hacker or malware creates a couple of files that perhaps reference each other. An example could be:

  • config.bin
  • keystrokes.txt
  • malware.exe – referencing keystrokes.txt and config.bin

If we now cluster this directory, we will get something like this:

The ‘malware’ files clearly stand out.

You can play with it interactively here (again, you need to drag the nodes away to get the same result as shown on the screenshot).
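The scenarios above can be reproduced with a rough sketch of reference-based clustering. This is not the filighter source: it assumes that a "reference" simply means one file's content contains another file's name (case-insensitive, single-byte encoding only), and it takes an in-memory mapping of file name to content instead of walking a directory. The function name is made up.

```python
from typing import Dict, List, Set


def cluster_files(files: Dict[str, bytes]) -> List[List[str]]:
    """Group files into clusters of names that reference each other, directly or indirectly."""
    names = sorted(files)

    # Undirected graph: an edge exists if one file's content mentions the other's name.
    neighbours: Dict[str, Set[str]] = {name: set() for name in names}
    for name in names:
        content = files[name].lower()
        for other in names:
            if other != name and other.lower().encode("utf-8") in content:
                neighbours[name].add(other)
                neighbours[other].add(name)

    # Each connected component of the graph is one cluster.
    clusters: List[List[str]] = []
    seen: Set[str] = set()
    for name in names:
        if name in seen:
            continue
        seen.add(name)
        stack, component = [name], []
        while stack:
            current = stack.pop()
            component.append(current)
            for nxt in neighbours[current] - seen:
                seen.add(nxt)
                stack.append(nxt)
        clusters.append(sorted(component))
    return clusters
```

Fed with malware.exe (whose content mentions keystrokes.txt and config.bin) plus a few unreferenced orphans, this puts the three ‘malware’ files in one cluster and each orphan in a cluster of its own. A real tool would also need to look for names stored as UTF-16, which this sketch ignores.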

For more examples see part 3.

I believe there are a lot of opportunities in leveraging clustering to reduce the amount of data we need to analyze and to improve user experience by introducing new ways to look at data. There are a lot of visualization techniques that are not used in forensic software today, which is a pity. Clustering adds an extra dimension on top of a timeline and the structure imposed by the organization of a file system – we can only hope that forensic software of the future will take this into account.

For inspiration and really amazing examples of visualization go to https://github.com/mbostock/d3/wiki/Gallery. I used the very same script to create the interactive demos referenced by this post.

The source code of the filighter script that generates these clusters is here.