
Craving for time? Carve some timestamps out… – TimeCraver v0.1

August 22, 2015 in File Formats ZOO, Forensic Analysis, Malware Analysis, Software Releases

Analysis of binary data is always challenging. Data can be encrypted, encoded, or stored in any number of proprietary formats. Understanding what the data represents and how it is stored is non-trivial. It typically involves either analyzing the code that writes the data to a file, or trying our luck by guessing the possible structure of the data. The usual first step is simply to look at the data and its properties.

This can involve checking its entropy and how it changes over the file, looking for patterns typically associated with popular compression algorithms, attempting to brute-force various trivial encryption algos, checking if any data is recognized as a string, a Unicode string, a localized string, a potential absolute or relative offset to other data, or perhaps a byte-, word-, or dword-long length field preceding the data, etc.
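As an illustration of the first check on that list, here is a minimal Python sketch that computes Shannon entropy over fixed-size windows of a file. The function names and the 256-byte window are assumptions made for this example; they do not come from any particular tool.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of a byte block, in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    # Sum p * log2(1/p) over every byte value that occurs in the block.
    return sum((n / total) * math.log2(total / n)
               for n in Counter(block).values())


def entropy_profile(data: bytes, window: int = 256):
    """Yield (offset, entropy) for consecutive, non-overlapping windows."""
    for offset in range(0, len(data), window):
        yield offset, shannon_entropy(data[offset:offset + window])
```

Windows that stay close to 8 bits per byte usually hint at compressed or encrypted content, while a sudden drop often marks headers, padding or plain-text regions.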

One of the most popular tools used to analyze unknown data is binwalk, and it has helped me on many occasions by providing hints about what is possibly ‘in the file’. Sometimes even the fact that it didn’t recognize anything interesting was a good hint – typically meaning encryption, or something really unusual/proprietary.

Existing tools are always handy, but I can’t count how many quick & dirty (and often completely stupid) scripts I wrote to get some data to look more ‘reasonable’ and ‘normal’.

In today’s post I am showing a simple example of such an ‘unknown data analysis script’.

When we see a binary file, we typically run ‘strings’ on it and gather nice, readable ‘printable’ data for analysis. The ‘non-printable’ data is also interesting though, so another tool I often run is a strings-like script that carves timestamps out. This comes in handy for smaller files, especially those that look like a config, a quarantine file, or really anything that looks like it may have timestamps embedded in it.

Carving follows a simple rule – read 4 or 8 bytes, convert them to an epoch using various conversion algos (based on the assumed timestamp format), check whether the epoch converts to a date between the years 2000 and 2015, and if it does – just print it out, together with the offset and some extra metadata.
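A minimal Python sketch of this rule is shown here. It is not the actual TimeCraver script. It assumes little-endian data, tries every byte offset, and handles only three formats (DOS date/time with the time word first, 64-bit FILETIME, and 32-bit Unix epoch). All function names are made up for this example, and SAMPLE holds the 52 bytes from the example in this post.

```python
import struct
from datetime import datetime, timezone

# 100 ns ticks between 1601-01-01 (FILETIME zero) and 1970-01-01 (Unix zero).
FILETIME_DELTA = 116444736000000000


def dostime_to_epoch(raw):
    """DOS time word followed by DOS date word (assumed order)."""
    t, d = struct.unpack("<HH", raw)
    try:
        dt = datetime((d >> 9) + 1980, (d >> 5) & 0xF, d & 0x1F,
                      t >> 11, (t >> 5) & 0x3F, (t & 0x1F) * 2,
                      tzinfo=timezone.utc)
    except ValueError:  # impossible month, day, hour, minute or second
        return None
    return int(dt.timestamp())


def filetime_to_epoch(raw):
    """64-bit count of 100 ns ticks since 1601-01-01."""
    return (struct.unpack("<Q", raw)[0] - FILETIME_DELTA) // 10_000_000


def unix_to_epoch(raw):
    """32-bit count of seconds since 1970-01-01."""
    return struct.unpack("<I", raw)[0]


DECODERS = [("DOSTIME", 4, dostime_to_epoch),
            ("FILETIME", 8, filetime_to_epoch),
            ("EPOCH", 4, unix_to_epoch)]


def carve(data, first_year=2000, last_year=2015):
    """Return (offset, type, epoch, date string, raw hex) for plausible hits."""
    low = int(datetime(first_year, 1, 1, tzinfo=timezone.utc).timestamp())
    high = int(datetime(last_year + 1, 1, 1, tzinfo=timezone.utc).timestamp())
    hits = []
    for offset in range(len(data)):
        for name, size, decode in DECODERS:
            raw = data[offset:offset + size]
            if len(raw) < size:
                continue
            epoch = decode(raw)
            if epoch is None or not low <= epoch < high:
                continue
            stamp = datetime.fromtimestamp(epoch, tz=timezone.utc)
            hits.append((offset, name, epoch,
                         stamp.strftime("%Y-%m-%d %H:%M:%S"),
                         raw.hex().upper()))
    return hits


SAMPLE = bytes.fromhex(
    "8086F63400C05DCE56CFCD010040FA13"
    "0F00CE0100408BB70F16CE01008059DA"
    "6B2ECE010000BED2FE45CE0100A40301"
    "8595C201")

if __name__ == "__main__":
    for offset, name, epoch, stamp, raw in carve(SAMPLE):
        print(f"{offset:08X},{name:<8},{epoch:08X},{stamp},{raw}")
```

Because the sketch blindly tries every decoder at every offset, it may report more (or fewer) candidates than the real script, which may apply extra filtering.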

Example:

     00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F   0123456789ABCDEF
---------------------------------------------------------------------------
00 : 80 86 F6 34 00 C0 5D CE 56 CF CD 01 00 40 FA 13   ...4..].V....@.. 00
10 : 0F 00 CE 01 00 40 8B B7 0F 16 CE 01 00 80 59 DA   .....@........Y. 16
20 : 6B 2E CE 01 00 00 BE D2 FE 45 CE 01 00 A4 03 01   k........E...... 32
30 : 85 95 C2 01                                       ....             36

Looking at such binary data doesn’t give us much useful information.

Running TimeCraver over it gives us the following:

===========================================
 TimeCraver v0.1, Hexacorn.com, 2015-08-23
===========================================
00000000,DOSTIME ,44C257B0,2006-07-22 16:52:00,8086F634
00000004,FILETIME,50B94880,2012-12-01 00:00:00,00C05DCE56CFCD01
0000000A,EPOCH   ,400001CD,2004-01-10 13:44:45,CD010040
0000000C,FILETIME,510B0580,2013-02-01 00:00:00,0040FA130F00CE01
00000012,EPOCH   ,400001CE,2004-01-10 13:44:46,CE010040
00000014,FILETIME,512FEF7F,2013-02-28 23:59:59,00408BB70F16CE01
0000001C,FILETIME,5158CDFF,2013-03-31 23:59:59,008059DA6B2ECE01
00000024,FILETIME,51805B00,2013-05-01 00:00:00,0000BED2FE45CE01
00000026,EPOCH   ,45FED2BE,2007-03-19 18:13:18,BED2FE45
0000002C,FILETIME,3DE3D068,2002-11-26 19:50:00,00A403018595C201

The first column is the offset, followed by the timestamp type, then the hexadecimal epoch calculated from the data, then its YYYY-MM-DD hh:mm:ss representation, and finally the actual bytes from the file that were converted to the epoch.

The data is immediately more readable and certain conclusions can be drawn. If you look at the offsets, the distance between them, and the type of timestamps, you may actually ‘see through’ the data and potentially ‘define’ a reasonable structure.

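In this particular case, we can see that the FILETIME hits at offsets

00000004, 0000000C
00000014, 0000001C
00000024, 0000002C

sit exactly 8 bytes apart, so they look like a sequence of FILETIME records. The EPOCH hits at 0000000A, 00000012 and 00000026 overlap those records, so they can be dropped as likely false positives. Following this logic, we can guess that the structure of the file is potentially like this: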

00000000,DOSTIME ,44C257B0,2006-07-22 16:52:00,8086F634
00000004,FILETIME,50B94880,2012-12-01 00:00:00,00C05DCE56CFCD01
0000000C,FILETIME,510B0580,2013-02-01 00:00:00,0040FA130F00CE01
00000014,FILETIME,512FEF7F,2013-02-28 23:59:59,00408BB70F16CE01
0000001C,FILETIME,5158CDFF,2013-03-31 23:59:59,008059DA6B2ECE01
00000024,FILETIME,51805B00,2013-05-01 00:00:00,0000BED2FE45CE01
0000002C,FILETIME,3DE3D068,2002-11-26 19:50:00,00A403018595C201

I can confirm it since it is one of the test files I created :)

The script can be found here.

Happy craving & carving !

Bonus: if you look at the data in the Registry, you will find more timestamps than you thought were actually there. This is a subject for another post :)

Update

Bonus will be here faster than expected – turns out Andrew Case, Jerry Stormo, Joseph Sylve, and Vico Marziale wrote an awesome Python script for timestamp carving in the Registry.

Motu – yet another string extractor

June 17, 2015 in Malware Analysis, motu, Software Releases

String extraction is a daily bread for many of us and there are many tools we can use. In the past I wrote a couple of them myself including:

Today I am posting a simple proof of concept string extractor which I called motu. I came up with the name by looking for some fancy names for ‘an island’. According to wikipedia, motu is a reef islet formed by broken coral and sand, surrounding an atoll. If you google for motu you will find a lot of very picturesque photos.

So it’s a short name, and has some pretty pictures 😉

Anyway, back to the idea.

Algorithms extracting strings often work in a streaming FIFO fashion – data in, data out (if data meets criteria). I thought it would be interesting to extract strings first, do some internal crunching and output them as clusters.

The simple idea I came up with is to look at strings inside a file not as separate chunks, but as parts of clusters which are a bit like islands (hence the motu :)) separated from each other. We read data from the file, and if it looks like a printable string, we assign it to the current island. If the distance between the place where the string was taken from and the place where the previous string was taken from exceeds a threshold (the ‘distance’ parameter), we treat the string as belonging to a new island. In other words, if we see a number of strings close to each other, we output them; if the strings are sparse or far apart, we don’t. In the end we only print islands that have at least N strings, and at least one string that contains [a-z]{length_of_the_string}. The latter is just to improve the output quality.
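The island idea can be sketched in a few lines of Python. This is an illustration only, not the motu script itself. It assumes plain ASCII strings, measures the distance from the end of the previous string to the start of the next one, and uses made-up function and parameter names.

```python
import re


def extract_strings(data, min_len=4):
    """Return (offset, text) for every run of printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii"))
            for m in pattern.finditer(data)]


def islands(data, min_len=4, distance=128, min_strings=10):
    """Group nearby strings into islands and keep only the 'good' ones."""
    lowercase_run = re.compile(r"[a-z]{%d}" % min_len)
    groups, current, prev_end = [], [], None
    for offset, text in extract_strings(data, min_len):
        # A big gap since the previous string starts a new island.
        if prev_end is not None and offset - prev_end > distance:
            groups.append(current)
            current = []
        current.append((offset, text))
        prev_end = offset + len(text)
    if current:
        groups.append(current)
    # Keep islands with enough strings and at least one lowercase run.
    return [g for g in groups
            if len(g) >= min_strings
            and any(lowercase_run.search(text) for _, text in g)]
```

Calling `islands(data, min_len=4, distance=128, min_strings=10)` mirrors the settings used in the examples at the end of this post. Lowering `min_strings` keeps smaller islands at the cost of more junk.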

The pros are that you see far fewer junk strings [printable, but not really meaningful]; the cons are that you will miss some strings. Still, it may be quicker to review the file produced by motu than typical strings output. In my tests I was getting various results – some very encouraging, some absolute rubbish.

In any case, I hope the idea can be taken further. For example, given a list of seeds (known good strings taken either from histograms, or even from a dictionary), we could look only for islands that contain these seeds and output only ‘good’ islands instead of everything. Another idea is to take all islands (including ones with just one string), sort them by the number of strings per island, and output everything. This would ensure all strings are visible, with the highest-quality islands at the top of the output (so one could eyeball the top of the resulting file carefully and pay less attention while skimming through the rest).

You can download the script here. If you find any bugs, or have an idea for improvements, please let me know. Thanks.

Examples:

  • Notepad
    • Strings (length = 4)
      vs
    • motu (length=4, distance=128, number of strings/island=at least 10)
  • Some random Delphi sample
    • Strings (length = 4)
      vs
    • motu (length=4, distance=128, number of strings/island=at least 10)