Perfect Timestomping a.k.a. Finding suspicious PE files with clustering

In my previous post about clustering, I mentioned that it can be used as an efficient data reduction technique. I also provided some examples of timestamps that could be useful for detecting suspicious files on the system. One of them was the compilation time embedded inside Portable Executables (PE). It turns out that putting this idea into practice is easy, and today I wrote a simple Perl script that implements this functionality in a few dozen lines of code.

The script scans a directory (recursively, if requested) and finds all Portable Executables. It then reads their compilation timestamps and groups them into clusters. Each cluster is a ‘bucket’ holding all binaries compiled within a window of 1 day (86400 seconds). You can play around with the script and change the value of CLUSTER_BOUNDARY to e.g. 30 days and see what happens.
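The idea described above can be sketched in Python (the original script is written in Perl and is not reproduced here). This is only a minimal sketch under assumptions: the function names `read_compile_time`, `cluster_by_gap` and `scan` are hypothetical, and a new cluster is started whenever the gap to the previous timestamp exceeds `CLUSTER_BOUNDARY`, which may differ from how the original script draws its bucket boundaries.

```python
import os
import struct

CLUSTER_BOUNDARY = 86400  # seconds; 1 day, as in the text


def read_compile_time(path):
    """Return the COFF TimeDateStamp of a PE file, or None if it is not a PE."""
    try:
        with open(path, "rb") as fh:
            header = fh.read(0x40)
            if len(header) < 0x40 or header[:2] != b"MZ":
                return None
            # e_lfanew (offset 0x3C) points at the "PE\0\0" signature.
            (pe_offset,) = struct.unpack_from("<I", header, 0x3C)
            fh.seek(pe_offset)
            nt = fh.read(12)
    except OSError:
        return None
    if len(nt) < 12 or nt[:4] != b"PE\0\0":
        return None
    # Signature (4) + Machine (2) + NumberOfSections (2), then TimeDateStamp.
    (timestamp,) = struct.unpack_from("<I", nt, 8)
    return timestamp


def cluster_by_gap(items, boundary=CLUSTER_BOUNDARY):
    """Group (timestamp, name) pairs; a gap above `boundary` starts a new cluster."""
    clusters = []
    for ts, name in sorted(items):
        if clusters and ts - clusters[-1][-1][0] <= boundary:
            clusters[-1].append((ts, name))
        else:
            clusters.append([(ts, name)])
    return clusters


def scan(root, recursive=True):
    """Collect (timestamp, path) for every PE file found under `root`."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for fname in filenames:
            full = os.path.join(dirpath, fname)
            ts = read_compile_time(full)
            if ts is not None:
                found.append((ts, full))
        if not recursive:
            dirnames.clear()  # do not descend into subdirectories
    return found
```

Calling `cluster_by_gap(scan(some_directory))` returns a list of clusters, each a list of `(timestamp, path)` pairs sorted by compilation time, so small clusters can be printed or reviewed first.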

In the screenshot below you can see the script at work – finding all PE files and grouping them into clusters:

And after processing the whole folder, the resulting clusters are printed out:

One needs to quickly scroll through these groups and look at isolated / orphaned files or small groups; this should hopefully help in finding the bad apples. You can also toy around with the script over clean directories to see what intel you can gather from the compilation timestamps of all PE files inside some specific directory.

For example, after running it over the c:\windows\system32 directory of various Windows flavors you may spot some interesting patterns:

  • Portable Executables that are part of the Windows OS are not built in alphabetical order (I originally hoped they were – it could be an interesting pattern to use to spot ‘out-of-order’ named executables sandwiched between 2 clean OS files)
  • Still, many OS binaries are compiled sequentially (a few minutes apart), so many can be easily ignored in analysis
  • On Windows XP and Vista, DLLs and EXEs seem to be compiled separately (this is an interesting pattern, as seeing an .exe in a sequence of .dlls should be immediately treated as suspicious; note that system updates may affect this pattern)
  • On Windows 7 both EXEs and DLLs seem to be compiled without any specific pattern 🙁
  • A clean installation has a very small number of clusters within the system32 directory; updated/patched binaries make analysis harder (still, updates will most likely be seen as separate clusters)
  • Files dropped by installers, malware, as well as packed executables, compiled scripts (e.g. perl2exe), etc. should stand out, even if timestomped – see how the psexec service executable stands out below

Compilation time is a very useful characteristic of a Portable Executable. Malware authors occasionally zero it or change it to a random value, but this is still not a very common practice. This, of course, is very good news for investigators and forensic analysts. If the timestamp is real (not tampered with), the compilation time of a malicious sample is most likely different from the ‘typical’ timestamps that can be found e.g. within the system32 directory. As mentioned earlier, PECluester should be able to group such randomly dropped files into separate cluster(s) even if the file system (e.g. $MFT) timestamps are timestomped.

Speaking of the devil. I mentioned ‘perfect timestomping’ in the title of this post.

Why?

Perfect timestomping of a Portable Executable would require not only changing the metadata on the file system, but also changing the PE file’s compilation time (and all timestamps inside the PE file that could reveal its compilation time) to some carefully chosen value that blends in with the compilation times of system files (especially for malware dropped inside system folders; for malware within application/temp data folders this – of course – is not that useful).

So, how would we go about finding such perfectly timestomped files?

The good news for forensic investigators is that the compilation timestamp is only one of many possible timestamps that can be found inside a typical Portable Executable. Unless the malware author takes really good care of all these timestamps (either understands the Portable Executable file format quite well or uses a specialized tool), there is a high chance one may find some inconsistencies. While not many PE timestamps are properly updated at compilation time (e.g. Resources and the Import Table have placeholders for timestamps, but these are often zeroed by the compiler), some structures may include real timestamps, e.g. the Debug Directory, as shown in the screenshot below:

Other clues about the compilation time can be related to

  • embedded files (author might have forgotten to clean up their timestamps)
  • copyright banners for statically linked libraries
  • standard ‘template’ program icon (e.g. win32 applications created via templates in a RAD environment always use the same standard icon unless the author changes it; icons change between RAD versions and may give some clues as to the ‘age’ of the malware)
  • libraries/compiler signatures – this is difficult as it requires libraries of known patterns; IDA Pro’s FLIRT signatures come to mind here and may give some hints, but associating these with a specific date is close to impossible
  • even harder – compiler-version-specific code of exception handlers, prologue/epilogue code, compilation flags, etc.
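Of the clues discussed above, the Debug Directory timestamp is probably the easiest to check programmatically. The following is a minimal sketch, assuming a well-formed PE image held in memory; `pe_timestamps` and `timestamps_disagree` are hypothetical helper names, and the field offsets follow the published PE/COFF layout. A mismatch is only a hint worth a closer look, not proof of tampering.

```python
import struct

DEBUG_DIR_INDEX = 6     # IMAGE_DIRECTORY_ENTRY_DEBUG
DEBUG_ENTRY_SIZE = 28   # size of one IMAGE_DEBUG_DIRECTORY entry


def pe_timestamps(data):
    """Return (header_timestamp, [debug_directory_timestamps]) for raw PE bytes."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (pe,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe:pe + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # COFF header: NumberOfSections at +6, TimeDateStamp at +8 from the signature.
    num_sections, header_ts = struct.unpack_from("<HI", data, pe + 6)
    (opt_size,) = struct.unpack_from("<H", data, pe + 20)
    opt = pe + 24
    (magic,) = struct.unpack_from("<H", data, opt)
    # Data directories start at offset 96 (PE32) or 112 (PE32+) of the optional header.
    dirs = opt + (112 if magic == 0x20B else 96)
    (num_dirs,) = struct.unpack_from("<I", data, dirs - 4)
    if num_dirs <= DEBUG_DIR_INDEX:
        return header_ts, []
    rva, size = struct.unpack_from("<II", data, dirs + 8 * DEBUG_DIR_INDEX)
    if rva == 0 or size == 0:
        return header_ts, []
    # Walk the section table to translate the RVA into a file offset.
    sections = opt + opt_size
    offset = None
    for i in range(num_sections):
        vsize, vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<IIII", data, sections + 40 * i + 8)
        if vaddr <= rva < vaddr + max(vsize, raw_size):
            offset = raw_ptr + (rva - vaddr)
            break
    if offset is None:
        return header_ts, []
    debug_ts = []
    for pos in range(offset, offset + size - DEBUG_ENTRY_SIZE + 1, DEBUG_ENTRY_SIZE):
        # TimeDateStamp is the second DWORD of each debug directory entry.
        (ts,) = struct.unpack_from("<I", data, pos + 4)
        debug_ts.append(ts)
    return header_ts, debug_ts


def timestamps_disagree(data, tolerance=0):
    """True if any non-zero debug timestamp differs from the header timestamp."""
    header_ts, debug_ts = pe_timestamps(data)
    return any(abs(ts - header_ts) > tolerance for ts in debug_ts if ts != 0)
```

The `tolerance` parameter is an assumption added for illustration; it allows small differences to be ignored if a given toolchain is known to produce them.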

Back to PECluester – imho you can use it as an alternative to AV scans and a toy for further research. Go ahead and experiment. Enjoy!

You can download the script here.

Finding Smoking Gun and going beyond that – Helpful Forensic Artifacts

While I am quite critical of the idea of collecting IOCs (Indicators of Compromise) that describe various malware, traces of hacking, etc. in the form of hashes, even fuzzy hashes, file names, sizes, and so on, I do believe that there is a certain number of IOCs (or, as I call them, HFAs – Helpful Forensic Artifacts – as they are not necessarily relevant to the compromise itself) that are universal and worth collecting. I am talking about artifacts that are common to malware functionality and offensive activities on the system in general, as well as any other artifacts that may help both attackers and… investigators (thanks to ‘helpful’ users who leave unencrypted credentials in text files, watch movies on critical systems, etc.).

In this post, I will provide some practical examples of what I mean by that.

Before I kick it off, just a quick reminder – the reason why I am critical of bloated IOC databases is that they have very limited applicability in a general sense; the limitations come as a result of various techniques used by malware authors, offensive teams, etc. including, but not limited to:

  • metamorphism
  • randomization
  • encryption
  • data (e.g. strings) built on the fly (instead of hardcoding)
  • shellcode-like payloads
  • fast-flux
  • P2P
  • covert channels
  • etc.

Notably, antivirus detections of very advanced, metamorphic malware rely on state machines, not strings, and it’s naive to assume that collecting file names like sdra64.exe is going to save the day…

Anyway…

When I talk about good, interesting HFAs, I think of:

  • things that are very often used in malware simply because they need to be there (dropping files, autostart, etc.)
  • traces of activities that must be carried out on the compromised system (recon, downloading toolchests, etc.)
  • also (notably) traces of user activity that support the attacker’s work (e.g. a file password.txt is not an IOC, but it’s an HFA)
  • traces of the system being affected in a negative way, e.g. if the system has been compromised previously by generic malware, certain settings could have been changed (e.g. disabled tracing, blocked Task Manager, etc.); they are IOCs in a generic sense, but not really relevant to the compromise actually being investigated; you can think of them as those aspects of system security that place the system on the opposite side to a properly secured and hardened box; this also includes previously detected/removed malware – imho AV logs are not ‘clear’ IOCs as long as they relate to an event that is not related to the investigated compromise

If we talk about typical random malware, it’s usually stupidly written, using snippets copied and pasted from many sources on the internet. The authors are lazy and don’t even bother to encrypt strings, so detection is really easy. You can grep the file or a memory dump of a suspected process for typical autorun strings with strings, BinText or HexDive, and most of the time you will find the smoking gun. If the attacker is advanced, all you will deal with is a blob of binary data that has no visible trace of being malicious unless disassembled – that is, a relocation-independent, shellcode-like piece of mixed code/data in a metamorphic form that doesn’t require all the fuss of inline DLL/EXE loading; it’s just a pure piece of code. It’s actually simple to write with a basic knowledge of assembly language and of OS internals. I honestly don’t know how to detect such malware in a generic way. I do believe that’s where the future of advanced malware is, though (apart from going mobile). And I chuckle when I see malware that is 20MB in size (no matter how advanced the functionality).
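For the easy case described above (unencrypted strings), the grep step can be approximated in a few lines of Python. This is a rough sketch and not a replacement for strings, BinText or HexDive: `extract_strings` and `find_autorun_strings` are hypothetical names, and the `AUTORUN_HINTS` tuple holds just a few well-known autostart registry paths chosen for illustration.

```python
import re

# A few well-known autostart-related registry paths; illustrative, not exhaustive.
AUTORUN_HINTS = (
    r"Software\Microsoft\Windows\CurrentVersion\Run",
    r"Software\Microsoft\Windows NT\CurrentVersion\Winlogon",
    r"Software\Microsoft\Active Setup\Installed Components",
)


def extract_strings(data, min_len=5):
    """Yield printable ASCII strings, then UTF-16LE strings, found in raw bytes."""
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    for match in ascii_re.finditer(data):
        yield match.group().decode("ascii")
    for match in wide_re.finditer(data):
        yield match.group().decode("utf-16-le")


def find_autorun_strings(data):
    """Return extracted strings that contain one of the autorun hints (case-insensitive)."""
    hints = [hint.lower() for hint in AUTORUN_HINTS]
    return [s for s in extract_strings(data)
            if any(hint in s.lower() for hint in hints)]
```

The input can be the raw bytes of a file or of a process memory dump; both narrow and wide strings are checked because Windows binaries commonly store strings in either form.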

When we talk about IOCs/HFAs and offensive security practices, it is worth mentioning that we need to follow the thought process of an attacker. Let me give you an example. Assume that the attacker gets on the system. What can s/he do? If the malware is already there, it’s easy: the functionality is out there and can be leveraged, the malicious payload can be updated, and the attacker can do anything that the actual payload is programmed to do, within the boundaries of what the environment it runs in permits. On the other hand, if it is an attack that comes through a typical hacking attempt, the situation is different. In fact, the attacker is very limited when it comes to available tools/functionality and often has to leverage existing OS tools. This means exactly what it says – the attacker operates in a minimalistic environment and is going to use any possible tool available on the OS to his/her benefit. On a Windows system, it can be

  • net.exe (and also net1.exe)
  • telnet.exe
  • ftp.exe

but also

  • arp.exe
  • at.exe
  • attrib.exe
  • bitsadmin.exe
  • cacls.exe
  • certutil.exe
  • cmd.exe
  • command.com
  • compact.exe
  • cscript.exe
  • debug.exe
  • diantz.exe
  • findstr.exe
  • hostname.exe
  • icacls.exe
  • iexpress.exe
  • ipconfig.exe
  • makecab.exe
  • mofcomp.exe
  • more.com
  • msiexec.exe
  • mstsc.exe
  • net1.exe
  • netsh.exe
  • netstat.exe
  • ping.exe
  • powershell.exe
  • reg.exe
  • regedit.exe
  • regedt32.exe
  • regini.exe
  • regsvr32.exe
  • robocopy.exe
  • route.exe
  • runas.exe
  • rundll32.exe
  • sc.exe
  • schtasks.exe
  • scrcons.exe
  • shutdown.exe
  • takeown.exe
  • taskkill.exe
  • tasklist.exe
  • tracert.exe
  • vssadmin.exe
  • whoami.exe
  • wscript.exe
  • xcacls.exe
  • xcopy.exe

and OS commands

  • echo
  • type
  • dir
  • md/mkdir
  • systeminfo

and many other command line tools and commands.

So, if you analyze a memory dump from a Windows system, it’s good to search for the presence of file names associated with built-in Windows utilities and look at the context, i.e. the surrounding memory region, to see what the reason for them being there could be (cmd.exe /c being the most common, I guess).
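The search-and-look-at-context step can be sketched as follows. This is a minimal illustration, assuming the dump fits in memory; `find_with_context` is a hypothetical helper and `UTILITIES` is a short excerpt of the list above. Plain substring matching means, for example, that net.exe also hits inside telnet.exe, which is acceptable here because the analyst reviews the surrounding bytes anyway.

```python
# Short excerpt of the utility list above; extend as needed.
UTILITIES = ("cmd.exe", "net.exe", "net1.exe", "ftp.exe", "at.exe",
             "reg.exe", "schtasks.exe")


def find_with_context(dump, names=UTILITIES, window=64):
    """Yield (offset, name, encoding, context_bytes) for each case-insensitive hit."""
    lowered = dump.lower()  # bytes.lower() only touches ASCII letters
    for name in names:
        for encoding in ("ascii", "utf-16-le"):
            needle = name.lower().encode(encoding)
            start = lowered.find(needle)
            while start != -1:
                lo = max(0, start - window)
                hi = min(len(dump), start + len(needle) + window)
                yield start, name, encoding, dump[lo:hi]
                start = lowered.find(needle, start + 1)
```

Each hit carries `window` bytes on either side of the match, so a command line such as `cmd.exe /c ...` shows up together with its arguments.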

Back to the original reason of this post – since I wanted to provide some real/practical examples of HFAs that one can utilize to analyze hosts, let me start with a simple classification by functionality/purpose:

  • information gathering
    • net.exe
    • net1.exe
    • psexec.exe/psexesvc.exe
    • dsquery.exe
    • arp.exe
    • traces of shell being used (cmd.exe /c)
    • passwords.txt, password.txt, pass.txt, etc.
  • data collection
    • type of files storing collected data
      • possibly password protected archives
      • encrypted data (e.g. credit cards/track data)
    • various 3rd party tools to archive data:
      • rar, 7z, pkzip, tar, arj, lha, kgb, xz, etc.
    • OS-based tools
      • compress.exe
      • makecab.exe
      • iexpress.exe
      • diantz.exe
    • type of collected data
      • screen captures often saved as .jpg (small size)
      • screen captures file names often include date
      • keystroke names and their variants
        • PgDn, [PgDn],{PgDn}
        • VK_NEXT
        • PageDown, [PageDown], {PageDown}
      • timestamps (note that there are regional settings)
      • predictable window titles
        • [ C:\WINDOWS\system32\notepad.exe ]
        • [ C:\WINDOWS\system32\calc.exe ]
        • [http://google.com/ – Windows Internet Explorer]
        • [Google – Windows Internet Explorer]
        • [InPrivate – Windows Internet Explorer – [InPrivate]]
      • possible excluded window class names
        • msctls_progress32
        • SysTabControl32
        • SysTreeView32
      • content of the address bar
      • attractive data for attackers
        • regexes for PII (searching for names from a dictionary, states, countries, phone numbers, etc. may help)
        • anything that matches Luhn algorithm (credit cards)
      • input field names from web pages and related to intercepted/recognized credentials
        • user
        • username
        • password
        • pin
      • predictable user-generated content
        • internet searches
        • chats (acronyms, swearwords, smileys, etc.)
  • data exfiltration
    • who
      • username/passwords
    • how
      • ftp client (ftp.exe, far.exe, etc.)
      • browser (POSTs, more advanced: GETs)
      • DNS requests
      • USB stick
      • burnt CD
      • printer
    • when
      • just in time (frequent network connection)
      • ‘coming back’ to the system
    • configuration
      • file
      • registry
      • uses GUI (lots of good keywords!)
    • where to:
      • URLs
      • FTP server names
      • SMTP servers
      • mapped drives (\\foo\c$)
      • mapped remote paths (e.g. \\tsclient)
  • malicious code
    • any .exe/.zip in TEMP/APPLICATION DATA subfolders
    • processes that have a low edit distance between their names and known system processes (e.g. lsass.exe vs. lsas.exe)
    • processes that use known system processes but start from a different path
    • areas of memory containing “islands” with raw addresses of APIs typically used by malware e.g. CreateRemoteThread, WriteProcessMemory, wininet functions
  • mistakes
    • Event logs
    • AV logs/quarantine files
    • leftovers (files, etc.)
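Two of the ‘malicious code’ checks in the list above (look-alike process names and system process names started from an unexpected path) lend themselves to a short sketch. The `KNOWN` table below is an assumption for illustration: it lists default locations on a standard installation with the system root at c:\windows, and it is far from complete. The names `edit_distance` and `check_process` are hypothetical, and false positives are to be expected.

```python
import ntpath

# Assumed default locations (system root c:\windows); illustrative, not complete.
KNOWN = {
    "lsass.exe": r"c:\windows\system32",
    "svchost.exe": r"c:\windows\system32",
    "services.exe": r"c:\windows\system32",
    "csrss.exe": r"c:\windows\system32",
    "winlogon.exe": r"c:\windows\system32",
    "explorer.exe": r"c:\windows",
}


def edit_distance(a, b):
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def check_process(image_path, max_distance=2):
    """Return a list of findings for one process image path."""
    directory, name = ntpath.split(image_path.lower())
    if name in KNOWN:
        # Exact system name: only the location can be wrong.
        if directory != KNOWN[name]:
            return [f"{name} started from unexpected path {directory}"]
        return []
    return [f"{name} looks like {known}"
            for known in KNOWN
            if edit_distance(name, known) <= max_distance]
```

For example, `check_process(r"C:\Windows\System32\lsas.exe")` reports a look-alike of lsass.exe, while a genuine `C:\Windows\System32\lsass.exe` yields no findings.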

Many of these HFAs form a very manageable set that, when put together, can be applied to different data sets (file names, file paths, file content, registry settings, memory content, process dumps, etc.).
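As one example of applying such an artifact to arbitrary content, the ‘anything that matches the Luhn algorithm’ item from the list can be sketched like this. The candidate pattern (13 to 19 digits, optionally separated by single spaces or dashes) is an assumption chosen for illustration, and `luhn_valid` and `find_card_like_numbers` are hypothetical names; random digit runs pass the Luhn check roughly one time in ten, so hits still need review.

```python
import re


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# 13 to 19 digits, each optionally followed by a single space or dash.
CANDIDATE = re.compile(r"(?<!\d)(?:\d[ -]?){12,18}\d(?!\d)")


def find_card_like_numbers(text):
    """Return digit strings found in `text` that pass the Luhn check."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

The function works on text, so file content or memory regions would first need to be decoded (for instance via a strings-style extraction) before being passed in.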

In other words – instead of chasing after sample/family/hacking group-specific stuff, we look for traces of all the things that make malware – malware, a weak system – weak, a hack – a hack, and an attack-supporting user – a victim.