Clustering and Batch Analysis

I have recently been toying around with clustering various sets of malicious samples: running files through a sandbox and static analysis tools, and then applying various normalizations and histograms to the output. The results are not mind-blowing, but encouraging. They help group malware families into separate buckets, improve log parsing routines, and in some cases can also be leveraged to quickly discover hidden properties of the malware, e.g. encryption keys, User Agents, HTTP verbs, etc.; these may then be used for more in-depth analysis of proxy logs, etc.
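
A minimal Perl sketch of the ‘normalize, then histogram’ idea. It assumes, hypothetically, that the tool output has already been flattened into sample<TAB>attribute<TAB>value lines; bucket_by and print_histogram are illustrative names, and the only normalization applied is case-folding and whitespace trimming:

```perl
use strict;
use warnings;

# Hypothetical input format: "sample<TAB>attribute<TAB>value", one per line.
# bucket_by() groups sample names by the (normalized) value of one attribute.
sub bucket_by {
    my ($attr, @lines) = @_;
    my %buckets;                                  # value => [ sample, ... ]
    for my $line (@lines) {
        my ($sample, $name, $value) = split /\t/, $line, 3;
        next unless defined $value && $name eq $attr;
        $value = lc $value;                       # normalization: case-fold...
        $value =~ s/^\s+|\s+$//g;                 # ...and trim whitespace
        push @{ $buckets{$value} }, $sample;
    }
    return \%buckets;
}

# Print one line per bucket (count, value, bar), largest bucket first.
sub print_histogram {
    my ($buckets) = @_;
    my @order = sort { scalar(@{ $buckets->{$b} }) <=> scalar(@{ $buckets->{$a} }) || $a cmp $b }
                keys %$buckets;
    for my $value (@order) {
        my $n = scalar @{ $buckets->{$value} };
        printf "%4d %-40s %s\n", $n, $value, '#' x $n;
    }
}

unless (caller) {
    print_histogram(bucket_by('UserAgent',
        "s1\tUserAgent\tMozilla/4.0 (compatible; MSIE 6.0)",
        "s2\tUserAgent\tmozilla/4.0 (compatible; msie 6.0) ",
        "s3\tUserAgent\tOpera/9.80",
        "s3\tMutex\tfoo",
    ));
}
```

Fed with, say, the User Agent values from a batch of sandbox reports, this shows which samples share a UA string once trivial differences are normalized away.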

Here is a short list of ‘clusterable’ attributes in case you want to design your own clustering solution and are looking for a quick cheat sheet; it is certainly far from complete, but may give you some pointers:

STATIC

  • File Name
  • File Extension
  • File Size
  • File Type
    • This will have a lot of ‘subtypes’ – for MZ files see details here and here
    • For executables: the sequence of bytes at the entry point and at the real entry point (for main, wmain, DllMain, as well as for VB and Delphi code)
    • For PE files, for each of the following: names (where applicable), sizes, flags, entropy, strings:
      • sections (for list of known sections see here)
      • import tables
      • export tables
    • For PE files:
      • PE type
      • Image base
      • Compilation/debug time stamps
      • Resources – number, topology
      • Debug strings
  • File Entropy
  • Compiler (PEiD, etc.)
  • Packer, protector
  • File hashes (MD5, SHA1, CTPH, …)
  • Extracted strings
  • Presence and characteristics of appended data (e.g. installers)
  • Sequences of code
    • Disassembled code
    • Decompiled code
    • Selected code (e.g. map of calls)
  • Detection by various AVs
  • Multimedia properties (e.g. width, height, EXIF data, etc.)
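
Two of the static attributes above are easy to extract without external tools: file entropy and a few basic PE header fields (number of sections, compilation timestamp, entry point, image base). The following is a minimal Perl sketch; offsets follow the Microsoft PE/COFF layout, but there is only rudimentary bounds checking, no section/import/resource parsing, and the function names are made up for illustration:

```perl
use strict;
use warnings;

# Shannon entropy of a byte string, in bits per byte (0.0 to 8.0).
sub shannon_entropy {
    my ($data) = @_;
    my $len = length $data;
    return 0 unless $len;
    my %count;
    $count{$_}++ for unpack 'C*', $data;           # byte value histogram
    my $h = 0;
    for my $c (values %count) {
        my $p = $c / $len;
        $h -= $p * log($p) / log(2);
    }
    return $h;
}

# A few PE header fields; only minimal bounds and signature checks.
sub pe_basics {
    my ($data) = @_;
    return unless length($data) >= 0x40 && substr($data, 0, 2) eq 'MZ';
    my $pe = unpack 'V', substr($data, 0x3C, 4);  # e_lfanew
    return unless length($data) >= $pe + 24 + 32 && substr($data, $pe, 4) eq "PE\0\0";
    # COFF file header after the signature: Machine, NumberOfSections, TimeDateStamp
    my ($machine, $nsect, $timestamp) = unpack 'v v V', substr($data, $pe + 4, 8);
    my $opt   = $pe + 24;                                # optional header
    my $magic = unpack 'v', substr($data, $opt, 2);      # 0x10B = PE32, 0x20B = PE32+
    my $ep    = unpack 'V', substr($data, $opt + 16, 4); # AddressOfEntryPoint (RVA)
    my $base;
    if ($magic == 0x20B) {                               # PE32+: 64-bit ImageBase at +24
        my ($lo, $hi) = unpack 'V V', substr($data, $opt + 24, 8);
        $base = $lo + $hi * 4294967296;
    } else {                                             # PE32: 32-bit ImageBase at +28
        $base = unpack 'V', substr($data, $opt + 28, 4);
    }
    return { machine => $machine, sections => $nsect, timestamp => $timestamp,
             magic => $magic, entry_point => $ep, image_base => $base };
}

unless (caller) {
    for my $path (@ARGV) {
        open my $fh, '<:raw', $path or die "$path: $!";
        my $data = do { local $/; <$fh> };
        close $fh;
        my $pe = pe_basics($data);
        printf "%s size=%d entropy=%.3f", $path, length($data), shannon_entropy($data);
        printf " sections=%d timestamp=%d ep=0x%X base=0x%X",
            @$pe{qw(sections timestamp entry_point image_base)} if $pe;
        print "\n";
    }
}
```

Run as e.g. perl script.pl sample1.exe sample2.dll; each output line can then be bucketed like any other attribute. Note that unpack 'C*' builds a list of every byte, so for large files a streaming histogram would be easier on memory.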

DYNAMIC

  • Accessed IPs
  • Accessed URLs
  • GET and POST Queries
  • User Agents
  • Ports used
  • Created/accessed Mutexes/mutants
  • Created/accessed Atoms
  • Created/accessed Window names
  • Created/accessed Window classes
  • Created/accessed Windows topology
  • Windows’ visibility
  • Windows’ Unicodeness
  • Windows’ topology
  • Windows’ titles
  • Windows’ classes
  • Crypto used (built-in or API-based)
  • Popular strings used (e.g. copyright banners as seen here)
  • Execution paths (code, sequences, code blocks, API sequences)
  • Use of position-independent code
  • Use of privilege escalation tricks
  • Use and type of code injection
  • Use of kernel drivers (including system DLLs)
  • Use of stolen certificates
  • Use of anti-* techniques
  • Use of 0days
  • Use of timestomping
  • Use of dynamically built strings (run-time)
  • Use of code to adjust privileges
  • Use of keylogging techniques (and what type: hook, API hook, etc.)
  • Use of external tools (e.g. cmd.exe, reg.exe, net.exe)
  • Use of autorun.inf
  • Use of DKOM
  • Use of code directly accessing physical drives
  • Use of code directly accessing physical memory
  • Use of code directly accessing BIOS
  • Use of hypervisor
  • MBR – code modification
  • MBR – partition table modification
  • Passwords used for encryption and to access (e.g. FTP/SMTP/IRC)
  • Dropped file locations, names
  • Searched path locations, registry names
  • Targeted applications (e.g. browser, mail, IM and P2P clients, etc.)
  • Added/modified registry entries
  • APIs executed and their arguments
    • Type of APIs (kernel32 Win32 APIs or ntdll Zw*/Nt* APIs)
    • Delays used in waiting functions
    • APIs/techniques used for memory allocation (heap, Virtual*, stack-based, etc.)
    • APIs/techniques used for self-deletion
    • APIs/techniques used for running other .exes
    • APIs/techniques used for networking (Winsock or WinINet; also Rtl functions from ntdll)
    • APIs/techniques used for network enumeration (Net*, WNet*, Domain*)
    • Process enumeration APIs
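
Many of the dynamic attributes above are naturally sets per sample (APIs called, mutexes, dropped file names, registry keys, User Agents), and one simple way to compare two samples on such sets is the Jaccard similarity. Below is a minimal Perl sketch of it plus a naive greedy grouping; the type:value attribute strings, the 0.5 threshold and the grouping rule are illustrative assumptions, not a description of how the buckets mentioned earlier were produced:

```perl
use strict;
use warnings;

# Jaccard similarity of two attribute sets (array refs):
# |intersection| / |union|, from 0 (nothing shared) to 1 (identical sets).
sub jaccard {
    my ($x, $y) = @_;
    my %in_x  = map { ($_ => 1) } @$x;
    my %in_y  = map { ($_ => 1) } @$y;
    my %union = (%in_x, %in_y);
    return 1 unless %union;                        # two empty sets
    my $common = grep { $in_y{$_} } keys %in_x;
    return $common / scalar(keys %union);
}

# Naive greedy grouping (hypothetical, order-dependent): a sample joins the
# first cluster whose first member is at least $threshold similar to it,
# otherwise it starts a new cluster.
sub greedy_cluster {
    my ($samples, $threshold) = @_;                # { name => [attr, ...] }
    my @clusters;
    SAMPLE: for my $name (sort keys %$samples) {
        for my $c (@clusters) {
            if (jaccard($samples->{ $c->[0] }, $samples->{$name}) >= $threshold) {
                push @$c, $name;
                next SAMPLE;
            }
        }
        push @clusters, [$name];
    }
    return \@clusters;
}

unless (caller) {
    my %toy = (
        s1 => [qw(api:CreateMutexA mutex:foo ua:Opera/9.80)],
        s2 => [qw(api:CreateMutexA mutex:foo ua:Opera/9.80 api:ZwSetEaFile)],
        s3 => [qw(api:InternetOpenA mutex:bar)],
    );
    my $i = 0;
    print 'cluster ', $i++, ': ', join(' ', @$_), "\n" for @{ greedy_cluster(\%toy, 0.5) };
}
```

The greedy pass depends on input order and only compares against each cluster's first member; proper hierarchical clustering avoids that at a higher computational cost.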

Let me interrupt you here…

Okay, okay, I get it!!! It is a never-ending list!!!

HMFT 0.3 + Extended Attributes, short update

update

Fixed the title of the post; it’s obviously version 0.3 and not 3.0 🙂

old post

In my last post I talked about detecting Extended Attributes (used by ZeroAccess malware) using HMFT. Today I got a chance to update it a bit with some more information.

First of all, I clustered some of the ZeroAccess samples I had and came up with a comprehensive list (of course, limited by the sample set I have) of file locations and Extended Attributes used by the malware:

  • %SYSTEMROOT%\system32\services.exe::731
  • %USERPROFILE%\appdata\local\a4ca9b9c\u::@@@ 
  • %USERPROFILE%\AppData\Local\{0c9c4ca4-c3a9-47cf-2e3e-4db8bf2ad457}\U::001
  • %SYSTEMROOT%\$NtUninstallKB16214$\2764741532\U::CFG
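
To bucket these locations across infections, the random-looking per-machine parts (the 8-hex-digit directory, the GUID, the fake KB number and the long decimal directory) need to be normalized first. A minimal Perl sketch; the replacement patterns are guesses inferred from the four paths above only, so other variants may need extra rules, and normalize_ea_path is a hypothetical helper:

```perl
use strict;
use warnings;

# Hypothetical normalizer: replace random-looking parts of an EA location
# with placeholders so that paths from different machines cluster together.
# Patterns are inferred from four sample paths only.
sub normalize_ea_path {
    my ($p) = @_;
    $p = lc $p;
    $p =~ s/\{[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\}/{<guid>}/g;  # {GUID} directory
    $p =~ s/\$ntuninstallkb\d+\$/\$ntuninstallkb<n>\$/g;                 # fake KB directory
    $p =~ s/\\\d{6,}(?=\\)/\\<n>/g;                                       # long decimal directory
    $p =~ s/\\[0-9a-f]{8}(?=\\)/\\<hex8>/g;                               # 8-hex-digit directory
    return $p;
}

unless (caller) {
    my %count;
    $count{ normalize_ea_path($_) }++ for (
        '%SYSTEMROOT%\system32\services.exe::731',
        '%USERPROFILE%\appdata\local\a4ca9b9c\u::@@@',
        '%USERPROFILE%\AppData\Local\{0c9c4ca4-c3a9-47cf-2e3e-4db8bf2ad457}\U::001',
        '%SYSTEMROOT%\$NtUninstallKB16214$\2764741532\U::CFG',
    );
    printf "%3d  %s\n", $count{$_}, $_ for sort keys %count;
}
```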

You can find a full list of samples using EAs together with hashes (md5_sha1) here.

Secondly, I added some code to HMFT, so it can now also dump the Extended Attribute’s name (and the printable part of the EA value):

   RESIDENT ATTRIBUTE
      AttributeTypeIdentifierD = 224
      LengthOfAttributeD       = 40
      NonResidentFlagB         = 0
      LengthOfNameB            = 0
      OffsetToNameW            = 0
      FlagsW                   = 0
      AttributeIdentifierW     = 4
      --
      SizeOfContentD          = 16
      OffsetToContentW        = 24
      --
        MFTA_EA
            OfsNextEAD      = 16
            FlagsB          = 0
            EaNameLenB      = 3
            EaValueLenW     = 3
            EaName = FOO
            EaValue= bar
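
The dump above hints at how the $EA attribute content (type 224, i.e. 0xE0) is laid out: the D/B/W suffixes presumably mean DWORD/BYTE/WORD, so each entry has an 8-byte header followed by the name, a NUL terminator and the value. That fits the numbers shown: 8 + 3 ("FOO") + 1 + 3 ("bar") = 15 bytes, rounded up to the 16 reported in OfsNextEAD and SizeOfContentD. A minimal Perl sketch of walking such entries, assuming that layout (parse_eas is a hypothetical helper):

```perl
use strict;
use warnings;

# Walk the EA entries stored in the content of an $EA attribute (type 224,
# 0xE0). Assumed layout, matching the HMFT field names and sizes:
#   OfsNextEA (DWORD), Flags (BYTE), EaNameLen (BYTE), EaValueLen (WORD),
#   EaName (EaNameLen bytes) + NUL, EaValue (EaValueLen bytes)
sub parse_eas {
    my ($content) = @_;
    my @eas;
    my $pos = 0;
    while ($pos + 8 <= length($content)) {
        my ($next, $flags, $nlen, $vlen) = unpack 'V C C v', substr($content, $pos, 8);
        last if $pos + 8 + $nlen + 1 + $vlen > length($content);    # truncated entry
        push @eas, {
            flags => $flags,
            name  => substr($content, $pos + 8, $nlen),
            value => substr($content, $pos + 8 + $nlen + 1, $vlen),  # skip the NUL
        };
        last unless $next;                       # guard: never loop on a 0 offset
        $pos += $next;
    }
    return \@eas;
}

unless (caller) {
    # The entry from the HMFT dump: 8-byte header, "FOO\0", "bar", 1 pad byte
    my $content = pack('V C C v', 16, 0, 3, 3) . "FOO\0" . 'bar' . "\0";
    printf "EaName = %s EaValue = %s\n", $_->{name}, $_->{value} for @{ parse_eas($content) };
}
```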

Running the newer version of HMFT on one of the ZeroAccess samples and post-processing the output with the eads.pl script gives the following result:

[Screenshot: HMFT + eads.pl output for a ZeroAccess sample (2013-02-17_zeroaccess_ea1)]

After the HMFT update, eads.pl had to be slightly modified:

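use strict;
use warnings;

# eads.pl: post-process HMFT output and list, per file, the EA records,
# EA names (file::EaName) and named $DATA streams, i.e. ADS (file:stream)
my $f = '';   # current file name, taken from the last "FileName = ..." line
my $l = '';   # previous input line
while (<>)
{
  s/[\r\n]+//g;                                   # strip CR/LF line endings
  $f = $1 if /FileName = (.+)$/;
  # the previous line was an MFTA_EA (or MFTA_EA_*) marker: report the record type
  print "$f has $1 record\n" if ($l =~ /(MFTA_EA(_[A-Z]+)?)/);
  # ${f} keeps Perl from parsing "$f::" as a package-qualified name
  print "${f}::$1\n" if (/EaName = (.+)$/);
  # an AttributeName right after an MFTA_DATA marker is a named stream (ADS)
  print "$f:$1\n" if ($l =~ /MFTA_DATA/ && /AttributeName = (.+)$/);
  $l = $_;
}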

Btw. if you look at the screenshot above, you will also notice the :SummaryInformation ADS used by this sample (5D23ACF4C2221B687BC96A2701786C13 / AB7EEC68F9438E31523D0A67E7612CA666C8F56A); it is even more clearly visible in the Process Monitor window during the malware installation:

[Screenshot: Process Monitor during the ZeroAccess installation (2013-02-17_zeroaccess_ea2)]

As for the APIs ZeroAccess uses to create EAs, I finally came across a few samples that use ZwSetEaFile. Interestingly, none of these samples used this API to create the EA for services.exe; all the samples using this API create the following EA:

  • %USERPROFILE%\appdata\local\a4ca9b9c\u::@@@

(Please refer to the older post for more information about the context of this discussion.)
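
For context, ZwSetEaFile takes a file handle, an IO_STATUS_BLOCK, a buffer and its length, where the buffer holds one or more FILE_FULL_EA_INFORMATION entries; each entry starts with the same 8-byte header shape as the on-disk record in the HMFT dump above. The Perl sketch below only builds such a buffer (it does not call the API), which could be handy for generating test EAs or for recognizing the structure in a memory dump. The 4-byte padding between entries is an assumption here, and build_full_ea_info is a hypothetical helper:

```perl
use strict;
use warnings;

# Build a FILE_FULL_EA_INFORMATION buffer, the input format of ZwSetEaFile:
#   ULONG NextEntryOffset; UCHAR Flags; UCHAR EaNameLength;
#   USHORT EaValueLength; CHAR EaName[EaNameLength + 1]; value bytes follow.
# Assumption: entries are padded to 4-byte boundaries and the last entry
# has NextEntryOffset = 0. This only builds bytes; it does not call the API.
sub build_full_ea_info {
    my @pairs = @_;                              # name, value, name, value, ...
    my @bodies;
    while (@pairs) {
        my ($name, $value) = splice @pairs, 0, 2;
        # Flags = 0, name length, value length, NUL-terminated name, value
        push @bodies, pack('C C v', 0, length($name), length($value)) . $name . "\0" . $value;
    }
    my $buf = '';
    for my $i (0 .. $#bodies) {
        my $len     = 4 + length($bodies[$i]);   # entry size incl. NextEntryOffset
        my $aligned = ($len + 3) & ~3;           # rounded up to a multiple of 4
        my $is_last = $i == $#bodies;
        $buf .= pack('V', $is_last ? 0 : $aligned) . $bodies[$i];
        $buf .= "\0" x ($aligned - $len) unless $is_last;
    }
    return $buf;
}

unless (caller) {
    print unpack('H*', build_full_ea_info('FOO', 'bar')), "\n";   # hex dump
}
```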

You can download the latest HMFT here.