Detecting Extended Attributes (ZeroAccess) and other Frankenstein’s Monsters with HMFT

The topic of Extended Attributes (EA) was recently covered in an excellent post by Corey. Entitled Extracting ZeroAccess from NTFS Extended Attributes, it goes into (amazing) depth explaining what EA is and how to extract this artifact from the system. It’s pure forensic gold, and if you haven’t read this post yet, please go ahead and do so before reading mine.

Like Corey, I was very interested in researching EA, and I finally took some time tonight to have a deeper look at it myself. I wanted to dig into the code rather than the $MFT artifacts alone, not only to have something to write about (after all, Corey already covered everything! :-)), but also because I wanted to see how the EA is actually created and what system functions/APIs are used by the malware. The reason behind this curiosity was to improve my analysis tools and techniques, plus a few other ideas that I will keep quiet about for the moment.

I first assumed that ZeroAccess’ EAs are created using the ZwSetEaFile/NtSetEaFile function from ntdll.dll. I saw this API name popping up on some blogs and I saw it being referenced in my ZeroAccess memory/file dumps, so it was a natural ‘breakpoint’ choice for OllyDbg analysis:

zeroaccess_ea_1

To my surprise, none of the samples I checked used this function at all!

Curious, I started digging into it a bit more and realized that, for the samples I looked at, the EAs are actually created not by the ZwSetEaFile/NtSetEaFile function, but by ZwCreateFile/NtCreateFile.

Surprised?

I was!

Looking at the documentation on MSDN, you can see the following function parameters; the last two, EaBuffer and EaLength, let the caller attach the EA at the moment the file is created:

NTSTATUS NtCreateFile(
  _Out_     PHANDLE FileHandle,
  _In_      ACCESS_MASK DesiredAccess,
  _In_      POBJECT_ATTRIBUTES ObjectAttributes,
  _Out_     PIO_STATUS_BLOCK IoStatusBlock,
  _In_opt_  PLARGE_INTEGER AllocationSize,
  _In_      ULONG FileAttributes,
  _In_      ULONG ShareAccess,
  _In_      ULONG CreateDisposition,
  _In_      ULONG CreateOptions,
  _In_      PVOID EaBuffer,
  _In_      ULONG EaLength
);

Yes, it’s that simple.
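To make the EaBuffer parameter a bit more concrete, here is a minimal Python sketch of how a single-entry buffer could be laid out, assuming the documented FILE_FULL_EA_INFORMATION layout (NextEntryOffset, Flags, EaNameLength, EaValueLength, then the NUL-terminated name followed by the value). The function name build_ea_entry is my own and purely illustrative; the sketch only packs bytes and does not call any Windows API.

```python
import struct


def build_ea_entry(name: bytes, value: bytes, flags: int = 0) -> bytes:
    """Pack one FILE_FULL_EA_INFORMATION entry (hypothetical helper).

    Only a single entry is built, so NextEntryOffset is left at 0.
    """
    if len(name) > 0xFF or len(value) > 0xFFFF:
        raise ValueError("EA name or value too long")
    # Header, little-endian: NextEntryOffset (ULONG), Flags (UCHAR),
    # EaNameLength (UCHAR), EaValueLength (USHORT) = 8 bytes.
    header = struct.pack("<IBBH", 0, flags, len(name), len(value))
    # The name is followed by a NUL that is not counted in EaNameLength;
    # the value starts right after that NUL.
    return header + name + b"\x00" + value


entry = build_ea_entry(b"foo", b"bar")
print(entry.hex(" "))
# prints: 00 00 00 00 00 03 03 00 66 6f 6f 00 62 61 72
```

The resulting 15 bytes (8-byte header, 3-byte name, NUL, 3-byte value) are what would go into EaBuffer, with EaLength set to 15.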

One thing to note – the EA is added to files on both Windows XP and Windows 7, but I observed the modification of services.exe only under Windows 7. On Windows XP, the malware only appended the EA to the ‘U’ file and nothing else.

Okay, I mentioned I had a couple of ideas why I wanted to research this feature. Now it’s time to reveal them!

Idea #1 – POC

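Once I found out what APIs are being used by the malware, I was also able to produce a simple snippet of code that reproduces the functionality. For simplicity, the snippet creates the file with CreateFileA and then attaches the EA with NtSetEaFile, rather than passing the EA buffer directly to NtCreateFile: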

.586
.MODEL FLAT,STDCALL

 o equ OFFSET
 include    windows.inc
 include    kernel32.inc
 includelib kernel32.lib
 include    ntdll.inc
 includelib ntdll.lib
 include    masm32.inc
 includelib masm32.lib

IO_STATUS_BLOCK STRUCT
    union
    Status        dd ?
    Pointer        dd ?
    ends
    Information    dd ?
IO_STATUS_BLOCK ENDS

.data?
 file db 256 dup (?)
 fa   db 256 dup (?)
 _FILE_FULL_EA_INFORMATION struct
   NextEntryOffset dd ?
   Flags           db ?
   EaNameLength    db ?
   EaValueLength   dw ?
   EaName          db ?
 _FILE_FULL_EA_INFORMATION ends
 FEA equ _FILE_FULL_EA_INFORMATION
 io IO_STATUS_BLOCK <>
.code
  Start:
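  ; argument 1: full path of the file to create
  invoke GetCL,1, o file
  ; argument 2: EA name, copied straight into the EaName field
  lea    edi,[fa+_FILE_FULL_EA_INFORMATION.EaName]
  invoke GetCL,2, edi
  invoke lstrlenA,edi
  lea    esi,[fa+_FILE_FULL_EA_INFORMATION.EaNameLength]
  mov    [esi],al           ; EaNameLength (terminating NUL not counted)
  add    edi,eax
  inc    edi                ; skip the NUL that terminates the EA name
  ; argument 3: EA value, stored right after the name
  invoke GetCL,3, edi
  invoke lstrlenA,edi
  lea    esi,[fa+_FILE_FULL_EA_INFORMATION.EaValueLength]
  mov    [esi],al           ; only the low byte is written (value <= 255 bytes)
  add    edi,eax            ; edi = end of the EA buffer
  ; CREATE_NEW fails if the file already exists
  invoke CreateFileA, o file, \
                      GENERIC_WRITE, \
                      0, \
                      NULL, \
                      CREATE_NEW, \
                      FILE_ATTRIBUTE_NORMAL, \
                      NULL
  xchg   eax,ebx            ; ebx = file handle
  mov    eax,edi
  sub    eax,o fa           ; eax = total length of the EA buffer
  invoke NtSetEaFile,ebx,o io,o fa, eax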
  invoke CloseHandle,ebx
  invoke ExitProcess,0
END Start

This code can be used for testing purposes in a lab environment.

You can either compile the code yourself using masm32 or you can use a precompiled binary – download it here.

To run:

ea.exe <full path name to a file> <EA name> <EA value>

e.g.:

ea.exe g:\test.txt foo bar

Remember to specify a full path to the file. Also, choose a file name that does not exist yet: the program creates the file with CREATE_NEW, so it won’t work with files that are already present.

Last, but not least – there are no error checks; you can add them yourself if you wish 🙂

Idea #2 – Reduce the FUD factor

While it is a novel technique, it is not very advanced – a single API call does all the dirty work to _create_ the EA.

To _detect_ EA is not very difficult either – as long as you have the right tool to do so 🙂

Idea #3 – Show how to detect EA on a live system

Now that I have a POC, I can run it:

ea.exe g:\test.txt foo bar

and then analyze changes introduced to the file system.

I can do it quickly with hmft:

hmft -l g: mft_list

I tested the program on a small drive that I use for my tests. I formatted it first to ensure its MFT is clean:
hmft_ea_1

I then opened the mft_list file in Total Commander’s Lister and searched for MFTA_EA:
hmft_ea_2

I am pasting the full record for your reference:

  [FILE]
    SignatureD                    = 1162627398
    OffsetToFixupArrayW           = 48
    NumberOfEntriesInFixupArrayW  = 3
    LogFileSequenceNumberQ        = 1062946
    SequenceValueW                = 1
    LinkCountW                    = 1
    OffsetToFirstAttributeW       = 56
    FlagsW                        = 1
    UsedSizeOfMFTEntryD           = 368
    AllocatedSizeOfMFTEntryD      = 1024
    FileReferenceToBaseRecordQ    = 0
    NextAttributeIdD              = 5
   --

    RESIDENT ATTRIBUTE
      AttributeTypeIdentifierD = 16
      LengthOfAttributeD       = 96
      NonResidentFlagB         = 0
      LengthOfNameB            = 0
      OffsetToNameW            = 0
      FlagsW                   = 0
      AttributeIdentifierW     = 0
      --
      SizeOfContentD          = 72
      OffsetToContentW        = 24
      --
        MFTA_STANDARD_INFORMATION
            CreationTimeQ         = 130036100539989520
            ModificationTimeQ     = 130036100539989520
            MFTModificationTimeQ  = 130036100539989520
            AccessTimeQ           = 130036100539989520
            FlagsD                = 32
            MaxNumOfVersionsD     = 0
            VersionNumberD        = 0
            ClassIdD              = 0
            OwnerIdD              = 0
            SecurityIdD           = 261
            QuotaQ                = 0
            USNQ                  = 0
            CreationTime (epoch)    = 1359136453
            ModificationTime (epoch)  = 1359136453
            MFTModificationTime (epoch)  = 1359136453
            AccessTime (epoch)           = 1359136453
   --

    RESIDENT ATTRIBUTE
      AttributeTypeIdentifierD = 48
      LengthOfAttributeD       = 112
      NonResidentFlagB         = 0
      LengthOfNameB            = 0
      OffsetToNameW            = 0
      FlagsW                   = 0
      AttributeIdentifierW     = 2
      --
      SizeOfContentD          = 82
      OffsetToContentW        = 24
      --
        MFTA_FILE_NAME
            ParentID6             = 5
            ParentUseIndexW       = 5
            CreationTimeQ         = 130036100539989520
            ModificationTimeQ     = 130036100539989520
            MFTModificationTimeQ  = 130036100539989520
            AccessTimeQ           = 130036100539989520
            CreationTime (epoch)    = 1359136453
            ModificationTime (epoch)  = 1359136453
            MFTModificationTime (epoch)  = 1359136453
            AccessTime (epoch)           = 1359136453
            AllocatedSizeQ        = 0
            RealSizeQ             = 0
            FlagsD                = 32
            ReparseValueD         = 0
            LengthOfNameB         = 8
            NameSpaceB            = 3
     FileName = test.txt
   --

    RESIDENT ATTRIBUTE
      AttributeTypeIdentifierD = 128
      LengthOfAttributeD       = 24
      NonResidentFlagB         = 0
      LengthOfNameB            = 0
      OffsetToNameW            = 24
      FlagsW                   = 0
      AttributeIdentifierW     = 1
      --
      SizeOfContentD          = 0
      OffsetToContentW        = 24
      --
        MFTA_DATA
   --

   
    RESIDENT ATTRIBUTE
      AttributeTypeIdentifierD = 208
      LengthOfAttributeD       = 32
      NonResidentFlagB         = 0
      LengthOfNameB            = 0
      OffsetToNameW            = 0
      FlagsW                   = 0
      AttributeIdentifierW     = 3
      --
      SizeOfContentD          = 8
      OffsetToContentW        = 24
      --
        MFTA_EA_INFORMATION
   --

    RESIDENT ATTRIBUTE
      AttributeTypeIdentifierD = 224
      LengthOfAttributeD       = 40
      NonResidentFlagB         = 0
      LengthOfNameB            = 0
      OffsetToNameW            = 0
      FlagsW                   = 0
      AttributeIdentifierW     = 4
      --
      SizeOfContentD          = 16
      OffsetToContentW        = 24
      --
        MFTA_EA

There are two EA-related entries here:

  • MFTA_EA_INFORMATION
  • MFTA_EA record
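The AttributeTypeIdentifierD values in the record are the standard NTFS attribute type codes printed in decimal, so 208 and 224 correspond to 0xD0 ($EA_INFORMATION) and 0xE0 ($EA). A small Python sketch of that mapping follows; the table is limited to the types that appear in the record above, and the helper name describe_attribute is illustrative only.

```python
# NTFS attribute type codes seen in the record above (hex form).
NTFS_ATTRIBUTE_TYPES = {
    0x10: "$STANDARD_INFORMATION",
    0x30: "$FILE_NAME",
    0x80: "$DATA",
    0xD0: "$EA_INFORMATION",
    0xE0: "$EA",
}
# The two types that indicate the presence of Extended Attributes.
EA_TYPES = {0xD0, 0xE0}


def describe_attribute(type_id: int) -> str:
    """Render a decimal type id as 'decimal = hex name', flagging EA types."""
    name = NTFS_ATTRIBUTE_TYPES.get(type_id, "unknown")
    marker = " (EA-related)" if type_id in EA_TYPES else ""
    return f"{type_id} = 0x{type_id:02X} {name}{marker}"


for type_id in (16, 48, 128, 208, 224):
    print(describe_attribute(type_id))
```

This means a search for the numeric identifiers 208 or 224 should find the same records as a search for the MFTA_EA names.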

Manual analysis like this is quite tiring, so we can write a short perl snippet that can help us with postprocessing:

use strict;
my $f='';
my $l='';
while (<>)
{
  s/[\r\n]+//g;
  $f = $1 if /FileName = (.+)$/;
  print "$f has $1 record\n" if ($l =~ /(MFTA_EA(_[A-Z]+)?)/);
  $l = $_;
}

Saving it as ea.pl and running it as:

ea.pl mft_list

produces the following output:

hmft_ea_3

Idea #4 – Detect ZeroAccess with hmft

It’s simple 🙂

  • I ran hmft before the ZeroAccess installation
  • Then I infected my test box
  • I then ran hmft after the ZeroAccess installation

zeroaccess_ea_2

At this stage, all I had to do was to run ea.pl on both outputs and I got the following results:

zeroaccess_ea_3

Or, for the sake of copy & paste (and web bots :)):

r:\>ea.pl before_installation
V20~1.6 has MFTA_EA_INFORMATION record
V20~1.6 has MFTA_EA record

r:\>ea.pl after_installation
U has MFTA_EA_INFORMATION record
U has MFTA_EA record
V20~1.6 has MFTA_EA_INFORMATION record
V20~1.6 has MFTA_EA record
U has MFTA_EA_INFORMATION record
U has MFTA_EA record
services.exe has MFTA_EA_INFORMATION record
services.exe has MFTA_EA record

As we can see, the malware activity is immediately visible.
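The before/after comparison can also be automated. Below is a minimal Python sketch that assumes the hmft listing format shown earlier in this post (a `[FILE]` line starting each record, a `FileName = ...` line, and a bare `MFTA_EA` line for the EA attribute). The function names files_with_ea and new_ea_files are my own, and the sketch has only been reasoned about against the sample record in this post, not against every hmft output variant.

```python
import re
from collections import Counter

FILE_NAME_RE = re.compile(r"FileName = (.+)$")


def files_with_ea(listing: str) -> Counter:
    """Count, per file name, the MFT records that carry an MFTA_EA attribute."""
    counts = Counter()
    current_name = ""
    for raw in listing.splitlines():
        line = raw.strip()
        if line == "[FILE]":
            current_name = ""  # a new MFT record starts here
            continue
        match = FILE_NAME_RE.search(line)
        if match:
            current_name = match.group(1)
        elif line == "MFTA_EA":
            # Count MFTA_EA only (not MFTA_EA_INFORMATION): once per record.
            counts[current_name] += 1
    return counts


def new_ea_files(before: str, after: str) -> Counter:
    """Return EA-bearing records present in 'after' but not in 'before'."""
    # Counter subtraction keeps only positive differences.
    return files_with_ea(after) - files_with_ea(before)
```

Feeding it the contents of the two listings (for example, the text read from before_installation and after_installation) should leave only the entries introduced between the two runs, here the two U files and services.exe, while V20~1.6 drops out.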

Btw. V20~1.6 is a $MFT FILE record that refers to C:\Windows\CSC\v2.0.6 and is related to Offline files (client-side caching). I don’t have any information about the content of this EA. Perhaps someone will be more curious than me to poke around there 🙂

Idea #5 – Create a Frankenstein’s monster

Using EA and ADS (Alternate Data Streams) with a single file is also possible.

You can use ea.exe to create such a Frankenstein’s monster in 2 simple steps:

  • by running it first with a file name only – this will create the EA record
  • and then re-running it with a stream name – this will create the ADS, but setting the EA on the ADS will fail (sometimes it’s OK to fail :))

The result is shown on the following screenshot:
ea_frankensteins_monster_1

Using hmft and a combination of ea.pl and ads.pl (posted in an older post related to HMFT) merged into a single eads.pl script:

use strict;
my $f='';
my $l='';
while (<>)
{
  s/[\r\n]+//g;
  $f = $1 if /FileName = (.+)$/;
  print "$f has $1 record\n" if ($l =~ /(MFTA_EA(_[A-Z]+)?)/);
  print "$f:$1\n" if ($l =~ /MFTA_DATA/&&/AttributeName = (.+)$/);
  $l = $_;
}

we can easily detect such a beast as well.

That’s all, thanks for reading!

hstrings (release) – when all strings are attached…

In a recent post, I introduced a new tool – hstrings. Its purpose is to find strings of any sort: not only ANSI (ASCII really) and the Basic Latin subset of Unicode, but many encoding variants as well. Today I am releasing the first version of the tool, and in this post I will provide more information about the currently available options and modes of operation.

First of all, I encourage you to read Microsoft’s page listing Code Page Identifiers (Windows) – this is the list that I used as a foundation for hstrings. The tool goes a bit further: it splits these into multiple families and also tries to split Unicode sets into more manageable chunks. Still, Code Page Identifiers are the best starting point for choosing what strings one wants to search for.

The tool works in multiple modes and requires a few options that decide how the input is processed, how the output is generated, and which encodings are included in the search.

Let’s see a few examples first…

Character Set recognition

Imagine you have a file that is encoded, but you are not sure what character set is being used for encoding and you have no clue what language it may be at all.

The approach one may take to find out more about the file encoding is… a simple brute force which means checking all possible encodings and trying to convert only a small chunk of bytes from the input file to see what happens.

This is how the ‘probing’ mode works in hstrings. Once you select the option, the tool will read the first 32 bytes of the input file, try to decode them using all the chosen encodings, and send the results to the standard output or to separate files (depending on the output options discussed later).
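The brute-force idea can be sketched in a few lines of Python. This is not how hstrings is implemented; it is only an illustration of the approach, using Python’s own codec names and a hand-picked list of Cyrillic-capable candidates, both of which are my assumptions and not hstrings options.

```python
PROBE_SIZE = 32  # the probing mode described above reads the first 32 bytes

# Illustrative candidates; these are Python codec names, not hstrings options.
CANDIDATES = ("utf-8", "utf-16-le", "utf-16-be",
              "cp1251", "koi8_r", "cp866", "iso8859_5")


def probe(data: bytes, encodings=CANDIDATES) -> dict:
    """Decode the first PROBE_SIZE bytes with every candidate encoding."""
    head = data[:PROBE_SIZE]
    results = {}
    for enc in encodings:
        # Undecodable bytes become U+FFFD so every candidate yields some output.
        results[enc] = head.decode(enc, errors="replace")
    return results


sample = "Привет, мир".encode("utf-16-be")
for enc, text in probe(sample).items():
    print(f"{enc:10} {text!r}")
```

Only the utf-16-be row reproduces the original text; the other candidates produce garbage, and picking the readable row is left to the analyst.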

In the previous article I presented a sample Russian text encoded with various encodings.

If we try to run hstrings over one of these files

hstrings -qpsC test\russian_u16be.txt > out

we will get the following output:

As we can see, the longest meaningful string was produced by Unicode Cyrillic. Indeed, the file name contains the suffix ‘u16be’, which is how I named the sample file encoded with 16-bit Unicode Big Endian encoding.

We can then try running the same command on the data saved with a different encoding:

hstrings -qpsC test\russian_utf8.txt > out

Of course, this time we are not lucky as the ‘C’ option we used only applies Cyrillic encodings (see option details at the bottom of the post), and the result shows that none of them succeeded:

We can extend the list – and since it’s just an example we can be greedy – by using all encodings (option ‘0’)

hstrings -qps0 test\russian_utf8.txt > out

Browsing through results we can see that this time we got the UTF-8 encoding giving quite a good output

Indeed, my naming convention reveals that it is a Russian text saved using UTF8 encoding.

Certainly, what helps in character set recognition is at least a basic knowledge of what texts in various languages look like; anyone who has seen Russian text before shouldn’t have a problem picking out the correct output (encoding) in this example, but if you have never seen Cyrillic text, this can be quite challenging. One way of improving the algorithm that I have in mind is adding wordlists to recognize known words in a specific language.

Extracting all strings

Once character set recognition has identified the matching encoding, one can simply extract all strings in this encoding from the whole file. You can do it by replacing ‘p’ (probing character set) with ‘d’ (dump strings).

Since we now know that the last file has been encoded with UTF8, we can extract all strings using the ‘8’ option, which means UTF8:

hstrings -qds8 test\russian_utf8.txt > out

The output looks like this:

Due to the number of encodings supported by hstrings, at the moment it is not possible to specify a single character set, except for very popular ones such as UTF8; I may add an option for specific code pages/encodings if there is demand.
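For readers who want to experiment with the idea outside of hstrings, here is a rough Python sketch of a ‘dump’ pass for a single, already identified encoding. It is not the tool’s algorithm: the minimum length of 4 characters and the use of Python’s isprintable() as the filter are my assumptions, and dump_strings is a hypothetical name.

```python
from itertools import groupby


def dump_strings(data: bytes, encoding: str = "utf-8", min_len: int = 4) -> list:
    """Return runs of at least min_len printable characters decoded from data."""
    # Undecodable bytes turn into U+FFFD, which is then treated as a separator.
    text = data.decode(encoding, errors="replace")

    def keep(ch: str) -> bool:
        return ch.isprintable() and ch != "\ufffd"

    # Group consecutive characters by 'keep' and join the printable groups.
    runs = ("".join(group) for ok, group in groupby(text, key=keep) if ok)
    return [run for run in runs if len(run) >= min_len]


blob = (b"\x00\x01" + "Привет мир".encode("utf-8")
        + b"\xff\xfe\x00ab\x00" + b"hello world")
print(dump_strings(blob))
# prints: ['Привет мир', 'hello world']
```

The short run "ab" is dropped because it is below the minimum length, which is the same trade-off between noise and coverage that the classic strings utilities make.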

 OPTIONS

Let’s walk through them one by one:

  • GENERAL OPTIONS:
    •  – q – quiet (no banner) – basically no copyright information
  • INPUT OPTIONS – dictate whether we read the whole input file or just first 32 bytes
    • – p – probe first 32 bytes of a file
    • – d – dump strings from the whole file
  • OUTPUT OPTIONS – provide a choice to save the output in a single file (standard output, which one can redirect to a file) or in multiple files (in such a case file names will have a ‘h_’ prefix and a code page as a name)
    • – s – dump strings to standard output (use pipe to save to file)
    • – m – dump strings to multiple files (one encoding=one file)
  • ENCODINGS – these are grouped by families

    • – 0 – All supported encodings
    • – 1 – All Windows ANSI, UTF8, ASCII subset of Uni-LE/Uni-BE
    • – 2 – All Windows ANSI encodings
    • – 7 – UTF7
    • – 8 – UTF8
    • – U – Unicode encodings (except utf8/utf7)
    • – I – All IBM encodings
    • – E – IBM EBCDIC encodings (subset of I)
    • – M – MAC encodings
    • – A – Arabic encodings
    • – C – Cyrillic encodings
    • – H – Hebrew encodings
    • – J – Japanese encodings
    • – K – Korean encodings
    • – Z – Chinese encodings

Final word

This is an experimental tool and it is far from final – I am personally aware of a few bugs and imperfections that I need to address (e.g. Unicode maps are far from perfect and sometimes produce too much output; generally, too much output is still an issue). If you want to test it, feel free; I will appreciate any feedback. Thanks!

Download

You can download the tool here.