
The art of disrespecting AV (and other old-school controls), Part 3

February 4, 2016 in Malware Analysis, Preaching

This is the third part of the series (part 1, part 2). This time it is somewhat shorter, and it is really just an excuse to jot down some notes about the actual engines that AV uses internally.

Many people complain about AV using hashes to detect malware. I would say that an AV that detects malware via hashes only should not even be on the market, because it would not survive. Your average AV contains a significant number of engines and subengines using many algorithms, many of which are lightning fast. Reducing the discussion about AV internal workings to ‘AV uses hashes’ is simply not fair.
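To make the point concrete, here is a toy Python sketch. All data is made up, and the ‘signature’ is a hypothetical marker string, not a real AV pattern. Appending a single byte to a sample is enough to defeat an exact-hash lookup, while even the most primitive byte-pattern match still fires:

```python
import hashlib

# Toy data only: a fake "sample" with a made-up marker standing in for malicious code.
MARKER = b"EVIL_PAYLOAD_MARKER"  # hypothetical byte signature, not a real one
sample = b"MZ" + b"\x00" * 62 + MARKER + b"\x00" * 16

# A hash-only "AV" knows exactly one SHA-256 value for this family.
known_hashes = {hashlib.sha256(sample).hexdigest()}


def detect_by_hash(data: bytes) -> bool:
    """Flag data only if its SHA-256 is in the known-bad set."""
    return hashlib.sha256(data).hexdigest() in known_hashes


def detect_by_signature(data: bytes) -> bool:
    """Flag data if the byte signature appears anywhere in it."""
    return MARKER in data


# A trivial variant: one junk byte appended (e.g. a changed overlay).
variant = sample + b"\x01"

for name, data in (("original", sample), ("variant", variant)):
    print(name, detect_by_hash(data), detect_by_signature(data))
# prints "original True True", then "variant False True"
```

Real engines go much further than a single substring check; the point is only that an exact hash is the most fragile of the detection methods listed below.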

Let’s have a look. I use the word ‘engine’ quite loosely here: not every one of them implements detection-specific logic, but each often facilitates the detection itself. Each of these is typically a serious programming effort, and together they are combined to create the ‘holistic’ coverage. Yes, it fails, and it contains vulnerabilities like any other software, but take a moment to think about the effort that goes into designing and testing all this clustergoodness:

  • static binary string search
  • binary string search with simple wildcards
  • binary string search with regexes (or regex-like patterns)
  • multi-pattern search engines using lookup tables of any sort, trees, tries and proprietary algorithms
  • container/archiver processor – reads files or streams embedded inside other files/containers
  • file/content-specific analyzer/processor – for each file type or content type there is a dedicated engine, e.g. MBR, old DOS .COM files, Flash, OLE files, Symbian SIS, ISO, etc. – note that many of these engines become obsolete as the technologies fall out of use, but the code is still _there_
  • unpacker – decompresses streams of data to present them to other engines
  • emulator – simple state machines with a basic understanding of some opcodes
  • emulator – full-blown emulator with most opcodes supported
  • sandbox – full-blown emulator with API & memory support
  • hooks – dynamic, for on-access scans
  • heuristics engine
  • whitelisting engine
  • detection engine based on file properties
  • rootkit detection engine
  • native file system parser (for various file systems)
  • memory dumper/file rebuilders
  • online scanner (virustotal-like)
  • behavioral engines
  • reputation engines
  • quarantine engine
  • crc/incremental crc search
  • hash-based search
  • entropy analysis
  • X-rays
  • and finally… the removal and repair engine – if none of the above engines impress you, think for a second about the effort that goes into ensuring you can remove a complex polymorphic or metamorphic file virus from a gazillion files on the system without corrupting the files or crashing the system.
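The post does not describe how any vendor implements these engines, so purely as an illustration, here is a simplified Python sketch of three of the cheaper items from the list: binary string search with wildcards, entropy analysis, and an X-ray-style brute force of single-byte XOR keys against known plaintext. The signature notation (hex bytes with `??` as a one-byte wildcard) and all function names are assumptions made for this sketch, not any vendor’s format or API, and real X-ray scanning handles far more than single-byte XOR.

```python
import math
import re
from collections import Counter


def compile_wildcard_signature(sig):
    """Compile a hex signature such as '4D 5A ?? 00' into a bytes regex.

    '??' matches any single byte; every other token is one literal hex byte.
    (Illustrative notation only, not any particular vendor's format.)
    """
    parts = []
    for token in sig.split():
        if token == "??":
            parts.append(b".")
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    # DOTALL so that the wildcard also matches the byte 0x0A
    return re.compile(b"".join(parts), re.DOTALL)


def find_signature(data, sig):
    """Return the start offsets of all non-overlapping signature matches."""
    return [m.start() for m in compile_wildcard_signature(sig).finditer(data)]


def shannon_entropy(data):
    """Shannon entropy in bits per byte: 0.0 for constant data, at most 8.0."""
    if not data:
        return 0.0
    n = len(data)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(data).values())


def xray_xor(data, known_plaintext):
    """Brute-force single-byte XOR keys using a known plaintext fragment.

    Instead of decrypting the whole buffer 255 times, encrypt the short
    known plaintext with each candidate key and search for the result.
    Returns a list of (key, offset) pairs.
    """
    hits = []
    for key in range(1, 256):
        needle = bytes(b ^ key for b in known_plaintext)
        offset = data.find(needle)
        if offset != -1:
            hits.append((key, offset))
    return hits
```

For example, `find_signature(b"\x00MZ\x90\x00\x03", "4D 5A ?? 00")` returns `[1]`, `shannon_entropy(bytes(range(256)))` returns `8.0` (packed or encrypted data tends to sit near that ceiling), and `xray_xor(bytes(b ^ 0x5A for b in b"xxHELLOxx"), b"HELLO")` returns `[(90, 2)]`.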

There are probably others which I forgot about, but this is really a lot more than just hashing.

If you talk about AV detection and the only thing you talk about is hash, it is probably because you smoke too much of it… :)

IDAPython – making strings decompiler-friendly

December 21, 2015 in Malware Analysis, Reversing, Software Releases

Update

As pointed out by 0stracon, there is an option in Hex-Rays that makes it print all strings. Go to the Hex-Rays Decompiler Analysis Options and untick ‘Print only constant string literals’.

To make it permanent, turn off the HO_CONST_STRINGS bit in the HEXOPTIONS value in hexrays.cfg:

#define HO_CONST_STRINGS   0x0040   // Only print string literals if they reside
                                    // in read-only memory (e.g. .rodata segment).
                                    // When off, all strings are printed as literals.
                                    // You can override decompiler's decision by
                                    // adding 'const' or 'volatile' to the
                                    // string variable's type declaration
HEXOPTIONS               = 0x....   // Combination of HO_... bits
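HEXOPTIONS is a bit mask, so turning HO_CONST_STRINGS off means clearing bit 0x0040 and leaving every other bit alone. A tiny Python helper for the arithmetic; the starting value is a made-up example, since the actual value is elided in the snippet above:

```python
HO_CONST_STRINGS = 0x0040  # bit value taken from the hexrays.cfg comment above


def without_const_strings(hexoptions: int) -> int:
    """Return HEXOPTIONS with HO_CONST_STRINGS cleared (all other bits kept)."""
    return hexoptions & ~HO_CONST_STRINGS


example = 0x01F3  # hypothetical current HEXOPTIONS value, for illustration only
print(hex(without_const_strings(example)))  # prints 0x1b3
```

Write the resulting value back as the HEXOPTIONS value in hexrays.cfg.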

I was not aware of this option and reinvented the wheel :)

Old post

One of the features of IDA is its ability to recognize strings. This is a great feature, especially useful when you combine it with the power of the Hex-Rays decompiler – together they can produce very nice pseudocode.

There is only one annoying bit there: if strings are recognized and defined inside a writable segment, they will not be presented by the decompiler as strings, but as variable names referring to strings.

Let’s have a look at the example.

In the example below (a Dexter sample), IDA recognizes the string “UpdateMutex:”

[Screenshot: IDA disassembly view with the recognized string “UpdateMutex:”]

When we now switch to the decompiler view, we will see that the decompiler shows it as s__Updatemutex:

[Screenshot: decompiler view referring to s__Updatemutex]

(The ‘s__’ prefix comes from the string prefix I typically use, i.e. ‘s->’, which the decompiler ‘flattens’ to ‘s__’.) The name s__Updatemutex refers to the string “UpdateMutex:”, as shown below:

[Screenshot: the s__Updatemutex name pointing at the string “UpdateMutex:”]

Obviously, decompiled code that refers to the actual strings is much more readable. Compare the same piece of code as shown above, this time with the data referred to by the actual strings:

[Screenshot: the same decompiled code showing the literal strings]

To make the decompiler use the actual strings (rather than the references), we have two options:

  • Make the segment where the string is recognized read-only (by disabling ‘Write’ in segment properties):

[Screenshot: segment properties with ‘Write’ disabled]

Unfortunately, this confuses the decompiler and makes the output untrustworthy (it is often truncated). You will also receive a friendly reminder that you are doing something stupid 😉, a.k.a. a red card from the decompiler’s authors:

[Screenshot: the decompiler’s warning]

  • The second option is the ‘proper’ fix: tell IDA that the string is read-only, a.k.a. constant, i.e. change the string’s type from the existing one to the same type prefixed with the keyword ‘const’:

[Screenshot: retyping the string with the ‘const’ keyword]

Since most of the time strings are static, it is handy to convert all the strings in IDA to read-only, i.e. retype all of them using the ‘const’ trick.

This is exactly what the strings_to_const.py script is intended to do.

It enumerates all segments, finds all strings recognized by IDA (note the comment about the prefix I use; you may need to adapt it to your needs), and then converts them to read-only.
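The script itself is not reproduced here. As a rough, hypothetical sketch of the same idea (function and parameter names are made up, and it assumes the IDA 7.x-era API: `idautils.Strings()`, `idc.get_name()`, `idc.get_type()`, `idc.guess_type()` and `idc.SetType()`), the retyping loop could look like this. The IDA imports are guarded only so that the pure helper can also be exercised outside IDA:

```python
# Hypothetical sketch of the idea, NOT the original strings_to_const.py.
try:
    import idautils
    import idc
except ImportError:  # outside IDA: only const_type_for() is usable
    idautils = idc = None


def const_type_for(old_type):
    """Return old_type prefixed with 'const', or None if there is nothing to do."""
    if not old_type:
        return None
    old_type = old_type.strip()
    if old_type.startswith("const "):
        return None
    return "const " + old_type


def strings_to_const(name_prefix=None):
    """Retype every string recognized by IDA as const; return how many changed.

    name_prefix is a hypothetical optional filter (e.g. "s->" if you name
    strings that way); adapt it to your own naming convention.
    """
    changed = 0
    for s in idautils.Strings():
        name = idc.get_name(s.ea) or ""
        if name_prefix and not name.startswith(name_prefix):
            continue
        # Prefer an explicitly set type, fall back to the type IDA guesses.
        new_type = const_type_for(idc.get_type(s.ea) or idc.guess_type(s.ea))
        if new_type and idc.SetType(s.ea, new_type):
            changed += 1
    return changed


if __name__ == "__main__" and idautils is not None:
    print("Retyped %d strings as const" % strings_to_const())
```

Inside IDA, run it via File > Script file… and then re-decompile the function. Strings that IDA did not recognize are left untouched.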

The result?

See below – before and after:

[Screenshot: decompiler output before and after running strings_to_const.py]