Playing CAPAeira with Yara rules

April 20, 2021 in Yara sigs

Writing Yara rules is easy. Writing good Yara rules is … testing – both as an adjective and a verb.

There is one class of Yara rules, the kind that relies on actual machine code, that we can now do better.

How?

The typical approach to writing code-based Yara sigs relies on byte streams of machine code extracted from analyzed programs, usually a very specific code sequence of interest (e.g. RC4 algo, Luhn check routine, etc.). We then ‘patch’ (wildcard) the offsets in jumps, calls, etc. to account for their variability.
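As a rough illustration of that ‘patching’ step, here is a minimal Python sketch that turns a chunk of machine code into a Yara hex string and masks the analyst-chosen byte ranges with `??`. The helper name `to_yara_hex`, the `(start, length)` range format, and the sample bytes are all made up for this example; they are not taken from any tool.

```python
def to_yara_hex(code: bytes, wildcard_ranges):
    """Render machine code as a Yara hex string, masking unstable bytes.

    wildcard_ranges is a list of (start, length) pairs picked by the
    analyst, e.g. the displacement bytes of a call or jump.
    """
    masked = set()
    for start, length in wildcard_ranges:
        masked.update(range(start, start + length))
    tokens = ["??" if i in masked else f"{b:02X}" for i, b in enumerate(code)]
    return "{ " + " ".join(tokens) + " }"


# Hypothetical snippet: push ebp; mov ebp, esp; call rel32; test eax, eax
snippet = bytes.fromhex("55 8B EC E8 10 20 30 40 85 C0")

# Mask the 4-byte call displacement that follows the E8 opcode
pattern = to_yara_hex(snippet, [(4, 4)])
print(pattern)  # { 55 8B EC E8 ?? ?? ?? ?? 85 C0 }
```

The resulting string can be pasted straight into the strings section of a rule. Deciding which bytes are unstable is still the analyst's job.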

Such Yara rules are common and pretty handy. They work most of the time, but there is a caveat: compiler settings and malware authors’ tricks may shift machine code around, so the same routine can end up as a different byte sequence. As a result, a pretty decent Yara rule based on very specific program code may fail on newer samples.

To make code-based Yara signatures more robust, we can now use capa.

You may be laughing now, since capa itself is a detection engine. Given a bunch of samples, we could just run our capa rules over them and get the detections we need. The first problem is speed. The second is that while Yara is supported by nearly everything that blinkenlights, capa is not.

The best approach is therefore to analyze the code and write a good capa signature, then use it to test your Yara rules. Your Yara rule must detect the very same sample set that capa hits on. This is an iterative process, but it lets you cherry-pick variants and subtle differences in implementation that can then lead you to improve your Yara sigs. Moreover, if you have other ways to detect samples as belonging to a certain malware family, you can correlate them against your family-specific capa and Yara rulesets and highlight missing Yara rules. Using the capa output you could auto-generate Yara rules as well (although this is a bit silly without manual oversight; if blindly automated, it would literally be like hashing).

The task of correlating the capa and yara detections/rules can be delegated to existing Python libraries – something along these lines:

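import yara
import capa.main
import capa.rules
from capa.features import ARCH_X32, ARCH_X64, String
from capa.features.insn import Number, Offset
...
# Yara side: compile the rule file and scan one sample
yr = yara.compile(filepath='foo.yar')
fm = yr.match(filename)
if fm:
   ... fm[0] ...

# capa side: load the rules, build a feature extractor, then match.
# Note: capa's Python API differs between releases, so these calls
# may need adjusting for the version you have installed.
cr = capa.main.get_rules('foo.yml',
    disable_progress=True)
rs = capa.rules.RuleSet(cr)
ex = capa.main.get_extractor(filename, "auto",
    disable_progress=True)
capabilities, meta = capa.main.find_capabilities(rs, ex,
    disable_progress=True)
try:
   ... capabilities.keys() ...
except Exception:
   ...  # error handling elided

<print output, match, whatever>
...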
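Once both engines have produced per-sample results, the correlation itself is just set arithmetic. The sketch below assumes you have already collected the hits into two plain dictionaries mapping a sample name to the list of rule names that fired. The `correlate` helper, the dictionary layout, and the sample and rule names are invented for illustration.

```python
def correlate(capa_hits, yara_hits):
    """Compare two {sample: [rule names]} mappings.

    Returns the samples only capa flagged (gaps in the Yara rules),
    the samples only Yara flagged, and the samples both agree on.
    """
    capa_set = {sample for sample, rules in capa_hits.items() if rules}
    yara_set = {sample for sample, rules in yara_hits.items() if rules}
    return {
        "yara_missed": sorted(capa_set - yara_set),
        "capa_missed": sorted(yara_set - capa_set),
        "both": sorted(capa_set & yara_set),
    }


# Hypothetical results for three samples
capa_hits = {"a.exe": ["rc4 ksa"], "b.exe": ["rc4 ksa"], "c.exe": []}
yara_hits = {"a.exe": ["rc4_code"], "b.exe": [], "c.exe": []}

report = correlate(capa_hits, yara_hits)
print(report["yara_missed"])  # ['b.exe']
```

Anything that lands in `yara_missed` is a candidate for another look in the disassembler: capa recognised the capability there, but the byte pattern in the Yara rule did not match.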

Yara & maldoc pics

April 7, 2021 in Yara sigs

Update

It took only a few minutes for @0xkyle to point me to the Halogen project. Nice one!

Old post

This is a little trick that you may find handy for clustering malicious documents. I am pretty sure many people use it, and I am too lazy to google it, so here is your potential infosec dose of redundancy 🙂

Most macro maldocs come with a picture embedded in them. The one I received today is this:

You can write a signature for similar docs by focusing not on macros, metadata, etc. but on the actual picture. The pictures usually come as either PNG or JPEG, often carry additional metadata that is visible in plain text and, most importantly, are pretty clearly identifiable inside the malicious document’s body (both in OLE docs and inside the Office ZIP archives).

The easiest way to find them is to look for PNG (‘PNG’) and JPEG (\xFF\xD8) headers, or for references to Adobe inside the XML snippets that often accompany them.
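The header search can be sketched in a few lines of Python. This is only an illustration: the function name `find_image_offsets` is made up, it uses the full 8-byte PNG signature and the 3-byte JPEG start (`FF D8 FF`) to cut down on accidental hits, and it works on raw bytes only. For ZIP-based Office documents you would most likely need to run it over each decompressed archive member rather than over the archive itself.

```python
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"   # full 8-byte PNG signature
JPEG_MAGIC = b"\xff\xd8\xff"       # SOI marker plus the start of the next marker


def find_image_offsets(blob: bytes):
    """Return sorted (offset, kind) pairs for every PNG/JPEG header found."""
    hits = []
    for kind, magic in (("png", PNG_MAGIC), ("jpeg", JPEG_MAGIC)):
        pos = blob.find(magic)
        while pos != -1:
            hits.append((pos, kind))
            pos = blob.find(magic, pos + 1)
    return sorted(hits)


# Synthetic blob standing in for a document body
blob = b"junk" + PNG_MAGIC + b"xx" + JPEG_MAGIC + b"yy"
print(find_image_offsets(blob))  # [(4, 'png'), (14, 'jpeg')]
```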

How do you create a sig for it?

Let’s show what we are after first. Choose some random place at around 50-75% of the file’s length and fill it in with zeroes. Now open the file in an image viewer and you should see something like this:

The only reason I do it here is to demonstrate which data you are overwriting. The image data is clearly no longer rendered properly, since I have corrupted it. It’s a good spot.
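If you would rather script the zeroing experiment than use a hex editor, something like the following would do. The `zero_span` helper and its `frac` and `length` parameters are hypothetical, and the defaults are arbitrary picks within the range mentioned above.

```python
def zero_span(data: bytes, frac: float = 0.6, length: int = 256):
    """Overwrite `length` bytes with zeroes, starting at `frac` of the data.

    Returns the patched bytes and the (start, end) offsets that were hit,
    so the same spot can later be read from the untouched original.
    """
    start = int(len(data) * frac)
    end = min(start + length, len(data))
    patched = data[:start] + b"\x00" * (end - start) + data[end:]
    return patched, (start, end)


# Synthetic stand-in for an extracted picture
original = bytes(range(1, 256)) * 4
patched, (start, end) = zero_span(original, frac=0.5, length=16)
print(start, end, len(patched) == len(original))
```

Writing `patched` back to disk and opening it in a viewer shows which part of the picture lives at that offset. The bytes for the signature come from the same offsets in the original, unpatched file.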

All you have to do now is write a Yara rule using a few bytes you extracted from that exact spot:

rule pic
{
    strings:
        $ = { AE 31 5A F4 2D 1A 4F 8B A6 48 B5 6C 01 6A 99 02 }
    condition:
        any of them
}

I ran it on a few samples I received recently, and despite them being scrambled and randomized, they got picked up every time.

It obviously won’t work all the time, but if you have a larger corpus of macro samples to play around with, you can also automate the Yara sig creation.
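A minimal sketch of what that automation could look like, assuming the picture has already been extracted from the document. The `make_pic_rule` helper, its parameters, and the ‘too repetitive’ check are assumptions added for illustration, not something the original workflow prescribes.

```python
def make_pic_rule(name: str, image: bytes, frac: float = 0.6, length: int = 16) -> str:
    """Build a one-string Yara rule from bytes found at `frac` of the image."""
    start = int(len(image) * frac)
    chunk = image[start:start + length]
    if len(chunk) < length:
        raise ValueError("image too small for the requested chunk")
    # Assumed sanity check: padding or flat runs make poor signatures
    if len(set(chunk)) < length // 2:
        raise ValueError("chunk looks too repetitive to be a useful signature")
    hex_bytes = " ".join(f"{b:02X}" for b in chunk)
    return (
        f"rule {name}\n"
        "{\n"
        "    strings:\n"
        f"        $pic = {{ {hex_bytes} }}\n"
        "    condition:\n"
        "        $pic\n"
        "}\n"
    )


# Synthetic stand-in for an extracted picture
image = bytes(range(256))
print(make_pic_rule("pic_demo", image))
```

The generated text can be fed to `yara.compile(source=...)` and tested against the rest of the corpus before you keep it.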