The story of an underTAG that tried to wear a mitre…

March 10, 2019 in Mitre Att&ck, Preaching, Random ideas, Sysmon, threat hunting

Today we tag everything with Mitre Techniques. I like it, but I would also want a bit more flexibility. So, I like to mix the ‘proper’ Mitre tags with my own tags (not only subtechnique tags, but also my own specific tags).


Say… you detect that System.Management.Automation.dll is loaded into a process. Does it mean it is T1086: PowerShell? Or does it mean that the process is using a DLL that offers automation capabilities, and may not even do anything dodgy? I suspect (s/suspect/know/) that calling it T1086: PowerShell w/o providing that extra info loses a lot of context.

Why not call it what it is? <some magic prefix>: Powershell Automation DLL?

Is loading WinSCard.dll always an instance of T1098: Account Manipulation, or T1003: Credential Dumping? Or is it just a DLL that _may_ be used for nefarious purposes, but most of the time is not? (Lots of legitimate processes use it; analysing sysmon logs is a nice eye opener here.)

Why not call it <some magic prefix>: Smart Card API DLL?
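A minimal sketch of what such mixed tagging could look like. Everything here is an assumption made for illustration: the `X-LOCAL` prefix is a stand-in for the unspecified magic prefix, the event is a plain dict carrying the `ImageLoaded` field as Sysmon image-load events do, and the Mitre techniques are kept as candidates rather than verdicts.

```python
# Hypothetical prefix for locally defined tags; the post leaves the real one open.
CUSTOM_PREFIX = "X-LOCAL"

# Illustrative mapping: DLL name -> descriptive tag plus the Mitre techniques
# the load *may* relate to (candidates only, not a verdict).
DLL_TAGS = {
    "system.management.automation.dll": {
        "tag": f"{CUSTOM_PREFIX}: Powershell Automation DLL",
        "candidate_techniques": ["T1086: PowerShell"],
    },
    "winscard.dll": {
        "tag": f"{CUSTOM_PREFIX}: Smart Card API DLL",
        "candidate_techniques": [
            "T1098: Account Manipulation",
            "T1003: Credential Dumping",
        ],
    },
}


def tag_image_load(event: dict) -> dict:
    """Return a copy of an image-load event with descriptive tags attached."""
    path = event.get("ImageLoaded", "")
    # Keep only the file name, case-insensitively.
    dll_name = path.replace("/", "\\").rsplit("\\", 1)[-1].lower()
    info = DLL_TAGS.get(dll_name)
    tagged = dict(event)
    tagged["tags"] = [info["tag"]] if info else []
    tagged["candidate_techniques"] = list(info["candidate_techniques"]) if info else []
    return tagged
```

The analyst then sees what actually happened (a smart card API DLL was loaded) and can decide whether any of the candidate techniques applies.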

As usual, at the end of this tagged event, or cluster of events, there is a poor soul, the underdog of this story, who is tasked with analysing it. If our tagging is as conservative as the mindset of politicians who studied at Eton… then so will be the quality of analysis, statistics, and actual response.

And it is easy to imagine the confusion of analysts seeing events tagged with a vague name. For example, a net.exe command that accesses user/account data, together with the loading of WinSCard.dll, may make them assume that there is a cluster of ‘account manipulation’ events indicating an attack. Triage of such a vague cluster of events is time wasted… There is a response cost, and there is an opportunity cost at play here. The example is over-simplistic of course, but the devil is in the details. Especially for the analysts.

I’d say… given the way most events are logged today, often w/o much correlation and data enrichment at source, the event collection process should make every possible attempt to contextualize every single event as much as possible.

We can argue that combining data at its destination (in aggregation systems, SIEMs, collectors, or even ticketing systems), or on the way there, is possible and actually desirable… but today’s reality is that we need that single event to provide as many answers as possible…

This is because we know that Data Enrichment at the destination is a serious pain in the neck and relies heavily on a lot of dependencies: up-to-date asset inventories, lists of employees, DHCP mappings. Then there is the lack of, or poor support for, nested queries and their poor performance, which forces us to use a lot of lookup tables that need to be up to date and require a lot of maintenance. And if we need to go back to the system that generated the event to fetch additional artifacts, or enrich data manually, then we are probably still stuck in a triage process A.D. 2010…

So… if we must tag stuff, let’s make it work for us and our analysts, and make it act as the true data enrichment tool it was meant to be… If the event, its cluster, or a detection based on them is not actionable, then it’s… noise.

A short wishlist for signature/rules writers

January 21, 2019 in Preaching

Oh, no… yet another rant.

Not really.

Today I will try to discuss a phenomenon that I observe in the signature/rules writing space, one that used to be predominantly occupied by antivirus companies. Today it is daily bread for malware analysts, sample hoarders, threat intel folks, and the threat hunting crowd as well. Plus, lots of EDR and forensics/IR solutions use these, and they come in really handy in memory analysis, as well as retrohunting and sorting sample collections.

The phenomenon I want to talk about relies on these two factors:

  • writing signatures / rules is very easy
  • there is no punishment for writing bad ones

This phenomenon is, as you probably guessed by now, the uneven quality of signatures / rules. And I am very careful with words here. Some of these rules are excellent, and some could be better.

Before I proceed further, let me ask you a question:

What is the most important part of a good yara signature?

  • Is it a unique string, or a few of them?
  • A clever Boolean Condition?
  • A filter that cherry-picks scanned files, e.g. one that for Windows executables looks for the ‘MZ’ signature, then the ‘PE’ header?

These are all equally important, and used in most of the yara rules you will find today. I find it interesting though that most rules don’t include the ‘filesize’. And I wonder why. This filter helps to exclude tons of legitimate files, as well as malicious files that fall outside the file size range used by the family the specific yara rule covers. If applied properly, it will potentially skip expensive searches inside the whole file.
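For reference, the ‘MZ’ then ‘PE’ filter from the list above can be sketched in a few lines of Python. This is an illustration of the check itself, not of how yara implements it; in yara rules the same test is commonly written as `uint16(0) == 0x5A4D and uint32(uint32(0x3C)) == 0x00004550`. The function name `looks_like_pe` is made up for this example.

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Cheap structural check: 'MZ' at offset 0, then the PE signature at e_lfanew."""
    # The DOS header is 0x40 bytes long; anything shorter cannot hold e_lfanew.
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: 32-bit little-endian offset of the PE header, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # An out-of-range offset yields an empty slice, which simply fails the test.
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"
```

Only the first 0x40 bytes and four bytes at the PE offset are inspected, so non-executables are rejected without touching the rest of the content.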

Update: I stand corrected. It would seem the ‘filesize’ is checked _after_ the strings are checked (thx Silas and Wesley for the Twitter convo). This is a poor performance optimization choice, in my view. Still, see what I wrote below about exactly this scenario: it doesn’t matter if yara performs poorly on this condition today, they may improve it tomorrow. Additionally, the way we use yara rules and how they are compiled matters! In a curated ruleset the issues I am referring to don’t make much difference. They do make a difference with individual scans on e.g. a file system. In my experience, many of the rules I get from public sources can’t be combined/compiled into one single bulky rule, because of conflicts. So I tend to run yara.exe many times, each time using a different yara rule file. See the Twitter convo for some interesting back and forth between us. Thanks guys!

I think the practical reason why many analysts forget about this condition is pretty basic. It’s very rare for any of us to write rules that must be optimized and checked for quality. While signatures in the AV industry go through a lot of testing before they are released, our creations are often deployed as soon as they are written, tested on a handful of sample files only, and very rarely on a larger sample set that includes lots of files: large ones, clean ones, and tricky ones (intended to break parsers & rules, e.g. corrupted files).

Update: It is important to mention that, based on our Twitter convo, it again depends very much on the circumstances. It is possible that your environment or your needs do not require checking this.

Our rules are typically for local consumption, so performance or accuracy are not necessarily a priority. But performance is important. And even more so, a different mindset.

We write rules to detect, i.e. to include matching patterns, but not to exclude non-matching ones. And the latter is important: the faster we can determine that the file doesn’t match our rule, the faster the yara engine can finish testing the file and move on to the next.

And even if the yara engine were the worst search engine ever, actually read the whole file, and the ‘filesize’ condition did not really help performance, it would still make sense to write rules in a ‘best effort’ way. There is always a new version of the engine, the authors take feedback in, and one day a future version may optimize code and comparisons for exactly this condition.

Coincidentally, this is actually one of the principles that most antivirus engineers learn very early in their career: learn to exclude non-matching stuff, and learn to do it early in your detection rule.
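The ‘exclude early’ principle can be sketched outside of any particular engine. The toy matcher below is an assumption-laden illustration and says nothing about how yara orders its own checks (per the update above, yara evaluates strings first): the `Rule` class, its fields, and the values used with it are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    # All fields are hypothetical and exist only for this illustration.
    min_size: int    # smallest sample size seen for the family
    max_size: int    # largest sample size seen for the family
    magic: bytes     # bytes expected at offset 0
    strings: tuple   # byte strings that must all be present


def matches(rule: Rule, data: bytes) -> bool:
    # 1. Cheapest test first: size bounds need no content inspection at all.
    if not (rule.min_size <= len(data) <= rule.max_size):
        return False
    # 2. Next cheapest: a fixed-offset comparison of a few bytes.
    if not data.startswith(rule.magic):
        return False
    # 3. Only now pay for scanning the whole buffer.
    return all(s in data for s in rule.strings)
```

With `Rule(10, 100, b"MZ", (b"evil_marker",))`, an oversized buffer or one that does not start with `MZ` is rejected before any string search runs.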

The sole intention of this post is to highlight the importance of thinking of signatures / rules not only as a way to quickly detect stuff, but in a wider context: as a way to ignore the non-matching stuff. The earlier in the rule, the better.

Update: After the Twitter convo I now know I chose the wrong example to illustrate my point. I should have used e.g. the common ‘good’ strings that we can sometimes find in public yara rules (these strings can be found in both malware and good files because they are part of a library). The hits that these strings generate on ‘good’ files can be avoided by testing rules on larger corpora of samples, including ‘good’ files. There are plenty of other examples.
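A rough sketch of that kind of check, assuming the ‘good’ corpus is already available as byte buffers in memory. The function name `noisy_strings` and the sample data used with it are made up for illustration; a real test would walk a large directory of clean files.

```python
def noisy_strings(candidates, clean_samples):
    """Map each candidate string to the number of known-good samples containing it."""
    hits = {}
    for s in candidates:
        count = sum(1 for sample in clean_samples if s in sample)
        if count:
            hits[s] = count
    return hits
```

Any string that shows up in the result also lives in clean files, for example because it comes from a shared library, and is a poor basis for a rule on its own.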