Detect me not

Update

After pinging CRDF via email I got a more detailed reply. It turns out their engine added my domain because of the VirusTotal report that highlights 4 detections of a POC file for the LdrRegisterDllNotification API I posted in 2015.

Thank you to CRDF for engaging with me and, despite our differences, bringing this to closure.

I have updated my poop list accordingly, and after removing CRDF I have added the following guys:

  • Fortinet
  • Sophos
  • Scumware.org
  • Trustwave

Finally, I need to reiterate that VirusTotal's approach towards false positives is absolutely disgraceful. At the very least, they should let owners of websites/software follow up with vendors directly from the VirusTotal interface. After all, this is where these damaging detections are presented.

Update

CRDF came back to me on Twitter and we had a battle of wits. I wanted to understand why the domain was added in the first place, but I have not received any explanation so far. VirusTotal didn't come back at all.

Update

I submitted the URL to CRDF Labs and it now comes back as an FP. They say it may take 4 hours for the info to propagate. I asked them why it was flagged in the first place; no reply yet.

Old post

Yesterday I checked my web site on VirusTotal.

To my surprise there was one detection. I wrote about this on Twitter and asked VirusTotal to address it. I had no idea what CRDF was at that moment, because VT doesn't provide any immediate pointers on their report. Later on I googled around and discovered that CRDF is a French web site offering some sort of threat feeds. This morning I checked what they have to say about my web site and this is what they state:

What the heck?

Where is the evidence to support this claim?

And it has been like this for a year?

VT uses CRDF, and CRDF says that about 1 year ago my web site entered their malicious web site repo. This implies that whoever checked my web site on VT during the last year, either directly (via their UI) or indirectly (API/3rd party vendor), would be warned that my site is a risk. This may sound trivial, because it is one single detection, but we live in an environment where the VT feed is omnipresent. Tons of software solutions leverage VT as a part of their detection/prevention model.

Do I make this stuff up?

Have a look at this…

What happens here? A staff member of MalwareBytes shows how they approach classification. To support their b/s they reframe the discussion, making it sound like the lack of a TLS certificate is a justification to classify the site as bad. What's more, they provide a link to VT and guess what… there is one detection.

This is not security. This is irresponsibility. I may sound like I am sitting on a high horse, but I used this situation to highlight how stupid the whole security detection methodology has become.

And how do you classify web sites? Yes, the community score is a great indicator. But… who vets that? Looking at you, Sdrickert01 (the user on VT that gave a score of -1).

VT, MalwareBytes, Fortinet, Sophos, Scumware.org, Trustwave – welcome to my poop list.

And to add some positive bit… the list below shows vendors that quickly engaged and addressed my concerns:

  • CRDF

I present you Splunk panel data

I never liked dashboards, so it took me a long time to get convinced that there is a lot of value in them. There are at least three reasons why I didn't like them, and I think it's important to highlight them here, as it may well be that these reasons are why you are not the biggest fan of them either.

My reasons are:

  • too much data
  • data is often presented in a way that is not actionable (too many fields, presentation layer doesn’t help analysts)
  • UI over-engineering (pew pew maps, large fonts, etc.; lots of stuff stealing precious space)

As I progressed with my Splunk experience, and under the good influence of other expert splunkers, today I look at dashboards with a much friendlier eye…

My approach changed so much that I now reserve alerting for high-fidelity stuff only; things important enough to justify triggering an alert, or even sending an email. For the other stuff, the typical threat hunting business as usual, I just use panels.

When you plan an actionable panel, one thing that stands out immediately is the fact that the data we present on these mini-canvases is really hard to squeeze in there in the first place. What's more, it has to fight for space with many other panels; as a result, badly designed panels make badly designed dashboards, and in the end no one wants to look at them.

Let’s look at how we can declutter it all.

Data reduction

I have covered LFO and normalization in my other post. Normalize, remove repetitions, and keep the smallest possible number of outliers.
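
For a quick refresher, a minimal SPL sketch of that idea; the index, sourcetype, and field names are made up for illustration:

index=endpoint sourcetype=process_events
| eval cmd=lower(cmd)
| stats count by cmd
| sort count
| head 20

Lowercasing is the simplest form of normalization here; sorting by ascending count then surfaces the rare outliers, and head caps how much lands on the panel.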

Too many fields

Many splunkers are obsessed with presenting every single field on their panels. I wonder why. The role of a threat hunting dashboard is to help analysts with eyeballing large amounts of data, so it is not even the triage stage. We are just looking for anything that stands out, a needle that will trigger the actual triage. Less data on the screen doesn't make this activity any less actionable. We often don't need to show time, full paths, all hosts where the file was found, its file size or file attributes, and all the other gory details that are present in logs. You don't need a field with a MITRE ATT&CK tactic & technique name either. Less is more. Depending on your panel, it may as well be just a file name, a domain name, or data transfer volumes. Avoid enriching data, unless you have space for that.
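
As an illustration, a hypothetical panel query that deliberately keeps a single field plus a count (all names are assumptions):

index=endpoint sourcetype=file_events
| stats count by file_name
| sort count

All the other details (paths, hosts, timestamps) remain in the raw events and can be pulled up later, during the actual triage.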

Very long fields

This typically applies to URLs, very long process names, and incredibly long command line arguments, often observed with java.exe and chrome.exe processes.

Trick number one is to completely exclude those that are very long; this is risky though, as malware could use the very same trick.

Another approach is to analyze command line arguments and normalize them. So, yes, normalization is very helpful not only for LFO, but also for the actual presentation.

You can replace command line arguments with placeholders, e.g.:

foo.exe -pid 0x1234 -session 0x798789da9 arg1

could become:

foo.exe -pid <hex> -session <hex> arg1

You can also go a step further and remove all known placeholders. That is, first normalize, then remove individual tuples of known argument names and their placeholder values. For the example shown above, this would leave us with a much shorter version to look at:

foo.exe arg1

That result really saved us a lot of space and… didn't make the data any worse. In fact, we saved not only space, but also the time needed to eyeball this chunk of information.
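
In SPL, this two-step could look like the minimal sketch below; the cmd field name is an assumption and the patterns only cover the example above:

| eval cmd=replace(cmd, "(?i)0x[0-9a-f]+", "<hex>")
| eval cmd=replace(cmd, "(?i)\s+-(pid|session)\s+<hex>", "")

The first replace normalizes hex values into the <hex> placeholder; the second removes the known argument/placeholder tuples, leaving foo.exe arg1.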

And yes, you could go even further: use the replace function and remove _all_ arguments, no matter their names, as long as they conform to certain command line regex patterns, e.g.:

(?i)\s+-+[a-z][a-z0-9,\'=-]+
\s+/[a-z]+:\d+
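
Applied in SPL, this could look like the sketch below (cmd is again an assumed field name; the escaped quote from the first pattern is not needed inside a double-quoted string):

| eval cmd=replace(cmd, "(?i)\s+-+[a-z][a-z0-9,'=-]+", "")
| eval cmd=replace(cmd, "\s+/[a-z]+:\d+", "")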

You could apply similar logic to URL variables (separated by '&' and in the form variable=value).
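
A sketch of the same placeholder trick for URL query strings (the url field name is assumed):

| eval url=replace(url, "(?i)([?&][a-z0-9_]+=)[^&]+", "\1<val>")

so e.g. ?session=798789da9&lang=en would become ?session=<val>&lang=<val>.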

Yet another take is to shorten the actual content. First we calculate the length of the reported data. If it is shorter than what we care about and fits nicely on a panel, we do nothing. If it is too long though, there are at least two ways to deal with it (a sketch of both follows the list):

  • use substr to truncate the string (we will only see its prefix)
    • longstring becomes longstr…
  • use substr twice and get smaller chunks from the beginning and end of the string (we will see its prefix and suffix)
    • longstring becomes lon…ing
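
Both variants in SPL, as a rough sketch; the 20-character budget and the output field names are arbitrary:

| eval prefix_only=if(len(field)<=20, field, substr(field,1,19)."…")
| eval prefix_suffix=if(len(field)<=20, field, substr(field,1,12)."…".substr(field,len(field)-6,7))

The first keeps only the beginning of the string; the second stitches together its beginning and end.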

And finally… for certain data types, you can shorten them by:

  • extracting only important features, e.g. the extension from a file name, the file name from a full path, the domain from a URL, a subset of variables from a URL, etc.
  • normalizing longer path chunks into made-up, shorter placeholders, e.g. c:\windows\system32 -> %sys32%
  • removing superfluous suffixes, e.g. the TLD, SLD, or even the whole domain name if it is repetitive
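
A couple of these in SPL, again as a sketch with assumed field names (note the quadruple backslashes required to match a literal \ inside an eval regex):

| eval ext=replace(file_name, "^.*\.", "")
| eval path=replace(path, "(?i)^c:\\\\windows\\\\system32", "%sys32%")

The first line extracts just the extension from a file name; the second collapses a well-known path prefix into a shorter, made-up token.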

Too many rows of data

We don't always need to show all rows, e.g. multiple host names, account names, command lines, etc. The very handy snippet shown below saves us a lot of space, and also informs us that there is more data:

| eval field=if(mvcount(field)<4,field,mvappend(mvindex(field,0,2),"…"))

It keeps the first 3 values of the multivalue field and replaces the rest with an ellipsis.
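
For context, this is how it might sit inside a full panel query (index, sourcetype, and field names are all made up):

index=endpoint sourcetype=process_events
| stats values(host) AS host count BY process_name
| eval host=if(mvcount(host)<4,host,mvappend(mvindex(host,0,2),"…"))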

Consistency of UI metaphor

Data enrichment is easy to spot. It’s just there.

Data reduction/depletion doesn't manifest itself in any way until we tell the analysts that it happened. In both cases above you may notice that any data that is truncated is always substituted with '…'. This bit acts as a hint for analysts that, in order to see the whole data set, they need to remove the limitation and run the query in a separate Splunk window.