Some people write unkind things about you. It stings. But then you wonder… what, why, why on earth? Was it something I said, something I implied, something I thought of?
In this era of rapid judgment (AD 2020), I have found myself subject to Twitter blocks and criticism on more than one occasion. My lesson learned is that dudes (so far, only dudes) blocked me on Twitter because they didn’t agree with my opinion/take/whatever. I always believed that if we were about to drive someone into obscurity with a social media ban, we would first follow a meticulously explored path of questioning and probing, you know, to understand their point of view. But hell no… it’s far easier to just block & forget.
Cuz Twitter.
So my reply to my blockers is: come at me with arguments, not blocks. I am not always right, but I will listen, and I will change my mind if you convince me…
I never liked dashboards, so it took me a long time to be convinced that there is a lot of value in them. There are at least three reasons why I didn’t like them, and I think it’s worth listing them here, because they may well be the same reasons you are not the biggest fan of dashboards either.
My reasons are:
too much data
data is often presented in a way that is not actionable (too many fields; the presentation layer doesn’t help analysts)
UI over-engineering (pew-pew maps, large fonts, and lots of other stuff stealing precious space)
As my Splunk experience grew, and under the good influence of other expert Splunkers, I now look at dashboards with a much friendlier eye…
My approach changed so much that I now reserve alerting for high-fidelity stuff only: something important enough to justify triggering an alert, or even sending an email. For everything else, i.e. typical business-as-usual threat hunting, I just use panels.
When you plan an actionable panel, one thing stands out immediately: the data we want to present is really hard to squeeze onto these mini-canvases in the first place. Moreover, each panel has to fight for space with many other panels; as a result, badly designed panels make badly designed dashboards, and in the end no one wants to look at them.
Let’s look at how we can declutter it all.
Data reduction
I have covered LFO and normalization in another post. Normalize, remove repetitions, and keep as few outliers as possible.
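To make the LFO (least frequency of occurrence) idea concrete, here is a minimal Python sketch, not Splunk SPL: normalize each value, count the normalized forms, and keep only the rarest ones. The normalization rules (hex and decimal numbers turned into placeholders, then lowercasing) and the max_count threshold are my assumptions for illustration; in Splunk you would more likely express this with eval and replace() followed by stats count (or rare).

```python
import re
from collections import Counter

# Assumed normalization rules, for illustration only:
# hex numbers become <hex>, remaining standalone decimal numbers become <num>.
HEX = re.compile(r"\b0x[0-9a-f]+\b", re.IGNORECASE)
NUM = re.compile(r"\b\d+\b")


def normalize(value: str) -> str:
    """Replace volatile tokens with placeholders and lowercase the result."""
    value = HEX.sub("<hex>", value)
    value = NUM.sub("<num>", value)
    return value.lower()


def least_frequent(values, max_count=1):
    """Return normalized values seen at most max_count times, sorted for display."""
    counts = Counter(normalize(v) for v in values)
    return sorted(v for v, c in counts.items() if c <= max_count)


if __name__ == "__main__":
    events = [
        "foo.exe -pid 0x10",
        "foo.exe -pid 0x20",
        "FOO.exe -pid 0x30",
        "bar.exe -x 5",
    ]
    print(least_frequent(events))  # ['bar.exe -x <num>']
```

The three foo.exe lines collapse into one normalized form seen three times, so only the bar.exe line survives as an outlier worth eyeballing.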
Too many fields
Many Splunkers are obsessed with presenting every single field on their panels. I wonder why. The role of a threat hunting dashboard is to help analysts eyeball large amounts of data, so it is not even the triage stage. We are just looking for anything that stands out, a needle that will trigger the actual triage. Less data on the screen doesn’t make this activity any less actionable. We often don’t need to show the time, full paths, all hosts where the file was found, its file size, its file attributes, or all the other gory details that are present in logs. You don’t need a field with a MITRE ATT&CK tactic & technique name either. Less is more. Depending on your panel, it may well be just a file name, a domain name, or a data transfer volume. Avoid enriching data unless you have space for it.
Very long fields
This typically applies to URLs, very long process names, and incredibly long command-line arguments, often observed with processes such as java.exe and chrome.exe.
Trick number one is to exclude very long values completely; this is risky, though, as malware could use the very same trick.
Another approach is to analyze command-line arguments and normalize them. So yes, normalization is very helpful not only for LFO, but also for the actual presentation.
You can replace command-line argument values with placeholders, e.g.:
foo.exe -pid 0x1234 -session 0x798789da9 arg1
could become:
foo.exe -pid <hex> -session <hex> arg1
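A minimal Python sketch of that placeholder step, assuming that any whitespace-delimited token of the form 0x followed by hex digits is a hex value and any purely numeric token is a number. The rule list and the placeholder names are hypothetical; in Splunk the same idea would typically be expressed with eval and replace().

```python
import re

# Hypothetical rules, applied in order; they only match whole tokens.
RULES = [
    (re.compile(r"(?<=\s)0x[0-9a-fA-F]+(?=\s|$)"), "<hex>"),
    (re.compile(r"(?<=\s)\d+(?=\s|$)"), "<num>"),
]


def placeholderize(cmdline: str) -> str:
    """Replace volatile argument values with placeholders."""
    for pattern, placeholder in RULES:
        cmdline = pattern.sub(placeholder, cmdline)
    return cmdline


print(placeholderize("foo.exe -pid 0x1234 -session 0x798789da9 arg1"))
# foo.exe -pid <hex> -session <hex> arg1
```

Because the rules match whole tokens only, something like arg1 is left alone even though it contains a digit.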
You can also go a step further and remove all known placeholders. That is, first normalize, then remove individual pairs of known argument names and their placeholder values. For the example shown above, this would leave us with a much shorter version to look at:
foo.exe arg1
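Continuing in Python, the removal step can be a loop over a list of known argument names, deleting each name together with the placeholder that follows it. KNOWN_ARGS is a hypothetical list you would maintain yourself, and the input is assumed to be already normalized into placeholders like the ones above.

```python
import re

# Hypothetical list of argument names whose values are pure noise.
KNOWN_ARGS = ("-pid", "-session")


def drop_known_args(normalized: str, known=KNOWN_ARGS) -> str:
    """Remove ' <name> <placeholder>' pairs for every known argument name."""
    for name in known:
        pattern = rf"\s+{re.escape(name)}\s+<[a-z]+>"
        normalized = re.sub(pattern, "", normalized)
    return normalized


print(drop_known_args("foo.exe -pid <hex> -session <hex> arg1"))
# foo.exe arg1
```

Only pairs whose value has already been turned into a placeholder are removed, so an unexpected literal value after -pid stays visible on the panel.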
That really saved us a lot of space and… didn’t make the data any worse. In fact, we saved not only space, but also the time needed to eyeball this chunk of information.
And yes, you could go even further and use the replace function to remove _all_ arguments, regardless of their names, as long as they match a certain command-line regex pattern, e.g.:
(?i)\s+-+[a-z][a-z0-9,\'=-]+
\s+/[a-z]+:\d+
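The two patterns can be tried outside Splunk with Python’s re module, which accepts them unchanged. This is a sketch for experimenting, not the Splunk replace() call itself, and the sample command line is made up.

```python
import re

# The two patterns quoted above, applied in order.
SWITCH_PATTERNS = [
    re.compile(r"(?i)\s+-+[a-z][a-z0-9,\'=-]+"),  # -name, --name=value, --name=a,b
    re.compile(r"\s+/[a-z]+:\d+"),                  # /name:123 (lowercase names only)
]


def strip_switches(cmdline: str) -> str:
    """Delete every argument that matches one of the switch patterns."""
    for pattern in SWITCH_PATTERNS:
        cmdline = pattern.sub("", cmdline)
    return cmdline


print(strip_switches("chrome.exe --type=renderer --lang=en-US /prefetch:1 page"))
# chrome.exe page
```

Note that the first pattern only consumes the switch token itself: in foo.exe -pid 0x1234 arg1 it removes -pid but leaves 0x1234, so it works best for arguments whose values are glued on with ‘=’. The second pattern, as written, is case-sensitive.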
You could apply similar logic to URL query variables (separated by ‘&’ and in the form variable=value).
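A minimal Python sketch of the same idea for URL query strings, using urllib.parse from the standard library: every value is replaced with a placeholder, and variables whose names are in a (hypothetical) drop set are removed altogether.

```python
from urllib.parse import parse_qsl, urlsplit, urlunsplit


def normalize_query(url: str, drop=frozenset()) -> str:
    """Replace query values with <v>; remove variables whose names are in drop."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    query = "&".join(f"{name}=<v>" for name, _ in pairs if name not in drop)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


print(normalize_query("http://example.com/a.php?id=123&sess=abcdef&q=test", drop={"sess"}))
# http://example.com/a.php?id=<v>&q=<v>
```

If every variable ends up dropped, urlunsplit omits the ‘?’ entirely, so the panel shows just the bare URL.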
Yet another take is to shorten the actual content. First, we calculate the length of the reported data. If it is shorter than the limit we care about and fits nicely on a panel, we do nothing. If it is too long, though, there are at least two ways to deal with it:
use substr to truncate the string (we will only see its prefix)
longstring becomes longstr…
use substr twice and get smaller chunks from beginning and end of the string (we will see its prefix and suffix)
longstring becomes lon…ing
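Both truncation styles can be sketched in Python as follows. Keep in mind that Splunk’s substr() counts from 1 while Python slicing counts from 0; the limits used here are arbitrary.

```python
ELLIPSIS = "\u2026"  # the single-character '…' used in the examples


def shorten_prefix(s: str, max_len: int) -> str:
    """Keep only the first max_len characters, marking the cut with '…'."""
    if len(s) <= max_len:
        return s
    return s[:max_len] + ELLIPSIS


def shorten_middle(s: str, head: int, tail: int) -> str:
    """Keep head characters from the start and tail characters from the end."""
    if len(s) <= head + tail:
        return s
    return s[:head] + ELLIPSIS + s[len(s) - tail:]


print(shorten_prefix("longstring", 7))     # longstr…
print(shorten_middle("longstring", 3, 3))  # lon…ing
```

The suffix is taken with s[len(s) - tail:] rather than s[-tail:], so tail=0 correctly yields an empty suffix instead of the whole string.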
And finally… for certain data types, you can shorten them by:
extracting only important features, e.g. the extension from a file name, the file name from a full path, the domain from a URL, a subset of variables from a URL, etc.
normalizing longer path chunks into made-up, shorter placeholders e.g. c:\windows\system32 –> %sys32%
removing superfluous suffixes, e.g. the TLD, the SLD, or even the whole domain name if it is repetitive
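A Python sketch of these three reductions, using ntpath (the standard library’s Windows path module, which behaves the same on any OS) and urllib.parse. The PATH_PLACEHOLDERS table, with %sys32% from the example above plus a %windir% entry, is a made-up mapping, and strip_domain_suffix is a hypothetical helper.

```python
import ntpath
from urllib.parse import urlsplit

# Made-up placeholder table; longer (more specific) prefixes must come first.
PATH_PLACEHOLDERS = [
    ("c:\\windows\\system32", "%sys32%"),
    ("c:\\windows", "%windir%"),
]


def file_name(path: str) -> str:
    return ntpath.basename(path)


def extension(path: str) -> str:
    return ntpath.splitext(ntpath.basename(path))[1].lower()


def domain(url: str) -> str:
    return urlsplit(url).hostname or ""


def shorten_path(path: str) -> str:
    """Swap a known directory prefix (matched case-insensitively) for a placeholder."""
    lowered = path.lower()
    for prefix, placeholder in PATH_PLACEHOLDERS:
        # Require a whole path component, so c:\windowsapps is left alone.
        boundary = lowered[len(prefix):len(prefix) + 1]
        if lowered.startswith(prefix) and boundary in ("", "\\"):
            return placeholder + path[len(prefix):]
    return path


def strip_domain_suffix(host: str, suffix: str) -> str:
    """Drop a repetitive suffix such as a TLD or a corporate domain."""
    if host.lower().endswith("." + suffix.lower()):
        return host[: -(len(suffix) + 1)]
    return host


print(shorten_path(r"C:\Windows\System32\drivers\etc\hosts"))
# %sys32%\drivers\etc\hosts
```

For example, strip_domain_suffix("mail.corp.example.com", "corp.example.com") leaves just mail, which is all an analyst needs when every host shares that suffix.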
Too many rows of data
We don’t always need to show all rows, e.g. multiple host names, account names, command lines, etc. A very handy snippet shown below saves us a lot of space and also informs us that there is more data:
It extracts the first 3 values from a multivalue field and truncates the rest.
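This is not the Splunk snippet the text refers to, only a Python sketch of the behaviour it describes, with a hypothetical helper name: keep the first three values and append ‘…’ when anything was cut.

```python
ELLIPSIS = "\u2026"


def limit_values(values, max_values=3):
    """Return at most max_values items, plus '…' if anything was dropped."""
    values = list(values)
    if len(values) <= max_values:
        return values
    return values[:max_values] + [ELLIPSIS]


print(limit_values(["host1", "host2", "host3", "host4", "host5"]))
# ['host1', 'host2', 'host3', '…']
```

Appending the marker as an extra value, rather than silently dropping rows, is what tells the analyst there is more data behind the panel.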
Consistency of UI metaphor
Data enrichment is easy to spot. It’s just there.
Data reduction/depletion, however, doesn’t manifest itself in any way until we tell the analysts that it happened. In both cases above, you may notice that truncated data is always replaced with ‘…’. This acts as a hint to analysts that, to see the whole data set, they need to remove the limitation and run the query in a separate Splunk window.