The Future of SOC

Over the last few years we have moved away from a SOC that used to be almost solely focused on Network and Windows events and artifacts (probably a strong fintech bias here) towards one that is the Frankenstein’s monster we see today – very fractured, multi-dimensional, multi-platform, multi-architectural, multi-device and multi-everything-centric, plus certainly multi-regional (regulated markets, data across borders) and privacy-savvy, on- and off-prem, covering *aaS and endpoints/servers/mobile/virtualization/containers/CI/CD pipelines, and did I mention multi-cloud, public and private environments, vendor vs. proprietary, with a bonus of over-eager employees who keep sending ‘dangerous’ stuff to the SOC because they have been trained well to report <insert any suspicious event here>? And finally, one where NO ONE knows even the basics of all the existing, rapidly emerging, and increasingly confusing technologies anymore, let alone the gamut of ideas and solutions that help to address (or at least detect) many of these security problems.

I think we moved away from a fairly well-understood model of, let’s say, 2000-2018 /COMFORTABLE/ towards one that (as of today at least, 2018 onwards) is full of unknown unknowns /VERY UNCOMFORTABLE/…

How do we deal with it today?

Usually a bridge, a Slack or Teams channel with 100-200 people on it.

I think divide and conquer is the only way to deal with it. Also, more work than ever has to focus on building bridges with the internal owners of technologies and the architects. This includes, for instance, a lot of DevSecOps work, shifting left, early involvement in app development and release cycles, security-oriented feature and LOG requests, a heavy red-team footprint on breaking it all, and, in contrast to the previous decade, lots of very hands-off work. Lots of commanding and coordination.

Borrowing a quote from dre: Blue Teaming is 90 percent social capital today.

Times have definitely changed…

And in parallel:
– stronger than ever reliance on vendors
– real (as in ‘old school’) cyber skills are in a strong decline – what took years to acquire and master is now gamified by vendor offerings that dumbify a lot of problems and requirements; I am not against it, because we need help, and while it sometimes comes in the form of b/s and extrapolations, we must admit that many non-technical analysts today, even without reading a single RFC in their life, can easily handle many incidents by just… talking, and via vendor consoles – this would have been impossible 10 years ago
– seriously, the tools of today are fantastic: advanced sandboxes, threat intel portals, bug bounty portals, and social media sharing as a whole make it far easier to find and share information that used to be available only to a few in the past
– the environments get more complicated – we need to work towards universal playbooks that cover heavily regulated regional markets
– portability is the key (work in one place -> work everywhere without many changes) – this affects multiple instances of systems of record, SOPs, detections, metrics (again, regional/regulated markets, plus the ability to quickly recover in case of a breach)
– Follow the Sun is now more complicated, as it also includes Follow the Regulated Market
– from log deprivation to log over-saturation – time for some log governance, at the source? common models for field naming? not only naming conventions, but also… and I really mean it… one, common, universal… TIMESTAMP FORMAT? (a small normalization sketch follows this list)
– optimization efforts should be the norm – most detection engineering and threat hunting teams add to the workload; we need an opposing force that asks: hmm, is it really necessary? the same goes for a ruthless approach towards email fatigue – convert to tickets or kill at the source, disable, decommission
– how many emails are your workflows and automations sending today? can you trim that down?
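
Since one of the points above asks for a single, universal timestamp format, here is a minimal sketch of what normalization at the source could look like, assuming logs arrive in a handful of known formats; the format list is purely illustrative and would need to match whatever your own sources actually emit.

```python
# Minimal sketch: normalize a few common log timestamp formats to one
# canonical form (ISO 8601, UTC). The format list is illustrative only –
# extend it with whatever your own log sources actually produce.
from datetime import datetime, timezone

KNOWN_FORMATS = [
    "%Y-%m-%dT%H:%M:%S%z",    # 2024-05-01T12:34:56+0200
    "%d/%b/%Y:%H:%M:%S %z",   # 01/May/2024:12:34:56 +0200 (web server style)
    "%b %d %H:%M:%S",         # May  1 12:34:56 (classic syslog, no year/tz)
    "%m/%d/%Y %I:%M:%S %p",   # 05/01/2024 12:34:56 PM
]

def normalize_timestamp(raw: str, assume_tz=timezone.utc) -> str:
    """Return the timestamp as ISO 8601 in UTC, or raise ValueError."""
    for fmt in KNOWN_FORMATS:
        try:
            dt = datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
        if dt.tzinfo is None:              # formats that carry no timezone
            dt = dt.replace(tzinfo=assume_tz)
        if dt.year == 1900:                # classic syslog carries no year
            dt = dt.replace(year=datetime.now(tz=assume_tz).year)
        return dt.astimezone(timezone.utc).isoformat()
    raise ValueError(f"Unrecognized timestamp format: {raw!r}")

print(normalize_timestamp("01/May/2024:12:34:56 +0200"))
# -> 2024-05-01T10:34:56+00:00
```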

That whole picture is what scares me. It’s currently hardly manageable, and it’s not sustainable. It’s whack-a-mole on steroids. We were meant to stop the whack-a-mole. And it not only continues, it has intensified a lot in recent years, and imho this trend will continue. Just as we all started to really like the idea of having EDRs at our disposal… the *aaS happened, and there is no way back. Suddenly all our incident response playbooks and SOPs need to focus on a completely different type of threat. Lots of this work is actually focused more on proper access management than on infiltration by APT actors via well-known TTPs. A lot of work is also focused on shared responsibility: when you deal with alerts on-prem, on endpoints, it’s all nice and cozy, but when you are *aaS and an external password spray hits a client application running on a server you host, one has to decide where the transfer of security responsibility occurs. Is it a threat to the hosting environment? To the instance of the app? Both? It’s… complicated.

What is the SOC of the future?

I think there is no SOC in the future. There is a cross-organizational incident response committee (you don’t wanna know how much I hate this word!) that actively engages in tackling the issues at hand and ‘incident-commands’ the respective teams leading those issues to closure. Security becomes part of day-to-day operations. Representatives from many functions actually talk to each other, often, and the ‘old security’ in isolation is no longer a topic of any conversation. What is, though, is addressing the ‘are we affected?’ question on a VERY REGULAR BASIS. To help with that, advanced asset inventories covering hardware, software, *aaS, SBOMs, packages – all aiming at exposure assessment, potential containment, and closing communication loops – are a MUST. It’s no longer, strictly speaking, a technical problem. It’s a problem that plays out on a stage, and that stage is not only political, but also visionary. Whoever puts in the effort to collect and maintain the best asset inventory, then predict, plan to contain, and finally close, will be the winner of the many brownie points to be distributed in this area in the future.
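
To make the ‘are we affected?’ question concrete, here is a minimal sketch of querying a hypothetical asset/SBOM inventory for exposure; the inventory structure, asset names, and package versions are all made up for illustration.

```python
# Sketch: answer "are we affected?" from an asset/SBOM inventory.
# The inventory structure and all names/versions here are hypothetical.
inventory = [
    {"asset": "payments-api", "owner": "team-payments",
     "packages": {"openssl": "3.0.7", "log4j-core": "2.14.1"}},
    {"asset": "hr-portal", "owner": "team-hr",
     "packages": {"openssl": "3.1.4"}},
]

def affected_assets(package: str, bad_versions: set):
    """Return (asset, owner) pairs exposed to the given package versions."""
    return [
        (a["asset"], a["owner"])
        for a in inventory
        if a["packages"].get(package) in bad_versions
    ]

# e.g. a Log4Shell-style question: who still runs a vulnerable log4j-core?
print(affected_assets("log4j-core", {"2.14.1", "2.15.0"}))
# -> [('payments-api', 'team-payments')] – now predict, contain, close the loop
```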

And that’s why the future belongs to the TRIAGE function.

That Omelas child, the punchbag, the scapegoat. The first line of defense, and the most important. Yet so often neglected.

Clear SOPs for Triage will help to handle most of the incoming ‘requests’. You want that Triage team to be supported as hell. Their procedures must be simple, to the point, and with clear paths to both closure and escalation. Such a triage function will train the best IR practitioners of the future. Jacks of all trades, outspoken, cooperative, and assertive.

The game is changing and we need to adapt. It’s time you take your Triage team out for a good dinner.

Shall we say… goodbye, phishing queue?

Imagine you stop processing your phishing reports today.

Just stop.

What’s the worst thing that could happen? Hmm?

Of course, some people will still get phished, some will become Business Email Compromise (BEC) victims, maybe even launch macro malware or witness some other badness or scam, but…

This is a very attractive proposal tho! Isn’t it?

10-15 years ago we had a lot of alerts from removable devices, drive-by infections, IDS, IPS, and proxies – today they are no longer that important. Should the ‘phishing report’ join this group and retire?

A lot of comms today happen via instant messengers (Slack, Teams), bots, and social media, often via always-online, often mobile platforms, and less and less via email. And many users and their systems are already very well protected by mail gateway security and endpoint policies, plus AV, EDRs, etc. And we have MFA in place as well. So what if someone nicks your credentials, if they still need your MFA blessing to log in?

And what about submission quality? Isn’t a person reporting a phish an employee who is actually very security aware? Since they are, there is really not that much to do there… Links are short-lived, often not really blockable (hosted on shared platforms), and the user _is_ aware it was bad. And some of them are so aware that, while it perhaps makes us feel really good about our Security Awareness programs, many of them become very trigger-happy and start submitting literally anything that looks ‘off’ in their eyes as… a phishing report, for example:

  • Company invites you to a webcast? Report.
  • Vendor sending you a newsletter? Report.
  • Real Amazon, UPS, DHL, Fedex tracking emails come in? Report.
  • Zoom recording email comes in? Report.
  • Request to reset your password comes in? Report.
  • Your peer shared a cloud-based link to a shared workspace with you? Report.

And so on, and so long. There is no end to it.

Many of these emails are legitimate, and many of the nagging ones can be genuinely unsubscribed from. But hey… we are all so AWARE… so these reports end up in our ticket queue instead.

It’s a dumpster fire.

And perhaps it’s time to start looking at it from a different angle.

There are three angles that I think are interesting:

  • auto-closures
  • processing in bulk
  • focus on emails outside of phishing reports

The first two are a natural progression for a tired SOC/triage handler. Thinking about how one could go about closing tickets faster, various ideas may come to mind – note that those listed below are not ready-to-use recipes (a rough code sketch of how such rules might chain together follows a bit further down); your risk appetite depends on the org you work for, and there are obvious and non-obvious caveats in the examples below:

  • If the emails are internal, you may just want to auto-close (with a risk acceptance of a malicious forward, a compromised mailbox, or an internal phish).
  • If the emails come from verified sources (proven not to be spoofed), we can auto-close them. There are tons of low-hanging fruit here: anything that uses ‘noreply’, ‘no-reply’, ‘support’, and many variants of ‘unmanaged’ non-spoofed mailboxes, like those from ‘big’ players and legit companies (do the stats for your org!), can be auto-closed.
  • If it’s a repetition, we can close. This is non-trivial, given differences in headers, recipients, etc., but feasible.
  • If it is a known pattern from a known address, f.ex. ticketing systems sending a gazillion ticket notifications, we can add a rule to auto-close all of these that get reported as phish.
  • Emails without attachments? And all extracted links point to well-known domains that are known to be non-malicious (you need a database of these)? Probably can be auto-closed too.

And so on and so forth.

We may shave off quite a bit with such an approach. But there is a need for risk acceptance for all of these auto-closures, of course.
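
As promised above, here is a minimal sketch of how such auto-closure rules might chain together; the field names, sender patterns, and benign-domain list are hypothetical placeholders, not an integration with any particular mail gateway or ticketing system.

```python
# Rough sketch of chained auto-closure rules for reported emails.
# All field names and allow-lists are illustrative; wire them up to your
# own mail gateway / ticketing data model and get a risk acceptance first.
import re
from dataclasses import dataclass, field

TRUSTED_SENDER_PATTERNS = [          # verified, non-spoofed senders only!
    r"^no-?reply@",
    r"^support@(bigvendor|legitcompany)\.com$",   # hypothetical examples
]
BENIGN_LINK_DOMAINS = {"zoom.us", "amazon.com", "ups.com"}   # needs a real DB

@dataclass
class ReportedEmail:
    sender: str
    is_internal: bool
    spf_dkim_pass: bool
    links: list = field(default_factory=list)        # extracted link domains
    attachments: list = field(default_factory=list)

def auto_close_reason(msg: ReportedEmail):
    """Return a closure reason string, or None if a human should look at it."""
    if msg.is_internal:
        return "internal sender (risk-accepted)"
    if msg.spf_dkim_pass and any(
        re.search(p, msg.sender, re.I) for p in TRUSTED_SENDER_PATTERNS
    ):
        return "verified trusted sender"
    if not msg.attachments and msg.links and all(
        d in BENIGN_LINK_DOMAINS for d in msg.links
    ):
        return "no attachments, all links on the benign-domain list"
    return None  # escalate to a human triage handler

msg = ReportedEmail("no-reply@zoom.us", False, True, links=["zoom.us"])
print(auto_close_reason(msg))   # -> "verified trusted sender"
```

Each branch maps to one of the bullets above; the ‘repetition’ rule is deliberately left out here, since deduplication across differing headers is the non-trivial part.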

What about bulk processing?

Imagine a world where you don’t look at phishing reports as individual tickets, but review them in bulk.

A selective, carefully crafted excerpt of a data dump of, say, 20-50 phishing reports, perhaps in a table, could save us a lot of time. Instead of doing a gazillion clicks to manually open and close each and every one individually as tickets, we cluster them together and close them as a single unit. 5-10 clicks, instead of, say, 200-500. One person spending 30 minutes on it, instead of 5 people spending 15-30 minutes per ticket. There are some substantial person-hours at stake, and it’s worth looking at!

With bulk processing, you can very quickly pick out and rule out emails that are your regular spam, cherry-pick BECs, find emails with attachments (they may need additional time for analysis), etc. And if you look at the list above, you could immediately envision doing separate bulk processing for emails with and without attachments to speed things up.
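
As a minimal sketch of what that bulk view could look like, assume the reports have already been parsed into simple records; clustering by sender domain, normalized subject, and attachment presence is just one possible grouping key, and every field name here is an assumption.

```python
# Sketch: cluster reported emails so the triage handler reviews groups,
# not individual tickets. Grouping key and field names are illustrative.
import re
from collections import defaultdict

def normalized_subject(subject: str) -> str:
    # strip RE:/FW: prefixes and numbers so near-duplicates land in one cluster
    s = re.sub(r"^(re|fw|fwd):\s*", "", subject, flags=re.I)
    return re.sub(r"\d+", "#", s).strip().lower()

def bulk_view(reports):
    clusters = defaultdict(list)
    for r in reports:
        sender_domain = r["from"].split("@")[-1].lower()
        key = (sender_domain, normalized_subject(r["subject"]), bool(r["attachments"]))
        clusters[key].append(r)
    # render one row per cluster instead of one row per ticket
    for (domain, subject, has_attach), items in sorted(
        clusters.items(), key=lambda kv: -len(kv[1])
    ):
        print(f"{len(items):>4}  {domain:<20} attach={has_attach!s:<5} {subject}")

reports = [
    {"from": "billing@vendor.com", "subject": "Invoice 1042", "attachments": []},
    {"from": "billing@vendor.com", "subject": "RE: Invoice 1077", "attachments": []},
    {"from": "ceo@evil.example", "subject": "Urgent wire", "attachments": ["doc.xlsm"]},
]
bulk_view(reports)
```

One table like this also splits naturally into ‘with attachments’ and ‘without attachments’ passes, as suggested above.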

Most email headers are useless for analysis, yet all of them usually make it into the tickets. Why not trim them down? Show only those that are relevant? Well… you may ask, ‘which ones are those?’ Do a brainstorming session with your analysts, look at how they analyze emails today, and cherry-pick the information you would like to see, first for individual tickets, and then inside a bulk report.
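
As an illustration only, here is a minimal sketch of such trimming; which headers count as ‘relevant’ is exactly the outcome of that brainstorming session, so the list below is an assumption rather than a recommendation.

```python
# Sketch: reduce a raw reported email to the handful of headers an analyst
# looks at first. RELEVANT_HEADERS is an assumption; derive your own list.
from email import message_from_string
from email.policy import default

RELEVANT_HEADERS = [
    "From", "Reply-To", "Return-Path", "To", "Subject", "Date",
    "Authentication-Results",   # SPF/DKIM/DMARC verdicts
    "Received-SPF",
]

def trimmed_headers(raw_email: str) -> dict:
    msg = message_from_string(raw_email, policy=default)
    return {h: msg[h] for h in RELEVANT_HEADERS if msg[h] is not None}

raw = (
    "From: billing@vendor.com\r\n"
    "Reply-To: attacker@evil.example\r\n"
    "Subject: Invoice overdue\r\n"
    "X-Useless-Header: lots of noise\r\n"
    "\r\nPlease pay now.\r\n"
)
for name, value in trimmed_headers(raw).items():
    print(f"{name}: {value}")
```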

In my mind, analyzing emails reported as phish contributes to the highest waste of cyber cycles ever, and we need to start thinking about how to cut it down. You can literally save hundreds of thousands of dollars by adopting a more modern, even borderline risky, approach to handling these…

Can you sell it to your CISO, compliance, clients’ auditors…? Today, probably not. But if we as an industry start making changes, then these approvals will inevitably become the new Business As Usual (BAU).

That leaves me with the last part.

How many of us look at emails that are NOT reported as phish? This is where the real, undetected BAD happens. This is a gold mine of opportunities to detect that badness, but somehow it’s not a priority for many. There are obvious privacy and information sensitivity issues to tackle (access to email bodies should be really heavily restricted), but even with some basic metadata we should be able to find some interesting stuff. We could start by ruling out legitimate emails and campaigns (f.ex. internal emails, emails from unmanaged/non-responsive mailboxes, etc.) to arrive at a significantly reduced data set, which can then be eyeballed the same way as in bulk processing. This, in fact, is a nice compensating control for your current ‘analyze reported phishing emails only’ process!
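
A minimal sketch of that ‘rule out the good, eyeball the rest’ idea, working on metadata only (no bodies); the known-good list and the thresholds are placeholders for whatever your own statistics show.

```python
# Sketch: start from all email metadata (no bodies), rule out what is known
# to be good, and keep a small residue for eyeballing. Filters are examples.
KNOWN_GOOD_SENDER_DOMAINS = {"ourcompany.com", "vendor.com"}   # illustrative

def suspicious_residue(metadata_records):
    residue = []
    for m in metadata_records:
        sender_domain = m["from"].split("@")[-1].lower()
        if sender_domain in KNOWN_GOOD_SENDER_DOMAINS:
            continue                  # internal senders / known campaigns
        if m["auth_pass"] and m["recipient_count"] > 50:
            continue                  # authenticated bulk mail, likely a newsletter
        residue.append(m)
    return residue

records = [
    {"from": "hr@ourcompany.com", "auth_pass": True, "recipient_count": 300},
    {"from": "it-helpdesk@evil.example", "auth_pass": False, "recipient_count": 4},
]
print(suspicious_residue(records))
# -> only the evil.example record remains for a human to review
```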

So, imho, bulk processing is the future of email analysis. We apply the same threat hunting principles as we do to EDR. We hunt for bad, we hunt for good, and we narrow down the scope, saving lots of time, combating alert fatigue, combating FP fatigue, improving morale in the end, and still keeping your organization safe!

There is almost no value in a single phish report, but there is certainly value in us optimizing our processes. If your queue is 30%+ phishing, this post is for you, and it’s time for you to act.