Optimizing the regexes, or not

Every once in a while we all contemplate solving interesting yet kinda abstract threat hunting problems. This post describes one of these…

The problem:

Given a relatively large set of strings, how do you write a regular expression that covers them all, but doesn’t hit on any other string?

The context:

I have extracted the file names of kernel drivers referenced by all the .inf files present inside all of the (unpacked) archives that can be found inside the DriverPack.

The rationale:

Hunting for new kernel drivers introduced to the environment may be easier if I can extract kernel driver names from the telemetry, and only report creation of those that reference files that are NOT present on the ‘known list of good kernel driver file names’.

The solution:

Looking for existing tools that may help to address this problem in a generic way, I came across this Perl module: Regexp::Optimizer. To my surprise, it actually works quite nicely.
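For context, generating the combined pattern with the module looks roughly like this. It is only a sketch: the new/optimize calls follow the module’s CPAN synopsis, but reading the names from the ServiceBinary2su.txt list mentioned below, quotemeta-escaping them into one big alternation and writing the result to regex.txt are my assumptions about the exact invocation (and about how the .sys extension is handled).

use strict;
use warnings;
use Regexp::Optimizer;

# read the known-good driver file names, one per line (file name assumed)
open my $fh, '<', 'ServiceBinary2su.txt' or die "Cannot open the list: $!";
chomp(my @names = <$fh>);
close $fh;

# glue the names into one big alternation and let the module factor out common parts
my $alternation = join '|', map { quotemeta } @names;
my $o         = Regexp::Optimizer->new;
my $optimized = $o->optimize(qr/(?:$alternation)/i);

# dump the optimized pattern so the test script below can slurp it from regex.txt
open my $out, '>', 'regex.txt' or die "Cannot write regex.txt: $!";
print $out $optimized;
close $out;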

I gave it 7.5K file names associated with ‘known clean kernel module drivers’ and it gave me back a single combined regex (the regex.txt file the script below reads). I have tested all the file names from the ‘ServiceBinary2su.txt’ file and the regex worked well. This is the test script:

use strict;
use warnings;
use utf8;

$| = 1;

# slurp the generated pattern from regex.txt
my $f = 'regex.txt';
open F, '<', $f or die "Cannot open $f: $!";
binmode F;
read F, my $regex, -s $f;
close F;
$regex =~ s/[\r\n]+\z//;   # drop any trailing newline so the anchors still line up

# test a single file name passed on the command line
my $x = shift // die "Usage: $0 <driver file name>\n";
if ($x =~ /^$regex\.sys$/i)   # escape the dot so it only matches a literal '.'
{
  print ("$x matched\n");
}
else
{
  print ("$x didn't match\n");
}
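To check a single name against the pattern (assuming the script above is saved as, say, check.pl, regex.txt sits in the same directory, and acpi.sys is just an illustrative driver name):

perl check.pl acpi.sys

The script then prints either ‘acpi.sys matched’ or ‘acpi.sys didn’t match’, depending on whether the base name is covered by the generated pattern.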

The final regex is 52624 bytes long. The input data was 103317 bytes long (including new lines), so we have ‘compressed’ the list down to roughly 51% of its original size. Debugging such a complicated regex pattern, though, sounds like a heck of a job.

It would seem that sometimes solving interesting yet kinda abstract threat hunting problems brings more confusion to the process than we anticipate… And getting fixated on using regexes to solve this kind of problem is actually the bigger problem here. Trie-based, multi-pattern search structures are far better suited to this sort of multi-pattern matching.
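For illustration, here is a minimal sketch of that trie idea in Perl, using nested hashes with one node per character. It assumes the same one-name-per-line ServiceBinary2su.txt list, lowercases everything to mirror the /i flag, compares the full name exactly as stored (no .sys juggling), and the ‘-end’ marker key is just my own convention:

use strict;
use warnings;

# build a trie out of nested hashes, one node per character of each known-good name
my %trie;
open my $fh, '<', 'ServiceBinary2su.txt' or die "Cannot open the list: $!";
while (my $name = <$fh>) {
    chomp $name;
    my $node = \%trie;
    for my $ch (split //, lc $name) {
        $node = $node->{$ch} //= {};   # descend, creating nodes as needed
    }
    $node->{-end} = 1;                 # mark that a complete known name ends here
}
close $fh;

# exact lookup: walk the trie character by character
sub is_known {
    my ($name) = @_;
    my $node = \%trie;
    for my $ch (split //, lc $name) {
        $node = $node->{$ch} or return 0;   # dead end: not a known name
    }
    return $node->{-end} ? 1 : 0;
}

my $x = shift // die "Usage: $0 <driver file name>\n";
print is_known($x) ? "$x is on the known-good list\n"
                   : "$x is NOT on the known-good list\n";

For a pure exists/doesn’t-exist check a plain hash (exists $known{lc $x}) does the same job in two lines; the trie shape only really starts to pay off when you need prefix walks, or when paired with something like Aho-Corasick for substring scanning.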

The rapidly changing geopolitics and its inevitable effect on cyber

The ‘follow the Sun’ model is dead. No more IOC sharing. Gone are the days when we openly passed valuable information to our peers, partners, friends and sometimes even frenemies. Oh, and did I mention our global coworkers? We don’t share lots of info with them either. Also, the Five Eyes alliance is no more.

This rather gloomy future is not that far away.

With the dramatic political changes happening in the US, we all need to quickly rethink how we are going to do ‘global’ cyber a year from now, and in the years that follow.

The ‘cyber’ of the last few decades was very clearly defined: there were some bad guys out there, and they were being chased by the good guys. All of us doing the ‘cyber’ in the Western democracies were obviously assuming the role of the good guys.

But this global collective of the good guys is no more.

Global companies need to adapt very quickly. My friend suggested the FedRAMP model as one to follow, and I think it’s a very valid value proposition.

Data transfer between regions needs to stop. Separate systems of record must be introduced in all the global locations. Your global SOC/CERT needs to be decentralized. Cross-regional access restricted. Risk registers split into many local instances.

This is gonna hurt. This is gonna cost.