Requiem for the infosec of 90s and 2000s

Browsing through the results of my recent GDPR experiment took my mind away from the original idea.

Why?

Well…

Once I had quite a batch of PNG files from many web sites, I started manually browsing through the results to ensure the script worked okay, that I could see the actual content (screenshots), and that I could assess/judge it.

WEB A.D. 2018

After clicking through a lot of snapshots from what appear to be the top of the top most-visited domains on the web, a pattern started to emerge. And it’s actually a sad one…

Most of the web sites belonged to a few categories only:

  • advertising
  • advanced advertising
  • advanced vertical AI-driven video revenue-enhancing sustainable predictive advertising monetization platforms

The rest are your usual suspects:

  • multi-billion-dollar companies: IT, media, etc.
  • OS, browsers, some software in general, lots of mobile apps
  • VPN, video conferencing, security companies
  • financial companies
  • games, porn

and…

  • 403s, 404s, captchas, some web sites don’t render and report error, some even block me 🙂

plus

  • real nuggets – web sites not updated for ~10 years or more; it’s almost nostalgic to see them still hanging there

The majority of web sites come in English, Russian, or Chinese… other languages are scarce. Most of the web sites look pretty much the same, uniform in looks and content. I may be old-fashioned, but the web of 2018 doesn’t look as exciting as it did back in the 90s and 2000s.

Now… for the really sad part.

Where are we now?

It is obvious that the Web has changed, becoming monopolized and uniform. But the saddest thing to me is that the most prevalent theme of the screenshots, and the one that pretty much drives the narrative of this post, comes down to just these two words:

  • monetization
  • tracking

Some boring personal take

I remember when, in the early noughties, I joined, and eventually ended up leading, a team created specifically to target adware, spyware and trackware. Our mission was simple: combat a plague of adware, spyware, rogue software, dodgy installers, dodgy web sites, and dodgy affiliate programs. The focus was primarily on desktop software. That war was kinda won at some stage, when the number of adware, spyware, and fake/rogue antispyware applications dropped, lots of adware sites ceased to exist, and lots of dodgy software was killed (remember Bonzi Buddy, Dollar Revenue, or the Klik team?).

Then along came the APT, and it introduced a dramatic shift of focus for many security companies. The money you can make on APT is much bigger than on old-school malware or spyware. The ‘low-flying’ threats stopped being really attended to, and as a result the quiet progression of this ‘branch’ of dodgy industries stayed kinda under the radar. The authors of the ‘low-priority’ stuff regrouped and developed in many directions, taking into account changes in hardware and in the human behavior associated with browsing the internet. Obviously, smartphones and tablets acted as a huge influencer and a catalyst for software houses (both good and bad ones) that took content and application development that was traditionally happening on desktop/laptop computers to the web and/or smartphone apps.

Enter 2018, and the majority of web sites, apps, and social media track our every move, build our profiles, and deliver us ‘customized’ content a.k.a. ads, or attempt to manipulate us into buying freemium content. These apps are almost always online for NO apparent reason at all.

Over the last 10 years, creators of traditional desktop OSs and software had to think quickly about how to adapt to the true game changer that the iPhone certainly was, and about how to generate revenue from a more and more challenging market. They learned from the failures of adware/spyware and took it to the next level. So now it’s all legally sound, the EULA is there, and the legal team is big and can fight the world… yet… we still see the same old, same old… e.g. consent pre-ticked by default.

The plague affects not only the application layer, but also the OS, many (in the 2000s) reputable software-downloading web sites, freeware turned downware/bundleware, etc. If you are looking for examples: we know Windows 8 and 10 produce lots of telemetry, install potentially unwanted apps by default, and communicate with lots of servers very frequently and w/o us knowing what is really being sent out. We also know iOS collects a lot of data that both Apple and apps can leverage to deliver better ‘content’ and ‘experience’ (e.g. the at-some-stage-controversial health data). And then there are the apps on smartphones – a completely different area with a lot of ‘crazy’ privacy issues around it.

We very rapidly entered the world of portable supercomputers, got surrounded by myriads of sensors, and every single thing that we do is already, or can quickly be, collected, classified, matched against databases, stored, processed, and… sold. And guess what… this is not necessarily a bad thing per se; there are lots of benefits, and I am actually not preaching against it. I just want to know about it all. ‘It’ being every single bit that is being sent out. As a user I want to be empowered to inspect ‘it’ (in general terms, e.g. by looking at raw data, logs, etc., and including any means of automation, e.g. via dedicated blocking software). And yes, I will actually allow some of it to fly. But I want to be the one who makes that call…

Basically, we lost sight of what is being transmitted out ages ago, and then… it got far worse.

What do we need?

We need a DATA FIREWALL. We need data models that protect users. We need accountability.

Just think of it for a moment, and on a more generic level – don’t we want to inspect every piece of software that sends data out? Software doing an update check? An editor? A CAD application? A new version of your favorite game on a smartphone? What does it send out? What about the metadata your AV or EDR is sending out? Lists of metadata feeding large databases to help with the ‘AI’ that is often backed by a simple reputation check? Look at your logs. You may see GBs, TBs of data. No one knows what’s in it. So… if you are a bored researcher, this is a great area for analysis!
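
To make this a bit more tangible, here is a minimal sketch of the ‘who talks to where’ part of such an inspection, assuming the third-party psutil package is available; figuring out what is actually inside the traffic would still require a proxy or packet capture, so treat it as a starting point, not a data firewall.

```python
# Minimal sketch of a per-process outbound connection inventory (psutil assumed).
# It only shows WHO talks to WHERE; inspecting WHAT is sent needs a proxy/pcap.
import psutil

def outbound_inventory():
    """Map each process name to the remote endpoints it is currently talking to."""
    inventory = {}
    for conn in psutil.net_connections(kind="inet"):
        if conn.status != psutil.CONN_ESTABLISHED or not conn.raddr:
            continue  # only live connections with a remote end
        try:
            name = psutil.Process(conn.pid).name() if conn.pid else "<unknown>"
        except psutil.NoSuchProcess:
            name = "<exited>"
        inventory.setdefault(name, set()).add(f"{conn.raddr.ip}:{conn.raddr.port}")
    return inventory

if __name__ == "__main__":
    for proc, endpoints in sorted(outbound_inventory().items()):
        print(f"{proc:30} -> {', '.join(sorted(endpoints))}")
```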

Back to the data.  It goes out. No one knows what’s in it.

Shouldn’t there be some common protocol for this sort of stuff?

Other areas we should be looking at

The threat landscape has evolved, and its categorization matured, but also got really blurred. And, while perhaps a bit general, the ‘monetization’ category itself – the one that drove the first ‘malware for money’ campaigns – is still not a part of the popular threat frameworks!

The same goes for the (often) unauthorized / clandestine content collection / aggregation by software harvesting data from the systems it is simply running on. Legitimate or not, it’s just not right w/o users having a right to oversee it. Antivirus companies at least categorize many of these as PUAs, but it’s not enough.

We don’t need APT-only frameworks. We need all-inclusive frameworks.

Where are we now #2?

Add to it the fact that most software now can’t even run or be installed offline, or be trusted***, and we now operate in a panopticon-ish environment where we quickly progress towards a state where NONE of the decisions about the hardware and software we purchase (or download for ‘free’) belong to us!!!

Since we allow it, there is no issue. Right?

GDPR is one thing, the real world is another – w/o some sort of (I can’t believe I am saying this) blockchain-like technology to account for every data operation – accountability supported by the hardware, software, and network layers – I think we are going to see big breaches for many years ahead.

*** Think of supply chain attacks for starters. Then think of the ‘free’ websites. Admins often download bundled ‘freeware’ and install it using their admin creds; they trust the ‘free’ download sites so much that they can’t fathom the idea that they are voluntarily running PUAs with admin privileges! The recent infosec drama around FileZilla is a good example of ‘free’ possibly going rogue. You CAN’T trust. You MUST verify before you install. Talk to your CSIRT guys to validate the downloads for you!

There is a lot of discussion around how to ‘sell’ CSIRT to the board of directors. What value does the team add? Being a cost centre is always problematic. However… IMHO we have reached the stage where some technical function in the company must be involved in every single decision that involves EXTERNAL code and data sources, on a level similar to compliance. Will that be CSIRT? I don’t know, but who else can answer this simple question – is this safe to do/download/run? W/o clearance from that team, I think the risk is not managed well.

Rant du jour

Why are there no new threat categories created to block monetization-oriented behavior, telemetry, OS fingerprinting, and any data collection, storage, and processing? No matter what the purpose, legitimate or not.

Why do we need tools like Destroy-Windows-10-Spying and run them on this new shiny OS?

Why are configuration settings being dumbed down, and why are the ‘disable this stuff completely’ buttons harder and harder to find? Often there is no way to disable them at all!

Why do we still see consent boxes ticked ON by default to change your default search provider, browser, etc.? They should be unticked (opted out) by default, with the user able to opt in manually.

When did we start to feel uncomfortable updating the OS and software? Why can’t I have an option to opt out of updates? It sounds counter-intuitive, even insecure, but not being able to do it puts us in an under-privileged position as users. There are lots of arguments for patching/updating, but there are also many against – there is no silver bullet, and it depends on circumstances (on one occasion I stopped my Windows updates because an update was causing my system to BSOD; you can’t force updates on that system, because I won’t be able to work). Giving options to users is perhaps old-fashioned, but still very important!

Why do we need, more than ever, security researchers to poke around in every new update to OSs and apps? And not to discover traditional security vulns, but to look for yet another mechanism implemented to bypass/violate our privacy? If the OS or software decides to send stuff out w/o the user fully understanding the risk, perhaps the latter needs to be protected, by default?

The true blame game question 😉

How come we, the infosec, allowed this to happen?

Some more areas to focus on…

If you browse the internet today, the amount of annoyances and nagging that was traditionally very visible on the desktop computer now often passes through security monitors w/o a single question. We do have the option of blocking domains and popups, or using ad blockers, but my point is that we are lacking more web-centered categories and detection/containment security mechanisms. The best illustration of this point is coin miners. One day all security vendors woke up to realize their solutions can’t really fully protect users from such content. It does not infect, it does not drop malware at all (other than cached files), it is not phishing, yet it is a threat, and it steals CPU cycles in a very clever way.
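
For illustration only – a hedged, purely behavioral sketch of the coin-miner case, assuming the psutil package: flag browser processes that keep the CPU pegged across several samples. The browser names and thresholds are arbitrary and false positives are guaranteed; the point is that this category needs its own detection logic, not that this is the way to do it.

```python
# Toy behavioral check: browser processes with sustained high CPU (psutil assumed).
import time
import psutil

BROWSER_NAMES = {"chrome", "chrome.exe", "firefox", "firefox.exe", "msedge.exe", "safari"}

def sustained_browser_cpu(threshold=80.0, samples=6, interval=5):
    """Return PIDs of browser processes above `threshold` % CPU in every sample."""
    # priming pass: the first cpu_percent() call per process returns a meaningless 0.0
    for proc in psutil.process_iter():
        try:
            proc.cpu_percent(interval=None)
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            pass
    hits = {}
    for _ in range(samples):
        time.sleep(interval)
        for proc in psutil.process_iter(attrs=["name"]):
            try:
                if (proc.info["name"] or "").lower() in BROWSER_NAMES and \
                        proc.cpu_percent(interval=None) > threshold:
                    hits[proc.pid] = hits.get(proc.pid, 0) + 1
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                continue
    return [pid for pid, count in hits.items() if count == samples]

if __name__ == "__main__":
    print("browser processes pegging the CPU:", sustained_browser_cpu())
```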

And the most important question of it all: where is the AI that allowed this to happen??? 😉

Some final thoughts, and suggestions

The APT is not your only enemy. The focus shift from the desktop to the web and smartphone apps has happened. And this requires more work on content analysis, and on building APIs that support and enforce certain ways of doing things. The browser or the portable device is literally our second (and sometimes even first) desktop now. Judge what you see, and classify it according to user experience and risk, so that these legal, borderline-legal and non-malicious, yet annoying thingies can also be blocked.

It’s things like:

  • PAW – Potentially Annoying Web sites (intrusive UI, sign-up popups, ‘better install the app than use the web site’ nags, nagging chats, autoplaying videos, detection of the mouse going to a toolbar or a specific corner of the screen and reacting to it to keep the user on site, etc.)
  • Clickbait Farms (e.g. searching for file or threat names leads to lots of them)
  • Paywall web sites (I am fine with them, but if I don’t want to pay, why not remove them from my search results /as an option/)
  • Web sites using clickbait techniques / promoting clickbait content
  • Web sites promoting Fake News
  • Web sites full of affiliate links generated using legitimate, often stolen content, preying on gullible users to make that affiliate click to Amazon or another large site
  • Web sites offering borderline, inflammatory language or visuals
  • Web sites engaging multiple ad trackers
  • Web sites engaging in interaction tracking
  • Web sites and apps promoting freemium content
  • Web sites using obfuscation techniques in the code (not bad per se, but can be suspicious)
  • Blocking comments on sites e.g. Youtube; it’s actually healthy to do so 🙂
  • etc.
    and of course, good stuff too…
  • Clean web sites (yes, no ads, no freemium, etc.) – this could be enabled by default
  • ‘Well-Behaving’ web sites (ads, but kept under control since they follow industry standards, non-aggressive GUI, etc.)

This is just a dump of ideas – a far better place to start is probably the categorization already offered by popular proxy vendors. The metadata is already there for years.

The freemium model is here to stay. If an app needs to nag the user, let’s recognize it as a valid marketing tool and actually allow the user to be nagged, but the app needs to go via a proper ‘nagging’ API to do so. One that keeps the user in control and allows them to set arbitrary rules, e.g. the user can disable freemium notifications manually or automatically (e.g. after 3 nags)! So, if I download an ad-supported app that can be upgraded to premium, I can still have it installed 3 years later w/o seeing a single nag (while of course still seeing the ads that can’t be disabled).
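
To make the idea concrete, here is a toy sketch of what such a user-controlled ‘nagging’ policy could look like; the names (NagPolicy, allow_nag) are hypothetical and not part of any real platform API. The key design choice is that the app asks the platform for permission to nag – it never draws the nag itself.

```python
# Hypothetical user-controlled nag policy; names are illustrative, not a real API.
import time

class NagPolicy:
    def __init__(self, max_nags=3, cooldown_seconds=7 * 24 * 3600, enabled=True):
        self.max_nags = max_nags          # user-set: auto-mute an app after N nags
        self.cooldown = cooldown_seconds  # user-set: minimum gap between nags
        self.enabled = enabled            # user-set: global kill switch
        self._seen = {}                   # app_id -> (count, last_timestamp)

    def allow_nag(self, app_id: str) -> bool:
        """The OS/browser decides whether the app may nag; the app never nags directly."""
        if not self.enabled:
            return False
        count, last = self._seen.get(app_id, (0, 0.0))
        if count >= self.max_nags:
            return False                  # auto-muted, e.g. after 3 nags
        if time.time() - last < self.cooldown:
            return False                  # too soon since the previous nag
        self._seen[app_id] = (count + 1, time.time())
        return True

policy = NagPolicy(max_nags=3)
print(policy.allow_nag("com.example.freemium-app"))  # True: first nag is allowed
```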

And then we have the GDPR. Wouldn’t it be cool to have a standardized way to DISABLE all the cookie- and GDPR-related notifications with a bunch of browser or account settings? Let the user decide.

As for the search engines… What about a standardized way to do some post-processing of the search results? I know Google, Bing, Yandex, etc. are great tools, but wouldn’t it be good to have an additional level of filtering by security vendors _and_ based not only on a domain blacklist, but also on a content blacklist – on the browsing device itself? Not to kill the ads, but to kill the obvious PAWs shown in the results (the ones that search engine providers don’t feel like removing, or may not be empowered to remove).
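
A hedged sketch of that kind of on-device post-processing, assuming the results are already available as (title, url, snippet) tuples; the blacklists below are illustrative placeholders only.

```python
# Client-side post-filtering of search results: domain blacklist + content blacklist.
from urllib.parse import urlparse

DOMAIN_BLACKLIST = {"clickbait-farm.example", "affiliate-spam.example"}   # placeholder
CONTENT_BLACKLIST = ("you won't believe", "doctors hate", "top 10 shocking")

def filter_results(results):
    """Keep only results that pass both the domain- and content-level checks."""
    kept = []
    for title, url, snippet in results:
        domain = urlparse(url).netloc.lower()
        text = f"{title} {snippet}".lower()
        if domain in DOMAIN_BLACKLIST:
            continue  # domain-level block
        if any(phrase in text for phrase in CONTENT_BLACKLIST):
            continue  # content-level block, done on the browsing device itself
        kept.append((title, url, snippet))
    return kept

# usage: cleaned = filter_results(raw_results_from_any_engine)
```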

And so that I don’t forget – the trackers (web bugs) from popular social media web sites need a dedicated category as well… so that they can be safely blocked. Just because I visit a site, I don’t want F, T, etc. to track all the info on what led me to enter the site, and to track my interaction with the GUI.

Lots of it is already possible on a proxy level, using NoScript, ad blockers, etc., but we need that on the very basic random-user level as well. And enabled by default, or if that is not possible, the user can be guided to do so in an option-safe environment (also, if a protection opt-in is invoked by the user, such an action potentially protects the vendor from legal issues /I am not a lawyer, so this needs to be evaluated/).

I think the game of the 90s and 2000s was kinda easy, as it was about plain-vanilla criminals, individual small companies that were simply dodgy, and affiliate programs that were too obviously ‘ill-centered’ and very transparent, and content back then was very easy to classify.

Today it’s much harder: big vendors are untouchable. They incorporated lots of the old adware and tracking tactics into their products, and everyone agrees to it, because otherwise you won’t get the greatest, latest version of the OS or software. I think it’s not what users want. At least some of them. I hope.

Last words

So… if you are a security vendor, there are lots of new territories to conquer. Lots of new products can be developed and sold. Perhaps we can come back to the 90s and 2000s for a moment and actually think of the user.

Update 2019-01: I recently came across this article that paints a future that is even more gloomy and dramatic…

The botryology of anomalies – the AI, machine learning and ze computer security

Disclaimer: I am not an AI/machine learning expert. I am not even a noob. I did study computer science in the past and have a rough idea how it works. I also spent some time recently reading about it to ensure I do understand at least the basics. If you spot any mistakes, or logical fallacies, please let me know.

Throw rotten tomatoes, but have a good reason to do so!

Thank you!

TL;DR: Anyone claiming their product uses AI/machine learning and can therefore protect you better than any other technology – in particular by replacing that ‘old’ technology – is usually not telling you the whole truth.

AI and machine learning suffer from marketing buzz more than any other popular keyword, e.g. blockchain or cryptocurrency. The reason for this state of affairs is that the AI term itself has been with us for a very long time. Plus, the ‘intelligence’ bit in it establishes a very strong link between what we understand to be a unique feature of the human mind and the machine’s ability to do the same. Then there are the movies that completely distort the picture. If machines can think, we are doomed.

Seriously…

The funny thing is that AI/machine learning is nothing but a computer program. Only when you reduce it to just that, i.e. a piece of code written by some guy or a team, does it become easier to spot its caveats. Obviously, if more and more devices in the physical world are controlled by (buggy/error-prone) AI-driven software, we may in the end self-destruct, but… instead of talking about that ultimate apocalypse, let’s try to focus on a ‘simple’ problem that has been facing computer systems for the last 30+ years:

  • is the object (file, URL, attachment, etc.), the user’s behavior, or a set of events observed in the environment bad, or good?

The question is simple to ask and is actually at the core of any AI design: we need to state the problem first before we can try to solve it.

While many companies claim to be using AI in their products, I find this statement questionable. I may be wrong, but when I first learned about AI, I did so in the context of data classification and decision making; even on that very basic, intuitive level I always felt that the role of an AI system is primarily to classify objects into groups/clusters, more than to make a strict binary determination or a precise decision. Perhaps AI has progressed beyond the scope of my understanding, I don’t know. And of course, AI _can_ and _does_ make binary decisions in some cases, but these decisions are usually taken within a very strictly codified ‘gray area’ that is somewhat arbitrarily defined, and carefully controlled. As such, an algorithm that tries to distinguish between a picture of a turtle and a marigold flower can certainly be trained to do so pretty well, and you can certainly use AI to compare an incoming data set against your database of features (e.g. facial recognition and ID-ing people), but you can’t compare against the unknown. The same goes for a self-driving car, where there is no suspicion of any of the sensors providing malicious input data.
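
A small sketch of that closed-set vs. open-set point, with made-up feature vectors: matching an input against a known database works nicely, while ‘unknown’ only exists because of an arbitrary rejection threshold – exactly the carefully controlled ‘gray area’ mentioned above.

```python
# Toy closed-set matcher with an arbitrary open-set rejection threshold.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

KNOWN = {"turtle": [0.9, 0.1, 0.2], "marigold": [0.1, 0.9, 0.3]}  # made-up feature DB

def identify(features, reject_below=0.85):
    label, best = max(((k, cosine(features, v)) for k, v in KNOWN.items()),
                      key=lambda t: t[1])
    return label if best >= reject_below else "unknown"   # the hard, arbitrary call

print(identify([0.88, 0.12, 0.21]))   # 'turtle'  -> closed-set matching works well
print(identify([0.5, 0.5, 0.5]))      # 'unknown' -> only because of the threshold
```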

The last bit is actually of paramount importance; if someone controls the input, the system will make bad decisions (bias introduced this way can destroy AI – by abusing the learning process, so that decisions are made based on an incorrect model built on incorrect / poisoned data, but not only that – also by manipulating the real-time input fed to the AI from the analyzed system).

The bottom line is my hypothesis that a clear-cut binary distinction between goodware and badware, or a bad action vs. a good action, is still pretty much impossible today.

The security companies don’t lie 100% when they claim they use AI, though. What they usually rely on is an implementation of so-called fuzzy logic, which is code with a crazy number of parameters retrieved from various sensors/routines, each with its own weight, encapsulated in their software with tons of if/then/else statements and some magic formulas. They do pretty well, but they are nothing but simple heuristics (a minimal sketch of this kind of scoring follows the list below), e.g.:

  • if it contains an unknown PE section name -> claim it’s possible malware
  • if the hash is unknown in the cloud -> claim it’s possible malware, or that it is ‘suspicious’
  • if the entropy of the .data section differs from the ‘norm’ -> flag it; and
  • they use lots of conditional statements based on IOCs (yara) or hardcoded artifact names, etc., e.g.
    • if it contains a keylogging API that is unusual in most programs -> flag it as a possible keylogger; and you can actually generate a lot of good rules from a large corpus of malware…
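
Here is the promised minimal sketch of that style of heuristic scoring; the thresholds, weights, and the ‘known good’ section list are made up for illustration – real products tune these against large sample corpora.

```python
# Toy 'tons of if/then/else' heuristic scorer; all values are illustrative only.
KNOWN_GOOD_SECTIONS = {".text", ".data", ".rdata", ".rsrc", ".reloc", ".idata"}
KEYLOGGING_APIS = {"SetWindowsHookExA", "SetWindowsHookExW", "GetAsyncKeyState"}

def score_sample(section_names, data_section_entropy, imported_apis, hash_known_in_cloud):
    score, reasons = 0, []
    if any(name not in KNOWN_GOOD_SECTIONS for name in section_names):
        score += 2
        reasons.append("unknown PE section name")
    if not hash_known_in_cloud:
        score += 1
        reasons.append("hash unknown in the cloud")
    if data_section_entropy > 7.2:               # arbitrary 'norm' threshold
        score += 2
        reasons.append(".data entropy differs from the norm")
    if imported_apis & KEYLOGGING_APIS:
        score += 3
        reasons.append("keylogging API imported")
    return ("possible malware" if score >= 4 else "undetermined"), reasons

verdict, why = score_sample({".text", ".xyz"}, 7.6, {"GetAsyncKeyState"}, False)
print(verdict, why)   # possible malware, with the list of triggered heuristics
```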

But is this a real AI tho?

I have my doubts.

Remember that the first AI anti-malware software didn’t even know of the existence of the PE format, let alone its 64-bit version, then the additional complexities introduced by .NET and Metro applications, let alone new platforms like Android or iOS, plus a massive list of hacking tricks that can only be observed via very thorough EDR or auditing. It doesn’t learn on its own to extract new properties or understand new formats, unless you somehow codify it (one can argue there is always a way to feed new file formats or input as they become available, but …). If an AI algo can take an unstructured data set and make good decisions based on it, then I will be the first one to convert and become a robot’s slave…

Hypothesis: AV systems can be evaded easily, and same applies to AI.

Proof?

Same as with AV – by examples.

Let’s assume first that the AI system has an almost infinite, all-seeing ability to collect any possible information from the observed system. It’s a nonsensical utopia, but it helps to set the stage for the borderline thought experiments that follow.

Imagine an insider threat who deals with a customer list and wants to steal it from the system. AI observes this person’s every move. The user accesses the database of clients on a regular basis. The only difference between the two distinctive paths AI observes – your average day vs. the day when the user steals the data – is that the user takes a photo of the screen with a smartphone, causing a slight, yet negligible delay between the times subsequent keys (that are part of a normal working day) are pressed.

Obviously, using an external device is a cheat, but it highlights the fact that AI can only ‘see’ what the system can see – the threat of using an external device that is not connected to the system to take a stupid photo doesn’t disappear. Of course, one could always argue that if AI ‘sees’ everything, there is a cam on that computer that monitors the user all the time and spots the object pointing at the computer screen, identifies it as a smartphone, and not only that – it can distinguish that it was taken out to take a photo, confirm it actually happened, and ID the user.

Good luck with that. Also, AI clearly doesn’t care about GDPR, remote desktop access, fake cam feeds, or the sticker on the camera that has been there since the day the employee was hired… So many ‘ifs’.

Let’s look at another example.

The very same insider starts sending emails with some random data to a newly established email address. If asked, they say it’s for testing purposes. Who wouldn’t trust them? AI observes it all and after a while gets used to emails that don’t carry any risk, and possibly whitelists the ‘test’ address (especially if the SOC analysts who investigate the first 20 alerts tick the ‘not a threat’ box, providing important human feedback to the AI system; assisted learning helps, right?). Then one day the real data starts being sent out, or better – chunks of it, hidden in the test data; only the thief knows how to interpret it. Same format as the test data. From the very same source from which the test data was sent for months and was marked ‘non-threat’. There is a very high chance that the AI system will miss it, and even if it flags it, it will be dismissed ‘based on the history’. And to your possible objection – AI could obviously spot the user ‘manufacturing’ the data he plans to steal; but what if he used exactly the same process of inputting the test data as the stolen data? There is a bit in the cycle that can’t be monitored – human memory and intention. Besides, AI needs to be flexible; a human is not a machine and there will be deviations observed that need to be dismissed. Ruling them out, even in the monotone cycle of some jobs, e.g. in call centers, is definitely non-trivial.
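
A toy illustration (not any real product’s logic) of how that feedback loop can quietly whitelist an exfiltration channel: after enough ‘not a threat’ verdicts for the same sender/recipient pair, later alerts get auto-dismissed.

```python
# Toy 'assisted learning' triage that learns to ignore a channel marked benign too often.
from collections import defaultdict

class AssistedLearningTriage:
    def __init__(self, auto_dismiss_after=20):
        self.benign_votes = defaultdict(int)
        self.auto_dismiss_after = auto_dismiss_after

    def handle_alert(self, sender, recipient, analyst_says_benign=None):
        key = (sender, recipient)
        if self.benign_votes[key] >= self.auto_dismiss_after:
            return "auto-dismissed (history says benign)"
        if analyst_says_benign:
            self.benign_votes[key] += 1   # human feedback reinforces the whitelist
        return "escalated to SOC"

triage = AssistedLearningTriage()
for _ in range(20):   # months of harmless 'test' emails, all marked benign by analysts
    triage.handle_alert("insider@corp", "test-drop@mail", analyst_says_benign=True)
# the day the real data goes out, hidden in the very same 'test' format:
print(triage.handle_alert("insider@corp", "test-drop@mail"))  # auto-dismissed
```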

The concept of actively, maliciously training AI to ingest a stream of legitimate events and then slowly accept small pattern changes is very tempting, and I think crucial to understanding how difficult these binary decisions are. The naive mind will always focus on how the bad activity stands out in a typical hack scenario, forgetting that it is usually the small changes that go unnoticed for a very long time (think: salami attack). And let’s not forget there are e.g. 50,000-100,000 employees to monitor at any point in time in a large company, plus show me the company where you have controls covering everything 100%… Alert fatigue is visible today even with relatively simple DLP alerts; if AI starts flagging more events, your SOC will quickly be outnumbered…

Okay, maybe still far-fetched. Let’s look at malware samples.

An example sample is an installer of a new version of 7-Zip that a user is detected downloading. AI saw similar downloads before and they were deemed legitimate. When executed, the setup file uses the Nullsoft installer, which simply drops files on the system. AI is already trained on Nullsoft installers and knows that in most cases it is a good installer used primarily for non-malicious purposes. It does know, of course, that some malware has abused it in the past as well (including e.g. bundled adware). So any execution is carefully monitored. What the AI doesn’t know, though, is that the malware got introduced via a supply chain attack, and the installer downloaded from the web site this time already contains malware that is present in the final 7z.exe. Obviously, AI may pick up some funny activity later on from the infected 7z.exe, but have you noticed that the integrity of the system has already been compromised? Will you detect it with AI at the time of download/infection? Or only when it actually collects and exfiltrates data? The difference is actually quite substantial. I am personally very strict (at least in theory) when it comes to the definition of an incident. If any of C(onfidentiality), I(ntegrity), or A(vailability) is affected, it is already an incident. You do want to prevent every single one. Having malicious software installed on the system and ready to go is to the attackers’ advantage. Even if you start reacting to it quickly – that important data might have already left the network… IMHO race conditions like this are the future of many attacks. This is not that far from the concept of Core Wars.

We can also dig deeper into the Portable Executable format – e.g. we can ask whether static file analysis is enough to determine if a file is malicious.

I think it’s not.

It’s so easy nowadays to come across malware that is signed using stolen certificates, and there are documented examples of malicious hash collisions. Many malware authors use advanced code obfuscation techniques that leverage existing repositories of ‘real-world’ code snippets. They use them to do code injection on a source level, so that the malware resembles real software after compilation. Given the progress in decompilation, the idea of generic code integration is also much closer than before. And there is not a single PE file property (or group of properties) that can today be used to distinguish badware from goodware on a file level. Of course, you will always find lots of stupid malware that still uses old protectors, hides its code under UPX, uses randomly named PE sections, uses PE format tricks to evade analysis, or connects out immediately after it runs, but for this you don’t even need AI… this problem is actually already solved. IMHO… for anything a bit more advanced, static analysis and simple emulation are not enough…

Sandboxing and dynamic analysis are actually very efficient nowadays, but then again – also prone to errors. I can’t imagine a modern AI system that would not reach out to some sandbox technology to support its judgment. Which obviously leads us to the topic of sandbox evasions. Again, believing that AI can always pick up the bad behavior from sandbox analysis is simply naive (the usual suspects include: missing command line arguments, dependencies on other files, running as a service, running from a specific location, required input data, and any other anti_* tricks – whether intended, or not, etc.). And besides, relying on the input of a sandbox is a very non-AI approach after all. Isn’t it?
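
One of the evasion classes named above (‘missing command line arguments’) fits in a few lines – the interesting code path only runs when a specific, hypothetical argument is supplied, so a sandbox that launches the sample with no arguments observes nothing but a clean exit:

```python
# Toy illustration of the 'missing command line argument' evasion class.
import sys

def payload():
    print("behavior the sandbox never gets to see")

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "--activate":   # hypothetical trigger
        payload()
    # otherwise: exit quietly and look perfectly benign to automated analysis
```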

So, say you get a file, you calculate a hash; if it is known and clean (reputation check), it’s OK. If it is not, you dig deeper – run static (file properties analysis/emulation) and dynamic analysis (sandbox). If bad stuff is detected at any stage via IOC/yara, you mark it. If not, what do you do? You rely on your AI model. Which is a bunch of ifs. And since AI is just a program, as for any code… there is an exit path in that process that says ‘undetermined’. The program runs on the system. System pwned…
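
The same waterfall, sketched with every step stubbed out; the point is the final branch, where the answer degrades to ‘undetermined’ and the file runs anyway (all the component names here are hypothetical stand-ins).

```python
# Triage waterfall sketch: reputation -> static -> dynamic -> model -> 'undetermined'.
import hashlib

def triage(file_bytes, reputation_db, yara_match, sandbox_verdict, ai_model):
    sha256 = hashlib.sha256(file_bytes).hexdigest()
    if reputation_db.get(sha256) == "clean":
        return "allow (known clean hash)"
    if yara_match(file_bytes):                       # static IOC/yara hit
        return "block (static detection)"
    if sandbox_verdict(file_bytes) == "malicious":   # dynamic analysis hit
        return "block (sandbox detection)"
    verdict = ai_model(file_bytes)                   # the 'bunch of ifs'
    if verdict in ("malicious", "clean"):
        return f"decision: {verdict}"
    return "undetermined -> file is allowed to run"  # the exit path that hurts

# toy stand-ins for the real components (all hypothetical):
result = triage(b"MZ...", {}, lambda b: False, lambda b: "unknown", lambda b: "undetermined")
print(result)   # undetermined -> file is allowed to run
```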

And in this context we do need to mention legitimate software doing lots of dodgy stuff. Legitimate software calling cscript, mshta, powershell, injecting code into Windows Explorer to delete files (older installers), word/excel with embedded batch files, direct, clickable links to shares where the batch files reside, etc., often executed in a way that is clearly malicious (well, but is it?) – at least from a typical threat hunting, signature-based perspective. Plus, there is ongoing and very active research on LOLbins – living-off-the-land binaries that can be used to carry out some malicious activities. Is running LOLbins suspicious? How can you tell, AI?

As I often repeat – anomalies are hard to define+we end up doing signatures! Oh, no… not again!

And yes, AI could come up with a model that determines some of these anomalies to be bad, but mind you – some enterprise solutions’ approach to running e.g. powershell snippets is identical to malware’s.

And coming back to AI bias… Remember what I said about the user controlling the input? It’s a bit like web pentesting: if the user can control the input, you’d better be very careful… And on any system monitored by AI, the user is the king. They may not be able to run a lot of software, they may not have an admin or system account, but they can do dodgy stuff that stays under the radar for a very long time. Because they control the input!

I think there is a vast, unexplored area of research into generating streams of legitimate or legitimate-looking events that will help to train AI systems to… ignore certain types of events. The research may be threat-, vendor- or product-specific, but if it does achieve the AI bypass, then that’s pretty much game over for any system protected with your ‘AV-replacement’ Next Gen product.

Anytime someone claims they can bypass a product or an idea, the good ol’ defense-in-depth concept comes back to my mind. You simply can’t rely on that one, new, ‘almighty’ security control. You can’t rely on its classification/decision process either – AI or not. Redundancy in detection ideas and/or security controls is to our benefit. It costs more short term, but it costs less long term. And on that note, one of the Holy Grails for any blue team should be the detection of legitimate pentesting activity. If you can’t even detect that with your latest and ‘bestest’ AI technology, you do need to ask yourself some serious questions…

Last, but not least – AI has to deal with a crazy amount of ambiguity ‘by default’, and where there is ambiguity, bad decisions will _always_ be made. So, now we not only “trust, but verify”; we “trust, verify, and rely on redundant security controls and threat hunting ideas to avoid surprises from the bad decisions made by some of the blindly-trusted controls – and either prevent them, or detect them as early as possible…”.

I once attended an interview (in the very early stages of my career); the guy asked me whether I was 100% sure that I could do the job w/o someone needing to review and reassess what I had done, and possibly correct me. It really shocked me – he assumed I might be subject to failure! Only later did I realize that it is a very mature question; if you can’t be 100% sure, please do not claim it. And trust… same as entitlement, needs to be earned first. As such, we can’t trust AI models today – you need to review!!!

AI and machine learning are very interesting and promising ideas, but at the moment they are still in the ‘wishful thinking’ bucket. They should not be sold as a replacement for existing technology, but as a carefully monitored add-on. Otherwise it’s just insincere, and as I mentioned, yes, it actually adds more work to your workload before you can actually trust it!

Coming to the end of this preaching session. I think what I want to highlight the most is probably the fact that all the AI-oriented marketing materials focus on the cases where it works. And this is great. But… that’s also the biggest problem.

Only a few days ago Halvar Flake released a very interesting preso about the state of RE tools and their ‘marketability’. I really like the slide where he states ‘Tools are written for a paper / presentation’.

As of 2018… so are… enterprise AI-based security solutions.

And to close this awful preaching session, I must admit that I really liked FireEye’s article ‘Reverse Engineering the Analyst: Building Machine Learning Models for the SOC’. It is a very good example of machine learning being presented in a very practical way, and as one of the available tools to support our work. And this is, I guess, the goal of all this AI/ML buzz: use it to reduce the data to clusters, but still… let us make that binary decision ourselves!