Disclaimer: I am not an AI/machine learning expert. I am not even a noob. I did study computer science in the past and have a rough idea of how it works. I also spent some time recently reading about it to make sure I understand at least the basics. If you spot any mistakes or logical fallacies, please let me know.
Throw rotten tomatoes, but have a good reason to do so!
Thank you!
TL;DR: Anyone claiming their product uses AI/machine learning and can therefore protect you better than any other technology – in particular, by replacing that ‘old’ technology – is usually not telling you the whole truth.
AI and machine learning suffer from marketing buzz more than other popular keywords like blockchain or cryptocurrency. One reason for this state of affairs is that the term AI itself has been with us for a very long time. Plus, the ‘intelligence’ bit establishes a very strong link between what we understand to be a unique feature of the human mind and the machine’s supposed ability to do the same. Then there are the movies that completely distort the picture. If machines can think, we are doomed.
Seriously…
The funny thing is that AI/machine learning is nothing but a computer program. Only when you reduce it to just that, i.e. a piece of code written by some guy or a team, does it become easier to spot its caveats. Obviously, if more and more devices in the physical world are controlled by (buggy/error-prone) AI-driven software we may in the end self-destruct, but… instead of talking about that ultimate apocalypse, let’s try to focus on a ‘simple’ problem that has been facing computer systems for the last 30+ years:
- is the object (file, URL, attachment, etc.), the user’s behavior, or a set of events observed in the environment good, or bad?
The question is simple to ask and is actually at the core of any AI design – we need to state the problem first before we can try to solve it.
While many companies claim to be using AI in their products, I find this statement questionable. I may be wrong, but when I first learned about AI I did so in the context of data classification and decision making; even on that very basic, intuitive level I always felt that the role of an AI system is primarily to classify objects into groups/clusters rather than to make a strict binary determination or a precise decision. Perhaps AI has progressed beyond the scope of my understanding, I don’t know. And of course, AI _can_ and _does_ make binary decisions in some cases, but these decisions are usually taken within a strictly codified ‘gray area’ that is somewhat arbitrarily defined and carefully controlled. As such, an algorithm that tries to distinguish between a picture of a turtle and a marigold flower can certainly be trained to do so pretty well, and you can certainly use AI to compare an incoming data set against your database of features (e.g. facial recognition and ID-ing people), but you can’t compare against the unknown. The same goes for a self-driving car, where there is no assumption that any of the sensors may be providing malicious input data.
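To make the ‘gray area’ point concrete, here is a minimal, purely hypothetical sketch (the score ranges, cutoff values and labels are all invented for illustration, not taken from any real product) of how a model’s output typically gets turned into a verdict:

```python
# Hypothetical sketch: an ML model rarely says "good" or "bad" outright;
# it returns a score that someone then maps onto a verdict using
# hand-tuned, somewhat arbitrary thresholds (the "gray area").

def verdict(score: float,
            benign_cutoff: float = 0.2,
            malicious_cutoff: float = 0.9) -> str:
    """Map a 0.0-1.0 'maliciousness' score onto a verdict.

    The cutoffs are illustrative only - in practice they are picked
    (and re-picked) by humans balancing false positives vs. misses.
    """
    if score <= benign_cutoff:
        return "benign"
    if score >= malicious_cutoff:
        return "malicious"
    return "suspicious"   # the gray area: handed to a human / more analysis

print(verdict(0.05))   # benign
print(verdict(0.55))   # suspicious - the model did not really 'decide'
print(verdict(0.97))   # malicious
```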
That last point about malicious input is actually of paramount importance: if someone controls the input, the system will make bad decisions. Bias introduced this way can destroy AI – by abusing the learning process, so that decisions are made based on an incorrect model built from incorrect/poisoned data, and not only that – also by manipulating the real-time input fed to the AI from the analyzed system.
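As a toy illustration of that poisoning idea (made-up numbers, a single made-up ‘entropy’ feature, and a deliberately simple nearest-centroid ‘model’ rather than anything a vendor actually ships), note how a handful of mislabeled training samples is enough to flip the verdict for a borderline input:

```python
# Toy illustration of training-data poisoning using a deliberately
# simple nearest-centroid classifier over one invented feature.

def centroid(values):
    return sum(values) / len(values)

def classify(x, good_samples, bad_samples):
    good_c, bad_c = centroid(good_samples), centroid(bad_samples)
    return "good" if abs(x - good_c) < abs(x - bad_c) else "bad"

good = [3.0, 3.5, 4.0, 4.2]          # 'clean' training data
bad  = [7.5, 7.8, 8.0]

print(classify(6.0, good, bad))      # -> bad

# Attacker poisons the 'good' set with a few high-entropy samples
# that were (wrongly) labeled benign during training.
poisoned_good = good + [7.0, 7.2, 7.4]

print(classify(6.0, poisoned_good, bad))   # -> good: same input, flipped verdict
```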
The bottom line is my hypothesis that a clear-cut binary distinction between goodware and badware, or a bad action vs. a good action, is still pretty much impossible today.
The security companies don’t lie 100% when they claim they use AI though. What they usually rely on is an implementation of so-called fuzzy logic: code with a crazy number of parameters retrieved from various sensors/routines, each with its own weight, encapsulated in their software with tons of if/then/else statements and some magic formulas. They do pretty well, but they are nothing but simple heuristics, e.g. (a rough sketch of this kind of scoring follows the list):
- if it contains an unknown PE section name -> claim it’s possible malware
- if the hash is unknown in the cloud -> claim it’s possible malware, or ‘suspicious’
- if the entropy of the .data section differs from a ‘norm’ -> flag it; and
- they do use lots of conditional IOC- (yara) or hardcoded-artifact-name-based statements, etc., e.g.
- if it contains keylogging APIs that are unusual in most programs -> flag it as a possible keylogger; and you can actually generate a lot of good rules from a large corpus of malware…
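Here is the promised sketch of what that weighted if/then scoring can look like. To be clear: the feature names, weights, section names and threshold are invented for illustration, not any vendor’s actual logic.

```python
# Rough sketch of the weighted if/then heuristics described above.
# Features, weights and the 0.6 threshold are hypothetical.

SUSPICIOUS_SECTION_NAMES = {".xyz", ".enc", "UPX0"}   # made-up examples

def score_sample(features: dict) -> str:
    score = 0.0
    if features.get("section_name") in SUSPICIOUS_SECTION_NAMES:
        score += 0.4
    if not features.get("hash_known_in_cloud", False):
        score += 0.3
    if abs(features.get("data_section_entropy", 4.5) - 4.5) > 2.0:
        score += 0.3          # entropy far from an assumed 'norm'
    if features.get("uses_keylogging_api", False):
        score += 0.5
    return "flag" if score >= 0.6 else "pass"

print(score_sample({"section_name": ".enc", "hash_known_in_cloud": False}))  # flag
print(score_sample({"hash_known_in_cloud": True}))                           # pass
```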
But is this real AI though?
I have my doubts.
Remember that the first AI anti-malware software didn’t even know of the existence of the PE format, let alone its 64-bit version, and then the additional complexities introduced by .NET and Metro applications, let alone new platforms like Android or iOS + a massive list of hacking tricks that can only be observed via very thorough EDR or auditing. It doesn’t learn on its own to extract new properties or understand new formats unless you somehow codify it (one can argue there is always a way to feed it new file formats or input as they become available, but…). If an AI algo can take an unstructured data set and make good decisions based on it, then I will be the first one to convert and become a robot’s slave…
Hypothesis: AV systems can be evaded easily, and same applies to AI.
Proof?
Same as with AV – by examples.
Let’s assume first that the AI system has an almost infinite, all-seeing ability to collect any possible information from the observed system. It’s a nonsensical utopia, but it helps to set the stage for the borderline thought experiments that follow.
Imagine an insider threat who deals with a customer list and wants to steal it from the system. AI observes this person’s every move. The user accesses the database of clients on a regular basis. The only difference between the two distinct paths the AI observes on an average day vs. the day the user steals the data is that the user takes a photo of the screen with a smartphone, causing a slight, yet negligible delay between the times subsequent keys (that are part of a normal working day) are pressed.
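To show how little that ‘slight, yet negligible delay’ gives an anomaly engine to work with, here is a naive, hypothetical inter-keystroke timing check (the baseline timings, the ‘photo day’ delays and the 3-sigma threshold are all made up):

```python
# Hypothetical sketch: a naive inter-keystroke timing anomaly check.
# Baseline, 'photo day' delays and the 3-sigma rule are invented numbers.
import statistics

baseline_delays = [0.18, 0.22, 0.20, 0.25, 0.19, 0.21, 0.23]   # seconds, normal day
mean = statistics.mean(baseline_delays)
stdev = statistics.stdev(baseline_delays)

photo_day_delays = [0.20, 0.22, 0.26, 0.21, 0.19]   # 0.26s pause while snapping a photo

for d in photo_day_delays:
    z = (d - mean) / stdev
    if abs(z) > 3:                      # classic 3-sigma rule
        print(f"anomaly: {d}s (z={z:.1f})")
# Nothing prints: an extra ~50ms pause sits well within normal human
# variation, so the 'theft' is statistically indistinguishable from a sneeze.
```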
Obviously, using external devices is a cheat, but it highlights the fact that AI can only ‘see’ what the system can see – the threat of using an external device that is not connected to the system to take a stupid photo doesn’t disappear. Of course, one could always argue that if AI ‘sees’ everything, there is a cam on that computer that monitors the user all the time and spots the object pointing at the computer screen, identifies it as a smartphone, and not only that – it can tell that it was pulled out to take a photo + confirm it actually happened + ID the user.
Good luck with that. Also, AI clearly doesn’t care about GDPR, remote desktop access, fake cam feeds, or the sticker that has been on the camera since the day the employee was hired… So many ‘ifs’.
Let’s look at another example.
The very same insider starts sending emails with some random data to a newly established email address. If asked, they say it’s for testing purposes. Who wouldn’t trust them? AI observes it all and after a while gets used to emails that don’t carry any risk, and possibly whitelists the ‘test’ address (especially if the SOC analysts who investigate the first 20 alerts tick the ‘not a threat’ box, which provides important human feedback to the AI system; Assisted Learning helps, right?). Then one day the real data starts being sent out, or better – chunks of it, hidden in the test data; only the thief knows how to interpret it. Same format as the test data. From the very same source from which the test data was sent for months and was marked ‘non-threat’. There is a very high chance that the AI system will miss it, and even if it flags it, it will be dismissed ‘based on the history’. And to your possible point – AI could obviously spot the user ‘manufacturing’ the data they plan to steal; but what if they used exactly the same process for inputting the test data as for the stolen data? There is a bit in the cycle that can’t be monitored – human memory and intention. Besides, AI needs to be flexible; a human is not a machine and there will be deviations that need to be dismissed. Ruling them out, even in the monotonous cycle of some jobs, e.g. in call centers, is definitely non-trivial.
The concept of actively and maliciously training AI to ingest a stream of legitimate events, then slowly accept small pattern changes, is very tempting and, I think, crucial to understanding how difficult these binary decisions are. The naive mind will always focus on how the bad activity stands out in a typical hack scenario, forgetting that it is usually the small changes that go unnoticed for a very long time (think: salami attack). And let’s not forget there are e.g. 50000-100000 employees to monitor at any point in time in a large company + show me the company where controls cover everything 100%… Alert fatigue is visible today even with relatively simple DLP alerts; if AI starts flagging more events your SOC will quickly be outnumbered…
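A minimal sketch of that salami-style drift against a hypothetical adaptive baseline (the exponential-moving-average weight, the tolerance and the daily volumes are all invented): each step stays under the alert threshold, yet after a few months ‘normal’ has quietly moved a long way.

```python
# Minimal sketch: slowly dragging an adaptive baseline. EMA weight,
# tolerance and daily volumes are invented for illustration.

alpha = 0.05          # how fast the baseline adapts to the 'new normal'
tolerance = 1.30      # alert if today's volume > 130% of the baseline

baseline = 100.0      # e.g. MB of 'test data' emailed out per day
volume = 100.0

for day in range(1, 181):            # ~6 months
    volume *= 1.01                   # attacker adds just 1% per day
    if volume > baseline * tolerance:
        print(f"day {day}: ALERT ({volume:.0f} vs baseline {baseline:.0f})")
        break
    baseline = (1 - alpha) * baseline + alpha * volume
else:
    print(f"no alert in 180 days; exfil volume now {volume:.0f} MB/day "
          f"(baseline quietly drifted to {baseline:.0f})")
```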
Okay, maybe still far-fetched. Let’s look at malware samples.
An example: the sample is an installer of a new version of 7-Zip that a user is detected downloading. The AI saw similar downloads before and they were deemed legitimate. When executed, the setup file uses the Nullsoft installer, which simply drops files on the system. The AI is already trained on Nullsoft installers and knows that in most cases it is a good installer used primarily for non-malicious purposes. It does know, of course, that some malware has abused it in the past as well (including e.g. bundled adware). So any execution is carefully monitored. What the AI doesn’t know, though, is that malware got introduced via a supply chain attack, and the installer downloaded from the web site this time already contains malware that is present in the final 7z.exe. Obviously, the AI may pick up some funny activity later on from the infected 7z.exe, but have you noticed that the integrity of the system has already been compromised? Will you detect it with AI at the time of download/infection? Or only when it actually collects and exfiltrates data? The difference is actually quite substantial. I am personally very strict (at least in theory) when it comes to the definition of an incident. If any of the C(onfidentiality), I(ntegrity), or A(vailability) is affected, it is already an incident. You do want to prevent every single one. Having malicious software installed on the system and ready to go is to the attackers’ advantage. Even if you start reacting to it quickly – that important data might have already left the network… IMHO race conditions like this are the future of many attacks. This is not that far from the concept of Core Wars…
We can also dig deeper into the Portable Executable format – e.g. we can ask whether static file analysis is enough to determine if a file is malicious.
I think it’s not.
It’s so easy nowadays to come across malware that is signed using stolen certificates, and there are documented examples of malicious hash collisions. Many malware authors use advanced code obfuscation techniques that leverage existing repositories of ‘real-world’ code snippets. They use them to do code injection at the source level, so that the malware resembles real software after compilation. Given the progress in decompilation, the idea of generic code integration is also much closer than before. And there is not a single PE file property (or group of properties) that can be used today to distinguish badware from goodware at the file level. Of course, you will always find lots of stupid malware that still uses old protectors, hides its code under UPX, uses randomly named PE sections, uses PE format tricks to evade analysis, connects out immediately after it runs, but for this you don’t even need AI… this problem is actually already solved. IMHO… for anything a bit more advanced, static analysis and simple emulation are not enough…
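For context, pulling the kind of static features these engines look at is trivial – and every one of them is under the malware author’s control. A minimal sketch using the pefile library (the file path is a placeholder; which features a given engine actually weighs is my assumption, not documented fact):

```python
# Minimal sketch of static PE feature extraction with the pefile library.
# Every one of these features can be forged or mimicked by the author.
import pefile

pe = pefile.PE("sample.exe")          # placeholder path

features = {
    "num_sections": len(pe.sections),
    "section_names": [s.Name.rstrip(b"\x00").decode(errors="replace")
                      for s in pe.sections],
    "section_entropies": [round(s.get_entropy(), 2) for s in pe.sections],
    "has_signature_dir": bool(
        pe.OPTIONAL_HEADER.DATA_DIRECTORY[4].VirtualAddress),  # security directory
    "timestamp": pe.FILE_HEADER.TimeDateStamp,                 # trivially forged
}
print(features)
```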
Sandboxing and dynamic analysis are actually very efficient nowadays, but then again – also prone to errors. I can’t imagine a modern AI system that would not reach out to some sandbox technology to support its judgment. Which obviously leads us to the topic of sandbox evasion. Again, to believe that AI can always pick up the bad behavior from sandbox analysis is simply naive (the usual suspects include: missing command line arguments, dependency on other files, running as a service, running from a specific location, requiring input data, and any other anti_* tricks – whether intended or not, etc.). And besides, relying on the input of a sandbox is a very non-AI approach after all. Isn’t it?
So, say you get a file. You calculate a hash; if it is known and clean (reputation check), it’s OK. If it is not, you dig deeper – run static (file properties analysis/emulation) and dynamic analysis (sandbox). If bad stuff is detected at any stage via IOC/yara, you mark it. If not, what do you do? You rely on your AI model. Which is a bunch of ifs. And since AI is just a program, as for any code… there is an exit path in that process that says ‘undetermined’. The program runs on the system. System pwned…
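That triage flow, with its quiet ‘undetermined’ exit path, might look roughly like this. A hypothetical sketch: the helper functions are stand-ins for the reputation, static, dynamic and model stages, not any real product’s API, and the thresholds are invented.

```python
# Stubs standing in for real reputation / static / dynamic / ML components.
def reputation_lookup(file_hash): return "unknown"
def static_scan(file_bytes): return "no_match"
def sandbox_detonate(file_bytes): return "inconclusive"
def model_score(file_bytes): return 0.55      # the model shrugs

def triage(file_hash, file_bytes) -> str:
    if reputation_lookup(file_hash) == "known_clean":
        return "allow"
    if static_scan(file_bytes) == "ioc_match":            # yara / IOC hit
        return "block"
    if sandbox_detonate(file_bytes) == "malicious":
        return "block"
    score = model_score(file_bytes)                       # the 'AI' bit
    if score >= 0.9:
        return "block"
    if score <= 0.1:
        return "allow"
    # The exit path nobody advertises: verdict 'undetermined', the file
    # runs anyway, and maybe an alert lands in an already overflowing queue.
    return "undetermined"

print(triage("deadbeef...", b"MZ..."))    # -> undetermined
```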
And in this context we do need to mention legitimate software doing lots of dodgy stuff: legitimate software calling cscript, mshta, powershell, injecting code into Windows Explorer to delete files (older installers), word/excel with embedded batch files, direct, clickable links to shares where batch files reside, etc., often executed in a way that is clearly malicious (well, but is it?) – at least from a typical threat hunting, signature-based perspective. Plus, there is ongoing and very active research on LOLBins – Living Off the Land Binaries that can be used to carry out malicious activities. Is running LOLBins suspicious? How can you tell, AI?
As I often repeat – anomalies are hard to define + we end up writing signatures! Oh, no… not again!
And yes, AI could come up with a model that determines some of these anomalies to be bad, but mind you – the way some Enterprise solutions run e.g. powershell snippets is identical to the way malware does.
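As a concrete (and deliberately naive) illustration of ‘anomaly detection quietly turning into signatures’, here is a hypothetical parent/child process rule – the event format and process names are my own illustration, not a real product’s schema – and note that it fires just as happily on a made-up software-deployment agent pushing a PowerShell script as on a macro dropper:

```python
# Deliberately naive 'anomaly' rule that is really just a signature.
# The event fields and process names are illustrative only.

SUSPICIOUS_CHILDREN = {"powershell.exe", "mshta.exe", "cscript.exe"}
OFFICE_PARENTS = {"winword.exe", "excel.exe", "outlook.exe"}

def is_suspicious(event: dict) -> bool:
    parent = event["parent_image"].lower()
    child = event["image"].lower()
    return child in SUSPICIOUS_CHILDREN and (
        parent in OFFICE_PARENTS
        or "temp" in event.get("command_line", "").lower()
    )

# A macro dropper spawning PowerShell...
print(is_suspicious({"parent_image": "WINWORD.EXE",
                     "image": "powershell.exe",
                     "command_line": "-enc JAB..."}))                 # True

# ...and a (hypothetical) deployment agent doing essentially the same thing.
print(is_suspicious({"parent_image": "deploy_agent.exe",
                     "image": "powershell.exe",
                     "command_line": "-File C:\\Temp\\patch.ps1"}))   # True
```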
And coming back to AI bias… Remember what I said about the user controlling the input? It’s a bit like web pentesting: if the user can control the input, you’d better be very careful… And on any system monitored by AI the user is the king. They may not be able to run a lot of software, they may not have an admin or system account, but they can do dodgy stuff that stays under the radar for a very long time. Because they control the input!
I think there is a vast, unexplored area of research into generating streams of legitimate or legitimate-looking events that will help train AI systems to… ignore certain types of events. The research may be threat-, vendor-, or product-specific, but if it does achieve the AI bypass, then it’s pretty much game over for any system protected by your ‘AV-replacement’ Next Gen product.
Anytime someone claims they can bypass a product or an idea, the good ol’ defense-in-depth concept comes back to my mind. You simply can’t rely on that one, new, ‘almighty’ security control. You can’t rely on its classification/decision process either – AI or not. Redundancy in detection ideas and/or security controls is to our benefit. It costs more short-term, but it costs less long-term. And on that note, one of the Holy Grails for any blue team should be the detection of legitimate pentesting activity. If you can’t even detect that with your latest and ‘bestest’ AI technology, you do need to ask yourself some serious questions…
Last, but not least – AI has to deal with a crazy amount of ambiguity ‘by default’, and where there is ambiguity, bad decisions will _always_ be made. So, now we not only “trust, but verify”; we “trust, verify, and rely on redundant security controls and threat hunting ideas to avoid the surprises of these bad decisions made by some of the blindly-trusted controls – and either prevent them, or detect them as early as possible…”.
I once attended an interview (in the very early stages of my career); the guy asked me whether I was 100% sure that I could do the job without someone needing to review and reassess what I had done, and possibly correct me. It really shocked me – he assumed I might be subject to failure! Only later did I realize that it is a very mature question; if you can’t be 100% sure, please do not claim it. And trust… same as entitlement, it needs to be earned first. As such, we can’t trust AI models today – you need to review!!!
AI and machine learning are very interesting and promising ideas, but at the moment they are still in the ‘wishful thinking’ bucket. They should not be sold as a replacement for existing technology, but as a carefully monitored add-on. Otherwise it’s just insincere, and as I mentioned, yes, it actually adds more work to your workload before you can actually trust it!
Coming to the end of this preaching session. I think what I want to highlight the most is the fact that all the AI-oriented marketing materials focus on the cases where it works. And this is great. But… that’s the biggest problem.
Only a few days ago Halvar Flake released a very interesting presentation about the state of RE tools and their ‘marketability’. I really like the slide where he states ‘Tools are written for a paper / presentation’:
As of 2018… so are… enterprise AI-based security solutions.
And to close this awful preaching session, I must admit that I really liked FireEye’s article ‘Reverse Engineering the Analyst: Building Machine Learning Models for the SOC‘. It is a very good example of Machine Learning being presented in a very practical way, as one of the available tools to support our work. And this, I guess, is the goal of all this AI/ML buzz: use it to reduce the data to clusters, but still… let us make that binary decision ourselves!