Recently there has been a lot of buzz about OpenIOC. The idea is great – you come across a new IOC (Indicator of Compromise), describe it in XML, add it to a globally shared collection, and the next time you or someone else runs a new case, one of you may get lucky, use the global intel, and solve the case in no time.
Hmmm not really.
Why it may work
This is the shortest part of this article. Sharing intel is a no-brainer. It is truly a great idea. Seriously. Finally someone not only realized that talking about sharing forensic artifact data in a unified, foolproof way is not good enough, but that it is time to actually create a platform to do so. I personally don’t like XML, but it’s hard to think of a better way of sharing data without losing time on parsing badly-formatted CSV files, proprietary formats, etc. The other great thing about OpenIOC is that someone was brave enough to admit that forensic artifacts shared so far within the community go beyond just a file name, its size, MD5, and as many versions of SHA as possible. Plus, of course, a fuzzy hash.
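To make the sharing idea concrete, here is a minimal sketch of what consuming such an indicator might look like. The XML below is hand-written and only loosely modelled on OpenIOC – the element names, attributes, and values are simplified assumptions for illustration, not the real schema – and it is parsed with nothing but Python’s standard library:

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written indicator in the spirit of OpenIOC.
# The real schema is richer; element names and values here are made up.
IOC_XML = """
<ioc id="example-0001">
  <short_description>Example dropper</short_description>
  <definition>
    <Indicator operator="OR">
      <IndicatorItem condition="is">
        <Context search="FileItem/Md5sum"/>
        <Content type="md5">d41d8cd98f00b204e9800998ecf8427e</Content>
      </IndicatorItem>
      <IndicatorItem condition="contains">
        <Context search="FileItem/FileName"/>
        <Content type="string">svchost32.exe</Content>
      </IndicatorItem>
    </Indicator>
  </definition>
</ioc>
"""

def extract_terms(xml_text):
    """Pull (search-path, condition, value) triples out of an indicator."""
    root = ET.fromstring(xml_text)
    terms = []
    for item in root.iter("IndicatorItem"):
        search = item.find("Context").get("search")
        value = item.find("Content").text
        terms.append((search, item.get("condition"), value))
    return terms

for term in extract_terms(IOC_XML):
    print(term)
```

The point is only that a shared, structured format lets a tool mechanically extract check-able terms – no CSV parsing, no proprietary binary blobs.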
This MAY actually work. But before it does, it’s worth analyzing why it may not.
Why it may not work
TL;DR: writing good-quality IOCs is NON-TRIVIAL.
The example IOCs presented on the OpenIOC web site are decent, yet they focus on specific samples of malware. This may be a great approach for a family of malware that doesn’t change often, is usually targeted, and stays under the radar of antivirus companies because… they never see the samples until it is really too late. Usually this kind of malware is distributed within a small, niche environment – Stuxnet, ATM malware, or malicious software used by carders to attack POS systems are good examples. Good as examples, but not that useful practically speaking.
So, what are the potential challenges of OpenIOC?
The first problem is that malware / intruders’ activities:
- are highly intelligent (never underestimate your opponent)
- are randomized
- are metamorphic or recompiled for new environments
- use constantly changing registry entries, file names, and domain names, and rely on peer-to-peer networks, fast-flux, and many other techniques to avoid static and heuristic detection by whatever forensic artifact possible
- rely on tools that prevent finding artifacts (timestomping, secure deletion, browser cache cleanup, DNS cache cleanup, etc.)
The second problem comes as a result of the first one – knowing about constantly changing artifacts, potential IOC creators are forced to either not write IOCs at all (waste of time, etc.), or… automate it.
But you see.. you can’t automate it.
Relying on dynamic analysis alone is really not good enough. Malware often creates, modifies, deletes, and amends the system’s state in many different ways, often affecting ‘clean’ system areas that are accessed/created by clean components indirectly loaded/used by the malware, be it a WebControl, an external application called directly by the malware, or even registry settings of protectors used by both legitimate software and malware. Of course, the other problem is that dynamic analysis is not good enough to discover hidden paths in the analyzed code, e.g. code dependent on a file location, the current account name, or the presence of specific triggers, let alone code that is executed based on incoming messages from Command & Control.
Is it worth doing in-depth analysis for the sole purpose of creating a very good IOC?
No. If the malware shows signs of randomization, anti-heuristic behaviour, or fast-flux, you’d better leave it.
Now, imagine that IOCs are actually created as a side effect of in-depth malware analysis. How often will you really see these specific artifacts in the future? I tried a similar approach in the past and it does work for a specific environment or niche market, but not _globally_. Maybe I was just not lucky enough, yet the question remains. Working your future cases, ask yourself – even if you see similar stuff, how often is it really something that can be so easily framed into a simple logical formula? It is rare, and that’s the exact reason this job is so interesting!
The first and second problems create a combined problem. Loosely collected IOCs based on specific patterns/artifacts, even if shared globally, are nothing more than a blacklist. And blacklists simply don’t work (AV vendors have struggled with the same problem for years, and that’s why they focus their efforts so much on improving behavioural and heuristic detection instead of pattern-based or even algorithmic (but still static) detection).
The third problem is speed. It is not visible at the moment, but assuming that new IOC entries start being created often and eagerly and then shared globally, tools that handle them won’t be able to cope with the number of checks they need to perform on the evidence. Think of IDS/IPS systems or AV engines. The rules used to detect network anomalies or malware patterns are always converted to some clever structures, be it a trie, DFA, NFA, or some patent-covered solution – they are super-optimized, because they _need_ to be super-fast to scan files for hundreds of thousands if not millions of signatures in processing time as close as possible to linear! I don’t know the internals of tools currently relying on OpenIOC, but I’ll just point out that search speed should remain relatively constant independently of the number of sigs.
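To illustrate the kind of structure such tools need, here is a toy Aho-Corasick matcher – the classic trie-with-failure-links algorithm behind many IDS/AV engines. It scans the input in a single pass, so matching time stays roughly constant regardless of how many string signatures are loaded. This is a sketch of the general technique, not a claim about how any OpenIOC-aware tool is actually implemented:

```python
from collections import deque

class AhoCorasick:
    """Toy multi-pattern matcher: one pass over the input no matter
    how many signatures are loaded (the property IOC scanners need)."""

    def __init__(self, patterns):
        self.goto = [{}]   # trie transitions per node
        self.fail = [0]    # failure links per node
        self.out = [[]]    # patterns ending at each node
        for pat in patterns:
            node = 0
            for ch in pat:
                if ch not in self.goto[node]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[node][ch] = len(self.goto) - 1
                node = self.goto[node][ch]
            self.out[node].append(pat)
        # BFS to build failure links (root's children keep fail = 0)
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for ch, nxt in self.goto[node].items():
                queue.append(nxt)
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                self.out[nxt] += self.out[self.fail[nxt]]

    def search(self, text):
        """Return (start_offset, pattern) for every match, in one pass."""
        node, hits = 0, []
        for i, ch in enumerate(text):
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            for pat in self.out[node]:
                hits.append((i - len(pat) + 1, pat))
        return hits

ac = AhoCorasick(["he", "she", "his", "hers"])
print(ac.search("ushers"))  # finds "she", "he", and "hers" in a single scan
```

Loading a million signatures makes construction slower, but the per-byte scanning cost barely changes – which is exactly why naive per-signature scanning of evidence cannot keep up once a shared IOC repository grows.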
The fourth problem is quality. IOCs need to be reviewed, revised, and rejected. False positives and one-off IOCs will bring a lot of noise, and managing a collection of IOCs is as challenging as it is with IDS/IPS/AV sigs, if not more so (AV sigs are maintained within one company; OpenIOC could potentially be a hivemind).
The fifth problem is the ‘open’. Yes, sadly. There already exist portals to safely verify (with no submission to AV vendors) whether a new malware sample is detected by any AV. Adding a feature to pull the latest OpenIOC repository and check whether the system modifications introduced by a sample trigger any OpenIOC entries is trivial.
How to make it work?
The focus should be not on malware and constantly changing intrusion artifacts, but on highlighting artifacts that have a universal use to forensic investigators – artifacts that can be parsed and analyzed, and that can, basically, help to solve the case:
- names and locations of log files + their format/parsers
- all possible known autorun entries
- timeline anomalies
- discrepancies that could be described as subtle variations from the OS baseline (but not hashes!) – examples include calculating the Levenshtein distance between active process names and known clean process names, anomalous locations for active processes using names of known clean processes, discovering homographic attacks by finding Unicode characters in file/process names, clustering file system entries
- and many others
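As a taste of what such baseline-deviation checks could look like, here is a small sketch combining two of the ideas above: Levenshtein distance against known clean process names, and a crude homograph check for non-ASCII characters. The clean-name list and the distance threshold are made-up values for illustration only:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

# Hypothetical baseline; a real one would come from the OS image itself.
KNOWN_CLEAN = ["svchost.exe", "lsass.exe", "explorer.exe"]

def suspicious(name, max_dist=2):
    """Flag names that are close to, but not exactly, a clean name,
    or that smuggle in non-ASCII (possible homograph) characters."""
    if any(ord(ch) > 127 for ch in name):
        return True
    return any(0 < levenshtein(name.lower(), clean) <= max_dist
               for clean in KNOWN_CLEAN)

print(suspicious("scvhost.exe"))  # transposed letters -> flagged
print(suspicious("svchost.exe"))  # exact clean name -> not flagged
```

The detection logic here never names a specific malware family – it describes a durable property of the clean baseline, which is exactly what makes it reusable across cases.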
Collecting IOCs may be helpful short-term, but collecting HFAs (Helpful Forensic Artifacts) is the way to go.