Speeding up case processing

A few years back I was looking at the data from my first forensic case: a few images, hundreds of thousands of files and only very limited time to look through it. Like many before me, I found it overwhelming and hard to manage.

I started a typical (and painful) journey through the evidence: filtering the data by various criteria (e.g. date, size, file extension), hiding some of it, manually going through its subsets (e.g. by looking only at specific folders) and, of course, doing some simple timeline analysis as well.

I thought there must be a better way to walk through this mess than just clicking through a graphical user interface (GUI).

Like many investigators before me, and always wanting to automate things, I soon started toying with various optimization ideas. I ended up developing a number of one-off, quick-and-dirty scripts and solutions aimed at speeding up my analysis. Some of them worked, some of them were complete nonsense. Here and in future posts, I will demonstrate (I love using this big-mouth word :)) some of them. At least, the ones that worked for me :).

For starters, a couple of general optimization ideas – later I will come back to more specific examples:

  • Obvious ones first
    • Invest in hardware – bigger, faster, more
    • Invest in software, but do it wisely (better more hardware and no expensive software than less hardware and expensive software)
    • Experiment, read and pick up new techniques from others
    • Automate stuff
    • Benchmark everything you can
  • Exit your comfort zone and:
    • Learn to program; this will let you code your own stuff, and often even the smallest snippets of code can do a lot of magic
    • Move from GUI to command line (CLI); it is simply faster and often OS-independent (plus, Linux CLI tools are faster); I am a Windows guy and it was EXTREMELY difficult to break through; I did, however, have a good Linux mentor at the time, and thanks to him I made huge progress in adopting at least the CLI and its tools (this is actually funny, because in the past I found it really hard to switch from CLI to GUI when I moved from DOS to Windows; what a sweet irony…)
    • Move from the CLI to your own scripts/tools; it is faster and is also a way to automate things and make them work for you; even the best grep or CLI caterpillar (as I call an endless list of CLI commands separated by pipes) cannot do what a simple script with a state machine/regexes can do
    • Work on mounted data instead of data loaded into an application (maybe it is subjective, but for me it has always been faster – I will come back to this in the future)
    • Work on the same data on as many boxes as you can; at times I have worked on the same data on 6 different machines via RDP, later combining the results into one report; it is VERY HARD to manage and you will lose your mind, but it gives you an edge because you can do different things simultaneously (run strings on the whole image on one system, extract files on a second, run multiple AV products on another, and so on)
    • Use data in raw DD format for analysis – if all the fancy tools fail, you can quickly switch to the CLI and save the day (this has happened to me lots of times); raw data also lets you run strings over it, so later, when you need to grep for stuff, you can search within the extracted strings, reducing search time (instead of using DD, you can of course also mount images)
    • Divide work into steps that can be batch processed and/or processed simultaneously and independently; examples include:
      • Once you have extracted all .exe/.dll/etc. files, you can run AV over them; you can also run PE tools that highlight ‘funny’ stuff like high entropy, suspicious APIs, etc.
      • Look at logical drives separately; don’t run massive searches on the whole evidence in one go on one system; if something breaks, you can at least preserve part of the work done, and it is easier to restart on a subset of the data than on the whole evidence
    • Actively search for, collect and install tools; don’t just bookmark pages – when time is important (and it always is), having the proper tools at hand saves a lot of time (download, installation, etc.)
  • Change your mindset: don’t just look at data – act on it
    • Get a full copy of the evidence data onto a local drive on your workstation
    • When you walk through it, analyze something and, if it is not important, delete it; e.g. once you have gone through certain folders/files, just remove them; this way you get rid of a lot of noise
    • Use a better file manager, e.g. Total Commander or FAR, to walk through the content of files (I strongly advise NOT using the CLI for walking through files – Total Commander with Quick View turned on lets you go through many files in no time)
  • As mentioned earlier, benchmark both tools and ideas; this cannot be stressed enough; just because strings/grep work doesn’t mean they are the fastest; your regex may also be wrong and, as programmers know, not everything can be searched for using regexes; a state machine or some fancy dedicated algorithm is often a much better option, not to mention a script that at least partially understands the format of the file being scanned and can choose to ignore, e.g., certain file types
  • Certain things in forensics are done because ‘everyone does so’, even when they don’t make sense in a given case; examples include:
    • Calculating hashes of all files (it is a good idea ONLY if you will actually use them)
    • Running clean tools from read-only media (malware can obviously hook/patch/disable them once they are loaded from the file into memory)
    • Scanning with multiple AV products (custom malware is omnipresent; let’s face the facts: AV will never detect it)
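
To make the point about CLI caterpillars versus your own scripts a bit more concrete, here is a minimal Python sketch of a two-state machine (illustrative only, not a tool from an actual case). It prints whole records from a text dump only if something inside the record matches a regex, which a plain grep cannot easily do because it matches one line at a time. The `BEGIN`/`END` record markers and the name `blocks_matching` are hypothetical; adapt them to whatever your data looks like.

```python
from __future__ import annotations

import re
import sys
from typing import Iterable, Iterator, List

# Hypothetical record markers; adjust them to the data you are actually parsing.
BEGIN = re.compile(r"^BEGIN\b")
END = re.compile(r"^END\b")


def blocks_matching(lines: Iterable[str], needle: re.Pattern[str]) -> Iterator[List[str]]:
    """Yield complete BEGIN..END blocks that contain at least one `needle` hit.

    Two states: OUTSIDE a block (lines are ignored) or INSIDE one (lines are
    buffered until END decides whether the block gets reported).
    A block still open at the end of the input is dropped.
    """
    inside = False
    block: List[str] = []
    hit = False
    for line in lines:
        if not inside:
            if BEGIN.match(line):
                inside, block, hit = True, [line], bool(needle.search(line))
            continue
        block.append(line)
        hit = hit or bool(needle.search(line))
        if END.match(line):
            if hit:
                yield block
            inside = False


if __name__ == "__main__":
    # Usage: python blocks.py REGEX < text_dump.txt
    for blk in blocks_matching(sys.stdin, re.compile(sys.argv[1])):
        sys.stdout.writelines(blk)
```

The same skeleton grows naturally: add states for nested sections, or keep counters per record, without turning the pipeline into an even longer caterpillar.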
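
The raw DD point is easy to act on with a few lines of code. The sketch below pulls printable ASCII runs out of an image in chunks, similar in spirit to `strings -t d` (GNU strings with decimal offsets), so the output can be saved to a text file that is much quicker to grep later. It is a sketch under assumptions: ASCII only (no UTF-16LE strings, which Windows images are full of), a minimum length of 4, and the names `extract_strings` and `image.dd` are hypothetical.

```python
import re
import sys
from typing import BinaryIO, Iterator, Tuple

# One or more printable ASCII bytes (0x20-0x7E) or tabs; short runs are
# filtered by length below, so runs split across chunks are not lost.
RUN = re.compile(rb"[\x20-\x7e\t]+")


def extract_strings(f: BinaryIO, min_len: int = 4,
                    chunk_size: int = 1 << 20) -> Iterator[Tuple[int, str]]:
    """Yield (absolute_offset, text) for each printable run of >= min_len bytes.

    The file is read in chunks, so a multi-GB raw image never has to fit in
    memory; a run touching the end of a chunk is carried over to the next one.
    """
    carry = b""   # printable run that may continue into the next chunk
    offset = 0    # absolute file offset of data[0]
    while True:
        chunk = f.read(chunk_size)
        eof = not chunk
        data = carry + chunk
        carry = b""
        for m in RUN.finditer(data):
            if not eof and m.end() == len(data):
                carry = m.group()  # undecided: it might continue in the next chunk
            elif m.end() - m.start() >= min_len:
                yield offset + m.start(), m.group().decode("ascii")
        offset += len(data) - len(carry)
        if eof:
            break


if __name__ == "__main__":
    # Usage: python strings_dd.py image.dd > image.strings.txt
    with open(sys.argv[1], "rb") as img:
        for off, text in extract_strings(img):
            print(off, text)
```

Keeping the offset next to each string lets you jump from a grep hit in the text file straight back to the right place in the image.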
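
For the batch step on extracted .exe/.dll files, high entropy is one of the simplest ‘funny’ indicators to compute yourself. The sketch below calculates Shannon entropy (in bits per byte, from 0.0 to 8.0) over whole files and flags the ones above a threshold. Assumptions to keep in mind: the 7.2 cut-off is a hypothetical value to tune against your own samples, files are picked by extension only (renamed binaries will be missed), and dedicated PE tools usually look at entropy per section rather than per file.

```python
import math
import sys
from collections import Counter
from pathlib import Path
from typing import Iterator, Tuple

# Hypothetical cut-off: packed or encrypted data tends to sit close to the
# 8.0 bits/byte maximum; tune this against your own samples.
HIGH_ENTROPY = 7.2
PE_SUFFIXES = {".exe", ".dll", ".sys"}


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def flag_high_entropy(folder: str,
                      threshold: float = HIGH_ENTROPY) -> Iterator[Tuple[Path, float]]:
    """Yield (path, entropy) for PE-looking files at or above the threshold."""
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() in PE_SUFFIXES:
            h = shannon_entropy(path.read_bytes())
            if h >= threshold:
                yield path, h


if __name__ == "__main__":
    # Usage: python entropy.py extracted_files/
    for p, h in flag_high_entropy(sys.argv[1]):
        print(f"{h:.2f}  {p}")
```

Because each file is scored independently, the folder can be split across several boxes and the results merged afterwards.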
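
As for a script that ‘at least partially understands the file format’, even a header check goes a long way. The sketch below runs a regex over a directory tree but skips files whose first bytes identify them as compressed or media formats, where a plain-text search rarely finds anything. The skip list and the names `should_skip`/`search_tree` are illustrative assumptions, and the trade-off is real: for example, a ZIP entry stored without compression would be skipped too. Benchmark it against your usual grep on your own data before trusting it.

```python
import re
import sys
from pathlib import Path
from typing import Iterator, Tuple

# Signatures of formats whose raw bytes a plain-text search rarely benefits
# from (compressed or media data). Illustrative and deliberately incomplete.
SKIP_MAGIC = (
    b"\xff\xd8\xff",         # JPEG
    b"\x89PNG\r\n\x1a\n",    # PNG
    b"GIF87a", b"GIF89a",    # GIF
    b"PK\x03\x04",           # ZIP container (also DOCX/XLSX/JAR)
)


def should_skip(path: Path) -> bool:
    """Decide from the first bytes of the file, not from its (possibly fake) extension."""
    with path.open("rb") as f:
        return f.read(16).startswith(SKIP_MAGIC)


def search_tree(root: str, pattern: bytes) -> Iterator[Tuple[Path, int]]:
    """Yield (path, offset) for every regex hit in files that are not skipped."""
    rx = re.compile(pattern)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or should_skip(path):
            continue
        for m in rx.finditer(path.read_bytes()):
            yield path, m.start()


if __name__ == "__main__":
    # Usage: python search.py mounted_drive/ "http://[^ ]+"
    for p, off in search_tree(sys.argv[1], sys.argv[2].encode()):
        print(f"{p}:{off}")
```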

That’s it for now. This is, to a great extent, a subjective list of mine and should not be treated as a silver bullet. What worked for me and my cases may not work for you. And quite frankly, forensic analysis is very often less sexy than an outsider may think – it is a struggle against time, customer expectations and… fatigue. If faster case processing can at least reduce the workload, it is definitely worth thinking about.

Go ahead and create your own subjective list.