Speeding up case processing

A few years back I was looking at data from my first forensic case: a few images, hundreds of thousands of files, and only very limited time to look through it. Like many before me, I found it overwhelming and hard to manage.

I started a typical (and painful) journey through the evidence: playing around with data filtering using various criteria (e.g. date, size, file extension), hiding some of the data and manually going through subsets of it (e.g. by looking only at specific folders), and of course doing some simple timeline analysis as well.

I thought there must be a better way to walk through this mess than just clicking through a graphical user interface (GUI).

Like many investigators before me, and always wanting to automate things, I soon started toying with various optimization ideas. I ended up developing a number of one-off, quick-and-dirty scripts and solutions aimed at speeding up my analysis. Some of them worked, some were complete nonsense. Here and in future posts I will demonstrate (I love using this big-mouth word :)) some of them. At least the ones that worked for me :).

For starters, a couple of general optimization ideas – later I will come back to more specific examples:

  • Obvious ones first
    • Invest in hardware – bigger, faster, more
    • Invest in software, but do it wisely (better to have more hardware and no expensive software than less hardware and more expensive software)
    • Experiment, read and pick up new techniques from others
    • Automate stuff
    • Benchmark everything you can
  • Exit your comfort zone and:
    • Learn to program; this will enable you to code stuff – often even the smallest snippet of code can do lots of magic
    • Move from GUI to the command line (CLI); it is just faster & often OS-independent (+ Linux CLI tools are faster); I am a Windows guy and it was EXTREMELY difficult to break through; I had a good Linux mentor at that time though, and thanks to him I made huge progress in adopting at least the CLI and its tools (this is actually funny, because in the past I found it really hard to change from CLI to GUI after I moved from DOS to Windows; what a sweet irony…)
    • Move from the CLI to your own scripts/tools; it is faster and is also a way to automate and instrument things to work for you; even the best grep or CLI caterpillar (as I call an endless list of CLI commands separated by pipes) cannot do what a simple script with a state machine/regexes can do (see the string-extraction sketch after this list)
    • Work on mounted data instead of data loaded into an application (maybe it is subjective, but for me it has always been faster – I will come back to this in the future)
    • Work on the same data on as many boxes as you can; at times I have been working on the same data on 6 different machines via RDP and later combining the results into one report; it is VERY HARD to manage, you will lose your mind, but it gives you an edge as you can do different things simultaneously (run strings on the whole image on one system, extract files on a second, run multiple AVs on another, and so on and so forth)
    • Use data in raw DD format for analysis – if all the fancy tools fail, you can quickly switch to the CLI and save the day (it has happened to me lots of times); raw data also allows you to run strings over it, so if you later need to grep for stuff, you can search within the extracted strings, reducing search time (instead of DD you can of course also mount the images)
    • Divide work into steps that can be batch processed and/or processed simultaneously and independently; examples include:
      • Once you extract all the .exe/.dll/etc. files, you can run AV over them; you can also run PE tools that highlight ‘funny’ stuff like high entropy, suspicious APIs, etc. (see the entropy-triage sketch after this list)
      • Look at logical drives separately; don’t run massive searches on the whole evidence in one go on one system; in case something breaks, you can at least preserve part of the work already done, and it is easier to restart on a subset of the data than on the whole evidence
    • Actively search, collect and install tools; don’t just bookmark pages – when time is important (and it always is), having proper tools at hand saves a lot of time [downloading time, installation, etc.]
  • Change the mindset and don’t just look at data – act on it
    • Get a full copy of the evidence data onto a local drive on your workstation
    • As you walk through it, analyze something and, if it is not important, delete it; e.g. when walking through folders/files that you have already seen, just remove them – this way you can get rid of a lot of noise
    • Use a better file explorer, e.g. Total Commander or FAR, to walk through the content of files (I strongly advise NOT to use the CLI for walking through files – Total Commander with Quick Preview on lets you walk through many files in no time)
  • As mentioned earlier – benchmark both tools and ideas; it can’t be stressed enough; just because strings/grep work doesn’t mean they are the fastest; your regex may also be wrong and, as programmers know, not everything can be searched for using a regex; a state machine or some dedicated algorithm is often a much better option, not to mention a script that at least partially understands the file format being scanned and can choose to ignore e.g. certain file types (a tiny timing sketch follows this list)
  • Certain things in forensics are done because ‘everyone does it that way’, even if they don’t make sense in every case; examples include:
    • Calculating hashes of all files (it is a good idea ONLY if you will actually use them)
    • Running clean tools from read-only media (malware can obviously hook/patch/disable these when they are loaded from a file to memory)
    • Scanning with multiple AV systems (custom malware is omnipresent; let’s face the facts: AV will never detect it)
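
To make the scripting point from the list concrete, here is a minimal sketch (in Python, my choice for illustration) of the kind of thing a few lines of code can do that a grep caterpillar struggles with: streaming a raw DD image in large chunks and pulling out strings that match a pattern – URLs in this example. The image path, the pattern and the chunk sizes are placeholders, not recommendations:

```python
import re
import sys

# Minimal sketch: stream a raw image (or any large file) and extract
# ASCII strings matching a pattern -- here, URL-looking strings. Chunks
# overlap a little so matches crossing chunk boundaries are not lost.
PATTERN = re.compile(rb"https?://[\x21-\x7e]{4,200}")
CHUNK = 64 * 1024 * 1024   # 64 MB per read
OVERLAP = 4096             # keep the tail of the previous chunk

def scan(path):
    hits = set()
    with open(path, "rb") as f:
        tail = b""
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            for match in PATTERN.finditer(tail + chunk):
                hits.add(match.group().decode("ascii", "replace"))
            tail = chunk[-OVERLAP:]
    return sorted(hits)

if __name__ == "__main__":
    # Usage (hypothetical path): python scan_image.py evidence.dd
    for url in scan(sys.argv[1]):
        print(url)
```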
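
Similarly, for the ‘funny stuff’ triage of extracted executables, a short script can at least flag high-entropy (likely packed or encrypted) PE files for a closer look. A rough sketch only – the 7.0 threshold and the folder layout are assumptions, not a rule:

```python
import math
import os
import sys
from collections import Counter

# Rough triage sketch: compute byte entropy of every .exe/.dll under a folder
# and flag files that look packed/encrypted. The 7.0 threshold is an assumption.
def entropy(data):
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def triage(root, threshold=7.0):
    for dirpath, _, files in os.walk(root):
        for name in files:
            if not name.lower().endswith((".exe", ".dll")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                ent = entropy(f.read())
            flag = "SUSPICIOUS" if ent > threshold else "ok"
            print(f"{ent:.2f}  {flag:10}  {path}")

if __name__ == "__main__":
    # Usage (hypothetical path): python triage.py extracted_files/
    triage(sys.argv[1])
```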
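
And since benchmarking keeps coming up: timing an idea before committing to it costs almost nothing. A trivial, self-contained harness like the one below (the fake data blob and the two candidate approaches are just placeholders) is usually enough to settle an argument:

```python
import re
import time

# Trivial benchmarking sketch: time two ways of counting "MZ" markers in a blob.
# The data and both candidate approaches are placeholders -- swap in whatever
# tools or ideas you actually want to compare.
DATA = (b"MZ" + b"\x00" * 510) * 2000   # fake ~1 MB blob with 2000 "MZ" markers

def with_regex(data):
    return len(re.findall(b"MZ", data))

def with_find_loop(data):
    count, pos = 0, data.find(b"MZ")
    while pos != -1:
        count, pos = count + 1, data.find(b"MZ", pos + 1)
    return count

for name, func in [("re.findall", with_regex), ("bytes.find loop", with_find_loop)]:
    start = time.perf_counter()
    result = func(DATA)
    elapsed = time.perf_counter() - start
    print(f"{name:15} -> {result} hits in {elapsed:.4f}s")
```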

That’s it for now. This is to a great extent a subjective list of mine and should not be treated as a silver bullet. What worked for me & my cases may not work for you. And quite frankly – forensic analysis is very often less sexy than an outsider may think – it is a struggle against time, customer expectations and… fatigue. If faster case processing can at least reduce the workload, it is definitely worth thinking about.

Go ahead and create your own subjective list.

Automation vs. In-depth Malware Analysis – practically

You don’t need to read this one if you are an experienced reverse engineer. You have been warned 🙂

In my old post about Automation vs. In-depth Malware Analysis I mentioned that dynamic analysis has its limitations. Just talking about it is not good enough, though, and I always wanted to provide a real-case example to back it up.

Today I came across a post from Webroot written by Dancho Danchev; the post talks about two malware campaigns serving client-side exploits. Since the blog entry provided the IP of a malicious web site, I visited it immediately to… well… get my test box infected 😉

The web site is serving the Blackhole exploit pack, and while that is an interesting subject for malware analysis in itself, I was hoping to find something to look at inside the payload – it’s good to see what happens to the system after it actually gets exploited by the latest badness. I didn’t need to wait long; the payload arrived pretty much right after I visited the malicious web site using an old IE 6.0 (it’s very handy for exercises like this :))

The web page shows the familiar Blackhole exploit loading screen:

In the background the browser is served various exploits, and once the page started loading I immediately spotted a piece of malware happily running from my Application Data folder.

I collected the piece from the sandbox (together with its dropper, which was dropped and executed by the exploit pack but then quickly stopped its execution) and loaded the payload’s code, dumped from memory, into IDA. The code turned out to be typical malware stuff (it downloads & executes stuff from a remote site), so there is not much to say about it really. What I spotted, though, is that there were two code branches inside WinMain that depend on the command line argument. And this gave me an idea to follow up on my old post.

It turns out the malicious .exe accepts two different command line arguments, ‘a’ and ‘s’:

One code branch is for a regular Win32 application, and one for a service process started via StartServiceCtrlDispatcherW.

Not only may the service process executable rely on command line arguments that are hard to guess, it also needs to be handled differently – one can’t just execute a service process from the command line or Explorer and observe its behaviour (the service needs to be created first, then started, e.g. via sc.exe; attempting to run a service process directly from the command line will produce an ERROR_FAILED_SERVICE_CONTROLLER_CONNECT error).
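
To illustrate, here is a minimal sketch of how one could hand such a sample to the Service Control Manager instead of double-clicking it. The service name and sample path below are made up for the example, and this obviously requires admin rights on a disposable analysis box:

```python
import subprocess

# Hypothetical names -- adjust to your sandbox; the sample path, service name
# and the 's' argument are illustrative, not taken from a specific case.
SAMPLE = r"C:\analysis\sample.exe"
SVC_NAME = "malsvc_test"

def run(cmd):
    """Run a command and show its output."""
    print(">", " ".join(cmd))
    print(subprocess.run(cmd, capture_output=True, text=True).stdout)

# Register the sample as a service; sc.exe requires the space after 'binPath=',
# and the appended 's' mimics the command line branch the malware expects.
run(["sc", "create", SVC_NAME, "binPath=", f"{SAMPLE} s"])

# Start it through the Service Control Manager instead of launching it directly,
# which would fail with ERROR_FAILED_SERVICE_CONTROLLER_CONNECT.
run(["sc", "start", SVC_NAME])

# Clean up after observing the behaviour.
run(["sc", "stop", SVC_NAME])
run(["sc", "delete", SVC_NAME])
```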

See how typical ‘service process’ testing would fail if the command line ‘s’ argument is not provided, and what happens when the correct argument is actually there:

Note one more time that I have been communicating with the malicious .exe via the sc.exe program, and not running it directly from the command line (which is how most dynamic analysis kicks off).

In other words, dynamic analysis has a long way to go to ‘cover all angles’ – manual code inspection, i.e. analyzing the code with a good disassembler as you walk through it with a debugger and/or other helper software, is still the best way to fully understand what’s going on.