Finding Smoking Gun and going beyond that – Helpful Forensic Artifacts

August 23, 2012 in Compromise Detection, Forensic Analysis, Malware Analysis, Preaching, Tips & Tricks

While I am quite critical of the idea of collecting IOCs (Indicators of Compromise) describing various malware, traces of hacking, etc. in the form of hashes (even fuzzy hashes), file names, sizes, etc., I do believe that there is a certain number of IOCs (or, as I call them, HFAs – Helpful Forensic Artifacts – as they are not necessarily relevant to the compromise itself) that are universal and worth collecting. I am talking about artifacts that are common to malware functionality and offensive activities on the system in general, as well as any other artifact that may help both the attackers and… the investigators (thanks to ‘helpful’ users who leave unencrypted credentials in text files, watch movies on critical systems, etc.).

In this post, I will provide some practical examples of what I mean by that.

Before I kick it off, just a quick reminder – the reason I am critical of bloated IOC databases is that they have very limited applicability in a general sense; the limitations are a result of various techniques used by malware authors, offensive teams, etc., including, but not limited to:

  • metamorphism
  • randomization
  • encryption
  • data (e.g. strings) built on the fly (instead of hardcoded)
  • shellcode-like payloads
  • fast-flux
  • P2P
  • covert channels
  • etc.

Notably, antivirus detection of very advanced, metamorphic malware relies on state machines, not strings, and it’s naive to assume that collecting file names like sdra64.exe is going to save the day…

Anyway…

When it comes to good, interesting HFAs, I think of things that:

  • are very often used in malware simply because they need to be there (dropping files, autostart, etc.)
  • are traces of activities that must be carried out on the compromised system (recon, downloading toolchests, etc.)
  • are also (notably) traces of user activity that supports the attacker’s work (e.g. a file password.txt is not an IOC, but it is an HFA)
  • are traces of the system being affected in a negative way, e.g. if the system has previously been compromised by generic malware, certain settings could have been changed (e.g. tracing disabled, Task Manager blocked, etc.); they are IOCs in a generic sense, but not really relevant to the compromise actually being investigated; you can think of them as the aspects of system security that place the system on the opposite side from a properly secured and hardened box; this also includes previously detected/removed malware – imho AV logs are not ‘clear’ IOCs as long as they relate to events unrelated to the investigated compromise

A typical random piece of malware is usually stupidly written, using snippets copied&pasted from many sources on the internet. The authors are lazy and don’t even bother to encrypt strings, so detection is really easy. You can grep the file or a memory dump of a suspected process for typical autorun strings with strings, BinText or HexDive, and most of the time you will find the smoking gun. If the attacker is advanced, all you will deal with is a blob of binary data that has no visible trace of being malicious unless disassembled – that is, a relocation-independent, shellcode-like piece of mixed code/data in a metamorphic form that doesn’t require all the fuss of inline DLL/EXE loading; it’s just a pure piece of code. It’s actually simple to write with a basic knowledge of assembly language and OS internals. I honestly don’t know how to detect such malware in a generic way. I do believe that’s where the future of advanced malware is, though (apart from going mobile). And I chuckle when I see malware that is 20MB in size (no matter how advanced the functionality).

When we talk about IOCs/HFAs and offensive security practices, it is worth mentioning that we need to follow the thought process of an attacker. Let me give you an example. Assume the attacker gets onto the system. What can s/he do? If the malware is already there, it’s easy: the functionality is already in place and can be leveraged, the malicious payload can be updated, and the attacker can do anything the payload is programmed to do, within the boundaries of what the environment it runs in permits. On the other hand, if the attack comes through a typical hacking attempt, the situation is different. In fact, the attacker is very limited when it comes to available tools/functionality and often has to leverage existing OS tools. This means exactly what it says – the attacker operates in a minimalistic environment and is going to use any tool available on the OS to his/her benefit. On a Windows system, these can be

  • net.exe (and also net1.exe)
  • telnet.exe
  • ftp.exe

but also

  • command.com
  • cmd.exe
  • debug.exe
  • makecab.exe
  • diantz.exe
  • netsh.exe
  • netstat.exe
  • route.exe
  • hostname.exe
  • sc.exe
  • arp.exe
  • shutdown.exe
  • findstr.exe
  • at.exe
  • attrib.exe
  • cacls.exe
  • xcacls.exe
  • ping.exe
  • tracert.exe
  • runas.exe
  • more.com

and OS commands

  • echo
  • type
  • dir
  • md/mkdir
  • systeminfo

and many other command line tools and commands.

So, if you analyze a memory dump from a Windows system, it’s good to search for file names associated with built-in Windows utilities and look at the context, i.e. the surrounding memory region, to see what the possible reason for them being there is (cmd.exe /c being the most common, I guess).
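A minimal Python sketch of that kind of search, assuming the dump is available as a raw file: it looks for a list of utility names in both ASCII and UTF-16LE (Windows keeps many strings in "wide" form) and prints a hex-viewer-style window of the surrounding bytes. The keyword list, the `radius` default and the `memory.dmp` file name are illustrative assumptions, not a standard.

```python
import re
import sys

# Illustrative keyword list; extend it with the utilities/commands listed above.
UTILITIES = ["cmd.exe /c", "net.exe", "net1.exe", "netsh.exe", "makecab.exe", "psexesvc.exe"]


def printable(chunk: bytes) -> str:
    """Render bytes like a hex viewer's text pane: drop NULs (UTF-16LE padding), dot out the rest."""
    chunk = chunk.replace(b"\x00", b"")
    return "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)


def find_with_context(data: bytes, needles, radius: int = 64):
    """Return (offset, encoding, needle, context) for every case-insensitive hit, sorted by offset."""
    hits = []
    for needle in needles:
        for enc in ("ascii", "utf-16-le"):
            # re.IGNORECASE on a bytes pattern folds ASCII letters only, which is enough here
            pattern = re.compile(re.escape(needle.encode(enc)), re.IGNORECASE)
            for m in pattern.finditer(data):
                start, end = max(0, m.start() - radius), min(len(data), m.end() + radius)
                hits.append((m.start(), enc, needle, printable(data[start:end])))
    return sorted(hits)


if __name__ == "__main__":
    # Usage (hypothetical file name): python hfa_grep.py memory.dmp
    with open(sys.argv[1], "rb") as f:
        dump = f.read()  # fine for a sketch; use mmap for multi-GB dumps
    for offset, enc, needle, context in find_with_context(dump, UTILITIES):
        print(f"{offset:#010x} {enc:9} {needle:12} {context}")
```

The context window is what tells you whether a hit is just a loaded module name or a command line such as `cmd.exe /c net user`.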

Back to the original reason for this post – since I wanted to provide some real/practical examples of HFAs that one can use to analyze hosts, let me start with a simple classification by functionality/purpose:

  • information gathering
    • net.exe
    • net1.exe
    • psexec.exe/psexesvc.exe
    • dsquery.exe
    • arp.exe
    • traces of shell being used (cmd.exe /c)
    • passwords.txt, password.txt, pass.txt, etc.
  • data collection
    • type of files storing collected data
      • possibly password protected archives
      • encrypted data (e.g. credit cards/track data)
    • various 3rd party tools to archive data:
      • rar, 7z, pkzip, tar, arj, lha, kgb, xz, etc.
    • OS-based tools
      • compress.exe
      • makecab.exe
      • iexpress.exe
      • diantz.exe
    • type of collected data
      • screen captures often saved as .jpg (small size)
      • screen captures file names often include date
      • keystroke names and their variants
        • PgDn, [PgDn], {PgDn}
        • VK_NEXT
        • PageDown, [PageDown], {PageDown}
      • timestamps (note that there are regional settings)
      • predictable window titles
        • [ C:\WINDOWS\system32\notepad.exe ]
        • [ C:\WINDOWS\system32\calc.exe ]
        • [http://google.com/ - Windows Internet Explorer]
        • [Google - Windows Internet Explorer]
        • [InPrivate - Windows Internet Explorer - [InPrivate]]
      • possible excluded window class names
        • msctls_progress32
        • SysTabControl32
        • SysTreeView32
      • content of the address bar
      • attractive data for attackers
        • regexes for PII (searching for names/dictionary words, states, countries, phone numbers, etc. may help)
        • anything that passes the Luhn check (credit cards)
      • input field names from web pages related to intercepted/recognized credentials
        • user
        • username
        • password
        • pin
      • predictable user-generated content
        • internet searches
        • chats (acronyms, swear words, smileys, etc.)
  • data exfiltration
    • who
      • username/passwords
    • how
      • ftp client (ftp.exe, far.exe, etc.)
      • browser (POSTs, more advanced: GETs)
      • DNS requests
      • USB stick
      • burnt CD
      • printer
    • when
      • just in time (frequent network connection)
      • ‘coming back’ to the system
    • configuration
      • file
      • registry
      • uses GUI (lots of good keywords!)
    • where to:
      • URLs
      • FTP server names
      • SMTP servers
      • mapped drives (\\foo\c$)
      • mapped remote paths (e.g. \\tsclient)
  • malicious code
    • any .exe/.zip in TEMP/APPLICATION DATA subfolders
    • processes that have a low edit distance between their names and known system process names (e.g. lsass.exe vs. lsas.exe)
    • processes that use the names of known system processes but start from a different path
    • areas of memory containing “islands” with raw addresses of APIs typically used by malware e.g. CreateRemoteThread, WriteProcessMemory, wininet functions
  • mistakes
    • Event logs
    • AV logs/quarantine files
    • leftovers (files, etc.)
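For the "anything that passes the Luhn check" item, a minimal Python sketch (one possible way to do it, not a reference implementation): a loose regex picks up runs of 13 to 19 digits, optionally separated by spaces or dashes, and only candidates that pass the Luhn mod-10 check are reported. The candidate pattern is an assumption; it will not recognize every track data layout and will still produce some false positives.

```python
import re

# Loose candidate pattern: 13 to 19 digits, optionally separated by single spaces or dashes.
CANDIDATE = re.compile(rb"(?<!\d)(?:\d[ -]?){12,18}\d(?!\d)")


def luhn_ok(digits: str) -> bool:
    """Luhn (mod 10) check: double every second digit from the right, sum the digits, test % 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of d
        total += d
    return total % 10 == 0


def card_candidates(data: bytes):
    """Return (offset, digits) for every candidate in data that passes the Luhn check."""
    hits = []
    for m in CANDIDATE.finditer(data):
        digits = re.sub(rb"[ -]", b"", m.group()).decode("ascii")
        if luhn_ok(digits):
            hits.append((m.start(), digits))
    return hits
```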

Many of these HFAs form a very manageable set that, when put together, can be applied to different data sets (file names, file paths, file content, registry settings, memory content, process dumps, etc.).

In other words – instead of chasing after sample/family/hacking-group-specific stuff, we look for traces of all the things that make malware malware, a weak system weak, a hack a hack, and an attack-supporting user a victim.
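To illustrate the idea of one generic set applied to many data sets, here is a small Python sketch that runs a handful of HFA rules against file paths (e.g. a file listing exported from an image). The rule names and patterns are hypothetical examples drawn from the list above, not a published rule set; the same dictionary could just as well be applied to registry values or to strings extracted from memory.

```python
import re

# Hypothetical HFA rules built from the classification above; Windows paths, case-insensitive.
HFA_RULES = {
    "credentials_file": re.compile(r"(^|\\)pass(word)?s?\.txt$", re.I),
    "exe_or_zip_in_temp": re.compile(r"\\(temp|application data|appdata)\\.*\.(exe|zip)$", re.I),
    "third_party_archiver": re.compile(r"(^|\\)(rar|7za?|arj|lha)\.exe$", re.I),
    "admin_share": re.compile(r"^\\\\[^\\]+\\[a-z]\$", re.I),
}


def match_hfas(paths):
    """Return (rule_name, path) for every rule that matches a path."""
    findings = []
    for path in paths:
        for name, rx in HFA_RULES.items():
            if rx.search(path):
                findings.append((name, path))
    return findings


if __name__ == "__main__":
    listing = [
        r"C:\Users\bob\Desktop\passwords.txt",
        r"C:\Users\bob\AppData\Local\Temp\upd.exe",
        r"C:\RECYCLER\7za.exe",
        r"\\foo\c$\temp\x.txt",
        r"C:\Windows\notepad.exe",
    ]
    for rule, path in match_hfas(listing):
        print(f"{rule:22} {path}")
```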

Speeding up case processing, part 2

May 21, 2012 in Forensic Analysis, Preaching

In my older post I talked about various things one can do to speed up case processing – this post is a quick follow-up with some more hints; again, it is very Windows-centric.

Let’s start with simple things:

  • Use multiple computers with one keyboard and mouse – use Synergy to control them
  • Use multiple monitors
  • Use VNC to peep at the guest system if you use VMware – it’s often faster
  • Use VirtualKD to work faster with windbg/vmware
  • Rename tools – change long names to shorter e.g.
    • grep -> g
    • strings -> s
    • hexview -> h
    • and so on and so forth
  • If you do the same task more than once, write quick and dirty batch files, scripts (bat, cmd, vbs, vba, powershell, autoit, etc.) and keep them all in a repository so you can always leverage the snippets; you don’t need to build libraries, simple copy&paste is often good enough
  • Example: if you often unpack SQLite databases, avoid dumping the databases manually; write a batch file, e.g. u_sql.bat, and put something along these lines in it:
md unpacked
for %%k in (*.*) do echo .dump|sqlite3 "%%k" > "unpacked\%%k.txt"

when run, it will dump the databases into text files that can be easily grepped
  • Another example: if you often unpack archives, avoid clicking through the GUI; write a batch file, e.g. u_arc.bat, and put something along these lines in it:
md unpacked
for %%k in (*.*) do "c:\program files\winrar\winrar.exe" -IBCK x -r -y "%%k" "unpacked\%%k\"

when run, it will unpack all archives into unpacked\archive_name
  • Record macros and replay them (for mundane tasks, e.g. if you need to fill in some stupid forms multiple times)
  • Learn to efficiently use Excel, in particular:
    • keyboard shortcuts (go ahead and try: CTRL+`, CTRL+1, CTRL+;, CTRL+SHIFT+8, CTRL+PAGE DOWN, CTRL+PAGE UP, and then go to Excel help and read about all the shortcuts)
    • Pivot tables (great for histograms and quick statistics)
    • Excel formulas e.g. VLOOKUP, HLOOKUP, CHAR, LEFT, FIND, etc.
    • Useful Paste Special options like Formulas, Values only and Transpose
    • Sorting and Advanced filtering
  • The same applies to Word
    • Learn about styles and stylesheets
    • Avoid changing default settings
    • Disable irritating functions
  • Avoid lower-quality software
    • You will lose time fighting with random crashes, badly designed UI, and lots of imperfections that steal your time; a good (bad) example is OpenOffice – it is good for simple editing tasks, but it does not solve problems that MS Office solved many years ago and productivity-wise is way behind
  • Avoid tools that are NOT ready to be used immediately after downloading
    • The rule of thumb is that you want to use the tool, not waste your time compiling/fixing bugs, etc.
    • If you are into research it’s of course fine, but if you want to do your work faster – AVOID wasting time on it; if it doesn’t compile, don’t try to build it and fix bugs
  • Set up your environment to include paths to all your tools; if you run a tool and Windows Explorer pops up instead, you are doing it wrong :) and your PATH should be fixed
  • Use PATHEXT to run scripts directly from command line w/o specifying the interpreter
  • Use Registry tweaks to disable animations, and other fancy stuff
  • Autostart all the tools/services you frequently use and kill all the tools you don’t use (be brutal with services.msc or autoruns)
  • Use Registry tweaks to have a decent context menu that you can use to quickly run some tool over the analyzed file, e.g. under:
    • HKEY_CLASSES_ROOT\*
    • HKEY_CLASSES_ROOT\exefile
[Screenshots not preserved: the entries as seen in Regedit and as seen in the context menu]
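If you prefer a scripting language over batch files, the u_sql.bat and u_arc.bat helpers above could look roughly like this in Python. This is a sketch under a few assumptions: SQLite files are opened read-only via a URI, `Connection.iterdump()` produces the same kind of SQL text as the `.dump` command, and `shutil.unpack_archive()` only understands zip and tar variants (no RAR), so WinRAR or 7-Zip is still needed for other formats. The function names are made up for this example.

```python
import shutil
import sqlite3
from pathlib import Path


def dump_sqlite_dir(src: Path, out: Path):
    """Like u_sql.bat: dump every SQLite file in src to out/<name>.txt; skip non-databases."""
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for f in sorted(p for p in src.iterdir() if p.is_file()):
        # Open read-only so the working copy of the evidence is not modified.
        con = sqlite3.connect(f.resolve().as_uri() + "?mode=ro", uri=True)
        try:
            sql = "\n".join(con.iterdump())
        except sqlite3.DatabaseError:
            continue  # "file is not a database" (or damaged); skip it
        finally:
            con.close()
        target = out / (f.name + ".txt")
        target.write_text(sql + "\n", encoding="utf-8")
        written.append(target)
    return written


def unpack_all(src: Path, out: Path):
    """Like u_arc.bat: unpack every archive in src to out/<archive name>/."""
    out.mkdir(parents=True, exist_ok=True)
    unpacked, skipped = [], []
    for f in sorted(p for p in src.iterdir() if p.is_file()):
        target = out / f.name
        try:
            # The format is guessed from the extension (.zip, .tar, .tar.gz, .tgz, ...).
            shutil.unpack_archive(str(f), str(target))
        except shutil.ReadError:
            skipped.append(f)  # unknown extension or not a valid archive
            continue
        unpacked.append(target)
    return unpacked, skipped
```

Returning the list of skipped files keeps the "what did I not process" question answerable later, which a fire-and-forget batch loop does not.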

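A hedged Python sketch of the context menu tweak: it produces a .reg file that adds a verb under `<class>\shell\<verb>\command`, the standard location for context menu commands, where `%1` is replaced by the path of the right-clicked file. The verb name, menu label and tool path are hypothetical; review the generated file before importing it with Regedit.

```python
def context_menu_reg(verb: str, label: str, command: str, key: str = r"HKEY_CLASSES_ROOT\*") -> str:
    """Return .reg text that adds `label` to the context menu of `key`, running `command`."""

    def esc(s: str) -> str:
        # .reg string values escape backslashes and double quotes with a backslash
        return s.replace("\\", "\\\\").replace('"', '\\"')

    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}\\shell\\{verb}]",
        f'@="{esc(label)}"',  # default value = menu text
        "",
        f"[{key}\\shell\\{verb}\\command]",
        f'@="{esc(command)}"',  # %1 = path of the right-clicked file
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    # Hypothetical tool path; pass key=r"HKEY_CLASSES_ROOT\exefile" to target .exe files only.
    reg = context_menu_reg("strings", "Run strings", r'"C:\tools\strings.exe" "%1"')
    # Regedit reads UTF-16 .reg files; Python's "utf-16" codec writes the BOM.
    with open("strings_menu.reg", "w", encoding="utf-16", newline="") as f:
        f.write(reg)
```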
That’s all for today.

Speeding up case processing

April 20, 2012 in Preaching

A few years back I was looking at the data from my first forensic case: a few images, hundreds of thousands of files and only very limited time to look through it all. Like many before me, I found it overwhelming and hard to manage.

I started a typical (and painful) journey through the evidence, playing around with data filtering using various criteria, e.g. date, size and file extension; I also tried hiding some of the data and manually going through its subsets (e.g. by just looking at specific folders), and of course did some simple timeline analysis as well.

I thought there must be a better way to walk through this mess than just clicking through a graphical user interface (GUI).

Like many investigators before me, and always wanting to automate things, I soon started toying around with various optimization ideas. I ended up developing various one-off, quick and dirty scripts and solutions with the aim of speeding up my analysis. Some of them worked, some of them were complete nonsense. Here and in future posts, I will demonstrate (I love using this big-mouth word :) ) some of them. At least, those that worked for me :) .

For starters, a couple of general optimization ideas – later I will come back to more specific examples:

  • Obvious ones first
    • Invest in hardware – bigger, faster, more
    • Invest in software, but do it wisely (better more hardware with no expensive software, than less hardware with more expensive software)
    • Experiment, read and pick up new techniques from others
    • Automate stuff
    • Benchmark everything you can
  • Exit your comfort zone and:
    • Learn to program; this will enable you to code stuff, often, even smallest snippets of code can do lots of magic
    • Move from GUI to command line (CLI); it is just faster & often OS-independent (+Linux CLI tools are faster); I am a Windows guy and it was EXTREMELY difficult to break through; I had a good Linux mentor at that time, though, and thanks to him I made huge progress in adopting at least the CLI and its tools (this is actually funny, because in the past I found it really hard to switch from CLI to GUI after I moved from DOS to Windows; what a sweet irony…)
    • Move from CLI to use your own scripts/tools; it is faster and is also a way to automate and instrument things to work for you; even the best grep or CLI caterpillar (as I call endless list of CLI commands separated by pipes) cannot do what a simple script with a state machine/regexes can do
    • Work on mounted data instead of data loaded into application (maybe it is subjective, but to me it always worked faster – I will come back to it in the future)
    • Work on the same data on as many boxes as you can; at times, I have been working on the same data on 6 different machines via RDP and later combining the data into one report; it is VERY HARD to manage, you will lose your mind, but it gives you an edge as you can simultaneously do different things (run strings on the whole image on one system, extract files on a second, run multiple AVs on another, and so on and so forth)
    • Use data in raw DD format for analysis – if all the fancy tools fail, you can quickly switch to CLI and save the day (it happened to me lots of times); raw data also allows you to run strings over it, so later, if you need to grep for stuff, you can search within the extracted strings, reducing search time (instead of DD you can also, of course, mount images)
    • Divide work into steps that can be batch processed and/or processed simultaneously and independently; examples include:
      • Once you extract all .exe/.dll/etc. you can run AV over them, you can also run PE tools that highlight ‘funny’ stuff like high entropy, suspicious APIs, etc.
      • Look at logical drives separately; don’t run massive searches on the whole evidence in one go on one system; in case something breaks, you can at least preserve some part of the work done, and it is easier to restart on a subset of data than on the whole evidence
    • Actively search, collect and install tools; don’t just bookmark pages – when time is important (and it always is), having proper tools at hand saves a lot of time [downloading time, installation, etc.]
  • Change the mindset and don’t just look at data – act on it
    • Get a full copy of evidence data to your workstation on a local drive
    • When you walk through it, analyze something and, if it is not important, delete it, e.g. when walking through folders/files that you have already seen, just remove them; this way you can get rid of a lot of noise
    • Use a better file explorer, e.g. Total Commander or FAR, to walk through the content of files (I strongly advise NOT to use CLI for walking through files – Total Commander with Quick Preview on allows you to walk through many files in no time)
  • As mentioned earlier – benchmark both tools and ideas; it can’t be stressed enough; just because strings/grep work doesn’t mean they are the fastest; your regex may also be wrong and, as programmers know, not everything can be searched for using regexes; a state machine or some fancy dedicated algorithm is often a much better option, not to mention a script that at least partially understands the file format being scanned and can choose to ignore, e.g., certain file types
  • Certain things in forensics are done, because ‘everyone does so’, even if it doesn’t make sense in certain cases, examples include:
    • Calculating hashes of all files (it is a good idea ONLY if you will actually use them)
    • Running clean tools from read-only media (malware can obviously hook/patch/disable these when they are loaded from a file to memory)
    • Scanning with multiple AV systems (custom malware is omnipresent; let’s face the facts: AV will never detect it)
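Two of the ideas above, running strings once over raw data while keeping the offsets, and flagging high-entropy executables with simple PE-style checks, fit in a few lines of Python. A minimal sketch, assuming ASCII plus UTF-16LE "wide" strings with a minimum length of 4 (GNU strings' default) and byte-level Shannon entropy, where values close to 8 bits per byte hint at packed or encrypted content; thresholds are up to you. It reads the whole input into memory, so for real images you would process the data in chunks or via mmap.

```python
import math
import re
from collections import Counter

MIN_LEN = 4  # minimum run length
ASCII_RX = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)
WIDE_RX = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LEN)  # UTF-16LE "wide" strings


def extract_strings(data: bytes):
    """Return (offset, kind, text) tuples, kind 'A' = ASCII, 'U' = UTF-16LE, sorted by offset."""
    found = [(m.start(), "A", m.group().decode("ascii")) for m in ASCII_RX.finditer(data)]
    found += [(m.start(), "U", m.group().decode("utf-16-le")) for m in WIDE_RX.finditer(data)]
    return sorted(found)


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:  # e.g. an extracted .exe or a slice of a DD image
        blob = f.read()
    print(f"entropy: {shannon_entropy(blob):.2f} bits/byte")
    for offset, kind, text in extract_strings(blob):
        print(f"{offset:#010x} {kind} {text}")
```

Keeping the offsets is what lets you jump from a grep hit in the extracted strings back to the exact location in the raw image.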

That’s it for now. This is, to a great extent, a subjective list of mine and should not be treated as a silver bullet. What worked for me & my cases may not work for you. And quite frankly – forensic analysis is very often less sexy than an outsider may think – it is a struggle against time, customer expectations and… fatigue. If faster case processing can at least reduce the workload, it is definitely worth thinking about.

Go ahead and create your own subjective list.

Automation vs. In-depth Malware Analysis – practically

February 9, 2012 in Malware Analysis, Preaching

You don’t need to read it if you are an experienced reverse engineer. You have been warned :)

In my old post about Automation vs. In-depth Malware Analysis I mentioned that dynamic analysis has its limitations. Just talking about this is not good enough, though, and I always wanted to provide a real-case example to back it up.

Today I came across a post from Webroot written by Dancho Danchev; the post talks about two client-side exploits serving malware campaigns. Since the blog entry provided the IP of a malicious web site, I visited it immediately to… well… get my test box infected ;)

The web site is serving the Blackhole exploit pack, and while it is an interesting subject for malware analysis, I was hoping more to find something to look at inside the payload – it’s good to see what happens to the system after it actually gets exploited by the latest badness. I didn’t need to wait long; the payload arrived pretty much right after I visited the malicious web site using old IE 6.0 (it’s very handy for exercises like this :) )

The web page shows the familiar Blackhole exploit pack loading screen.

In the background, the browser is being served various exploits, and once the page started loading, I immediately spotted a piece of malware running happily from my Application Data folder.

I collected the piece from the sandbox (together with its dropper, which was actually dropped and executed by the exploit pack but then quickly stopped its execution) and loaded the code of the payload dumped from memory into IDA. The code turned out to be typical malware stuff (downloads&executes stuff from a remote site), so not much to say about it really. What I spotted, though, is that there were two code branches inside WinMain that depend on the command line argument. And this gave me an idea to follow up on my old post.

It turns out the malicious .exe accepts two different command line arguments, ‘a’ and ‘s’.

One code branch is for a regular win32 application, and one for a service process started via StartServiceCtrlDispatcherW.

Not only may the service process executable rely on command line arguments that are hard to guess, it also needs to be handled differently – one can’t just execute a service process from a command line or Explorer and observe its behaviour (the service needs to be created first, then started, e.g. via sc.exe; attempting to run a service process from a command line will result in the ERROR_FAILED_SERVICE_CONTROLLER_CONNECT error).

Typical ‘service process’ testing fails if the ‘s’ command line argument is not provided; only when the correct argument is actually there does the service code branch run.

Note once more that I have been communicating with the malicious .exe via the sc.exe program, and not running it directly from a command line (which is how most dynamic analysis kicks off).
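To reproduce this kind of test, the create/start sequence can be scripted. A hedged Python sketch that only builds and, optionally, runs the sc.exe command lines; the service name and sample path are hypothetical, it assumes an elevated prompt inside an isolated analysis VM, and it relies on sc.exe expecting a space after option names such as `binPath=`, which is why the option and its value are separate arguments. The argument ‘s’ is part of the service's own command line (binPath), which is where WinMain sees it.

```python
import subprocess


def sc_commands(name: str, exe_path: str, arg: str = "s"):
    """Build sc.exe command lines that register exe_path as a service with `arg`, start it and query it."""
    bin_path = f'"{exe_path}" {arg}'  # the service command line, including the argument
    return [
        ["sc.exe", "create", name, "binPath=", bin_path],
        ["sc.exe", "start", name],
        ["sc.exe", "query", name],
    ]


def run_all(commands):
    """Run each command and print its output; requires Windows and admin rights."""
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(" ".join(cmd), "->", result.returncode)
        print(result.stdout.strip())


if __name__ == "__main__":
    # Hypothetical service name and path; only run this inside an isolated analysis VM.
    run_all(sc_commands("HfaTestSvc", r"C:\samples\payload.exe"))
```

Cleanup afterwards is the usual `sc.exe stop <name>` and `sc.exe delete <name>`.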

In other words, dynamic analysis has a long way to go to ‘cover all angles’; manual code inspection, i.e. analyzing the code with a good disassembler as you walk through it with a debugger and/or other helper software, is the best way to fully understand what’s going on.

OpenIOC – why it may work, but more importantly – why it may not… OpenHFA?

January 13, 2012 in Preaching

Recently there has been a lot of buzz about OpenIOC. The idea is great – you come across a new IOC (Indicator of Compromise), you describe it in XML, add it to a globally shared collection, and next time you or someone else runs a new case, you or they may get lucky, use the global intel and solve the case in no time.

Everyone wins.

Hmmm not really.

Why it may work

This is the shortest part of this article. Sharing intel is a no-brainer. It is truly a great idea. Seriously. Finally someone not only realized that just talking about sharing data about forensic artifacts in a unified, foolproof way is not good enough in the forensics world, but also that it is time to actually create a platform to do so. I personally don’t like XML, but it’s hard to think of a better way of sharing data without losing time on parsing badly-formatted CSV files, proprietary formats, etc. The other great thing about OpenIOC is that someone was brave enough to admit that forensic artifacts worth sharing within the community go beyond just a file name, its size, MD5, and as many versions of SHA as possible. Plus, of course, a fuzzy hash :-) .

This MAY actually work. But before it does, it’s good to try to analyze why it may not.

Why it may not work

TL;DR: Writing good quality IOCs is NON-TRIVIAL.

The example IOCs presented on the OpenIOC web site are decent, yet they focus on specific malware samples. This may be a great approach for a family of malware that doesn’t change often, is usually targeted, and stays under the radar of antivirus companies because… they never see the samples until it is really too late. Usually this kind of malware is distributed within a small, niche environment; Stuxnet, ATM malware, or malicious software used by carders to attack POS systems are good examples. Good as examples, but not that useful, practically speaking.

So, what are the potential challenges of OpenIOC?

The first problem is that malware / intruders’ activities:

  • are highly intelligent (never underestimate your opponent)
  • are randomized
  • are metamorphic or recompiled for new environments
  • use constantly changing reg entries, file names, domain names and rely on peer to peer network & fast-flux and many other techniques to avoid static and heuristic detection by whatever forensic artifact possible
  • rely on tools that prevent finding artifacts (timestomping, secure deletion, browser cache cleanup, DNS cache cleanup, etc.)

The second problem is a result of the first one – knowing that artifacts constantly change, potential IOC creators are forced either not to write IOCs at all (waste of time, etc.), or… to automate it.

But you see… you can’t automate it.

Depending on dynamic analysis is really not good enough. Malware often creates, modifies, deletes, and amends the system’s state in many different ways, often affecting ‘clean’ system areas that are accessed/created by clean components indirectly loaded/used by the malware, be it a WebControl, an external application called directly by the malware, or even registry settings of protectors used by both legitimate software and malware. Of course, the other problem is that dynamic analysis is not good enough to discover hidden paths in the analyzed code, e.g. code dependent on a file location, the current account name, or the presence of specific triggers, let alone code that is executed based on incoming messages from Command & Control.

Is it then worth doing in-depth analysis for the sole purpose of creating a very good IOC?

No. If the malware shows signs of randomization, anti-heuristic behaviour, fast-flux, you’d better leave it.

Now, imagine that IOCs are actually created as a side effect of in-depth malware analysis. How often will you really see these specific artifacts in the future? I tried a similar approach in the past and it does work for a specific environment or niche market, but not _globally_. Maybe I was just not lucky enough, yet the question remains. Working on your future cases, ask yourself – even if you see similar stuff, how often is it really something that can be so easily framed into a simple logical formula? It is rare, and that’s exactly what makes this job so interesting! :-)

The first and second problems create a combined problem. Loosely collected IOCs based on specific patterns/artifacts, even if shared globally, are nothing more than a blacklist. And blacklists simply don’t work (AV vendors have struggled with the same problem for years, and that’s why they focus their efforts so much on improving behavioural and heuristic detection instead of pattern-based or even algorithmic (but still static) detection).

The third problem is speed. It is not visible at the moment, but assuming that new IOC entries start being created often and eagerly and then shared globally, tools that handle them won’t be able to cope with the number of checks they need to perform on the evidence. Think of IDS/IPS systems or AV engines. The rules used to detect network anomalies or malware patterns are always converted to some clever structures, be it a trie, DFA, NFA, or some patent-covered solution – they are superoptimized, because they _need_ to be superfast to scan files for hundreds of thousands if not millions of signatures in a processing time as close to linear as possible! I don’t know the internals of tools currently relying on OpenIOC; I am just pointing out that search speed should remain relatively constant regardless of the number of sigs.
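To make the speed argument concrete: a textbook Aho-Corasick automaton, one of the trie-based structures mentioned above, scans the data once, and the scan cost grows with the data size plus the number of matches rather than with the number of signatures. A minimal Python sketch, assuming byte-string signatures; it illustrates the principle only and is not how any particular IDS/AV engine or OpenIOC tool works.

```python
from collections import deque


class AhoCorasick:
    """Textbook Aho-Corasick automaton over bytes."""

    def __init__(self, patterns):
        self.goto = [{}]  # state -> {byte: next state}; state 0 is the root
        self.fail = [0]   # failure links
        self.out = [[]]   # patterns recognised when entering a state
        for pat in patterns:
            state = 0
            for b in pat:
                if b not in self.goto[state]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[state][b] = len(self.goto) - 1
                state = self.goto[state][b]
            self.out[state].append(pat)
        # Breadth-first pass: a state's failure link points to its longest proper suffix in the trie.
        queue = deque(self.goto[0].values())
        while queue:
            s = queue.popleft()
            for b, t in self.goto[s].items():
                queue.append(t)
                f = self.fail[s]
                while f and b not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(b, 0)
                self.out[t] = self.out[t] + self.out[self.fail[t]]

    def search(self, data: bytes):
        """Yield (start_offset, pattern) for every occurrence, in a single pass over data."""
        state = 0
        for i, b in enumerate(data):
            while state and b not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(b, 0)
            for pat in self.out[state]:
                yield i - len(pat) + 1, pat
```

Building the automaton is paid once per signature set; adding more signatures makes the build slower but leaves the per-byte scanning loop essentially unchanged.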

The fourth problem is quality. IOCs need to be reviewed, revised, rejected. FPs or one-off IOCs will bring a lot of noise, and managing a collection of IOCs is as challenging as it is with IDS/IPS/AV sigs, if not more so (AV sigs are maintained within one company; OpenIOC could potentially be a hivemind).

The fifth problem is the ‘open’. Yes, sadly. There already exist portals to verify safely (with no submission to AV vendors) whether a new malware sample is detected by any AV. Adding a feature to pull the latest OpenIOC repository and check whether the system modifications introduced by a sample trigger any OpenIOC entries is trivial.

How to make it work?

The focus should not be on malware or constantly changing intrusion artifacts, but on highlighting artifacts that are universally useful to forensic investigators: artifacts that can be parsed and analyzed – basically, anything that can help to solve the case.

Examples include:

  • names and locations of log files + their format/parsers
  • all possible known autorun entries
  • timeline anomalies
  • discrepancies that could be described as subtle variations from the OS baseline (but not hashes!) – examples include calculating the Levenshtein distance between active process names and known clean process names, anomalous locations for active processes using names of known clean processes, discovering homographic attacks by finding Unicode characters in file/process names, and clustering file system entries
  • and many others
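The baseline-discrepancy checks above are easy to prototype. A minimal Python sketch, assuming a tiny hypothetical baseline of process names and their expected directories (a real one would come from a clean reference system): it flags non-ASCII characters in a name as a possible homograph, known names running from unexpected paths, and names within a small Levenshtein distance of a known one. The distance threshold of 2 is an arbitrary choice.

```python
# Hypothetical baseline: process name -> expected directory (lower-case).
KNOWN = {
    "lsass.exe": r"c:\windows\system32",
    "svchost.exe": r"c:\windows\system32",
    "csrss.exe": r"c:\windows\system32",
    "explorer.exe": r"c:\windows",
}


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def check_process(path: str, max_distance: int = 2):
    """Return a list of findings for one process image path."""
    findings = []
    directory, _, name = path.lower().rpartition("\\")
    if not name.isascii():
        findings.append(f"non-ASCII characters in {name!r} (possible homograph)")
    if name in KNOWN:
        if directory != KNOWN[name]:
            findings.append(f"{name} running from unexpected path {directory}")
    else:
        for known in KNOWN:
            d = levenshtein(name, known)
            if 0 < d <= max_distance:
                findings.append(f"{name} is {d} edit(s) away from {known}")
    return findings
```

A homograph such as a Cyrillic "ѕ" in place of the Latin "s" in svchost.exe trips both the non-ASCII check and the edit-distance check, which is the kind of overlap you want from generic rules.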

Alternative solution

Collecting IOCs may be helpful short-term, but collecting HFAs (Helpful Forensic Artifact) is the way to go.

OpenHFA anyone?

Updates:

- reviewed and clarified the language to remove ambiguity

Some good stuff to look at

December 10, 2011 in Preaching

There is a set of simple rules that we apply to all blog posts on Hexacorn:

  • No news (other bloggers and news aggregators do it very well already).
  • No previously published material (why repost).
  • No shameless plug (you read our stuff, you will know that we know what we are talking about; oops, this will never happen again, we promise :) ).
  • No censorship (facts).
  • Minimal number of external links (as per the Wikipedia article on Antoine de Saint-Exupéry… perfection is finally attained not when there is no longer anything to add, but when there is no longer anything to take away)

This post is an exception. And it is because there are websites/articles that stand out and need to be re-published over and over again so they are not lost. These are a must-read for any security professional/manager and real knowledge seeker:

How to learn reverse code engineering (RCE)

December 8, 2011 in Preaching

RCE is getting really popular and is really needed. It is helpful in malware analysis, debugging your own apps, solving crackmes, fixing bugs in abandonware, and it can be handy in localization. It makes you a better programmer as well. Of course, it also helps to steal and plagiarize code, bypass software protections, discover vulnerabilities, write shellcode and jailbreaks, stuxnets, rootkits and make people’s lives miserable and/or interesting in many other creative ways.

So, how to learn RCE and/or malware analysis?

There are many answers online and they vary a lot. Many people suggest books, tutorials, ebooks… on IDA, on assembly, on Reverse Engineering in general, some suggest doing courses and certificates, others watching youtube videos and some advise new adepts of RCE to simply stop wasting their time.

I would like to provide you with my own version, making it as minimalistic and, at the same time, as practical as possible. Yes, it is not complete, yes, it is far from perfect, yes, you are not going to analyze rootkits just yet and yes – it is Windows-oriented.

But…

If you read the stuff I point to, really focus on spending a few hours a week on actually making tons of mistakes, and avoid claiming victories easily achieved by using automation and tools developed by others, you are going to get there before you even realize it.

  • First, you need to learn about programming in general and actually start coding. You can't reverse engineer if you don't program; it is as simple as that. If you have programmed before, move on to the next point. If you haven't, don't buy heavy C++, C#, Java or Python reference books just yet. Buy a book that explains the fundamental architecture of Windows through silly but practical examples of simple programs. Try this classic book by Charles Petzold. Read it inside out, and take your time to actually _type_ the code listings. Yes, you heard that right. It's mundane and error-prone, yet this is how learning to program works. The only way is through a keyboard. So, get ready to invest quite a lot of time: you will be fixing typos and compiler errors, getting completely unpredictable results, and going through a lot of pain and stress along the way. Hint: it really helps if you are 10-15 years old right now :) .

  • Read other people's code. Skim through it, and if you find something interesting, read it more thoroughly and 'get it'. Again, there is no need to understand everything, but if you want to understand something, google around until you do. No, do not start reading Linux code just yet. Start with short code snippets on educational web sites. Look at the source code of some small but interesting and potentially malware-related projects. Just see how people do stuff and try to figure it out. This is actually the most crucial part of reverse engineering: it is not only about reading the code, browsing through listings, spotting known APIs, running 'strings' on a file, or playing around with 'Procmon', 'Dependency Walker' and 'GMER'. It is about trying to put yourself in the author's shoes for a moment. If you can figure out the thought process that led him or her to this or that implementation, you will make huge progress very quickly.

  • Learn a small subset of x86 assembly language. No, the Intel and AMD manuals are not a good start. Try Iczelion's Win32ASM tutorials first. Go through his examples one by one and read them thoroughly. Pay attention to syntax, conventions, comments, names, etc.; the code you see there is what you will most likely encounter during your first reversing exercises.

  • Refer to MSDN often. Any time you come across a new function name, look it up on Google or MSDN. Read about the concepts associated with the function (functions usually belong to some 'high level' topic, e.g. CreateFile to File Operations). Read the full description; don't be lazy. The bits you pick up along the way will give you invaluable insight in the future.

  • Only now should you start googling for tutorials on how to reverse/crack/debug applications, or buy books that will expand your knowledge. Yes, reversing requires a solid foundation in many areas of IT; if you don't know these basics, you will remain a tool user and no YouTube video or book on IDA can help you here…
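Since running 'strings' on a file comes up above as one of the most basic reversing steps, here is a minimal Python sketch of what such a tool does under the hood, so that it stops being a black box. It assumes a minimum run length of 4 printable ASCII characters (similar to the default of the Unix `strings` tool) and also looks for UTF-16LE ("wide") strings, which are common in Windows binaries. The function names are made up for this example.

```python
import re
import sys

def ascii_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII (0x20-0x7e) at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

def wide_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return UTF-16LE runs: each printable ASCII byte followed by a 0x00 byte."""
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    return [m.group().decode("utf-16-le") for m in pattern.finditer(data)]

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python mini_strings.py sample.bin
    with open(sys.argv[1], "rb") as fh:
        blob = fh.read()
    for s in ascii_strings(blob) + wide_strings(blob):
        print(s)
```

Writing even a toy like this makes it obvious why 'strings' output is both noisy (any 4 printable bytes qualify) and incomplete (anything encrypted or built at runtime never shows up).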

Automation vs. In-depth Malware Analysis

November 21, 2011 in Malware Analysis, Preaching

Nowadays many web sites offer services that can be called 'malware analysis for the crowd'. Web sites like VirusTotal, ThreatExpert, JSUnpack and many others provide file scanning/analysis functionality using multiple antivirus scanners and/or sandbox/live analysis, bundled with a bunch of other tools, e.g. file format analyzers, packer detectors, and so on. They do a really great job, and submitting samples to these services is one of the very first steps taken by many incident response handlers and forensic investigators all over the world. This post is my attempt to summarize my thoughts on automated malware analysis in general and on the voluntary submission of files to web sites owned by third parties.

You see… while these services are a great source of immediate intel, submitting samples to them is not always the best choice. There are real-life situations where it is not only a bad idea, but may also be very costly to your company or your customer, both on the PR and the financial side of things. So, while I do not oppose these services, I do believe that some serious thought needs to be given to it, and of course, _before_ the submission. It is also my strong belief that you can't rely on information you cannot verify yourself (if asked to). And if you do, you not only deprive yourself of the pleasure of finding things out, but also risk drawing incorrect conclusions.

The list below is obviously far from complete:

  • The sample may be part of a targeted attack
    • Samples submitted to these services are shared; they are shared for a good purpose, of course (to produce AV signatures and provide better detection), but… sooner or later one of these sensitive samples may fall into the hands of a person who will eagerly write a cool blog post about it (and frankly speaking, that will be a great blog entry!)
    • Malware containing passwords, credentials used for data exfiltration, or data that clearly identifies the customer is getting more and more common. Trust me, there are many malicious samples out there that contain very sensitive data inside their code, and you really don't want them to be shared. Researchers working for security companies know about it: they actively search for interesting samples, because any new technique, new Rustock, Stuxnet, etc. will not only boost the company's profile and the researcher's own personal image but, more importantly, also allow them to escape the daily routine of writing signatures and focus on cool stuff (you know who you are ;) )
  • AV scan is helpful to identify the malware, yet…
    • With the number of malware samples collected by AV companies being extremely high, it's easy for a particular file to be misclassified
    • Many AV companies use generic names like ‘trojan horse’, ‘trojan generic’, ‘heuristic badness’ etc.; this doesn’t really answer the question ‘what does this malware do’
    • AV companies may use other AV vendors' scanners to automatically process large sample sets, so a misclassified sample can easily pass its incorrect classification on to other vendors (a fun fact: in 2010, one of the leading AV vendors pulled the leg of other vendors by generating 20 dummy malware samples, creating detections for them and submitting them to VirusTotal; in less than 2 weeks, more than 10 vendors detected these files as malicious!)
    • Even scans with products from multiple AV vendors don't guarantee detection; most AV engines do not detect new samples fast enough, so you will often be left on your own with new or targeted malware (take note of this point: AV is still more reactive than proactive, since someone needs to submit a sample before a signature can be created)
    • False Positives are still there
  • Sandbox/live analysis is by its nature limited
    • It is not interactive, or the interaction is very limited; that is the trade-off for being easy to use. You only see a data dump and a subset of artifacts, without understanding the code or the context in which these artifacts were created (of course, this is often enough to answer 'is it malicious?', but not 'what does it really do?')
    • It doesn't use your company's baseline build, so the tested malware runs in an environment completely different from your company's and may behave differently. Practically speaking, if you are an incident responder interested in domains you want to block, or a forensic investigator, you can't rely on the results of this analysis alone; you may miss some of the artifacts the malware would have produced had it been executed in a slightly different environment or at a different time
    • Many malicious samples come with an anti-sandboxing technology; it is very simple to use and quite hard to bypass
  • Dynamic analysis in general is also very limited by its nature
    • It misses a lot of code branches, including dead code (some malware authors still use older compilers, which can leave dead code in the executable); in some cases, dead code reveals crucial information about malware authors or their modus operandi
    • It misses a lot of code and data that is generated or decrypted at runtime, etc.
    • It misses the metadata associated with the sample: coding style, copy-pasted routines, hidden messages, config data, etc.
    • It assumes the malware does its dirty work immediately; this assumption is easily defeated by a long delay or other tricks, e.g. a built-in 'expiration date' or a system/hardware ID check (that is, some malware is pre-compiled to work on a specific system only)
    • Many malware samples used in targeted attacks won't work in an environment that lacks specific files/paths/registry keys and will exit immediately; Stuxnet and credit card dumpers are good examples
    • Certain functions of malware are executed only if a specific application is running (e.g. browser, IM software)
    • It doesn’t work well for components e.g. DLL files (if they export functions, you don’t know what arguments to pass)
    • It doesn’t work well for kernel mode drivers, as well as PDFs, SWF, Java, DEX, SIS, and hundreds of other file formats that you will come across in your career
    • It doesn’t work for server-side malware
    • It also doesn’t work well for malware that expects… command line arguments
    • and a million other reasons…
  • Last, but not least: if you are using an older browser, you are providing the web site with the full path to the sample's location on your hard drive; this may look innocent, but you may be revealing information about your customer, your current case or even your own company or credentials (%USERPROFILE%\Desktop\ACMECASE\sample.zip is a really bad place to keep your samples)
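To make the points about delays, expiration dates and system IDs more concrete, here is a deliberately harmless Python toy. The "sample" only returns a label, and the expiration date and hostname are hypothetical values invented for this illustration. A single sandbox run observes exactly one outcome, while reading the code reveals every branch.

```python
from datetime import date

# Hypothetical trigger values, invented purely for this illustration.
EXPIRATION_DATE = date(2012, 12, 31)
TARGET_HOST = "ACME-POS-01"

def toy_sample(today: date, hostname: str) -> str:
    """Harmless stand-in for environment-keyed malware: returns the branch taken."""
    if today > EXPIRATION_DATE:
        return "expired"      # built-in 'expiration date': do nothing after it
    if hostname.upper() != TARGET_HOST:
        return "wrong-host"   # not the intended system: exit quietly
    return "payload"          # reached only on the intended system, in time

if __name__ == "__main__":
    # A sandbox running this today only ever observes one outcome...
    print(toy_sample(date.today(), "SANDBOX-VM"))  # prints "expired"
    # ...while reading the code shows that the other branches exist too.
    for when, host in [(date(2012, 6, 1), "SANDBOX-VM"),
                       (date(2012, 6, 1), "acme-pos-01")]:
        print(when, host, "->", toy_sample(when, host))
```

Real samples hide such checks behind obfuscation and may compare hashes rather than plain strings, but the consequence for automated analysis is the same: the interesting branch never runs in the sandbox.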

As you can see, there are many reasons why you should be careful when handling samples extracted from your own or your customers' systems. There are companies out there that have been exposed because samples targeting their systems leaked to the public.

It also makes sense to invest time in learning how to do in-depth malware analysis in-house, or at least to find a trusted specialist to help you with this task. That way you can stand by any claim coming out of your analysis, and more importantly, you will also have a lot of fun cracking the malware.

The bottom line is:

  • Use automation as much as you can
  • Think twice before you submit samples to web sites owned by third parties and, more importantly, assume and accept that you lose control over the distribution of your samples
  • Use data from multi AV scan/sandbox/live analysis as a foundation for further analysis, not as a final conclusion
  • Do not trust threat names provided by automated tools, and understand that the differences between threat types are getting more and more blurry; even if some malware is called a virus or a trojan, it may also include worm capabilities, rootkit functionality and MBR infection routines
  • If you add results of automatic analysis to your reports, do your homework and confirm findings manually, or state that it is impossible (and provide the reasons)
  • Do learn and use in-depth malware analysis techniques, but also understand that they have limitations as well: some malware takes months to develop and is improved over time, often reaching a level of complexity that makes its analysis really hard; sometimes it is just not worth it
  • Read other blogs: just because one guy says something doesn't mean it is correct; learn to question everything and trust only stuff that is peer reviewed
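One way to act on the advice about thinking twice before submitting, without giving up public intel entirely, is to search for a sample by its hash instead of uploading the file; public services such as VirusTotal accept hash searches. The sketch below computes the usual digests locally. Note that a hash lookup still tells the service that someone is interested in that file; it only avoids handing over the contents. The function names are hypothetical.

```python
import hashlib
import sys

ALGORITHMS = ("md5", "sha1", "sha256")

def hash_chunks(chunks) -> dict[str, str]:
    """Feed an iterable of byte chunks into several hashers at once."""
    hashers = {name: hashlib.new(name) for name in ALGORITHMS}
    for chunk in chunks:
        for h in hashers.values():
            h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}

def file_hashes(path: str, chunk_size: int = 1 << 20) -> dict[str, str]:
    """Hash a file in chunks (1 MiB by default) so large samples never sit fully in memory."""
    with open(path, "rb") as fh:
        return hash_chunks(iter(lambda: fh.read(chunk_size), b""))

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python sample_hashes.py sample.bin
    # Search for these digests on the service instead of uploading the file.
    for name, digest in file_hashes(sys.argv[1]).items():
        print(f"{name:<6} {digest}")
```

Hashing in chunks rather than reading the whole file keeps memory use flat, which matters when the "sample" turns out to be a multi-gigabyte disk image.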