
Enter Sandbox – part 7: Hello, مرحبا, 您好, здравствуйте, γεια σας

June 27, 2015 in Batch Analysis, Malware Analysis, Sandboxing

Most modern applications use Windows APIs that rely on Unicode (or at least a subset of it), and so they call the ‘W’ versions of the APIs, as opposed to older apps that used the ANSI ‘A’ versions (f.ex. CreateFileW vs. CreateFileA). Of course, the native APIs have relied on Unicode for a long time. Unicode makes things easy and avoids the ambiguities associated with ANSI encodings, where the same bytes can map to many different character sets depending on the OS/application version. This is why running old localized applications on an English OS leads to unrecognizable garbage characters shown in the UI.

The number of old apps that rely on ANSI functions is still huge, and not taking them into account makes it harder to cherry-pick interesting clues from the samples. Some of these clues can make it into the final report as well and enrich it a lot.

Let’s look at an example.

An application does something, and then displays a message box with a caption ‘Îøèáêà’ saying ‘Çàïðàøèâàåìûé ôàéë íå íàéäåí’.

Obviously, it doesn’t tell us much.

What if we attempted to translate it blindly into Unicode using the most popular ANSI encodings?

We would get something like this:

1250 (Central Europe)           = Îřčáęŕ
1251 (Cyrillic)                 = Ошибка
1252 (Latin I)                  = Îøèáêà
1253 (Greek)                    = Ξψθακΰ
1254 (Turkish)                  = Îøèáêà
1255 (Hebrew)                   = ־רטבךא
1256 (Arabic)                   = خّèلêà
1257 (Baltic)                   = Īųčįźą
1258 (Vietnam)                  = Îøèáêà
 874 (Thai)                     = ฮ๘่แ๊เ
 932 (Japanese Shift-JIS)       = ホ碎
 936 (Simplified Chinese GBK)   = 硒栳赅
 949 (Korean)                   = 丘矮魏
 950 (Traditional Chinese Big5) = 昮魨罻

for the caption, and for the message:

1250 (Central Europe)           = Çŕďđŕřčâŕĺěűé ôŕéë íĺ íŕéäĺí
1251 (Cyrillic)                 = Запрашиваемый файл не найден
1252 (Latin I)                  = Çàïðàøèâàåìûé ôàéë íå íàéäåí
1253 (Greek)                    = Ηΰοπΰψθβΰεμϋι τΰιλ νε νΰιδεν
1254 (Turkish)                  = Çàïğàøèâàåìûé ôàéë íå íàéäåí
1255 (Hebrew)                   = ַאןנארטגאולי פאיכ םו םאיהום
1256 (Arabic)                   = اàïًàّèâàهىûé ôàéë يه يàéنهي
1257 (Baltic)                   = Ēąļšąųčāąåģūé ōąéė ķå ķąéäåķ
1258 (Vietnam)                  = Çàïđàøèâàǻûé ôàéë íå íàéäåí
 874 (Thai)                     = วเ๏๐เ๘่โเๅ์๛้ ๔เ้๋ ํๅ ํเ้ไๅํ
 932 (Japanese Shift-JIS)       = ヌ瑜籵褌隆 鴉 淲 浯鱠褊
 936 (Simplified Chinese GBK)   = 青镳帏桠噱禧?羿殡 礤 磬殇屙
 949 (Korean)                   = 행穽星外齧荏?牒雨 張 壯藕孼
 950 (Traditional Chinese Big5) = 瀔僤魤馲檞?邍澣 翴 縺毈樇

Even without knowledge of the specific languages it’s easy to pick the correct mapping: code page 1251 gives readable Russian, ‘Ошибка’ (meaning ‘Error’) for the caption and ‘Запрашиваемый файл не найден’ (meaning ‘The requested file was not found’) for the message.
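The blind translation above can be reproduced in a few lines. This is only a minimal sketch, assuming the garbled text was captured as it renders under code page 1252: it recovers the raw bytes and decodes them with Python's standard codecs for the code pages listed in the tables. The names `CODE_PAGES` and `candidate_decodings` are made up for illustration.

```python
# Candidate ANSI code pages, in the same order as the tables above.
CODE_PAGES = [
    "cp1250", "cp1251", "cp1252", "cp1253", "cp1254", "cp1255", "cp1256",
    "cp1257", "cp1258", "cp874", "cp932", "cp936", "cp949", "cp950",
]


def candidate_decodings(raw: bytes) -> dict:
    """Decode the same byte string under every candidate code page."""
    # errors="replace" keeps going when a byte sequence is invalid
    # in a given code page instead of raising UnicodeDecodeError.
    return {cp: raw.decode(cp, errors="replace") for cp in CODE_PAGES}


# Recover the original bytes from the garbled caption (assumed cp1252 rendering).
caption_bytes = "Îøèáêà".encode("cp1252")
decoded = candidate_decodings(caption_bytes)
```

Printing `decoded` gives one candidate per code page, and a human (or a language-detection module) can then pick the entry that reads as real text.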

We can confirm it by running it on the Russian OS:


The exercise above, my friend, is an attempt to make a sandbox polyglot. Add some modules to recognize the most common languages and who knows, maybe it will be able to recognize that these calls to FindWindow know no linguistic boundaries and are… not too friendly:

  • Скрытый процесс запрашивает сетевой доступ
  • Hidden Process Requests Network Access
  • Ein versteckter Prozess verlangt Netzwerkzugriff.
  • Un proceso oculto solicita acceso a la red
  • Un processus cache requiert une connexion reseau.
  • Внимание: некоторые компоненты изменились
  • Warning: Components Have Changed
  • Warnung: Einige Komponenten wurden verandert.
  • Advertencia: Los componentes han cambiado
  • Avertissement : Les composants ont change
  • Menedżer Zadań Windows
  • Создать правило для
  • Create rule for
  • Regel fur
  • Crear regla para
  • Creer une regle pour
  • 瑞星杀毒软件
  • 登录信息
  • 文件保护
  • 월드 오브 워크래프트
  • 삼국지
  • 하이로우2
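One way such a module could flag these window titles is sketched below. It assumes the monitor hands us the raw `lpWindowName` bytes passed to FindWindowA, tries each candidate code page, and checks the result against a list of known titles. `KNOWN_TITLES` holds just a few entries from the list above, and `match_window_title` is a hypothetical helper, not part of any real sandbox.

```python
# A few titles taken from the list above; a real list would be much longer.
KNOWN_TITLES = {
    "Создать правило для",
    "Create rule for",
    "Menedżer Zadań Windows",
    "瑞星杀毒软件",
}

CODE_PAGES = ("cp1250", "cp1251", "cp1252", "cp936", "cp949", "cp950")


def match_window_title(raw: bytes):
    """Return (code_page, title) if the bytes decode to a known title, else None."""
    for cp in CODE_PAGES:
        try:
            text = raw.decode(cp)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this code page
        if text in KNOWN_TITLES:
            return cp, text
    return None
```

For plain ASCII titles every code page decodes to the same text, so the reported code page is simply the first one tried.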

Enter Sandbox – part 6: The Nullsoft hypothesis and other installers’ conundrums

June 26, 2015 in Batch Analysis, Clustering, Malware Analysis, Sandboxing

Monitoring system services, Native APIs, Windows APIs and COM is a good start, but the monitoring capabilities can always be extended. In this post I will focus on one particular category of monitoring which I believe is often overlooked.

I am talking about installers and their plug-ins – with a main focus on the Nullsoft Installer (although I will talk about various installers in general).

I wrote about, or at least mentioned installers a couple of times before:

The reason why monitoring installers can be interesting is that it may provide extra insight into the workings of a binary (mind you, my focus in this series is on manual in-depth analysis more than automated analysis, although the latter could still benefit from the discussed topics too).

And why is that interesting?

  • There are classes of malware that rely on installers as the main way of infection
  • There are plug-ins (DLLs) dropped and loaded by installers that do stuff that may explain why certain things work/don’t work
  • Some plug-ins offer decoding/decompression/decryption/anti-sandboxing capabilities – intercepting the function calls responsible for this functionality can shed some more light on the inner workings of the malware (and help to write behavioral rules)
  • Some installers use encryption, and passwords can only be intercepted by analyzing the internals of the installer/plug-ins
  • Some plug-ins use novel techniques to do stuff – they leverage COM, .NET, WMI to retrieve information about the system, download payloads, etc. – some of these cannot be intercepted on the Native/Windows API level and even if they are, the context is lost
  • etc.

Before we talk about the dynamic analysis of installers let’s look at a typical installer first.

Static analysis of installers

A typical installer contains a stub followed by a compressed/encrypted data blob. The stub is a legitimate program written by whoever designed the installer. As such, its detection cannot be used as a reliable mechanism to detect the malware using the installer, since a signature on the stub alone (i.e. one that does not look at the appended data) would ‘hit’ many legitimate applications sharing the very same stub. In other words, the actual malware is never ‘seen’ as it is hidden in the package handled by the ‘always-clean’ stub. Static analysis of such installer files is hard, because it requires someone to write an unpacker dedicated to that particular installer (or, more specifically, to a particular version of it) and test it, in the hope that it will then handle a whole class of installers.
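To make the stub/blob split concrete, here is a rough sketch that finds where the appended data starts in a PE file by taking the end of the last section's raw data. It uses only the standard PE header layout; `overlay_offset` and `split_stub_and_blob` are illustrative names. Note the assumption that everything past the last section is the installer blob, which is not always true (an Authenticode signature, for example, also lives there).

```python
import struct


def overlay_offset(data: bytes) -> int:
    """Return the file offset where data appended after the PE image begins."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_off:pe_off + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    (num_sections,) = struct.unpack_from("<H", data, pe_off + 6)
    (opt_size,) = struct.unpack_from("<H", data, pe_off + 20)
    table = pe_off + 24 + opt_size  # section table follows the optional header
    end = 0
    for i in range(num_sections):
        # SizeOfRawData and PointerToRawData sit at +16 and +20 of each 40-byte entry.
        raw_size, raw_ptr = struct.unpack_from("<II", data, table + i * 40 + 16)
        end = max(end, raw_ptr + raw_size)
    return end


def split_stub_and_blob(data: bytes):
    """Split a file into (stub, appended blob) at the overlay offset."""
    off = overlay_offset(data)
    return data[:off], data[off:]
```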

The problem of course is that:

  • there are lots of different installers
  • there are lots of very similar installers, but created either as a result of evolution (subsequent versions, localized versions, ANSI/Unicode, etc.), or private/customized versions that are spin-offs of legitimate repos available online (these f.ex. wipe out markers and change bits here and there to make the unpacking impossible w/o manual effort)

The net result is that it’s hard for an unpacker to handle it all, and static analysis may simply not work (again, unless someone uses a dedicated, naive algorithmic detection that could work f.ex. like this: detect a known stub, check the size of the appended data, if it matches the expected range, check some specific values in the appended data region, then ‘detect’; of course, such detection is of little use since it is not generic: it only matches 1-to-1, i.e. each file requires a dedicated signature).
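The naive detection described in the parentheses could look roughly like the sketch below. Everything in it is hypothetical: the `StubSignature` fields, the idea of identifying the stub by a SHA-256 over a fixed-size prefix, and the single marker check are assumptions chosen to mirror the steps in the text, not a description of any real AV engine.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class StubSignature:
    """One dedicated signature; every sample needs its own (hence 1-to-1)."""
    stub_size: int      # length of the known installer stub
    stub_sha256: str    # hash of the known stub
    min_overlay: int    # accepted size range of the appended data
    max_overlay: int
    marker_offset: int  # offset inside the appended data
    marker: bytes       # specific bytes expected at that offset


def naive_detect(data: bytes, sig: StubSignature) -> bool:
    stub, overlay = data[:sig.stub_size], data[sig.stub_size:]
    # Step 1: is this the known (clean) stub?
    if hashlib.sha256(stub).hexdigest() != sig.stub_sha256:
        return False
    # Step 2: does the appended data fall into the expected size range?
    if not sig.min_overlay <= len(overlay) <= sig.max_overlay:
        return False
    # Step 3: do the specific values in the appended data match?
    end = sig.marker_offset + len(sig.marker)
    return overlay[sig.marker_offset:end] == sig.marker
```

The weakness is visible in the data structure itself: change a single byte of the marker or pad the blob, and a new signature is required.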

Detection of dropped files makes it a bit easier for the AV, which can pick them up at run-time, but there is really nothing that could prevent a malware author from keeping the main payload inside the actual installer and making it simply run persistently on the system. With only the installer to look at, this may be a relatively difficult target for manual analysis.

Luckily, the RCE community has addressed unpacking of many popular installers and there are plenty of tools that help with the task of decompilation, including, but not limited to:

  • 7Z decompiles many files, including SFX files; earlier versions of 7z decompile even some versions of the Nullsoft installers
  • InnoSetup installers can be decompiled by Inno Setup Unpacker
  • AutoIt executables (not really installers, but kinda similar) can be unpacked with Exe2Aut
  • There are many decompilers for less popular installers (it’s easy to find them online) – there is also a dedicated project that focuses on decompiling all the possible goodness called Universal Extractor – at this stage the project seems to be stalled, but it’s still a very good tool for many installers
  • Many installers can be decompiled by the Plugins for Total Commander: InstallExplorer 0.9.2, InstallExplorer Port
  • RARSFX can be decompiled using winrar/rar
  • etc.

There are also ways to extract some data from files f.ex.:

  • 7ZSFX files
    • <installer filename>  -sfxconfig foo – ‘sfxconfig’ is a command that extracts the embedded config from the SFX file and saves it to a ‘foo’ file – an example of an extracted config file that instructs SFX to execute malware.exe file after the installation is shown below (GUIMode = “2” hides the GUI, i.e. silent mode; the “hidcon:” prefix is used to hide the console window)
  • RAR SFX files
      • rar cw <archive filename> foo – ‘cw’ is a command that extracts the comment from the .rar archive (including sfx) and saves it to a ‘foo’ file – an example of a SFX file that instructs SFX to execute malware.exe file after the installation is shown below
      ;The comment below contains SFX script commands

So, static analysis is often possible to a certain extent, and all the available tools can make it relatively painless. I think it would definitely be interesting to see some of these commands and tools deployed by commercial sandboxes as well – the reporting capabilities would increase a lot.
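As an illustration of how cheaply a sandbox could add such a step, the sketch below pulls the config out of a 7z SFX file without running it. It relies on the text markers that 7z SFX modules put around the config (`;!@Install@!UTF-8!` and `;!@InstallEnd@!`); the function name is made up, and files built with customized modules may use different markers, so treat this as a starting point only.

```python
from typing import Optional

CONFIG_START = b";!@Install@!UTF-8!"
CONFIG_END = b";!@InstallEnd@!"


def extract_7zsfx_config(data: bytes) -> Optional[str]:
    """Return the config text embedded in a 7z SFX file, or None if absent."""
    start = data.find(CONFIG_START)
    if start == -1:
        return None
    start += len(CONFIG_START)
    end = data.find(CONFIG_END, start)
    if end == -1:
        return None  # truncated or non-standard config block
    return data[start:end].decode("utf-8", errors="replace").strip()
```

The returned text can go straight into the report, where entries that hide the GUI or the console window stand out immediately.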

Dynamic analysis of installers

Coming back to the dynamic analysis of installers – you may still wonder why we would even go down this path if there are so many tools and tricks available on the static level.

Here is why – the example below shows some logs from my PoC monitor for some of the Nullsoft installer Plug-Ins:

Call::in (*(i,i,i,i)i.r1)
Call::in (USER32::GetWindowRect(ir2,ir1))
Call::in (USER32::MapWindowPoints(i0,ir0,ir1,i1))
Call::in (*1545016(i.r6,i.r7))
Call::in (USER32::GetClientRect(ir2,ir1))
Call::in (*1545016(i,i,i.r8,i.r9))
Call::in (*1545016(i,i,i.r3,i.r4))
Call::in (USER32::SetWindowPos(ir2,i,i,i,ir3,ir4,i6))
Call::in (USER32::CreateWindowEx(i0,t "Button",t "Make JeezBar my default search engine",i 0x40000000|0x10000000|0x04000000|0x00010000|0x00000000|0x00000C00|0x00000003|0x00002000,ir6,ir7,ir8,ir9,ir0,i666,i0,i0)i.r2)
Call::in (USER32::CreateWindowEx(i0,t "Button",t "Make JeezBar my home page",i 0x40000000|0x10000000|0x04000000|0x00010000|0x00000000|0x00000C00|0x00000003|0x00002000,ir6,ir7,ir8,ir9,ir0,i667,i0,i0)i.r3)
Call::in (USER32::CreateWindowEx(i0,t "Button",t "Restart IE (if running)",i 0x40000000|0x10000000|0x04000000|0x00010000|0x00000000|0x00000C00|0x00000003|0x00002000,ir6,ir7,ir8,ir9,ir0,i668,i0,i0)i.r4)

Another example:

Call::in (kernel32::GetTickCount()i .r0)
get (szURL=hxxp://
Call::in (kernel32::CreateMutexA(i 0, i 0, t "thlp_mutex") ?e)
Call::out=SendRequest Error
get (szURL=hxxp://

As you can see, you can instantly discover the inner workings of the Nullsoft Script used by the installer. None of this can be seen using standard API monitoring. If you do offline analysis, intercepting some of these calls may give you clues that otherwise would be very difficult to obtain (e.g. URLs, sequence of internal instructions, possible log messages, etc.).
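Logs like these are easy to post-process. The sketch below is written against the log format of the PoC output shown above (which is specific to my monitor, so the regular expressions are assumptions about that format only). It pulls out the DLL, the function name and any string arguments from a `Call::in` line.

```python
import re

# Matches lines such as: Call::in (USER32::CreateWindowEx(...
CALL_RE = re.compile(r"^Call::in \((\w+)::(\w+)\(")
# String arguments are written as: t "some text"
STRING_ARG_RE = re.compile(r'\bt\s*"([^"]*)"')


def parse_call(line: str):
    """Return (dll, function, [string args]) or None for other log lines."""
    m = CALL_RE.match(line)
    if m is None:
        return None  # e.g. calls through a raw pointer such as *1545016(...)
    return m.group(1), m.group(2), STRING_ARG_RE.findall(line)
```

Fed with the CreateMutexA line above, this gives the mutex name directly, which is exactly the kind of clue that can be turned into a behavioral rule.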

Now, how would you go about monitoring these calls?

Since I mentioned I am going to talk about Nullsoft Installer I will focus only on this particular installer here.

At this moment, there are at least 430+ versions of Nullsoft Installer in my repository. These come in various forms and flavors – lots of various versions, localized, etc. There are probably more than I counted, since many of the custom ones are modified in a way that makes it harder to distinguish them in a generic fashion. In any case, it’s quite a variety.

What is common about all these varieties is that they share plug-ins; there are a couple of plug-ins that are very popular f.ex. system, or inetc, execcmd, nsprocess, etc., but this is just the tip of the iceberg…

After running some scripts and hacking things around I counted over 5K different Nullsoft plugins and it was quite clumsy work, so I bet there are at least twice as many :)

Where do we take it from here? Pick up the most common ones, recognize them when they are loaded, hook their functions and … log them.

Just in case it was not obvious yet – note that despite sharing the same name, many of them differ from each other – some process ANSI, some Unicode. Some implement additional functionality and all they share is a name and… a list of exported functions.
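The recognition step could be sketched like this: when a DLL is loaded, look it up by name and intersect its actual exports with the ones worth hooking. The `HOOK_TARGETS` table is a hypothetical example, and its export names are illustrative assumptions that need to be verified against real plug-in builds. The export list itself is assumed to come from elsewhere (e.g. a PE parser), and the actual hooking is out of scope here.

```python
from pathlib import PureWindowsPath

# Hypothetical table: plug-in name -> exports we would like to hook.
# The export names are illustrative; check them against real samples.
HOOK_TARGETS = {
    "system": {"Call", "Get", "Alloc", "Free"},
    "inetc": {"get", "post", "head"},
}


def plan_hooks(dll_path: str, exports) -> list:
    """Return the sorted export names to hook for a freshly loaded DLL."""
    name = PureWindowsPath(dll_path).stem.lower()
    wanted = HOOK_TARGETS.get(name)
    if wanted is None:
        return []  # not a plug-in we know about
    # Same-named plug-ins differ between builds, so only hook what is present.
    return sorted(wanted & set(exports))
```

Matching on the name first and on the exports second reflects the observation above: the name and the export list are the only things the variants reliably share.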

In any case, for the most popular ones it may add an extra layer of information and help in creating behavioral rules as well as getting to understand the inner workings of the samples.