hstrings – when all strings are attached…

TL;DR;

A new strings tool that attempts to extract localized strings (e.g. French, Chinese) from an input file; see the example below.

Intro

Traditional strings utilities are usually limited to ANSI/Unicode-LE/Unicode-BE strings. This is understandable, as these are the most prevalent types of strings that we come across in our daily work. However, many files contain more strings than that – we usually miss them because they contain accented letters, and these break the typical string extraction algorithms. On top of that, there are a lot of character encodings out there, which makes it non-trivial to pick up the right bytes with a regular expression or a state machine. One can have accented letters saved as Unicode-LE, Unicode-BE, UTF-8, or in one of many legacy encodings, e.g. Windows code pages or IBM EBCDIC encodings.
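To illustrate the limitation, here is a minimal sketch of the classic approach. It is not the code of any particular strings tool; the function name classic_strings and the minimum length of 4 are my own assumptions. It looks for runs of printable ASCII and for printable ASCII interleaved with zeros, and shows how a short Luxembourgish phrase saved as Windows-1252 falls apart.

```python
import re

def classic_strings(data, min_len=4):
    """Classic extraction: printable ASCII runs plus ASCII interleaved with zeros."""
    # Runs of printable ASCII bytes (0x20-0x7E).
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # 'Simplified Unicode': printable ASCII bytes each followed by a zero byte (UTF-16LE).
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_re.finditer(data)]
    return found

# The accented bytes (0xE4, 0xDF, 0xE9) cut the phrase into short fragments,
# and only one fragment reaches the minimum length.
sample = "Fläiß ménger".encode("cp1252")
print(classic_strings(sample))  # → ['nger']
```

Everything except the tail of the second word is lost, which is exactly the kind of string this post is about.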

For quite some time I had an idea in mind to write a smarter string extraction program that would take this localization/encoding mess into account, so even before I released RUStrings I had already been thinking about writing something more generic. In other words, I wanted to write a tool that can extract strings from a file in any well-known encoding and language possible.

As usual – I didn’t know what trouble I was getting myself into when I began :).

As mentioned earlier, there are many encodings used by various platforms, and the same string of bytes can be… random garbage… or it can represent a string of characters encoded in one of at least 150 possible encodings, including not only legacy encodings but also Unicode. And not Unicode seen as a subset of characters belonging to the ASCII set interleaved with zeros (the ‘simplified Unicode’ that string extraction tools rely on), but Unicode that includes blocks dedicated to specific languages and scripts, e.g. Chinese, Cyrillic, Hangul, etc.

The tool I present below attempts to:

  • read an input file,
  • walk through the file content,
  • apply heuristics and find characters encoded as:
    • bytes (ANSI and other legacy character sets)
    • words (Unicode LE, Unicode BE, and DBCS)
    • byte sequences (UTF-8, UTF-7, MBCS – multibyte encodings e.g. iso-2022-jp (Japanese), GB18030 (Simplified Chinese), etc.)
  • it then normalizes these code points to Unicode LE
  • and appends the strings to an output file for a specific encoding
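The steps above can be approximated with a brute-force sketch. This is not how hstrings is implemented (it uses state machine-like heuristics in a single pass); here I simply decode the whole buffer once per candidate encoding and keep runs of letters that contain at least one non-ASCII character. The CANDIDATES list, the WORDS pattern, and the function name extract_localized are assumptions made for illustration.

```python
import re

# Hypothetical subset of encodings to try; a real tool would cover many more.
CANDIDATES = ["cp1251", "cp1253", "utf-8", "utf-16-le", "utf-16-be"]

# A word of 3+ letters (any script), optionally followed by more space-separated words.
WORDS = re.compile(r"[^\W\d_]{3,}(?: [^\W\d_]+)*")

def extract_localized(data, encodings=CANDIDATES):
    """Return {encoding: [strings]}, one result list per decoder."""
    results = {}
    for enc in encodings:
        # Undecodable bytes become U+FFFD, which is not a letter and so ends a run.
        text = data.decode(enc, errors="replace")
        hits = [m.group() for m in WORDS.finditer(text)]
        # Keep only hits with at least one non-ASCII character (the 'localized' part).
        results[enc] = [h for h in hits if any(ord(c) > 127 for c in h)]
    return results

blob = b"\x00\x01" + "Нам аутым".encode("cp1251") + b"\x00\x02"
for enc, strings in extract_localized(blob).items():
    print(enc, strings)
```

Keeping one result list per encoding mirrors the one-output-file-per-encoding behaviour described in this post.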

At this point the program is in alpha, as I am still not sure how to present the output properly. Currently the program generates a lot of output files. Way too many. But it is not trivial to make it simpler.

From a data processing perspective it is actually quite a complex problem – since bytes can be interpreted in many ways, the program needs to show all the possible strings extracted from a file. The same string of bytes can easily be interpreted as some legacy ANSI code page (actually, almost all of them simultaneously) or as a Chinese multibyte encoding. The program then needs to normalize the output to Unicode, so we have multiple Unicode streams coming out of multiple decoders for the same location in the file. My detection algorithm relies on state machine-like heuristics and outputs data as it walks through the file. Since the various encoding heuristics are applied at once (one pass through the file), writing everything to a single file may cause race conditions, and streams from various decoders can start interleaving – leading to a mess. So, currently the output goes to different files. I have a few ideas on how to solve this, but each has a trade-off associated with it, so stay tuned 🙂
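A small sketch of the ambiguity described above: the same four bytes are valid Chinese in GB18030 and, at the same time, perfectly decodable text in several single-byte code pages. The helper name interpretations and the list of encodings are my own choices for the demonstration.

```python
def interpretations(raw, encodings=("gb18030", "cp1251", "cp1252", "cp1253", "koi8_r")):
    """Decode the same bytes with every candidate encoding."""
    # errors="replace" keeps the sketch from raising on bytes a code page leaves undefined.
    return {enc: raw.decode(enc, errors="replace") for enc in encodings}

raw = "主谷".encode("gb18030")  # two characters taken from the Chinese sample text
for enc, text in interpretations(raw).items():
    print(f"{enc:8} {text!r}")
```

Each decoder yields its own Unicode stream for the same file offset, which is why merging them into a single output is awkward.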

Okay, enough babbling and boring theory – let’s look at an example.

EXAMPLE

First, we need to create a few sample text files that contain some random text in various languages, saved in many different encodings.

I generated a few nonsensical lorem-ipsum texts with a Lorem Ipsum generator.

Russian

Нам аутым убяквюэ нолюёжжэ ад. Нам граэкы компльыктётюр нэ. Квуй видырэр ёнэрмйщ ку, прё ат фиэрэнт элььэефэнд эррорибуз. Ан нам фэюгаят юлламкорпэр интылльэгэбат. Пэр декам квюаэчтио эа, эним витаэ июварыт вэл экз, эа емпэтюсъ элыктрам шэа. Ед съюммо ыльигэнди мэль, ыам эи кхоро кэтэро зальютатуж, одео нюмквуам мэнтётюм эа квуй.

Chinese

主谷三間機望飼営電時始能快本面一界。約握企曜回金忙出行場説必確天下員週。連芸止嘩健集人説火忘冠率庭泉。田位国以供地紹臣同旅百出済理強波。球告続況時心断主別重並行県邦不康。記悪暮投氏性善治地長中消。小作解共供小田民覧花伝聞団点。止都要空性難改大境新真権軽降真細登皇。読道決集房休講員軟渡慎無告書。社風理載当宿竹金来簡月教。

Greek

Ιδ φιμ ιλλυδ αλικυαμ συσιπιθ, ετ ηαβεο σανστυς κυι, θεμπορ λυπταθυμ σομπρεχενσαμ μει αν. Υθροκυε νολυισε νες ετ, αδχυς οφφισιις ινφιδυντ αδ σεα. Συ νες λιβρις θιμεαμ. Φιξ μαζιμ λυπταθυμ δελισαθισιμι υθ. Περ υθ πωσε μυνερε.

Luxembourgish

As Fläiß ménger Stieren dat. An och sinn Stret gewalteg, wär am gutt d’Land hinnen, wäit eraus ménger si dee. Feld löschteg mä gei. Fu sou deser Riesen, Blummen löschteg hun jo.

I then saved these files with different encodings:

  • Russian: 1251, koi8-R, Unicode-BE, Unicode-LE, UTF8
  • Chinese: utf8, GB2312, GB18030
  • Greek: Unicode-BE, 1253
  • Luxembourgish: 1252, Unicode-LE

Once done, I combined all of the files into one large file – now the sample file contains multiple texts in multiple different languages saved in multiple different character encodings.
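For anyone who wants to reproduce a similar test file, here is a rough sketch of the process. I actually saved the files by hand, so the SAMPLES table, the shortened snippets, the newline separator, and the build_sample name are all assumptions. The Chinese snippet is cut down to characters that GB2312 can represent, since the full sample text contains characters outside that set.

```python
# Text snippet -> Python codec names matching the encodings listed above.
SAMPLES = {
    "Нам аутым убяквюэ нолюёжжэ ад.": ["cp1251", "koi8_r", "utf-16-be", "utf-16-le", "utf-8"],
    "本面一界": ["utf-8", "gb2312", "gb18030"],
    "Ιδ φιμ ιλλυδ αλικυαμ συσιπιθ": ["utf-16-be", "cp1253"],
    "As Fläiß ménger Stieren dat.": ["cp1252", "utf-16-le"],
}

def build_sample(samples=SAMPLES):
    """Encode every snippet in each of its encodings and glue the results together."""
    chunks = []
    for text, encodings in samples.items():
        for enc in encodings:
            chunks.append(text.encode(enc))
    # A single newline byte between chunks; any separator would do for a test file.
    return b"\n".join(chunks)

data = build_sample()
print(len(data), "bytes in the combined sample")
```

Writing the result to disk is then a one-liner, e.g. open("sample.bin", "wb").write(data), where the file name is of course arbitrary.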

Running hstrings over the file produces multiple output files:

Yes, it’s quite a lot, and reviewing them all is overkill at the moment; I have already mentioned that I am still thinking about how to improve the presentation layer 🙂

The rule of thumb is to start with the Windows ANSI code pages, UTF8, Unicode-LE (ULE*) and Unicode-BE (UBE*). And of course we can cheat – we can go ahead and look at the files associated with the encodings we used in the example above, i.e. Russian, Greek, etc. – after all, it’s just an example :)

Previewing the result files gives us the following:

  • h_GB18030,GB18030 Simplified Chinese (4 byte); Chinese Simplified (GB18030)

  • h_windows-1253,ANSI Greek; Greek (Windows)

  • h_windows-1251,ANSI Cyrillic; Cyrillic (Windows)

  • h_windows-1252,ANSI Latin 1; Western European (Windows)

So, it would seem that it works…

 

I will be releasing the first version of hstrings soon.

Thanks for reading!

Zeus trivia

Update

After another chat (with @push_pnx, thanks!), one more clarification – it appears to be a sample from the Citadel family, a spin-off of the Zeus source code that is most likely being developed further by a different programming group.

Interestingly, the distinction between the families is not easy to make, as the ‘Brian Krebs’ string is often associated with Zeus/Zbot. The VirusTotal scan of the sample associates it with these two as well. Go figure 🙂

Update

After I posted this entry, a Twitter chat with Malware Crusaders @MalwareMustDie (thanks!) allowed me to fill in some blanks. I also did a bit more code analysis myself, so the entry below is updated with more details.

Old post (with updates)

Looking at one of the recent Zeus samples I noticed the following:

  • lots of strings decrypted during runtime – see below
  • Zeus accepts command line arguments (this has been highlighted previously by Karthik Selvaraj in his 2010 article A Brief Look at Zeus/Zbot 2.0)

    • -n – prevents dropper’s self-deletion; this is achieved by not executing the temporary batch file with the following content:

    • -z – shows a message box with familiar info on Brian Krebs – see screenshot above

    • -v – starts VNC server
    • -f – as per Symantec, it alters Registry operations (I am not sure how yet); from the code I see that it introduces a call to the Sleep function before a call to the hooked GetFileAttributesExW API, which is executed with the magic values normally used by a bot builder to communicate with a client

 

The original Zeus source code refers to the following command line options:

 

  • -i – provide information about the bot – this option has been changed to -z in a newer version
  • -n – don’t remove the dropper
  • -f – force update of a client disregarding the bot versions (the delay has been added in a newer version)
  • -v – run as VNC

As it seems, sometimes it’s easier to just read the source code 😉

Strings decrypted during runtime (good for memory searches – notice info stealing stuff):

  • “Module: %u\r\nType: %s\r\nTitle: %s\r\nInfo: %s\r\n”
  • “ERROR”
  • “FAILURE”
  • “SUCCESS”
  • “UNEXPECTED”
  • “UNKNOWN”
  • “rurl”
  • “surl”
  • “furl”
  • “uid”
  • “mask”
  • “post”
  • “extensions”
  • “rules”
  • “patterns”
  • “%tokenspy%”
  • “url”
  • “buid”
  • “ruid”
  • “puid”
  • “session”
  • “data”
  • “get_status”
  • “status”
  • “status_cache_time”
  • “Can’t compile tokenspy rules.”
  • “fileName=[%S], fileSize=[%u], fileCRC32=[0x%08X].”
  • “set_url”
  • “data_before\r\n”
  • “data_inject\r\n”
  • “data_after\r\n”
  • “data_end\r\n”
  • “%webinject%”
  • “Can’t compile webinjects.”
  • “fileName=[%S], fileSize=[%u], fileCRC32=[0x%08X], processedInjects=[%u].”
  • “Webinjects has been compiled.”
  • “result=[%u], fileName=[%S], fileSize=[%u], fileCRC32=[0x%08X], processedInjects=[%u].”
  • “*vmware*”
  • “*sandbox*”
  • “*virtualbox*”
  • “*geswall*”
  • “*bufferzone*”
  • “*safespace*”
  • “*.ru”
  • “*.con.ua”
  • “*.by”
  • “*.kz”
  • “cmd.exe”
  • “powershell.exe”
  • “\r\nexit\r\n”
  • “\r\nprompt $Q$Q$Q$Q$Q$Q$Q$Q$Q$Q[ $P ]$G\r\n”
  • “screenshots\\%s\\%04x_%08x.jpg”
  • “unknown”
  • “image/jpeg”
  • “Software\\Microsoft\\Windows\\Currentversion\\Run”
  • “SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\ProfileList\\%s”
  • “ProfileImagePath”
  • “unknown\\unknown”
  • “:d\r\nrd /S /Q \”%s\”\r\nrd /S /Q \”%s\”\r\nrd /S /Q \”%s\”\r\nif exist \”%s\” goto d\r\nif exist \”%s\” goto d\r\nif exist \”%s\” goto d”
  • “videos\\%S_%02u_%02u_%02u_(%02u-%02u).webm”
  • “grabbed\\%S_%02u_%02u_%02u.txt”
  • “Grabbed data from: %s\n\n%S”
  • “%s%s\nUser-Agent: %S\nCookie: %S\nAccept-Language: %S\nAccept-Encoding: %S\nScreen(w:h): %u:%u\nReferer: %S\nUser input: %s\n%sPOST data:\n\n%S”
  • “*EMPTY*”
  • “*UNKNOWN*”
  • ” *BLOCKED*”
  • “Content-Type: %s\r\n”
  • “ZCID: %S\r\n”
  • “application/x-www-form-urlencoded”
  • “HTTP authentication: username=\”%s\”, password=\”%s\”\n”
  • “HTTP authentication (encoded): %S\n”
  • “%s://%s:%s@%s/”
  • “ftp”
  • “pop3”
  • “anonymous”
  • “Software\\Microsoft\\Internet Explorer\\Main”
  • “Start Page”
  • “Software\\Microsoft\\Internet Explorer\\PhishingFilter”
  • “Enabled”
  • “EnabledV8”
  • “Software\\Microsoft\\Internet Explorer\\Privacy”
  • “CleanCookies”
  • “Software\\Microsoft\\Windows\\CurrentVersion\\Internet Settings\\Zones\\%u”
  • “1406”
  • “1609”
  • “Accept-Encoding: identity\r\n”
  • “TE:\r\n”
  • “If-Modified-Since:\r\n”
  • “\nPath: %s\n”
  • “%s=%s\n”
  • “*@*.txt”
  • “Low”
  • “Wininet(Internet Explorer) cookies:\n%S”
  • “Empty”
  • “*.sol”
  • “Mozilla\\Firefox”
  • “user.js”
  • “profiles.ini”
  • “Profile%u”
  • “IsRelative”
  • “Path”
  • “user_pref(\”network.cookie.cookieBehavior\”, 0);\r\nuser_pref(\”privacy.clearOnShutdown.cookies\”, false);\r\nuser_pref(\”security.warn_viewing_mixed\”, false);\r\nuser_pref(\”security.warn_viewing_mixed.show_once\”, false);\r\nuser_pref(
  • “user_pref(\”browser.startup.homepage\”, \”%s\”);\r\nuser_pref(\”browser.startup.page\”, 1);\r\n”
  • “Mozila(Firefox) cookies:\n\n%S”
  • “Empty”
  • “Macromedia\\Flash Player”
  • “flashplayer.cab”
  • “*.sol”
  • “Windows Address Book”
  • “SOFTWARE\\Microsoft\\WAB\\DLLPath”
  • “WABOpen”
  • “Windows Contacts”
  • “A8000A”
  • “1.0”
  • “EmailAddressCollection/EmailAddress[%u]/Address”
  • “Windows Mail Recipients”
  • “Outlook Express Recipients”
  • “Outlook Express”
  • “account{*}.oeaccount”
  • “Software\\Microsoft\\Windows Mail”
  • “Software\\Microsoft\\Windows Live Mail”
  • “Store Root”
  • “Salt”
  • “0x%s”
  • “Windows Mail”
  • “Windows Live Mail”
  • “MessageAccount”
  • “Account_Name”
  • “SMTP_Email_Address”
  • “%sAccount name: %s\nE-mail: %s\n”
  • “%s:\n\tServer: %s:%u%s\n\tUsername: %s\n\tPassword: %s\n”
  • “%s_Server”
  • “%s_User_Name”
  • “%s_Password2”
  • “%s_Port”
  • “%s_Secure_Connection”
  • “SMTP”
  • “POP3”
  • “IMAP”
  • ” (SSL)”
  • “ftp://%s:%s@%s:%u\n”
  • “ftp://%s:%s@%s\n”
  • “ftp://%S:%S@%S:%u\n”
  • “yA36zA48dEhfrvghGRg57h5UlDv3”
  • “sites.dat”
  • “quick.dat”
  • “history.dat”
  • “IP”
  • “port”
  • “user”
  • “pass”
  • “SOFTWARE\\FlashFXP\\3”
  • “datafolder”
  • “*flashfxp*”
  • “FlashFXP”
  • “wcx_ftp.ini”
  • “connections”
  • “default”
  • “host”
  • “username”
  • “password”
  • “SOFTWARE\\Ghisler\\Total Commander”
  • “ftpininame”
  • “installdir”
  • “*totalcmd*”
  • “*total*commander*”
  • “*ghisler*”
  • “Total Commander”
  • “ws_ftp.ini”
  • “_config_”
  • “HOST”
  • “PORT”
  • “UID”
  • “PWD”
  • “SOFTWARE\\ipswitch\\ws_ftp”
  • “datadir”
  • “*ipswitch*”
  • “WS_FTP”
  • “*.xml”
  • “/*/*/Server”
  • “Host”
  • “Port”
  • “User”
  • “Pass”
  • “*filezilla*”
  • “FileZilla”
  • “SOFTWARE\\Far\\Plugins\\ftp\\hosts”
  • “SOFTWARE\\Far2\\Plugins\\ftp\\hosts”
  • “hostname”
  • “username”
  • “user”
  • “password”
  • “FAR manager”
  • “SOFTWARE\\martin prikryl\\winscp 2\\sessions”
  • “hostname”
  • “portnumber”
  • “username”
  • “password”
  • “WinSCP”
  • “ftplist.txt”
  • “;server=”
  • “;port=”
  • “;user=”
  • “;password=”
  • “ftp*commander*”
  • “FTP Commander”
  • “SOFTWARE\\ftpware\\coreftp\\sites”
  • “host”
  • “port”
  • “user”
  • “pw”
  • “CoreFTP”
  • “*.xml”
  • “FavoriteItem”
  • “Host”
  • “Port”
  • “User”
  • “Password”
  • “SOFTWARE\\smartftp\\client 2.0\\settings\\general\\favorites”
  • “personal favorites”
  • “SOFTWARE\\smartftp\\client 2.0\\settings\\backup”
  • “folder”
  • “SmartFTP”
  • “userinit.exe”
  • “pass”
  • “certs\\%s\\%s_%02u_%02u_%04u.pfx”
  • “grabbed”
  • “os_shutdown”
  • “os_reboot”
  • “url_open”
  • “bot_uninstall”
  • “bot_update”
  • “bot_transfer”
  • “dns_filter_add”
  • “dns_filter_remove”
  • “bot_bc_add”
  • “bot_bc_remove”
  • “bot_httpinject_disable”
  • “bot_httpinject_enable”
  • “fs_path_get”
  • “fs_search_add”
  • “fs_search_remove”
  • “user_destroy”
  • “user_logoff”
  • “user_execute”
  • “user_cookies_get”
  • “user_cookies_remove”
  • “user_certs_get”
  • “user_certs_remove”
  • “user_url_block”
  • “user_url_unblock”
  • “user_homepage_set”
  • “user_ftpclients_get”
  • “user_emailclients_get”
  • “user_flashplayer_get”
  • “user_flashplayer_remove”
  • “module_execute_enable”
  • “module_execute_disable”
  • “module_download_enable”
  • “module_download_disable”
  • “info_get_software”
  • “info_get_antivirus”
  • “info_get_firewall”
  • “search_file”
  • “upload_file”
  • “download_file”
  • “ddos_start”
  • “ddos_stop”
  • “webinjects_update”
  • “tokenspy_update”
  • “tokenspy_disable”
  • “close_browsers”
  • “Not enough memory.”
  • “Script already executed.”
  • “Failed to load local configuration.”
  • “Failed to save local configuration.”
  • “Failed to execute command at line %u.”
  • “Unknown command at line %u.”
  • “OK.”
  • “firefox.exe”
  • “*Mozilla*”
  • “iexplore.exe”
  • “*Microsoft*”
  • “chrome.exe”
  • “*Google*”
  • “Winsta0”
  • “default”
  • “dwm.exe”
  • “taskhost.exe”
  • “taskeng.exe”
  • “wscntfy.exe”
  • “ctfmon.exe”
  • “rdpclip.exe”
  • “explorer.exe”
  • “V\t%08X\r\nC\t%08X\r\nPS\t%08X”
  • “BOT NOT CRYPTED!”
  • “SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion”
  • “InstallDate”
  • “DigitalProductId”
  • “%s_%08X%08X”
  • “fatal_error”
  • “unknown”
  • “wtsapi32.dll”
  • “WTSEnumerateSessionsW”
  • “WTSFreeMemory”
  • “WTSQueryUserToken”
  • “userenv.dll”
  • “GetDefaultUserProfileDirectoryW”
  • “user32.dll”
  • “MessageBoxW”
  • “ntdll.dll”
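Since these strings are handy for memory searches, here is a minimal sketch of such a search over a raw memory dump. I have not checked whether each string sits in memory as a narrow or a wide string, so the sketch looks for both ASCII and UTF-16LE forms. The INDICATORS subset and the find_indicators name are assumptions chosen for illustration.

```python
# A few distinctive strings from the list above; extend as needed.
INDICATORS = [
    "Webinjects has been compiled.",
    "BOT NOT CRYPTED!",
    "bot_httpinject_enable",
    "tokenspy_update",
]

def find_indicators(dump, indicators=INDICATORS):
    """Return {string: sorted offsets} for every indicator found in the dump."""
    hits = {}
    for s in indicators:
        # Try both the narrow and the wide (UTF-16LE) representation.
        for enc in ("ascii", "utf-16-le"):
            needle = s.encode(enc)
            pos = dump.find(needle)
            while pos != -1:
                hits.setdefault(s, []).append(pos)
                pos = dump.find(needle, pos + 1)
    return {s: sorted(offsets) for s, offsets in hits.items()}

dump = b"\x00" * 8 + b"BOT NOT CRYPTED!" + b"\xff" + "tokenspy_update".encode("utf-16-le")
print(find_indicators(dump))  # → {'BOT NOT CRYPTED!': [8], 'tokenspy_update': [25]}
```

On a real dump one would read the file in binary mode and pass its contents in; very large dumps would call for chunked reading or mmap.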

The strings are decrypted in various places throughout the code by a procedure that takes 2 arguments: the ID of the string + an offset to a destination buffer. In case you are wondering how I decrypted all of them in one go, I did a quick and dirty patch to a call that invokes the decryption routine. The patch is easy to write in OllyDbg, and to preserve info on all decrypted strings I put a conditional breakpoint that does not pause, with an option to log every decrypted string to the Olly Log Window. I then ran this piece of code, incrementing the ID in each iteration, until I got an access violation: a simple but effective trick w/o writing a dedicated decrypter (a.k.a. lazy reversing :)).
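The logic of that trick can be sketched outside the debugger. Everything in the block below is a toy: the XOR table and decrypt_string stand in for the sample's routine and are NOT the real Zeus/Citadel algorithm, and an IndexError plays the role of the access violation. Only the loop (increment the ID, log the result, stop on the first failure) reflects what I did in OllyDbg.

```python
# Toy stand-in for the encrypted string table (hypothetical, single-byte XOR).
_KEY = 0x5A
_TABLE = [bytes(b ^ _KEY for b in s.encode("ascii")) for s in ("ERROR", "FAILURE", "SUCCESS")]

def decrypt_string(string_id):
    """Hypothetical decryption routine: takes a string ID, returns the plaintext."""
    blob = _TABLE[string_id]  # raises IndexError past the last entry
    return bytes(b ^ _KEY for b in blob).decode("ascii")

def dump_all_strings(decrypt=decrypt_string):
    """Call the routine with ID 0, 1, 2, ... and log results until it fails."""
    logged = []
    string_id = 0
    while True:
        try:
            logged.append(decrypt(string_id))
        except IndexError:  # analogue of the access violation that ended my run
            break
        string_id += 1
    return logged

print(dump_all_strings())  # → ['ERROR', 'FAILURE', 'SUCCESS']
```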

The original source code of ZeuS version 2.0.8.9 contains most of these encrypted strings in the source\client\cryptedstrings.txt file; a diff between the list pasted above and the list from ZeuS 2.0.8.9 makes it possible to generate a list of new strings – indicative of new functionality:

  • anti-vm
  • more info stealing capabilities
  • modification of firefox privacy settings

The newly added strings are:

  • Module: %u\r\nType: %s\r\nTitle: %s\r\nInfo: %s\r\n
  • ERROR
  • FAILURE
  • SUCCESS
  • UNEXPECTED
  • rurl
  • surl
  • furl
  • mask
  • post
  • extensions
  • rules
  • patterns
  • %tokenspy%
  • url
  • buid
  • ruid
  • puid
  • session
  • data
  • get_status
  • status
  • status_cache_time
  • Can’t compile tokenspy rules.
  • fileName=[%S], fileSize=[%u], fileCRC32=[0x%08X].
  • set_url
  • data_before\r\n
  • data_inject\r\n
  • data_after\r\n
  • data_end\r\n
  • %webinject%
  • Can’t compile webinjects.
  • fileName=[%S], fileSize=[%u], fileCRC32=[0x%08X], processedInjects=[%u].
  • Webinjects has been compiled.
  • result=[%u], fileName=[%S], fileSize=[%u], fileCRC32=[0x%08X], processedInjects=[%u].
  • *vmware*
  • *sandbox*
  • *virtualbox*
  • *geswall*
  • *bufferzone*
  • *safespace*
  • *.ru
  • *.con.ua
  • *.by
  • *.kz
  • cmd.exe
  • powershell.exe
  • \r\nexit\r\n
  • \r\nprompt $Q$Q$Q$Q$Q$Q$Q$Q$Q$Q[ $P ]$G\r\n
  • :d\r\nrd /S /Q \”%s\”\r\nrd /S /Q \”%s\”\r\nrd /S /Q \”%s\”\r\nif exist \”%s\” goto d\r\nif exist \”%s\” goto d\r\nif exist \”%s\” goto d
  • videos\\%S_%02u_%02u_%02u_(%02u-%02u).webm
  • Grabbed data from: %s\n\n%S
  • %s%s\nUser-Agent: %S\nCookie: %S\nAccept-Language: %S\nAccept-Encoding: %S\nScreen(w:h): %u:%u\nReferer: %S\nUser input: %s\n%sPOST data:\n\n%S
  • ” *BLOCKED*”
  • Content-Type: %s\r\n
  • ZCID: %S\r\n
  • application/x-www-form-urlencoded
  • HTTP authentication: username=\”%s\”, password=\”%s\”\n
  • Profile%u
  • user_pref(\”network.cookie.cookieBehavior\”, 0);\r\nuser_pref(\”privacy.clearOnShutdown.cookies\”, false);\r\nuser_pref(\”security.warn_viewing_mixed\”, false);\r\nuser_pref(\”security.warn_viewing_mixed.show_once\”, false);\r\nuser_pref(
  • user_pref(\”browser.startup.homepage\”, \”%s\”);\r\nuser_pref(\”browser.startup.page\”, 1);\r\n
  • Mozila(Firefox) cookies:\n\n%S
  • Outlook Express Recipients
  • %s_Server
  • %s_User_Name
  • %s_Password2
  • %s_Port
  • %s_Secure_Connection
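The diff itself is simple to reproduce. The sketch below assumes both lists have already been reduced to plain strings (I am not describing the layout of cryptedstrings.txt here, so the parsing step is left out); the function name new_strings is hypothetical. It keeps the order in which strings were decrypted and drops repeats.

```python
def new_strings(current, baseline):
    """Strings present in `current` but absent from `baseline`, in first-seen order."""
    known = set(baseline)
    seen = set()
    added = []
    for s in current:
        if s not in known and s not in seen:
            seen.add(s)
            added.append(s)
    return added

decrypted = ["ERROR", "uid", "rurl", "*vmware*", "rurl"]  # excerpt-style example input
zeus_2089 = ["uid", "UNKNOWN"]                            # hypothetical baseline excerpt
print(new_strings(decrypted, zeus_2089))  # → ['ERROR', 'rurl', '*vmware*']
```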