Going BAT…mode crazy

March 12, 2020 in Anti-*, Anti-Forensics, Batch Analysis, File Formats ZOO, Random ideas

What will the following bat file print? Foo, or Bar?

@echo off

 mode con cp select=65000 > nul
 set jump=+ACQ-
 mode con cp select=437 > nul
 goto %jump%

:+ACQ-
 echo Foo
 goto :eof

:$
 echo Bar
 goto :eof

Here’s the answer:

Batch files can be saved as text files using different encodings, including UTF-7 and UTF-8, as well as MBCS/DBCS character sets.

One can therefore set the encoding and change it not only outside of a batch file, but also on the fly, as in the example above. As a result, the part of the code that executes after the first ‘mode’ is interpreted as UTF-7 (‘+ACQ-’ is an encoded ‘$’ sign), and the part after the second ‘mode’ as OEM US English (code page 437).
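The UTF-7 claim is easy to check outside of cmd.exe. The snippet below is a minimal Python sketch (not part of the original batch file) that decodes the same five bytes under both code pages used in the example. It assumes that Python's `utf-7` and `cp437` codecs are fair stand-ins for code pages 65000 and 437; the variable names are made up for illustration.

```python
# The text after "set jump=" as raw bytes on disk.
raw = b"+ACQ-"

# Code page 65000 (UTF-7): "+ACQ-" is a base64-escaped U+0024.
as_utf7 = raw.decode("utf-7")

# Code page 437 (OEM US): every byte is plain ASCII, so nothing changes.
as_cp437 = raw.decode("cp437")

print(repr(as_utf7), repr(as_cp437))
```

If cmd.exe decodes the `set` line the same way, `%jump%` ends up holding `$` by the time `goto` runs under code page 437.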

The example below replaces UTF-7 from the example above with Traditional Chinese (and code page 437 with UTF-8):

@echo off

 mode con cp select=950 > nul
 set jump=§A¦n
 mode con cp select=65001 > nul
 goto %jump%

:§A¦n
 echo Foo
 goto :eof

:你好
 echo Bar
 goto :eof

If you look at this code using the 950 code page (Big5), you will see this:

@echo off

 mode con cp select=950 > nul
 set jump=你好
 mode con cp select=65001 > nul
 goto %jump%

:你好
 echo Foo
 goto :eof

:雿末
 echo Bar
 goto :eof

And if you choose to preview it as UTF-8:

@echo off

 mode con cp select=950 > nul
 set jump=§A¦n
 mode con cp select=65001 > nul
 goto %jump%

:§A¦n
 echo Foo
 goto :eof

:你好
 echo Bar
 goto :eof

Misleading, isn’t it?
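The two views can be reproduced at the byte level. The following is a rough Python sketch rather than anything cmd.exe does internally: it assumes the `set jump=` line and the first label were saved as Big5 bytes, the second label as UTF-8 bytes, and that a viewer not in Big5 mode falls back to showing one Latin-1 character per byte.

```python
text = "你好"

# Bytes as written for the `set jump=` line and the first label (code page 950).
big5_bytes = text.encode("big5")

# A viewer that maps each of those bytes to one Latin-1 character shows
# mojibake instead of the two Chinese characters.
as_latin1 = big5_bytes.decode("latin-1")

# Bytes as written for the second label (code page 65001).
utf8_bytes = text.encode("utf-8")

print(big5_bytes.hex(" "), "->", as_latin1)
print(utf8_bytes.hex(" "))
```

The same two characters therefore exist in the file as two different byte sequences, and which label "looks" like the jump target depends entirely on the code page the viewer picks.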

When you run this version of the script, you will see an error from the interpreter. It is the result of cmd.exe interpreting superfluous UTF-8 prefixes that seem to appear out of nowhere inside the interpreter. Perhaps further study of cmd.exe internals can help eliminate this quirk. Still, the jump goes to the proper label, and the errors can always be hidden with standard error redirection.

Appended data — goodware

September 7, 2019 in Batch Analysis, Clustering, File Formats ZOO

When you take a look at large corpora of appended data — the data that is a part of many PE files, but is not loaded as a part of PE image loading into memory (when a program starts) — patterns emerge.

For malware, this usually means an abuse of a popular installer.

For goodware, it’s business as usual.

Using the state machine script I discussed in my other post today, I extracted the most common leading 4-byte hexadecimal values from the appended data of many goodware installers.

There are no surprises there. Many of the appended data blobs are in a format used by popular and ‘genuine’ installer packages (stub+appended data):

 181472 00 00 00 00 
 131876 4D 53 43 46 - CAB file
  36369 2E 66 69 6C - .file
  36359 7A 6C 62 1A - Inno Setup
  31960 13 00 00 00 
  27981 3B 21 40 49 - 7z SFX
  24883 50 4B 03 04 - Zip
  21721 40 55 41 46 - AMI Flash Utility
  13896 01 00 00 00 
   9489 A3 61 4A 6A 
   9470 5C 73 65 6C -  \self\bin\x86\msvcp60.pdb. 
   8021 52 61 72 21 - Rar!
   7077 0E 00 00 00 
   6855 5F 45 4E 5F - _EN_CODE.BIN

There is appended data that is a CAB, ZIP, or RAR file, as well as some proprietary appended data formats.
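A tally like the one above can be approximated in a few lines of Python. This is only a sketch under stated assumptions and not the state machine script mentioned earlier: `overlay_offset`, `first_dword_histogram` and `KNOWN_MAGICS` are hypothetical names, appended data is taken to start right after the last section's raw data, and anything that also lives past that point (an Authenticode signature, for example) is not handled separately.

```python
import struct
from collections import Counter

# A few labels copied from the table above (assumption: exact 4-byte match).
KNOWN_MAGICS = {
    "4D 53 43 46": "CAB file",
    "7A 6C 62 1A": "Inno Setup",
    "3B 21 40 49": "7z SFX",
    "50 4B 03 04": "Zip",
    "52 61 72 21": "Rar!",
}

def overlay_offset(data: bytes) -> int:
    """Return the file offset where appended data would start."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # COFF header fields, relative to the PE signature.
    (num_sections,) = struct.unpack_from("<H", data, e_lfanew + 6)
    (opt_size,) = struct.unpack_from("<H", data, e_lfanew + 20)
    table = e_lfanew + 24 + opt_size
    end = table + 40 * num_sections  # the headers themselves are not appended data
    for i in range(num_sections):
        # SizeOfRawData and PointerToRawData sit at +16 and +20 in each entry.
        raw_size, raw_ptr = struct.unpack_from("<II", data, table + 40 * i + 16)
        if raw_size:
            end = max(end, raw_ptr + raw_size)
    return min(end, len(data))

def first_dword_histogram(files):
    """Count the first 4 bytes of appended data across an iterable of PE images."""
    counts = Counter()
    for data in files:
        blob = data[overlay_offset(data):]
        if len(blob) >= 4:
            counts[blob[:4].hex(" ").upper()] += 1
    return counts

def describe(counts):
    """Yield (count, hex bytes, label) rows, most common first."""
    for magic, n in counts.most_common():
        yield n, magic, KNOWN_MAGICS.get(magic, "")
```

Feeding it the raw bytes of each sample, for instance `describe(first_dword_histogram(open(p, "rb").read() for p in paths))` with `paths` being your own list of files, yields rows in the same count / bytes / label shape as the table. Malformed section tables are exactly where a naive walk like this goes wrong.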

How can we utilize it from a detection perspective?

Formats that are not popular among malware samples could become exclusions.

Outliers are a perfect test bed for any PE parser. Yes… Does your parser parse every PE file structure properly? While analyzing data for this blog post, I spotted many badly parsed PE files. This is quite a slap in the face. My parser has grown organically over many years, and I was quite confident that it ‘handles’ many outliers. I now know that I have to improve it. A humbling lesson for any sample collector, really…

Finally, knowing what types of installers are used by goodware, you can use this as a hint on how to craft your red team tools so that they do not stand out. It may sound silly, but if ‘next gen’/AI/ML algos really exist and they train on crazily large corpora of samples… chances are that they will learn to ignore many of these popular file setups…