Appended data — goodware

September 7, 2019 in Batch Analysis, Clustering, File Formats ZOO

When you look at a large corpus of appended data (the data that is part of many PE files on disk, but is not mapped into memory when the PE image is loaded at program start), patterns emerge.
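To make 'appended data' concrete, here is a minimal sketch of how one might locate it, assuming the simplest definition: everything past the end of the last section's raw data. The function name `overlay_offset` is mine, and this is not the parser used for this post. It ignores file alignment quirks and the fact that an Authenticode signature also lives in this region.

```python
import struct

def overlay_offset(data: bytes):
    """Return the file offset where appended data starts, or None if there is none."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        return None
    coff = e_lfanew + 4                      # COFF file header is 20 bytes long
    if coff + 20 > len(data):
        return None
    (num_sections,) = struct.unpack_from("<H", data, coff + 2)
    (opt_size,) = struct.unpack_from("<H", data, coff + 16)
    table = coff + 20 + opt_size             # section table follows the optional header
    end = table + num_sections * 40          # each section header is 40 bytes
    if end > len(data):
        return None
    for i in range(num_sections):
        # SizeOfRawData and PointerToRawData sit at offsets 16 and 20 of a section header
        raw_size, raw_ptr = struct.unpack_from("<II", data, table + i * 40 + 16)
        if raw_size:
            end = max(end, raw_ptr + raw_size)
    return end if end < len(data) else None
```

Anything from the returned offset to the end of the file is what the loader never maps as part of the image.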

For malware, this usually means an abuse of a popular installer.

For goodware, it’s business as usual.

Using the state machine script I discussed in my other post today, I extracted the most common leading 4-byte values (shown in hexadecimal) from the appended data of many goodware installers.

There are no surprises here: many of the appended data blobs are in a format used by popular and ‘genuine’ installer packages (stub + appended data):

 181472 00 00 00 00 
 131876 4D 53 43 46 - CAB file
  36369 2E 66 69 6C - .file
  36359 7A 6C 62 1A - Inno Setup
  31960 13 00 00 00 
  27981 3B 21 40 49 - 7z SFX
  24883 50 4B 03 04 - Zip
  21721 40 55 41 46 - AMI Flash Utility
  13896 01 00 00 00 
   9489 A3 61 4A 6A 
   9470 5C 73 65 6C -  \self\bin\x86\msvcp60.pdb. 
   8021 52 61 72 21 - Rar!
   7077 0E 00 00 00 
   6855 5F 45 4E 5F - _EN_CODE.BIN
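A tally like the one above can be sketched in a few lines. This is not the state machine script mentioned earlier, just a plain counter over the first 4 bytes of each blob. The names `tally_magics`, `format_tally` and `KNOWN_MAGICS` are made up for illustration, and the labels are copied from the table.

```python
from collections import Counter

# Labels taken from the table above; extend as needed.
KNOWN_MAGICS = {
    bytes.fromhex("4D534346"): "CAB file",
    bytes.fromhex("7A6C621A"): "Inno Setup",
    bytes.fromhex("3B214049"): "7z SFX",
    bytes.fromhex("504B0304"): "Zip",
    bytes.fromhex("40554146"): "AMI Flash Utility",
    bytes.fromhex("52617221"): "Rar!",
}

def tally_magics(blobs):
    """Count the first 4 bytes of every appended data blob (shorter blobs are skipped)."""
    counts = Counter()
    for blob in blobs:
        if len(blob) >= 4:
            counts[bytes(blob[:4])] += 1
    return counts

def format_tally(counts, top=14):
    """Render the most common values as 'count hex bytes - label' lines."""
    lines = []
    for magic, n in counts.most_common(top):
        hexed = " ".join(f"{b:02X}" for b in magic)
        label = KNOWN_MAGICS.get(magic)
        lines.append(f"{n:8d} {hexed}" + (f" - {label}" if label else ""))
    return lines
```

Feeding it the appended data of each sample (for example, everything past the last section of the PE file) gives output in the same shape as the listing above.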

The appended data is often a CAB, ZIP, or RAR archive, and there are some proprietary appended data formats as well.

How can we utilize it from a detection perspective?

Those formats that are not popular among malware samples could become candidates for exclusions.
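One way to shortlist such exclusion candidates, assuming you have the same kind of tally for a malware corpus as well, is to keep only the values that are frequent in goodware and rare in malware. The function name and both thresholds below are arbitrary placeholders, not tuned values.

```python
def exclusion_candidates(goodware, malware, min_good=1000, max_malware_share=0.01):
    """Return (magic, good_count, bad_count) tuples that are common in goodware, rare in malware.

    goodware and malware map the first 4 bytes of appended data to a sample count.
    """
    picked = []
    for magic, good in goodware.items():
        if good == 0 or good < min_good:
            continue
        bad = malware.get(magic, 0)
        share = bad / (good + bad)           # fraction of all sightings that are malware
        if share <= max_malware_share:
            picked.append((magic, good, bad))
    # Most common goodware values first
    return sorted(picked, key=lambda item: item[1], reverse=True)
```

Treat the result as a list of candidates to review by hand, not as a ready-made allowlist.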

Outliers are a perfect test bed for any PE parser. Yes… Does your parser handle every PE file structure properly? While analyzing data for this blog post I spotted many badly parsed PE files. This is quite a slap in the face. My parser has grown organically over many years and I was quite confident that it ‘handles’ many outliers. I now know that I have to improve it. A humbling lesson for any sample collector really…

Finally, knowing what types of installers are used by goodware, you can use that as a hint on how to craft your red team tools so they do not stand out. It may sound silly, but if ‘next gen’/AI/ML algos really exist and they train on crazily large corpora of samples… chances are that they will learn to ignore many of these popular file setups…
